Cascade of Linear Predictors for Deconvolution of Non-Stationary Channels in Sparse and Antisparse Scenarios

Abstract: This work deals with adaptive predictive deconvolution of non-stationary channels. In particular, we investigate the use of a cascade of linear predictors in the recovery of sparse and antisparse signals. To do so, we first discuss the behavior of the $\ell_p$ Prediction Error Filter (PEF), with $p \neq 2$, showing that it can better attenuate the effects of non-minimum-phase channels than the classical $\ell_2$ PEF, although the $\ell_p$ PEF, with $p \neq 2$, still presents intrinsic limitations in compensating the channel distortions, due to its direct forward structure. Hence, the cascade of linear predictors, i.e., one forward filter followed by a backward one, emerges as a possible solution to circumvent this structural limitation. We apply the proposed cascade structure to the deconvolution of non-stationary channels with minimum-, maximum-, mixed- and variable-phase responses, and also in noisy scenarios. From the simulation results we observe that, besides the duality relation between the $\ell_1$ and $\ell_\infty$ norms, they present different algorithmic behavior: a cascade adjusted by minimizing the $\ell_1$ norm of the error attains fast convergence, enhancing the cascade tracking capacity, but is more sensitive to noise. Adjusting the cascade by minimizing the $\ell_4$ norm of the error (a smooth approximation of the $\ell_\infty$ norm), on the other hand, leads to a filter that is more robust to noise but presents slower convergence and a lower tracking capability.


I. INTRODUCTION
The problem of deconvolution consists in recovering a signal of interest $s(n)$ that has been distorted by a channel with impulse response $h(n)$, generating the distorted signal $x(n)$ [1], as depicted in Fig. 1a.
The objective in this problem, as illustrated in Fig. 1b, is to design a deconvolution filter, with impulse response $w(n)$, able to compensate the distortions introduced by the channel and to produce an estimate $\hat{s}(n)$ of the original signal.
When we consider the unsupervised version of the problem, we must adjust $w(n)$ without access to the samples of $s(n)$ and without explicit knowledge of the channel. In this case, the deconvolution is carried out in order to recover a property of $s(n)$ that is no longer present in $x(n)$. This property is referred to as prior information about $s(n)$, and its recovery is equivalent to recovering the signal of interest.
A possible prior information to be employed is the independence among the samples of $s(n)$ [1], [2], [3], [4]. Since we are modeling the channel as a time-invariant linear system, we have
$$x(n) = h(n) * s(n) = \sum_{k=0}^{L} h(k)\, s(n-k), \tag{1}$$
where $*$ denotes the discrete-time convolution and $L > 1$ is the channel order. Due to the above convolution operation, the independence property no longer holds for the signal $x(n)$ at the output of the channel when the channel presents a non-flat magnitude response.
Once we have access to the distorted signal, we must adjust the deconvolution filter in order to eliminate the statistical redundancy in $x(n)$. One possible way to do this is to use the prediction error filter (PEF) [1], [2]. We illustrate this filter structure in Fig. 2.
Classically, the PEF parameters are adjusted in order to minimize the mean squared error (MSE). The resulting filter presents two interesting properties [1], [5]:
1) It is a whitening filter, i.e., a sufficiently long PEF provides an error signal $e(n)$ such that
$$E[e(n)\, e(n-k)] = 0, \quad k \neq 0, \tag{2}$$
where the operator $E[\cdot]$ is the statistical mean and we are considering a zero-mean signal [1].
2) It is a minimum-phase filter, i.e., all of its zeros lie inside the unit circle [1].
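As a rough numerical illustration of these two properties, the sketch below fits an MSE-optimal PEF by solving the normal (Yule-Walker) equations built from sample autocorrelations, then checks the whiteness of the error and the location of the filter zeros. The channel `h`, the predictor order and the helper name `l2_pef` are assumptions chosen for this example only; they are not taken from the experiments reported in this paper.

```python
import numpy as np

def l2_pef(x, order):
    """Return the taps [1, -a_1, ..., -a_K] of the MSE-optimal PEF (autocorrelation method)."""
    n = len(x)
    # Biased sample autocorrelations r(0), ..., r(K)
    r = np.array([np.dot(x[k:], x[:n - k]) for k in range(order + 1)]) / n
    # Toeplitz autocorrelation matrix with entries R[i, j] = r(|i - j|)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    return np.concatenate(([1.0], -a))

rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=20_000)   # i.i.d. BPSK source
h = np.array([1.0, -2.5, 1.0])             # hypothetical mixed-phase channel (zeros at 0.5 and 2)
x = np.convolve(s, h)[:len(s)]             # distorted signal

pef = l2_pef(x, order=8)
e = np.convolve(x, pef)[:len(x)]           # prediction error

lag1_corr = np.mean(e[1:] * e[:-1]) / np.mean(e ** 2)   # normalized correlation at lag 1
max_zero_modulus = np.max(np.abs(np.roots(pef)))        # largest zero modulus of the PEF
print(f"lag-1 correlation: {lag1_corr:.3f}, largest zero modulus: {max_zero_modulus:.3f}")
```

In this setting the error should be nearly uncorrelated while all zeros of the fitted filter stay inside the unit circle, so the channel zero outside the unit circle cannot be compensated by this filter alone.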
From these two properties of the classical PEF we can note the main limitations of this structure when it is optimized by means of the MSE criterion. Because it is a whitening filter, it can only decorrelate the samples of $x(n)$, which can be understood as a "weak independence". Moreover, from the PEF phase response, we can observe that this filter is a suitable structure only for the deconvolution of minimum-phase channels, which is a very restrictive condition [6]. Therefore, we must seek a criterion and a suitable structure able to promote independence and a generic phase response.
In [6] and [7] we have shown that the $\ell_p$ PEF¹, with $p \neq 2$, performs nonlinear decorrelation, i.e., the error signal produced has the property
$$E[f(e(n))\, g(e(n-k))] = 0, \quad k \neq 0, \tag{3}$$
for all odd, nonlinear functions $f(\cdot)$ and $g(\cdot)$ [1]. We also investigated in [7] the phase response of the $\ell_p$ PEF, with $p \neq 2$, verifying that it can present a non-minimum-phase response. However, we also observed that the $\ell_p$ PEF was not able to compensate the distortions of a channel with a generic phase response. This limitation of the $\ell_p$ PEF, with $p \neq 2$, is supported by the work of Knockaert [8], which shows that all of the zeros of this type of filter lie inside a circle of radius 2. This result shows that the $\ell_p$ PEF, with $p \neq 2$, is more general than the classical one, since it provides nonlinear decorrelation and presents a non-minimum-phase response, but still has a restriction due to its forward linear structure.
One possible way to circumvent this limitation is to use a cascade of linear predictors: a forward linear predictor followed by a backward one, as we did in [9]. The use of a series of deconvolution filters has been addressed in previous works such as [10], [11], [12] and [13]. Compared to these works, our approach here is to employ a filter decomposition that preserves the filter linearity, leading to a simpler parameter adjustment, and that is also suitable for non-stationary channels.

¹ Here, we use $\ell_p$ for the $p$-norm of a vector.
The present paper further studies the potential of the cascade of linear predictors by applying it in two well-defined scenarios: for the blind deconvolution of antisparse signals, we propose an MFE (Mean Fourth Error) cascade, and for the blind deconvolution of sparse sources, an MAE (Mean Absolute Error) cascade. We define a sparse signal as a signal that concentrates most of its information in a few samples (in this case, zeros or very small values do not carry significant information). In this work, we have used a Bernoulli-Gaussian distribution to model such a signal. An antisparse signal, in a manner dual to the sparse one, is a signal that spreads the information equally among its samples and is therefore suitably modeled by a uniform distribution. An example of this kind of signal is a BPSK signal, which assumes the values +1 and -1 with equal probability and can be considered an antisparse signal for our purposes in this work.
The MFE criterion is used as a smooth approximation of the $\ell_\infty$ norm, allowing the implementation of a stochastic gradient descent algorithm and the handling of non-stationary channels. The proposed structure is adapted by the LMF (Least Mean Fourth) algorithm, which can be directly derived from the MFE criterion. For the MAE criterion, the structure is adapted by the sign-error LMS algorithm. Simulation results show that this improved solution efficiently accomplishes the blind deconvolution task for non-stationary and non-minimum-phase channels.
The present work is organized as follows: in Section II we detail our approach and present suitable algorithms to adjust the parameters of the predictors; in Sections III-A and III-B we present the simulation results. Finally, in Section IV, we state our conclusions as well as some perspectives for future work.

II. PROPOSED STRUCTURE AND ALGORITHMS
Our approach here is to employ a cascade of linear predictors, as depicted in Fig. 3.
In Fig. 3, $F(z)$ denotes the forward linear predictor, with error signal given by
$$e_1(n) = x(n) - \sum_{k=1}^{K} a_k\, x(n-k), \tag{4}$$
where $a_1, \cdots, a_K$ are the forward prediction coefficients.
In turn, $B(z)$ denotes the backward predictor filter, which has an error signal given by
$$e_2(n) = x(n-M) - \sum_{k=1}^{M} b_k\, x(n-k+1), \tag{5}$$
where $b_1, \cdots, b_M$ are the backward prediction coefficients.
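Considering the structure in Fig. 3a and equations (4) and (5), we can write the error signal $e(n)$ as
$$e(n) = e_1(n-M) - \sum_{k=1}^{M} b_k\, e_1(n-k+1), \tag{6}$$
with $e_1(n)$ the forward prediction error of (4). Due to the commutative property (Fig. 3b), we can also write $e(n)$ as
$$e(n) = e_2(n) - \sum_{k=1}^{K} a_k\, e_2(n-k), \tag{7}$$
where $e_2(n) = x(n-M) - \sum_{k=1}^{M} b_k\, x(n-k+1)$.

One of our interests in this work is to perform unsupervised deconvolution in order to recover telecommunication signals, which usually follow a uniform distribution [14] and, hence, present an antisparse structure [15]. Therefore, as we discussed in [7], the most suitable measure for this property is the $\ell_\infty$ norm, which is equivalent to the Maximum Likelihood estimator for a uniform distribution. However, we employ here the $\ell_4$ norm as a smooth approximation of the $\ell_\infty$ one.

From (6), we can express the Mean Fourth Error (MFE) by
$$J_{\mathrm{MFE}} = E\big[e^4(n)\big] = E\Big[\Big(e_1(n-M) - \sum_{k=1}^{M} b_k\, e_1(n-k+1)\Big)^4\Big], \tag{8}$$
or, from (7), by
$$J_{\mathrm{MFE}} = E\Big[\Big(e_2(n) - \sum_{k=1}^{K} a_k\, e_2(n-k)\Big)^4\Big]. \tag{9}$$
Since we are using a smooth cost function, we can derive a gradient adaptive algorithm to optimize the cascade parameters. Hence, the update rule for the LMF [16] is simply given by
$$a_k(n+1) = a_k(n) + \mu\, e^3(n)\, e_2(n-k), \quad k = 1, \dots, K,$$
$$b_k(n+1) = b_k(n) + \mu\, e^3(n)\, e_1(n-k+1), \quad k = 1, \dots, M, \tag{10}$$
where $\mu$ is the learning rate. For simplicity, we considered real signals in the derivation of (10), but these adaptation rules can be easily extended to complex signals [9].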
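The adaptation described above can be sketched as a sample-by-sample loop. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the function name `adapt_cascade`, the zero initialization, the static test channel and the absorption of the gradient constant into `mu` are all choices made for this example. The forward error is taken as x(n) minus the prediction from the K past samples, and the backward error as x(n-M) minus the prediction from the M most recent samples.

```python
import numpy as np

def adapt_cascade(x, K=5, M=5, mu=1e-3, g=lambda err: err ** 3):
    """Adapt forward (a) and backward (b) predictors; g is the error nonlinearity (cubic = LMF)."""
    a, b = np.zeros(K), np.zeros(M)
    e = np.zeros(len(x))
    for n in range(K + M, len(x)):
        # Forward errors e1(n), ..., e1(n-M), computed with the current a
        e1 = np.array([x[n - j] - a @ x[n - j - K:n - j][::-1] for j in range(M + 1)])
        # Backward errors e2(n), ..., e2(n-K), computed with the current b
        e2 = np.array([x[n - j - M] - b @ x[n - j - M + 1:n - j + 1][::-1]
                       for j in range(K + 1)])
        # Cascade output: e(n) = e1(n-M) - sum_k b_k e1(n-k+1)
        e[n] = e1[M] - b @ e1[:M]
        # Stochastic-gradient step on both predictors
        step = mu * g(e[n])
        a += step * e2[1:]     # uses e2(n-1), ..., e2(n-K)
        b += step * e1[:M]     # uses e1(n), ..., e1(n-M+1)
    return a, b, e

rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=5_000)              # BPSK source
x = np.convolve(s, [1.0, -0.6, 0.05])[:len(s)]       # static channel with zeros at 0.1 and 0.5
a, b, e = adapt_cascade(x, mu=1e-3)
```

Passing `g=np.sign` in place of the cubic nonlinearity turns the same loop into a sign-error update, which is the natural counterpart for an absolute-error criterion.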
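Our second objective concerns sparse signals, such as those typically found in seismic deconvolution [17]. In this case, we must minimize the Mean Absolute Error (MAE), which can be written, from (6), as
$$J_{\mathrm{MAE}} = E\big[|e(n)|\big] = E\Big[\Big|e_1(n-M) - \sum_{k=1}^{M} b_k\, e_1(n-k+1)\Big|\Big], \tag{11}$$
or, from (7), as
$$J_{\mathrm{MAE}} = E\Big[\Big|e_2(n) - \sum_{k=1}^{K} a_k\, e_2(n-k)\Big|\Big], \tag{12}$$
with $e_1(n)$ and $e_2(n)$ denoting the forward and backward prediction errors of $x(n)$. To minimize this cost function, we can adapt our structure by means of the sign-error LMS [16]:
$$a_k(n+1) = a_k(n) + \mu\, \mathrm{sign}(e(n))\, e_2(n-k), \quad k = 1, \dots, K,$$
$$b_k(n+1) = b_k(n) + \mu\, \mathrm{sign}(e(n))\, e_1(n-k+1), \quad k = 1, \dots, M. \tag{13}$$

III. SIMULATION RESULTS
In the following sections, we present our simulation results for the unsupervised deconvolution of non-stationary channels with a cascade of linear predictors, considering the recovery of antisparse (Section III-A) and sparse signals (Section III-B).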

A. Antisparse Deconvolution
In this section, we consider a BPSK input signal [14], whose samples assume the values $\pm 1$. We consider a second-order non-stationary channel, whose real-valued zeros $z_1(n)$ and $z_2(n)$ evolve according to the following update rule:
$$z_1(n) = z_1(0) + \frac{n}{N-1}\,\big[z_1(N-1) - z_1(0)\big], \tag{14}$$
$$z_2(n) = z_2(0) + \frac{n}{N-1}\,\big[z_2(N-1) - z_2(0)\big], \tag{15}$$
where $N$ denotes the total number of available samples. In our first experiment, we considered $z_1(0) = 0.1$, $z_2(0) = 0.5$, $z_1(N-1) = 0.5$ and $z_2(N-1) = 0.9$. To provide a stability condition for the algorithm, we considered a stationary channel for the first and last $1 \times 10^4$ samples, i.e., we have a channel with zeros at $z_1 = 0.1$ and $z_2 = 0.5$ for the initial samples and a channel with zeros at $z_1 = 0.5$ and $z_2 = 0.9$ for the final ones. For the $5 \times 10^4$ intermediate samples, we have a non-stationary channel, with its zeros updated according to (14) and (15). The objective of this initialization is to show that, once the proposed method has converged to the solution for a stationary channel, it can track the solution as the channel becomes non-stationary.
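A channel of this kind can be simulated as sketched below. The sketch assumes that each zero moves linearly between its initial and final values during the intermediate samples and that the channel is monic, h_n(z) = (1 - z1(n) z^-1)(1 - z2(n) z^-1); the helper names `zero_trajectory` and `time_varying_channel` are hypothetical.

```python
import numpy as np

def zero_trajectory(z_start, z_end, n_hold=10_000, n_move=50_000):
    """Hold z_start, move linearly to z_end, then hold z_end."""
    ramp = np.linspace(z_start, z_end, n_move)
    return np.concatenate([np.full(n_hold, z_start), ramp, np.full(n_hold, z_end)])

def time_varying_channel(s, z1, z2):
    """Filter s with h_n(z) = (1 - z1(n) z^-1)(1 - z2(n) z^-1), sample by sample."""
    h1 = -(z1 + z2)          # coefficient of z^-1 at each time n
    h2 = z1 * z2             # coefficient of z^-2 at each time n
    x = s.astype(float)      # astype returns a copy, so s is left untouched
    x[1:] += h1[1:] * s[:-1]
    x[2:] += h2[2:] * s[:-2]
    return x

z1 = zero_trajectory(0.1, 0.5)                 # first zero: 0.1 -> 0.5
z2 = zero_trajectory(0.5, 0.9)                 # second zero: 0.5 -> 0.9
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=len(z1))      # BPSK source
x = time_varying_channel(s, z1, z2)
```

Replacing the endpoint values with reciprocals of the ones above would give zeros outside the unit circle, i.e., a maximum-phase trajectory with the same helper.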
With respect to the cascade, we used 5 taps for the forward predictor and 5 taps for the backward one, both numbers determined by preliminary studies. It is important to highlight that our choice for the lengths of the forward and backward filters assumes a symmetry for the channel, i.e., that it presents approximately the same number of zeros inside and outside the unit circle. We adjusted the structure by means of the LMF algorithm.
To measure the cascade performance in the equalization task, we have adopted the Intersymbol Interference Rate (ISI) of the combined response. We define the ISI level, in dB scale, as follows:
$$\mathrm{ISI}_{\mathrm{dB}} = 10 \log_{10}\left(\frac{\sum_k |c_k|^2 - \max_k |c_k|^2}{\max_k |c_k|^2}\right), \tag{16}$$
where $c_k$ are the coefficients of the combined response $C(z) = H(z)F(z)B(z)$. Ideally, the combined response would be a delayed impulse, indicating perfect channel inversion. Plugging $c_k = \delta(k - d)$ into (16), we obtain an ISI level of $-\infty$. Therefore, the lower the ISI level, the better the cascade performance in the equalization task.
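The ISI level can be computed directly from the taps of the combined response. The sketch below assumes the usual definition (energy of all taps except the strongest, divided by the energy of the strongest tap); the function name `isi_db` and the one-zero test channel are illustrative choices.

```python
import numpy as np

def isi_db(c):
    """ISI level in dB of a combined response with taps c."""
    p = np.abs(np.asarray(c)).astype(float) ** 2
    peak = p.max()
    residual = np.delete(p, p.argmax()).sum()     # energy outside the main tap
    with np.errstate(divide="ignore"):            # log10(0) -> -inf for a pure delay
        return 10.0 * np.log10(residual / peak)

h = np.array([1.0, -0.5])                 # hypothetical channel with a zero at 0.5
w = 0.5 ** np.arange(6)                   # 6-tap truncation of the exact inverse 1 / (1 - 0.5 z^-1)
c = np.convolve(h, w)                     # combined response: [1, 0, 0, 0, 0, 0, -0.5**6]
print(isi_db(c))                          # about -36.1 dB
print(isi_db([0.0, 0.0, 1.0, 0.0]))       # -inf for a delayed impulse
```

For a cascade, the combined response would be obtained by convolving the channel taps with the taps of both prediction error filters before calling the function.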
We show the ISI level evolution for the combined response of our first experiment in Fig. 4a, considering the mean performance over 200 Monte Carlo simulations. We considered four values for the learning rate: $\mu = 1 \times 10^{-4}$, $5 \times 10^{-4}$, $1 \times 10^{-3}$ and $5 \times 10^{-3}$. For comparison purposes, we have also obtained the Wiener solution for the channel, which is the best inverse filter of the channel obtained through the minimization of the MSE criterion with access to the channel coefficients.
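For each delay $d$ in the reference signal, we compute
$$\mathbf{w}_d = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \boldsymbol{\delta}_d, \tag{17}$$
where $\mathbf{H}$ is the convolution matrix and $\boldsymbol{\delta}_d$ is a vector whose $d$-th entry is one and whose other entries are all zero. We then choose the filter that provides the minimal MSE:
$$\mathbf{w}_{\mathrm{opt}} = \arg\min_{\mathbf{w}_d} \|\mathbf{H}\mathbf{w}_d - \boldsymbol{\delta}_d\|_2^2. \tag{18}$$
We used the Wiener solution as a baseline that represents the best result (in terms of the MSE criterion, with a supervised algorithm) that we could obtain in terms of a Zero Forcing solution. In the noiseless case, the closer the performance of a filter to the Wiener solution, the better.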
In this work, we have considered a 6-tap filter and an 11-tap one and, after each new update of the channel zeros, we obtained the channel convolution matrix H and solved (17) for the best delay in the reference signal. These two Wiener solutions give us the best performance that can be achieved using 6 taps, as if only one of the cascade filters were employed, and the best performance for 11-tap filters, in which case both parts of the cascade are in use.
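The baseline computation can be sketched as a least-squares fit repeated over all candidate delays. This sketch assumes a noiseless channel driven by a white, unit-variance source, so that the MSE for a given delay reduces to the squared distance between the combined response and a delayed impulse; `convolution_matrix`, `best_delay_inverse` and the test channel are hypothetical names and values.

```python
import numpy as np

def convolution_matrix(h, n_taps):
    """Matrix H such that H @ w equals np.convolve(h, w) for a filter w with n_taps taps."""
    L = len(h)
    H = np.zeros((L + n_taps - 1, n_taps))
    for j in range(n_taps):
        H[j:j + L, j] = h
    return H

def best_delay_inverse(h, n_taps):
    """Return (error, delay, taps) of the least-squares inverse with the best delay."""
    H = convolution_matrix(h, n_taps)
    best = None
    for d in range(H.shape[0]):
        target = np.zeros(H.shape[0])
        target[d] = 1.0                                   # delayed impulse
        w, *_ = np.linalg.lstsq(H, target, rcond=None)    # least-squares inverse for this delay
        err = np.sum((H @ w - target) ** 2)
        if best is None or err < best[0]:
            best = (err, d, w)
    return best

h = np.array([1.0, -0.5])                 # hypothetical channel with a zero at 0.5
err6, d6, w6 = best_delay_inverse(h, 6)
err11, d11, w11 = best_delay_inverse(h, 11)
```

In a non-stationary experiment, the two calls would be repeated with the channel taps obtained after each update of the zeros.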
From Fig. 4a we can see that, as we increase the learning rate, the cascade tracking capacity improves, as we can observe from the proximity between the continuous lines (which represent the combined response for different values of $\mu$) and the dashed ones (representing the Wiener solutions), especially in the initial samples. As the cascade parameters converge ($n > 3 \times 10^4$), the ISI curves associated with the different values of $\mu$ become closer, except for $\mu = 1 \times 10^{-4}$.
For the second experiment, we considered a maximum-phase channel, with its zeros specified by $z_1(0) = 10$, $z_2(0) = 2$, $z_1(N-1) = 2$ and $z_2(N-1) = 1.1$. This choice of zeros makes the minimum- and maximum-phase channels equivalent in terms of amplitude distortion [18].
Again we considered a 5-tap forward predictor and a 5-tap backward predictor for the cascade, with the same learning rates as before, and we show the results for the maximum-phase scenario in Fig. 4b.
As we observed for the minimum-phase channel, the tracking capacity is closely related to the learning rate: a larger learning rate leads to a cascade with a faster convergence rate. It is also interesting to note the similarities between Figs. 4a and 4b, due to the relationship between the zeros of the channels considered.
In our third experiment with non-stationary channels, we considered a mixed-phase one, with its zeros evolving linearly. We kept the other parameters, i.e., number of cascade coefficients, learning rates and number of Monte Carlo simulations, as in the previous simulations, and we present our results in Fig. 4c.
For the mixed-phase channel, we verify that for the initial samples ($n < 1 \times 10^4$) the best performance was attained by the cascade adjusted with $\mu = 1 \times 10^{-3}$; for the next samples (up to $n = 4 \times 10^4$) the best performance was achieved with the largest learning rate; finally, for the last samples ($n > 4 \times 10^4$), the cascades adjusted with $\mu = 5 \times 10^{-4}$, $1 \times 10^{-3}$ and $5 \times 10^{-3}$ had similar performances. We can also see that the ISI curves associated with the cascade are distant from the one associated with the 10-tap Wiener solution, represented by the black dashed line. This result suggests that the cascade needs a slower transition of the channel zeros for a mixed-phase channel than for minimum- and maximum-phase ones. Conversely, for this kind of channel we need a faster adaptation rule for the cascade.
We also considered a variable-phase channel, i.e., a channel that starts with a maximum-phase response, becomes mixed-phase and, finally, turns into a minimum-phase one as its two zeros vary. We used the same parameters as in the first three experiments and we present our simulation results in Fig. 4d.
For this fourth channel, we observe that for $1 \times 10^4 < n < 3 \times 10^4$ the cascades adjusted with the largest values of $\mu$ presented better performances, relating again the learning rate to the tracking capacity. In this particular channel we can observe some ISI peaks, due to the proximity of the channel zeros to the unit circle. When we adjusted the cascade with the smallest learning rate ($\mu = 1 \times 10^{-4}$), the proposed filter was able to track the first observed peak; for the other values considered, we can verify an ISI peak with a delay. For the second peak, we observe in Fig. 4d that all the values considered lead to cascades able to track this peak, again with some delay. For the last samples, $n > 4 \times 10^4$, we verify again the effect of the learning rate: the larger the value of this parameter, the better the performance of the associated cascade.
Finally, we evaluated the cascade performance in a noisy scenario. Here we considered the mixed-phase channel with additive Gaussian noise. We present our simulation results in Fig. 5a, for SNR = 20 dB, and Fig. 5b, for SNR = 10 dB. When we are dealing with a low level of noise (SNR ≥ 20 dB), the $\ell_4$ cascade performance is little affected, and the tracking performance is almost the same as in the noiseless case. On the other hand, as the noise level increases, the cascade performance gets poorer and its tracking capacity decreases, as we can observe from Fig. 5b, for all the learning rates considered.
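A noisy observation at a prescribed SNR can be produced as sketched below. The sketch assumes that the SNR is measured as the ratio between the power of the channel output and the power of the additive noise, which is not stated explicitly in the text; the helper name `add_awgn` and the test channel are hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise so that mean(x^2) / noise_power matches snr_db."""
    noise_power = np.mean(x ** 2) / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=50_000)          # BPSK source
x = np.convolve(s, [1.0, -2.5, 1.0])              # hypothetical mixed-phase channel
y = add_awgn(x, snr_db=20, rng=rng)

# Empirical check of the resulting SNR
measured_snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
```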

B. Sparse Deconvolution
We now present our simulation results with sparse signals, for which we must adopt an $\ell_p$ norm with $1 \leq p < 2$. For the sake of algorithmic simplicity, we use the $\ell_1$ norm, which gives rise to the following adaptation rules:
$$a_k(n+1) = a_k(n) + \mu\, \mathrm{sign}(e(n))\, e_2(n-k), \quad k = 1, \dots, K,$$
$$b_k(n+1) = b_k(n) + \mu\, \mathrm{sign}(e(n))\, e_1(n-k+1), \quad k = 1, \dots, M, \tag{19}$$
where $e_1(n)$ and $e_2(n)$ are the forward and backward prediction errors of $x(n)$. We modeled the sparse signal by means of the Bernoulli-Gaussian distribution: first we generate a Bernoulli sequence [19], with a probability $p_1$ of getting the value 1; then we multiply the sequence obtained by another random sequence, with values drawn from a normal distribution with zero mean and unit variance. The parameter $p_1$ allows us to control the degree of sparsity of the signal. In our simulations, we used $p_1 = 0.1$. As we did for the antisparse case, we used 5 taps for the forward component and 5 taps for the backward one, both structures adjusted by (19). We considered the same channels and learning rates and measured the performance by means of the ISI level.
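The Bernoulli-Gaussian model described above can be sampled in a couple of lines. The function name `bernoulli_gaussian` and the seed are illustrative; the default `p1 = 0.1` follows the value stated in the text.

```python
import numpy as np

def bernoulli_gaussian(n, p1=0.1, rng=None):
    """Draw n samples: zero with probability 1 - p1, standard normal otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    active = rng.random(n) < p1               # Bernoulli sequence: 1 with probability p1
    return active * rng.standard_normal(n)    # mask a zero-mean, unit-variance Gaussian sequence

s = bernoulli_gaussian(100_000, p1=0.1, rng=np.random.default_rng(0))
fraction_nonzero = np.mean(s != 0)                      # should be close to p1
kurtosis = np.mean(s ** 4) / np.mean(s ** 2) ** 2       # far above 3 for a sparse signal
```

For comparison, a BPSK sequence has a normalized fourth moment of 1, which reflects the opposite, antisparse behavior.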
In this first experiment, the result is very close to the one obtained for the antisparse signal: the larger values of the learning rate, i.e., $\mu = 1 \times 10^{-3}$ and $\mu = 5 \times 10^{-3}$, give the better results, since the ISI levels obtained with these parameters are close to the 5-tap Wiener solution. For small values of $\mu$, the tracking capacity of the cascade is reduced and, for $\mu = 1 \times 10^{-4}$ and $\mu = 5 \times 10^{-4}$, the performance of the sparse deconvolution is worse than the performance of the antisparse deconvolution with the same parameters.
Considering a maximum-phase channel ($z_1(0) = 10$, $z_2(0) = 2$, $z_1(N-1) = 2$ and $z_2(N-1) = 1.1$), we present our deconvolution results in Fig. 6b. For this second experiment with sparse signals, the performance is also very similar to that of the antisparse case. Again, we obtained the best performances, in the ISI level sense, with the larger values of the learning rate.
For this case, we see that the performance of the $\ell_1$ cascade is very similar to the performance of the $\ell_4$ cascade. For this scenario we observe that the $\ell_1$ cascade can also replicate the observed ISI peaks, but with a delay. As we verified with the $\ell_4$ cascade, when the channel zeros are close to the unit circle, we need to speed up the cascade adaptation.
As we did for the antisparse deconvolution, we also considered here a noisy scenario, as depicted in Fig. 7. The cascade adaptation based on the $\ell_1$ norm was more susceptible to the additive noise, even for an SNR of 20 dB, and the performance is very different from the one obtained in the noiseless scenario. In addition, as we increase the noise level, the tracking ability is reduced.
This result reinforces our previous comment about the algorithmic differences between the cascade adaptations based on the $\ell_1$ and $\ell_4$ norms: using the $\ell_1$ norm, the cascade parameters present a fast convergence rate, which improves the filter tracking capacity, but the cascade is more susceptible to noise. On the other hand, using the $\ell_4$ norm to adjust the parameters leads to a cascade with slower convergence and a lower tracking capacity, but one that is more robust with respect to noise.

IV. CONCLUSION
In this work we extended our previous results in blind deconvolution using a cascade of linear predictors. We considered here the deconvolution of antisparse signals, by means of the $\ell_4$ cascade, and of sparse signals, using the $\ell_1$ cascade, both for non-stationary channels.
We observed that for minimum- and maximum-phase channels, both structures were able to track the channel evolution. For the mixed-phase one, the $\ell_4$ cascade performed worse than the $\ell_1$ cascade, but in the presence of noise we observed the opposite. This particular result shows that, besides the duality of the norms, different optimization criteria lead to different algorithmic behavior: using the $\ell_1$ norm favors fast convergence, increasing the tracking capacity, but is more susceptible to noise; using the $\ell_4$ norm yields slower convergence and less tracking capacity, but is more robust with respect to noise.
Finally, for a variable-phase channel, neither structure was able to track the channel behavior, especially when the channel zeros were close to the unit circle. This result makes evident the need to investigate methods to speed up the cascade adjustment.