Transient Analysis of the Bias-Compensated LMS Algorithm

In most supervised adaptive filtering settings, only the additive noise of the reference signal is taken into account. However, in many practical situations the excitation data are also immersed in noise, which biases the estimation procedure. To mitigate this issue, adaptive algorithms with bias-compensation schemes have been proposed. This paper advances, for the first time, a stochastic model that predicts the mean and mean-square learning behavior of the bias-compensated least mean squares algorithm in the transient regime. Asymptotic predictions are also obtained as a byproduct of the devised analysis. Tracking capabilities and the impact of employing an adaptive filter of suboptimal length are also considered, without restricting the input signal to be white. Simulation results indicate that the proposed analysis is in accurate agreement with the empirical learning curves.


I. INTRODUCTION
The flexibility of adaptive filtering algorithms (AFAs) derives from the fact that they do not require prior statistical information about the involved signals. They also demand low computational effort, since they only exploit local data to feed the adaptive estimator. Such features make them suitable for a plethora of important applications, such as system identification, time-series prediction, adaptive control, and active noise cancellation [1].
Recently, several papers have drawn attention to the fact that the acquisition of the input data is also prone to imperfections (such as quantization noise), so that the input of the adaptive filter is also corrupted by a noise signal. In this case, a bias emerges that hampers the learning of the algorithm. To circumvent this issue, bias-compensated schemes have been devised. The first of them was the BC-LMS (bias-compensated LMS) algorithm [2], obtained by modifying the cost function to take into account the statistical properties of the input noise. An affine projection version of the BC-LMS was derived in [3]. The optimality of the BC-LMS in the mean-square sense was recently demonstrated in [4], which also proposes a closed-form expression for its asymptotic performance for white input, based on energy conservation arguments. The instability induced in the algorithm by the estimation of the input noise variance was addressed in [5].
This paper describes, for the first time, a comprehensive stochastic model that predicts the average learning behavior of the BC-LMS algorithm. The input signal is not restricted to be white, and both deficient-length and tracking configurations are addressed in a unified way. The paper is organized as follows: Section II describes the fundamentals of the BC-LMS algorithm. Section III provides an analysis in the mean sense, whereas Section IV offers a second-order (i.e., mean-square) theoretical analysis. Section V addresses the tracking setting, assuming a first-order Markovian perturbation of the ideal plant coefficients. Section VI models the BC-LMS learning behavior when the length of the adaptive filter is suboptimal. Section VII compares simulated learning curves with the theoretical ones. Finally, Section VIII contains the concluding remarks.

II. BC-LMS ALGORITHM
Usually, the design of supervised AFAs considers that only the reference signal $d(k) \in \mathbb{R}$ is noisy, that is,
$$d(k) = \mathbf{w}_\star^T \mathbf{u}(k) + \nu(k),$$
where $\mathbf{w}_\star \in \mathbb{R}^N$ contains the coefficients of the ideal (and unknown) transfer function, $\nu(k) \in \mathbb{R}$ denotes the additive noise, and $\mathbf{u}(k) \triangleq [u(k)\; u(k-1)\; \cdots\; u(k-N+1)]^T$ concatenates $N$ successive samples of the input signal $u(k)$.
Considering that the measurement of the input data is inaccurate, the input of the adaptive filter is also corrupted by a noise signal $\eta(k)$, that is,
$$\mathbf{x}(k) = \mathbf{u}(k) + \boldsymbol{\eta}(k),$$
where $\boldsymbol{\eta}(k) \triangleq [\eta(k)\; \eta(k-1)\; \cdots\; \eta(k-N+1)]^T$ and $\eta(k)$ is usually assumed to be a stationary zero-mean process with variance $\sigma^2_\eta$. Fig. 1 depicts the points where noise is assumed to disturb the feedback process.
Most adaptive filters based on a tapped-delay structure use the input vector to update the coefficient vector $\mathbf{w}(k) \in \mathbb{R}^N$, with a correction term usually proportional to the step size $\beta \in \mathbb{R}_+$, whose choice establishes the well-known trade-off between convergence rate and asymptotic performance. In the supervised context assumed in this paper, these algorithms also employ the error signal $e(k) \in \mathbb{R}$, computed through
$$e(k) = d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k).$$
The LMS algorithm, for instance, updates its coefficients as $\mathbf{w}(k+1) = \mathbf{w}(k) + \beta\, e(k)\,\mathbf{x}(k)$. Since the LMS derivation does not take into account the additive noise $\eta(k)$, its adoption in the configuration presented in Fig. 1 induces a bias in the resulting estimate [2].
The bias-compensated LMS (BC-LMS) algorithm [2] employs an estimate $\hat{\sigma}^2_\eta$ of the variance of $\eta(k)$ to mitigate the bias induced by the LMS algorithm, inserting an additional term in the LMS update equation, which gives rise to the following update rule:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \beta\left[e(k)\,\mathbf{x}(k) + \hat{\sigma}^2_\eta\,\mathbf{w}(k)\right].$$
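A minimal simulation sketch of the bias problem and its compensation follows. It assumes the BC-LMS update $\mathbf{w}(k+1) = \mathbf{w}(k) + \beta[e(k)\mathbf{x}(k) + \hat{\sigma}^2_\eta\mathbf{w}(k)]$ (the form consistent with the mean recursion of Section III), a white unit-variance $u(k)$, a white $\eta(k)$, and a hypothetical 4-tap plant `w_star`; the helper names `run_filter` and `steady_state_msd` are illustrative. Setting `sigma2_eta_hat = 0` reduces the recursion to the plain LMS.

```python
import numpy as np


def run_filter(x, d, n_taps, beta, sigma2_eta_hat):
    """BC-LMS sketch; sigma2_eta_hat = 0 gives the plain LMS.

    Assumed update: w(k+1) = w(k) + beta * (e(k) x(k) + sigma2_eta_hat * w(k)).
    Returns the coefficient trajectory, one row per iteration.
    """
    w = np.zeros(n_taps)
    x_vec = np.zeros(n_taps)              # tapped-delay line [x(k), ..., x(k-N+1)]
    history = np.empty((len(x), n_taps))
    for k in range(len(x)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[k]
        e = d[k] - w @ x_vec              # error e(k) = d(k) - w^T(k) x(k)
        w = w + beta * (e * x_vec + sigma2_eta_hat * w)
        history[k] = w
    return history


def steady_state_msd(history, w_star, tail):
    """Average ||w_star - w(k)||^2 over the last `tail` iterations."""
    dev = w_star - history[-tail:]
    return float(np.mean(np.sum(dev**2, axis=1)))


rng = np.random.default_rng(0)
K, N, tail = 30_000, 4, 15_000
w_star = np.array([1.0, -0.5, 0.25, 0.8])            # hypothetical ideal plant
sigma2_eta, sigma2_nu, beta = 0.2, 1e-3, 5e-3

u = rng.standard_normal(K)                           # white, unit-variance input
d = np.convolve(u, w_star)[:K] + np.sqrt(sigma2_nu) * rng.standard_normal(K)
x = u + np.sqrt(sigma2_eta) * rng.standard_normal(K)  # noisy regressor samples

msd_lms = steady_state_msd(run_filter(x, d, N, beta, 0.0), w_star, tail)
msd_bc = steady_state_msd(run_filter(x, d, N, beta, sigma2_eta), w_star, tail)
print(f"LMS MSD: {msd_lms:.4f}  BC-LMS MSD: {msd_bc:.4f}")
```

With white unit-variance $u(k)$ and white $\eta(k)$, the LMS converges in the mean to $\mathbf{w}_\star/(1+\sigma^2_\eta)$. Its MSD should therefore settle near $(\sigma^2_\eta/(1+\sigma^2_\eta))^2\|\mathbf{w}_\star\|^2 \approx 0.054$ plus a small misadjustment, whereas the BC-LMS with $\hat{\sigma}^2_\eta = \sigma^2_\eta$ stays close to $\mathbf{w}_\star$.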

III. FIRST-ORDER ANALYSIS
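The analyses in this paper are based on the deviation vector $\tilde{\mathbf{w}}(k) \in \mathbb{R}^N$, defined as
$$\tilde{\mathbf{w}}(k) \triangleq \mathbf{w}_\star - \mathbf{w}(k).$$
Using (5) and (6), one obtains the following exact relationship:
$$\tilde{\mathbf{w}}(k+1) = \left[\mathbf{I} - \beta\left(\mathbf{x}(k)\mathbf{x}^T(k) - \hat{\sigma}^2_\eta\mathbf{I}\right)\right]\tilde{\mathbf{w}}(k) + \beta\left[\mathbf{x}(k)\boldsymbol{\eta}^T(k) - \hat{\sigma}^2_\eta\mathbf{I}\right]\mathbf{w}_\star - \beta\,\nu(k)\,\mathbf{x}(k). \tag{8}$$
Applying the expectation operator E[·] to (8) demands the evaluation of complex joint moments. To make the mathematics tractable, the following hypotheses are adopted:
Independence assumption (IA). The input data (i.e., $\mathbf{u}(k)$ and $\boldsymbol{\eta}(k)$) are statistically independent of $\tilde{\mathbf{w}}(k)$.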
Noise Assumption (NA). The additive noise ν(k) is white and statistically independent from the remaining random variables.
Remarks: although IA does not yield accurate predictions when the step size is large, it is an almost ubiquitous statistical hypothesis, because dispensing with it leads to cumbersome mathematics [6]. NA, in turn, is a very popular simplification that, although physically plausible in some settings, has been circumvented in recent analyses [7].
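Using IA and NA, recursion (8) implies the following difference equation:
$$\mathrm{E}[\tilde{\mathbf{w}}(k+1)] = \left[\mathbf{I} - \beta\left(\mathbf{R}_x - \hat{\sigma}^2_\eta\mathbf{I}\right)\right]\mathrm{E}[\tilde{\mathbf{w}}(k)] + \beta\left(\mathbf{R}_\eta - \hat{\sigma}^2_\eta\mathbf{I}\right)\mathbf{w}_\star, \tag{9}$$
where $\mathbf{R}_x \triangleq \mathrm{E}[\mathbf{x}(k)\mathbf{x}^T(k)]$ and $\mathbf{R}_\eta \triangleq \mathrm{E}[\boldsymbol{\eta}(k)\boldsymbol{\eta}^T(k)]$ are the autocorrelation matrices of $\mathbf{x}(k)$ and $\boldsymbol{\eta}(k)$, respectively.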
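The passage from (8) to (9) relies on three moment identities that are not spelled out. A worked sketch follows, assuming (as implied by the appearance of $\mathbf{R}_\eta$ rather than a cross-correlation term) that $\mathbf{u}(k)$ and $\boldsymbol{\eta}(k)$ are zero-mean and mutually independent, and that $\nu(k)$ is zero-mean:

```latex
\begin{equation*}
\begin{aligned}
\mathrm{E}\big[\mathbf{x}(k)\mathbf{x}^T(k)\tilde{\mathbf{w}}(k)\big]
  &= \mathbf{R}_x\,\mathrm{E}[\tilde{\mathbf{w}}(k)]
  && \text{(IA)}\\
\mathrm{E}\big[\mathbf{x}(k)\boldsymbol{\eta}^T(k)\big]
  &= \mathrm{E}\big[\mathbf{u}(k)\boldsymbol{\eta}^T(k)\big]
   + \mathrm{E}\big[\boldsymbol{\eta}(k)\boldsymbol{\eta}^T(k)\big]
   = \mathbf{0} + \mathbf{R}_\eta
  && \text{($\mathbf{u}$, $\boldsymbol{\eta}$ independent, zero mean)}\\
\mathrm{E}\big[\nu(k)\,\mathbf{x}(k)\big]
  &= \mathbf{0}
  && \text{(NA, zero-mean $\nu$)}\\
\Rightarrow\;
\mathrm{E}[\tilde{\mathbf{w}}(k+1)]
  &= \big[\mathbf{I} - \beta(\mathbf{R}_x - \hat{\sigma}^2_\eta\mathbf{I})\big]\mathrm{E}[\tilde{\mathbf{w}}(k)]
   + \beta(\mathbf{R}_\eta - \hat{\sigma}^2_\eta\mathbf{I})\mathbf{w}_\star
\end{aligned}
\end{equation*}
```

Taking expectations of (8) term by term and substituting the first three identities yields the last line, which is (9).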
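If the algorithm indeed converges in the mean (i.e., $\lim_{k\to\infty}\mathrm{E}[\tilde{\mathbf{w}}(k)] = \mathrm{E}[\tilde{\mathbf{w}}(\infty)]$ exists), then (9) yields
$$\mathrm{E}[\tilde{\mathbf{w}}(\infty)] = \left(\mathbf{R}_x - \hat{\sigma}^2_\eta\mathbf{I}\right)^{-1}\left(\mathbf{R}_\eta - \hat{\sigma}^2_\eta\mathbf{I}\right)\mathbf{w}_\star,$$
so that the asymptotic bias does not depend on the value of the step size.
Remark: when the traditional hypothesis of a white noise signal $\eta(k)$ is adopted (i.e., $\mathbf{R}_\eta = \sigma^2_\eta\mathbf{I}$), the two previous equations turn into
$$\mathrm{E}[\tilde{\mathbf{w}}(k+1)] = \left[\mathbf{I} - \beta\left(\mathbf{R}_x - \hat{\sigma}^2_\eta\mathbf{I}\right)\right]\mathrm{E}[\tilde{\mathbf{w}}(k)] + \beta\left(\sigma^2_\eta - \hat{\sigma}^2_\eta\right)\mathbf{w}_\star$$
and
$$\mathrm{E}[\tilde{\mathbf{w}}(\infty)] = \left(\sigma^2_\eta - \hat{\sigma}^2_\eta\right)\left(\mathbf{R}_x - \hat{\sigma}^2_\eta\mathbf{I}\right)^{-1}\mathbf{w}_\star,$$
which implies that the BC-LMS is asymptotically unbiased whenever $\hat{\sigma}^2_\eta = \sigma^2_\eta$.
Unfortunately, the average behavior of each deviation coefficient is coupled to that of the others, since $\mathbf{R}_x$ is not diagonal in general. Matrix $\mathbf{R}_x$, in turn, can be decomposed according to [1] as
$$\mathbf{R}_x = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T,$$
where the $i$-th column of $\mathbf{Q} \in \mathbb{R}^{N\times N}$ contains the $i$-th eigenvector $\mathbf{q}_i \in \mathbb{R}^N$ and the $i$-th element of the main diagonal of the diagonal matrix $\boldsymbol{\Lambda}$ contains the corresponding eigenvalue $\lambda_i$. Since $\mathbf{Q}$ is orthogonal (i.e., $\mathbf{Q}\mathbf{Q}^T = \mathbf{I}$), defining $\mathbf{v}(k) \triangleq \mathbf{Q}^T\mathrm{E}[\tilde{\mathbf{w}}(k)]$, whose $i$-th element is $v_i(k) = \mathbf{q}_i^T\mathrm{E}[\tilde{\mathbf{w}}(k)]$, and multiplying both sides of (9) by $\mathbf{Q}^T$ converts it into
$$\mathbf{v}(k+1) = \left[\mathbf{I} - \beta\left(\boldsymbol{\Lambda} - \hat{\sigma}^2_\eta\mathbf{I}\right)\right]\mathbf{v}(k) + \mathbf{c}, \tag{14}$$
where $\mathbf{c} \triangleq \beta\mathbf{Q}^T(\mathbf{R}_\eta - \hat{\sigma}^2_\eta\mathbf{I})\mathbf{w}_\star$. From (14), one may conclude that the $i$-th element of $\mathbf{v}(k+1)$ obeys the recursion
$$v_i(k+1) = \left[1 - \beta\left(\lambda_i - \hat{\sigma}^2_\eta\right)\right]v_i(k) + c_i, \tag{15}$$
so that the BC-LMS is stable in the mean if
$$\left|1 - \beta\left(\lambda_i - \hat{\sigma}^2_\eta\right)\right| < 1, \quad i = 1, \ldots, N, \tag{16}$$
or, equivalently (provided that $\hat{\sigma}^2_\eta < \lambda_{\min}$, so that every factor $\lambda_i - \hat{\sigma}^2_\eta$ is positive),
$$0 < \beta < \frac{2}{\lambda_{\max} - \hat{\sigma}^2_\eta}, \tag{17}$$
where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $\mathbf{R}_x$. Comparing (17) with the upper bound that guarantees convergence in the mean of the LMS [1], that is,
$$0 < \beta < \frac{2}{\lambda_{\max}}, \tag{18}$$
shows that the theoretical upper bound on $\beta$ that guarantees stability in the mean of the BC-LMS is higher than that of the LMS whenever $\hat{\sigma}^2_\eta > 0$. Unfortunately, such an upper bound has limited usefulness, since an adaptive algorithm may diverge even when its first-order statistics do not, because the variance of its coefficients may grow without bound [8]. This fact motivates the study of the mean-square learning behavior of the BC-LMS, which is the goal of the next section.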
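The mean-sense results can be checked numerically. The sketch below uses a hypothetical colored-input autocorrelation ($[\mathbf{R}_u]_{ij} = 0.7^{|i-j|}$), a white $\eta(k)$, a deliberately mismatched estimate $\hat{\sigma}^2_\eta$, and a random hypothetical plant. It iterates the mean recursion (9) for two step sizes, compares both limits with the closed-form asymptotic bias, and evaluates the bounds (17) and (18).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
idx = np.arange(N)
R_u = 0.7 ** np.abs(idx[:, None] - idx[None, :])  # hypothetical colored-input autocorrelation
sigma2_eta, sigma2_eta_hat = 0.1, 0.08             # white eta(k), mismatched variance estimate
R_eta = sigma2_eta * np.eye(N)
R_x = R_u + R_eta                                  # u(k) and eta(k) assumed independent
I = np.eye(N)
w_star = rng.standard_normal(N)                    # hypothetical ideal plant


def mean_deviation(beta, iters=20_000):
    """Iterate the mean recursion (9) from w(0) = 0, i.e. E[w~(0)] = w_star."""
    m = w_star.copy()
    A = I - beta * (R_x - sigma2_eta_hat * I)
    b = beta * (R_eta - sigma2_eta_hat * I) @ w_star
    for _ in range(iters):
        m = A @ m + b
    return m


# Closed-form asymptotic bias: (R_x - s I)^{-1} (R_eta - s I) w_star
bias = np.linalg.solve(R_x - sigma2_eta_hat * I, (R_eta - sigma2_eta_hat * I) @ w_star)
lam = np.linalg.eigvalsh(R_x)
bound_bc = 2.0 / (lam.max() - sigma2_eta_hat)      # mean-stability bound of the BC-LMS
bound_lms = 2.0 / lam.max()                        # mean-stability bound of the LMS

for beta in (0.01, 0.1):
    err = np.max(np.abs(mean_deviation(beta) - bias))
    print(f"beta={beta}: |E[w~(inf)] - closed form| = {err:.1e}")
print(f"BC-LMS bound {bound_bc:.4f} vs LMS bound {bound_lms:.4f}")
```

Both step sizes converge to the same limit, which illustrates that the asymptotic bias does not depend on $\beta$. The bias vanishes only when `sigma2_eta_hat` equals `sigma2_eta`.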

IV. MEAN SQUARE ANALYSIS
Note that Eq. (8) can be rewritten as By multiplying (19) by its transpose, one obtains where $\mathbf{W}(k) \triangleq \tilde{\mathbf{w}}(k)\tilde{\mathbf{w}}^T(k)$ is a quantity whose average evolution is of interest. Let vec[A] denote the operator that stacks the columns of matrix A into a single column vector. Denoting the Kronecker product by ⊗, one has

By defining $\tilde{\mathbf{v}}(k) \triangleq \mathrm{vec}[\mathbf{W}(k)]$, (22) can be rewritten as
where $\mathbf{g} \triangleq \mathrm{E}\{\mathrm{vec}[\tilde{\mathbf{w}}(k)\mathbf{w}_\star^T]\}$ and $\bar{\mathbf{g}} \triangleq \mathrm{E}\{\mathrm{vec}[\mathbf{w}_\star\tilde{\mathbf{w}}^T(k)]\}$ can be inferred from (8), and the remaining matrices of the recursion depend on the input data. To compute these matrices theoretically, the input signal is henceforth assumed to be Gaussian (although not necessarily white), which requires an additional assumption:
Gaussianity assumption (GA). Signals u(k) and η(k) are samples from a Gaussian distribution.
Remark: Using GA, fourth-order moments of these random variables can be computed using the formulas presented in [9].
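One standard formula of this kind is the Gaussian moment factorization (Isserlis' theorem). It is given here as an illustration; the extracted text does not show which form [9] uses. For a zero-mean real Gaussian vector $\mathbf{x}$ with covariance $\mathbf{R}$ and a deterministic symmetric matrix $\mathbf{S}$, $\mathrm{E}[\mathbf{x}\mathbf{x}^T\mathbf{S}\mathbf{x}\mathbf{x}^T] = 2\mathbf{R}\mathbf{S}\mathbf{R} + \mathbf{R}\,\mathrm{Tr}[\mathbf{R}\mathbf{S}]$. The sketch below checks this identity by Monte Carlo, with a hypothetical covariance and weighting matrix.

```python
import numpy as np


def gaussian_fourth_moment(R, S):
    """Isserlis: E[x x^T S x x^T] = 2 R S R + R tr(R S) for x ~ N(0, R), S symmetric."""
    return 2.0 * R @ S @ R + R * np.trace(R @ S)


rng = np.random.default_rng(2)
N = 3
A = rng.standard_normal((N, N))
R = A @ A.T / N + 0.5 * np.eye(N)                  # hypothetical covariance (positive definite)
S = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, -0.4],
              [0.0, -0.4, 0.5]])                   # hypothetical symmetric weighting

M = 400_000
X = rng.multivariate_normal(np.zeros(N), R, size=M)   # each row is one sample of x
q = np.einsum("mi,ij,mj->m", X, S, X)                 # x^T S x for every sample
empirical = (X * q[:, None]).T @ X / M                # sample mean of (x^T S x) x x^T
theory = gaussian_fourth_moment(R, S)
rel_err = np.max(np.abs(empirical - theory)) / np.max(np.abs(theory))
print(f"relative Monte Carlo error: {rel_err:.3f}")
```

The relative error shrinks roughly as $1/\sqrt{M}$. This kind of closed form is what allows fourth-order terms such as $\mathrm{E}[\mathbf{x}(k)\mathbf{x}^T(k)\mathbf{W}(k)\mathbf{x}(k)\mathbf{x}^T(k)]$ to be evaluated under GA and IA.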
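It is noteworthy that (25) can be employed to predict the evolution of the mean-square deviation (MSD), since the MSD can be computed through
$$\mathrm{MSD}(k) \triangleq \mathrm{E}\left[\|\tilde{\mathbf{w}}(k)\|^2\right] = \mathrm{Tr}\left[\mathrm{unvec}\left[\mathrm{E}[\tilde{\mathbf{v}}(k)]\right]\right],$$
where the operator unvec[·] reverses the vec[·] operation and Tr[X] denotes the trace of matrix X.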
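The vec/unvec bookkeeping can be sketched with column-major reshapes. The block below checks two facts with random hypothetical matrices. The first is the standard identity $\mathrm{vec}[\mathbf{A}\mathbf{B}\mathbf{C}] = (\mathbf{C}^T \otimes \mathbf{A})\,\mathrm{vec}[\mathbf{B}]$, the usual tool for vectorizing second-order recursions (the extracted text does not show the exact form used). The second is that the trace of the unvectorized $\tilde{\mathbf{v}}(k)$ recovers $\|\tilde{\mathbf{w}}(k)\|^2$. The helpers `vec` and `unvec` are illustrative names.

```python
import numpy as np


def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")


def unvec(v, n_rows):
    """Inverse of vec for a matrix with n_rows rows."""
    return v.reshape(n_rows, -1, order="F")


rng = np.random.default_rng(3)
N = 4
A, B, C = (rng.standard_normal((N, N)) for _ in range(3))

# Kronecker identity: vec(A B C) = (C^T kron A) vec(B)
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)

# MSD from the vectorized second-order quantity: Tr[unvec(vec(W))] = ||w_tilde||^2
w_tilde = rng.standard_normal(N)
W = np.outer(w_tilde, w_tilde)
v_tilde = vec(W)
msd = np.trace(unvec(v_tilde, N))
print(np.allclose(lhs, rhs), np.isclose(msd, w_tilde @ w_tilde))
```

Because the trace and unvec[·] are linear, the same computation applied to $\mathrm{E}[\tilde{\mathbf{v}}(k)]$ gives the MSD directly.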

V. TRACKING ANALYSIS
It is important to obtain performance guarantees in nonstationary settings, which occur when the ideal plant to be identified is time-variant. In this section, the tracking capability of the BC-LMS is analyzed when the coefficients of the ideal plant vary slowly according to a first-order Markovian model [10], [11], in which $\boldsymbol{\vartheta}(k) \in \mathbb{R}^N$ is a zero-mean i.i.d. stochastic perturbation whose (diagonal) autocorrelation matrix is denoted by $\mathbf{R}_\vartheta$. Note that time-varying channels introduce a "lag" in the adaptive process as it attempts to emulate the optimal and unknown vector $\mathbf{w}_\star(k)$.
In order to simplify the second-order analysis of the tracking setting, consider an additional stochastic hypothesis: Tracking Assumption (TA). The random vector ϑ(k) is statistically independent from the remaining random variables.
Using IA, NA, and TA, and following steps similar to those that led to Eq. (25), one may demonstrate that, under the considered tracking scenario, the recursion that describes the second-order quantity can be written as [12], which implies that, under the adopted Markovian model, the tracking configuration does not modify the range of step-size values that guarantees stability.
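The lag/noise trade-off can be illustrated with a small simulation. The sketch assumes a random-walk special case of the Markovian model, $\mathbf{w}_\star(k+1) = \mathbf{w}_\star(k) + \boldsymbol{\vartheta}(k)$ with $\mathbf{R}_\vartheta = \sigma^2_\vartheta\mathbf{I}$ (the exact model of [10], [11] is not reproduced here). It also assumes the BC-LMS update $\mathbf{w}(k+1) = \mathbf{w}(k) + \beta[e(k)\mathbf{x}(k) + \hat{\sigma}^2_\eta\mathbf{w}(k)]$, white inputs, and hypothetical plant and noise levels. The helper `tracking_msd` is illustrative.

```python
import numpy as np


def tracking_msd(beta, K=40_000, tail=20_000, seed=4):
    """Steady-state MSD of BC-LMS tracking a random-walk plant (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    N = 4
    sigma2_theta, sigma2_nu = 1e-6, 1e-3        # perturbation and measurement-noise variances
    sigma2_eta = sigma2_eta_hat = 1e-2          # input-noise variance and its (exact) estimate
    w_star = np.array([1.0, -0.5, 0.25, 0.8])   # hypothetical initial plant
    w = w_star.copy()                           # start on the plant to skip the initial transient
    pad = np.zeros(N - 1)
    u = np.concatenate([pad, rng.standard_normal(K)])
    x = u + np.concatenate([pad, np.sqrt(sigma2_eta) * rng.standard_normal(K)])
    nu = np.sqrt(sigma2_nu) * rng.standard_normal(K)
    theta = np.sqrt(sigma2_theta) * rng.standard_normal((K, N))
    sq_dev = np.empty(tail)
    for k in range(K):
        u_vec = u[k:k + N][::-1]                # clean regressor [u(k), ..., u(k-N+1)]
        x_vec = x[k:k + N][::-1]                # noisy regressor seen by the filter
        e = w_star @ u_vec + nu[k] - w @ x_vec  # e(k) = d(k) - w^T(k) x(k)
        w = w + beta * (e * x_vec + sigma2_eta_hat * w)
        w_star = w_star + theta[k]              # random-walk drift of the plant
        if k >= K - tail:
            sq_dev[k - (K - tail)] = np.sum((w_star - w) ** 2)
    return float(np.mean(sq_dev))


betas = (1e-4, 1e-2, 1e-1)
msds = [tracking_msd(b) for b in betas]
for b, m in zip(betas, msds):
    print(f"beta={b:g}: steady-state MSD ~ {m:.2e}")
```

A very small $\beta$ is dominated by the lag term and a large $\beta$ by gradient noise, so the intermediate value should give the lowest steady-state MSD. This qualitative behavior is the one reported for the fourth scenario.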

VI. DEFICIENT-LENGTH ANALYSIS
Sometimes system identification procedures operate in a deficient-length setting, which occurs when the length of the ideal transfer function exceeds the adaptive filter length [13]. This may happen when the designer must cope with computational limitations or when the unknown transfer function is long [8], [14]. In such a suboptimal scenario, suppose that the reference signal can be written as
$$d(k) = \mathbf{w}_\star^T\mathbf{u}(k) + \bar{\mathbf{w}}_\star^T\bar{\mathbf{u}}(k) + \nu(k),$$
where $\bar{\mathbf{w}}_\star \in \mathbb{R}^L$ contains the additional coefficients of the ideal plant (which the adaptive filter does not cover) and $\bar{\mathbf{u}}(k) \triangleq [u(k-N)\; u(k-N-1)\; \cdots\; u(k-N-L+1)]^T$. To simplify the mathematics, the following stochastic assumption is adopted in the second-order analysis:
Whiteness Assumption (WA). The input signal u(k) is white.
Using IA, NA, and WA, after some manipulations one obtains the following recursion for the mean-square analysis: where $\mathbf{H} \triangleq \mathrm{E}[\mathbf{x}(k)\mathbf{u}^T(k) \otimes \mathbf{x}(k)\mathbf{u}^T(k)]$.
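The effect of a deficient length can be illustrated with a simulation under WA. The sketch assumes an $N$-tap BC-LMS (same assumed update as above) identifying a plant with $N + L$ taps. The modeled part is hypothetical ($0.5^i$), and the unmodeled tail is set to 0.1 as in the third scenario. $N = 8$ is used instead of 20 only to keep the run short. MSD is measured over the $N$ adapted taps, which is one possible definition; the paper's may also include the unmodeled coefficients. The helper `deficient_length_run` is illustrative.

```python
import numpy as np


def deficient_length_run(L, N=8, K=30_000, tail=15_000, beta=1e-2, seed=5):
    """BC-LMS with N taps identifying a plant with N + L taps (white input, hypothetical setup)."""
    rng = np.random.default_rng(seed)
    sigma2_eta = sigma2_eta_hat = 1e-2
    sigma2_nu = 1e-3
    w_star = 0.5 ** np.arange(N)                 # hypothetical modeled part of the plant
    w_bar = 0.1 * np.ones(L)                     # unmodeled tail, as in the third scenario
    plant = np.concatenate([w_star, w_bar])
    u = rng.standard_normal(K)                   # white input (WA)
    d = np.convolve(u, plant)[:K] + np.sqrt(sigma2_nu) * rng.standard_normal(K)
    x = u + np.sqrt(sigma2_eta) * rng.standard_normal(K)
    x_pad = np.concatenate([np.zeros(N - 1), x])
    w = np.zeros(N)
    tail_w = np.empty((tail, N))
    for k in range(K):
        x_vec = x_pad[k:k + N][::-1]             # [x(k), ..., x(k-N+1)]
        e = d[k] - w @ x_vec
        w = w + beta * (e * x_vec + sigma2_eta_hat * w)
        if k >= K - tail:
            tail_w[k - (K - tail)] = w
    msd = float(np.mean(np.sum((w_star - tail_w) ** 2, axis=1)))
    return msd, tail_w.mean(axis=0), w_star


msd_0, mean_0, w_star_0 = deficient_length_run(0)
msd_10, mean_10, _ = deficient_length_run(10)
print(f"MSD with L=0: {msd_0:.2e}, with L=10: {msd_10:.2e}")
```

With white input, $\bar{\mathbf{u}}(k)$ is uncorrelated with $\mathbf{x}(k)$. The unmodeled part therefore acts as extra error power: it raises the steady-state MSD but does not bias the mean of the adapted taps.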

VII. RESULTS
In the simulations below, the following transfer function was used: where L > 0 in deficient-length scenarios and N = 20 in all simulations. The noise signal ν(k) was sampled from a white Gaussian process. Signal u(k) (resp. η(k)) is obtained by filtering a unit-variance white Gaussian signal (resp. a white Gaussian signal with variance $\sigma^2_\eta$) with a coloring filter $B_u(z)$ (resp. $B_\eta(z)$). The coloring filters employed in the four considered scenarios are presented in Tab. I. Note that the devised analysis is restricted neither to white u(k) nor to white η(k), which avoids statistical assumptions common in the literature. Fig. 2a depicts the average evolution of the deviations in the first scenario, with $\beta = 10^{-2}$, $\sigma^2_\nu = 10^{-3}$, $\sigma^2_\eta = 10^{-2}$, and $\hat{\sigma}^2_\eta = 0.008$. The theoretical curves, evaluated using (8), are in good agreement with the empirical ones, even when the estimated variance of η(k) differs from its actual value.
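The coloring procedure can be sketched as follows. Because the filters of Tab. I are not reproduced in the extracted text, the block uses a hypothetical first-order coloring filter $B(z) = \sqrt{1-a^2}/(1 - a z^{-1})$ whose gain keeps the output variance equal to the variance of the white driving signal. The paper's filters may not be normalized this way. The helper `colored_gaussian` and the values of `a` are illustrative.

```python
import numpy as np


def colored_gaussian(n, a, variance, rng):
    """Filter white Gaussian noise through B(z) = sqrt(1 - a^2) / (1 - a z^-1).

    The gain sqrt(1 - a^2) keeps the steady-state output variance equal to `variance`.
    """
    white = np.sqrt(variance) * rng.standard_normal(n)
    gain = np.sqrt(1.0 - a * a)
    out = np.empty(n)
    state = 0.0
    for k in range(n):
        state = a * state + gain * white[k]   # first-order recursive (AR(1)) filter
        out[k] = state
    return out


rng = np.random.default_rng(6)
u = colored_gaussian(200_000, a=0.8, variance=1.0, rng=rng)     # colored input u(k)
eta = colored_gaussian(200_000, a=0.5, variance=1e-2, rng=rng)  # colored input noise eta(k)
x = u + eta                                                     # samples fed to the adaptive filter
lag1 = np.mean(u[1:] * u[:-1]) / np.mean(u * u)
print(f"var(u) ~ {u.var():.3f}, lag-1 correlation ~ {lag1:.3f}")
```

For this filter the lag-1 normalized autocorrelation of the output equals `a`, so the eigenvalue spread of $\mathbf{R}_x$ grows as `a` approaches 1.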
The prediction of the MSD evolution is the main goal of the second scenario, in which $\sigma^2_\eta = 0.007$, $\hat{\sigma}^2_\eta = 0.01$, $\beta = 5\times10^{-3}$, and $\sigma^2_\nu = 10^{-3}$. Fig. 2b shows that the proposed stochastic model also accurately predicts the actual mean-square learning curve.
The prediction of the performance loss induced by a suboptimal-length adaptive filter is studied in the third scenario, for which $\hat{\sigma}^2_\eta = \sigma^2_\eta = \beta = 10^{-2}$ and $\sigma^2_\nu = 10^{-3}$. Fig. 3a compares the theoretical and empirical MSD curves for different values of L, where all L additional coefficients of the ideal plant are set to 0.1. One observes that Eq. (35) accurately describes the empirical curves. Note that, as predicted by the proposed stochastic model, adopting the same value of β for different ideal-plant lengths does not affect the stability of the algorithm.
The last (fourth) scenario addresses the time-variant configuration, in which $\sigma^2_\nu = 10^{-6}$ and $\beta = \sigma^2_\eta = \hat{\sigma}^2_\eta = 10^{-2}$. Fig. 3b depicts the steady-state MSD for different values of β. The steady-state MSD was computed by averaging the last $10^4$ iterations of an execution with $3\times10^5$ iterations. One can observe that there indeed exists an optimal value of β (as usual in tracking scenarios [10]), and that this optimal value is well approximated by the proposed model.

VIII. CONCLUSION
This paper presented a comprehensive analysis of the learning behavior of the BC-LMS algorithm. Theoretical predictions for both first- and second-order statistics were provided, and the resulting model was successfully extended to address both tracking and deficient-length configurations.