A Cepstral Method to Estimate the Stable Optimal Solution for Feedforward Occlusion Cancellation in Hearing Aids

Abstract: The occlusion effect is a common complaint from users of hearing aids with narrow or unvented earmolds. This phenomenon makes the user's own voice sound muffled. In the scientific literature, fixed and adaptive controllers have been proposed for occlusion effect reduction. This work proposes a cepstral method to estimate the stable optimal solution for feedforward occlusion cancellation and a fixed controller that utilizes this estimate to reduce the occlusion effect in hearing aids. The cepstral method operates on a feedback structure during a calibration process. Simulations have shown that the performance of the cepstral method improves as the length of the signal uttered by the hearing aid user increases, resulting in estimates with average normalized misalignment less than −19 and −34 dB for signals lasting 1.5 and 5 s, respectively. The estimates are significantly more accurate below 500 Hz, which is the frequency range common to the occlusion effect. In addition, results have indicated that the controller attenuated the occlusion effect, decreasing the distortion power by 0.17 dB and increasing the objective perceptual quality MOS-LQO score by 0.13, on average.

the external auditory channel through vibro-acoustic systems. Digital hearing aids are basically composed of microphones, analog/digital and digital/analog converters, digital signal processor, loudspeakers, transmission channel and battery.
Although most commercial devices available today use advanced digital signal processing techniques, a large number of hearing aid users still report significant dissatisfaction [5], which is very often associated with the howling effect [6] and the occlusion effect [7]. The howling effect results from an instability of the closed-loop system generated by the acoustic coupling between the loudspeaker and the microphone of the hearing aid device, which causes the signal played back by the loudspeaker to be fed back into the microphone. To prevent the acoustic feedback, the simplest solution is the placement of a well-fitted earmold into the user's ear [8].
A side effect of a well-suited earmold, with a very narrow or a completely closed ventilation opening, is that, when the hearing aid user speaks, the sound pressure produced by the laryngeal vocalizations is not sufficiently dissipated and is thus conducted through the bony portions of the skull to the blocked ear. These vibrations result in an increased low-frequency sound pressure level reaching the tympanic membrane, which makes the hearing aid user hear his own voice as muffled [9]. Such power magnification typically peaks at around 200-500 Hz [7], [8], [10], [11]. This bone-conducted phenomenon is called the occlusion effect and is especially annoying for people with severe to profound sensorineural losses at high frequencies [9].
Since 2008, fixed and adaptive signal processing strategies have been proposed to cancel or at least attenuate the occlusion effect. Also known as control systems, the fixed solutions ensure the system stability but do not deal with the acoustic dynamics, and may suffer loss of performance due to changes in the ear canal or displacement of the earmold [7], [12], [13], [14]. The adaptive approaches, on the other hand, present a slow convergence of the filter coefficients and require continuous adaptation in search of an optimal solution since the occlusion effect occurs in short time intervals [10]. Both strategies can be implemented in feedforward or feedback structures, where the latter exploits a possible internal microphone of the hearing aid. It is worth mentioning that few models of hearing aid devices possess the internal microphone.
The first proposal of an occlusion cancellation system was presented in [7] using a fixed controller operating on a feedback structure. Afterwards, adaptive solutions were first proposed in [13] and later in [12], [14]. In [12], [13], a feedforward cancellation architecture is used. The adaptive method in [14], on the other hand, employs a feedback scheme. More recently, a fixed controller operating on a feedforward cancellation structure was proposed in [1].
Among the aforementioned cancellation proposals, the work presented in [1] stands out for being the only one to estimate the occlusion path, the name given to the system that models the bone conduction of the sound pressure produced by the laryngeal vocalizations to the blocked ear. The estimation is performed by assuming knowledge of the acoustic path, the system that denotes the acoustic coupling between the user's lips and the hearing aid external microphone. However, in practice, this assumption does not hold and the acoustic path would also need to be estimated. Moreover, knowledge of the forward path, which represents the user's hearing loss compensation system, is also inconveniently required for the controller design.
As a possible answer for these limitations, [1] is expanded in this paper. The goal of this work is twofold: first and foremost, without assuming to know the acoustic path, to propose a cepstral method for estimating the stable optimal solution of the feedforward occlusion cancellation; second, to propose a controller designed based on this estimate and without assuming to know the forward path for attenuating the occlusion effect in the feedforward cancellation structure. This paper is organized as follows: Section II presents the occlusion effect modeling and the two cancellation structures used in this work as well as demonstrates the stable optimal solution for the feedforward structure; Section III presents a cepstral analysis of the feedback structure, explains in detail the proposed cepstral method for estimating the stable optimal solution of the feedforward occlusion cancellation and describes the proposed controller for attenuating the occlusion effect; Section IV describes the configuration of the simulations carried out; in Section V, the performance results of both proposed methods are presented and discussed; finally, Section VI concludes the paper.

II. OCCLUSION EFFECT MODELING AND CANCELLATION SYSTEMS
The modeling of the occlusion effect is shown in Figure 1 disregarding 1 ( ). The acoustic, forward and occlusion paths are represented by the impulse responses ( ), ( ) and ( ), respectively. The signal ( ) denotes the sound signal uttered by the hearing aid user. The external microphone signal ( ) is assumed to be ambient noise-free because of a noise cancellation algorithm. The signal ( ), to be picked up by a possible hearing aid internal microphone, is actually the signal heard by the user and defined as where the * symbol denotes the convolution operation. Therefore, the occlusion effect is characterized by the addition of ( ) * ( ) to the desired value of ( ) and thus can be interpreted from the reverberation point of view.
In this work, the feedforward structure adopted for occlusion cancellation is depicted in Figure 1, where the cancellation path is represented by the impulse response 1 ( ). This is the same structure employed in [12], [13]. The feedback structure selected for occlusion cancellation is illustrated in Figure 2, where the cancellation path is represented by the impulse response 2 ( ). It is evident that the feedback architecture exploits the internal microphone signal ( ).
As ideally ( ) = ( ) ( ) ( ), the optimal frequency response for the controller or adaptive filter is which in time domain leads to where ( ) denotes the impulse response of the inverse system to the acoustic path. It is clear that ( ) relies directly on ( ) and ( ) but is independent of ( ). The independence of ( ) is an important advantage over the open-loop cancellation structure proposed in [1]. Due to the properties of the DTFT, ( ) is absolutely summable and therefore this optimal solution is stable.
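The step from the ideal internal microphone spectrum to the optimal solution can be sketched as follows. The symbol names are assumptions introduced only for this illustration: $U$ is the source spectrum, $F$, $G$ and $O$ are the frequency responses of the acoustic, forward and occlusion paths, $W_1$ is the feedforward cancellation path and $W_o$ its optimal value. The sketch also assumes that the cancellation path output is added to the forward path output, so the sign of $W_o$ depends on that convention.

```latex
\begin{align}
Y(e^{j\omega}) &= U(e^{j\omega})\,F(e^{j\omega})\left[G(e^{j\omega}) + W_1(e^{j\omega})\right] + U(e^{j\omega})\,O(e^{j\omega}), \\
Y(e^{j\omega}) &= U(e^{j\omega})\,F(e^{j\omega})\,G(e^{j\omega})
\;\Longrightarrow\;
W_o(e^{j\omega}) = -\frac{O(e^{j\omega})}{F(e^{j\omega})}, \\
w_o(n) &= -\,o(n) * f^{-1}(n),
\end{align}
```

Here $f^{-1}(n)$ stands for the impulse response of the inverse system to the acoustic path. Under these assumptions $G$ cancels out of the second line, which is the independence from the forward path noted above.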
As the acoustic path models the inherent propagation delay from the user's lips to the hearing aid external microphone, its impulse response can be written as where ( ) is the unit impulse function, ˜ ( ) = 0 for < 0, ˜ (0) ≠ 0 and > 0. Consequently, as demonstrated in Appendix A, ( ) is in general a two-sided signal composed of left-sided increasing exponentials for < − and right-sided decreasing exponentials for ≥ − . But, as its energy is concentrated around = − , ( ) can be considered of finite duration with ( ) ≠ 0 only for 1 ≤ ≤ 2 , where 1 < − < 2 . Note that 1 is always negative. Combining the above approximation of ( ) with the fact that ( ) ≠ 0 only for = 0, 1, . . . , − 1, as shown in Section IV-A, the impulse response of the stable optimal solution defined in (4) is such that ( ) ≠ 0 only for = 1 , 1 + 1, . . . , + 2 − 1. Therefore, the stable optimal solution is non-causal, which makes the feedforward occlusion cancellation a very challenging task.
Although it cannot be implemented for real-time processing, the knowledge of ( ) is of paramount importance either to design a controller with appropriate frequency response or to initialize the coefficients of an adaptive filter for feedforward occlusion cancellation. In the sequel, a cepstral method operating on the feedback structure to estimate ( ) and a controller operating on the feedforward structure that exploits this estimate to attenuate the occlusion effect are proposed.
Hence, the frequency domain relationship between the error signal ( ) and the external microphone signal ( ) is given by which making use of (3) can be written as Applying the natural logarithm in (8) yields If ( ) 2 ( ) < 1, the second term on the right-hand side of (9) can be expanded in Taylor series as And if ( ) 2 ( ) < 1, a sufficient condition to ensure the closed-loop system stability, the third term on the right-hand side of (9) can be expanded in Taylor series as Replacing (10) and (11) in (9) and then applying the inverse DTFT, the cepstral domain relationship between ( ) and ( ) is given by where {·} * denotes the th convolution power. In the feedback system for occlusion cancellation, the cepstrum ( ) of the error signal is the cepstrum ( ) of the external microphone signal added to time domain series as function of ( ), ( ) and 2 ( ). The cepstral analysis modified the representation of the system components in relation to the system signals. In (8), they are all in the frequency domain. But in (12), the signals ( ) and ( ) are in the cepstral domain while the system components are actually in the time domain.
However, it should be noted that ( ) is defined by (12) if and only if the conditions for the Taylor series expansions in (10) and (11) are fulfilled. Otherwise, nothing can be inferred about the mathematical definition of ( ) as a function of ( ), ( ) and 2 ( ). And, even when (12) is valid, the practical existence of the impulse responses in ( ) depends on whether the analyzed time window of ( ) is large enough to include their effects [15].
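Both expansions are instances of the standard logarithm series. As a generic reminder, let $Z(e^{j\omega})$ be a placeholder (an assumption for illustration, not the paper's notation) for a frequency response whose magnitude is below one, and let $z(n)$ be its impulse response:

```latex
\begin{align}
\ln\!\left[1 - Z(e^{j\omega})\right] &= -\sum_{k=1}^{\infty} \frac{Z^{k}(e^{j\omega})}{k}, \qquad \left|Z(e^{j\omega})\right| < 1, \\
\mathcal{F}^{-1}\!\left\{ Z^{k}(e^{j\omega}) \right\} &= \underbrace{z(n) * \cdots * z(n)}_{k\ \text{times}} = \{z(n)\}^{*k}, \\
\mathcal{F}^{-1}\!\left\{ \ln\!\left[1 - Z(e^{j\omega})\right] \right\} &= -\sum_{k=1}^{\infty} \frac{\{z(n)\}^{*k}}{k}.
\end{align}
```

The inverse DTFT turns each power of the frequency response into a convolution power of the impulse response, which is why the system components appear in the time domain while the signals appear as cepstra.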
B. Cepstral method operating on the feedback structure to estimate ( )
The proposed method aims at estimating ( ) from (12) in a fashion similar to [15], [16], [17] for acoustic feedback cancellation. The estimation would be performed in a calibration process before using the hearing aid or when the user finds it convenient. The source signal ( ) is supposed to be a vowel sound of finite duration uttered by the hearing aid user.
1) Computation of ( ) − ( ): The availability of both signals ( ) and ( ) enables the computation of their real cepstra through the fast Fourier transform (FFT). The subtraction of ( ) from ( ) would result in the convolution power series presented in (12). To that effect, however, the signals used in the cepstra computation must be related by (8).
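A minimal sketch of this step, assuming NumPy, is given below. The function names, the `eps` guard against the logarithm of zero and the choice of FFT length are assumptions made for illustration; a Hann window is applied to each finite-length segment before the FFT to limit spectral leakage.

```python
import numpy as np


def real_cepstrum(signal, n_fft=None, eps=1e-12):
    """Real cepstrum of a finite-length segment, computed through the FFT."""
    signal = np.asarray(signal, dtype=float)
    if n_fft is None:
        n_fft = len(signal)
    # Smooth (Hann) window to reduce the spectral leakage of the segment.
    windowed = signal * np.hanning(len(signal))
    # Log-magnitude spectrum; eps (hypothetical guard) avoids log(0).
    log_magnitude = np.log(np.abs(np.fft.fft(windowed, n_fft)) + eps)
    # The log-magnitude is real and even, so its inverse FFT is real.
    return np.real(np.fft.ifft(log_magnitude))


def cepstral_difference(error_segment, mic_segment, n_fft=None):
    """Cepstrum of the error signal minus cepstrum of the microphone signal."""
    if len(error_segment) != len(mic_segment):
        raise ValueError("both segments must have the same length")
    return real_cepstrum(error_segment, n_fft) - real_cepstrum(mic_segment, n_fft)
```

As a sanity check, when the error segment is just a scaled copy of the microphone segment, the difference reduces to the logarithm of the gain at index zero and is negligible elsewhere.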
Analysis of (8) as an input-output system reveals that, in general, the signal ( ) is of infinite duration. But, due to the FFT nature, the computation of ( ) is inevitably performed using a finite-length segment of ( ). The time windowing of a signal leads to the so-called spectral leakage, where energy from one frequency range leaks into another, resulting in a distorted spectral estimate. The leakage can be reduced by using a smooth window or increasing the segment length. With this in mind, a Hann window is used in this work for spectral estimation of both ( ) and ( ). In the target application, the spectral leakage makes (12) less accurate and ultimately disrupts the estimation of ( ).
2) Estimation of ( ) from ( ) − ( ): The reasoning behind the proposed method comes from speculating that ( ) * ( ), the impulse response for = 1 of the first time series in (12), can be extracted from ( ) − ( ) by appropriately choosing ( ) and 2 ( ). At the calibration stage, hearing loss compensation is not required and these impulse responses can be freely chosen as long as they do not result in a very uncomfortable acoustic environment.
The choice of both ( ) and 2 ( ) plays a key role as it serves the following three purposes: 1) To ensure that the conditions for the Taylor series expansions are met and, consequently, (12) is valid; 2) To make ( ) * 2 ( ), the impulse response for = 1 of the first time series in (12), causal; 3) To ensure that the non-zero samples of ( ) * 2 ( ) do not overlap with the non-zero samples of the impulse responses for > 1 of the first time series in (12). In this work, the impulse responses of both forward and feedback paths are simply defined as a broadband gain and a delay, i.e., and where > 0 and > 0. The conditions for the Taylor series expansions are met by appropriately setting the broadband gains. After simple algebraic manipulations, it is straightforward to show that and Both causality and non-overlapping of ( ) * 2 ( ) are accomplished by means of an appropriate choice of . Taking into account that ( ) ≠ 0 only for = 1 , . . . , And, since * 2 ( ) ≠ 0 only for = 2 1 , . . . , 2( + 2 − 1), it turns out that [ ( ) * 2 ( )] * 2 ≠ 0 only for = 2( 1 + ), . . . , 2( + 2 + − 1). Thus, the non-zero samples of ( ) * 2 ( ) and [ ( ) * ( )] * 2 do not overlap when The non-overlapping condition is sufficient to ensure the causality condition. Therefore, is not utilized to extract ( ) * ( ) from ( ) − ( ) because all the impulse responses of the second time series in (12) are known and could be cancelled out. But, since [ ( ) * 2 ( )] * ≠ 0 for = ( + ) and > 1 , it follows that only ( ) * 2 ( ) = ( − − ) can overlap with ( ) * 2 ( ). And that will only happen if ≤ + 2 − 1. Assuming this range for , ( ) * 2 ( ) needs to be removed as described below. With the parameters of ( ) and 2 ( ) specified in accordance with the above discussion, the proposed method starts by obtaining which is the resulting impulse response for = 1 presented in (12). This is performed by selecting the first (19) where ( ) = 1/ ( + ) represents the impulse response of the inverse system to 2 ( ) and is known.
Note that the convolution with ( ) consists of a sliding on the sample axis and a multiplication.
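The remark above can be illustrated with a short NumPy sketch. The function name and the representation of a finite sequence as a pair (samples, index of the first sample) are assumptions made for illustration: convolving with a scaled and shifted unit impulse only rescales the samples and moves their position on the sample axis, including when the impulse is advanced (negative shift).

```python
import numpy as np


def convolve_with_scaled_impulse(samples, start, gain, shift):
    """Convolve a finite sequence with gain * delta(n - shift).

    samples[i] holds the value at sample index n = start + i.
    Returns the scaled samples and the index of the first one.
    """
    return gain * np.asarray(samples, dtype=float), start + shift


h = np.array([1.0, -0.5, 0.25])
# An advancing (non-causal) impulse, shift < 0, only changes the bookkeeping.
advanced, advanced_start = convolve_with_scaled_impulse(h, start=5, gain=10.0, shift=-5)
```

No multiply-accumulate loop is needed, so this step adds essentially no computational cost to the method.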
Finally, the method computes ˆ ( ), an estimate of the stable optimal solution ( ) for the feedforward occlusion cancellation, as In addition to making the method feasible, the parameters , , and influence its performance. When analyzing (8) considering (13)−(15), it can be demonstrated that the lower | |, | |, or , the lower the energy ratio of ( ) not included in its finite-length segment used for computing ( ). Decreasing this ratio reduces the distortion caused by the spectral leakage and consequently improves ˆ ( ). Therefore, it would be beneficial to set these values as small as possible. In light of this, and are set to their lower bounds, i.e., = + 2 − 2 1 and = 1. But some restrictions are imposed on the broadband gains. A very small | | can make the ambient sound inaudible to the hearing aid user during the calibration, which is not acoustically comfortable. With the aim of making the user naturally listen to the ambient sound, = 1 is used in this work. This is accomplished by setting such that not only (15) is fulfilled but also 0 < < 1, resulting in = 0.1.
C. Controller operating on the feedforward cancellation structure to attenuate the occlusion effect
The proposed controller for attenuating the occlusion effect operates on the feedforward cancellation structure shown in Figure 1 and uses the estimate ˆ ( ) obtained by the cepstral method described in the previous section.
The controller is defined by the following impulse response where ( ) is the unit step function. Therefore, the controller impulse response is actually the estimate of the impulse response of the stable optimal solution, which is non-causal, truncated so as to become causal.
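The truncation can be sketched as follows, assuming NumPy. The function name and the convention that the estimate is stored as an array whose first element corresponds to sample index `n_start` (negative for a non-causal estimate) are assumptions made for illustration.

```python
import numpy as np


def truncate_to_causal(w_hat, n_start):
    """Keep only the samples of the estimate at indices n >= 0.

    w_hat[i] holds the estimate at sample index n = n_start + i.
    The returned array starts at n = 0.
    """
    w_hat = np.asarray(w_hat, dtype=float)
    if n_start >= 0:
        # Already causal: pad with zeros so that index 0 is n = 0.
        return np.concatenate([np.zeros(n_start), w_hat])
    # Multiplying by the unit step discards the samples at n < 0.
    return w_hat[-n_start:].copy()


# Hypothetical estimate spanning n = -2, ..., 2.
controller = truncate_to_causal([0.1, 0.4, 1.0, 0.3, 0.05], n_start=-2)
```

The discarded left-hand samples are exactly the part of the stable optimal solution that a real-time controller cannot reproduce, so some loss of performance relative to the optimal solution is expected.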

IV. SIMULATION CONFIGURATIONS
This section describes the configuration of the two experiments carried out in a simulated environment. The first one evaluates the performance of the cepstral method for estimating the stable optimal solution ( ) of the occlusion effect. The second experiment evaluates the performance of the controller for attenuating the occlusion effect.
A. Simulated Environment
1) Occlusion path: The occlusion path is modeled by the impulse response available in [12], which was measured in a volunteer with a custom and unventilated earmold and digitally recorded at a 16 kHz sampling rate. The impulse response ( ) and frequency response magnitude of the occlusion path are shown in Figure 3, where it is observed that = 150.
2) Acoustic path: The acoustic path is represented in two different ways. First, as done in [1], [12], the acoustic path is a delay line defined as lips and the external microphone of hearing aids [12]. In this case, (15) becomes | | < 0.41 and then = 0.1 is used. Closer to a real-world situation, the second acoustic path is modeled by a measured room impulse response available in [19]. The impulse response was downsampled to 16 kHz and its first 17 samples were discarded to simulate the typical 14-sample delay from lips to the hearing aid external microphone. Then, it was truncated for computational cost reasons. The impulse response ( ) and the frequency response magnitude of the second acoustic path are shown in Figure 4. In this situation, (15) becomes | | < 0.14 and = 0.1 is used again.
3) Forward path: As in [1], [12], the forward path was modeled as a 1-sample delay, i.e., It is pertinent to emphasize that, as discussed in Section III-B, the forward path is not required to compensate the user hearing loss during the calibration process. And, as discussed in Section II-A, the stable optimal solution for the feedforward occlusion cancellation is independent of ( ).
B. Evaluation Metrics
1) Misalignment: A very common metric in system identification is the misalignment (MIS), which measures the mismatch between the identified and true systems. The performance of the cepstral method for estimating the stable optimal solution of the feedforward occlusion cancellation is evaluated through the normalized MIS defined as In order to scrutinize the method performance over the frequency range common to the occlusion effect, the normalized MIS is also computed separately for the frequencies below and above 500 Hz, which are respectively defined as and where 1 is the frequency corresponding to 500 Hz.
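A possible way to compute this metric is sketched below with NumPy. It assumes the usual definition of the normalized misalignment, namely the norm of the difference between the true and estimated frequency responses divided by the norm of the true one, expressed in dB. The function name, the FFT length and the band selection through a frequency mask are assumptions made for illustration.

```python
import numpy as np


def normalized_misalignment_db(w_true, w_est, fs=16000, n_fft=1024, band=None):
    """Normalized misalignment in dB, optionally restricted to a band in Hz.

    w_true and w_est must be time-aligned impulse responses no longer
    than n_fft samples. band is a (low, high) tuple or None for all bins.
    """
    W_true = np.fft.rfft(np.asarray(w_true, dtype=float), n_fft)
    W_est = np.fft.rfft(np.asarray(w_est, dtype=float), n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    mask = np.ones(freqs.shape, dtype=bool)
    if band is not None:
        low, high = band
        mask = (freqs >= low) & (freqs < high)
    error_norm = np.linalg.norm(W_true[mask] - W_est[mask])
    reference_norm = np.linalg.norm(W_true[mask])
    return 20.0 * np.log10(error_norm / reference_norm)


# Usage: full band, below 500 Hz and above 500 Hz.
rng = np.random.default_rng(1)
w = rng.standard_normal(64)
mis_all = normalized_misalignment_db(w, 0.9 * w)
mis_low = normalized_misalignment_db(w, 0.9 * w, band=(0.0, 500.0))
mis_high = normalized_misalignment_db(w, 0.9 * w, band=(500.0, 8000.1))
```

In this toy usage the estimate is off by a uniform 10% gain, so all three values equal −20 dB; with a real estimate the low and high bands would generally differ.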
2) Distortion power: The performance of the fixed feedforward controller for attenuating the occlusion effect is evaluated through the power of the difference between the internal microphone signal ( ) and its ideal value, i.e., where is the length of the speech signal. This metric can be interpreted as the power of the distortion caused by the occlusion effect in the sound heard by the user.
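A minimal NumPy sketch of this metric follows. The function names are assumptions made for illustration, and the conversion to dB is added only as a convenience for comparing controllers, not as part of the definition above.

```python
import numpy as np


def distortion_power(y, y_ideal):
    """Mean power of the difference between the heard signal and its ideal value."""
    y = np.asarray(y, dtype=float)
    y_ideal = np.asarray(y_ideal, dtype=float)
    if y.shape != y_ideal.shape:
        raise ValueError("signals must have the same length")
    return float(np.mean((y - y_ideal) ** 2))


def power_ratio_db(power, reference_power):
    """Power change in dB relative to a reference (e.g. the no-control case)."""
    return 10.0 * np.log10(power / reference_power)
```

A negative value returned by `power_ratio_db` means that the controller under test produces less distortion than the reference condition.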
3) Wideband Perceptual Evaluation of Speech Quality: Objective measures of speech quality have evolved from those based on purely mathematical criteria towards perceptually salient metrics. The W-PESQ (Wideband Perceptual Evaluation of Speech Quality) is a standard algorithm for objective quality evaluation of wideband (sampled at 16 kHz) speech signals [20], [21], [22], [23]. It compares the psychophysical representations of both a possibly degraded speech signal and its corresponding uncorrupted reference [23].
The raw W-PESQ scores can be mapped to the 1-5 mean opinion score (MOS) scale, resulting in the so-called MOS-Listening Quality Objective (MOS-LQO) score [24]. The correspondence between the five-point scale and the degradation category rating (DCR) is shown in Table I. However, the maximum MOS-LQO score provided by W-PESQ is 4.644 when the reference and degraded signals are identical.
The W-PESQ achieves a correlation of 80% with MOS when assessing speech impairment by reverberation although it was not designed for this purpose [25], [26]. Hence, the W-PESQ was applied to perceptually evaluate the performance of the proposed controller for attenuating the occlusion effect. To this end, the ideal and true internal microphone signals served as the reference and degraded signals, respectively. In order to fully exploit the W-PESQ, the reference signals are recommended to meet the following specifications [27]: duration between 8 and 12 s, speech activity between 40% and 80%, a minimum active speech of 3.2 s, an active speech level of −30 dBov, a minimum leading and trailing silence of 0.5 s, a maximum leading and trailing silence of about 2 s, and utterances separated by a silent period of at least 1 s.

C. Speech Database
The speech database was provided by the Medical Engineering Group of the Brazilian National Council of Scientific and Technological Development (GPEM/CNPq). The database consists of 20 recordings of the sustained vowel /a/ from healthy Brazilian adult subjects (1 per subject). The vowel /a/ was selected as speech material due to its widespread use in acoustic analysis of voice [28].
The information about gender and age of the subjects as well as time duration of the speech signals are summarized in Table II, where standard deviation is abbreviated to SD. The active power level of each signal was normalized to −30 dBov through the Recommendation ITU-T P.56 algorithm [29].
The speech signals are used to compose the source signal ( ) in the two experiments performed. In the first experiment, longer speech signals are generated by concatenating each signal with itself. Variable size segments of each resulting signal are used as ( ) in order to evaluate the performance of the proposed cepstral method for estimating ( ) as a function of the vowel length uttered by the hearing aid user.
In the second experiment, the source signal ( ) is built from the original speech signals as follows: 1 s of leading silence, speech signal, 1.5 s of silence, replica of the speech signal and 1 s of trailing silence. This temporal structure satisfies the reference signal requirements for the W-PESQ described in Section IV-B3, leading to an appropriate application of the algorithm in the performance evaluation of the proposed controller to attenuate the occlusion effect.
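The temporal structure of the source signal can be assembled as in the following NumPy sketch. The function and parameter names are assumptions made for illustration; the 16 kHz sampling rate matches the one used in the simulated environment.

```python
import numpy as np


def build_source_signal(speech, fs=16000, lead_s=1.0, gap_s=1.5, trail_s=1.0):
    """Leading silence, speech, silence, replica of the speech, trailing silence."""
    speech = np.asarray(speech, dtype=float)

    def silence(seconds):
        # Zero-valued samples lasting the requested time.
        return np.zeros(int(round(seconds * fs)))

    return np.concatenate(
        [silence(lead_s), speech, silence(gap_s), speech, silence(trail_s)]
    )
```

For a vowel recording of duration T seconds, the resulting signal lasts 2T + 3.5 s, which makes it easy to check the duration and speech activity recommendations for each recording before running the quality evaluation.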

V. SIMULATION RESULTS
This section is devoted to reporting the results of the two experiments performed. First, the performance of the cepstral method for estimating the stable optimal solution ( ) of the feedforward occlusion cancellation is addressed. In the sequel, the performance of the controller 1 ( ) for attenuating the occlusion effect is presented. In both cases, the simulated environment, the evaluation metrics and the signals described in Section IV are employed.
A. Estimation of the stable optimal solution ( )
1) Scenario 1: The impulse response ( ) of the first acoustic path is defined in (22). Consequently, the impulse responses of its stable inverse system and the stable optimal solution are defined as ( ) = ( + 14) and ( ) = ( + 14), respectively, where it is concluded that the stable optimal solution is indeed non-causal.
The method was set up as follows: 1 = −100, 2 = 100, = 150, = 0.1, = 450, = 1 and = 1. The performance results of the proposed cepstral method in estimating ( ) for several speech lengths are summarized in Table III. Examples ofˆ ( ) obtained by the method for 1 s and 3 s-long speech signals are illustrated in Figure 5.
As can be seen, the method performance improves as the speech length increases. This behavior is due to the spectral leakage reduction caused by the use of larger windows, which makes (12) more accurate as discussed in Section III-B1. It is also observed that the estimate obtained by the method is significantly more accurate in the range below 500 Hz. The cause of this result is that the distortion in ( ) − ( ) caused by the spectral leakage is fundamentally of high frequency, as can be noticed in Figure 5a. The higher accuracy of ˆ ( ) in this frequency range is of great value because the occlusion effect is a low-frequency phenomenon.
A careful inspection of Figure 5 reveals that the values selected for 1 and 2 cause the proposed method to extract only noise from ( ) − ( ) for < −14 and > 150. A less noisy ˆ ( ) could be obtained by increasing 1 or decreasing 2 , ideally 1 = 2 = −14. But 1 = −100 and 2 = 100 were adopted in order to simulate ignorance of ( ) without overly harming its estimation.
Furthermore, it is noticed that at least 0.5 s of speech is required for the method to obtain a good estimate of ( ). For signals longer than 1.5 s, the method achieves average values of MIS and MIS < less than −21 dB and −31 dB, respectively. Outstanding mean values of MIS and MIS < approximately equal to −35 dB and −47 dB, respectively, are achieved for 5 s-long speech signals. The results shown in Table III can serve as a valid indication that, in the first scenario, the proposed method accurately estimates the stable optimal solution ( ) of the feedforward occlusion cancellation for speech signals longer than 1 s. And it is reasonable to believe that a person can easily sustain a vowel sound for that time.
2) Scenario 2: The impulse response ( ) of the second acoustic path is the one depicted in Figure 4. The resulting ( ) is shown in Figure 6, where the non-causality of the   Table IV. Examples ofˆ ( ) obtained by the method for 1 s and 3 s-long speech signals are illustrated in Figure 7.
As can be seen, the mean results are slightly worse than those obtained in the first scenario because the infinite length of ( ) inevitably causes overlapping of the impulse responses presented in ( ) − ( ), thereby disrupting its estimation. But some observations similar to those of the first scenario can be made in the second one. As the use of larger windows reduces the spectral leakage and thus improves the accuracy of (12), the longer the speech signal, the better the estimate obtained by the method. As the distortion in ( ) − ( ) caused by the spectral leakage is fundamentally of high frequency, the method performance is significantly better below 500 Hz, which is the frequency range common to the occlusion effect. And at least 0.5 s of speech is required for the method to obtain a minimally acceptable estimate of ( ).
Scrutiny of Figure 7 reveals the influence of 1 and 2 on the method performance. As can be noted, the distortion caused by the spectral leakage prevents the method from accurately estimating the samples of ( ) with low absolute values, especially its tails. Thus an appropriate choice of 1 and 2 must take into consideration the decay time of the left ( < − ) and right ( ≥ − ) side of ( ), respectively. And, since ≥ + 2 −2 1 , increasing | 1 | or 2 increases . Consequently, as discussed in Section III-B, the distortion caused by the spectral leakage intensifies and the method performance worsens. With this in mind, 1 = −300 and 2 = 100 were indiscriminately selected with the intention of estimating the highest absolute values of ( ). Nevertheless, for signals longer than 1. approximately equal to −34 dB and −37 dB, respectively, are achieved for 5 s-long speech signals. The results shown in Table IV reinforce that the proposed method accurately estimates the stable optimal solution ( ) of the feedforward occlusion cancellation for speech signals longer than 1 s, which is a considerably short time for a person to sustain a vowel sound.
This is, as far as we know, a novelty in the literature.
An estimate of ( ) can be very useful either in a controller design or in initializing the coefficients of an adaptive filter. In the sequel, the performance of the feedforward controller, proposed in Section III-C, that uses ˆ ( ) to attenuate the occlusion effect is evaluated.

B. Cancellation of the occlusion effect
The performance evaluation of the proposed controller 1 ( ), defined in (21), was carried out by using an estimatê ( ) obtained with a 3 s-long speech signal. The frequency response magnitudes of both 1 ( ) and ( ) for the first and second acoustic scenarios are shown in Figure 8. The frequency response magnitude of the solution of [12], proposed for acoustic paths with 14-sample delay propagation as the two paths used in this work, is also illustrated. A zoomed plot highlights the responses between 150 and 600 Hz, which includes the frequency range common to the occlusion effect. 1) Scenario 1: At the first acoustic scenario, it is noted from Figure 8 that the solution of [12] is closest to ( ) up to approximately 420 Hz, more accurately modeling the peak around 340 Hz. From this frequency onwards, the proposed controller is closest to ( ). Unlike ( ), the solution of [12] does not intensily attenuate the frequencies above 650 Hz. On the contrary, it amplifies with an increasing average gain from 3600 Hz. This high frequency amplification may not greatly adversely affect its performance because the first three formant of vowel sounds are typically below 2500 Hz [30].
Regarding phase, it was found that the solution of [12] is very close to ( ) up to 375 Hz, presenting an error of less than 10 degrees. And the proposed controller is close to the ( ), presenting an error of less than 25 degrees, between 250 and 360 Hz, frequency range where the power magnification caused by the occlusion effect is concentrated. From then on, however, both solutions are generally far from ( ). The performance results of the proposed controller at the first scenario are presented in Table V, along with the results obtained with no control, the stable optimal solution ( ) and the solution of [12]. As can be seen, ( ) is really optimal because its mean P and MOS-LQO are null and maximum, respectively. Compared with none controller, the proposed controller decreased the mean P by 0.29 dB and increased the mean MOS-LQO by 0.14. And the standard deviation of both metrics is significantly reduced. These results reveal that the proposed controller successfully reduces the occlusion effect.
Also comparing with none controller, the solution of [12] raised the mean P and MOS-LQO by 4.49 dB and 0.13, respectively. These results suggest that, on average, this solution perceptually reduces the occlusion effect despite significantly distorting the sound heard by the hearing aid user. The distortion may be a consequence of the high frequency amplification previously discussed. Nevertheless, in light of the high variance associated with MOS-LQO, the audible performance difference may be considered negligible.
The results demonstrate that the proposed controller out-  performed the solution of [12]. Although the 0.01 increase in mean MOS-LQO is negligible, the standard deviation is conveniently decreased by 0.08. Nonetheless, the 4.78 dB decrease in mean P is considerable given the associated variances and can serve as valid indication of its superior performance.
2) Scenario 2: At the second acoustic scenario, it is noted from Figure 8 that the proposed controller is closer to ( ) than the controller of [12] over the entire frequency range. But the amplification level (higher than 10 dB) at the frequencies below 75 Hz is not modeled. This non-modeled magnitude peak may not adversely affect the sound quality obtained with both controllers because the W-PESQ filters its input signals with a high-pass filter with cutoff frequency at 100 Hz [21], [23]. In relation to phase, it was found that the solution of [12] is not close to ( ) at the lower frequencies as in the first scenario. In fact, except at a very few frequencies, both controllers are far from ( ) over the entire frequency range. The performance results of the proposed controller at the second scenario are presented in Table VI, along with the results obtained with no control, the stable optimal solution ( ) and the solution of [12]. The P values obtained with none controller are identical to the first scenario because, at this situation, the difference between ( ) and its ideal value is equal to ( ) * ( ). On the other hand, the MOS-LQO values differ because the W-PESQ is not premised on the mere difference between the reference and degraded signals. Furthermore, it is observed again that ( ) is really optimal. Compared with none controller, the proposed controller decreased the mean P by 0.17 dB and increased the mean MOS-LQO by 0.13. And the standard deviation of both metrics is slightly reduced. Such improvements are lower than those obtained in the first scenario but can serve as valid indication that the proposed controller also successfully reduces the occlusion effect in the second acoustic scenario.
The solution of [12] once again raised the mean P and MOS-LQO, now by 0.82 dB and 0.09, respectively, compared with no controller. Although the variations are smaller than in the first scenario, especially in P, these results reinforce the indication that, on average, this solution perceptually reduces the occlusion effect despite distorting the sound heard by the user.
Compared with the solution of [12], the proposed controller achieved lower mean P and MOS-LQO. The 0.99 dB improvement in P is proportionally much larger than the 0.03 worsening in MOS-LQO. Despite the slightly lower mean MOS-LQO, the standard deviation is reduced by 0.13. These results indicate that the proposed controller performs better. Nevertheless, bearing in mind the variances associated with both metrics, all the performance differences discussed for the second acoustic scenario may be considered negligible.

VI. CONCLUSIONS
This work proposed a cepstral method to estimate the stable optimal solution for feedforward occlusion cancellation and a fixed controller that utilizes this estimate to reduce the occlusion effect in hearing aids. The estimation is performed during a calibration process, in which the hearing aid user utters a sustained vowel, using a feedback structure with appropriate choices of the forward and feedback paths.
Simulations have shown that the performance of the cepstral method improves as the length of the signal uttered by the user increases, resulting in estimates with average normalized misalignment less than -19 and -34 dB for signals lasting 1.5 and 5 s, respectively. The estimates are significantly more accurate below 500 Hz, which is the frequency range typical of the occlusion effect. In addition, the results indicate that the controller successfully attenuated the occlusion effect, reducing the power of the distortion produced in the sound heard by the user by 0.17 dB on average and increasing the objective perceptual quality MOS-LQO score by 0.13.
These results demonstrate that the proposed cepstral method to estimate the stable optimal solution for feedforward occlusion cancellation merits further exploration, either by using its estimate to design an even more efficient fixed controller or to initialize the coefficients of an adaptive filter.

APPENDIX A INVERSE SYSTEM TO THE ACOUSTIC PATH
The impulse response of the acoustic path is defined as where ˜ ( ) = 0 for < 0 and > , ˜ (0) ≠ 0 and > 0. Considering that the acoustic path is stable, its frequency response is given by Defining as the th pole of ( ) and assuming that there are no multiple poles, (30) can be expanded as Assuming, without loss of generality, that | | < 1, = 1, 2, . . . , 1 , and | | > 1, = 1 + 1, . . . , , then the impulse response ( ) of the inverse system is defined as where ( ) is the unit step function. Therefore, in general, ( ) is a two-sided signal, i.e., a signal of infinite extent for both ≥ 0 and < 0, composed of left-sided increasing exponentials for < − and right-sided decreasing exponentials for ≥ − .
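The two-sided structure described in this appendix can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: the acoustic path is taken to be a delayed FIR filter whose causal part is `f_tilde` and whose delay is `delay` (both names, and the function `two_sided_inverse`, are hypothetical), the zeros of the path are assumed distinct with none on the unit circle, and the last coefficient of `f_tilde` is assumed nonzero. Each pole of the inverse system inside the unit circle contributes a right-sided decaying exponential, and each pole outside contributes a left-sided exponential that grows toward the shifted origin.

```python
import numpy as np

def two_sided_inverse(f_tilde, delay, n_min, n_max):
    """Stable two-sided inverse of a path f(n) = f_tilde(n - delay).

    Assumes distinct zeros, none on the unit circle, f_tilde[-1] != 0.
    Returns the index grid n and the inverse impulse response g(n).
    """
    f_tilde = np.asarray(f_tilde, dtype=float)
    # Zeros of the path are the poles of the inverse system.
    poles = np.roots(f_tilde)
    # Residues of 1 / sum_n f_tilde(n) z^-n for distinct poles.
    residues = np.array([
        1.0 / (f_tilde[0] * np.prod([1 - pi / pk
                                     for i, pi in enumerate(poles) if i != k]))
        for k, pk in enumerate(poles)
    ])
    n = np.arange(n_min, n_max + 1)
    m = n + delay  # the inverse advances the signal by `delay` samples
    g = np.zeros(n.size, dtype=complex)
    for A, p in zip(residues, poles):
        if abs(p) < 1:
            mask = m >= 0           # right-sided, decaying
            g[mask] += A * p ** m[mask]
        else:
            mask = m < 0            # left-sided, growing toward m = -1
            g[mask] -= A * p ** m[mask]
    return n, g.real

# Hypothetical path with one zero inside (0.5) and one outside (2.0).
f_tilde = [1.0, -2.5, 1.0]
n, g = two_sided_inverse(f_tilde, delay=3, n_min=-60, n_max=60)
# Convolving the causal part with g should give a unit impulse.
c = np.convolve(f_tilde, g)
```

In this toy case the inverse is nonzero on both sides of n = -3, which illustrates why a causal, stable controller can only approximate the exact inverse of a non-minimum-phase path.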