Optimized Subvector Processing in Split Vector Quantization

Split vector quantization (SVQ) is efficient but suboptimal. Here a renormalization process is proposed for intraframe splitting and joining of subvectors, which integrates gracefully with trained interframe prediction. Renormalization increases the availability of codevectors for the quantization of each subvector in ordered vectors such as line spectral frequency (LSF) vectors. For 16-dimensional LSF vectors from wideband speech, renormalized SVQ (RSVQ) is shown to achieve a savings of 4 bit/frame over standard SVQ, reaching transparent coding at 42 bit/frame. Further, predictive RSVQ saves an additional 4 bit/frame for transparent coding down to 38 bit/frame.


I. INTRODUCTION
Split vector quantization overcomes the curse of dimensionality inherent in vector quantization by splitting the vector into lower-dimensional subvectors. It is particularly efficient when the distortion measure used for quantization is separable. In particular, it is widely used for the quantization of line spectral frequency (LSF) vectors that represent the short-term spectral envelope of speech signals [1].
However, some suboptimality remains, which is referred to as the split loss [2]. It may be partially counteracted by classified vector quantization (VQ) [3] and by combining split VQ with multistage VQ [4].
In a different approach, a method is proposed here that increases the availability of usable codevectors in the split codebooks when the vectors are ordered. LSF vectors are an important case, and here they are extracted from wideband speech.
The method consists of two actions. First, in the training or in the encoding phase, the bandwidth for the current split is normalized to cover the range defined by the previously coded LSFs in the neighboring splits. Second, the band spanned by the codevectors in the split is renormalized to close the gap between the left and right neighboring splits in the vector to be decoded. Previously, for narrowband speech, normalization of LSF vectors has been used per se [5] or combined with classified VQ [6].
The performance may be improved further by predictive RSVQ (renormalized SVQ), which adds interframe coding to the inherent intraframe coding in RSVQ. Section II discusses distortion measures and Section III describes RSVQ. Section IV explains how prediction further enhances the decrease in split loss provided by renormalization. Section V presents the training and performance evaluation results and is followed by the conclusion.

II. DISTORTION MEASURES AND VECTOR PARTITIONING
The log spectral distortion (SD) is the most common distortion measure for assessing the quantization performance of parameter vectors that represent the spectral envelope of speech signals. This is due to its correlation with human perception, at least within the distortion range covered by the usual range of rates, and to the transparent coding rules set forth by Paliwal and Atal in their introduction of SVQ [1] for narrowband speech, which essentially state that quantization is transparent when:
1) The mean SD is about 1 dB.
2) There is no outlier exceeding 4 dB in SD.
3) The number of outliers having SD in the range from 2 dB to 4 dB is less than 2%.
Later, these criteria were verified to hold for wideband speech as well [3].
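The three criteria can be checked mechanically on a set of per-frame SD values. The following is a minimal sketch, not the authors' evaluation code: the function name `transparency_report` is hypothetical, and "about 1 dB" is interpreted here as a mean SD not exceeding 1 dB, which is an assumption.

```python
import numpy as np

def transparency_report(sd_db, mean_limit_db=1.0, max_frac_2_to_4=0.02):
    """Evaluate the three transparent-coding criteria on per-frame SD values (dB)."""
    sd_db = np.asarray(sd_db, dtype=float)
    mean_sd = float(sd_db.mean())
    # Fraction of frames whose SD lies between 2 dB and 4 dB.
    frac_2_to_4 = float(np.mean((sd_db >= 2.0) & (sd_db <= 4.0)))
    # Fraction of frames whose SD exceeds 4 dB (criterion 2 requires zero).
    frac_above_4 = float(np.mean(sd_db > 4.0))
    transparent = (
        mean_sd <= mean_limit_db          # assumption: "about 1 dB" read as <= 1 dB
        and frac_above_4 == 0.0
        and frac_2_to_4 < max_frac_2_to_4
    )
    return {
        "mean_sd": mean_sd,
        "frac_2_to_4": frac_2_to_4,
        "frac_above_4": frac_above_4,
        "transparent": bool(transparent),
    }
```

Calling `transparency_report(sd_values)["transparent"]` on the SD values of a test set gives a single pass/fail flag, while the other entries show which criterion fails first as the rate is lowered.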
For LSF vectors $\mathbf{f}$ and $\hat{\mathbf{f}}$ with synthesis filters having power spectral densities (PSDs) $P(f)$ and $\hat{P}(f)$, respectively, the SD is defined by

$$\mathrm{SD}(\mathbf{f}, \hat{\mathbf{f}}) = \sqrt{\int_{-1/2}^{1/2} \left[ 10\log_{10} P(f) - 10\log_{10} \hat{P}(f) \right]^2 df} \tag{1}$$

where $f$ is the cycle frequency in cycle/sample. However, Eq. (1) is too complex to be used for designing the codebook in the training phase of the quantizer, and even for encoding and decoding, because the full-dimension LSF vector is needed for computing its PSD. Besides, particularly for SVQ, a separable distortion measure contributes to an initial decrease in split loss. In general, a weighted Euclidean distance is used instead. In particular, we use the following dynamically weighted square measure [1]

$$d(\mathbf{f}, \hat{\mathbf{f}}) = (\mathbf{f} - \hat{\mathbf{f}})^{T} W (\mathbf{f} - \hat{\mathbf{f}}) \tag{2}$$

where $W$ is the diagonal weighting matrix with main diagonal entries

$$w_{ii} = P^{r}(f_i)$$

for $i = 1, 2, \ldots, p$, where $p$ is the dimension of the LSF vectors and $P^{r}(f_i)$ stands for the $r$th power of the PSD at frequency $f_i$. The exponent of the PSD is set to $r = 0.3$.
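The two measures can be sketched numerically as follows. This is an illustration under stated assumptions, not the paper's implementation: the SD is taken as the root-mean-square difference of the log PSDs in dB, approximated on a uniform frequency grid, and the PSD values at the LSF positions are assumed to be supplied by the caller (the conversion from LSFs to a PSD is outside this sketch). The function names are hypothetical.

```python
import numpy as np

def spectral_distortion_db(psd, psd_hat):
    """RMS log-spectral difference in dB for PSDs sampled on a uniform frequency grid."""
    psd = np.asarray(psd, dtype=float)
    psd_hat = np.asarray(psd_hat, dtype=float)
    diff_db = 10.0 * np.log10(psd) - 10.0 * np.log10(psd_hat)
    # The mean over a uniform grid approximates the integral over one period.
    return float(np.sqrt(np.mean(diff_db ** 2)))

def weighted_sq_distance(f, f_hat, psd_at_f, r=0.3):
    """Weighted squared distance with diagonal weights P(f_i)**r, as in Eq. (2)."""
    f = np.asarray(f, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    weights = np.asarray(psd_at_f, dtype=float) ** r
    return float(np.sum(weights * (f - f_hat) ** 2))
```

The weighted distance needs only the entries being compared and their weights, which is what makes it usable inside a codebook search, whereas the SD needs the full spectra of both vectors.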
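For split VQ, the LSF vector $\mathbf{f} = \{f_i\}_{i=1}^{p}$ is partitioned as

$$\mathbf{f} = \{\boldsymbol{\varphi}_1, \boldsymbol{\varphi}_2, \ldots, \boldsymbol{\varphi}_{\varsigma}\}$$

where $\varsigma$ is the number of partitions or splits and the $i$th subvector consists of

$$\boldsymbol{\varphi}_i = \{f_{\delta_i}, f_{\delta_i+1}, \ldots, f_{\delta_i+D_i-1}\}$$

with initial LSF index $\delta_i$ and dimension $D_i$. Overall, the beginning and end boundaries are $\delta_1 = 1$ and $\delta_{\varsigma} + D_{\varsigma} - 1 = p$. Likewise, the reconstructed LSF vector $\hat{\mathbf{f}} = \{\hat{f}_i\}_{i=1}^{p}$ is partitioned as

$$\hat{\mathbf{f}} = \{\hat{\boldsymbol{\varphi}}_1, \hat{\boldsymbol{\varphi}}_2, \ldots, \hat{\boldsymbol{\varphi}}_{\varsigma}\}$$

where the reconstructed subvector for the $i$th split after split vector quantizing with the $i$th codebook is

$$\hat{\boldsymbol{\varphi}}_i = \{\hat{f}_{\delta_i}, \hat{f}_{\delta_i+1}, \ldots, \hat{f}_{\delta_i+D_i-1}\}.$$

Now it is observed that the distortion measure in Eq. (2) is separable in split components, that is,

$$d(\mathbf{f}, \hat{\mathbf{f}}) = \sum_{i=1}^{\varsigma} d_i(\boldsymbol{\varphi}_i, \hat{\boldsymbol{\varphi}}_i)$$

where the distortion component due to the $i$th subvector is

$$d_i(\boldsymbol{\varphi}_i, \hat{\boldsymbol{\varphi}}_i) = \sum_{j=\delta_i}^{\delta_i+D_i-1} P^{r}(f_j)\,(f_j - \hat{f}_j)^2.$$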
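The separability can be verified numerically with a short sketch. The helper names are hypothetical, indices are 0-based (unlike the 1-based indices in the text), and the weights and vectors below are random placeholders rather than values derived from speech.

```python
import numpy as np

SPLIT_DIMS = (3, 3, 3, 3, 4)  # split dimensions used for the 16-dimensional vectors

def split_bounds(dims):
    """Return 0-based (start, stop) index pairs for consecutive splits."""
    starts = np.concatenate(([0], np.cumsum(dims)[:-1]))
    return [(int(s), int(s + d)) for s, d in zip(starts, dims)]

def split_distortions(f, f_hat, weights, dims=SPLIT_DIMS):
    """Per-split components of the weighted squared distance."""
    terms = np.asarray(weights, float) * (np.asarray(f, float) - np.asarray(f_hat, float)) ** 2
    return [float(terms[a:b].sum()) for a, b in split_bounds(dims)]

# Placeholder data: an ordered vector, a perturbed copy and positive weights.
rng = np.random.default_rng(0)
f = np.sort(rng.uniform(0.0, 0.5, size=sum(SPLIT_DIMS)))
f_hat = f + 0.001 * rng.standard_normal(f.size)
weights = rng.uniform(0.5, 2.0, size=f.size)

total = float(np.sum(weights * (f - f_hat) ** 2))
parts = split_distortions(f, f_hat, weights)
print(np.isclose(total, sum(parts)))  # the split components add up to the total
```

Because the total is just the sum of the per-split terms, each split can be searched independently without changing the value being minimized.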

III. RENORMALIZATION OF SUBVECTORS
Renormalization enables a major reduction in split loss. As shown in Section II, by using a separable distortion measure, some amount of split loss is prevented from the outset. This means that there is no information about split loss in the cumulative split distortion that we can use in order to reduce the split loss any further.
Nonetheless, as we can see from the distribution of LSF vectors in the training database shown in Fig. 5, presented in Section V, the bands spanned by nearby splits overlap considerably, sometimes reaching the second neighbor.
Based on the observation above, it is postulated that much of the split loss originates when the effective size of the codebook is reduced by the enforcement of the stability relations, which exclude from the search the split codevectors that do not obey them. For instance, when coding the $i$th subvector, only those codevectors are considered for which it holds that

$$\hat{f}_{\delta_i} > \hat{f}_{\delta_i - 1}$$

that is, the lowest LSF in the split must be greater than the highest quantized LSF in the previous split.
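The reduction in effective codebook size can be made concrete with a small sketch of a constrained search. This is an illustration only: the function name `admissible_search` is hypothetical, the codebook is a toy array, and the search uses the weighted squared distance of Section II.

```python
import numpy as np

def admissible_search(codebook, target, weights, prev_top):
    """Search only codevectors whose lowest entry exceeds prev_top.

    Returns the index of the best admissible codevector and the number of
    codevectors that survived the ordering constraint.
    """
    codebook = np.asarray(codebook, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = codebook[:, 0] > prev_top          # ordering (stability) constraint
    if not mask.any():
        raise ValueError("no codevector satisfies the ordering constraint")
    candidates = np.flatnonzero(mask)
    dist = np.sum(np.asarray(weights, float) * (codebook[candidates] - target) ** 2, axis=1)
    return int(candidates[np.argmin(dist)]), int(mask.sum())
```

The second return value is the effective codebook size for the current frame. When the previous split is quantized to a high value, this count can fall well below the nominal size, which is the loss that renormalization is meant to avoid.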
One possibility for normalization of the $i$th split is to set its lower band edge to the quantized value of the largest LSF coefficient in the lower split of the current frame $m$ in the training database, or of the frame to be encoded, as

$$f_{iL}(m) = \hat{f}_{\delta_i - 1}(m) \tag{10}$$

and let $f_{iU} = 1/2$ be the upper band edge. We will call this version of RSVQ the sequential RSVQ. It is represented in Fig. 1, which indicates that the original lower band edge is mapped to the normalized zero frequency and the Nyquist frequency is mapped to the normalized unity frequency. Next, keeping the relative positions within the split band, the LSF subvector to be quantized in the $i$th split is normalized as

$$f'_j(m) = \frac{f_j(m) - f_{iL}(m)}{f_{iU} - f_{iL}(m)}$$

for $j = \delta_i, \delta_i + 1, \ldots, \delta_i + D_i - 1$. Then, the normalized subvector $\boldsymbol{\varphi}'_i(m)$ is quantized as $\hat{\boldsymbol{\varphi}}'_i(m)$, selected from the codebook for the $i$th split.
Fig. 1. Normalization as performed by sequential RSVQ at the ith split.
In the decoding phase, as shown in Fig. 2, the $i$th subvector is reinserted by mapping the normalized zero frequency to the lower band edge set by the highest frequency in the next lower split and by mapping the normalized unity frequency to the Nyquist frequency. More specifically, the lower band edge for renormalization of the LSF subvector $\boldsymbol{\varphi}_i(m)$ is identical to the highest quantized LSF in the preceding split of the same LSF vector and is set as

$$\hat{f}_{iL}(m) = \hat{f}_{\delta_i - 1}(m) \tag{12}$$

to be used for the renormalization that generates the quantized subvector $\hat{\boldsymbol{\varphi}}_i(m)$ as

$$\hat{f}_j(m) = \hat{f}_{iL}(m) + \left( f_{iU} - \hat{f}_{iL}(m) \right) \hat{f}'_j(m).$$

Fig. 2. Renormalization as performed by sequential RSVQ at the ith split.
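The sequential mapping described above can be sketched as a pair of affine maps. This is a minimal illustration assuming frequencies in cycle/sample with the Nyquist frequency at 1/2; the function names are hypothetical, and the choice of lower edge for the first split, which has no lower neighbor, is left to the caller.

```python
import numpy as np

NYQUIST = 0.5  # upper band edge in cycle/sample for sequential RSVQ

def normalize_split(sub, lower_edge, upper_edge=NYQUIST):
    """Map [lower_edge, upper_edge] onto [0, 1], keeping relative positions."""
    sub = np.asarray(sub, dtype=float)
    return (sub - lower_edge) / (upper_edge - lower_edge)

def renormalize_split(norm_sub, lower_edge, upper_edge=NYQUIST):
    """Inverse map used in decoding: [0, 1] back onto [lower_edge, upper_edge]."""
    norm_sub = np.asarray(norm_sub, dtype=float)
    return lower_edge + (upper_edge - lower_edge) * norm_sub

# Example: the previous split was quantized with highest LSF 0.12.
prev_top = 0.12
sub = np.array([0.15, 0.20, 0.26])
norm = normalize_split(sub, prev_top)
restored = renormalize_split(norm, prev_top)
print(np.allclose(restored, sub))  # the two maps are inverses of each other
```

In this sketch any codevector with entries strictly between 0 and 1 lands strictly above the lower band edge after renormalization, so no codevector has to be excluded for violating the ordering with the previous split.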
Another sequencing order for normalization is used in interlaced RSVQ, which is anchored in the two splits neighboring the one under quantization as shown in Fig. 3, mapping the original lower band edge to the zero frequency and the original upper band edge to the unity frequency. More specifically, the lower band edge for the $i$th split is determined by Eq. (10), similarly to the sequential case, and the upper band edge is set as

$$f_{iU}(m) = \hat{f}_{\delta_{i+1}}(m)$$

that is, the upper edge is the lowest LSF value in the following split.
Fig. 3. Normalization as performed by interlaced RSVQ at the ith split.
Now, renormalizations in the decoding phase use the same assignment for the lower band edge of the $i$th split in the $m$th frame as described by Eq. (12), but the upper band edge is now adaptive and set by

$$\hat{f}_{iU}(m) = \hat{f}_{\delta_{i+1}}(m)$$

to be used in the renormalization of the selected codevector

$$\hat{f}_j(m) = \hat{f}_{iL}(m) + \left( \hat{f}_{iU}(m) - \hat{f}_{iL}(m) \right) \hat{f}'_j(m)$$

for $j = \delta_i, \delta_i + 1, \ldots, \delta_i + D_i - 1$. Naturally, both the $(i-1)$th and the $(i+1)$th split must have been quantized first with standard SVQ.
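The interlaced variant differs from the sequential one only in where the band edges come from. The sketch below is an illustration with hypothetical function names and 0-based indices; the edges used for the first and last splits (0 and the Nyquist frequency) are assumptions, since the text does not state them.

```python
import numpy as np

def interlaced_band_edges(f_hat, start, dim, nyquist=0.5):
    """Band edges for the split f_hat[start:start+dim] from its quantized neighbors."""
    # Lower edge: highest quantized LSF of the preceding split
    # (assumption: 0.0 when there is no preceding split).
    lower = f_hat[start - 1] if start > 0 else 0.0
    # Upper edge: lowest quantized LSF of the following split
    # (assumption: Nyquist when there is no following split).
    stop = start + dim
    upper = f_hat[stop] if stop < len(f_hat) else nyquist
    return float(lower), float(upper)

def normalize_split(sub, lower, upper):
    """Map [lower, upper] onto [0, 1]."""
    return (np.asarray(sub, dtype=float) - lower) / (upper - lower)

def renormalize_split(norm_sub, lower, upper):
    """Map [0, 1] back onto [lower, upper]."""
    return lower + (upper - lower) * np.asarray(norm_sub, dtype=float)

# Neighbors already quantized; the middle split (indices 3..5) is still pending.
f_hat = np.array([0.05, 0.10, 0.15, 0.0, 0.0, 0.0, 0.30, 0.35, 0.40])
lower, upper = interlaced_band_edges(f_hat, start=3, dim=3)
norm = normalize_split([0.18, 0.22, 0.27], lower, upper)
restored = renormalize_split(norm, lower, upper)
```

Since both edges now depend on already quantized neighbors, the decoder can reproduce them exactly, at the cost of quantizing those neighbors without renormalization first.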
In an experiment reported in Section V, the sequential version is found to be superior to the interlaced one by a small margin.

IV. PREDICTIVE SPLIT QUANTIZATION
Nearby frames share a considerable amount of correlation that may be removed by a linear predictor. In fact, a linear predictor can remove covariance if the mean subvector is subtracted. We use the mean subvector $\bar{\boldsymbol{\varphi}}_i$ over the training database to get the centered subvectors $\boldsymbol{\nu}_i(m) = \boldsymbol{\varphi}_i(m) - \bar{\boldsymbol{\varphi}}_i$ for splits $i = 1, 2, \ldots, \varsigma$. Now, for the centered subvectors, autocorrelation coefficients are enough for prediction. A first-order vector-valued moving-average predictor is used around the subvector quantizer as shown in Fig. 4, where $\alpha_i$ is the scalar prediction coefficient, $\hat{\boldsymbol{\nu}}_i(m)$ is the reconstructed centered subvector for the $m$th frame, $\mathbf{r}_i(m)$ is the prediction residual subvector and $\hat{\mathbf{r}}_i(m)$ is the codevector selected by the split vector quantizer. Finally, the reconstructed LSF subvector is found by restoring the mean subvector as

$$\hat{\boldsymbol{\varphi}}_i(m) = \hat{\boldsymbol{\nu}}_i(m) + \bar{\boldsymbol{\varphi}}_i.$$

In predictive RSVQ, the vector processing in the quantizer-predictor loop works in much the same way as for standard SVQ except for the subvectors to be quantized, which are the normalized subvectors $\boldsymbol{\varphi}'_i(m)$ instead of the original subvectors $\boldsymbol{\varphi}_i(m)$.
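A quantizer-predictor loop of this kind can be sketched for one split as follows. Since Fig. 4 is not reproduced here, the loop structure is an assumption: the prediction is taken as the scalar coefficient times the previously selected residual codevector, which is one common form of a first-order moving-average predictor. The function names are hypothetical and the search is an unweighted nearest-neighbor search for brevity.

```python
import numpy as np

def nearest_codevector(codebook, x):
    """Return the codevector closest to x in (unweighted) Euclidean distance."""
    return codebook[np.argmin(np.sum((codebook - x) ** 2, axis=1))]

def predictive_quantize(subvectors, mean, codebook, alpha):
    """First-order MA predictive VQ of one split across consecutive frames.

    Assumption: the prediction for frame m is alpha times the residual
    codevector selected for frame m-1 (zero for the first frame).
    """
    subvectors = np.asarray(subvectors, dtype=float)
    mean = np.asarray(mean, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    prev_residual = np.zeros_like(mean)
    reconstructed = []
    for phi in subvectors:
        nu = phi - mean                       # centered subvector
        prediction = alpha * prev_residual    # MA prediction from the last codevector
        residual = nu - prediction            # prediction residual to be quantized
        residual_hat = nearest_codevector(codebook, residual)
        nu_hat = prediction + residual_hat    # reconstructed centered subvector
        reconstructed.append(nu_hat + mean)   # restore the mean subvector
        prev_residual = residual_hat
    return np.array(reconstructed)
```

For predictive RSVQ, the same loop would be fed the normalized subvectors instead of the original ones, with renormalization applied to the output. Because the predictor memory holds only quantized values, a decoder can track the same state from the transmitted indices.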
Given the structure drawn in Fig. 4, starting from the corresponding memoryless quantizer, the prediction coefficient $\alpha_i$ is adjusted iteratively by minimizing the squared norm of the residual vector over the $N$ frames in the training database. This leads to a prediction coefficient estimate expressed through correlation coefficients, which are computed over the $N$ frames in the training database and the $D_i$ entries of the centered subvector in the $i$th split. In the training sessions reported in Section V, the split prediction coefficients converged to values lying between 0.8 and 0.9.
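Minimizing the squared residual norm over a scalar coefficient has a closed-form least-squares solution, sketched below. This is a generic illustration and not the paper's estimator, whose exact expression is not reproduced here: `memory` stands for whatever vector the predictor multiplies by the coefficient (an assumption), and both function names are hypothetical.

```python
import numpy as np

def ls_prediction_coefficient(targets, memory):
    """Scalar alpha minimizing sum_m ||targets[m] - alpha * memory[m]||^2."""
    targets = np.asarray(targets, dtype=float)
    memory = np.asarray(memory, dtype=float)
    # Setting the derivative with respect to alpha to zero gives a ratio
    # of a cross-correlation to an energy, summed over frames and entries.
    return float(np.sum(targets * memory) / np.sum(memory * memory))

def open_loop_coefficient(centered):
    """Open-loop special case: predict each frame from the previous frame itself."""
    centered = np.asarray(centered, dtype=float)
    return ls_prediction_coefficient(centered[1:], centered[:-1])
```

The open-loop case reduces to the ratio of the lag-one to the lag-zero correlation of the centered subvectors. In an iterative design, one plausible procedure is to recompute the coefficient with the memory produced by the current quantizer, retrain the codebook on the new residuals, and repeat until the coefficient settles.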

V. QUANTIZER TRAINING AND EVALUATION
The designs and tests reported here used the TIMIT speech database [7]. The speech signals were segmented with asymmetric Hamming windows at a rate of 50 Hz using the 3GPP AMR wideband coder [8] to provide the linear prediction coefficients used to obtain the LSF vectors. Then the training partition, having 705,580 frames, was used for design and the test partition, with 257,852 frames, was assigned to the quantization tests.
A preliminary test was performed to find out which of the two sequences of split quantization performs better. The test was performed at 46 bit/frame with (9,10,9,9,9) bit/split for the 5 splits having dimensions (3,3,3,3,4), respectively, with results displayed in Table I. Both RSVQ versions perform considerably better than SVQ, but the sequential RSVQ version is found to be slightly better than interlaced RSVQ, so it was chosen for the following tests.
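The bit allocation of the preliminary test translates into codebook sizes and storage as in the short calculation below. The variable names are hypothetical; the bit/split values, split dimensions and frame rate are those stated in the text.

```python
BITS_PER_SPLIT = (9, 10, 9, 9, 9)   # bit/split at 46 bit/frame
SPLIT_DIMS = (3, 3, 3, 3, 4)        # split dimensions of the 16-dimensional vector
FRAME_RATE_HZ = 50                  # frames per second

total_bits = sum(BITS_PER_SPLIT)                       # 46 bit/frame
codebook_sizes = [2 ** b for b in BITS_PER_SPLIT]      # [512, 1024, 512, 512, 512]
# Number of scalar values stored across all split codebooks.
stored_values = sum(n * d for n, d in zip(codebook_sizes, SPLIT_DIMS))  # 9728
bit_rate = total_bits * FRAME_RATE_HZ                  # 2300 bit/s for the LSF vectors

print(total_bits, codebook_sizes, stored_values, bit_rate)
```

For comparison, an unsplit codebook at the same rate would need 2**46 codevectors of dimension 16, which shows why splitting is adopted despite the split loss.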
The distributions in the training database of the LSF values at the endpoints of the splits are shown superimposed in Fig. 5, where the overlapping is seen to be significant.
The tests involving memoryless quantizers covered the range of rates from 40 bit/frame to 46 bit/frame using the same dimensional splitting as above, and a summary of their results is shown in Tables II and III. It is noted that SVQ performs transparently from 46 bit/frame upwards, since the number of outliers above 4 dB is nonzero for the rate just below. On the other hand, for RSVQ, transparent coding extends down to 42 bit/frame, since the mean SD rises above 1 dB at 41 bit/frame. Therefore, RSVQ saves 4 bit/frame over SVQ for memoryless quantizers.
For predictive quantizers, the results are presented in Tables IV and V, where a decrease of around 0.3 dB in mean SD can be observed at 46 bit/frame due to the predictor.
Transparent coding starts at 41 bit/frame for predictive SVQ and at 38 bit/frame for predictive RSVQ, since outliers above 4 dB make their appearance when the rate is decreased by 1 bit/frame in both cases. Therefore, prediction saves 5 bit/frame for SVQ and makes the savings for RSVQ reach 8 bit/frame. It is further observed that the 38 bit/frame transparent coding threshold for predictive RSVQ lies just 3 bit/frame above the informal lower bound derived in [3].
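The savings quoted above follow directly from the transparent coding thresholds reported in this section, as the following short calculation shows. The dictionary keys and variable names are hypothetical labels; the threshold rates and the 50 Hz frame rate are taken from the text.

```python
FRAME_RATE_HZ = 50  # frames per second

# Lowest rate (bit/frame) at which each quantizer is reported as transparent.
thresholds = {
    "SVQ": 46,
    "RSVQ": 42,
    "predictive SVQ": 41,
    "predictive RSVQ": 38,
}

# Savings relative to memoryless SVQ, in bit/frame.
savings = {name: thresholds["SVQ"] - rate for name, rate in thresholds.items()}
# Corresponding LSF bit rates in bit/s.
bit_rates = {name: rate * FRAME_RATE_HZ for name, rate in thresholds.items()}

for name in thresholds:
    print(f"{name}: {thresholds[name]} bit/frame, "
          f"saves {savings[name]} bit/frame, {bit_rates[name]} bit/s")
```

The 8 bit/frame total for predictive RSVQ splits evenly into 4 bit/frame from renormalization and a further 4 bit/frame from prediction, whereas prediction alone gains 5 bit/frame when applied to standard SVQ.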

VI. CONCLUSION
Split vector quantization is the most efficient suboptimal method when used in isolation, particularly for memoryless LSF vector quantization. A renormalization process has been proposed for wideband LSF SVQ which reduces its split loss by 4 bit/frame as measured by the transparent coding threshold rate. Further, one-step predictive VQ has been used and proven capable of reducing the transparent coding

Fig. 5. Marginal density functions for endpoints in splits 1 through 5 over the training partition of the database.

TABLE II. PERFORMANCE OF STANDARD SPLIT VECTOR QUANTIZATION FOR 16-DIMENSIONAL LSF VECTORS IN (3,3,3,3,4)-DIMENSIONAL SPLITS, INCLUDING MEAN LOG SPECTRAL DISTORTION AND TWO CLASSES OF OUTLIERS.