Inter-Frame Post-Processing for Intra-Coded Video

We propose a video codec architecture based on mixed-quality frames, which allows low-complexity intra-coded video to undergo inter-frame post-processing that improves its rate-distortion performance. The video sequence is divided into key and non-key frames by applying different quantization parameters to them. The coarser quantization reduces the bit-rate, but also reduces the quality of the non-key frames. In order to enhance the quality of these non-key frames at the decoder without any additional information, we propose the use of the higher-quality (key) frames through motion estimation: in blocks where key and non-key frames "match", we transfer details from the key frames to the non-key ones. Tests were carried out with H.264-Intra, Motion JPEG 2000 and Motion JPEG video sequences, recording PSNR improvements of up to 1 dB.


I. INTRODUCTION
Low-complexity video encoding is often necessary for devices with power and computation constraints. For example, it can be applied in devices such as wireless video cameras, low-power video sensors, surveillance cameras, and portable multimedia devices (e.g., mobile phones and PDAs).
Recent video codec standards rely on predictive and transform coding, so their encoders are computationally complex while their decoders are simpler. Distributed video coding, in contrast, permits shifting the codec complexity from the encoder to the decoder. These codecs are based on the Slepian-Wolf theorem [1] applied to distributed source coding (DSC), in which a set of correlated information sources can be compressed without the sources communicating with each other. By modeling the correlation between the sources at the decoder side together with channel codes, DSC is able to shift the computational complexity from the encoder to the decoder side. However, the Slepian-Wolf theorem covers only the lossless case; the Wyner-Ziv theorem [2] extends it to the lossy case.
In general, DVC architectures use different source coders, such as H.26x or MPEG-x, and also different side-information generation methods, for example syndromes, hashes, CRCs or cosets. These schemes allow for separate encoding and joint decoding, i.e., distributed source coding.
In [3], the authors introduced a practical framework for distributed source coding using syndromes, applied to signal compression. In [4], the author incorporated error information (cosets) into the encoding of linear block codes and applied techniques for decoding linear block codes with random errors and erasures in computer memory cells.

Edson M. Hung and Ricardo L. de Queiroz are with Universidade de Brasilia, Brazil. E-mail: mintsu@image.unb.br and queiroz@ieee.org. Debargha Mukherjee is with Google Inc., USA. This work was supported by Hewlett-Packard Brazil and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) under Grant 47.3696/2007-0.
Low-complexity video compression is often achieved by relying solely on intra-frame prediction, which is known as intra-only coding. Intra-only coding avoids motion estimation during encoding, which simplifies the codec and makes it more robust against errors [5], [6]. Even the "zero-motion-vector" case, where motion estimation is avoided by assuming a null motion vector, is often avoided in many applications for complexity reasons [7]. Intra-only coding is also used in digital cinema and in surveillance systems [5], [8], [9].
There is recent interest in distributed video coders (DVC) which also make use of intra-coding [10]–[20].
There are related works based on video quality enhancement [21], spatio-temporal filtering [22], or video denoising. Using multiple motion estimation hypotheses [23], [24], we previously performed multi-hypothesis motion compensation with a distortion-based weighted mean. Studies on flickering [25] also yield video enhancement based on temporal correlation. The main difference between the proposed method and those previous ones is the mixed-quality approach, presented in the next section.
The proposed approach to intra-coding is to allow a small quality variation among frames in order to reduce the bit-rate. At the decoder side, we can use the better-quality frames to improve the lower-quality ones. In order to do that, we extract an enhancement layer by taking the difference between a better-quality frame and its requantized version. The requantization is performed by quantizing the better-quality frame to a quality compatible with the frame we want to enhance. However, because of temporal variations between frames, this information must first undergo motion estimation and compensation.
In essence, our method is similar to example-based [26], [27] video super-resolution in mixed-resolution approaches [28], [29]. However, here we enhance quality rather than spatial resolution. Hence, our method can be seen as example-based quality enhancement (as a parallel to super-resolution), and our framework can be seen as having mixed-quality frames rather than mixed-resolution ones.
This article is organized as follows. Section II describes the mixed-quality frame architecture, while Section III presents the proposed enhancement method. Experimental results are shown in Section IV, and Section V contains the conclusions.

II. MIXED-QUALITY AMONG FRAMES
We propose a mixed-quality video codec architecture, i.e., encoding frames at time-varying quality targets. As we encode some frames with lower quality, we reduce the bit-rate. Unlike the mixed-resolution architectures [17]–[19], however, here we generate a bit stream that is still compatible with a regular decoder. The proposed optional enhancement method works at the decoder side and uses the higher-quality frames to enhance the lower-quality ones. The decision whether to use the enhancement method at the decoder may depend upon application constraints such as battery autonomy, processor capacity, temporal delay, acceptable video quality, etc.
In order to encode a video sequence with mixed quality, we just need to use different quantization steps (Q) among frames. We then have two types of frames, as illustrated in Figure 1, depending on the value of Q: the key frames, with better quality (Q_key), and the non-key frames, with reduced quality (Q_non-key > Q_key). The application of different Qs reduces the bit-rate, but reduces the quality of the non-key frames as well. Therefore, to enhance the quality of these non-key frames at the decoder without additional information, we propose the use of the higher-quality (key) frames through motion estimation. The use of a GOP (group of pictures) is not mandatory for the proposed method; however, to simplify the implementation, a GOP is used in this work.
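The periodic key/non-key assignment can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are ours, and the GOP length of 4 and the +6 QP offset mirror the H.264 setup reported in the experiments section.

```python
# Sketch of mixed-quality QP assignment within a GOP (illustrative).
GOP_LEN = 4     # one key frame followed by three non-key frames
QP_KEY = 32     # example key-frame quantization parameter
QP_OFFSET = 6   # +6 in H.264 QP roughly doubles the quantization step

def qp_for_frame(frame_index: int) -> int:
    """Key frames get QP_KEY; non-key frames get the coarser QP_KEY + QP_OFFSET."""
    is_key = (frame_index % GOP_LEN == 0)
    return QP_KEY if is_key else QP_KEY + QP_OFFSET

# First eight frames: frames 0 and 4 are key frames, the rest are non-key.
qps = [qp_for_frame(i) for i in range(8)]
```

Any decoder sees a single standard-compliant bit stream; only the per-frame QP varies.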
The decoding process can be done with a regular decoder. The optional enhancement process may add significant complexity to decoding due to its motion estimation operations.
As previously mentioned, the proposed method is inspired by works on example-based super-resolution of video [26]–[29]. However, instead of super-resolving by improving spatial resolution, we improve the quality.

III. EXAMPLE-BASED QUALITY ENHANCEMENT
We use a regular decoder that separates key frames from non-key frames, as shown in Figure 2. Let a given non-key frame be denoted F_non-key, and let it be enhanced by n key frames F_key,(1), F_key,(2), ..., F_key,(n). A requantization operation (with Q_non-key) is applied to the key frames, resulting in a new set of "low-quality" key frames: F_LQkey,(1), F_LQkey,(2), ..., F_LQkey,(n). The layer L_k = F_key,(k) − F_LQkey,(k) represents the information lost by requantizing the k-th key frame. L_k undergoes motion compensation before being applied to enhance a non-key frame. In this work, we use windowed overlapped block motion compensation [30]–[32] in order to reduce blocking artifacts. Motion estimation (ME) is performed at the decoder between the frames F_LQkey and F_non-key; note that both have compatible quality degradation, which makes the matching more reliable. The current frame is divided into blocks of variable size (16 × 16 and 8 × 8 pixels). For each block, we look for the best-matching block within a displacement window in the reference frame. The matching criterion may be the minimization of the SAD (sum of absolute differences) or of the SSD (sum of squared differences).
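The requantization and layer extraction above can be sketched as follows. A uniform mid-tread quantizer stands in for the codec's actual quantization, so the `requantize` function and the step value are illustrative assumptions, not the real H.264/JPEG 2000 quantizers.

```python
import numpy as np

def requantize(frame: np.ndarray, q: float) -> np.ndarray:
    """Quantize and reconstruct with step q (uniform mid-tread quantizer,
    an illustrative stand-in for the codec's quantization)."""
    return np.round(frame / q) * q

def enhancement_layer(f_key: np.ndarray, q_non_key: float) -> np.ndarray:
    """L_k = F_key - F_LQkey: the detail lost by requantizing
    the decoded key frame to non-key quality."""
    f_lq_key = requantize(f_key, q_non_key)
    return f_key - f_lq_key

rng = np.random.default_rng(0)
f_key = rng.uniform(0, 255, size=(16, 16))
layer = enhancement_layer(f_key, q_non_key=16.0)
# With a uniform quantizer, the layer is bounded by half the step size.
assert np.all(np.abs(layer) <= 8.0)
```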
When trying to match the current (non-key) frame and the low-quality key frame using block motion estimation, we minimize the difference between 16 × 16-pixel macroblocks in both images. We also test subsets in the form of partitioned 8 × 8-pixel blocks. Performing motion estimation on the four partitioned blocks, however, leads to an overall SAD/SSD equal to or lower than that of the whole macroblock, so partitioned blocks would invariably be chosen. Nevertheless, we expect, and have empirically verified, that 16 × 16-pixel blocks typically yield better overall results. The reason is that we are looking for larger structures using block-based tools. Once a good match is found, we "borrow" details from one block to apply to the other, and mistakes may cause artifacts: low-quality versions of smaller blocks belonging to different objects may eventually match, in which case their details differ, only adding noise and artifacts to the image being enhanced. Larger blocks are thus more reliable for establishing an object match through block matching. Hence, we apply a penalty factor (with an empirical value of two) to the partitioned-block prediction error.
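The partition decision with the penalty factor can be sketched as follows. This is an illustrative exhaustive-search implementation under our own assumptions (function names, search range, tie-breaking); only the SAD criterion, the 16 × 16 / 8 × 8 block sizes and the penalty value of two come from the text.

```python
import numpy as np

PENALTY = 2.0  # empirical penalty on the partitioned-block prediction error

def sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute differences between two equal-size blocks."""
    return float(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def best_match_sad(block, ref, top, left, search=8):
    """Exhaustive search for the lowest-SAD block of the same size in `ref`
    within +/-`search` pixels of (top, left)."""
    h, w = block.shape
    best = np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                best = min(best, sad(block, ref[y:y + h, x:x + w]))
    return best

def choose_partition(mb, ref, top, left):
    """Return '16x16' or '8x8' for one macroblock: keep the full-block match
    unless the summed partitioned SAD is better even after the penalty."""
    full = best_match_sad(mb, ref, top, left)
    part = sum(best_match_sad(mb[y:y + 8, x:x + 8], ref, top + y, left + x)
               for y in (0, 8) for x in (0, 8))
    return '8x8' if PENALTY * part < full else '16x16'
```

When both matches are equally good (e.g., identical frames), the penalty makes the 16 × 16 block win, reflecting the preference for larger structures.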
L_k is motion-compensated using the motion vectors between F_non-key and F_LQkey,(k) in order to find a contribution layer L̂_k such that

L̂_k = MC(L_k, V_k),  (1)

where MC(·) is the motion compensation operation and V_k = ME(F_non-key, F_LQkey,(k)) is the resulting set of motion vectors. The enhanced non-key frame is then given by:

F̂_non-key = F_non-key + p_cf L̄,  (2)

where L̄ is a function of all {L̂_k} and p_cf is a confidence factor. The side-information generation method of the DISCOVER distributed video codec uses equal weights for the forward and backward predictions [14]. Here, we use multiple predictions in a weighted average as formulated in [29]:

L̄(i, j) = [Σ_k L̂_k(i, j) / D_k(i, j)] / [Σ_k 1 / D_k(i, j)],  (3)

where L̄(i, j) is the enhancement of the block at position (i, j) of the fused enhancement layer, L̂_k(i, j) is the enhancement block prediction at position (i, j) from the k-th reference (forward or backward) key frame, and D_k(i, j) is the SSD distortion at that position. The motion estimation method always picks a prediction block to enhance a non-key frame block. However, at sudden scene changes the enhancement layer may decrease the objective and subjective quality of a non-key frame. In order to reduce this problem, we only apply a fraction (p_cf) of the fused enhancement layer L̄ to the non-key frame F_non-key. That fraction is iteratively obtained by finding

p_cf = arg min_p MSE(F_non-key + p L̄, F_key,closest),  (4)

i.e., the p_cf parameter is obtained by minimizing the mean square error (MSE) between the enhanced non-key frame and the closest key frames: we evaluate the MSE in (4) for each possible value of p_cf and choose the one that yields the smallest MSE. This may reduce flicker and may also diminish the influence of mismatches between a non-key frame and the enhancement layer. Finally, we add the weighted enhancement layer to the non-key frame as in (2).
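The distortion-weighted fusion and the confidence-factor search can be sketched as follows. This is a minimal numpy sketch under the definitions above; the epsilon guard against zero distortion and the grid resolution of the p_cf search are our own assumptions.

```python
import numpy as np

def fuse_layers(layers, distortions, eps=1e-9):
    """Inverse-distortion weighted mean of the compensated layers:
    blocks that matched poorly (large SSD) contribute less."""
    w = [1.0 / (d + eps) for d in distortions]
    num = sum(wk * lk for wk, lk in zip(w, layers))
    return num / sum(w)

def find_p_cf(f_non_key, fused_layer, f_key_ref, steps=101):
    """Grid-search p_cf in [0, 1] minimizing the MSE between the
    enhanced non-key frame and the closest key-frame reference."""
    best_p, best_mse = 0.0, np.inf
    for p in np.linspace(0.0, 1.0, steps):
        enhanced = f_non_key + p * fused_layer
        mse = float(np.mean((enhanced - f_key_ref) ** 2))
        if mse < best_mse:
            best_p, best_mse = p, mse
    return best_p
```

With equal distortions the fusion reduces to a plain average (the DISCOVER-style equal weighting); as one distortion grows, its layer's contribution vanishes.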

IV. EXPERIMENTS
In order to evaluate the performance of the proposed technique, we processed video sequences at CIF (352 × 288 pixels) and high-definition (1280 × 720 pixels) resolutions. They were encoded with H.264-Intra with a GOP length of 4 (that is, for each key frame there are three non-key frames), using the JM 15.1 reference codec implementation. For mixed-quality encoding, the key frames were encoded with quantization parameters (QP) in the set {22, 27, 32, 37} in order to generate the rate-distortion (RD) curves. We set Q_non-key = 2 Q_key, i.e., QP_non-key = QP_key + 6 [33], [34]. In the enhancement method, we use a motion estimation window of 32 × 32 pixels for both full macroblocks and partitioned blocks.
The process of changing the quality across frames may cause flickering. The larger the difference Q_non-key − Q_key, the more intense the flickering, but also the larger the quality improvement. Reducing the quality of the non-key frames too much yields more sizeable bit-rate savings, but may cause objectionable flickering even after the enhancement process. One has to weigh this trade-off carefully in order to avoid subjective image quality degradation.
Figure 3(a) shows the performance of fixed-QP intra-only H.264 compression compared to mixed-QP H.264-Intra with the enhancement technique in different configurations. In order to plot the curves, we selected for the fixed-QP encoding the QPs that yield the bit-rates closest to the mixed-QP case. Tests using two key-frame references (the closest forward and backward key frames) and four references (the two closest in each direction) were performed. We also compared the overlapped block motion compensation (OBMC) technique with ordinary motion compensation (MC). Figure 3(b) is a differential version of Figure 3(a), where the fixed-QP rate-distortion curve was used as the reference. Despite the decrease in codec performance when we use mixed-QP coding (compared to the fixed-QP case), we can achieve significant RD gains when we apply the proposed post-processing technique.
In Figure 3(c) we show results for the sequence Foreman encoded with QP_key = 32 and QP_non-key = 38. In this case, with two reference frames and overlapped block motion compensation we obtain an average gain of 0.49 dB. With four reference frames and regular block motion compensation, there is an average gain of 0.87 dB. Finally, there is an average gain of 0.91 dB when using four reference frames and overlapped motion compensation. Despite the modest objective video quality gains, Figure 4 shows a significant visual improvement. In order to evaluate the gains, we compare the original 51st frame of the sequence Foreman with a non-key frame, with and without enhancement. Figure 5(a) shows a comparison between the proposed methods and regular fixed-quality compression applied to a low-motion video sequence. Figures 5(b) and (c) show the differential results for sequences with low- and high-motion scenes, respectively. Figure 6(a) shows RD plots for the Shields video sequence, which has high- and complex-motion scenes. Figures 6(b) and 6(c) also show the differential curves of the proposed method for high-resolution sequences.
We have also applied the same enhancement technique to CIF-size video sequences compressed with Motion JPEG 2000 (implemented with the Kakadu software [35]). In this case, instead of setting a fixed quantization, we set a fixed bit-rate for each frame. In the mixed-quality version, the bit-rate ratio between the low-quality (non-key) frames and the high-quality (key) frames was set to 7/10. As shown in Figures 7(a)-7(c), we can observe a performance improvement after enhancement.
We further applied our method to Motion JPEG (MJPEG). In the MJPEG mixed-quality architecture, we performed the tests using a quantization matrix for the non-key frames whose entries are three times larger than those of the key frames. Figures 8(a)-(c) also show a performance improvement from the use of the mixed-quality approach with inter-frame post-processing.
In Table I, we use an objective metric [36] to calculate the bit-rate savings of the mixed-quality (or mixed-rate) sequences compared to fixed-quality (or fixed-rate) coding. The results show a reduction in RD performance when mixed quality is used alone. However, it can outperform the fixed-quality (or fixed-rate) approach when the proposed enhancement technique is applied. Observe that the sequence Foreman compressed with H.264-Intra achieves its best enhancement configuration when we add overlapped motion compensation to multi-hypothesis motion estimation/compensation (in this experiment we used two and four reference frames).
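A common metric for expressing the average bit-rate difference between two RD curves is the Bjøntegaard delta bit-rate; whether this is exactly the metric of [36] is an assumption on our part, but the following sketch illustrates how such a figure is computed: the rate axis is taken in log scale, each curve is fit with a cubic polynomial over PSNR, and the average horizontal gap over the overlapping PSNR range is converted to a percentage.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bit-rate difference (%) of the test curve vs. the reference;
    negative values mean the test codec saves rate at equal quality."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Cubic fit of log-rate as a function of PSNR for each curve.
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    i_ref, i_test = np.polyint(p_ref), np.polyint(p_test)
    avg_diff = (np.polyval(i_test, hi) - np.polyval(i_test, lo)
                - np.polyval(i_ref, hi) + np.polyval(i_ref, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

For identical curves the result is zero; a test curve at 90% of the reference rates at every PSNR yields −10%.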

V. CONCLUSIONS
We proposed a simple architecture that allows for decoder-side enhancement of an intra-only video coding scheme. For that, a mixed-quality approach, i.e., varying frame quality, is applied. The proposed method is an example-based quality enhancement, similar to super-resolution for spatial resolution enhancement; in this sense, the proposed mixed-quality framework is a parallel to mixed-spatial-resolution approaches. Experiments show that the proposed technique works for many types of video codecs, enhancing low-quality frames using high-frequency details from the key frames without any additional information being sent to the decoder. Performance improves further when we use multiple reference frames and overlapped block motion compensation.

Fig. 1. Video encoding with mixed-quality frames. (a) Encoding key and non-key frames with different parameters. (b) Decoder with low-quality frame enhancement using the key frames.

Fig. 3. Results for encoding the sequence Foreman, comparing regular fixed-quality H.264 intra-only coding, mixed frame-quality coding, and the mixed frame-quality approach with the proposed enhancement. (a) RD curves. (b) Differential plot of (a), taking the regular fixed-quality-parameter video as reference. (c) Comparison of the frame-by-frame enhancement gains for the sequence Foreman encoded with Q_key = 32, Q_non-key = 38 and GOP = 4.

Fig. 6. Results comparing H.264 intra-only regular fixed frame-quality video, mixed frame-quality video, and mixed frame-quality video enhanced with the proposed method, applied to the Shields video sequence. (a) RD curves. (b)(c) Differential RD curves comparing H.264 intra-only performance for regular fixed frame-quality, mixed frame-quality, and the mixed frame-quality approach enhanced with the proposed method, for the Shields and Parkrun video sequences, respectively.

Fig. 7. Differential RD curves comparing Motion JPEG 2000 performance for regular fixed frame-rate, mixed frame-rate, and the mixed frame-rate approach enhanced with the proposed method. The tests were performed with the (a) Foreman, (b) Akiyo and (c) Mobile video sequences.