By Vijay Kumar Kodavalla, Lead Architect, Wipro Technologies, Bangalore, India
Dr. P.G. Krishna Mohan, Professor, JNTU College of Engineering, Hyd, India
The state-of-the-art video compression standards such as MPEG2, MPEG4, H.264, DivX and VC1 are suitable for applications such as broadcast and streaming video-on-demand. Typically the encoder is 5-10 times as complex as the decoder. This asymmetry is intentional, because video is compressed once and decoded many times. However, there is another class of emerging applications, such as wireless video cameras, wireless low-power surveillance networks, disposable video cameras and certain medical applications. Existing popular video compression standards are not suitable for these applications because of the very high encoder complexity needed to exploit inter-frame predictive coding with motion estimation, in addition to intra coding. This encoder complexity demands very high computational power, energy and memory. For these emerging applications, a low complexity encoder is desired, possibly at the expense of a high complexity decoder, because computational power, energy and memory are scarce at the encoder. Distributed Video Coding (DVC) is a new video compression paradigm for such applications. It has been widely accepted, and many researchers are actively working on DVC, though many gaps in implementation and practical usage remain as of now. This paper highlights the gaps and challenges in the implementation of DVC and its practical usage. It is backed by extensive experience in designing various codecs, including H.264, MPEG2 and VC-1, with support for resolutions up to 1080p60.
1. Introduction
Distributed Video Coding is based on the Slepian-Wolf and Wyner-Ziv (WZ) information theoretic results. These theorems state that the Rate Distortion (RD) performance achieved by jointly encoding and jointly decoding two correlated sources can also be obtained by encoding them separately and decoding them jointly (exactly for lossless coding, per Slepian-Wolf; for lossy coding, Wyner-Ziv showed there is no rate loss under certain conditions, such as jointly Gaussian sources with mean squared error distortion). Distributed coding exploits the source statistics at the decoder, so the encoder can be very simple at the expense of a more complex decoder. The traditional distribution of a complex encoder and a simple decoder is thus essentially reversed. In DVC, the complex task (motion estimation) is shifted to the decoder, which is responsible for obtaining the side information, i.e., an estimate of the WZ frame, while the encoder only sends parity bits to correct the errors in this estimate and thereby improve its quality.
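For reference, the two results can be stated compactly in standard information-theoretic notation (this notation is not taken from the paper). For correlated discrete sources X and Y coded separately and decoded jointly, the Slepian-Wolf theorem gives the achievable rate region in the first line; the Wyner-Ziv result in the second line covers lossy coding of X when Y is available only at the decoder, where R_{X|Y}(D) is the conditional rate-distortion function with Y known at both ends. In DVC terms, X plays the role of a WZ frame and Y the side information.

```latex
\begin{aligned}
R_X &\ge H(X \mid Y), \qquad R_Y \ge H(Y \mid X), \qquad R_X + R_Y \ge H(X, Y) \\
R^{WZ}_{X \mid Y}(D) &\ge R_{X \mid Y}(D), \quad \text{with equality for jointly Gaussian } X, Y \text{ and MSE distortion}
\end{aligned}
```

The corner point R_Y = H(Y), R_X = H(X|Y) corresponds to the DVC setting: Y is decoded on its own, and X needs only enough bits to resolve its uncertainty given Y.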
Following are the key gaps and challenges in practical usage of DVC in the present state:
- Feedback channel from decoder to encoder
- Lack of process for color components coding
- Lack of compressed video transport definition
- Splitting groups of pictures between intra coding and Wyner-Ziv (WZ) coding
- Inconsistencies in Rate Distortion (RD) performance with different video streams
- No standardization yet
These gaps and challenges have to be addressed to meet the DVC objectives and enable its practical usage. Section 2 describes a typical DVC codec and its key components. Section 3 describes these gaps and challenges in detail, followed by conclusions in Section 4.
2. Typical DVC Codec
A typical DVC encoder and decoder are shown in Figure 1. The incoming video sequence is first divided into groups of pictures (GOPs). The first frame of each GOP, called the key frame, is encoded using a conventional intra encoder. The remaining frames in a GOP are encoded using distributed coding principles and are referred to as WZ frames.
Figure 1. Typical DVC Codec
H.264/AVC intra or JPEG2000 can be used as the conventional intra coder. A comparative study of JPEG2000 and H.264/AVC intra has been done in Reference . For high spatial resolution sequences (704x576 and above), the RD performance of JPEG2000 is comparable with that of AVC High Profile Intra, with a PSNR difference of around 0.1 dB in favor of AVC High Profile. Furthermore, for high spatial resolution sequences, JPEG2000 outperforms the Main Profile, with PSNR gains of around 0.1 to 1.0 dB. For intermediate and low spatial resolution sequences (352x288 and below), JPEG2000 is outperformed in RD performance by both the Main and High Profiles of AVC Intra.
The WZ frames first undergo a transform such as the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT), depending on the conventional intra coder chosen. The transform coefficients are then quantized before being split into bit planes and encoded. Turbo, Trellis, LDPC or LDPCA (Low Density Parity Check Accumulate) codes can be used for this encoding, and it has been shown in Reference  that LDPCA performs better than the other codes in DVC. Here, channel codes are deployed for rate-adaptive error correction of the frames estimated at the decoder, by sending parity bits. The parity bits are stored in a buffer and sent to the decoder in chunks, based on requests from the decoder. Hence a feedback channel is expected between the decoder and the encoder.
At the decoder, Side Information (SI) approximating the WZ frames is generated by motion compensated interpolation or extrapolation (MCI or MCE) of previously decoded frames. The decoder uses the SI, along with the parity bits of the WZ frames requested via the feedback channel, to decode the quantized bit planes. The SI and the decoded quantized bit planes are then used together to reconstruct the transform coefficient bands. When all the transform coefficient bands are decoded, the decoded WZ frame is obtained by applying the inverse transform. The key frames go through the conventional intra encoder and decoder.
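The WZ encoder front end described above (transform, quantization, bit-plane extraction) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the H.264/AVC 4x4 integer core transform as the DCT, processes only the DC coefficient band, and uses a hypothetical 16-level uniform quantizer. The channel coding of each bit plane (for example with LDPCA) is not shown.

```python
from typing import List

Matrix = List[List[int]]

# H.264/AVC 4x4 forward core transform (integer approximation of the DCT).
CF: Matrix = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def forward_4x4(block: Matrix) -> Matrix:
    """Core transform Y = CF * X * CF^T (post-scaling omitted)."""
    return matmul4(matmul4(CF, block), transpose(CF))


def dc_band(frame: Matrix) -> List[int]:
    """DC coefficient of every 4x4 block in raster order (dimensions must be multiples of 4)."""
    band = []
    for by in range(0, len(frame), 4):
        for bx in range(0, len(frame[0]), 4):
            block = [row[bx:bx + 4] for row in frame[by:by + 4]]
            band.append(forward_4x4(block)[0][0])
    return band


def quantize_uniform(band: List[int], max_value: int, levels: int) -> List[int]:
    """Uniform quantizer mapping [0, max_value] onto indices 0..levels-1 (hypothetical parameters)."""
    step = (max_value + 1) / levels
    return [min(int(c / step), levels - 1) for c in band]


def bit_planes(indices: List[int], num_bits: int) -> List[List[int]]:
    """Split quantization indices into bit planes, most significant plane first."""
    return [[(q >> b) & 1 for q in indices] for b in range(num_bits - 1, -1, -1)]


# Usage: an 8x8 frame whose left half is black and right half is white.
frame = [[255 if x >= 4 else 0 for x in range(8)] for _ in range(8)]
band = dc_band(frame)                                            # [0, 4080, 0, 4080]
indices = quantize_uniform(band, max_value=16 * 255, levels=16)  # [0, 15, 0, 15]
planes = bit_planes(indices, num_bits=4)                         # four planes, each [0, 1, 0, 1]
```

Each bit plane is what the channel encoder would then protect with parity bits; the decoder would correct the corresponding bit planes derived from its side information.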
The RD performance of DVC is better than that of H.264 intra in most cases. It is best for sequences with low to medium motion activity, such as the Hall monitor video surveillance sequence, for which DVC achieves a Peak Signal-to-Noise Ratio (PSNR) around 2-3 dB higher than H.264 intra at a given bit rate. The RD performance remains better for sequences with medium to high motion activity, such as Coast guard, where the PSNR achieved with DVC is around 1-2 dB higher than that of H.264 intra at a given bit rate. The RD performance is comparable for sequences with very high motion activity, such as the Foreman video conferencing sequence, where the DVC PSNR ranges from about 1 dB lower to about 1 dB higher than that of H.264 intra at a given bit rate. The RD performance is lower for sequences with significant motion activity, such as Soccer, where the DVC PSNR is around 3 dB lower than that of H.264 intra at a given bit rate. We have presented the implementation details and these results in another submitted paper, Reference .
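The comparisons above are expressed as PSNR differences at a fixed bit rate. As a reminder of what those dB figures mean, here is the standard PSNR definition for 8-bit video, plus a helper (a hypothetical name, for illustration) converting a PSNR gain into the corresponding reduction in mean squared error.

```python
import math
from typing import Sequence

Frame = Sequence[Sequence[int]]


def mse(ref: Frame, test: Frame) -> float:
    """Mean squared error between two equally sized frames."""
    diffs = [(r - t) ** 2 for r_row, t_row in zip(ref, test) for r, t in zip(r_row, t_row)]
    return sum(diffs) / len(diffs)


def psnr(ref: Frame, test: Frame, peak: int = 255) -> float:
    """PSNR in dB for samples with the given peak value (255 for 8-bit video)."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def mse_reduction_factor(gain_db: float) -> float:
    """How many times smaller the MSE is for a given PSNR gain at equal rate."""
    return 10.0 ** (gain_db / 10.0)


# A 2 dB gain is roughly a 1.6x lower MSE; a 3 dB gain roughly halves it.
print(round(mse_reduction_factor(2.0), 2), round(mse_reduction_factor(3.0), 2))  # prints 1.58 2.0
```

So the 2-3 dB advantage reported for Hall monitor corresponds to roughly 1.6 to 2 times lower distortion at the same rate, and the 3 dB deficit for Soccer to roughly twice the distortion.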
3. Gaps and challenges in practical usage of DVC
Although DVC is intended to be on par with conventional coding in performance, its RD performance still does not equal that of H.264/AVC. Apart from this, there are several other key gaps and challenges in the practical deployment of DVC, which are described in the following sub-sections.
3.1 Feedback channel from decoder to encoder
As explained in Section 2, the decoder iteratively requests more parity bits through the feedback channel until decoding is successful. It is therefore the decoder's responsibility to perform rate control, guaranteeing that the encoder sends only the minimum number of parity bits needed to correct the mismatches/errors present in each bit plane, and thus achieving the minimum rate for a target quality. In some practical systems, however, bidirectional communication channels are not available. For example, in many emerging applications of DVC such as mobile usage, on-the-go document scanning and video conferencing, it is physically impossible to have a feedback channel. Even if a feedback channel were available, the decoding delay would be too high, since several iterative decoding operations may be needed to decode the data to the target quality level. Even if such decoding delays were acceptable, the encoder would have to store the encoded video stream until decoding succeeds. To overcome this problem, the decoder should receive the required number of parity bits in a single iteration. This is possible only if the encoder can estimate the required rate nearly accurately and send the required number of parity bits to the decoder, as depicted in Figure 2.
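One way to picture encoder-side rate control is sketched below. This is a hypothetical approach for illustration, not the method of Figure 2: it assumes the encoder can form a cheap estimate of the side information (for example by averaging the adjacent key frames, without motion estimation), models each WZ bit plane and the matching bit plane of that estimate as a binary symmetric channel with crossover probability p, and sends about H_b(p) parity bits per source bit (the Slepian-Wolf bound for this model). The `margin` parameter is an assumed safety factor covering the gap between the encoder's estimate and the decoder's real side information.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def estimate_parity_bits(wz_plane: List[int], est_plane: List[int], margin: float = 0.1) -> int:
    """Estimate how many parity bits to send for one bit plane without feedback.

    The crossover probability p is measured against an encoder-side estimate
    of the side information; the rate is H_b(p) bits per source bit, inflated
    by a hypothetical safety margin and capped at 1 bit per source bit.
    """
    n = len(wz_plane)
    if n == 0:
        return 0
    mismatches = sum(a != b for a, b in zip(wz_plane, est_plane))
    p = mismatches / n
    rate = min(1.0, binary_entropy(p) * (1.0 + margin))
    return math.ceil(rate * n)


# Usage: one of eight bits differs between the WZ bit plane and its estimate.
wz = [0, 1, 1, 0, 1, 0, 0, 1]
est = [0, 1, 1, 0, 1, 0, 0, 0]
print(estimate_parity_bits(wz, est))  # prints 5
```

Underestimating the rate leaves decoding errors that a feedback-free system cannot repair, while overestimating it wastes bits, which is why an accurate encoder-side estimate matters.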
Figure 2. DVC Codec without feedback and with proposed Encoder rate control
3.2 Lack of process for color components coding
The methods proposed in the literature describe the handling of the luma component only. Encoding and decoding are dealt with for luma alone, and even RD performance comparisons with conventional intra coders are made for luma only. For completeness and practical usage, a procedure for encoding and decoding the chroma components is also required.
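A first step toward coding all color components is simply to separate them. The sketch below assumes planar YUV 4:2:0 (I420) input, a common layout for raw test sequences but an assumption here, and runs a caller-supplied `encode_plane` function (a hypothetical placeholder for the luma WZ pipeline) on each plane. Since the chroma planes are a quarter of the luma size and have different statistics, the block sizes and quantizers would presumably need per-component tuning.

```python
from typing import Callable, Dict, List

Plane = List[List[int]]


def split_i420(buf: bytes, width: int, height: int) -> Dict[str, Plane]:
    """Split one planar YUV 4:2:0 (I420) frame into Y, U and V planes."""
    if width % 2 or height % 2:
        raise ValueError("I420 width and height must be even")
    cw, ch = width // 2, height // 2
    y_size, c_size = width * height, cw * ch
    if len(buf) != y_size + 2 * c_size:
        raise ValueError("buffer size does not match I420 frame dimensions")

    def to_plane(data: bytes, w: int, h: int) -> Plane:
        return [list(data[r * w:(r + 1) * w]) for r in range(h)]

    return {
        "Y": to_plane(buf[:y_size], width, height),
        "U": to_plane(buf[y_size:y_size + c_size], cw, ch),
        "V": to_plane(buf[y_size + c_size:], cw, ch),
    }


def encode_frame_all_components(
    buf: bytes, width: int, height: int, encode_plane: Callable[[Plane], object]
) -> Dict[str, object]:
    """Apply the same (hypothetical) per-plane encoder to luma and both chroma planes."""
    return {name: encode_plane(plane) for name, plane in split_i420(buf, width, height).items()}


# Usage with a tiny 4x2 frame and a stand-in "encoder" that just sums samples.
frame = bytes(range(12))
print(encode_frame_all_components(frame, 4, 2, lambda plane: sum(map(sum, plane))))
# prints {'Y': 28, 'U': 17, 'V': 21}
```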
3.3 Lack of compressed video transport definition
The literature does not deal with a transport definition for DVC, which is a key aspect of connectivity and interoperability. One of the current challenges in defining a transport for the combined WZ coder and conventional intra key frame coder may be the presence of the feedback channel, which makes decoding interactive and iterative. A transport multiplexer and de-multiplexer are necessary, as shown in Figure 3.
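To make the idea concrete, here is a toy multiplexer and de-multiplexer for a single stream carrying both key frame bitstreams and WZ parity chunks. The packet format is entirely hypothetical (the paper does not define one): an 11-byte big-endian header holding a payload type, a frame index, a chunk index and a payload length, followed by the payload. Without a feedback channel, all parity chunks for a WZ frame would be multiplexed at once; with one, parity chunks would be emitted as the decoder requests them.

```python
import struct
from typing import Iterator, List, NamedTuple

# Hypothetical header: type (1 byte), frame index (4), chunk index (2), payload length (4).
HEADER = struct.Struct(">BIHI")

KEY_FRAME = 0   # conventional intra bitstream of a key frame
WZ_PARITY = 1   # one chunk of parity bits for a WZ frame


class Packet(NamedTuple):
    ptype: int
    frame: int
    chunk: int
    payload: bytes


def mux(packets: List[Packet]) -> bytes:
    """Serialize packets into one byte stream."""
    out = bytearray()
    for p in packets:
        out += HEADER.pack(p.ptype, p.frame, p.chunk, len(p.payload))
        out += p.payload
    return bytes(out)


def demux(stream: bytes) -> Iterator[Packet]:
    """Parse a byte stream produced by mux() back into packets."""
    pos = 0
    while pos < len(stream):
        if pos + HEADER.size > len(stream):
            raise ValueError("truncated packet header")
        ptype, frame, chunk, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        payload = stream[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated packet payload")
        pos += length
        yield Packet(ptype, frame, chunk, payload)


# Usage: a key frame followed by one parity chunk of the next (WZ) frame.
stream = mux([Packet(KEY_FRAME, 0, 0, b"\x01\x02"), Packet(WZ_PARITY, 1, 0, b"abc")])
for packet in demux(stream):
    print(packet.ptype, packet.frame, packet.chunk, packet.payload)
```

The type field lets the de-multiplexer route key frame data to the intra decoder and parity data to the WZ decoder.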
3.4 Splitting groups of pictures between intra coding and WZ coding
The incoming video stream can be split into WZ frames and key frames based on simple histogram-based motion activity metrics, which keeps the encoder simple. A larger GOP size for low motion activity content is clearly desirable, because it reduces the number of frames that must be intra coded as key frames. However, supporting larger GOP sizes requires a frame buffer at the encoder and introduces encoding delay, both of which are undesirable when the goal is to minimize encoder complexity. For example, to support GOP sizes up to four, a frame buffer that can store at least four frames is needed at the encoder to evaluate and decide on the actual GOP size, whereas no frame buffer is needed if the GOP size is always two. Hence a fixed GOP size of two is highly desirable for a practical DVC.
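With a fixed GOP size, a frame's type depends only on its index, so the encoder can encode each frame as it arrives without buffering frames to choose a GOP size. The short sketch below (function name hypothetical) shows the resulting frame pattern and how a larger GOP reduces the number of intra-coded key frames.

```python
from typing import List


def frame_types(num_frames: int, gop_size: int = 2) -> List[str]:
    """Label each frame 'K' (intra-coded key frame) or 'WZ' for a fixed GOP size."""
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    return ["K" if i % gop_size == 0 else "WZ" for i in range(num_frames)]


print(frame_types(6))                                              # prints ['K', 'WZ', 'K', 'WZ', 'K', 'WZ']
print(frame_types(8, gop_size=2).count("K"), frame_types(8, gop_size=4).count("K"))  # prints 4 2
```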
3.5 Inconsistencies in RD performance with different video streams
As explained in Section 2, the RD performance of DVC is better than or comparable to that of H.264 intra for video sequences with low to high motion activity. However, for video sequences with significant motion activity, the RD performance of DVC is lower than that of H.264 intra. For such sequences, it is desirable to bypass the WZ coder when the motion activity is significant and instead code only with the conventional intra coder, as shown in Figure 4. In this context, it is advantageous if significant motion activity in a video sequence can be detected using simple histogram-based motion activity metrics.
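A histogram-based activity check of the kind described above might look like the following. This is one plausible metric, not necessarily the one used in the video splitter cited by the authors: it compares the 16-bin luma histograms of consecutive frames and normalizes the sum of absolute differences to [0, 1]. The bin count and the 0.3 bypass threshold are hypothetical values. A global histogram is cheap to compute but can miss motion that does not change the intensity distribution, such as a small textured object moving across a uniform background.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]


def luma_histogram(frame: Frame, bins: int = 16) -> List[int]:
    """Histogram of 8-bit luma samples with `bins` equal-width bins."""
    if 256 % bins:
        raise ValueError("bins must divide 256")
    width = 256 // bins
    hist = [0] * bins
    for row in frame:
        for v in row:
            hist[v // width] += 1
    return hist


def histogram_activity(prev: Frame, cur: Frame, bins: int = 16) -> float:
    """Normalized sum of absolute histogram differences, in [0, 1]."""
    hp, hc = luma_histogram(prev, bins), luma_histogram(cur, bins)
    total = sum(hp)
    return sum(abs(a - b) for a, b in zip(hp, hc)) / (2 * total)


def choose_coder(prev: Frame, cur: Frame, threshold: float = 0.3) -> str:
    """Return 'intra' (bypass the WZ coder) when activity exceeds the threshold, else 'wz'."""
    return "intra" if histogram_activity(prev, cur) > threshold else "wz"


still = [[0] * 4 for _ in range(4)]
half = [[255] * 4 if r < 2 else [0] * 4 for r in range(4)]
print(choose_coder(still, still), choose_coder(still, half))  # prints wz intra
```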
As shown in Figure 1, the encoding of WZ frames and key frames is separate, whereas the decoding is combined.
The conventional intra coder used in the key frame path (for example, H.264/AVC intra) employs a deblocking filter as part of the standard to minimize blocking artifacts. In contrast, block-based transforms are used without any filtering when encoding and decoding WZ frames. This reduces the RD performance of the decoded WZ frames and leaves visible blocking artifacts. In addition, because key frames are deblocking filtered whereas WZ frames are not, flicker may be visible when decoded key and WZ frames are displayed interleaved. It is therefore highly desirable to deblock filter the decoded WZ frames as well, in an adaptive way, as proposed in Figure 5.
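The sketch below shows what an adaptive post-filter for decoded WZ frames could look like. It is a simplified filter loosely modeled on the edge conditions of the H.264/AVC deblocking filter, not that filter itself. Across each 4x4 block boundary it smooths the two boundary samples with a [1, 2, 1] kernel, but only when the step across the boundary is small (below `alpha`) and both sides are locally flat (below `beta`), which suggests a blocking artifact rather than a true image edge. The `alpha` and `beta` values are hypothetical; in practice they would likely be tied to the quantization step.

```python
from typing import List

Frame = List[List[int]]


def deblock_vertical_edges(frame: Frame, block: int = 4, alpha: int = 20, beta: int = 6) -> Frame:
    """Smooth samples on each side of vertical block boundaries when the step looks artificial."""
    out = [row[:] for row in frame]  # decisions use unfiltered samples from `frame`
    width = len(frame[0])
    for y, row in enumerate(frame):
        for x in range(block, width - 1, block):
            p1, p0, q0, q1 = row[x - 2], row[x - 1], row[x], row[x + 1]
            if abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta:
                out[y][x - 1] = (p1 + 2 * p0 + q0 + 2) >> 2
                out[y][x] = (p0 + 2 * q0 + q1 + 2) >> 2
    return out


def _transpose(frame: Frame) -> Frame:
    return [list(col) for col in zip(*frame)]


def deblock(frame: Frame, block: int = 4, alpha: int = 20, beta: int = 6) -> Frame:
    """Filter vertical block edges, then horizontal ones (via transposition)."""
    v = deblock_vertical_edges(frame, block, alpha, beta)
    return _transpose(deblock_vertical_edges(_transpose(v), block, alpha, beta))


row = [[10, 10, 10, 10, 14, 14, 14, 14]]   # small step: likely a blocking artifact
edge = [[10, 10, 10, 10, 200, 200, 200, 200]]  # large step: likely a real edge
print(deblock_vertical_edges(row))   # prints [[10, 10, 10, 11, 13, 14, 14, 14]]
print(deblock_vertical_edges(edge))  # prints [[10, 10, 10, 10, 200, 200, 200, 200]]
```

Because true edges are left untouched, the filter adapts to content, and applying it only to WZ frames would make their appearance closer to that of the deblocked key frames.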
Figure 3. DVC Codec with proposed Transport Mux & De-Mux
Figure 4. DVC Codec with Video splitter and proposed WZ bypass
Figure 5. DVC Decoder with proposed De-blocking filter
Unlike video compression standards such as MPEG2, MPEG4 and H.264, DVC has not yet been standardized. Hence researchers are adopting their own architectures and methods. For example, there are pixel-domain methods, which use no transform, and transform-domain methods, which use various transforms such as the DCT or DWT. Various channel codes, such as Turbo, Trellis, LDPC and LDPCA codes, are also used. Standardization of DVC is necessary before it can see practical usage, to maintain consistency among the various methods in the interest of interoperability between different encoders and decoders.
4. Conclusions
DVC is a new coding paradigm for video compression that can meet the needs of emerging applications such as wireless video cameras, wireless low-power surveillance networks and disposable video cameras. It has been widely accepted by the research community and industry as a next generation video coding paradigm and has been extensively worked upon. Although a lot of work has been done, there are still gaps and challenges in its practical usage, which have been highlighted in this paper.
References
V. K. Kodavalla, “Distributed Video Coding: Codec Architecture and Implementation”, SPPRA2011, Innsbruck, Austria [Submitted].
V. K. Kodavalla, “Distributed Video Coding: Adaptive Video Splitter”, IP-SOC2010, Grenoble, France.
X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, M. Ouaret, “The Discover Codec: Architecture, Techniques and Evaluation”, Picture Coding Symposium (PCS), Lisboa, Portugal, November 2007.
D. Varodayan, A. Aaron and B. Girod, “Rate-Adaptive Codes for Distributed Source Coding”, EURASIP Signal Processing Journal, Special Section on Distributed Source Coding, vol. 86, no. 11, November 2006.
M. Ouaret, F. Dufaux, T. Ebrahimi, “On Comparing JPEG2000 and Intraframe AVC”, SPIE Optics & Photonics, International Society for Optical Engineering, San Diego, California, USA, August 13-17, 2006.
C. Brites, J. Ascenso and F. Pereira, “Improving Transform Domain Wyner-Ziv Video Coding Performance”, IEEE International Conference on Acoustics, Speech and Signal Processing, Toulouse, France, May 2006.
S. Milani and G. Calvagno, “A Distributed Video Coder Based on H.264/AVC Standard”, Dept. of Information Engineering, University of Padova, Italy.
A. Aaron, E. Setton and B. Girod, “Towards Practical Wyner-Ziv Coding of Video”, in Proc. IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.
A. Aaron, R. Zhang and B. Girod, “Wyner-Ziv Coding of Motion Video”, in Proc. Asilomar Conference on Signals and Systems, Pacific Grove, CA, November 2002.