Image compression, once the domain of the PC, is becoming pervasive in embedded environments. This trend is largely a result of the increased processing power and multimedia capabilities of new DSPs. Consequently, it is essential for embedded designers to gain a better grasp of image algorithms in an effort to implement them efficiently. This means familiarity with the Baseline JPEG compression standard, one of the most popular image compression algorithms in use today.
Although there are many specified versions of JPEG, Baseline JPEG compression embodies a minimum set of requirements. It is lossy, such that the original image cannot be exactly reconstructed. However, there is a "Quality Factor" that allows a tradeoff between image quality and compressed image size. JPEG exploits the characteristics of human vision, eliminating or reducing data to which the eye is less sensitive. It works well on grayscale and color images, especially on p hotographs, but it is not intended for two-tone images.
An uncompressed image might be stored in one of several formats. One popular format is 24 bit/pixel RGB - 8 bits each of red, green and blue subpixels. However, in order to achieve better compression ratios, it is common to decorrelate the RGB fields into separate luminance (Y) and chrominance (Cb, Cr) components via simple linear equations.
Because the human eye is more sensitive to luminance than chrominance, image bandwidth can be reduced by subsampling the (Cb,Cr) fields. Instead of having a Cr and Cb component for each Y component, we can halve the horizontal bandwidth by subsampling the chrominance values (4:2:2 format). Moreover, we can subsample this vertically by 2 (4:2:0 format) to further reduce image bandwidth.
Whichever format is chosen, the image needs to be separated into Y, Cb and Cr buffers, because JPEG acts on each component individually and in identical fashion. Because JPEG operates on 8x8 blocks of byte d ata, in each image buffer the data is partitioned into these blocks, from left to right and top to bottom.
The discrete cosine transform (DCT) stage of JPEG exploits the fact that the human eye favors low-frequency image information over high-frequency details. The 8x8 DCT transforms the image from the spatial domain into the frequency domain. This essentially correlates the input image with each of 64 DCT basis functions that represent different spatial frequencies and records the relative strength of correlation as the coefficient in the output DCT matrix.
The DCT 8x8 output matrix represents a set of magnitudes with increasing spatial frequencies. The DC component is in the top left (0,0) position, and it represents the average value of the 8x8 input image. The other 63 components are AC coefficients, and they correspond to high-frequency details, where frequency increases to the right and bottom of the DCT block.
The results of the DCT are now quantized in order to achieve larg e gains in compression ratio. Quantization refers to representing the actual coefficient values as one of a set of predetermined allowable values, so that the overall data can be encoded in fewer bits because the allowable values are a small fraction of all possible values.
To the human eye, small errors in high-frequency representation are not easily noticed, and eliminating some high-frequency components entirely is often visually acceptable. This reduces the amount of DCT information that needs to be coded for a given 8x8 block.
The quantization process is straightforward once the quantization table is assembled, but the table itself can be quite complex, often with a separate quantization coefficient for each element of the 8x8 DCT block, with larger table values corresponding to higher frequencies in the block. The actual process of quantization is a simple element-wise division between the DCT coefficient and the quantization coefficient for a given row and column.
Next, we pr epare the quantized data in an efficient format for encoding. We can rearrange the coefficients into a one-dimensional array sorted from the DC value to the highest-order spatial frequency coefficient. This is accomplished using zig-zag sorting, a process which traverses the 8x8 block in a back-and-forth direction of increasing spatial frequency. The first element in each 64x1 output array represents the DC coefficient from the DCT matrix, and the remaining 63 coefficients represent the AC components. These two types of information are different enough to warrant separating them and applying different methods of coding to achieve optimal compression efficiency.
The DC components represent the intensity of each 8x8 pixel block. Because of this, significant correlation exists between adjacent blocks. So, while the DC coefficient of any given input array is fairly unpredictable by itself, real images usually do not vary widely in a localized area. As such, the previous DC value can be used to predict the current DC coefficient value. By using a differential prediction model (DPCM), we can increase the probability that the value we encode will be small, and thus reduce the number of bits in the compressed image.
To obtain the coding value we simply subtract the DC coefficient of the previously processed 8x8 pixel block from the DC coefficient of the current block. This value is called the "DPCM difference". Once this value is calculated, it is compared to a table to identify the symbol group to which it belongs, and it is then encoded appropriately.
Because the values of the AC coefficients tend towards zero after the quantization step, these coefficients are run-length encoded. The concept of run-length encoding is a straightforward principle. In real image sequences, pixels of the same value can always be represented as individual bytes, but it doesn't make sense to send the same value over and over again. For example, we have seen that the o utput of the DCT produces many zero-value bytes. The zig-zag ordering helps produce these zeros in groups at the end of each sequence.
Instead of coding each zero individually, we simply encode the number of zeros in a given 'run.' This run-length coded information is then variable-length coded (VLC). The most common VLC algorithm for JPEG is Huffman coding.
A simple way to code a given set of N symbols would be to assign log2N bits of binary code to each. In reality, however, most symbols do not occur with equal probability. For these cases, we can reduce the average number of bits used to compress a sequence by encoding the most common symbols with shorter bit sequences and the less frequently used symbols with longer bit sequences. This is the main purpose of Huffman coding.
Huffman coding is a variable-length encoding technique that is used to compress a stream of symbols with a known probability distribution. Huffman codes are structured as a code tree, which is built by pairing symbols with the two least probabilities of occurrence in a sequence. Each time a pair is combined, the new symbol takes on the sum of the two separate probabilities. The process continues with each node having its probabilities combined and then being paired with the next smallest probable symbol until all the symbols are represented in the coded tree.
Once the code tree is constructed, code sequences can be assigned within the tree structure by assigning a 0 or a 1 bit to each branch. The code for each symbol is then read by concatenating each bit from the branches, starting at the center of the structure and extending to the branch for which the relevant symbol is defined. This procedure produces a unique optimal code for each symbol data size.
Even though the codes can have different lengths, each code can be unambiguously decoded if the symbols are read beginning with the most significant bit. The JPEG standard provides a number of Huffman code tables to describe the different ways that the input quantized data is to be encoded.
The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications. JFIF includes embedded image and coding parameters, framed by appropriate header information.
Look for the more complete version of this article in the January 20th issue of EE Times' Planet Analog magazine supplement.