Linux Game Programming for PC & Embedded Systems using SDL
Presented by
Fore June
Author of Windows Fan, Linux Fan

Games and SDL
SDL Installation
SDL for Embedded
SDL API
SDL Events
 SDL Graphics
SDL Threads
Thread Example
SDL Animation
SDL Sound
 Raw Video Player
Video Formats
Video Compression
 Game Trees
About The Author

An Introduction to Video Compression in C/C++ now available at Amazon

@Copyright by Fore June, 2006

Video Compression

1 2 3 4 5 6 7

  1. Run-Level Encoding

    After DCT transform, forward quantization and reordering, we may obtain long sequences of zeros followed by nonzero values. One of the effective ways to encode this kind of data is the three-dimensional ( 3D ) run-level encoding. A 3D run-level codeword is represented by a tuple ( run, level, last ) where run is the number of zeros preceding a nonzero coefficient, level is the value of the nonzero coefficient, and last indicates if the codeword is the final one with nonzero coefficient in the block. For example, the output of the sequence

    is

    Implementation of run-level encoding can be done by defining a struct run3D comprising members run, level, and last as follows. The struct run3D can hold one run-level codeword.

    Suppose a macroblock of 8x8 DCT coefficients have been quantized, zigzag-reordered, and saved in an array Y[]. The following piece of code shows how to run-level-encode this block of 64 coefficients.

    The corresponding code to recover the block of 64 DCT coefficients from the run-level codewords is shown below.

    The following listing, run.cpp is a complete program that demonstrates the operations of quantization, reordering and run-level encoding and the reverses of the operations. It reads DCT coefficients from the file "sample_video.dct" which has saved blocks of 64 DCT coefficients in short ( 16-bit ) form. Actually, "sample_video.dct" is obtained from RGB raw video data after the operations of 4:2:0 YCbCr down sampling and DCT Transformation as discussed in Section 15.

    The following are sample outputs of the program run.cpp. You can see from the data that reordering and run-level encoding are reversible while quantization is not.

  2. Huffman Coding

    Rather than saving the 3D run-level tuples directly, people encode them using an entropy encoder which in general yields significant compression. Entropy encoding is a reversible or lossless process; exact data can be recovered in the decoding process from the encoded data. Arithmetic coding and Huffman coding are two popular methods of entropy encoding with the former giving slightly better results and consuming more computing power. This kind of encoding is also referred to as variable-length coding ( VLC ) because the codewords representing symbols are of varying lengths. We shall only discuss the Huffman encoding method here.

    The commonly used generalized ASCII code uses 8 bits to represent a character and unicode uses 16 bits to do so; these are fixed-length codes, which are simple but inefficient in the representation. Huffman coding assigns a variable-length codeword to each symbol ( or tuple here ) based on the probability of the occurrence of the symbols. Frequently occurring symbols are represented with short codewords whilst less common symbols are represented with longer codewords; in this way we have a shorter average codeword length and thus saving space to store the codewords, which leads to data compression.

    We say that a code has the prefix property and is a prefix code if no codeword is the prefix, or start of the codeword for another symbol. A code with codewords {1, 01, 00} has the prefix property; a code consisting of {1, 0, 01, 00} does not, because "0" is a prefix of both "01" and "00". A non-prefix code like { 1, 0, 01, 00 } cannot be instantaneously decoded because when we receive a bitstream such as "001", we do not know whether it consists of { '0', '0', '1' }, or { '0', '01' } or { '00', 1 }. On the other hand, a prefix code can be instantaneously decoded. That is, a message can be transmitted as a sequence of concatenated codewords, without any out-of-band markers to frame the words in the message. The receiver can decode the message unambiguously, by repeatedly finding and removing prefixes that form valid codewords, which are impossible if the message is formed by a non-prefix code as shown in the above example. Prefix codes are also known as prefix-free codes, prefix condition codes, comma-free codes, and instantaneous codes.

    Not only that Huffman codes are prefix codes, they are also optimal in the sense that no other prefix code can yield a shorter average codeword length than a corresponding Huffman code. Huffman code was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes." Huffman codes are easiest to understand and implement if we use trees to represent them though the tree concept was not used when Huffman first developed the code. Briefly, Huffman tree is a binary tree with a weight associated with each node. Let us first consider some simple examples.

    Consider the following two codes.

    Symbol	Code 1 	Code 2
    a 	000 	000
    b 	001 	11
    c 	010 	01
    d 	011 	001
    e 	100 	10
    

    Code 1 is fixed-length and code 2 has the prefix property ( note that a fixed-length code always has the prefix property ). They can be represented by binary trees shown below.

    Decoding a bitstream is simply a process of traversing the binary tree; we start from the root of the tree and go left or right based on whether the current bit examined is 0 or 1 until we reach a leaf which is associated with a symbol; we then start from the root again and examine the next bit and so on. For example, Code 2 decodes the string "000100111" uniquely to "aecb".

  3. Huffman Codes

    Consider an example that the probabilities of the occurrence of the symbols are known:

    Symbol   Frequency   Code 1   Code 2
    a 0.35 000 00
    b 0.20 001 10
    c 0.20 010 011
    d 0.15 011 010
    e 0.10 100 110

    We can calculate the average lengths L of the two codes. Obviously, Code 1 is a fixed-length code and the average length is 3. For Code 2, we need to take into account the frequency of occurrence of each symbol.

    Code 1 : L = 3 bits

    Code 2 : L = 0.35 x 2 + 0.20 x 2 + 0.20 x 3 + 0.15 x 3 + 0.10 x 3 = 2.45 ( bits )

    Code 2 is a prefix code and has a significantly shorter average codeword length than Code 1. Can we do better than Code 2, having a code that has shorter L than Code 2? This question can be answered by constructing a Huffman tree. We start from a forest of trees, each of which has only one single node ( which is a root as well as a leaf ); each root consists of the symbol and the weight ( probability ) of it. In this example, we totally have five single-noded trees as shown below.

    Next, we merge the two trees whose roots have lowest weights and calculate the sum of the two weights. We assign the sum as the weight to the root of the merged tree. The resulted forest is shown below.

    We repeat the above merging process until there is only one tree in the forest. The following figures show two more iterations of the process.

     

    To obtain the codeword for a symbol, we traverse the tree, starting from the root, until we arrive at a leaf, which contains the symbol; in the traversal, we generate a 0 on going left and a 1 on going right ( it works just as well if we generate a 1 on going right and a 0 on left ). We then obtain the following Huffman code.

    symbol     Codeword   Length
    a     102
    b     002
    c     012
    d     1103
    e     1113

    The average length of this Huffman code is

    L = 0.35 x 2 + 0.20 x 2 + 0.2 x 2 + 0.15 x 3 + 0.10 x 3 = 2.25 ( bits )

    which is shorter than that of Code 2 ( 2.45 bits ) discussed above. Actually, one can prove that a Huffman code always gives the shortest average code length of all prefix codes for a given set of probabilities of occurrences of symbols. ( We'll skip the proof here. )

    A Huffman tree can be conveniently constructed using a priority queue. A priority queue is a special queue that associates each element with a priority value. The element in the front of the queue always has the highest priority and is the first one to be deleted. In other words, when an element with a priority higher than the priorities of all the elements currently in the queue enters the queue, it will be moved to the front of the queue. ( The C/C++ Standard Template Library ( STL ) also provides functions of a priority queue. ) We can set the priority of a node to be the reciprocal of its weight. That is, the lower the weight, the higher the priority. With this association, a Huffman tree can be constructed by

    1. Start with a forest consisting of single-node trees; the root of a tree contains a symbol and its weight which is the reciprocal of its priority value.
    2. Insert the roots ( nodes ) of all trees of the forest into a priority queue.
    3. Delete two nodes from the priority queue; the two deleted nodes always have the least weights ( highest priorities ).
    4. Merge the nodes to form a new root with combined weights; insert the new root back to the priority queue.
    5. Repeat steps 3 - 4 until the priority queue is empty.

    The priority queue implementation is straight forward and intuitive. However, in our video compression application here, we shall make use of some of the properties of binary trees to do a more efficient implementation.

  4. Huffman Tree properties

    We'll first prove a lemma to help us simplify the implementation of a Huffman tree and we shall use a table to implement it. Consider a binary tree where

    n0 = number of leaves ( nodes of degree 0 )
    n1 = number of nodes of degree 1 ( nodes having one child )
    n2 = number of nodes of degree 2 ( nodes having two children )

    Lemma:

    For a non-empty binary tree,
      n0 = n2 + 1

    Proof:

    Total number of nodes is
      n = n0 + n1 + n2

    Except the root, a node always has a branch leading to it. Thus the total number of branches is

      nB = n - 1

    But all branches stem from nodes of degree 1 or 2, so

      nB = n1 + 2n2
    or
      n - 1 = n1 + 2n2
    yielding
      n0 + n1 + n2 - 1 = n1 + 2n2

    From which we have

      n0 = n2 + 1

    From the construction of a Huffman tree, we know that it does not have any node of degree 1 ( i.e. n1 = 0 ) and thus the number of internal nodes is equal to n2. Also, the number of symbols is equal to the number of leaves ( n0 ) in the tree. Therefore, if we have n symbols, the corresponding Huffman tree will have n - 1 ( = n2 ) internal nodes. To save a Huffman tree, we need an entry for each of the left and right child pointers and an entry for each symbol. Therefore, the total number of entries NT in the table that holds the Huffman Tree is

    For instance, consider a case that we have 5 symbols: a, b, c, d, e. The codewords and the Huffman tree are shown in Table 1 and the figure below.

    Table 1.
    SymbolsCodewords
    a 0
    b 100
    c 1010
    d 1011
    e 11

    In this example, the size of the table NT implementing the Huffman tree is

    The Huffman tree is represented by Table 2, where traversing left gives a 0, and traversing right gives a 1 and NT = 13.

     
    Table 2
    table
    index
    table
    content
    Comments
    120left child of node 0 ( root )
    1110right child of node 0
    108left child of node 1
    94right child of node 1
    81left child of node 2
    76right child of node 2
    62left child of node 3
    53right child of node 3
    4'e' symbol at leaf
    3'd'  
    2'c'  
    1'b'  
    0'a'  

    Note that in Table 2, the root is pointing at the highest table location ( NT - 1) and when traversing the tree we move from the top of the table down to a location that contains a symbol. When we reach a location whose index is smaller than NT, we know that we have reached a terminal node ( leaf ) containing a symbol. The following piece of code shows how we traverse the table to obtain the symbols. In the code, htree[] is the table containing the Huffman tree.

      loc = 3 * N - 3;		//start from root, N = # of symbols
      do {
        loc0 = loc;			//fp is file pointer pointing to encoded data
        if ( read_one_bit( fp ) == 0 )   //a 0, go left
            loc = htree[loc0];
        else
            loc =  htree[loc0+1];	    //a 1, go right
       
      } while ( loc >= N );        	//traverse until reaches leaf
      return htree[loc];	 	//returns symbol
      

    Table 2 can be simplified if we assert that the number of symbols n is smaller than a certain value and we associate each symbol with a value in the range 0 to n - 1. For example, if we assert n ≤ 128, then

  5. each pointer can be represented by a byte, with a left child denoted by the upper byte of a 16-bit word and right child by lower byte,
  6. table locations signify symbol values and do not need extra entries to hold the symbols,
  7. table size NT is reduced to n - 1

  8. The above Huffman tree example where the number of symbols is 5 can now be represented by a table with size 5 - 1 = 4 as shown below.

    table index
    ( symbol )
    left childright child comments
    3 ( 8 ) 0 ( a ) 7 left, right children of node 0 ( root )
    2 ( 7 ) 6 4 ( e ) left, right children of node 1
    1 ( 6 ) 1 ( b ) 5 left, right children of node 2
    0 ( 5 ) 2 ( c ) 3 ( d ) left, right children of node 3

    The table only has 4 entries. The symbols are represented by the table index with 0 representing 'a', 1 representing 'b' and so on. To resolve the case if a table entry holds a pointer or an actual symbol value, we've added the value NT to all table indices before saving it as pointers. Therefore, in a table entry if the pointer value is smaller than NT, we know that it is a terminal node. The following piece of code shows how to decode such a table that represents a Huffman tree; a left mask and right mask are used to extract the correct pointer value.

    				//N = # of symbols
      left_mask = 0xFF00;		//to extract upper byte ( left child )
      right_mask = 0x00FF;		//to extract lower byte ( right child )
      loc = ( N - 1 ) + N;		//start from root; add offset N 
    				//  to distinguish pointers from symbols 
      do {
        loc0 = loc - N;		//loc0 is real table location
        if ( read_one_bit( fp ) == 0 ){   //a 0, go left
            loc = ( htree[loc0] & left_mask ) >> 8;
       } else{
            loc =  htree[loc0] & right_mask;
       }
      } while ( loc >= N );        	//traverse until reaches leaf
      return loc;			//symbol value = loc
    

  9. Pre-calculated Huffman-based Tree Coding

    The Huffman coding process has a disadvantage that the statistics of the occurrence of the symbols must be known ahead of the encoding process. Though we do not need to transmit the probability table to the decoder, we do need it before we can do any encoding. The probability table for a large video cannot be calculated until after the video data have been processed which may introduce unacceptable delay into the encoding process. Because of these, practical video coding standards define sets of codewords based on the probability distributions of generic video data. The following example is a pre-calculated Huffman table taken from MPEG-4 Visual ( Simple Profile ), which uses 3D run-level coding discussed above to encode quantized coefficients. A total of 102 specific combinations of ( run, level, last ) have variable-length codewords assigned to them and part of these are shown in Table 4. Each codeword can be up to 13 bits long and the last bit is the sign bit 's', which indicates if the decoded coefficient is positive ( 0 ) or negative ( 1 ). Any (run, level, run ) combination that is not listed in the table is coded using an escape sequence; a special ESCAPE code of 0000011 is first transmitted followed by a 13-bit fixed-length codeword describing the values of run, level, and last.

    We shall use this table in our entropy-encoding stage. The full implementation of the Huffman encoding and decoding for our video codec will be discussed in the next section.

    <<Prev   Next >>