| Games and SDL SDL Installation SDL for Embedded SDL API SDL Events | SDL Graphics SDL Threads Thread Example SDL Animation SDL Sound | Raw Video Player Video Formats Video Compression | Game Trees About The Author |
Video Compression
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
After DCT transform, forward quantization and reordering, we may obtain long sequences of zeros followed by nonzero values. One of the effective ways to encode this kind of data is the three-dimensional ( 3D ) run-level encoding. A 3D run-level codeword is represented by a tuple ( run, level, last ) where run is the number of zeros preceding a nonzero coefficient, level is the value of the nonzero coefficient, and last indicates if the codeword is the final one with nonzero coefficient in the block. For example, the output of the sequence
Implementation of run-level encoding can be done by defining a struct run3D comprising members run, level, and last as follows. The struct run3D can hold one run-level codeword.
typedef struct {
unsigned char run;
short level;
char last;
} run3D;
|
Suppose a macroblock of 8x8 DCT coefficients have been quantized, zigzag-reordered, and saved in an array Y[]. The following piece of code shows how to run-level-encode this block of 64 coefficients.
/*
Input: 64 quantized DCT coefficients in Y[].
Output: 3D run-level codewords in runs[].
*/
void run_block ( short Y[], run3D runs[] )
{
short run_length = 0, k = 0;
for ( short i = 0; i < 64; ++i ) {
if ( Y[i] == 0 ) {
run_length++;
continue;
}
//Y[i] != 0
runs[k].run = (unsigned char) run_length;
runs[k].level = Y[i];
runs[k].last = 0;
run_length = 0;
k++;
} //for i
if ( k > 0 )
runs[k-1].last = 1; //last nonzero element
else { //whole block 0
runs[0].run = 64;
runs[0].level = 0;
runs[0].last = 0;
}
}
|
The corresponding code to recover the block of 64 DCT coefficients from the run-level codewords is shown below.
/*
Input: 3D run-level codewords of a macroblock in runs[].
Output: 64 DCT coefficients in Y[].
*/
void run_decode ( run3D runs[], short Y[] )
{
short i, j, k = 0;
i = 0;
while ( i < 64 ) {
for ( j = 0; j < runs[k].run; ++j )
Y[i++] = 0;
if ( i < 64 )
Y[i++] = runs[k].level;
if ( runs[k].last ) break;
k++;
}
//run of 0s to end
if ( i < 64 ) {
bzero ( Y + i, sizeof(Y[0]) * ( 64-i) );
}
}
|
The following listing, run.cpp is a complete program that demonstrates the operations of quantization, reordering and run-level encoding and the reverses of the operations. It reads DCT coefficients from the file "sample_video.dct" which has saved blocks of 64 DCT coefficients in short ( 16-bit ) form. Actually, "sample_video.dct" is obtained from RGB raw video data after the operations of 4:2:0 YCbCr down sampling and DCT Transformation as discussed in Section 15.
/*
run.cpp
A demo program that illustrates the concepts of quantization, zigzag reordering
and run-level encoding. It reads DCT coefficients from the file "sample_video.dct"
which has saved size-64 blocks of DCT coefficients in short ( 16-bit ) form.
Compile: g++ -o run run.cpp
Execute: ./run
*/
#include <iostream>
#include <stdio.h>
#include <math.h>
#include <string.h>
using namespace std;
const short Qstep = 12; //quantization step
typedef struct { //struct for holding one 3D run-level codewording
unsigned char run;
short level;
char last;
} run3D;
unsigned char zigzag[64] = {
0, 1, 8, 16, 9, 2, 3, 10,
17, 24, 32, 25, 18, 11, 4, 5,
12, 19, 26, 33, 40, 48, 41, 34,
27, 20, 13, 6, 7, 14, 21, 28,
35, 42, 49, 56, 57, 50, 43, 36,
29, 22, 15, 23, 30, 37, 44, 51,
58, 59, 52, 45, 38, 31, 39, 46,
53, 60, 61, 54, 47, 55, 62, 63
};
short get64 ( short Y[], FILE *fp )
{
short n;
n = fread ( Y, 2, 64, fp );
return n;
}
short put64 ( short Y[], FILE *fp )
{
short n;
n = fwrite ( Y, 2, 64, fp );
return n;
}
void print_block ( short s[] )
{
short k = 0;
for ( short i = 0; i < 8; ++i ) {
printf("\n");
for ( short j = 0; j < 8; ++j ) {
printf("%6d", s[k++] );
}
}
printf("\n");
}
void quantize_block ( short coef[] )
{
for ( short k = 0; k < 64; ++k ) {
coef[k] = ( short ) round ( (double)coef[k] / Qstep );
}
}
//inverse quantize one block
void inverse_quantize_block ( short coef[] )
{
for ( short k = 0; k < 64; ++k ) {
coef[k] = coef[k] * Qstep;
}
}
//zigzag reorder one block
void reorder ( short Y[], short Yr[] )
{
for ( short k = 0; k < 64; ++k ) {
Yr[k] = Y[zigzag[k]];
}
}
//reverse the reorder of one block
void reverse_reorder ( short Yr[], short Y[] )
{
for ( short k = 0; k < 64; ++k ) {
Y[zigzag[k]] = Yr[k];
}
}
/*
Input: 64 quantized DCT coefficients in Y[].
Output: 3D run-level codewords in runs[].
*/
void run_block ( short Y[], run3D runs[] )
{
short run_length = 0, k = 0;
for ( short i = 0; i < 64; ++i ) {
if ( Y[i] == 0 ) {
run_length++;
continue;
}
//Y[i] != 0
runs[k].run = (unsigned char) run_length;
runs[k].level = Y[i];
runs[k].last = 0;
run_length = 0;
k++;
} //for i
if ( k > 0 )
runs[k-1].last = 1; //last nonzero element
else { //whole block 0
runs[0].run = 64;
runs[0].level = 0;
runs[0].last = 0;
}
}
//print the run-level codewords of one block
void print_run( run3D runs[] )
{
short k = 0, len = 0, count = 0;
while ( len < 64 ) {
if ( count % 4 == 0 ) printf("\n");
printf( "(%2d, %3d, %1d) ", runs[k].run, runs[k].level, runs[k].last );
len += runs[k].run;
++len;
if ( runs[k].last ) break;
count++;
k++;
}
printf("\n");
}
/*
Input: 3D run-level codewords of a macroblock in runs[].
Output: 64 DCT coefficients in Y[].
*/
void run_decode ( run3D runs[], short Y[] )
{
short i, j, k = 0;
i = 0;
while ( i < 64 ) {
for ( j = 0; j < runs[k].run; ++j )
Y[i++] = 0;
if ( i < 64 )
Y[i++] = runs[k].level;
if ( runs[k].last ) break;
k++;
}
//run of 0s to end
if ( i < 64 ) {
bzero ( Y + i, sizeof(Y[0]) * ( 64-i) );
}
}
int main()
{
FILE *fp = fopen ( "sample_video.dct", "rb" );
if ( fp == NULL ) {
printf("\nError opening file\n");
return 1;
}
short Y[64], Yr[64];
run3D runs[64];
while ( get64 ( Y, fp ) > 0 ) { //read a block of 64 samples
printf("\nA block of DCT coefficients:");
print_block ( Y );
quantize_block ( Y );
printf("\nDCT block after quantization ( Qstep = %d ):", Qstep );
print_block ( Y );
reorder ( Y, Yr );
printf("\nDCT block After zigzag reorder:");
print_block ( Yr );
run_block ( Yr, runs );
printf("\n3D run-level codewords of the reordered quantized DCT coeficients:");
print_run ( runs );
printf("\nHit any key to reverse the processes:");
getchar();
//reversing the process
memset ( Y, 1, 128 );
run_decode ( runs, Y );
printf("\nDCT block after decode run:");
print_block ( Y );
reverse_reorder ( Yr, Y );
printf("\nDCT block after reversing reorder:");
print_block ( Y );
inverse_quantize_block ( Y );
printf("\nDCT block after inverse quantization:");
print_block ( Y );
printf("\nDo you want another block? ( y/n) ");
char c;
scanf ( "%c", &c );
if ( c == 'n' ) break;
}
fclose ( fp );
}
|
The following are sample outputs of the program run.cpp. You can see from the data that reordering and run-level encoding are reversible while quantization is not.
$ ./run
A block of DCT coefficients:
479 -27 -63 12 -28 6 13 7
69 116 -53 -49 -3 -10 16 -2
-31 58 44 -42 0 7 3 3
1 -19 19 19 -18 12 -8 1
-1 -8 -2 13 0 3 -3 -1
6 0 -1 2 -3 -1 1 -3
1 4 -2 -3 1 -1 2 0
1 -9 -1 0 2 -1 0 0
DCT block after quantization ( Qstep = 12 ):
40 -2 -5 1 -2 1 1 1
6 10 -4 -4 0 -1 1 0
-3 5 4 -4 0 1 0 0
0 -2 2 2 -2 1 -1 0
0 -1 0 1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 -1 0 0 0 0 0 0
DCT block After zigzag reorder:
40 -2 6 -3 10 -5 1 -4
5 0 0 -2 4 -4 -2 1
0 -4 2 -1 1 0 0 0
2 0 -1 1 1 1 1 -2
1 0 0 0 -1 0 0 0
1 0 0 0 -1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
3D run-level codewords of the reordered quantized DCT coeficients:
( 0, 40, 0) ( 0, -2, 0) ( 0, 6, 0) ( 0, -3, 0)
( 0, 10, 0) ( 0, -5, 0) ( 0, 1, 0) ( 0, -4, 0)
( 0, 5, 0) ( 2, -2, 0) ( 0, 4, 0) ( 0, -4, 0)
( 0, -2, 0) ( 0, 1, 0) ( 1, -4, 0) ( 0, 2, 0)
( 0, -1, 0) ( 0, 1, 0) ( 3, 2, 0) ( 1, -1, 0)
( 0, 1, 0) ( 0, 1, 0) ( 0, 1, 0) ( 0, 1, 0)
( 0, -2, 0) ( 0, 1, 0) ( 3, -1, 0) ( 3, 1, 0)
( 3, -1, 1)
Hit any key to reverse the processes:
DCT block after decode run:
40 -2 6 -3 10 -5 1 -4
5 0 0 -2 4 -4 -2 1
0 -4 2 -1 1 0 0 0
2 0 -1 1 1 1 1 -2
1 0 0 0 -1 0 0 0
1 0 0 0 -1 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
DCT block after reversing reorder:
40 -2 -5 1 -2 1 1 1
6 10 -4 -4 0 -1 1 0
-3 5 4 -4 0 1 0 0
0 -2 2 2 -2 1 -1 0
0 -1 0 1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 -1 0 0 0 0 0 0
DCT block after inverse quantization:
480 -24 -60 12 -24 12 12 12
72 120 -48 -48 0 -12 12 0
-36 60 48 -48 0 12 0 0
0 -24 24 24 -24 12 -12 0
0 -12 0 12 0 0 0 0
12 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 -12 0 0 0 0 0 0
Do you want another block? ( y/n)
|
Rather than saving the 3D run-level tuples directly, people encode them using an entropy encoder which in general yields significant compression. Entropy encoding is a reversible or lossless process; exact data can be recovered in the decoding process from the encoded data. Arithmetic coding and Huffman coding are two popular methods of entropy encoding with the former giving slightly better results and consuming more computing power. This kind of encoding is also referred to as variable-length coding ( VLC ) because the codewords representing symbols are of varying lengths. We shall only discuss the Huffman encoding method here.
The commonly used generalized ASCII code uses 8 bits to represent a character and unicode uses 16 bits to do so; these are fixed-length codes, which are simple but inefficient in the representation. Huffman coding assigns a variable-length codeword to each symbol ( or tuple here ) based on the probability of the occurrence of the symbols. Frequently occurring symbols are represented with short codewords whilst less common symbols are represented with longer codewords; in this way we have a shorter average codeword length and thus saving space to store the codewords, which leads to data compression.
We say that a code has the prefix property and is a prefix code if no codeword is the prefix, or start of the codeword for another symbol. A code with codewords {1, 01, 00} has the prefix property; a code consisting of {1, 0, 01, 00} does not, because "0" is a prefix of both "01" and "00". A non-prefix code like { 1, 0, 01, 00 } cannot be instantaneously decoded because when we receive a bitstream such as "001", we do not know whether it consists of { '0', '0', '1' }, or { '0', '01' } or { '00', 1 }. On the other hand, a prefix code can be instantaneously decoded. That is, a message can be transmitted as a sequence of concatenated codewords, without any out-of-band markers to frame the words in the message. The receiver can decode the message unambiguously, by repeatedly finding and removing prefixes that form valid codewords, which are impossible if the message is formed by a non-prefix code as shown in the above example. Prefix codes are also known as prefix-free codes, prefix condition codes, comma-free codes, and instantaneous codes.
Not only that Huffman codes are prefix codes, they are also optimal in the sense that no other prefix code can yield a shorter average codeword length than a corresponding Huffman code. Huffman code was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes." Huffman codes are easiest to understand and implement if we use trees to represent them though the tree concept was not used when Huffman first developed the code. Briefly, Huffman tree is a binary tree with a weight associated with each node. Let us first consider some simple examples.
Consider the following two codes.
Symbol Code 1 Code 2 a 000 000 b 001 11 c 010 01 d 011 001 e 100 10
Code 1 is fixed-length and code 2 has the prefix property ( note that a fixed-length code always has the prefix property ). They can be represented by binary trees shown below.
Decoding a bitstream is simply a process of traversing the binary tree; we start from the root of the tree and go left or right based on whether the current bit examined is 0 or 1 until we reach a leaf which is associated with a symbol; we then start from the root again and examine the next bit and so on. For example, Code 2 decodes the string "000100111" uniquely to "aecb".
Consider an example that the probabilities of the occurrence of the symbols are known:
| Symbol | Frequency | Code 1 | Code 2 |
| a | 0.35 | 000 | 00 |
| b | 0.20 | 001 | 10 |
| c | 0.20 | 010 | 011 |
| d | 0.15 | 011 | 010 |
| e | 0.10 | 100 | 110 |
We can calculate the average lengths L of the two codes. Obviously, Code 1 is a fixed-length code and the average length is 3. For Code 2, we need to take into account the frequency of occurrence of each symbol.
Code 2 : L = 0.35 x 2 + 0.20 x 2 + 0.20 x 3 + 0.15 x 3 + 0.10 x 3 = 2.45 ( bits )
Code 2 is a prefix code and has a significantly shorter average codeword length than Code 1. Can we do better than Code 2, having a code that has shorter L than Code 2? This question can be answered by constructing a Huffman tree. We start from a forest of trees, each of which has only one single node ( which is a root as well as a leaf ); each root consists of the symbol and the weight ( probability ) of it. In this example, we totally have five single-noded trees as shown below.
Next, we merge the two trees whose roots have lowest weights and calculate the sum of the two weights. We assign the sum as the weight to the root of the merged tree. The resulted forest is shown below.
We repeat the above merging process until there is only one tree in the forest. The following figures show two more iterations of the process.
|
|
To obtain the codeword for a symbol, we traverse the tree, starting from the root, until we arrive at a leaf, which contains the symbol; in the traversal, we generate a 0 on going left and a 1 on going right ( it works just as well if we generate a 1 on going right and a 0 on left ). We then obtain the following Huffman code.
| symbol | Codeword | Length | |
|---|---|---|---|
| a | 10 | 2 | |
| b | 00 | 2 | |
| c | 01 | 2 | |
| d | 110 | 3 | |
| e | 111 | 3 |
The average length of this Huffman code is
which is shorter than that of Code 2 ( 2.45 bits ) discussed above. Actually, one can prove that a Huffman code always gives the shortest average code length of all prefix codes for a given set of probabilities of occurrences of symbols. ( We'll skip the proof here. )
A Huffman tree can be conveniently constructed using a priority queue. A priority queue is a special queue that associates each element with a priority value. The element in the front of the queue always has the highest priority and is the first one to be deleted. In other words, when an element with a priority higher than the priorities of all the elements currently in the queue enters the queue, it will be moved to the front of the queue. ( The C/C++ Standard Template Library ( STL ) also provides functions of a priority queue. ) We can set the priority of a node to be the reciprocal of its weight. That is, the lower the weight, the higher the priority. With this association, a Huffman tree can be constructed by
The priority queue implementation is straight forward and intuitive. However, in our video compression application here, we shall make use of some of the properties of binary trees to do a more efficient implementation.
We'll first prove a lemma to help us simplify the implementation of a Huffman tree and we shall use a table to implement it. Consider a binary tree where
Lemma:
Proof:
Except the root, a node always has a branch leading to it. Thus the total number of branches is
But all branches stem from nodes of degree 1 or 2, so
From which we have
From the construction of a Huffman tree, we know that it does not have any node of degree 1 ( i.e. n1 = 0 ) and thus the number of internal nodes is equal to n2. Also, the number of symbols is equal to the number of leaves ( n0 ) in the tree. Therefore, if we have n symbols, the corresponding Huffman tree will have n - 1 ( = n2 ) internal nodes. To save a Huffman tree, we need an entry for each of the left and right child pointers and an entry for each symbol. Therefore, the total number of entries NT in the table that holds the Huffman Tree is
For instance, consider a case that we have 5 symbols: a, b, c, d, e. The codewords and the Huffman tree are shown in Table 1 and the figure below.
|
![]() |
||||||||||||||
In this example, the size of the table NT implementing the Huffman tree is
The Huffman tree is represented by Table 2, where traversing left gives a 0, and traversing right gives a 1 and NT = 13.
| |||||||||||||||||||||||||||||||||||||||||||||||
Note that in Table 2, the root is pointing at the highest table location ( NT - 1) and when traversing the tree we move from the top of the table down to a location that contains a symbol. When we reach a location whose index is smaller than NT, we know that we have reached a terminal node ( leaf ) containing a symbol. The following piece of code shows how we traverse the table to obtain the symbols. In the code, htree[] is the table containing the Huffman tree.
loc = 3 * N - 3; //start from root, N = # of symbols
do {
loc0 = loc; //fp is file pointer pointing to encoded data
if ( read_one_bit( fp ) == 0 ) //a 0, go left
loc = htree[loc0];
else
loc = htree[loc0+1]; //a 1, go right
} while ( loc >= N ); //traverse until reaches leaf
return htree[loc]; //returns symbol
|
Table 2 can be simplified if we assert that the number of symbols n is smaller than a certain value and we associate each symbol with a value in the range 0 to n - 1. For example, if we assert n ≤ 128, then
The above Huffman tree example where the number of symbols is 5 can now be represented by a table with size 5 - 1 = 4 as shown below.
| table index ( symbol ) |
left child | right child | comments |
|---|---|---|---|
| 3 ( 8 ) | 0 ( a ) | 7 | left, right children of node 0 ( root ) |
| 2 ( 7 ) | 6 | 4 ( e ) | left, right children of node 1 |
| 1 ( 6 ) | 1 ( b ) | 5 | left, right children of node 2 |
| 0 ( 5 ) | 2 ( c ) | 3 ( d ) | left, right children of node 3 |
The table only has 4 entries. The symbols are represented by the table index with 0 representing 'a', 1 representing 'b' and so on. To resolve the case if a table entry holds a pointer or an actual symbol value, we've added the value NT to all table indices before saving it as pointers. Therefore, in a table entry if the pointer value is smaller than NT, we know that it is a terminal node. The following piece of code shows how to decode such a table that represents a Huffman tree; a left mask and right mask are used to extract the correct pointer value.
//N = # of symbols
left_mask = 0xFF00; //to extract upper byte ( left child )
right_mask = 0x00FF; //to extract lower byte ( right child )
loc = ( N - 1 ) + N; //start from root; add offset N
// to distinguish pointers from symbols
do {
loc0 = loc - N; //loc0 is real table location
if ( read_one_bit( fp ) == 0 ){ //a 0, go left
loc = ( htree[loc0] & left_mask ) >> 8;
} else{
loc = htree[loc0] & right_mask;
}
} while ( loc >= N ); //traverse until reaches leaf
return loc; //symbol value = loc
|
The Huffman coding process has a disadvantage that the statistics of the occurrence of the symbols must be known ahead of the encoding process. Though we do not need to transmit the probability table to the decoder, we do need it before we can do any encoding. The probability table for a large video cannot be calculated until after the video data have been processed which may introduce unacceptable delay into the encoding process. Because of these, practical video coding standards define sets of codewords based on the probability distributions of generic video data. The following example is a pre-calculated Huffman table taken from MPEG-4 Visual ( Simple Profile ), which uses 3D run-level coding discussed above to encode quantized coefficients. A total of 102 specific combinations of ( run, level, last ) have variable-length codewords assigned to them and part of these are shown in Table 4. Each codeword can be up to 13 bits long and the last bit is the sign bit 's', which indicates if the decoded coefficient is positive ( 0 ) or negative ( 1 ). Any (run, level, run ) combination that is not listed in the table is coded using an escape sequence; a special ESCAPE code of 0000011 is first transmitted followed by a 13-bit fixed-length codeword describing the values of run, level, and last.
| Table 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We shall use this table in our entropy-encoding stage. The full implementation of the Huffman encoding and decoding for our video codec will be discussed in the next section.