| Games and SDL SDL Installation SDL for Embedded SDL API SDL Events | SDL Graphics SDL Threads Thread Example SDL Animation SDL Sound | Raw Video Player Video Formats Video Compression | Game Trees About The Author |
Video Compression
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
In previous sections, we discussed that by transforming the representation of an image from RGB to 4:2:0 YCbCr format, which separates the intensity from color components, we can easily compress an image by a factor of two without much downgrading of the image quality. The underlying principles in this stage of compression is that human eyes are more sensitive to brightness than to color and in practice, we do not need to retain as much information of the color components as we need in presenting an image. However, even after this change in format, we still represent the image in the spatial domain, where a sample value depends on the positon of the two-dimensional space; that is, it is a function of the coordinates (x, y) of a two-dimensional plane. Our eyes do not have any crucial discrimination of a point at any special position in space and thus it is difficult for us to further pick the irrelevant information and throw it away if we need to. On the other hand, if we could represent the image in the frequency domain, we know that our eyes are not very sensitive to high frequency components and if we have to get rid of any information, we would like to get rid of those components first. It turns out that we can indeed transform an image from the spatial domain to the frequency domain with two well-studied and researched methods called Discrete Cosine Transform ( DCT ) and Discrete Wavelet Transform (DWT). Researchers have found that in most cases DWT works best for static image compression while DCT works best for video compression. One characteristics of DCT is that it has good 'enery' compact capability, meaning that it clusters information into a small region. After DCT, the next step in compressing an image is to quantize the DCT coefficients and reorder the quantized coefficients to make them easier to compress. We summarize the compression steps that we have been discussing in the following figure.
The Discrete Cosine Transform ( DCT ) is a derivation of Fourier analysis, which is often used to decompose continuous, periodic signals into frequency components. The Discrete Fourier Transform ( DFT ), makes use of sine and cosine basis functions to deal with sampled signals. DCT only has cosine basis, which is an even function, meaning that cos ( -x ) = cos ( x ). It turns out that DCT is effective in handling images. In image processing, DCT operates on X, a block of N x N image samples to create an N x N block of coefficients Y. A coefficient Y(u, v) of Y at coordinates ( u, v ) can be calculated from sample values X(i, j) of X by the following formula.
DCT:
| Y(u, v) = auav |
N-1
i=0 |
N-1
j=0 |
X(i, j) cos |
| cos |
|
( 10.1 ) |
The samples values X(i, j) can be recovered from Y(u, v) with Inverse DCT ( IDCT ) according to the following formula.
IDCT ( Inverse DCT ) :
| X(i, j) = |
N-1
u=0 |
N-1
v=0 |
auavY(u, v) cos |
| cos |
|
( 10.2 ) |
The DCT and IDCT can be also expressed in matrix form as follows.
DCT:
IDCT:
where X is a matrix of samples, Y is a matrix of coefficients, and AT is the transpose of A = an N x N transform matrix whose elements are:
| Aui = | aucos |
|
For example, when N = 4, the transform matrix is given by
| A = | |
a a a a
b c -c -b a -a -a a c -b b -c |
|
where
b = (1/√2) cos ( π/8) c = (1/√2) cos ( 3π/8) |
Evaluating the cosines gives
| A = | |
|
|
| Y = | |
a a a a b c -c -b a -a -a a c -b b -c |
|
.X. | |
a b a c a c -a -b a -c -a -b a -b a -c |
|
| Y = | |
1 1 1 1 1 d -d -1 1 -1 -1 1 d -1 1 -d |
|
.X. | |
1 1 1 d 1 d -1 -1 1 -d -1 1 d -1 1 -d |
|
⊗ | |
a2 ab a2 ab ab b2 ab b2 a2 ab a2 ab ab b2 ab b2 |
|
As we discussed in previous sections, our sample blocks are 8x8. So in the following discussions of DCT, we consider N = 8 most of the time.
A direct implementation of DCT and IDCT is very simple and easy. All we need to do is to have two for-loops to do summations. However, such an implementation is inefficient and impractical in image or video compression, which require numerous operations of DCT and IDCT. Nevertheless, to get a feeling of how DCT works, we present dct_direct.cpp below, which uses a direct implementation of DCT and IDCT with floating point computation. In the program, the functions dct_direct() and idct_direct() do the actual work of DCT, and IDCT; they use floating point ( double ) in calculations; the functions dct() and idct() simply cast short values to double and call dct_direct() or idct_direct() to do the transformations. This program can be used for checking when we later implement DCT and IDCT using other more efficient methods.
/*
dct_direct.cpp
A straight forward implementation of DCT and IDCT for the purpose of learning and testing.
Floating-point arithmetic is used. Such an implementation should not be used
in practical applications.
Compile: g++ -o dct_direct dct_direct.cpp -lm
Execute: ./dct_direct
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
using namespace std;
#define PI 3.141592653589
//input: X, N; output: Y
short dct_direct( short N, double *X, double *Y )
{
double a[32], sum, coef;
short i, j, u, v;
if ( N > 32 || N <= 0 ) {
printf ("\ninappropriate N\n");
return -1;
}
a[0] = sqrt ( 1.0 / N );
for ( i = 1; i < N; ++i ) {
a[i] = sqrt ( 2.0 / N );
}
for ( u = 0; u < N; ++u ) {
for ( v = 0; v < N; ++v ) {
sum = 0.0;
for ( i = 0; i < N; ++i ) {
for ( j = 0; j < N; ++j ) {
coef = cos ((2*i+1)*u*PI / (2*N)) * cos ((2*j+1)*v*PI/(2*N));
sum += *(X+i*N+j) * coef; //X[i][i] * coef
} //for j
*(Y+u*N+v) = a[u] * a[v] * sum;
} //for i
} //for u
} //for v
return 1;
}
//input: N, Y; output X
short idct_direct( short N, double *Y, double *X )
{
double a[32], sum, coef;
short i, j, u, v;
if ( N > 32 || N <= 0 ) {
printf ("\ninappropriate N\n");
return -1;
}
a[0] = sqrt ( 1.0 / N );
for ( i = 1; i < N; ++i ) {
a[i] = sqrt ( 2.0 / N );
}
for ( i = 0; i < N; ++i ) {
for ( j = 0; j < N; ++j ) {
sum = 0.0;
for ( u = 0; u < N; ++u ) {
for ( v = 0; v < N; ++v ) {
coef = cos ((2*j+1)*v*PI/(2*N)) * cos ((2*i+1)*u*PI / (2*N));
sum += a[u] * a[v] *( *(Y+u*N+v) ) * coef; //a[u] * a[v] * Y[u][v] * coef
} //for j
*(X+i*N+j) = sum;
} //for i
} //for u
} //for v
return 1;
}
/*
change values from short to double and vice versa.
*/
short dct ( short N, short *X, short *Y )
{
double tempx[1024], tempy[1024];
int total, i;
if ( N > 32 || N <= 0 ) {
printf ("\ninappropriate N\n");
return -1;
}
total = N * N;
for ( i = 0; i < total; ++i ) {
tempx[i] = (double) *(X+i);
}
dct_direct ( N, tempx, tempy ); //DCT operation
for ( i = 0; i < total; ++i ) {
*(Y+i) = (short ) ( floor (tempy[i]+0.5) );
}
return 1;
}
/*
change values from short to doulbe, and vice versa.
*/
short idct ( short N, short *Y, short *X )
{
double tempx[1024], tempy[1024];
int total, i;
if ( N > 32 || N <= 0 ) {
printf ("\ninappropriate N\n");
return -1;
}
total = N * N;
for ( i = 0; i < total; ++i ) {
tempy[i] = (double) *(Y+i);
}
idct_direct ( N, tempy, tempx ); //IDCT operation
for ( i = 0; i < total; ++i ) {
*(X+i) = (short ) ( floor (tempx[i]+0.5) ); //rounding
}
return 1;
}
void print_elements ( short N, short *X )
{
short i, j;
for ( i = 0; i < N; ++i ){
printf("\n");
for ( j = 0; j < N; ++j ) {
printf ("%4d, ", *(X+N*i+j) );
}
}
}
int main()
{
short X[8][8], Y[8][8];
int i, j, x, y, N;
char temp[8][8];
N = 8;
//try some values for testing
for ( i = 0; i < N; ++i ) {
for ( j = 0; j < N; ++j ) {
X[i][j] = i + j;
}
}
printf("\nOriginal sample values");
print_elements ( N, &X[0][0] );
printf("\n--------------------\n");
dct ( N, &X[0][0], &Y[0][0] ); //performing DCT
printf("\nCoefficients of DCT:");
print_elements ( N, &Y[0][0] );
printf("\n--------------------\n");
idct ( N, &Y[0][0], &X[0][0] ); //performing IDCT
printf("\nValues recovered by IDCT:");
print_elements ( N, &X[0][0] );
printf("\n");
}
|
The following is the output from the program.
Original sample values 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 12, 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14, -------------------- Coefficients of DCT: 56, -18, 0, -2, 0, -1, 0, 0, -18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -------------------- Values recovered by IDCT: 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10, 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 12, 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14, |
You can see from the output that there are only a few nonzero coefficient values after DCT and they are clustered at the upper left corner. So after DCT, it becomes clear to us that the sample block X actually does not contain as much information as it appears and it is a lot easier to carry out data compression in the transformed domain.
The value of the DCT coefficient at position (0, 0), Y(0, 0) is in general referred to as the DC value and others are referred to as AC values. This is because Y(0, 0) is essentially the average of all the sample values and you can easily see this if you write down the formula for calculating its value ( note that a0 * a0 = 1 / N, and cos ( 0 ) = 1. ):
| Y(0, 0) =( 1/N ) |
N-1
i=0 |
N-1
j=0 |
X(i, j) |
As DCT is so important in signal processing, a lot of research has been done to speed up its calculations. One main idea behind fast DCT is to break down the summation into stages and in each stage, intermediates sums of two quantities are formed, which will be used in later stages to obtain the final sum. If we consider only small values of N in the form of N = 2n, it is not difficult to understand how this is done. For instance, consider N = 8, which is what we need in our video compression. We can rewrite ( 10.1 ) as follows.
| Y(u, v) = auav |
7
i=0 |
7
j=0 |
X(i, j) cos |
| cos |
|
( 12.1 ) |
| Y(u, v) = auav |
7
i=0 |
yiv cos | ( (2i+1)uπ/16 ) | ( 12.2 ) |
where
| yiv = |
7
j=0 |
X(i, j) cos | ( (2j+1)vπ/16 ) | ( 12.3 ) |
| x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | |
|---|---|---|---|---|---|---|---|---|
| y0 = | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| y1 = | cos (1θ) | cos (3θ) | cos (5θ) | cos (7θ) | cos (9θ) | cos (11θ) | cos (13θ) | cos (15θ) |
| y2 = | cos (2θ) | cos (6θ) | cos (10θ) | cos (14θ) | cos (18θ) | cos (22θ) | cos (26θ) | cos (30θ) |
| y3 = | cos (3θ) | cos (9θ) | cos (15θ) | cos (21θ) | cos (27θ) | cos (33θ) | cos (39θ) | cos (45θ) |
| y4 = | cos (4θ) | cos (12θ) | cos (20θ) | cos (28θ) | cos (36θ) | cos (44θ) | cos (52θ) | cos (60θ) |
| y5 = | cos (5θ) | cos (15θ) | cos (25θ) | cos (35θ) | cos (45θ) | cos (55θ) | cos (65θ) | cos (75θ) |
| y6 = | cos (6θ) | cos (18θ) | cos (30θ) | cos (42θ) | cos (54θ) | cos (66θ) | cos (78θ) | cos (90θ) |
| y7 = | cos (7θ) | cos (21θ) | cos (35θ) | cos (49θ) | cos (63θ) | cos (77θ) | cos (91θ) | cos (105θ) |
Since θ = π/16, we have 16θ = π; by making use of some basic cosine properties such as cos ( π - α ) = -cos ( α ) and cos ( 2π - α ) = cos ( α ), we can always reduce cos ( nθ ) when n ≥ 8 to a form of ±cos ( kθ ) with k < 8. For example,
Table 1 can then be simplified to Table 2 shown below.
| x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | |
|---|---|---|---|---|---|---|---|---|
| y0 = | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| y1 = | cos (1θ) | cos (3θ) | cos (5θ) | cos (7θ) | -cos (7θ) | -cos (5θ) | -cos (3θ) | -cos (1θ) |
| y2 = | cos (2θ) | cos (6θ) | -cos (6θ) | -cos (2θ) | -cos (2θ) | -cos (6θ) | cos (6θ) | cos (2θ) |
| y3 = | cos (3θ) | -cos (7θ) | -cos (1θ) | -cos (5θ) | cos (5θ) | cos (1θ) | cos (7θ) | -cos (3θ) |
| y4 = | cos (4θ) | -cos (4θ) | -cos (4θ) | cos (4θ) | cos (4θ) | -cos (4θ) | -cos (4θ) | cos (4θ) |
| y5 = | cos (5θ) | -cos (1θ) | cos (7θ) | cos (3θ) | -cos (3θ) | -cos (7θ) | cos (1θ) | -cos (5θ) |
| y6 = | cos (6θ) | -cos (2θ) | cos (2θ) | -cos (6θ) | -cos (6θ) | cos (2θ) | -cos (2θ) | cos (6θ) |
| y7 = | cos (7θ) | -cos (5θ) | cos (3θ) | -cos (1θ) | cos (1θ) | -cos (3θ) | cos (5θ) | -cos (7θ) |
We observe in Table 2 that there are some symmetries in the columns; besides the first row, whenever cos ( kθ ) appears in an i-th column, it also appears in another j-th column with i + j = 7. This implies that we can always group the i-th and j-th column together in our summing operations to compute ( xi ± xj ) cos ( kθ ). For example, we can rewrite y0 and y2 as follows, where we have abbreviated cos ( kθ ) as ck.
| y0 | = | ( x0 + x7 ) | + | ( x1 + x6 ) | + | ( x2 + x5 ) | + | ( x3 + x4 ) | |
| y2 | = | ( x0 + x7 )c2 | + | ( x1 + x6 )c6 | - | ( x2 + x5 )c6 | - | ( x3 + x4 )c2 |
| y0 | = | ( x'0 + x'3 ) | + | ( x'1 + x'2 ) | = | ( x''0 + x''1 ) |
| y2 | = | ( x'0 - x'3 )c2 | + | ( x'1 - x'2 )c6 | = | x''3c2 + x''2c6 |
It is obvious that y4 and y6 can be decomposed in the same way as y4 only involves c4 and y6 only involves c2 and c6. The other yk's ( y1, y3,y5,and y7 ) appear to be more difficult to decompose as four ci's are involved ( c1, c3,c5, and c7 ). However, it turns out that the coefficients can be expressed in terms of each other by making use of the cosine properties. For example,
Making use of these identities, we can similarly apply the decomposition steps to all the yi's. For example, we can express y1 as
| y1 | = | ( x0 - x7 )c1 + ( x1 - x6 )c3 + ( x2 - x5 )c5 + ( x3 - x4 )c7 |
| = | x'7c1 + x'6c3 + x'5c5 + x'4c7 | |
| = | x'7c1 + x'6( c1 + c7 ) / √2 + x'5 ( c1 - c7 ) / √2 + x'4c7 | |
| = | [ x'7 + ( x'6 + x'5 ) / √2 ] c1 + [x'4 + ( x'6 - x'5 ) / √2 ] c7 | |
| = | [ x'7 + x''6 ] c1 + [x'4 + x''5 ] c7 | |
| = | x'''7 c1 + x'''4 c7 |
|
c = ( a + b )
d = ( a - b ) |
( 1 ) |
If we include the constants ak in our calculation of yk and let
| y2 | = | x''2C6 + x''3C2 |
| = | [ x'1 - x'2 ]C6 + [ x'0 - x'3 ]C2 | |
| = | [ ( x1 + x6 ) - ( x2 + x5 ]C6 + [ ( x0 + x7 ) - ( x3 + x4 ) ]C2 |
The following piece of codes shows how y[k]'s are computed from x[k]'s.
for (j = 0; j < 4; j++) { //first stage transform, see flow-graph
j1 = 7 - j;
x1[j] = x[j] + x[j1];
x1[j1] = x[j] - x[j1];
}
x[0] = x1[0] + x1[3]; //second stage transform
x[1] = x1[1] + x1[2];
x[2] = x1[1] - x1[2];
x[3] = x1[0] - x1[3];
x[4] = x1[4];
x[5] = ( x1[6] - x1[5] ) * C0;
x[6] = ( x1[6] + x1[5]) * C0
x[7] = x1[7];
y[0] = ( x[0] + x[1] ) * C4; //upper-half of third (final) stage, see flow-graph
y[4] = ( x[0] - x[1] ) * C4;
y[2] = x[2] * C6 + x[3] * C2;
y[6] = x[3] * C6 - x[2] * C2;
x1[4] = x[4] + x[5]; //lower-half of third stage
x1[5] = x[4] - x[5];
x1[6] = x[7] - x[6];
x1[7] = x[7] + x[6];
y[1] = x1[4] * C7 + x1[7] * C1; //lower-half of fourth ( final ) stage
y[7] = x1[7] * C7 - x1[4] * C1;
y[5] = x1[5] * C3 + x1[6] * C5;
y[3] = x1[6] * C3 - x1[5] * C5;
|
This code works a lot faster than the direct implementation of DCT. Moreover, we can make further improvement if we use integer-arithmetic in the calculations.
In practice, we need to use integer arithmetic to implement a Fast DCT as that will speed up the code significantly. To see how this works, without loss of generality, let us consider 16-bit integers and consider a simple example with a number x = 2.75. We cannot express this number as an integer directly, but we can imagine that there is a binary point at the right side of bit 0 of a 16-bit integer. In this imaginary format, x can be represented as a bit vector with imaginary digits ( i.e. 2.75 = 21 + 2-1 + 2-2 ) depicted in the following diagram.
| bit position: | 9 8 7 6 5 4 3 2 1 0 | |
| x = | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 | .1 1 |
Of course, a 16-bit integer does not have the part shown in pink. To retain the information of the part in pink, we can shift the whole thing left by 8 bits, which is the same as multiplying 2.75 by 256 ( = 28 ). After this shift, the binary point will be at position between bit 8 and bit 7 and the number becomes an integer shown below.
| bit position: | 9 8 7 6 5 4 3 2 1 0 |
| x' = | 0 0 0 0 0 0 1 0 . 1 1 0 0 0 0 0 0 |
The value of x becomes
which is the same as 2.75 x 256. If we now divide x' by 256, or right-shift it by 8 bits, we obtain the integer 2; that is, we have truncated the pink digits, the fraction part of 2.75. In many cases, we would prefer to obtain a value of 3 after the shift operations. Such an operation is the same as a round operation. We can achieve this by first adding 0.5 to 2.75 before the truncation. If we express 0.5 in our imaginary format and left-shift it by 8 bits, we obtain the value 128 ( = 27 ). So adding 0.5 to x corresponds to the action of adding 27 to x'. This is illustrated in the following figure.
| bit position: | 9 8 7 6 5 4 3 2 1 0 |
| x' = | 0 0 0 0 0 0 1 0 . 1 1 0 0 0 0 0 0 |
| + 0.5 x 27 = | 0 0 0 0 0 0 0 0 . 1 0 0 0 0 0 0 0 |
| ----------------------------------------------------- | |
| x'' = | 0 0 0 0 0 0 1 1 . 0 1 0 0 0 0 0 0 |
Obviously, when we right-shift x'' by 8 bits, we obtain our desired value 3. So in general, if we multiply real positive numbers by 2n to convert them to integers and we want to achieve the rounding effect, we can add the value 2n-1 to the results before shifting them right n bits. The above discussion covers positive numbers. How are about negative numbers? Should we still add 0.5 to the a number or should we subtract 0.5 from it before truncation? The following example tells us what we should do.
Most modern-day computers use two's complement to represent negative numbers. Binary numbers that can have negative values are referred to as signed numbers, otherwise they are referred to as unsigned numbers. In two's complement representation, if we right-shift an unsigned number one bit, a 0 is always shifted into the leftmost bit position. However, if we right-shift a negative signed-number, a 1 is shifted in. For example, consider the following piece code.
int main()
{
char sc = 0x82; //8-bit signed number
unsigned uc = 0x82; //8-bit unsigned number
sc >>= 1; //right-shift one bit
uc >>= 1; //right-shift one bit
printf("\nsigned shift: 0x%x", sc );
printf("\nunsigned shift: 0x%x", uc );
}
|
signed shift: 0xc1 unsigned shift: 0x41 |
Now consider the negative signed-number x = -2.75. Its 2s complement representation can be obtained by complementing all bits of the corresponding positive number ( 2.75 ) and adding 1 to the right most bit. Thus it has the following binary form.
| bit position: | 9 8 7 6 5 4 3 2 1 0 | |
| x = | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 | .0 1 |
| bit position: | 9 8 7 6 5 4 3 2 1 0 |
| x' = | 1 1 1 1 1 1 0 1 . 0 1 0 0 0 0 0 0 |
| + 0.5 x 27 = | 0 0 0 0 0 0 0 0 . 1 0 0 0 0 0 0 0 |
| ----------------------------------------------------- | |
| x'' = | 1 1 1 1 1 1 0 1 . 1 1 0 0 0 0 0 0 |
| bit position: | 9 8 7 6 5 4 3 2 1 0 |
| x'' >> 8 = | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 |
When we add two numbers using integer arithmetic, we must align the binary points of the two numbers before addition; if the first number has been left-shifted by n bits, the second one must also be shifted by the same amount. In our implementation of Fast DCT, we use 32-bit integers; all the coefficients Ci is pre-multiplied by 2048 ( = 211 ). After the calculations, we right-shift the integers to obtain the final results. The code of the implementation of Fast DCT is listed below.
#include <string.h>
#include <stdio.h>
#include <math.h>
#define PI 3.141592653589
const short shift = 10; //10 bits precision for fixed-point arithmetic
const short shift1 = 2 * shift; //at the final stage, values have been shifted twice
const int fac = 1 << shift ; //multiply all constants by 2^10 ( = 1024 )
const int delta = 1 << (shift-1) ; //for rounding adjustment ~ 0.5x2^10
const int delta1 = 1 << (shift1-1); //for final rounding adjustment ~ 0.5x2^20
const double a = PI / 16.0; //angle theta
//DCT constants; use integer-arithmetic.
const int c0 = (int) ( 1 / sqrt ( 2 ) * fac );
const int c1 = (int) ( cos ( a ) / 2 * fac );
const int c2 = (int) ( cos ( 2*a ) / 2 * fac );
const int c3 = (int) ( cos ( 3*a ) / 2 * fac );
const int c4 = (int) ( cos ( 4*a ) / 2 * fac );
const int c5 = (int) ( cos ( 5*a ) / 2 * fac );
const int c6 = (int) ( cos ( 6*a ) / 2 * fac );
const int c7 = (int) ( cos ( 7*a ) / 2 * fac );
/*
DCT function.
Input: X, array of 8x8, containing data with values in [0, 255].
Ouput: Y, array of 8x8 DCT coefficients.
*/
void dct(int *X, int *Y)
{
int i, j, j1, k;
int x[8], x1[8], m[8][8];
/*
Row transform
i-th row, k-th element
*/
for (i = 0, k = 0; i < 8; i++, k += 8) {
for (j = 0; j < 8; j++)
x[j] = X[k+j]; //data for one row
for (j = 0; j < 4; j++) { //first stage transform, see flow-graph
j1 = 7 - j;
x1[j] = x[j] + x[j1];
x1[j1] = x[j] - x[j1];
}
x[0] = x1[0] + x1[3]; //second stage transform
x[1] = x1[1] + x1[2];
x[2] = x1[1] - x1[2];
x[3] = x1[0] - x1[3];
x[4] = x1[4];
x[5] = ((x1[6] - x1[5]) * c0 + delta ) >> shift; //after multiplication, add delta for rounding,
x[6] = ((x1[6] + x1[5]) * c0 + delta ) >> shift; // shift-right to undo 'x fac' to line up
x[7] = x1[7]; // binary points of all x[i]
m[i][0] = (x[0] + x[1]) * c4; //upper-half of third (final) stage, see flow-graph
m[i][4] = (x[0] - x[1]) * c4;
m[i][2] = x[2] * c6 + x[3] * c2;
m[i][6] = x[3] * c6 - x[2] * c2;
x1[4] = x[4] + x[5]; //lower-half of third stage
x1[5] = x[4] - x[5];
x1[6] = x[7] - x[6];
x1[7] = x[7] + x[6];
m[i][1] = x1[4] * c7 + x1[7] * c1; //lower-half of fourth ( final ) stage
m[i][7] = x1[7] * c7 - x1[4] * c1;
m[i][5] = x1[5] * c3 + x1[6] * c5;
m[i][3] = x1[6] * c3 - x1[5] * c5;
} //for i
/*
At this point, coefficients of each row ( m[i][j] ) has been multiplied by 2^10.
We can undo the multiplication by << 10 here before doing the column transform.
However, as we are using int variables, which are 32-bit to do multiplications, we
can tolerate another multiplication of 2^10. So we delay our undoing until the end
of the vertical transform and we undo all left-shift operations by shifting
the results right 20 bits ( i.e. << 2 * 10 ).
*/
// Column transform
for (i = 0; i < 8; i++) { //eight columns
//consider one column
for (j = 0; j < 4; j++) { //first-stage operation
j1 = 7 - j;
x1[j] = m[j][i] + m[j1][i];
x1[j1] = m[j][i] - m[j1][i];
}
//second-stage operation
x[0] = x1[0] + x1[3];
x[1] = x1[1] + x1[2];
x[2] = x1[1] - x1[2];
x[3] = x1[0] - x1[3];
x[4] = x1[4];
//undo one shift for x[5], x[6] to avoid overflow
x1[5] = (x1[5] + delta) >> shift;
x1[6] = (x1[6] + delta) >> shift;
x[5] = (x1[6] - x1[5]) * c0;
x[6] = (x1[6] + x1[5]) * c0;
x[7] = x1[7];
m[0][i] = (x[0] + x[1]) * c4; //upper-half of third (final) stage, see flow-graph
m[4][i] = (x[0] - x[1]) * c4;
m[2][i] = ( x[2] * c6 + x[3] * c2 ) ;
m[6][i] = ( x[3] * c6 - x[2] * c2 );
x1[4] = x[4] + x[5]; //lower-half of third stage
x1[7] = x[7] + x[6];
x1[5] = x[4] - x[5];
x1[6] = x[7] - x[6];
m[1][i] = x1[4] * c7 + x1[7] * c1; //lower-half of fourth stage
m[5][i] = x1[5] * c3 + x1[6] * c5;
m[3][i] = x1[6] * c3 - x1[5] * c5;
m[7][i] = x1[7] * c7 - x1[4] * c1;
} // for i
//we have left-shift (multiplying constants) twice
for ( i = 0; i < 8; ++i ) {
for ( j = 0; j < 8; ++j ) {
*Y++ = ( m[i][j] + delta1 ) >> shift1; //round the value by adding delta
}
}
}
|
Once we have written the code for DCT, the implementation of IDCT becomes easy. All we need to do is to reverse the steps in the DCT program. In the flow graph shown in Figure 2, when we traverse from left to right, we obtain the DCT; if we traverse from right to left, we obtain the IDCT. To understand why this is so, lets examine a butterfly computation, where we obtain (c, d) from (a, b). If we reverse the direction of the butterfly flow-graph, going from right to left, we recover (a, b) from (c, d) by
|
a = ( c + d ) / 2
b = ( c - d ) / 2 |
( 2 ) |
|
c = aCi - bCj
d = aCj + bCi |
( 3 ) |
|
( 4 ) |
You may now recognize that the operation of (4) is a rotation of iθ on a plane about the origin, which rotates the point (a, b) to the point (c, d). Obviously, the inverse of such a transformation is a rotation of -iθ which will bring the point (c, d) back to (a, b). That is,
|
( 5 ) |
|
a = cCi + dCj
b = -cCj + dCi |
( 6 ) |
/*
dct_video.cpp
An implementation of 8x8 DCT and IDCT for video compression.
32-bit integer-arithmetic is assumed.
For DCT operation:
dct ( int *X, int *Y );
X is the input pointing to an 8x8 array of samples; values must be within [0, 255]
Y is the output pointing to an 8x8 array of DCT coefficients.
For IDCT operation:
idct ( int *Y, int *X );
Y is the input pointing to an 8x8 array of DCT coefficients.
X is the output pointing to an 8x8 array of sample values.
For more detailed explanations, see
http://www.webkinesia.com/games/vcompress3.php
compile: g++ -c dct_video.cpp
To use them, include the the following header statements in your application.
void dct(int *X, int *Y);
void idct(int *Y, int *X);
*/
#include <string.h>
#include <stdio.h>
#include <math.h>
#define PI 3.141592653589
const short shift = 10; //10 bits precision for fixed-point arithmetic
const short shift1 = 2 * shift; //at the final stage, values have been shifted twice
const int fac = 1 << shift ; //multiply all constants by 2^10 ( = 1024 )
const int delta = 1 << (shift-1) ; //for rounding adjustment ~ 0.5x2^10
const int delta1 = 1 << (shift1-1); //for final rounding adjustment ~ 0.5x2^20
const double a = PI / 16.0; //angle theta
//DCT constants; use integer-arithmetic.
const int c0 = (int) ( 1 / sqrt ( 2 ) * fac );
const int c1 = (int) ( cos ( a ) / 2 * fac );
const int c2 = (int) ( cos ( 2*a ) / 2 * fac );
const int c3 = (int) ( cos ( 3*a ) / 2 * fac );
const int c4 = (int) ( cos ( 4*a ) / 2 * fac );
const int c5 = (int) ( cos ( 5*a ) / 2 * fac );
const int c6 = (int) ( cos ( 6*a ) / 2 * fac );
const int c7 = (int) ( cos ( 7*a ) / 2 * fac );
/*
DCT function.
Input: X, array of 8x8, containing data with values in [0, 255].
Ouput: Y, array of 8x8 DCT coefficients.
*/
void dct(int *X, int *Y)
{
int i, j, j1, k;
int x[8], x1[8], m[8][8];
/*
Row transform
i-th row, k-th element
*/
for (i = 0, k = 0; i < 8; i++, k += 8) {
for (j = 0; j < 8; j++)
x[j] = X[k+j]; //data for one row
for (j = 0; j < 4; j++) { //first stage transform, see flow-graph
j1 = 7 - j;
x1[j] = x[j] + x[j1];
x1[j1] = x[j] - x[j1];
}
x[0] = x1[0] + x1[3]; //second stage transform
x[1] = x1[1] + x1[2];
x[2] = x1[1] - x1[2];
x[3] = x1[0] - x1[3];
x[4] = x1[4];
x[5] = ((x1[6] - x1[5]) * c0 + delta ) >> shift; //after multiplication, add delta for rounding,
x[6] = ((x1[6] + x1[5]) * c0 + delta ) >> shift; // shift-right to undo 'x fac' to line up
x[7] = x1[7]; // binary points of all x[i]
m[i][0] = (x[0] + x[1]) * c4; //upper-half of third (final) stage, see flow-graph
m[i][4] = (x[0] - x[1]) * c4;
m[i][2] = x[2] * c6 + x[3] * c2;
m[i][6] = x[3] * c6 - x[2] * c2;
x1[4] = x[4] + x[5]; //lower-half of third stage
x1[5] = x[4] - x[5];
x1[6] = x[7] - x[6];
x1[7] = x[7] + x[6];
m[i][1] = x1[4] * c7 + x1[7] * c1; //lower-half of fourth ( final ) stage
m[i][7] = x1[7] * c7 - x1[4] * c1;
m[i][5] = x1[5] * c3 + x1[6] * c5;
m[i][3] = x1[6] * c3 - x1[5] * c5;
} //for i
/*
At this point, coefficients of each row ( m[i][j] ) has been multiplied by 2^10.
We can undo the multiplication by << 10 here before doing the vertical transform.
However, as we are using int variables, which are 32-bit to do multiplications, we
can tolerate another multiplication of 2^10. So we delay our undoing until the end
of the vertical transform and we undo all left-shift operations by shifting
the results right 20 bits ( i.e. << 2 * 10 ).
*/
// Column transform
for (i = 0; i < 8; i++) { //eight columns
//consider one column
for (j = 0; j < 4; j++) { //first-stage operation
j1 = 7 - j;
x1[j] = m[j][i] + m[j1][i];
x1[j1] = m[j][i] - m[j1][i];
}
//second-stage operation
x[0] = x1[0] + x1[3];
x[1] = x1[1] + x1[2];
x[2] = x1[1] - x1[2];
x[3] = x1[0] - x1[3];
x[4] = x1[4];
//undo one shift for x[5], x[6] to avoid overflow
x1[5] = (x1[5] + delta) >> shift;
x1[6] = (x1[6] + delta) >> shift;
x[5] = (x1[6] - x1[5]) * c0;
x[6] = (x1[6] + x1[5]) * c0;
x[7] = x1[7];
m[0][i] = (x[0] + x[1]) * c4; //upper-half of third (final) stage, see flow-graph
m[4][i] = (x[0] - x[1]) * c4;
m[2][i] = ( x[2] * c6 + x[3] * c2 ) ;
m[6][i] = ( x[3] * c6 - x[2] * c2 );
x1[4] = x[4] + x[5]; //lower-half of third stage
x1[7] = x[7] + x[6];
x1[5] = x[4] - x[5];
x1[6] = x[7] - x[6];
m[1][i] = x1[4] * c7 + x1[7] * c1; //lower-half of fourth stage
m[5][i] = x1[5] * c3 + x1[6] * c5;
m[3][i] = x1[6] * c3 - x1[5] * c5;
m[7][i] = x1[7] * c7 - x1[4] * c1;
} // for i
//we have left-shift (multiplying constants) twice
for ( i = 0; i < 8; ++i ) {
for ( j = 0; j < 8; ++j ) {
*Y++ = ( m[i][j] + delta1 ) >> shift1; //round the value by adding delta
}
}
}
/*
Implementation of idct() is to reverse the operations of dct().
We first do vertical transform and then horizontal; this is easier
for debugging as the operations are just the reverse of those in dct();
of course, it works just as well if you do the horizontal transform first.
So in this implementation, the first stage of idct() is the final stage of dct()
and the final stage of idct() is the first stage of dct().
*/
void idct(int *Y, int *X)
{
int j1, i, j, k;
int x[8], x1[8], m[8][8], y[8];
k = i = 0;
//column transform
for ( i = 0; i < 8; ++i ) {
for (j = 0; j < 8; j++)
y[j] = Y[i+8*j];
//print_x ( y );
//getchar();
x1[4] = y[1] * c7 - y[7] * c1; //lower-half final stage of dct
x1[7] = y[1] * c1 + y[7] * c7;
x1[6] = y[3] * c3 + y[5] * c5;
x1[5] = -y[3] * c5 + y[5] * c3;
x[4] = ( x1[4] + x1[5] ); //lower-half of third stage of dct
x[5] = ( x1[4] - x1[5] );
x[6] = ( x1[7] - x1[6] );
x[7] = ( x1[7] + x1[6] );
x1[0] = ( y[0] + y[4] ) * c4; //upper-half of third ( final ) stage of dct
x1[1] = ( y[0] - y[4] ) * c4;
x1[2] = y[2] * c6 - y[6] * c2;
x1[3] = y[2] * c2 + y[6] * c6;
x[0] = ( x1[0] + x1[3] ); //second stage of dct
x[1] = ( x1[1] + x1[2] );
x[2] = ( x1[1] - x1[2] );
x[3] = ( x1[0] - x1[3] );
//x[4], x[7] no change
x1[5] = ((x[6] - x[5]) * c0 + delta ) >> shift; //after multiplication, add delta for rounding,
x1[6] = ((x[6] + x[5]) * c0 + delta ) >> shift; // shift-right to undo 'x fac' to line up x[]s
x[5] = x1[5];
x[6] = x1[6];
for (j = 0; j < 4; j++) { //first stage transform of dct
j1 = 7 - j;
m[j][i] = (x[j] + x[j1] );
m[j1][i] = (x[j] - x[j1] );
}
} //for i
//row transform
for ( i = 0; i < 8; i++ ) {
for (j = 0; j < 8; j++)
y[j] = m[i][j] ; //data for one row
x1[4] = y[1] * c7 - y[7] * c1;
x1[7] = y[1] * c1 + y[7] * c7;
x1[6] = y[3] * c3 + y[5] * c5;
x1[5] = -y[3] * c5 + y[5] * c3;
x[4] = ( x1[4] + x1[5] ); //lower-half of third stage
x[5] = ( x1[4] - x1[5] );
x[6] = ( x1[7] - x1[6] );
x[7] = ( x1[7] + x1[6] );
x1[0] = ( y[0] + y[4] ) * c4;
x1[1] = ( y[0] - y[4] ) * c4;
x1[2] = y[2] * c6 - y[6] * c2;
x1[3] = y[2] * c2 + y[6] * c6;
//undo one shift for x[5], x[6] to avoid overflow
x1[5] = (x[5] + delta) >> shift;
x1[6] = (x[6] + delta) >> shift;
x[5] = (x1[6] - x1[5]) * c0;
x[6] = (x1[6] + x1[5]) * c0;
//x[4], x[7] no change
x[0] = ( x1[0] + x1[3] ); //second stage transform
x[1] = ( x1[1] + x1[2] );
x[2] = ( x1[1] - x1[2] );
x[3] = ( x1[0] - x1[3] );
for (j = 0; j < 4; j++) { //first stage transform, see flow-graph
j1 = 7 - j;
m[i][j] = (x[j] + x[j1]);
m[i][j1] = (x[j] - x[j1]);
}
}
//we have left-shift (multiplying constants) twice
for ( i = 0; i < 8; ++i ) {
for ( j = 0; j < 8; ++j ) {
*X++ = ( m[i][j] + delta1 ) >> shift1; //round the value by adding delta
}
}
}
|
We supply test_dct.cpp listed below for you to test the DCT and IDCT routines in dct_video.cpp listed above. Note that the sample values must be within the range [0, 255].
/*
test_dct.cpp
For testing dct() and idct() routines in dct_video.cpp.
Compile: g++ -o test_dct dct_video.o test_dct.cpp -lm
Execute: ./test_dct
*/
#include <stdio.h>
#include <string.h>
using namespace std;
void dct(int *X, int *Y);
void idct(int *Y, int *X);
void print_block( int *X )
{
int k = 0;
for ( int i = 0; i < 8; ++i ){
printf("\n");
for ( int j = 0; j < 8; ++j ) {
printf("%d, ", X[k++] );
}
}
}
int main()
{
int X[64], Y[64], i, j, k;
//some sample data
k = 0;
for ( i = 0; i < 8; ++i )
for ( j = 0; j < 8; ++j )
X[k++] = ( 3 * i *j + j + 1 );
printf("\nOriginal Data:");
printf("\n ------------------");
print_block ( X );
dct( X, Y );
printf("\n\nData after dct:");
print_block ( Y );
bzero ( X, 256 ); //clear X
idct ( Y, X );
printf("\n\nData recovered by idct:");
printf("\n ------------------");
print_block ( X );
printf("\n");
return 0;
}
|
The following is the output when you execute test_dct.
Original Data: ------------------ 1, 2, 3, 4, 5, 6, 7, 8, 1, 5, 9, 13, 17, 21, 25, 29, 1, 8, 15, 22, 29, 36, 43, 50, 1, 11, 21, 31, 41, 51, 61, 71, 1, 14, 27, 40, 53, 66, 79, 92, 1, 17, 33, 49, 65, 81, 97, 113, 1, 20, 39, 58, 77, 96, 115, 134, 1, 23, 45, 67, 89, 111, 133, 155, Data after dct: 330, -210, 0, -22, 0, -6, 0, -2, -191, 124, 0, 13, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, -20, 13, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -6, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0, 1, 0, 0, 0, 0, Data recovered by idct: ------------------ 1, 2, 3, 4, 5, 6, 7, 8, 1, 5, 9, 13, 17, 21, 25, 29, 1, 8, 15, 22, 29, 36, 43, 50, 1, 11, 21, 31, 41, 51, 61, 71, 1, 14, 27, 40, 53, 66, 79, 92, 1, 17, 33, 49, 65, 81, 97, 113, 1, 20, 39, 58, 77, 96, 115, 134, 1, 23, 45, 67, 89, 111, 133, 155, |