***Welcome to ashrafedu.blogspot.com * * * This website is maintained by ASHRAF***

Tuesday 26 November 2019

Huffman codes


Huffman codes

Huffman codes compress data very effectively: savings of 20% to 90% are typical, depending on the 
characteristics of the data being compressed.

There are many options for how to represent such a file of information. 

Consider the problem of designing a binary character code in which each character is represented by
a unique binary string, called a code word. If the code word of fixed length code words are used, 
more bits are necessary to store compared to variable length code words.

Huffman codes uses Variable length code words for data compression. A variable-length code can 
do considerably better than a fixed-length code, by giving frequent characters short code words and 
infrequent characters long code words. These types of variable-length codes are called prefix codes. 
A prefix code can always achieve the optimal data compression among any character code. 
Prefix codes are desirable because they simplify decoding. Since no code word is a prefix of any other, the code
word that begins an encoded file is unambiguous.
Above figure is the tree corresponding to the fixed-length code of three bits. Each leaf is labeled with a character and its frequency of occurrence. Each internal node is labeled with the sum of the frequencies of the leaves in its sub tree.
Above figure is the tree corresponding to the optimal prefix code. An optimal code for a file is always represented by a full binary tree.
Binary code word for a character is the path from the root to that character where 0 means "go to left child"and 1 means "go to right child".

Ex: file contains "aabe" then the encoding is 0.0.101.1101, where '. ' denotes concatenation.  Decoding is also simple, since prefix code is used no other code word is prefix of any other code word.

Given a tree T corresponding to a prefix code, the number of bits required to encode a file can be easily computed. For each character ‘c’ in the alphabet C, let the attribute c.freq denote the frequency of ‘c’ in the file and let dT(c) denote the depth of c’s leaf in the tree.
The number of bits required to encode a file (cost of the tree T) is thus
Constructing a Huffman code

The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| - 1 “merging” operations to create the final tree. 
The algorithm uses a min-priority queue Q, keyed on the freq attribute, to identify the two least-frequent objects to merge together.













Line 2 initializes the min-priority queue Q with the characters in C. The for loop in lines 3–8 repeatedly extracts the two nodes x and y of lowest frequency from the queue, replacing them in the queue with a new node ‘z ‘representing their merger. The frequency of z is computed as the sum of the frequencies of x and y in line 7. The node z has x as its left child and y as its right child. After n - 1 mergers, line 9 returns the one node left in the queue, which is the root of the code tree.


The total running time of HUFFMAN on a set of n characters is O(n lg n).

Example:
Consider the characters in the file occur with the frequencies given by 
Arrange the frequencies in monotonically increasing order in a queue
Choose the two minimum frequencies and merge them
Now queue becomes
By repeating the same process, we get

choose two min frequencies and merge
By repeating the same process, we get

choose two min frequencies and merge













No comments:

Post a Comment