Huffman codes: examples, applications
Few people give much thought to how compression works. Personal computers have become far easier to use than they once were, and practically everyone who works with a file system uses archives, yet few users consider how archives work or on what principle files are compressed. Huffman codes were among the earliest practical approaches to this problem, and they are still used in various popular archivers. In this article we look at how the compression is done, which nuances help to speed up and simplify the encoding process, and how a coding tree is constructed.
History of the algorithm
One of the first algorithms for the efficient coding of electronic information was the code proposed by David Huffman in the middle of the twentieth century, namely in 1952. It remains a basic building block of many programs created to compress information. Among the best-known users of this code are the ZIP, ARJ and RAR archive formats and many others. The Huffman algorithm is also used in the compression of JPEG images and other graphic objects, and the coding used by standard fax machines is likewise based on it. Although a great deal of time has passed since the code was created, it is still used today in the newest software and on both old and modern equipment.
The principle of efficient coding
The Huffman algorithm is based on a scheme that replaces the most probable, most frequently encountered symbols with the shortest binary codes, while less common symbols are replaced with longer codes. A longer code is assigned only when no shorter one remains available. This technique minimizes the average code length per character over the original message as a whole. An important point is that the probability of occurrence of each letter that makes up the message must be known before encoding begins. Based on these data, the Huffman code tree is constructed, and the letters are then encoded in the archive according to it.
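As a rough illustration of why shorter codes for frequent symbols pay off, the sketch below encodes a short message with a small variable-length code table and compares the result with a fixed-length encoding. The table `CODE_TABLE`, the sample message and the helper names `encode` and `decode` are assumptions made for this example only. The table is chosen so that no code is the beginning of another, which is what lets the bit string be decoded without separators.

```python
# Hypothetical code table: the most frequent symbol gets the shortest code.
# No code is a prefix of another, so decoding is unambiguous.
CODE_TABLE = {"a": "0", "b": "10", "c": "11"}


def encode(message, table):
    """Concatenate the code of every symbol in the message."""
    return "".join(table[ch] for ch in message)


def decode(bits, table):
    """Read bits one at a time and emit a symbol whenever a full code is seen."""
    reverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


message = "aaaabbc"                    # 'a' occurs 4 times, 'b' twice, 'c' once
bits = encode(message, CODE_TABLE)     # 4*1 + 2*2 + 1*2 = 10 bits
fixed_length_bits = 2 * len(message)   # 3 symbols need 2 bits each: 14 bits

print(len(bits), fixed_length_bits)
print(decode(bits, CODE_TABLE) == message)
```

With this particular message the variable-length encoding takes 10 bits instead of 14. The gain depends entirely on how uneven the symbol frequencies are.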
Huffman's code, example
To illustrate the algorithm, let us take a graphical view of how a code tree is constructed. For the method to be understood, a few terms need to be defined first. A set of nodes together with arcs directed from node to node is usually called a graph. A tree is a graph with the following properties:
- no more than one arc may enter each node;
- one of the nodes must be the root of the tree, that is, no arc enters it at all;
- starting from the root and moving along the arcs, it must be possible to reach any of the nodes.
Another concept used in Huffman codes is the leaf of a tree: a node from which no arc leaves. If two nodes are connected by an arc, one of them is the parent and the other the child: the arc leaves the parent and enters the child. Two nodes with the same parent are usually called sibling nodes. If every node other than the leaves has exactly two outgoing arcs, the tree is called binary. The Huffman tree is exactly such a tree. A peculiarity of its nodes is that the weight of each parent equals the sum of the weights of all its children.
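The terms above (leaf, parent, child, weight) can be modelled with a small data structure. This is only a sketch: the class name `Node`, its fields and the helper `make_parent` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int                      # weight of the node
    symbol: Optional[str] = None     # set only for leaves
    left: Optional["Node"] = None    # child reached by one outgoing arc
    right: Optional["Node"] = None   # child reached by the other outgoing arc

    @property
    def is_leaf(self) -> bool:
        """A leaf is a node that no arc leaves."""
        return self.left is None and self.right is None


def make_parent(left: Node, right: Node) -> Node:
    """Create a parent whose weight is the sum of its children's weights."""
    return Node(weight=left.weight + right.weight, left=left, right=right)


a = Node(weight=5, symbol="a")
b = Node(weight=2, symbol="b")
parent = make_parent(a, b)   # a and b are now sibling nodes
print(parent.weight, parent.is_leaf, a.is_leaf)
```

Here `a` and `b` are leaves and siblings, and `parent.weight` is 7, the sum of the children's weights.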
Algorithm for constructing a tree according to Huffman
The Huffman code is built from the letters of the input alphabet. First, a list of the free nodes of the future code tree is created, one node per letter. The weight of each node in this list must equal the probability of occurrence of the corresponding letter in the message. Next, the two free nodes with the smallest weights are selected. If several nodes share the minimum weight, any pair of them may be chosen. A parent node is then created whose weight equals the sum of the weights of this pair. The parent is added to the list of free nodes, and the two children are removed from it. The two arcs leaving the parent are labelled, one with 0 and the other with 1. This process is repeated until only one free node remains, which becomes the root of the tree. The code of each letter is then read from top to bottom, as the sequence of labels on the path from the root to that letter's leaf.
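The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It keeps the free nodes in a priority queue (`heapq`) rather than a plain list, which is an implementation choice of this sketch. The function name `build_huffman_codes`, the tuple-based tree representation and the sample weights are all assumptions.

```python
import heapq
from itertools import count


def build_huffman_codes(weights):
    """Return {symbol: bit string} for a dict mapping symbol -> weight."""
    if not weights:
        return {}
    tiebreak = count()  # unique counter so equal weights never compare trees
    # Free nodes: (weight, tiebreak, tree). A tree is ("leaf", symbol)
    # or ("node", left_subtree, right_subtree).
    heap = [(w, next(tiebreak), ("leaf", sym)) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption of this sketch: a single-letter alphabet gets the code "0".
        return {heap[0][2][1]: "0"}
    while len(heap) > 1:
        # Take the two free nodes with the smallest weights...
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # ...and replace them with a parent weighing their sum.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), ("node", t1, t2)))

    codes = {}

    def walk(tree, prefix):
        """Read labels from the root down: 0 for one arc, 1 for the other."""
        if tree[0] == "leaf":
            codes[tree[1]] = prefix
        else:
            walk(tree[1], prefix + "0")
            walk(tree[2], prefix + "1")

    walk(heap[0][2], "")
    return codes


codes = build_huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
for symbol, code in sorted(codes.items()):
    print(symbol, code)
```

For these sample weights the code lengths come out as 1 bit for `a`, 2 bits for `b`, and 3 bits each for `c` and `d`. Which arc receives 0 and which receives 1 is arbitrary, so the exact bit patterns may differ between implementations while the lengths stay the same.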
Improving compression efficiency
To increase compression efficiency, the code tree should be built from the letter probabilities of the particular file being compressed, rather than from statistics averaged over a large number of text documents. A preliminary pass through the file is enough to count how often each letter occurs in the object to be compressed.
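The preliminary pass over the data can be as simple as the following sketch, which counts occurrences with `collections.Counter` from the Python standard library. The function name `count_symbols` and the sample string are assumptions for this example. For a real file the same call could be applied to the bytes read from it.

```python
from collections import Counter


def count_symbols(data):
    """First pass: count how often each symbol occurs in the data."""
    return Counter(data)


freqs = count_symbols("abracadabra")
for symbol, n in freqs.most_common():
    print(symbol, n)
```

The resulting counts are specific to this one input, which is the point: the tree is then fitted to the data actually being compressed.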
Acceleration of the compression process
To speed up the algorithm, letters should be weighted not by their probability of occurrence but by their frequency, that is, the number of times each letter occurs. This makes the algorithm simpler and considerably faster, and it avoids floating-point operations and division. The resulting Huffman code, and indeed the algorithm itself, does not change in any way, because the probabilities are directly proportional to the frequencies. Note that the final weight of the root node will then equal the total number of letters in the object being processed.
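A small experiment can make this concrete. The sketch below computes only the code lengths, once from integer counts and once from the corresponding probabilities, and also returns the weight of the root. The function name `code_lengths` and the sample counts are assumptions, and the function expects a non-empty alphabet.

```python
import heapq
from itertools import count


def code_lengths(weights):
    """Return ({symbol: code length}, root weight) for a non-empty weight dict."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths, heap[0][0]


counts = {"a": 5, "b": 2, "c": 1, "d": 1}
total = sum(counts.values())
probabilities = {sym: n / total for sym, n in counts.items()}

lengths_from_counts, root_from_counts = code_lengths(counts)
lengths_from_probs, root_from_probs = code_lengths(probabilities)

print(lengths_from_counts == lengths_from_probs)
print(root_from_counts, total)
```

Both runs give the same code lengths, and with integer counts the root weight is 9, the total number of letters. With probabilities the root weight is 1, up to floating-point rounding.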
Huffman codes are a simple, long-established algorithm that is still used by many well-known programs and companies. Its simplicity and clarity make it possible to compress files of any size effectively and to considerably reduce the space they occupy on disk. In other words, the Huffman algorithm is a long-studied and well-designed scheme whose relevance has not diminished to this day. And because it reduces file sizes, transferring files over a network or by other means becomes simpler, faster and more convenient. The compression is lossless: any information can be encoded and restored exactly, with no harm to its structure or quality, although the size reduction depends on how unevenly the symbols are distributed. Huffman coding was, and remains, one of the most popular and relevant methods of reducing file size.