data compression link collection

Huffman Coding

David Huffman developed a form of encoding that creates the most efficient set of prefix codes for a given text. The ease with which Huffman codes can be created and used makes this still an extremely popular tool for compression code.

Huffman Coding Class

This version of file encoder and decoder program is based on the Huffman coding method. It explicitly demonstrates the details of the files during the encoding and decoding. The algorithm is encapsulated in a class En_Decode in standard C++.

* *      

Posted in June 6th, 2004

Basic Compression Library

Marcus Geelnard has created a batch of compression routines that you can plug and ply into your programs at will. Marcus is using the wonderfully open zlib license, which means thare are just about no reason you can’t use this code. The 1.0.5 added an LZ77 codec to the RLE, Huffman, and Rice coders

Satisfied user Todd W said: I needed a simple set of compression routines for use in an embedded system. I must be able to store a fair amount of information in a small EEPROM as a generic database. The Huffman coder works very well in the application and has met my needs exactly! Very nice!

* * * * *

Posted in May 14th, 2004

Compression and Encryption Sources

Links to a variety of lossless coders, includes source for Huffman, arithmetic, LZSS, and other compressors.

* * * * *

Posted in May 1st, 2004


BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file
compression tools available as commercial utilities.
Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par
with most of the commercial grade compression utilities.


Posted in April 11th, 2004

libhuffman - Huffman encoder/decoder library

libhuffman is a Huffman encoder/decoder library and a command line interface to the library. The encoder is a 2 pass encoder. The first pass generates a huffman tree and the second pass encodes the file. The decoder is one pass and uses a huffman code table at the beginning of the compressed file to decode the file.

Beta 3 shipped in October, 2003.

* * * *  

Posted in October 27th, 2003

Michael Dipperstein’s Huffman Code Page

Michael Dipperstein describes his personal quest for understanding an implementation of Huffman coding. Full source included.

The page was updated with new source December, 2002.

* * * * *

Posted in October 18th, 2003


A Huffman file compressor for *NIX platforms. The author admits that the compression is perhaps not the best, but says it’s going to be just right for embedded programming because of its miniscule footprint and high speed.

Version 1.01 was shipping in June, 2003.

* * *    

Posted in June 12th, 2003

Self-similar Huffman Trees with Extreme Guessing Propertie

An article by John Pliam. Needs a better summary, but I don’t quite get it, other than the fact that there is some cryptographic thinking going on here.


Posted in May 4th, 2003

C Library to search over compressed texts

A team from the University of Pisa in Italy have created a set of two libraries to demonstrate the ability to perform searches in compressed text. The code is packaged as two LGPL libraries: HuffwLib and CGrepLib, and sample programs are available on the site for demonstration purposes.

* * * * *

Posted in April 15th, 2003

K-ary Huffman Encoding

The definition from the NIST Dictionary of Algorithms and Data Structures.


Posted in April 5th, 2003

Shannon-Fano Coding

The definition from the NIST Dictionary of Algorithms and Data Structures.


Posted in April 5th, 2003

Canonical Huffman Coder Construction

The first time I ever heard the phrase Canonical Huffman Coder was in reference to the technique used in PKZip to store Huffman tables. I don’t know where the technique originated, but it is basically a way to construct a Huffman table so that the actual codes don’t have to be stored when storing the table. This makes for some nice space savings when compared to a first-pass naïve implementation. (Like the ones I’ve done in the past.)

It turns out that somebody named Gareth was attempting to implement this code but was having a bit of trouble. His post to comp.compression brought out some useful help from some of the newsgroup regulars, and did a lot to shed light on the topic, and includes a reference to the paper that actually gave birth to the concept.

* * * *  

Posted in March 13th, 2003

bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor

bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.

Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here


Posted in December 11th, 2002

McKee’s Directed Acyclic Graph Compression

Will McKee has released this as freeware - includes complete source to a string substitution compressor. From the description it sounds as though it’s variant on LZSS, but I’ll defer to anyone willing to do a real analysis.


Posted in December 9th, 2002

Will’s Huffman Demo

Will McKee wrote some Huffman code in C++. Take a look.

Update: Will reports that he has improved the documentation in this package, as well as adding a new function.

* * * *  

Posted in December 9th, 2002

Dynamic Huffman Coder

This dynamic Huffman coder from Karl Malbrain is written in C and includes weight scaling. It is modeled on the Vitter algorithm.

A user notes that this site has been undergoing continual changes, and perhaps would benefit from some sort of “last modified on” field.

* * * *  

Posted in November 20th, 2002

In-Place Calculation of Minimum-Redundancy Codes

The abstract for a paper on calculation of Huffman codes. The paper isn’t here, but the source code is. Alistair says that if you sort your array of counts, you can create the Canonical Huffman tree in memory.

* * * *  

Posted in October 31st, 2002

Text compression for embedded controllers

Some simple BASIC routines to compress data without extravagant data or code space. The author seems to indicate this isn’t Huffman coding, but doesn’t say what it is.


Posted in September 28th, 2002

A Method for the Construction of Minimum-Redundancy Codes

The famous paper by David Huffman, as originally published in the Proceedings of the I.R.E.

* * * * *

Posted in September 15th, 2002

UPL Compression : the complete professional toolkit

The UPL Compression Library is a high-performance professional compression library. It offers the ability to compress and decompress data, buffers, strings or single files and features the latest innovations in data compression. The library offers eight extremely powerful compression algorithms. Dynamic Huffman, Arithmetic, BWT, Ppm and several Lempel Ziv flavors. user John G. had this to say: I was looking for adding a better compression to my Visual Basic project and it worked like a charm. The compression ratio is really good, better than Zip!

* * * *  

Posted in August 27th, 2002

Parallel Lossless Image Compression Using Huffman and Arithmetic Coding

by P. G. Howard and J. S. Vitter. This paper shows how images can be encoded and decoded using parallel processing. Both Huffman and arithmetic coding are examined.

* * * * *

Posted in July 28th, 2002

Lossless Compression Algorithms (Entropy Encoding)

An overview of the basics, including Shannon-Fano, Huffman, Arithmetic coding, and a section on LZW for good measure.

* * *    

Posted in July 20th, 2002

Selective Extension Coding

Graham Fyffe proposes an alternative type of variable length coding that he claims will offer improved efficiency. Full description, no implementation.


Posted in July 18th, 2002

Fyffe Codes for Fast Codelength Approximation

Graham Fyffe proposes a variable length integer code that he says is easier to compute than Huffman codes.


Posted in July 18th, 2002

David Scott’s Bijective Static Entropy Compression

This compressor is designed to operate on English text - it has a static probability table configured for written English.

* * * * *

Posted in June 26th, 2002

Huffman Coding with Priority Queues

This article is really about using the priority queue containers that are part of the standard C++ library. The example program implements a Huffman Encoder using the queues, showing how they can do a fairly complex piece of work without too much coding on your part.

* * * * *

Posted in April 12th, 2002

Huffman Coding: A CS2 Assignment

Some good introductory explanation here.


Posted in April 8th, 2002

The shcodec Home Page

shcodec is order-0 32-bit canonical static huffman codec. It encodes an alphabet of 256 symbols with minimum-redundancy or length-restricted codes (basic method: Alistair Moffat and Jyrki Katajainen, modified by Artur A. Pessoa). shcodec uses efficient method for tree packing: on text files packed tree size is approx 68 bytes, on binary files this value is about 132 bytes. Memory requirements are very small: 1280 bytes for encoding and only 574 bytes for decoding! shcodec uses extremely fast and simple SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache for small codewords for decoding.

Update: Alexander has added SH-SFX to the web page - a program for creating Win32 SFXs from files compressed with shcodec.

* * * *  

Posted in April 4th, 2002

Video and Audio Compression

Class notes on lossless compression algorithms. Quick info on Huffman, Adaptive Huffman, and LZW.

* * * *  

Posted in February 26th, 2002

The Huffman Compression Algorithm

Yet another description of Huffman coding. Comes with a couple of lovely diagrams and some Delphi code.

* * * *  

Posted in February 25th, 2002