David Huffman developed a form of encoding that creates the most efficient set of prefix codes for a given text. The ease with which Huffman codes can be created and used makes this still an extremely popular tool for compression code.
Huffman Coding Class
This version of file encoder and decoder program is based on the Huffman coding method. It explicitly demonstrates the details of the files during the encoding and decoding. The algorithm is encapsulated in a class En_Decode in standard C++.
http://codeproject.com/cpp/HandyHuffmanCoding.asp
Basic Compression Library
Marcus Geelnard has created a batch of compression routines that you can plug and ply into your programs at will. Marcus is using the wonderfully open zlib license, which means thare are just about no reason you can’t use this code. The 1.0.5 added an LZ77 codec to the RLE, Huffman, and Rice coders
Satisfied user Todd W said: I needed a simple set of compression routines for use in an embedded system. I must be able to store a fair amount of information in a small EEPROM as a generic database. The Huffman coder works very well in the application and has met my needs exactly! Very nice!
Compression and Encryption Sources
Links to a variety of lossless coders, includes source for Huffman, arithmetic, LZSS, and other compressors.
http://www.larkin.lanit.ru/compress/compr.htm
BAR
BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file
compression tools available as commercial utilities.
Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par
with most of the commercial grade compression utilities.
http://fermatjen.tripod.com/bar/
libhuffman - Huffman encoder/decoder library
libhuffman is a Huffman encoder/decoder library and a command line interface to the library. The encoder is a 2 pass encoder. The first pass generates a huffman tree and the second pass encodes the file. The decoder is one pass and uses a huffman code table at the beginning of the compressed file to decode the file.
Beta 3 shipped in October, 2003.
http://huffman.sourceforge.net/
Michael Dipperstein’s Huffman Code Page
Michael Dipperstein describes his personal quest for understanding an implementation of Huffman coding. Full source included.
The page was updated with new source December, 2002.
http://michael.dipperstein.com/huffman/index.html
HFFzip
A Huffman file compressor for *NIX platforms. The author admits that the compression is perhaps not the best, but says it’s going to be just right for embedded programming because of its miniscule footprint and high speed.
Version 1.01 was shipping in June, 2003.
http://utenti.quipo.it/claudioscordino/hffzip.html
Self-similar Huffman Trees with Extreme Guessing Propertie
An article by John Pliam. Needs a better summary, but I don’t quite get it, other than the fact that there is some cryptographic thinking going on here.
http://www.ima.umn.edu/~pliam/doc/huff/
C Library to search over compressed texts
A team from the University of Pisa in Italy have created a set of two libraries to demonstrate the ability to perform searches in compressed text. The code is packaged as two LGPL libraries: HuffwLib and CGrepLib, and sample programs are available on the site for demonstration purposes.
http://butirro.di.unipi.it/~ferrax/CompressedSearch/
K-ary Huffman Encoding
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/karyHuffman.html
Shannon-Fano Coding
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/shannonFano.html
Canonical Huffman Coder Construction
The first time I ever heard the phrase Canonical Huffman Coder was in reference to the technique used in PKZip to store Huffman tables. I don’t know where the technique originated, but it is basically a way to construct a Huffman table so that the actual codes don’t have to be stored when storing the table. This makes for some nice space savings when compared to a first-pass naïve implementation. (Like the ones I’ve done in the past.)
It turns out that somebody named Gareth was attempting to implement this code but was having a bit of trouble. His post to comp.compression brought out some useful help from some of the newsgroup regulars, and did a lot to shed light on the topic, and includes a reference to the paper that actually gave birth to the concept.
bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor
bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.
Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here
http://stl.caltech.edu/bwtzip.html
McKee’s Directed Acyclic Graph Compression
Will McKee has released this as freeware - includes complete source to a string substitution compressor. From the description it sounds as though it’s variant on LZSS, but I’ll defer to anyone willing to do a real analysis.
http://www.cjkware.com/wamckee/mcdag.zip
Will’s Huffman Demo
Will McKee wrote some Huffman code in C++. Take a look.
Update: Will reports that he has improved the documentation in this package, as well as adding a new function.
http://www.cjkware.com/wamckee/huffman.zip
Dynamic Huffman Coder
This dynamic Huffman coder from Karl Malbrain is written in C and includes weight scaling. It is modeled on the Vitter algorithm.
A DataCompression.info user notes that this site has been undergoing continual changes, and perhaps would benefit from some sort of “last modified on” field.
http://www.geocities.com/malbrain/vitter_c.html
In-Place Calculation of Minimum-Redundancy Codes
The abstract for a paper on calculation of Huffman codes. The paper isn’t here, but the source code is. Alistair says that if you sort your array of counts, you can create the Canonical Huffman tree in memory.
http://www.cs.mu.oz.au/~alistair/abstracts/mk95%3Awads.html
Text compression for embedded controllers
Some simple BASIC routines to compress data without extravagant data or code space. The author seems to indicate this isn’t Huffman coding, but doesn’t say what it is.
http://www.sxlist.com/techref/method/compress/embedded.htm
A Method for the Construction of Minimum-Redundancy Codes
The famous paper by David Huffman, as originally published in the Proceedings of the I.R.E.
http://compression.graphicon.ru/download/articles/huff/huffman_1952_minimum-redundancy-cod
es.pdf
UPL Compression : the complete professional toolkit
The UPL Compression Library is a high-performance professional compression library. It offers the ability to compress and decompress data, buffers, strings or single files and features the latest innovations in data compression. The library offers eight extremely powerful compression algorithms. Dynamic Huffman, Arithmetic, BWT, Ppm and several Lempel Ziv flavors.
DataCompression.info user John G. had this to say: I was looking for adding a better compression to my Visual Basic project and it worked like a charm. The compression ratio is really good, better than Zip!
Parallel Lossless Image Compression Using Huffman and Arithmetic Coding
by P. G. Howard and J. S. Vitter. This paper shows how images can be encoded and decoded using parallel processing. Both Huffman and arithmetic coding are examined.
http://www.cs.duke.edu/~jsv/Papers/catalog/node66.html
Lossless Compression Algorithms (Entropy Encoding)
An overview of the basics, including Shannon-Fano, Huffman, Arithmetic coding, and a section on LZW for good measure.
http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html
Selective Extension Coding
Graham Fyffe proposes an alternative type of variable length coding that he claims will offer improved efficiency. Full description, no implementation.
http://www.geocities.com/g_fyffe/selext.htm
Fyffe Codes for Fast Codelength Approximation
Graham Fyffe proposes a variable length integer code that he says is easier to compute than Huffman codes.
http://www.geocities.com/g_fyffe/fastselext.htm
David Scott’s Bijective Static Entropy Compression
This compressor is designed to operate on English text - it has a static probability table configured for written English.
http://bijective.dogma.net/compres1bse.htm
Huffman Coding with Priority Queues
This article is really about using the priority queue containers that are part of the standard C++ library. The example program implements a Huffman Encoder using the queues, showing how they can do a fairly complex piece of work without too much coding on your part.
http://www.dogma.net/markn/articles/pq_stl/priority.htm
Huffman Coding: A CS2 Assignment
The shcodec Home Page
shcodec is order-0 32-bit canonical static huffman codec. It encodes an alphabet of 256 symbols with minimum-redundancy or length-restricted codes (basic method: Alistair Moffat and Jyrki Katajainen, modified by Artur A. Pessoa). shcodec uses efficient method for tree packing: on text files packed tree size is approx 68 bytes, on binary files this value is about 132 bytes. Memory requirements are very small: 1280 bytes for encoding and only 574 bytes for decoding! shcodec uses extremely fast and simple SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache for small codewords for decoding.
Update: Alexander has added SH-SFX to the web page - a program for creating Win32 SFXs from files compressed with shcodec.
Video and Audio Compression
Class notes on lossless compression algorithms. Quick info on Huffman, Adaptive Huffman, and LZW.
http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.1/Chap4.1.html
The Huffman Compression Algorithm
Yet another description of Huffman coding. Comes with a couple of lovely diagrams and some Delphi code.
http://www.howtodothings.com/showarticle.asp?article=313