Coding refers to techniques used to encode tokens or symbols. Two of the best known coding algorithms are Huffman Coding and Arithmetic Coding. Coding algorithms are effective at compressing data when they use fewer bits for high probability symbols and more bits for low probability symbols.

# Arithmetic Coding revealed - A guided tour from theory to praxis

An updated and translated version of our German paper “Proseminar Datenkompression - Arithmetische Kodierung” from 2001. To the best of our knowledge, it is the first comprehensive paper that describes the whole way from the basic principles of AC up to a simple implementation, fully documented with C++ source code.

http://www.sable.mcgill.ca/~ebodde/pubs/sable-tr-2007-5.pdf

# Huffman Coding Class

This version of file encoder and decoder program is based on the Huffman coding method. It explicitly demonstrates the details of the files during the encoding and decoding. The algorithm is encapsulated in a class En_Decode in standard C++.

http://codeproject.com/cpp/HandyHuffmanCoding.asp

# Basic Compression Library

Marcus Geelnard has created a batch of compression routines that you can plug and ply into your programs at will. Marcus is using the wonderfully open zlib license, which means thare are just about no reason you can’t use this code. The 1.0.5 added an LZ77 codec to the RLE, Huffman, and Rice coders

Satisfied user Todd W said: *I needed a simple set of compression routines for use in an embedded system. I must be able to store a fair amount of information in a small EEPROM as a generic database. The Huffman coder works very well in the application and has met my needs exactly! Very nice!*

# Michael Dipperstein’s Airthmetic Coding Page

The third in Michael’s collection of pages explaining lossless compression algorithms. A nice tutorial accompanied by ANSI C source.

http://michael.dipperstein.com/arithmetic/index.html

# Five Cents on Arithmetic Encoding

Aliaksei Sanko makes a few improvements to the code in the original 1987 CACM article. His sample includes a templated producer and consumer.

http://codeguru.com/Cpp/Cpp/algorithms/compression/article.php/c7043/

# Compression and Encryption Sources

Links to a variety of lossless coders, includes source for Huffman, arithmetic, LZSS, and other compressors.

http://www.larkin.lanit.ru/compress/compr.htm

# Adaptive Huffman Compression

C# implementation of adaptive Huffman coding. Implements both the FGK and Vitter algorithm variations. Compression provided through two public classes, AdaptiveHuffmanProvider and AdaptiveHuffmanStream. Good compression ratios for text-based data

# BAR

BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file

compression tools available as commercial utilities.

Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par

with most of the commercial grade compression utilities.

http://fermatjen.tripod.com/bar/

# libhuffman - Huffman encoder/decoder library

libhuffman is a Huffman encoder/decoder library and a command line interface to the library. The encoder is a 2 pass encoder. The first pass generates a huffman tree and the second pass encodes the file. The decoder is one pass and uses a huffman code table at the beginning of the compressed file to decode the file.

Beta 3 shipped in October, 2003.

http://huffman.sourceforge.net/

# PAQ4 Archiver

The latest in the series of multi-model compressors from Matt Mahoney. This improves on PAQ3n’s remarkable Calgary corpus performance by an additional 12K, at some expense in speed. Takes a whopping 84MB at runtime!

http://cs.fit.edu/~mmahoney/compression/paq4.cpp

# Michael Dipperstein’s Huffman Code Page

Michael Dipperstein describes his personal quest for understanding an implementation of Huffman coding. Full source included.

The page was updated with new source December, 2002.

http://michael.dipperstein.com/huffman/index.html

# PAQ3N

Matt Mahoney says that with recent improvements by Serge Osnach, PAQ3N does better on the Calgary Corpus than any other open source compressor.

http://cs.fit.edu/~mmahoney/compression/paq3n.cpp

# RF Coding

A paper by Eduardo Enrique González Rodriguez that describes a proposed new method of entropy encoding. Eduardo overcomes some of the problems faced by Huffman coding in certain circumstances, such as a very small alphabet.

http://datacompression.info/Miscellaneous/rfdoc.rtf

# RF Coder Source Code

The source code to accompany Eduardo Enrique Gonzalez Rodriguez’s artice on RF coding, his proposed new entropy encoder. (Note that this archive contains his paper as well, so you don’t need to download both.)

http://datacompression.info/Miscellaneous/rf-coder.tgz

# Binary Encoding

A survey paper that covers the current roster of popular encoding algorithms, such as Base64, Base32, Hex, etc.

http://www.imagingstuff.com/binaryencoding.pdf

# HFFzip

A Huffman file compressor for *NIX platforms. The author admits that the compression is perhaps not the best, but says it’s going to be just right for embedded programming because of its miniscule footprint and high speed.

Version 1.01 was shipping in June, 2003.

http://utenti.quipo.it/claudioscordino/hffzip.html

# Self-similar Huffman Trees with Extreme Guessing Propertie

An article by John Pliam. Needs a better summary, but I don’t quite get it, other than the fact that there is some cryptographic thinking going on here.

http://www.ima.umn.edu/~pliam/doc/huff/

# C Library to search over compressed texts

A team from the University of Pisa in Italy have created a set of two libraries to demonstrate the ability to perform searches in compressed text. The code is packaged as two LGPL libraries: HuffwLib and CGrepLib, and sample programs are available on the site for demonstration purposes.

http://butirro.di.unipi.it/~ferrax/CompressedSearch/

# K-ary Huffman Encoding

The definition from the NIST Dictionary of Algorithms and Data Structures.

http://www.nist.gov/dads/HTML/karyHuffman.html

# Prefix Code

The definition from the NIST Dictionary of Algorithms and Data Structures.

http://www.nist.gov/dads/HTML/prefixcode.html

# Shannon-Fano Coding

The definition from the NIST Dictionary of Algorithms and Data Structures.

http://www.nist.gov/dads/HTML/shannonFano.html

# Arithmetic Coding

The NIST page on arithmetic coding from their Dictionary of Algorithms and Data Structures. A terse definition and a couple of links.

http://www.nist.gov/dads/HTML/arithmeticCoding.html

# Contributions à la compression de données

Steven Pigeon’s Ph. D Thesis from the University of Montreal. Proposes a new set of universal codes, which he dubs *taboo* codes, as well as new optimization algorithms for (Start, Step, Stop) codes. Plus lossy variations on LZW.

http://www.iro.umontreal.ca/~pigeon/science/publications/phd.ps.zip

# Canonical Huffman Coder Construction

The first time I ever heard the phrase *Canonical Huffman Coder* was in reference to the technique used in PKZip to store Huffman tables. I don’t know where the technique originated, but it is basically a way to construct a Huffman table so that the actual codes don’t have to be stored when storing the table. This makes for some nice space savings when compared to a first-pass naïve implementation. (Like the ones I’ve done in the past.)

It turns out that somebody named Gareth was attempting to implement this code but was having a bit of trouble. His post to comp.compression brought out some useful help from some of the newsgroup regulars, and did a lot to shed light on the topic, and includes a reference to the paper that actually gave birth to the concept.

# A Memory-Efficient Huffman Adaptive Coding Algorithm for Very Large Sets of Symbols

This paper describes an adaptive Huffman scheme that is designed to work with large numbers of symbols - at least that’s what the abstract says. The web page indicates that there are executables in the archive, but all I see is a Word document.

http://www.iro.umontreal.ca/~pigeon/programs/programs.html#huffman

# Steven Pigeon’s Phasing In Codes

Coding integers of an arbitrary length is an interesting problem. Steven Pigeon discusses a way to code integers in a very efficient manner, which approaches the optimal Log2(n) value.

http://www.iro.umontreal.ca/~pigeon/

# Arithmetic Coding Revisited: A New Floating Point Technique

Anand Jain discusses his new technique for arithmetic coding. An executable file is included, but no source code.

# Compression via Arithmetic Coding in Java

Bob Carpenter has created a nice Java package that implements a PPM/arithmetic coding compression system. This page includes links to the source code, javadocs, and a fair amount of tutorial material. Very complete!

http://www.colloquial.com/ArithmeticCoding/

# bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor

bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.

Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here

http://stl.caltech.edu/bwtzip.html

# McKee’s Directed Acyclic Graph Compression

Will McKee has released this as freeware - includes complete source to a string substitution compressor. From the description it sounds as though it’s variant on LZSS, but I’ll defer to anyone willing to do a real analysis.

http://www.cjkware.com/wamckee/mcdag.zip