Coding refers to techniques used to encode tokens or symbols. Two of the best known coding algorithms are Huffman Coding and Arithmetic Coding. Coding algorithms are effective at compressing data when they use fewer bits for high probability symbols and more bits for low probability symbols.
Arithmetic Coding revealed - A guided tour from theory to praxis
An updated and translated version of our German paper “Proseminar Datenkompression - Arithmetische Kodierung” from 2001. To the best of our knowledge, it is the first comprehensive paper that describes the whole way from the basic principles of AC up to a simple implementation, fully documented with C++ source code.
http://www.sable.mcgill.ca/~ebodde/pubs/sable-tr-2007-5.pdf
Huffman Coding Class
This version of file encoder and decoder program is based on the Huffman coding method. It explicitly demonstrates the details of the files during the encoding and decoding. The algorithm is encapsulated in a class En_Decode in standard C++.
http://codeproject.com/cpp/HandyHuffmanCoding.asp
Basic Compression Library
Marcus Geelnard has created a batch of compression routines that you can plug and ply into your programs at will. Marcus is using the wonderfully open zlib license, which means thare are just about no reason you can’t use this code. The 1.0.5 added an LZ77 codec to the RLE, Huffman, and Rice coders
Satisfied user Todd W said: I needed a simple set of compression routines for use in an embedded system. I must be able to store a fair amount of information in a small EEPROM as a generic database. The Huffman coder works very well in the application and has met my needs exactly! Very nice!
Michael Dipperstein’s Airthmetic Coding Page
The third in Michael’s collection of pages explaining lossless compression algorithms. A nice tutorial accompanied by ANSI C source.
http://michael.dipperstein.com/arithmetic/index.html
Five Cents on Arithmetic Encoding
Aliaksei Sanko makes a few improvements to the code in the original 1987 CACM article. His sample includes a templated producer and consumer.
http://codeguru.com/Cpp/Cpp/algorithms/compression/article.php/c7043/
Compression and Encryption Sources
Links to a variety of lossless coders, includes source for Huffman, arithmetic, LZSS, and other compressors.
http://www.larkin.lanit.ru/compress/compr.htm
Adaptive Huffman Compression
C# implementation of adaptive Huffman coding. Implements both the FGK and Vitter algorithm variations. Compression provided through two public classes, AdaptiveHuffmanProvider and AdaptiveHuffmanStream. Good compression ratios for text-based data
BAR
BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file
compression tools available as commercial utilities.
Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par
with most of the commercial grade compression utilities.
http://fermatjen.tripod.com/bar/
libhuffman - Huffman encoder/decoder library
libhuffman is a Huffman encoder/decoder library and a command line interface to the library. The encoder is a 2 pass encoder. The first pass generates a huffman tree and the second pass encodes the file. The decoder is one pass and uses a huffman code table at the beginning of the compressed file to decode the file.
Beta 3 shipped in October, 2003.
http://huffman.sourceforge.net/
PAQ4 Archiver
The latest in the series of multi-model compressors from Matt Mahoney. This improves on PAQ3n’s remarkable Calgary corpus performance by an additional 12K, at some expense in speed. Takes a whopping 84MB at runtime!
http://cs.fit.edu/~mmahoney/compression/paq4.cpp
Michael Dipperstein’s Huffman Code Page
Michael Dipperstein describes his personal quest for understanding an implementation of Huffman coding. Full source included.
The page was updated with new source December, 2002.
http://michael.dipperstein.com/huffman/index.html
PAQ3N
Matt Mahoney says that with recent improvements by Serge Osnach, PAQ3N does better on the Calgary Corpus than any other open source compressor.
http://cs.fit.edu/~mmahoney/compression/paq3n.cpp
RF Coding
A paper by Eduardo Enrique González Rodriguez that describes a proposed new method of entropy encoding. Eduardo overcomes some of the problems faced by Huffman coding in certain circumstances, such as a very small alphabet.
http://datacompression.info/Miscellaneous/rfdoc.rtf
RF Coder Source Code
The source code to accompany Eduardo Enrique Gonzalez Rodriguez’s artice on RF coding, his proposed new entropy encoder. (Note that this archive contains his paper as well, so you don’t need to download both.)
http://datacompression.info/Miscellaneous/rf-coder.tgz
Binary Encoding
A survey paper that covers the current roster of popular encoding algorithms, such as Base64, Base32, Hex, etc.
http://www.imagingstuff.com/binaryencoding.pdf
HFFzip
A Huffman file compressor for *NIX platforms. The author admits that the compression is perhaps not the best, but says it’s going to be just right for embedded programming because of its miniscule footprint and high speed.
Version 1.01 was shipping in June, 2003.
http://utenti.quipo.it/claudioscordino/hffzip.html
Self-similar Huffman Trees with Extreme Guessing Propertie
An article by John Pliam. Needs a better summary, but I don’t quite get it, other than the fact that there is some cryptographic thinking going on here.
http://www.ima.umn.edu/~pliam/doc/huff/
C Library to search over compressed texts
A team from the University of Pisa in Italy have created a set of two libraries to demonstrate the ability to perform searches in compressed text. The code is packaged as two LGPL libraries: HuffwLib and CGrepLib, and sample programs are available on the site for demonstration purposes.
http://butirro.di.unipi.it/~ferrax/CompressedSearch/
K-ary Huffman Encoding
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/karyHuffman.html
Prefix Code
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/prefixcode.html
Shannon-Fano Coding
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/shannonFano.html
Arithmetic Coding
The NIST page on arithmetic coding from their Dictionary of Algorithms and Data Structures. A terse definition and a couple of links.
http://www.nist.gov/dads/HTML/arithmeticCoding.html
Contributions à la compression de données
Steven Pigeon’s Ph. D Thesis from the University of Montreal. Proposes a new set of universal codes, which he dubs taboo codes, as well as new optimization algorithms for (Start, Step, Stop) codes. Plus lossy variations on LZW.
http://www.iro.umontreal.ca/~pigeon/science/publications/phd.ps.zip
Canonical Huffman Coder Construction
The first time I ever heard the phrase Canonical Huffman Coder was in reference to the technique used in PKZip to store Huffman tables. I don’t know where the technique originated, but it is basically a way to construct a Huffman table so that the actual codes don’t have to be stored when storing the table. This makes for some nice space savings when compared to a first-pass naïve implementation. (Like the ones I’ve done in the past.)
It turns out that somebody named Gareth was attempting to implement this code but was having a bit of trouble. His post to comp.compression brought out some useful help from some of the newsgroup regulars, and did a lot to shed light on the topic, and includes a reference to the paper that actually gave birth to the concept.
A Memory-Efficient Huffman Adaptive Coding Algorithm for Very Large Sets of Symbols
This paper describes an adaptive Huffman scheme that is designed to work with large numbers of symbols - at least that’s what the abstract says. The web page indicates that there are executables in the archive, but all I see is a Word document.
http://www.iro.umontreal.ca/~pigeon/programs/programs.html#huffman
Steven Pigeon’s Phasing In Codes
Coding integers of an arbitrary length is an interesting problem. Steven Pigeon discusses a way to code integers in a very efficient manner, which approaches the optimal Log2(n) value.
http://www.iro.umontreal.ca/~pigeon/
Arithmetic Coding Revisited: A New Floating Point Technique
Anand Jain discusses his new technique for arithmetic coding. An executable file is included, but no source code.
Compression via Arithmetic Coding in Java
Bob Carpenter has created a nice Java package that implements a PPM/arithmetic coding compression system. This page includes links to the source code, javadocs, and a fair amount of tutorial material. Very complete!
http://www.colloquial.com/ArithmeticCoding/
bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor
bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.
Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here
http://stl.caltech.edu/bwtzip.html
McKee’s Directed Acyclic Graph Compression
Will McKee has released this as freeware - includes complete source to a string substitution compressor. From the description it sounds as though it’s variant on LZSS, but I’ll defer to anyone willing to do a real analysis.
http://www.cjkware.com/wamckee/mcdag.zip