data compression link collection

Information Theory

Information Theory is an umbrella term for the scientific disciplines that attempt to codify the mathematical underpinnings of data. In particular, Information Theory is interested in topics such as data compression, data communications, and error correction.

Fast and Efficient Algorithms for Text and Video Compression

by Dzung Tien Hoang.
“There is a tradeoff between the speed of a data compressor and the level of compression it can
achieve. Improving compression generally requires more computation; and improving speed generally
sacrifices compression. In this thesis, we examine a range of tradeoffs for text and video.”


Posted in July 10th, 2007

PyLZMA homepage

PyLZMA lets you use the LZMA SDK from Python, giving access to the LZMA compression developed by Igor Pavlov.
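As a rough illustration of an LZMA round trip in Python, the sketch below uses the standard library's `lzma` module rather than PyLZMA itself. That substitution is an assumption made so the example runs without extra packages; PyLZMA's own API and output format may differ.

```python
import lzma

# Highly repetitive sample data, chosen only so the size reduction is obvious.
sample = b"abracadabra " * 200

# Compress with the standard-library LZMA binding (not PyLZMA).
packed = lzma.compress(sample)

# Decompress and recover the original bytes.
restored = lzma.decompress(packed)

print(len(sample), "bytes ->", len(packed), "bytes")
```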


Posted in July 2nd, 2007

Arithmetic Coding revealed - A guided tour from theory to praxis

An updated and translated version of our German paper “Proseminar Datenkompression - Arithmetische Kodierung” from 2001. To the best of our knowledge, it is the first comprehensive paper that describes the whole way from the basic principles of AC up to a simple implementation, fully documented with C++ source code.
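The basic principle of arithmetic coding, narrowing an interval once per symbol, can be sketched as follows. This is not the paper's C++ implementation: it is a toy version that assumes a static frequency model and uses exact fractions instead of fixed-width integer arithmetic, and all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def build_model(message):
    """Give each symbol a slice [low, high) of [0, 1) proportional to its frequency."""
    counts = Counter(message)
    total = len(message)
    model, low = {}, Fraction(0)
    for sym in sorted(counts):
        high = low + Fraction(counts[sym], total)
        model[sym] = (low, high)
        low = high
    return model

def encode(message, model):
    """Narrow [low, high) once per symbol and return a number inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_low, s_high = model[sym]
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2

def decode(code, model, length):
    """Repeatedly find the slice containing the code, emit its symbol, and rescale."""
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in model.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return "".join(out)

msg = "abracadabra"
model = build_model(msg)
code = encode(msg, model)
```

A practical coder replaces the exact fractions with integer ranges and renormalization so that the code can be emitted bit by bit.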

* * * * *

Posted in November 6th, 2004

Genetic Algorithms for Fractal Image and Image Sequence Compression

In this paper we present a method that uses Genetic Algorithms (GAs) to find a Local Iterated Function System (LIFS) that encodes a single image. By doing this, the time needed to achieve this LIFS is reduced by about compared with Barnsley’s method if similar image quality is desired. If less quality is acceptable, using a GA we can vary the time the encoding will take by changing parameters such as population size and number of generations allowed.


Posted in September 12th, 2004

TTA Lossless Audio Compressor

A lossless codec developed in Russia originally for radio telescope data. Apparently that specialized codec turned out to be good on audio data as well. Distributed under a free license.

* * * * *

Posted in September 12th, 2004

Fractal Image Compression for Spaceborne Transputers

A dissertation by Keith Howell which evaluates the suitability of Fractal Compression for spacecraft images. Keith says he is willing to supply source code upon request.

* * * * ½

Posted in July 29th, 2004

SynCE - Dynamite

Dynamite is a tool and library for decompressing data compressed with the PKWARE Data Compression Library. It was created from the specification provided in a post to the comp.compression newsgroup.


Posted in July 29th, 2004


This program tries to unpack the given file by applying several algorithms byte by byte. Its output is a set of files containing the unpacked data. Many of the produced files are not correct, but some of them may contain correctly unpacked data. Correctly unpacked files are usually of significant size, which distinguishes them from the dust.


Posted in July 28th, 2004

About LZW Compression

A tutorial by Martin Zolnieryk on LZW, along with some pseudocode and links.
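For readers who want something runnable next to the tutorial's pseudocode, here is a minimal LZW sketch. It is not taken from the tutorial: it assumes a byte-oriented dictionary that grows without limit and returns codes as a plain list of integers rather than a packed bit stream.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Emit one code per longest dictionary match, adding match+next byte to the dictionary."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the dictionary on the fly while expanding the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(data)
```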


Posted in July 16th, 2004


A compressor built on the world-beating PAQAR 3.0 engine. axPAQ wraps a GUI around the engine, and includes complete source.


Posted in July 16th, 2004

Estimating entropy rates with Bayesian confidence intervals

My co-authors and I are pleased to announce the availability of a preprint on our new algorithm to estimate the Shannon entropy rate (bits/symbol or bits/sec) of an observed sequence of low-alphabet symbols. It uses the Context-Tree-Weighting universal compression method, but does not use the compression ratio directly as an entropy estimator; instead, it uses it as a scaffold for a Bayesian estimate. The result is significantly lower bias.


Posted in July 10th, 2004

A Compressed Bitset Class

We have developed the CMJBitset class as a plug-in replacement for bitset. The CMJBitset class, depending on compilation options, may take as little as 7 bytes to represent a bitset of any size, assuming all the bits are set or reset. In comparison, a 1024-bit bitset will take 128 bytes. In essence, the CMJBitset operates by run-length encoding a bitset if the bitset is almost all set or almost all reset, but otherwise uses the STL bitset class.
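The run-length idea behind such a class can be sketched in a few lines. This is only an illustration of run-length encoding a bit sequence, not CMJBitset's actual representation, and the function names are hypothetical.

```python
def rle_encode(bits):
    """Encode a sequence of 0/1 values as (value, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([bit, 1])     # start a new run
    return [(value, length) for value, length in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into a list of bits."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

all_set = [1] * 1024
runs = rle_encode(all_set)
```

A bitset with every bit set collapses to a single run, which is why the storage cost can stay constant regardless of size; a bitset with many alternating bits would instead grow, which is presumably why the class falls back to a plain bitset in that case.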

* * * * *

Posted in June 27th, 2004

Generating random numbers

An article by Eric Uner talks a bit about generating random numbers, something we all want to do from time to time.


Posted in June 20th, 2004

Java FLAC Codec

This project is a port of the Free Lossless Audio Codec (FLAC) library to Java. The library allows Java developers to experiment and write programs that use the FLAC algorithms.

Version 0.5 is shipping as of June, 2004.


Posted in June 20th, 2004

PJL Compressing Filter

A J2EE servlet filter which compresses data written to the response. It supports several algorithms (gzip, deflate, etc.) and emphasizes minimal memory usage and high throughput. Also provides detailed performance stats.


Posted in June 20th, 2004


The zisofs filesystem is an extension to the ISO9660 filesystem that allows files, on a file-by-file basis, to be stored compressed and decompressed in real time. The zisofs filesystem is supported by recent versions of Linux (2.4.14 or later). Legacy systems can still read uncompressed files. zisofs-tools contains the tools necessary to create such a compressed ISO9660 filesystem and to read compressed files on a legacy system.


Posted in June 20th, 2004

XMLPPM: XML-Conscious PPM Compression

An open source project that performs PPM compression on XML files. Advance knowledge of the XML format helps give this algorithm somewhat better compression ratios on XML data than universal compressors.

Version 0.98.1 was shipping as of June, 2004.


Posted in June 13th, 2004


This page describes a program, ent, which applies various tests to sequences of bytes stored in files and reports the results of those tests. The program is useful for those evaluating pseudorandom number generators for encryption and statistical sampling applications, compression algorithms, and other applications where the information density of a file is of interest.
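One common measure of information density is the Shannon entropy of the byte frequencies, in bits per byte. The sketch below computes only that single figure; it is not ent's source code, and it is an assumption that ent's own calculation agrees with it in every detail.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, from 0.0 to 8.0 bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

constant = entropy_bits_per_byte(b"aaaa")             # a single repeated value
uniform = entropy_bits_per_byte(bytes(range(256)))    # every byte value exactly once
```

Values close to 8 suggest data that is already dense (compressed or random-looking), while low values suggest there is redundancy left to remove.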


Posted in June 13th, 2004

Huffman Coding Class

This version of the file encoder and decoder program is based on the Huffman coding method. It explicitly shows the details of the files during encoding and decoding. The algorithm is encapsulated in a class, En_Decode, written in standard C++.
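For comparison, the Huffman method itself fits in a short sketch. This is not the En_Decode class: it is a hypothetical Python version that builds the code table with a heap and represents the encoded output as a string of "0" and "1" characters for readability.

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict[int, str]:
    """Build a prefix-free code table {byte_value: bit_string} from byte frequencies."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}   # a lone symbol still needs one bit
    # Heap entries are (weight, unique tiebreaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(data: bytes, codes: dict[int, str]) -> str:
    return "".join(codes[b] for b in data)

def huffman_decode(bits: str, codes: dict[int, str]) -> bytes:
    inverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:     # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return bytes(out)

data = b"abracadabra"
codes = huffman_codes(data)
bits = huffman_encode(data, codes)
```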

* *

Posted in June 6th, 2004

LZW Compression

An article that describes itself as showing how to implement LZW compression in MFC.

* * * * *

Posted in June 2nd, 2004


Pack all your files into a single executable with MoleBox or MoleBox Pro.


Posted in May 24th, 2004

bsdtar, libarchive

Libarchive is a programming library that can create and read several different streaming archive formats, including most popular tar variants and the POSIX cpio format. The bsdtar program is an implementation of tar(1) that is built on top of libarchive. It started as a test harness, but is quickly moving toward becoming a candidate system tar for FreeBSD.


Posted in May 22nd, 2004

Compression of Individual Sequences via Variable-Rate Coding

by Ziv and Lempel. The seminal LZ78 paper which spawned LZW, GIF, and an entire academic industry.

Update 2004: Document is now packed in RAR format.


Posted in May 16th, 2004

A Universal Algorithm for Sequential Data Compression

The 1977 paper describing an algorithm for compression using pointers to previously seen text. This algorithm, later known as LZ77, is still one of the most widely used techniques for lossless data compression.
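The idea of pointing back to previously seen text can be sketched as a list of (offset, length, next byte) triples. This is a simplified illustration rather than the paper's exact scheme: the window size, the maximum match length, and the brute-force match search are all assumptions chosen for brevity.

```python
def lz77_compress(data: bytes, window: int = 255, max_len: int = 15):
    """Return (offset, length, next_byte) triples; next_byte is None at end of input."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Brute-force search of the window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len] if i + best_len < len(data) else None
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decompress(triples) -> bytes:
    """Replay each pointer by copying from already-decoded output."""
    out = bytearray()
    for offset, length, nxt in triples:
        for _ in range(length):
            out.append(out[-offset])   # byte-by-byte copy allows overlapping matches
        if nxt is not None:
            out.append(nxt)
    return bytes(out)

data = b"abcabcabcabcx"
triples = lz77_compress(data)
```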

Update 2004: Document is now packed in RAR format.


Posted in May 16th, 2004

Hybrid Lossless Audio Compression

WavPack allows you to losslessly compress (and restore) both 16- and 24-bit audio files in the .WAV format. Unlike “lossy” compression schemes (like MP3) that discard information, WavPack converts the audio data into a more compact form so that the restored files are digitally identical to the original source. It’s somewhat like the file compression portion of WinZip except that it’s optimized for audio data. Like other lossless compression schemes the data reduction varies with the source, but it is generally between 25% and 50% for typical popular music and somewhat better than that for classical music and other sources with greater dynamic range.

* * * * ½

Posted in May 15th, 2004

Parallel Implementation of Data Compression Technologies for Multi-Gbit/s Networks

This group at Loughborough University in the UK would like to use sophisticated compression techniques in high speed networks. To make it all happen, they need to do it in hardware, and do it in parallel. This page has information about their efforts, along with links to papers and other information.



Posted in May 15th, 2004

X-Match Pro

A fast ASIC core designed for lossless compression.


Posted in May 15th, 2004

12Ghosts Zip

This package includes 12Zip and 12Zip2. The first version uses Zip compatible compression, and the second uses a BWT variant.

Version 7.0 of the package is shipping as of May, 2004.


Posted in May 14th, 2004

Basic Compression Library

Marcus Geelnard has created a batch of compression routines that you can plug and play into your programs at will. Marcus is using the wonderfully open zlib license, which means there is just about no reason you can’t use this code. Version 1.0.5 added an LZ77 codec to the RLE, Huffman, and Rice coders.

Satisfied user Todd W said: I needed a simple set of compression routines for use in an embedded system. I must be able to store a fair amount of information in a small EEPROM as a generic database. The Huffman coder works very well in the application and has met my needs exactly! Very nice!

* * * * *

Posted in May 14th, 2004

BWT in Matlab

Imran Akthar’s implementation of the BWT in Matlab. Free.
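For readers without Matlab, the transform can be sketched in Python. This is not Imran Akthar's code: it is a naive version that sorts all rotations, and it assumes a sentinel character (here `"\0"`) that does not occur in the input and sorts before every other character.

```python
def bwt(text: str, end: str = "\0") -> str:
    """Return the last column of the sorted rotations of text + sentinel."""
    if end in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last: str, end: str = "\0") -> str:
    """Rebuild the rotation table column by column, then pick the row ending in the sentinel."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]

transformed = bwt("banana")
```

Sorting every rotation is quadratic or worse in memory and time; practical implementations use suffix arrays instead, but the output is the same.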


Posted in May 14th, 2004