data compression link collection


Miscellaneous and esoteric stuff that doesn’t fit anywhere else.

SQUASHFS - A squashed read-only filesystem for Linux

Squashfs is a highly compressed read-only filesystem for Linux (kernel 2.4.x). It uses Gzip compression to compress files, inodes and directories. Inodes in the system are very small and all blocks are packed to minimise data overhead. Block sizes greater than 4K are supported up to a maximum of 64K.

Version 3.2 of Squashfs was released in January 2007.

* * * * *

Posted in July 2nd, 2007

Estimating entropy rates with Bayesian confidence intervals

My co-authors and I are pleased to announce the availability of a preprint on our new algorithm for estimating the Shannon entropy rate (in bits/symbol or bits/sec) of an observed sequence of symbols drawn from a small alphabet. It builds on the Context-Tree-Weighting universal compression method, using it not to read an entropy estimate directly off the compression ratio but as a scaffold for a Bayesian estimate. The result is significantly lower bias.
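For contrast, the direct approach that the preprint moves away from, reading an entropy rate straight off a compression ratio, can be sketched in a few lines. This is only an illustration: it uses zlib instead of Context-Tree-Weighting, and the function name is made up for this example. General-purpose compressors carry header and modelling overhead, so an estimate like this tends to come out too high on short sequences.

```python
import random
import zlib


def compression_entropy_estimate(symbols: bytes) -> float:
    """Naive entropy-rate estimate in bits/symbol: compressed bits / symbol count.

    Hypothetical helper for illustration; zlib stands in for a universal
    compressor and is NOT the Context-Tree-Weighting method.
    """
    if not symbols:
        raise ValueError("need at least one symbol")
    compressed = zlib.compress(symbols, 9)
    return 8 * len(compressed) / len(symbols)


# A fair binary source has a true entropy rate of 1 bit/symbol,
# and a constant source has a true entropy rate of 0 bits/symbol.
rng = random.Random(0)
fair = bytes(rng.choice(b"01") for _ in range(20000))
constant = b"0" * 20000

fair_estimate = compression_entropy_estimate(fair)
constant_estimate = compression_entropy_estimate(constant)
print(f"fair coin: {fair_estimate:.3f} bits/symbol")
print(f"constant:  {constant_estimate:.3f} bits/symbol")
```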


Posted in July 10th, 2004

Remote Sessions with NX

NxServLiv allows compressed remote X sessions using NoMachine’s open source NX libraries. It is like an ssh terminal, but graphical, and it works over links from modem to LAN with different compression ratios.


Posted in July 4th, 2004

A Generic, Reusable Diff Algorithm in C# - II

Another difference engine in C#.

* *      

Posted in May 31st, 2004

Code to extract plain text from a PDF file

PDF documents are commonly used and their content is usually compressed. This article shows simple C code that can be used to extract plain text from a PDF file.


Posted in May 31st, 2004

Caliph and Emir

Java and MPEG-7 based tools for semantic annotation and retrieval of digital photos and images, supporting graph-like annotation and content-based image retrieval.


Posted in May 23rd, 2004

Adaptive Huffman Compression

A C# implementation of adaptive Huffman coding. It implements both the FGK and Vitter variations of the algorithm. Compression is provided through two public classes, AdaptiveHuffmanProvider and AdaptiveHuffmanStream. It gives good compression ratios for text-based data.

* * * * *

Posted in April 25th, 2004

Binary Delta Compression

From Microsoft: This document describes Binary Delta Compression (BDC) technology and its use in software update deployment. This implementation of BDC technology developed by Microsoft reduces the download size of software update packages by downloading only the differences between old files and new files.


Posted in April 4th, 2004


DACT

DACT is a brute force compressor written by Roy Keene that tries out a whole library of compression routines on a given file, and then simply picks the best performer.
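The try-everything-and-keep-the-smallest idea can be sketched with the compressors in the Python standard library. This is an illustration of the strategy only, not DACT's code or file format; the codec table and function names are assumptions made for the example.

```python
import bz2
import lzma
import zlib
from typing import Tuple

# Candidate codecs: name -> (compress, decompress).
# Illustrative table only; this is not DACT's actual list of routines.
CODECS = {
    "store": (lambda data: data, lambda data: data),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def best_compress(data: bytes) -> Tuple[str, bytes]:
    """Run every codec on the data and return (codec name, smallest output)."""
    results = {name: comp(data) for name, (comp, _) in CODECS.items()}
    winner = min(results, key=lambda name: len(results[name]))
    return winner, results[winner]


def decompress(name: str, payload: bytes) -> bytes:
    """Undo best_compress, given the codec name it reported."""
    return CODECS[name][1](payload)


sample = b"abc" * 1000
name, payload = best_compress(sample)
print(name, len(sample), "->", len(payload))
```

The codec name has to be stored alongside the payload so the decoder knows which routine to reverse. The "store" entry covers inputs that no compressor can shrink.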

Release 0.8.36 was shipping in early March, 2004.


Posted in March 7th, 2004

Word Replacing Transform

A pre-compression transform by Przemyslaw Skibinski, somewhat along the lines of Star Encoding.

Version 3.0 of this program is shipping as of February, 2004.


Posted in February 1st, 2004


Complearn

Compression is concerned with detecting patterns in order to reduce redundancy. Complearn is an attempt to take that pattern-recognition ability and use it for different ends.


Posted in January 18th, 2004

GAO Research Modem Software

GAO Research sells modem software for quite a few different platforms, including a big batch of DSP parts. Naturally, this includes modules to perform both V.42bis and V.44 data compression.


Posted in December 21st, 2003

LuraDocument.jpm PdfCompressor

The LuraDocument.jpm PdfCompressor is a Windows application that can be used to automatically (Server version) or manually convert scanned documents to highly compressed PDF files. The supported input formats are TIFF, JPEG, BMP and PNM.


Posted in December 14th, 2003

CRC Encoding

Marcel de Wijs has written an article on creating CRC codes in C#.
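For readers who want the gist without following the link, here is a minimal bit-at-a-time sketch of the common CRC-32 (the reflected polynomial 0xEDB88320, as used by zip and gzip). It is written in Python rather than C#, it is not taken from the article, and the function name is made up for this example.

```python
import zlib

CRC32_POLY = 0xEDB88320  # reflected form of the standard CRC-32 polynomial


def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (slow, but easy to follow)."""
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte  # fold the next byte into the low 8 bits
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


check = crc32_bitwise(b"123456789")
print(hex(check))
print(check == zlib.crc32(b"123456789"))
```

Production code normally precomputes a 256-entry table so that each input byte costs one lookup instead of eight shift steps.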


Posted in December 14th, 2003

MFFM Bit Stream

A C++ hierarchy that is designed to efficiently read and write bit streams. Needless to say, this is quite useful for compression programs.
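To show why such a layer is handy, here is a minimal sketch of a bit writer and reader. It is in Python rather than C++, it is not the MFFM API, and the class and method names are invented for this example. Bits are packed most significant bit first, and the last byte is padded with zeros.

```python
class BitWriter:
    """Accumulate values of arbitrary bit width into a byte string (MSB first)."""

    def __init__(self) -> None:
        self._bytes = bytearray()
        self._acc = 0    # bits collected for the current, unfinished byte
        self._nbits = 0  # how many bits of _acc are valid

    def write(self, value: int, width: int) -> None:
        for shift in range(width - 1, -1, -1):
            self._acc = (self._acc << 1) | ((value >> shift) & 1)
            self._nbits += 1
            if self._nbits == 8:
                self._bytes.append(self._acc)
                self._acc = 0
                self._nbits = 0

    def getvalue(self) -> bytes:
        out = bytes(self._bytes)
        if self._nbits:
            # Pad the final partial byte with zero bits on the right.
            out += bytes([self._acc << (8 - self._nbits)])
        return out


class BitReader:
    """Read values of arbitrary bit width back out of a byte string."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, width: int) -> int:
        value = 0
        for _ in range(width):
            byte_index, bit_index = divmod(self._pos, 8)
            if byte_index >= len(self._data):
                raise EOFError("read past end of bit stream")
            bit = (self._data[byte_index] >> (7 - bit_index)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


writer = BitWriter()
writer.write(0b101, 3)
writer.write(0xABC, 12)
writer.write(1, 1)
packed = writer.getvalue()

reader = BitReader(packed)
fields = (reader.read(3), reader.read(12), reader.read(1))
print(packed.hex(), fields)
```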

Version 1.0 shipped in December, 2003.


Posted in December 8th, 2003

Star Encoding in C++

Star Encoding performs some preprocessing on text files, enabling standard compressors to do somewhat better on the files. This article explains the transform and provides some sample code.
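A much simplified sketch of the idea follows, in Python rather than C++ and not taken from the article. Encoder and decoder share a word list; each listed word is replaced by a code of the same length made mostly of `*` characters, so the transformed text is dominated by one symbol. The code assignment scheme and all function names here are assumptions for illustration, and the sketch assumes the input text contains no `*` characters.

```python
import re


def build_codes(dictionary):
    """Map each dictionary word to a same-length code made mostly of '*'.

    The first word of each length gets all stars; later words get a single
    letter at one position. Words that run out of codes are left unencoded.
    """
    codes = {}
    seen_per_length = {}
    for word in dictionary:
        length = len(word)
        index = seen_per_length.get(length, 0)
        seen_per_length[length] = index + 1
        if index == 0:
            code = "*" * length
        else:
            pos, letter = divmod(index - 1, 26)
            if length == 1 or pos >= length:
                continue  # no star-bearing code left for this word
            code = "*" * pos + chr(ord("a") + letter) + "*" * (length - pos - 1)
        codes[word] = code
    return codes


def star_encode(text, codes):
    """Replace every dictionary word in the text with its star code."""
    return re.sub(r"[A-Za-z]+", lambda m: codes.get(m.group(0), m.group(0)), text)


def star_decode(text, codes):
    """Invert star_encode using the same code table."""
    reverse = {code: word for word, code in codes.items()}
    return re.sub(r"[A-Za-z*]+", lambda m: reverse.get(m.group(0), m.group(0)), text)


codes = build_codes(["the", "and", "data", "compression"])
original = "the data and the compression of data"
encoded = star_encode(original, codes)
print(encoded)
print(star_decode(encoded, codes) == original)
```

Listing the most frequent words first gives them the all-star codes, which is what makes the output easier for a standard compressor to model.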


Posted in December 6th, 2003

S3TC-BOXEN: A simple S3TC texture compression tool

This product advertises itself as a simple S3TC texture compression tool. The description says that it loads an image file into your video card’s RAM, then instructs the video card to compress it.


Posted in November 2nd, 2003


A Mac OS X utility to shrink the huge PDF files created by the Mac Quartz rendering engine.


Posted in October 28th, 2003

NX Developers

This site appears to be the home page for a project dedicated to developing an Open Source X Windows compression library.


Posted in October 14th, 2003

Algorithms for Triangulated Terrains

by Marc van Kreveld. This paper looks at a method for compressing geographical elevation data.


Posted in October 7th, 2003

PDF Compress

A free PDF compressor that removes duplicate PDF objects, optionally takes advantage of the new compression features in the latest PDF specification (1.5), and optionally uses a newly proposed format called “Compact PDF” that, for many classes of documents, compresses 30-60% better than is possible in PDF 1.5.

Note: Navigate up two levels to get to Tom’s download page.

Version 2.2 is shipping as of February, 2004.


Posted in September 12th, 2003

The Open Compression Toolkit for C++

The Open Compression Toolkit is a set of modular C++ classes and utilities for implementing and testing compression algorithms.

  • Simple interface and skeleton code for creating new compression algorithms.
  • Complete testing framework for validating and comparing new algorithms.
  • Support for algorithms that use external dictionaries/headers.
  • Utility classes and sample code for bitio, frequency counting, etc.


Posted in August 25th, 2003

Zip/JPEG Mask and Encryption

This cool program uses either a Zip file or a JPEG file to encrypt some of your data. I think it’s free; email me if I’m wrong.


Posted in June 24th, 2003


This program is designed to accurately create a difference file between two packages, allowing for an update with a minimal source file.


Posted in June 19th, 2003


Soft Defender

Soft Defender is an exe file compressor that can reduce the file size of 32-bit Windows programs by as much as 50%. In addition, Soft Defender is a software protection product for applications. With Soft Defender, your application can have anti-debugging, anti-tracing, anti-disassembly, anti-dumping, anti-API-hook, and file integrity checking functions in seconds. It requires no changes to your source code or to your registration algorithm.

* * * * *

Posted in June 12th, 2003

Spoofing the Wily CRC

This article on The Code Project web site shows you how to calculate a CRC, but even better it shows you how to create a file that will have a given CRC value.
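One common way to get a chosen CRC-32 is to append four carefully computed bytes to the file. The sketch below shows that approach in Python; it is not necessarily the method the article uses, and the function names are made up for this example. It relies on two facts about the standard reflected CRC-32: each shift step can be run backwards, and feeding four bytes into the register is equivalent to XORing them into the register and then shifting 32 times.

```python
import zlib

CRC32_POLY = 0xEDB88320  # reflected form of the standard CRC-32 polynomial


def _unshift32(state: int) -> int:
    """Run the CRC shift register backwards by 32 steps."""
    for _ in range(32):
        if state & 0x80000000:
            # The top bit is set only if the forward step XORed in the polynomial.
            state = ((state ^ CRC32_POLY) << 1) | 1
        else:
            state <<= 1
        state &= 0xFFFFFFFF
    return state


def forge_crc32_suffix(data: bytes, target: int) -> bytes:
    """Return 4 bytes that, appended to data, make zlib.crc32 equal target."""
    register = zlib.crc32(data) ^ 0xFFFFFFFF   # internal state after data
    wanted = target ^ 0xFFFFFFFF               # internal state we must end in
    patch = register ^ _unshift32(wanted)
    return patch.to_bytes(4, "little")


data = b"hello world"
forged = data + forge_crc32_suffix(data, 0xDEADBEEF)
print(hex(zlib.crc32(forged)))
```

This is also why a CRC is only an error-detection code: anyone who can edit four bytes can produce whatever checksum they like.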


Posted in June 7th, 2003


Convert your CHM help files from Microsoft to Adobe’s PDF format. This is shareware that runs on Win32 platforms.


Posted in May 30th, 2003

PATRICIA trie implementation

This CodeProject article describes the development of a PATRICIA trie in the .NET framework. The actual code is written in C#, but naturally, it can be used with any of the .NET languages.


Posted in May 30th, 2003


This is a somewhat unusual utility that will only be useful to those of you working on Visual Studio programming projects. Running it erases all your temporary project files, then zips up what is left so you can back up your work.


Posted in May 26th, 2003

C/C++ Random Numbers

Links to code and tools for generating random numbers. Quite useful, and includes some intriguing use of sound cards to provide true random numbers instead of pseudo-random ones. Registration required.


Posted in April 23rd, 2003