Markov modeling is essentially a method to predict the probability of a given character based on what has come before it. The best know type of Markov modeling is known as PPM, which has its own category. This topic covers all things Markov that are not PPM-related.
XMLPPM: XML-Conscious PPM Compression
An open source project that performs PPM compression on XML files. The advance knowledge of XML format helps give this algorithm somewhat better compressions ratios on XML data than universal compressors.
Version 0.98.1 was shipping as of June, 2004.
http://www.cs.cornell.edu/People/jcheney/xmlppm/xmlppm.html
PAQ4 Archiver
The latest in the series of multi-model compressors from Matt Mahoney. This improves on PAQ3n’s remarkable Calgary corpus performance by an additional 12K, at some expense in speed. Takes a whopping 84MB at runtime!
http://cs.fit.edu/~mmahoney/compression/paq4.cpp
PAQ3N
Matt Mahoney says that with recent improvements by Serge Osnach, PAQ3N does better on the Calgary Corpus than any other open source compressor.
http://cs.fit.edu/~mmahoney/compression/paq3n.cpp
Text Preprocessing for Data Compression
by Jürgen Abel and William Teahan. This paper looks at a few different techniques for preprocessing data before performing text compression, and compares the gains achieved when combining the preprocessors with PPM, BWT, and LZ algorithms.
http://www.data-compression.info/JuergenAbel/Preprints/Preprint_Text_Preprocessing.pdf
ShipInBottle Archiver
This is a Russian language web site, so I’ll have to apologize in advance for any mistakes. As far as I can tell, this is a PPM-based archiver. I can’t tell if it’s free or not, so I’m listing it as both!
Dmitry Shkarin
Dmitry Shkarin is the author of the PPMD compressor. His homepage has links to to versions A-I of the compressor as of April, 2003. This page is in Russian, but I find that running it through
BabelFish produces a usuable translation.
Compress::PPMd
Salvador Fandiño García has created a Perl interface to Dmitry Shkarin PPMd compression library.
Version 0.05 is shipping as of March, 2003.
http://search.cpan.org/author/SALVA/
Compression via Arithmetic Coding in Java
Bob Carpenter has created a nice Java package that implements a PPM/arithmetic coding compression system. This page includes links to the source code, javadocs, and a fair amount of tutorial material. Very complete!
http://www.colloquial.com/ArithmeticCoding/
BICOM - BIjective COMpressor
BICOM is a freely available open source compressor. It uses a souped-up PPM algorithm, and is completely bijective.
Reader comment:
Wow this is hot! …a bijective compressor
using full size Rijndael encryption…
http://www3.sympatico.ca/mt0000/bicom/bicom.html
UPL Compression : the complete professional toolkit
The UPL Compression Library is a high-performance professional compression library. It offers the ability to compress and decompress data, buffers, strings or single files and features the latest innovations in data compression. The library offers eight extremely powerful compression algorithms. Dynamic Huffman, Arithmetic, BWT, Ppm and several Lempel Ziv flavors.
DataCompression.info user John G. had this to say: I was looking for adding a better compression to my Visual Basic project and it worked like a charm. The compression ratio is really good, better than Zip!
PPM FAQ
What appears to be a comprehensive FAQ on PPM programming. Russian language, and fairly slow loading, but no reason not to try Google or BabelFish for a translation.
http://www.arctest.narod.ru/descript/ppm-faq.htm
PPMD var I1
From the archive: PPMd program is file-to-file compressor, it is written for embedding in user programs mainly and it is not intended for immediate use. I was interested in speed and performance improvements of abstract PPM model [1-5] only, without tuning it to particular data types, therefore compressor works good enough for texts, but it is not so good for nonhomogeneous files (executables) and for noisy analog data (sounds, pictures etc.).
ftp://ftp.elf.stuba.sk/pub/pc/pack/ppmdi1.rar
Markov Predictive Coders : PPMZ
This directory contains source and executable for Charles Bloom’s PPMZ encoder, as well as a paper on PPMZ and some benchmark results. There are also links to a few other pages containing PPM information.
News: Charles Bloom has now released the source code to PPMZ2. He says it is both cleaner and faster than the original PPMZ code.
http://www.cbloom.com/src/ppmz.html
Design and Analysis of Fast Text Compression Based on Quasi-Arithmetic Coding
by Paul Howard and Jeff Vitter. Here’s what they have to say about this paper from the abstract: Our algorithm, related to the PPM method, simplifies the modeling phase by eliminating the escape mechanism, and speeds up coding by using a combination of quasi-arithmetic coding and Rice coding. We provide details of the use of quasi-arithmetic code tables, and analyze their compression performance. Our Fast PPM method is shown experimentally to be almost twice as fast as the PPMC method, while giving comparable compression..
http://www.cs.duke.edu/~jsv/Papers/catalog/node68.html
Bill Teahan’s Papers
Bill has links to some of his papers here, citations for all.
http://www.scms.rgu.ac.uk/staff/smc/researchcoord/staff_publications/wjt.html
PPM: one step to practicality
by Dmitry Shkarin. Dmitry is the author of PPMd, a well know compressor. PPM is a powerful compression technology which has a reputation for being resource intensive. In this paper Dmitry adds bells and whistles to basic PPM and gets fantastic results.
Reader jcp said: Big problem: algorithm very slow, it’s impractical to search anything inside of compressed files (compared with gzip), .. But it’s good to send/receive letters of e-mail, well compressed!!!
http://DataCompression.info/Miscellaneous/PPMII_DCC02.pdf
Lectures on Statistical Modeling Theory
A paper by Jorma Rissanen, IBM Almaden Research Center.
http://www.cs.tut.fi/~rissanen/papers/model.ps
Test Harness for the PPMD Compressor
Here’s a nice little frameword for testing PPMD, or for that matter, any other compressor.
http://www.codeproject.com/cpp/ppmd.asp
PPMD Var ‘G’ (Dmitry Shkarin)
Free PPM program with full source, advertised as fast.
ftp://ftp.simtel.net/pub/simtelnet/win95/compress/ppmdi.zip
On-Line Stochastic Processes in Data Compression
Suzanne Bunton’s PhD thesis. A recent post to comp.compression said that Bunton had used floating point math for arithmetic coding. I haven’t verified this, but it sounds worth a look.
ftp://ftp.cs.washington.edu/tr/1997/03/UW-CSE-97-03-02.PS.Z
PPMD Var ‘F’ (Dmitry Shkarin)
A file to file compressor that uses the PPMD algorithm is boasts of being very fast.
ftp://ftp2.itb.it/files/PC/SAC/pack/ppmdf.rar
DWC File Format
An explanation of the archaic DWC file format.
http://DataCompression.info/ArchiveFormats/dwc.txt
Finite context modeling
An article by Arturo Campos that describes and discusses Finite Context Modeling. This modeling technique is uses by PPM compressors, although Campos makes the point that the ideas in this article can be used in other compressors as well.
http://www.arturocampos.com/ac_finite_context_modeling.html
PPMD Var ‘E’ (Dmitry Shkarin)
Another version of the PPMD program by Dmitry Shkarin. Readme says this includes a bug fix, and removal of one model.
ftp://ftp.nsk.su/.3/windows/compress/ppmde.zip
Lzp
Arturo Campos presents the LZP algorithm, first described by Charles Bloom. LZP is a hybrid of dictionary based coding and statistical modeling. This means it has some of the elements of popular LZSS encoders, but takes advantage of PPM style modeling as well. The combination of the two leads to very good compression ratios.
http://www.arturocampos.com/ac_lzp.html
Lzp order-3 with linked lists
Another article about LZP by Arturo Campos. This piece describes a specific implementation technique using linked lists.
http://www.arturocampos.com/ac_lzp_o3_ll.html
Lzp research
Arturo Campos has been doing quite a bit of research on the LZP algorithm (first described by Charles Bloom.) This paper presents all of his results to date.
http://www.arturocampos.com/ac_lzp_research.html
Lossless Data Compression Toolkit
Version 1.1 of the lossless data compression toolkit by Nico deVries. The C sources in this toolkit include an LZW compressor, AR002 archiver, a PPM like compressor using arithmetic compression, Huffman compressor, splay tree program, and LZRW1. Quite a variety.
ftp://garbo.uwasa.fi/pc/programming/lds_11.zip
PPM by Bill Teahan
This ftp site has a few PPM programs, along with Bill Teahan’s thesis.
ftp://ftp.cs.waikato.ac.nz/pub/compression/ppm/