Suffix trees are a data structure that makes it convenient to do string matching against an entire data set in O(N) time. This is really wonderful, but creaing the suffix tree isn’t always that easy.
Adaptive Huffman Compression
C# implementation of adaptive Huffman coding. Implements both the FGK and Vitter algorithm variations. Compression provided through two public classes, AdaptiveHuffmanProvider and AdaptiveHuffmanStream. Good compression ratios for text-based data
SuffixTree
An Open Source Suffx Tree implementation. Looks as though this might be oriented towards teaching applications, as it is an interactive application, not just an implementation of the data structure.
http://bioinformatics.rit.edu/~tex/publications/suffixtree/
ANSI C implementation of a Suffix Tree
A C implementation of a Suffix Tree. This project also includes a Perl module that can use the C code.
http://cs.haifa.ac.il/~shlomo/suffix_tree/
Extended Application of Suffix Trees to Data Compression
By Jesper Larsson, Published in Proceedings of the IEEE Data Compression Conference 1996. Hey Jesper, how about putting a PDF version on line?
http://www.larsson.dogma.net/dccpaper.pdf
The Context Trees of Block Sorting Compression
By Jesper Larsson. Published in Proceedings of the IEEE Data Compression Conference 1998, PDF version of the paper.
http://www.larsson.dogma.net/lev3.pdf
Attack of the Mutant Suffix Trees
By Jesper Larsson. Published in Proceedings of the IEEE Data Compression Conference 1998, a PDF version of the paper.
http://www.larsson.dogma.net/lic.pdf
Notes on Suffix Sorting
by Jesper Larsson. Technical report. LU-CS-TR:98-204, LUNFD6/(NFCS-3134)/1-6/(1998). A PDF version of the paper.
http://www.larsson.dogma.net/tr204.pdf
Suffix Trees on Words
By Jesper Larsson, Arne Andersson and Kurt Swanson. Technical report. LU-CS-TR:95-158, LUNFD6/(NFCS-3107)/1-14/(1995). Early version of a paper published in Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching 1996; to appear in Algotrithmica
http://www.larsson.dogma.net/words-alg.pdf
Jesper Larsson’s Research Links
I am a graduate student (doktorand) in algorithms and data structures at the Department of Computer Science at Lund University. Special interests are text searching and data compression.
I use the links on this page to papers, but don’t index the page itself
http://www.larsson.dogma.net/research.html
Jesper Larsson
Jesper Larsson spent a fair amount of his years in academia
studying Suffix Trees. His home page has links to his thesis and a few other papers on suffix trees and other string matching/Data Compression topics.
Suffix Tree
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/suffixtree.html
Suffix Array
The definition from the NIST Dictionary of Algorithms and Data Structures.
http://www.nist.gov/dads/HTML/suffixarray.html
bwtzip: A Linear-Time Portable Research-Grade Universal Data Compressor
bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.
Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here
http://stl.caltech.edu/bwtzip.html
From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction
1997, Robert Giegerich, Stefan Kurtz. We review the linear time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen’s online construction, to explain its historic predecessors. The submitter of this paper indicates that it has user-friendly terminology, always welcome in Journal papers.
http://citeseer.nj.nec.com/giegerich97from.html
Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching
R. Grossi and J. S. Vitter. “Compressed Suffix Arrays and Suffix Trees, with Applications to Text Indexing and String Matching,” Proceedings of the 32nd Annual ACM Symposium on Theory of Computing (STOC ‘00), Portland, OR, May 2000, 397-406.
http://www.cs.duke.edu/~jsv/Papers/catalog/node79.html
Suffix Trees
A short and sweet tutorial on suffix trees. What they are and how to construct them.
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix.html
Opportunistic data structures with applications
Two papers in which it is show how to combine the BWT with the suffix array
data structure, in order to build a sort of compressed suffix array.
In the first paper it is proven that
the space occupancy of the compressed suffix array can be bounded in
terms of the entropy of the input string. In the second paper it is
proposed and tested a practical implementation of this data structure.
The first paper will appear in the Proc. of 41st IEEE Symposium on
Foundations of Computer Science; the second one in the Proc. of the 12th
SIAM-ACM Symposium on Discrete Algorithms.
http://www.mfn.unipmn.it/~manzini/papers/focs00.html
Strmat
A collection of C programs that do string matching and pattern discovery. This appears to be free code by D. Gusfield, who also has a book called “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology”.
One DCL reader commented The strmat package is wonderful.
http://www.cs.ucdavis.edu/~gusfield/strmat.html
Doug McIlroy’s source code page
Links to a documented implementation of a suffix sort. This may not be a compression topic per se, but suffix trees are useful for compressing data.
http://cm.bell-labs.com/cm/cs/who/doug/source.html
Notes on suffix tree construction
Notes on suffix tree construction, some course notes for COMP 612: Graduate Seminar in Compiler Construction, includes some pointers to important papers.
DataCompression.info user David D. had this complaint: None of the links work on this page, all there is is a short paragraph on suffix trees. I have to agree, it’s a rather strange page.