added 299 characters in body

edited Jul 16, 2009 at 15:50

finnw


Any algorithm/library that supports a preset dictionary, e.g. zlib.

This way you can prime the compressor with the same kind of text that is likely to appear in the input. If the files are similar in some way (e.g. all URLs, all C programs, all StackOverflow posts, all ASCII-art drawings) then certain substrings will appear in most or all of the input files.

Every compression algorithm will save space if the same substring is repeated multiple times in one input file (e.g. "the" in English text or "int" in C code).

But in the case of URLs, certain strings (e.g. "http://www.", ".com", ".html", ".aspx") will typically appear only once in each input file. So you need to share them between files somehow, rather than having one compressed occurrence per file. Placing them in a preset dictionary will achieve this.
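As a rough illustration of the idea, here is a minimal sketch using the `zdict` argument of Python's `zlib` module. The dictionary contents, the helper names `compress_url`/`decompress_url`, and the sample URL are all made up for the example; a real dictionary should be built from strings that actually occur in your data, with the most common ones placed near the end.

```python
import zlib

# Hypothetical preset dictionary (an assumption for this sketch).
# zlib favours strings near the end, so the likeliest ones go last.
PRESET_DICT = b".aspx.html.org.net.comhttps://http://www."


def compress_url(url: bytes) -> bytes:
    """Compress one short string with the shared preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=PRESET_DICT)
    return comp.compress(url) + comp.flush()


def decompress_url(blob: bytes) -> bytes:
    """Decompress; the decompressor must be given the same dictionary."""
    decomp = zlib.decompressobj(zdict=PRESET_DICT)
    return decomp.decompress(blob) + decomp.flush()


url = b"http://www.example.com/index.html"
plain = zlib.compress(url, 9)   # no dictionary
primed = compress_url(url)      # with the preset dictionary

# Compare the sizes: original, plain zlib, dictionary-primed zlib.
print(len(url), len(plain), len(primed))
```

Note that the dictionary itself is not stored in the compressed output, so both sides have to agree on it in advance. For very short inputs the fixed zlib header and checksum overhead can still dominate, so it is worth measuring on your own data.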

answered Jul 16, 2009 at 15:42

finnw


Any algorithm/library that supports a preset dictionary, e.g. zlib.

This way you can prime the compressor with the same kind of text that is likely to appear in the input. If the files are similar in some way (e.g. all C programs, all StackOverflow posts, all ASCII-art drawings) then certain substrings will appear in most or all of the input files.

Moving these common substrings into the preset dictionary will share the cost of storing these substrings among all files, rather than repeating them for every file.
