Arithmetic Coding with Folds and Unfolds
Richard Bird and Jeremy Gibbons
Abstract
Arithmetic coding is a method for data compression. It produces a theoretically optimal compression under much weaker assumptions than Huffman and Shannon-Fano, and can compress within one bit of the limit imposed by Shannon's Noiseless Coding Theorem. Earlier presentations provided little in the way of proof of why the various steps in the encoding process were correct, particularly when it came to the specification of precisely what problem the implementation solved, and the details of why the inverse operation of decoding was correct. Our aim in these lectures is to provide a formal derivation of basic algorithms for coding and decoding. Our development makes heavy use of the algebraic laws of folds and unfolds. One novel result concerns a new pattern of computation, which we call streaming, whereby elements of an output list are produced as soon as they are determined (and which has nothing to do with lazy evaluation).