Statistical Lempel–Ziv

Statistical Lempel–Ziv is a lossless data compression technique published by Sam Kwong and Yu Fan Ho in 2001.^[1] It may be viewed as a variant of the Lempel–Ziv (LZ) family of methods. Its contribution is to take the statistical properties of the source data into account, which most LZ-based compression methods, such as LZ78 and LZW, do not.

History

Statistical Lempel–Ziv was first proposed by Yu Fan Ho in 2000 as a master's degree research topic in the Department of Computer Science of the City University of Hong Kong. Dr. Sam Kwong supervised Ho's research on this topic.

In February 2001, the paper titled "A Statistical Lempel–Ziv compression algorithm for personal digital assistant (PDA)" was published in the IEEE Transactions on Consumer Electronics.

In 2004, Ho applied statistical Lempel–Ziv to a compression algorithm designed specifically for polyphonic melody data, which was useful for the polyphonic ring tones then popular on mobile phones and other handheld devices. Ho showed that its compression ratio, decompression speed and memory consumption were better than those of commonly used lossless compressors such as LZ77 and zip, although its compression speed was lower. Compression speed was not essential in this application, because ring tones for handheld devices were compressed in advance at the factory rather than on the devices themselves.

In March 2009, the USPTO granted United States Patent 7,507,897 for the application of statistical Lempel–Ziv to melody data.^[2]

Background

Traditional LZ-based techniques exploit repetition in the data. Decompression consists simply of copying repeated data from the search window according to an index stored in the compressed data. Data not found in the window is stored uncompressed in the compressed stream, and is then shifted into the search window so that later repetitions of it can be found. Data is shifted into the window unconditionally, without regard to its statistical properties. Because the search window has a limited size, the oldest data is likewise shifted out unconditionally once the window is full. The window is therefore likely to fill up with useless (non-repeating) data while useful (soon to be repeated) data is pushed out. Improving the compression ratio then requires a larger search window, and hence more memory in the decompressor.
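The window behaviour described above can be illustrated with a minimal sketch of a generic LZ77-style decoder. The token format (`'lit'` and `'copy'` tuples), the function name `lz_decode` and the tiny window size are assumptions made for illustration only; they do not correspond to any particular file format.

```python
from collections import deque

def lz_decode(tokens, window_size=8):
    """Decode hypothetical tokens: ('lit', byte) or ('copy', offset, length)."""
    # Bounded window: when full, the oldest byte is dropped unconditionally.
    window = deque(maxlen=window_size)
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            # Data not found in the window is stored as-is in the stream.
            out.append(tok[1])
            window.append(tok[1])
        else:
            _, offset, length = tok
            # Copy byte by byte so that overlapping copies work.
            for _ in range(length):
                b = window[-offset]
                out.append(b)
                window.append(b)  # every decoded byte enters the window
    return bytes(out)

tokens = [('lit', ord('a')), ('lit', ord('b')), ('copy', 2, 4)]
decoded = lz_decode(tokens)
```

Every decoded byte enters the window here, whether or not it will ever be referenced again, which is the unconditional shifting that the paragraph above identifies as wasteful.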

Statistical Lempel–Ziv is an LZ-like lossless compression algorithm that also uses statistical information to identify the useful data that should be placed in the dictionary (search window). It improves the compression ratio compared with LZ77 because more useful data can be kept in the dictionary. Because the dictionary holds only useful data, it can be smaller, and hence less memory is required in the decompressor. Since not all data has to be shifted into the window, less processing power is required in the decompressor as well.
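The idea of keeping only useful data in the dictionary can be sketched as follows. This is not the published algorithm: the fixed-length chunking, the `min_count` threshold, the token kinds (`'lit'`, `'store'`, `'ref'`) and both function names are hypothetical choices made only to illustrate selective insertion.

```python
from collections import Counter

def pick_useful_phrases(data, size=3, min_count=2):
    """Compressor side (assumed): keep only chunks that repeat often enough."""
    chunks = [data[i:i + size] for i in range(0, len(data) - size + 1, size)]
    counts = Counter(chunks)
    return {c for c, n in counts.items() if n >= min_count}

def selective_decode(tokens):
    """Decoder side (assumed): only flagged phrases enter the dictionary."""
    dictionary = []
    out = bytearray()
    for tok in tokens:
        kind = tok[0]
        if kind == 'lit':      # one-off data: emit, do not store
            out += tok[1]
        elif kind == 'store':  # data expected to repeat: emit and store
            out += tok[1]
            dictionary.append(tok[1])
        elif kind == 'ref':    # repetition: copy from the dictionary
            out += dictionary[tok[1]]
    return bytes(out), len(dictionary)

useful = pick_useful_phrases(b'abcxyzabcabc')
tokens = [('store', b'abc'), ('lit', b'xyz'), ('ref', 0), ('ref', 0)]
decoded, dict_size = selective_decode(tokens)
```

In this sketch the one-off chunk `xyz` never occupies a dictionary slot, so the decoder needs to hold only a single entry, in line with the smaller memory footprint described above.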

References

1. Kwong, S. & Ho, Y.F., "A statistical Lempel–Ziv compression algorithm for personal digital assistant (PDA)", IEEE Transactions on Consumer Electronics, Vol. 47, Issue 1, pp. 154–162, Feb 2001.
2. "Dictionary-based compression of melody data and compressor/decompressor for the same", U.S. Patent 7,507,897.

This article is issued from Wikipedia - version of the 9/26/2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.