Saturday, October 27, 2007 at 15:31
martian77 in Multimedia design

This follows on from the colour model discussion in the bitmap image entry. If we are going to store high quality, memory intensive images, we may need some way to compress the data so that it takes up less space.

There are two ways to do this: lossy and lossless. Lossy throws away information. Lossless (guess) doesn't. In the lecture it was demonstrated as two different ways to tidy a room. You could either tidy it by restacking everything so that it takes up less space (but doesn't actually reduce the amount of stuff) or you can tidy by throwing out all the stuff you don't need any more (empty pizza boxes were the example used). Lossless compression restacks the information, but the original information is always still available. Lossy compression throws away the information we say we don't need any more.

Run Length Encoding (RLE) is a very basic form of lossless compression. It stores the data in a more appropriate form. So if (for example) you have a long string of 1s and 0s as your data like this: 11111111111111111100000111111111, it could be stored as  18,5,9. That takes it from 4 bytes (32 bits) to 3. This method of encoding rarely gives compression ratios of better than 2:1.

You can get more complex with lossless compression and use Huffman coding. That replaces common pixel values witha  short code, and less common with a longer code. It may result in some individual segments taking more memory to store, but these should be infrequently used and this can give from a 27% to maybe a 40% reduction in size. This does rely on an unequal distribution of pixels, so trying to compress an equally-distributed image may actually result in an increase in size. It is used as the final stage in the compression of jpegs.

Then we have Lempel-Ziv-Welsh (LZW) compression. This is probably the most widely used lossless compression - used in Unix compression and GIF. It works very well for text documents but much less well for noisy images. It is on a similar idea to Huffman coding, but instead of individual colours it replaces whole repeating sequences with a code that points to a data dictionary entry.

Lossy compression works on removing the information that we don't need. It works by taking advantage of the properties of human vision and hearing, and tries to remove data where it won't be noticed. In the case of images, that is normally colour information (the eye is not so good at recognising tiny changes in colour) and in sound it's the bits you can't really hear.  

Article originally appeared on Life on Mars (
See website for complete article licensing information.