
About internal compression in image files

Did you know that some image file formats support internal compression? The pixel data are stored compressed inside the file itself, and the image driver decompresses them on the fly whenever a program opens the image. With internal compression you do not need to zip your files to save disk space, and no uncompressed copy is written to disk when you read the image. It is also worth considering when your files are stored on a remote drive, since less data has to travel over the network and you may save a lot of time.

There are two types of compression: lossy, such as JPEG, and lossless. Lossless compression allows the exact original data to be reconstructed from the compressed data. Lossy compression only allows an approximation of the original data to be reconstructed, in exchange for better compression ratios (it is often used for photographs).
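To make the distinction concrete, here is a small Python sketch. The lossless part uses the standard-library zlib module (Deflate). For the lossy part it assumes, purely for illustration, a crude quantizer that drops the 4 least significant bits of each pixel; real JPEG works very differently, but the point is the same: once information is discarded, decompression cannot bring it back. The pixel values are made up.

```python
import zlib

# Hypothetical 8-bit pixel values standing in for image rows.
original = bytes([12, 12, 13, 200, 201, 201, 202, 90, 90, 91] * 100)

# Lossless: Deflate (via zlib) reconstructs every byte exactly.
packed = zlib.compress(original, 9)
restored = zlib.decompress(packed)
assert restored == original

# Toy "lossy" scheme (NOT JPEG): drop the 4 low bits of each pixel
# before compressing. Decompression returns the quantized values only.
quantized = bytes(v & 0xF0 for v in original)
approx = zlib.decompress(zlib.compress(quantized, 9))

# Largest per-pixel error introduced by the lossy step.
max_error = max(abs(a - b) for a, b in zip(original, approx))
print("lossless exact:", restored == original, "| lossy max error:", max_error)
```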

I store all my data as GeoTIFF with internal lossless compression. GDAL offers three lossless compression schemes for GeoTIFF: LZW, Deflate and PackBits. I use LZW because it is implemented in all commercial software, but Deflate seems to give a better compression ratio.
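To give an idea of how LZW works, here is a minimal textbook LZW encoder and decoder in Python (lzw_encode and lzw_decode are hypothetical helper names). It is a sketch only: it returns a list of integer codes and omits what TIFF's LZW variant adds (the Clear and EndOfInformation codes, and 9- to 12-bit variable-width codes packed into bytes), so its output is not a valid TIFF strip. The key idea is that both sides build the same dictionary of byte sequences as they go, so repeated patterns such as long runs of one class value collapse into single codes.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit one code per longest already-known byte sequence."""
    table = {bytes([i]): i for i in range(256)}  # one code per byte value
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                # keep extending the match
        else:
            codes.append(table[current])       # emit the longest match
            table[candidate] = next_code       # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the encoder's dictionary on the fly and invert lzw_encode."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# A short hypothetical image row with runs of two class values.
row = bytes([0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0])
codes = lzw_encode(row)
assert lzw_decode(codes) == row
print(len(row), "bytes ->", len(codes), "codes")
```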

Exporting an image, say image.img, to a compressed GeoTIFF, say new_image.tif, is simple:
gdal_translate -of GTiff -co COMPRESS=LZW image.img new_image.tif
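Because the compression is internal, the scheme in use is recorded in the file itself, in the TIFF Compression tag (number 259). As a sketch, the Python function below (tiff_compression is a hypothetical helper) reads that tag from the first image directory of a classic TIFF using only the standard library. It assumes a classic TIFF, not BigTIFF, and only names a few common codes; it is a way to see the mechanism, not a replacement for GDAL.

```python
import struct

# Common values of the TIFF Compression tag (259), as used by libtiff.
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
    32946: "Deflate (old code)",
}


def tiff_compression(data: bytes) -> str:
    """Return the Compression tag of the first IFD of a classic (non-Big) TIFF."""
    if data[:2] == b"II":
        endian = "<"          # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"          # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i       # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT is left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return "none"  # tag absent: TIFF's default is no compression


# Tiny hand-built little-endian TIFF: header + one IFD holding a single
# entry, tag 259 (Compression), type 3 (SHORT), count 1, value 5 (LZW).
demo = (b"II" + struct.pack("<HI", 42, 8)
        + struct.pack("<H", 1)
        + struct.pack("<HHIHH", 259, 3, 1, 5, 0)
        + struct.pack("<I", 0))
print(tiff_compression(demo))  # LZW
```

For a real file you would pass the file's bytes, e.g. with open("new_image.tif", "rb") as f: print(tiff_compression(f.read())). Running gdalinfo new_image.tif should also report COMPRESSION=LZW in its Image Structure Metadata section.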

Take, for example, an NDVI image of Africa of 9633 × 8117 pixels at 1 byte per pixel: that is 78,191,061 bytes, or about 75 MB uncompressed. As GeoTIFF with LZW compression, my files are about 30 MB (the exact size varies a little from one image to another).
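The arithmetic behind these sizes can be checked in a few lines of Python. This sketch assumes the "MB" figures in the post are mebibytes (2^20 bytes) and uses the approximate 30 MB and 2 MB file sizes quoted here, so the ratios are indicative only.

```python
width, height = 9633, 8117      # NDVI image of Africa, from the post
bytes_per_pixel = 1

raw_bytes = width * height * bytes_per_pixel
raw_mib = raw_bytes / 2**20     # 1 MiB = 1,048,576 bytes

ndvi_lzw_mib = 30               # approximate LZW file size quoted in the post
water_lzw_mib = 2               # approximate size of the 5-class water maps

print(raw_bytes)                          # 78191061
print(round(raw_mib, 1))                  # 74.6
print(round(raw_mib / ndvi_lzw_mib, 1))   # 2.5  (compression ratio)
print(round(raw_mib / water_lzw_mib, 1))  # 37.3
```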
I also have maps of detected surface water over the continent. They typically have 5 classes: ocean, dry land and three classes of surface water. With so few classes and such large uniform areas, the GeoTIFF + LZW files are around 2 MB!
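Why does a class map shrink so much more than an NDVI image? Few distinct values and long runs of identical pixels give a dictionary coder many repeats to exploit, while a continuous, noisy variable does not. The sketch below illustrates this on synthetic data only: a hypothetical 500 × 500 five-class map (ocean strip, land, and a lake drawn as three nested water classes) against a hypothetical NDVI-like layer (a west-east gradient plus random noise). It uses Deflate from Python's zlib as a stand-in, because LZW is not in the standard library, so the sizes will not match the post's LZW figures.

```python
import random
import zlib

W, H = 500, 500
rng = random.Random(42)  # fixed seed so the demo is reproducible

# Hypothetical 5-class water map: 0 = ocean, 1 = dry land,
# 2..4 = three surface-water classes arranged as nested rings of a lake.
class_map = bytearray(W * H)
# Hypothetical NDVI-like layer: a smooth west-east gradient plus pixel noise.
ndvi_like = bytearray(W * H)

for y in range(H):
    for x in range(W):
        i = y * W + x
        d2 = (x - 300) ** 2 + (y - 250) ** 2   # squared distance to lake centre
        if x < 100:
            class_map[i] = 0
        elif d2 < 20 ** 2:
            class_map[i] = 4
        elif d2 < 30 ** 2:
            class_map[i] = 3
        elif d2 < 40 ** 2:
            class_map[i] = 2
        else:
            class_map[i] = 1
        ndvi_like[i] = (x * 200) // W + rng.randrange(16)  # values 0..214

class_size = len(zlib.compress(bytes(class_map), 6))
ndvi_size = len(zlib.compress(bytes(ndvi_like), 6))
print(f"raw size  : {W * H} bytes each")
print(f"class map : {class_size} bytes compressed")
print(f"NDVI-like : {ndvi_size} bytes compressed")
```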

The time spent on decompression is not noticeable. In my case it is even the opposite: I have a (very) large repository of SPOT/VEGETATION images of Africa stored on a remote machine, and internal compression saves both network bandwidth and time. Reading a 2 MB file instead of a 75 MB one makes a big difference!
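A rough estimate shows the order of magnitude. The sketch below assumes a hypothetical 100 Mbit/s network throughput (not a figure from the post) and ignores latency, protocol overhead, disk speed and decompression time, so only the relative numbers matter: transfer time scales directly with file size.

```python
def transfer_seconds(size_mib: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time: ignores latency, protocol overhead and disk speed."""
    bits = size_mib * 2**20 * 8
    return bits / (link_mbit_per_s * 1e6)


LINK_MBIT = 100  # hypothetical network throughput, not from the post

for label, size_mib in [("uncompressed", 74.6),
                        ("NDVI + LZW", 30),
                        ("water map + LZW", 2)]:
    print(f"{label:>16}: {transfer_seconds(size_mib, LINK_MBIT):5.2f} s")
```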