Posts Tagged 'gdal_translate'

Some tips on HDF5 files

In general, my users ask me to export HDF-formatted images into something directly usable with desktop GIS software (GeoTIFF, Erdas Imagine, etc.).

The problem with HDF is that it is not an image format but a data container format: it is very general and can contain any type of object (variables, arrays, images…). The best way to handle this format is to write a few lines of code that browse the file's internal table, which stores the meta-information.

From the command line, you can use gdalinfo to get some meta-information, which can be more or less complex depending on what was stored in the HDF file.
Let’s consider an HDF5 file whose meta-information is:

Driver: HDF5/Hierarchical Data Format Release 5
Size is 512, 512
Coordinate System is `'
SUBDATASET_0_DESC=[1511x701] //FVC (8-bit integer)
SUBDATASET_1_DESC=[1511x701] //FVC_QF (8-bit character)
SUBDATASET_2_DESC=[1511x701] //FVC_err (8-bit integer)
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0, 512.0)
Upper Right ( 512.0, 0.0)
Lower Right ( 512.0, 512.0)
Center ( 256.0, 256.0)

In HDF you can store different types of data in the same file, so the Size reported for the file as a whole is not meaningful here. The header says that Size is 512, 512, which is wrong: the images are actually 1511×701, as given on the SUBDATASET_n_DESC lines. The same lines also give you the data type of each dataset (for example, 8-bit integer for FVC).
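As a rough illustration of reading that information programmatically, here is a minimal Python sketch that pulls the dimensions, dataset name and data type out of the SUBDATASET_n_DESC lines. It only parses text copied from the listing above; the function name list_subdatasets and the regular expression are my own assumptions, and the layout of gdalinfo output may vary between GDAL versions.

```python
import re

# Sample lines copied from the gdalinfo output shown above.
GDALINFO_TEXT = """\
Size is 512, 512
SUBDATASET_0_DESC=[1511x701] //FVC (8-bit integer)
SUBDATASET_1_DESC=[1511x701] //FVC_QF (8-bit character)
SUBDATASET_2_DESC=[1511x701] //FVC_err (8-bit integer)
"""

# One SUBDATASET_n_DESC line: index, [dimensions], dataset name, (data type).
DESC_PATTERN = re.compile(
    r"^\s*SUBDATASET_(\d+)_DESC=\[([0-9x]+)\]\s+(\S+)\s+\((.+)\)\s*$"
)


def list_subdatasets(text):
    """Return one dict per SUBDATASET_n_DESC line found in the text."""
    found = []
    for line in text.splitlines():
        match = DESC_PATTERN.match(line)
        if match:
            index, dims, name, dtype = match.groups()
            found.append({
                "index": int(index),
                "shape": tuple(int(n) for n in dims.split("x")),
                "name": name,
                "type": dtype,
            })
    return found


for sub in list_subdatasets(GDALINFO_TEXT):
    print(sub["index"], sub["name"], sub["shape"], sub["type"])
```

In a real script the text would come from running gdalinfo on the file rather than from a constant.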

The example above is about an HDF5 image, but FWTools can also handle HDF4 images.

Now, to export the image to something easier to handle, you must give gdal_translate the dataset name, not the HDF5 file name:

gdal_translate -of gtiff HDF5:"HDF5_LSASAF_MSG_FVC_SAme_200806100000"://FVC fvc.tif

This exports the dataset named FVC into a single image.
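If you need all the datasets of the file, the same command can be generated for each of them. The sketch below only builds and prints the command lines; it does not run GDAL. The helper names and the output file names (fvc.tif, fvc_qf.tif, fvc_err.tif) are hypothetical, while the dataset names are those listed by gdalinfo above.

```python
import shlex


def hdf5_subdataset(path, dataset):
    """Build a subdataset name of the form HDF5:"file"://dataset."""
    return 'HDF5:"{}":{}'.format(path, dataset)


def translate_command(path, dataset, out_tif):
    """Return the gdal_translate argument list for one subdataset."""
    return ["gdal_translate", "-of", "GTiff",
            hdf5_subdataset(path, dataset), out_tif]


HDF5_FILE = "HDF5_LSASAF_MSG_FVC_SAme_200806100000"

# Dataset names as listed by gdalinfo; the output names are made up.
OUTPUTS = {
    "//FVC": "fvc.tif",
    "//FVC_QF": "fvc_qf.tif",
    "//FVC_err": "fvc_err.tif",
}

for dataset, out_tif in OUTPUTS.items():
    # shlex.join quotes each argument so the line can be pasted in a shell.
    print(shlex.join(translate_command(HDF5_FILE, dataset, out_tif)))
```

Each argument list could be passed to subprocess.run once you are happy with the printed commands.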

About image files internal compression

Did you know that some image file formats support internal compression? This means that the data are compressed inside the file and decompressed on the fly by the image driver when you open the image in your software. With internal compression, you do not need to zip your files to save disk space, and no extra file is created when you read the image. It is also worth considering when your files are stored on a remote hard drive (you may save a lot of time).

There are two types of compression: lossy, like JPEG, and lossless. Lossless compression allows the exact original data to be reconstructed from the compressed data. Lossy compression only allows an approximation of the original data to be reconstructed, in exchange for better compression rates (it is often used in photography).

I usually store all my data as GeoTIFF with internal lossless compression. GDAL offers three lossless compression methods for GeoTIFF: LZW, DEFLATE and PACKBITS. I use LZW because it is implemented in all commercial software, but DEFLATE seems to give a better compression rate.

Exporting an image, say image.img, to a compressed GeoTIFF, say new_image.tif, is simple:
gdal_translate -of Gtiff -co "compress=lzw" image.img new_image.tif

Let’s take the example of an NDVI image of Africa, say 9633*8117 pixels, 1 byte per pixel. The uncompressed data amount to about 75 MB. Using GeoTIFF with LZW compression, I get file sizes of about 30 MB (the actual size varies a bit from one image to another).
I also have data resulting from the detection of surface water on the continent. They typically have 5 classes: the ocean, the dry land and three classes of surface water. In that case the GeoTIFF + LZW files are around 2 MB!
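The figures above can be checked with a little arithmetic. This is only a sketch: the function names are mine, and the compressed sizes (about 30 and about 2) are the approximate values quoted in the text, not measurements made by the code.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def raw_size_bytes(width, height, bytes_per_pixel=1, bands=1):
    """Uncompressed size of the pixel data, ignoring headers."""
    return width * height * bytes_per_pixel * bands


def percent_saved(raw_mib, compressed_mib):
    """Share of the raw size saved by compression, in percent."""
    return (1.0 - compressed_mib / raw_mib) * 100.0


raw = raw_size_bytes(9633, 8117)   # NDVI image, 1 byte per pixel
raw_mib = raw / MIB

print(raw)                 # 78191061 bytes
print(round(raw_mib, 1))   # 74.6, i.e. "about 75"

# Approximate compressed sizes quoted in the text.
print(round(percent_saved(raw_mib, 30)))  # NDVI image with LZW
print(round(percent_saved(raw_mib, 2)))   # 5-class water map with LZW
```

The water map compresses much better because long runs of identical class values are exactly what a lossless coder such as LZW handles well.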

The time spent on decompression is not noticeable. In my case, it is even the opposite: I have a (very) large repository of images (Spot/vegetation images of Africa) stored on a remote machine, so internal compression saves network bandwidth and time. Reading a file of 2 MB instead of 75 MB makes a big difference!

Changing geospatial images file format

In the field of Earth Observation, a wide range of file formats exists. Many of them come from software vendors that imposed their home-made formats, like the Erdas Imagine HFA files or the ENVI/IDL file format, while others were created by research groups, like HDF, made by the HDF group of the University of Illinois, or by broader communities, like GeoTIFF, an effort of the open source community in which many universities and companies are involved.

Unfortunately, the file format you are given is not necessarily the one you want, either because your software does not support it or because it does not match some database requirements.

To make these changes I use gdal_translate, one of the many commands available in the FWTools package or in a GDAL installation. First install GDAL, or better FWTools, on your PC, and get some data. We will run the commands from a shell (MS-DOS or a Linux shell such as bash), so the path to your GDAL (or FWTools) commands must be set correctly. To check that everything works, run the FWTools shell (or open an MS-DOS box), or go to your Linux prompt, and type:
gdal_translate
The command should answer with a long text like this one:

[Screenshot: gdal_translate help and list of formats]

The first section of the text gives you the list of options you can use along with the command line, following the classical convention: parameters between brackets [ ] are optional, braces { } give you a list of choices, parameters without brackets [ ] are mandatory.

Hence, the minimal command you can invoke is
gdal_translate file_in file_out
which will export your input file (in any supported format) to the default output format, which is GeoTIFF.
Let’s say you want to convert an Erdas Imagine file named africa.img into a GeoTIFF image named export_africa.tif. You simply write
gdal_translate africa.img export_africa.tif

gdal_translate guesses the input format, so you do not even need to know it!

Now you can use the -of option to define the export format. Say you want to export to the Windisp file format (whose format name in GDAL is IDA), with the output image named africa.ida. You have to write:
gdal_translate -of IDA africa.img africa.ida

Of course, this works only if your input image is in Bytes (8 bits), since the IDA format only supports 8-bit data.

A short list of formats (a reminder) is included in the gdal_translate help text.
To see the exhaustive list of formats, type
gdal_translate --formats
The list is rather impressive and tells you whether you can only read a format (ro), read and write it (rw), or even read, write and update existing files (rw+). Use
gdal_translate --formats | more
to pause while the information is displayed (press ENTER to move forward by one line or SPACE to move forward by one page). You can see that you can read, write and update GeoTIFF or Erdas Imagine files,
GTiff (rw+): GeoTIFF
HFA (rw+): Erdas Imagine Images (.img)
that you can read and write ERMapper Compressed Wavelets images,
ECW (rw): ERMapper Compressed Wavelets
but that you can only read HDF5 images:
HDF5 (ro): Hierarchical Data Format Release 5
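To check from a script whether a given format can be written, the lines quoted above can be parsed. This is a small sketch that works on the four lines copied from the text; the function name parse_formats is my own, and the regular expression assumes the "CODE (flags): long name" layout shown here, which may differ in other GDAL versions.

```python
import re

# Lines copied from the --formats listing quoted above.
FORMATS_TEXT = """\
GTiff (rw+): GeoTIFF
HFA (rw+): Erdas Imagine Images (.img)
ECW (rw): ERMapper Compressed Wavelets
HDF5 (ro): Hierarchical Data Format Release 5
"""

# Short format code, access flags (ro, rw or rw+), then the long name.
FORMAT_LINE = re.compile(r"^\s*(\S+)\s+\((ro|rw\+?)\):\s*(.+)$")


def parse_formats(text):
    """Map each format code to its capabilities and long name."""
    formats = {}
    for line in text.splitlines():
        match = FORMAT_LINE.match(line)
        if match:
            code, flags, long_name = match.groups()
            formats[code] = {
                "write": flags.startswith("rw"),
                "update": flags.endswith("+"),
                "name": long_name,
            }
    return formats


formats = parse_formats(FORMATS_TEXT)
for code, info in formats.items():
    access = "read/write" if info["write"] else "read only"
    print(code, "-", access, "-", info["name"])
```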

You can find more details about formats on the gdal page.

Do not forget that some formats, like Windisp IDA, do not support all types of data. For example, Windisp IDA only supports Bytes. When exporting TO this format, you must be sure that your original data can be stored in Bytes; otherwise you need to rescale the data.
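To make the idea of rescaling concrete, here is a sketch of a simple linear rescaling onto the 0..255 range of a Byte. gdal_translate has its own -scale and -ot options for this job; the code below only illustrates the arithmetic and does not claim to reproduce GDAL's exact rounding. The function name and the sample range of -1 to 1 are assumptions chosen for the example.

```python
def rescale_to_byte(value, src_min, src_max):
    """Map value linearly from [src_min, src_max] onto 0..255.

    Values outside the source range are clamped to 0 or 255.
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    scaled = (value - src_min) * 255.0 / (src_max - src_min)
    clamped = min(255.0, max(0.0, scaled))
    return int(round(clamped))


# Hypothetical floating-point samples in the range -1..1.
samples = [-1.0, 0.5, 1.0, 2.0]
print([rescale_to_byte(v, -1.0, 1.0) for v in samples])
```

Keep in mind that such a rescaling loses precision: many different input values end up in the same Byte value.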

In the next posts, we will see how to use the other options.