Archive for the 'python' Category

python-gdal: simple file processing

Here is a simple example of image processing with python and gdal.

Say we want to make a mask from an image, 0 if data<10, 1 else.
The strategy is the following:

  • open the input file
  • create an output file, geotiff, byte
  • read the input file line by line
  • test values and write to the output

First, let’s import some libraries. I use the following snippet in any of my codes, so it must (may?) work with most of the gdal/numpy libraries settings:

try:
    from osgeo import gdal
    from osgeo.gdalconst import *
    gdal.TermProgress = gdal.TermProgress_nocb
except ImportError:
    import gdal
    from gdalconst import *
try:
    import numpy as N
    N.arrayrange = N.arange
except ImportError:
    import Numeric as N
try:
    from osgeo import gdal_array as gdalnumeric
except ImportError:
    import gdalnumeric

Now let’s define file names, and the threshold value. You can read something else than a geotiff file in input, gdal should do the job for you.

file='/data/demo/input.tif'
outfile='/data/demo/output.tif'
value=1

The lines below open the input file, and get from it its width and height.

fid = gdal.Open(file, GA_ReadOnly)
ns = fid.RasterXSize
nl = fid.RasterYSize

The output file is created, with the same dimensions, 1 single band, its type is set to Byte (GDT_Byte), format forced to geotiff, options are left empty. Note that options is an array of strings, we’ll see in another post what can be done with it. The output file get same projection and geotransformation as the input. Geo transformation is an array of float indicating the upper left corner coordinates and pixel size.

outDrv = gdal.GetDriverByName('GTiff')
options=[]
outDs  = outDrv.Create(outfile, ns, nl, 1, GDT_Byte, options)
outDs.SetProjection(fid.GetProjection())
outDs.SetGeoTransform(fid.GetGeoTransform())

Preparation is done, it is now enough to parse the input file line by line.
At each iteration the output array is set to 0 (default value), places where data is found greater than the given threshold being set to 1.

Note that ReadAsArray returns a 2-D array. N.ravel transforms it in a single array on which operations like dataOut[data>value]=1 can be applied. To write this array to a file you MUST give it 2 dimensions, which is achieved with dataOut.shape=(1,-1).

datazeros=N.zeros(ns)
for il in range(nl):
data = N.ravel(fid.GetRasterBand(1).ReadAsArray(0, il, ns, 1))
dataOut = datazeros
dataOut[data>value]=1
dataOut.shape=(1,-1)
outDs.GetRasterBand(1).WriteArray(N.array(dataOut), 0, il)

Job done!

Faster loop in Python

If found this today: I had a loop for scanning an image line by line:

for il in range(ystart, yend+1):

data = N.ravel(fid.GetRasterBand(band+1).ReadAsArray(xstart, il, ns,1))
wts = (data != 0)

if wts.any():

index = N.arange(ns*nl)
jpos  = index / ns
ipos  = index - jpos * ns
jmin = jpos.min()
jmax = jpos.max()
imin = ipos.min()
imax = ipos.max()
etc...

The loop was pretty slow.  I thought it was coming from computation on the arrays in the 'if' statement. Actually, by commenting those lines, I realized it was due to the creation of the index array:
index = N.arange(ns*nl)
To (significantly) speed up the loop, I simply had to create first a reference indexRef array to avoid creating it in the loop.

indexRef = N.arange(ns*nl)
for il in range(ystart, yend+1):

data = N.ravel(fid.GetRasterBand(band+1).ReadAsArray(xstart, il, ns,1))
wts = (data != 0)

if wts.any():

index = indexRef[wts]
jpos  = index / ns
etc...