Terrain Processing

(Laura Toma, developed with the support of NSF award no. 0728780.)

Overview

Disciplines such as environmental sciences, earth, oceanographic and atmospheric sciences deal with terrains and surfaces. In a computer, terrains (surfaces) are represented by sampling points on the terrain and recording the geographical coordinates {x, y} of the point and the coresponding elevation {z} of the terrain at that point. Thus, a terrain is represented as a cloud of points {x, y, z} (stored in a file).

Terrain data can be obtained by directly sampling the elevation of the terrain (this may be the most precise method, but the most time consuming, for sure!); by interpolating cartographic maps; from technology like LIDAR; or estimated from aerial photographs (SAR interpherometry).

Terrain elevation models are used to model and understand a wealth of natural processes in geosciences, and beyond.

In this lab you will develop an application to read terrain elevation data from a file and compute the flow of water. This is one of the basic functions in a Geographic Information System (GIS).

Representing terrains

Terrains are represented by sampling points on the terrain and recording the geographical coordinates {x, y} of the point and the coresponding elevation {z} of the terrain at that point. Thus a terrain is represented as a cloud of points {x, y, z} stored in a file.

The most common terrain data available comes in form of a grid: this means that the points {x, y} are sampled uniformly with a grid. Grid elevation data is refered to as a digital elevation model or DEM, in short. In a grid terrain the individual coordinates {x, y} of the points need not be specified. It is sufficient if we know the coordinates of the upper-left corner of the grid and the spacing between sample points. If these are specified, the geographical coordinates of the grid points are implicit.

A grid terrain consists of a header, and a set of points {z} that represent the elevation of the terrain sampled with a uniform grid. Here is a grid file, set1.asc, that represents a basin in the Sierra Nevada range in the south-west of the US. Take a look at it.

The format of a terrain grid file is standard. The first 6 lines in the file represent the header of the terrain; it stores the number of rows in the grid, the number of columns, the geographical coordinates of the lower left corner, the spacing in the grid (assumed to be the same both horizontally and vertically), and a nodata value (this is the value that is used for points where the elevation could not be measured).

ncols         391
nrows         472
xllcorner     271845
yllcorner     3875415
cellsize      30
NODATA_value  -9999

Following the header there are nrows lines, each line containing ncols values, for a total of nrows * ncols values. These values represent the elevations sampled from the terrain. Thus, a grid terrain is basically a 2D-array of elevation values. You can assume that the elevations are integers (though you can also use floats or doubles if you want). The dimensions of the grid can be found out from the header. The other information stored in the header (xllcorner, yllcorner, cellsize, NODATA_value) you will happily ignore for this lab since we won't care about geographical coordinates---- but you have to read them anyways in order to get to the point in the file where you can start reading the elevations.

Declaring a type for a grid

To store all the information of a grid we'll use a struct:

typedef struct _grid {

     int  rows, cols;  // the size of the grid
     ....
     float** data;   //the 2D array of value in the grid 

} Grid;

Reading a grid from a file

You'll need to write code that reads a terrain from a file. One thing you will need to figure out is how to work with files in C. Below is a piece of code that opens a file and reads an integer.

FILE* f;
char s[100];
int nrows;

f=fopen("myfile.asc", "r");
if (f== NULL) {
   printf("cannot open file..");
   exit(1);
}

fscanf(f, "%s", s);
printf("read %s from file\n", s);
fscanf(f, "%d", &nrows);
printf("read %d from file\n", nrows);

Testing your grid reader

To test that you read the right values, write a printGrid (Grid g) method that prints the grid (header and values); it should print the same information as when you open the grid.asc file. You may want to break this function into two: printHeader and printValues.

In addition to the printGrid method, write a printInfo (Grid g) method that prints the important information about the grid: rows, cols, h_min and h_max.

See some test grids at the top of this page. Some of them (europe.asc) are large, so better not open them in a browser. Your Grid class will not be able to handle large files (try it), but it should handle the rest of them just fine. While testing, you should try getting the small ones to work before you try any of the larger ones.

If you want to play with more data, you can download your own DEMs at USGS geodata site. One of the test DEMs I included, europe.asc, comes from this site under GTOPO30 project, which contains Global 30 Arc-Second Elevation Data Set global DEM with a horizontal grid spacing of 30 arc seconds (approximately 1 kilometer). These files come in a different format, so if you want to use one of them, let me know and I will be very happy to help you convert it. Another site where you can download (more recent) data is: here. This is SRTM data collected by NASA SRTM mission in 2000 and released for free for research and educational purposes. It covers the entire world at 3 arc second resolution (approx. 90m). It comes in tiles. Download a tile in a location that you like.

Flow

The goal of flow modeling is to quantify how much water flows through the any point of the terrain; as a result, one can infer the location of the rivers (the points with a lot of flow) and th3 ridges (the points with low flow).

Knowledge about flow on the terrain turns out to be the fundamental step in many geographical processes. Unfortunately, flow data is not available from satellites, like elevation data. The location of rivers can be obtained by flying a plane and identifying rivers, but you can imagine that is a very very expensive process. That is why people have looked into ways to model flow based on the elevation of the terrain.

Flow on a grid terrain

For a grid terrain, flow is expressed by a grid which has the same size as the elevation grid. The value of this flow grid at position (i,j) represents the total amount of flow that flows through point (i,j).

The computation of flow goes as follows: To start, every point in the grid has 1 unit of water. So the very first thing that you want to do in computeFlow() after you create the flow grid is

//initialize flow grid 
for (i=0; i < rows; i++) 
   for (j=0; j < cols; j++) 
     if point is not NODATA then set(flowGrid, i,j, 1);
     else set(flowGrid, i,j,NODATA);

Here assume you have a set function on a grid set(Grid* grid, int i,int j, float val) that sets the value of an element (i,j) in grid to the desired value.

The question is, where does the water go? The simplest way to model the flow is to assume that every single point in the grid distributes its water (initial, as well as incoming) to a single neighbor, namely its steepest downslope neighbor. If there are ties, they are broken arbitrarily.

If a point is NODATA (undefined), its flow should be NODATA (undefined); A NODATA point does not send its flow to any neighbor, so all NODATA points should end up with flow = NODATA.
If a point has no neighbors that are lower than it, then it does not send its flow anywhere. This point is (part of) a pit.

So you see, with this model, water follows the steepest way down, either to the edge of the terrain or to a pit. Because water flows "down", there cannot be cycles.

The goal is to compute, for each point in the grid, the total amount of water that flows through that point.

To do this, the first method that you'll write will be:

 boolean flowsInto(Grid grid, int i, int j, int k, int l)

which returns true if cell (i,j) flows into cell (k,l), that is, if cell (k, l) is the steepest downslope neighbor of cell (i,j) in grid. The way this method works is the following: it looks at all 8 neighbors of cell (i,j) in grid and computes the lowest neighbor; this is the direction where water would go from cell (i,j). Then it checks whether this neighbor is equal to (k,l), in which case it returns true; otherwise it returns false.

Once you have method flowsInto written and tested, you can start working on computing flow. You will write a method that computes a flow grid based on the (elevation) grid:

void computeFlow(Grid  elevgrid, Grid * flowgrid) {


   //initialize flow grid 	
   ...
		
    //compute flow grid
    for (int i=0; i< rows; i++) {
	for (int j=0; j < cols; j++) { 

           //compute flow of point (i,j)
	   int flow = computeFlow(elevGrid, i, j);

           set(flowGrid, i,j, flow);
	}
    }

}

Naturally, you will write a separate method computeFlow(Grid elevGrid, int i, int j) that computes the flow of a point (i,j) in the grid. There are several ways to go about computing the flow of a point (i,j). Probably the simplest way is to run a loop through all its neighbors, for every neighbor you will check whether it flows into the current cell, and if it does, you will add its value of the flow to the flow of the current cell. Note that a cell can receive flow from multiple cells. Also note that you will need to compute the value of the flow of the neighbor, which means that the problem is probably recursive. When no neighbor flows into the current cell there are no subsequent recursive calls.

Two things to think about when writing computeFlow(i,j):

Infinite recursion: is it possible for computeFlow to create a loop of recursive calls; if yes, how to avoid it.
Efficiency: is it possible for the flow of a point to be computed multiple times. If yes, how to avoid it. One obvious way is to mark the grid: initially the entire grid is unmarked; whenever you determine the final value of the flow of cell (i,j), you set mark[i][j] = true. This way you know, if you ever need flow[i][j], whether you've computed it before or not. Marking the grid is just one way to do it. Maybe you can come up with a different way.

For debugging reasons, you should first get test2.asc to compute flow correctly. Here is how the grid test2.asc looks like:

The grid is: 
 9 9 9 9 9
 8 7 6 7 8
 7 6 5 6 7
 6 5 4 5 6
 5 4 3 4 5

Here is what the flow should look like:

The grid is: 
 1 1 1 1 1
 1 2 4 2 1
 1 2 9 2 1
 1 2 14 2 1
 1 3 25 3 1

Note that point (4, 2) has height 3, and receives flow from all its neighbors, as it is the lowest neighbor for each of them. So 2 + 3 + 14 + 2 + 3 = 24. Add to this the initial 1 unit of flow at point (4, 2) and you got 25.

If you got test2.asc to run correctly, then your program is almost correct except maybe nodata values.

The main function

Your program should have a main function that takes the name of the elevation grid and the name of the flow grid on the command line, reads the elevation grid in memory, computes a flow grid, and saves the flow grid to a file with the given name. It should be run as follows:

>flow test2.asc test2flow.asc

This will read test2.asc, compute its flow grid and save it as test2flow.asc (in the current folder). The suggested outline of the main function:

int main(char** args, int argc) {


      char *elevfname, *flowfname; 
     //check args and argc to make sure the user entered two files
     //names on the command line, and if so, figure out the two file
     //names


     Grid elevgrid; 
     / * read the elevation grid from file into this
     structure. Note that first you have to read the number of rows
     and columns, then you have to allocate the 2D-array in the grid,
     then you have to fill it with values from the file
     */

     Grid flowGrid;
     //need to set the rows, columns in this grid and allocate its 2Darray 


     //compute the flow grid 
     computeFlow(elevGrid, &flowGrid);


      //save the flowGrid to a file

}

Test grids

Test grids are available here.

brunsdem.asc
kaweah.asc
sierra.asc
test1.asc
test2.asc
srtm_21_05.asc
srtm_22_04.asc
Another site where you can download (more recent) data is: here. This is SRTM data collected by NASA SRTM mission in 2000 and released for free for research and educational purposes. It covers the entire world at 3 arc second resolution (approx. 90m). It comes in tiles. Download a tile in a location that you like.

Submitting your Work

You will e-mail me a tar file with the following files:

Readme file. A simple file named README.txt with your name and a paragraph telling me and the grading TAs how to compile and use your program.
Your program: .c and .h file. A tar file is the concatenation of one or more files. More about the tar file format:
- http://en.wikipedia.org/wiki/Tar_file_format
- http://www.computerhope.com/unix/utar.htm

Last modified: Wed Nov 30 11:04:37 EST 2011