Project 2: Reading a grid and creating a blocked layout


In this second project, the task is to read a grid from a file and store it in two layouts: row-major (the order in which it comes in the file), and a blocked layout (which you'll create).

We'll assume that the grid is in ascii format (more on that below), and its name is specified by the user on the command line. The interface will look like this:

[ltoma@dover:~] ./readgrid test1.asc
reading grid test1.asc
ncols         5 
nrows         5
xllcorner     271845
yllcorner     3875415
cellsize      30
NODATA_value  -9999
1 2 3 4 5
6 7 8 9 10 
11 12 13 14 15
16 17 18 19 20 
21 22 23 24 25

creating blocked layout 
testing: 
test1 passed 
test2 passed
..

Grid ascii format

The grid arc-ascii format, or simply ascii format, is one of the standard formats for raster data in GIS. A grid in ascii format consists of a header, followed by the values in the grid, all in ascii. This is not the most efficient format in terms of space, but because the values are in ascii (as opposed to binary) we can see them.

The header: The first 6 lines in an ascii file represent the header of the grid: first the number of columns in the grid, followed by the number of rows, the geographical coordinates of the lower left corner, the spacing in the grid (assumed to be the same both horizontally and vertically), and a NODATA value (the value used for points where the elevation could not be measured, so a NODATA value in the elevation grid means "unknown"). Below is the header for the grid called set1.asc which you'll find here. All grids in ascii format have a similar header, with all lines in the exact same order. Check out the test grids to get the idea.

ncols         391
nrows         472
xllcorner     271845
yllcorner     3875415
cellsize      30
NODATA_value  -9999
For this lab you probably don't care about the xll and yll and the cellsize, but you have to read them anyways in order to get to the point in the file where you can start reading the elevations.

The data: Following the header there are nrows lines, each line containing ncols values, for a total of nrows * ncols values. These values represent the elevations sampled from the terrain, in row-major order. Remember that a grid terrain is a 2D-array of (elevation) values. The elevations are sometimes integers, sometimes not, so using floats or doubles is more general.

The blocked layout

The grid file contains the values in the grid in row-major order. After reading the grid from the file in this form, you will create a blocked layout.

Note: We will not be using a Morton layout. The issue with the Morton layout is dealing with grids whose size is not a power of two: For grids whose size is a power of two, we know that point (i,j) in the grid will be stored at index zindex(i,j) in the layout. When the grid is not a power of two, we obtain the layout by sorting the points according to their zindex(i,j); however this means that given a point (i,j) we do not know at what index in the layout it is placed. One way around this is to pad grids so that they are always a power of two; this has the obvious drawback of space redundancy.
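
To make the zindex(i,j) mentioned in the note concrete, here is a minimal sketch that interleaves the bits of i and j. Which coordinate gets the odd bit positions is a convention; the choice below (and the 16-bit limit on each coordinate) is an assumption of this sketch, not something the project prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Z-order (Morton) index of point (i,j), obtained by interleaving bits:
   bit k of j lands at bit 2k, bit k of i lands at bit 2k+1.
   Assumption of this sketch: both coordinates fit in 16 bits. */
uint32_t zindex(uint32_t i, uint32_t j) {
    uint32_t z = 0;
    assert(i < 65536u && j < 65536u);
    for (int k = 0; k < 16; k++) {
        z |= ((j >> k) & 1u) << (2 * k);      /* even bit positions come from j */
        z |= ((i >> k) & 1u) << (2 * k + 1);  /* odd bit positions come from i */
    }
    return z;
}
```

With this convention, in a 3 x 3 grid the point (2,2) gets zindex 12 even though the grid has only 9 elements, which is exactly why the z-index cannot be used directly as an array offset when the size is not a power of two.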

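In this project you'll compute a blocked layout. What this means is that the grid is divided into blocks of predefined size R by R (define R as a constant in your code); blocks are stored in row-major order, and the points inside each block are stored in row-major order. When the number of rows or columns is not a multiple of R, the blocks along the bottom and right edges are smaller, as in the R = 2 example in the unit testing section.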

Picture below:

General outline

You can use C or C++. Your program should have a main() function that takes the name of the grid on the command line, calls a function that reads the grid in memory, and calls a function that creates the blocked layout, and then calls the testing function(s).

To store all the information of a grid use a struct (if using C) or a class (if in C++). Ideally, this is defined in file Grid.h. My example below is in C, but feel free to use a class instead.

typedef struct _grid {

     int  rows, cols;  // the size of the grid
     .... 
     float* data_rowmajor;   //the values in the grid, in row-major order

     float* data_blocked;    //the values in blocked layout
} Grid;

The idea is that the grid stores the data in two layouts: row-major, and blocked; initially you read the grid from file into the row-major layout. Then you create the blocked layout. In a future project, you will then choose to keep/use one or the other, but not in this project.

The user of the grid should be aware that there are two layouts, but should not know the details of each specific layout. The details of how each layout maps the elements are encapsulated in two functions:

//return the element (i,j) in the matrix from the row-major layout 
float get_rowmajor(Grid g, int i, int j)

//return the element (i,j) in the matrix from the blocked layout 
float get_blocked(Grid g, int i, int j)
The user can request a specific value from the grid by calling these functions. Both functions should return the same element; the only difference is that one may have better locality, and therefore fewer cache misses, than the other.
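
As an illustration, here is one way get_rowmajor might look. It is a sketch that assumes the struct contains only the fields shown above (your struct will have more), and the bounds check with assert is my addition. get_blocked would have the same shape, except that it indexes data_blocked through blockedindex.

```c
#include <assert.h>
#include <stddef.h>

/* Only the fields shown in the handout; a real Grid has more. */
typedef struct _grid {
    int rows, cols;          /* the size of the grid */
    float *data_rowmajor;    /* the values in row-major order */
    float *data_blocked;     /* the values in blocked layout */
} Grid;

/* return the element (i,j) in the matrix from the row-major layout */
float get_rowmajor(Grid g, int i, int j) {
    assert(i >= 0 && i < g.rows && j >= 0 && j < g.cols);
    /* row i starts at offset i*cols; cast first to avoid int overflow */
    return g.data_rowmajor[(size_t)i * g.cols + j];
}
```

Passing the Grid by value copies only the struct (sizes and pointers), not the arrays, so the call is cheap.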

The main() function will look something like this:

#define R 50  

int main(int argc, char** args) {

      Grid grid; 

      char *gridfname; 
      //read grid name from user

      /* read the elevation grid from file into this structure. Note
         that first you have to read the number of rows and columns,
         then you have to allocate the array data_rowmajor[], then you
         have to fill it with values from the file
       */
         readGridfromFile(gridfname, & grid);
        
       /* print it to see that we got it right  */
       //printGrid(grid);


       /* create the blockedlayout of the grid. Note that this has to
           first allocate the array  data_blocked[]
        */
       createBlockedLayout(&grid);


      //unit testing 
      //call functions that are meant to test the correctness of yourcode      
}
The function createBlockedLayout(Grid*) will work something like this:
  for i 
     for j
        copy grid->data_rowmajor[i*ncols+j]  into grid->data_blocked[blockedindex(i,j)]
Even better, you could use the function get_rowmajor(Grid g, int i, int j):
  ....copy get_rowmajor(grid, i, j)  into grid->data_blocked[blockedindex(i,j)]
Even even better, you could write a function set_blocked(grid, i, j, float x) that sets the value at (i,j) in the blocked layout to x.
//set the element (i,j) in the matrix from the row-major layout to x
void set_rowmajor(Grid g, int i, int j, float x)

//set the element (i,j) in the matrix from the blocked layout to x 
void set_blocked(Grid g, int i, int j, float x)

Here you might ask if the set_* functions have the task to keep the two layouts in sync. The answer is NO. Both layouts are there, but the user will use only one. For example the user might decide, after reading the grid, that she wants to do a viewshed computation using the blocked layout. So she'll free() the row-major layout and use only the blocked one.

You will need to write the function blockedindex(i,j) that computes and returns the index of element (i,j) in the blocked layout.

/* return the index where element (i,j) in the matrix is stored in the
   blocked layout
*/
long blockedindex(Grid g, int i, int j) 
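
One possible way to compute this index is sketched below. It assumes the blocks along the bottom and right edges are stored compactly, with no padding, when the grid size is not a multiple of R (this matches the R = 2 example in the unit testing section). The helper name blocked_offset and the choice to pass the sizes and the block size r as parameters are mine, made so that the sketch can be tried with different values of R.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Offset of element (i,j) in a blocked layout with r x r blocks,
   for a grid of nrows x ncols. Edge blocks may be smaller than r x r. */
long blocked_offset(int nrows, int ncols, int r, int i, int j) {
    assert(r > 0 && i >= 0 && i < nrows && j >= 0 && j < ncols);
    int bi = i / r, bj = j / r;          /* block row and block column */
    int h = min_int(r, nrows - bi * r);  /* height of this block row */
    int w = min_int(r, ncols - bj * r);  /* width of this block */
    /* every block row above is full height, so it holds r*ncols values */
    long before_block_row = (long)bi * r * ncols;
    /* every block to the left is full width r and has height h */
    long before_block = (long)bj * r * h;
    /* row-major position inside the block */
    long inside = (long)(i - bi * r) * w + (j - bj * r);
    return before_block_row + before_block + inside;
}
```

With this helper, blockedindex(g, i, j) could simply return blocked_offset(g.rows, g.cols, R, i, j).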

Reading a grid from a file

To read the grid from file you will need to figure out how to work with files in C or C++. Below is a piece of code that opens a file and reads an integer.
#include <stdio.h>
#include <stdlib.h>   /* for exit() */

FILE* f;
char s[100];
int ncols;

f=fopen("myfile.asc", "r");
if (f== NULL) {
   printf("cannot open file..");
   exit(1);
}

/* the first header line is "ncols <value>": read the label, then the number */
fscanf(f, "%99s", s);
printf("read %s from file\n", s);
fscanf(f, "%d", &ncols);
printf("read %d from file\n", ncols);
....
//use fscanf(f, "%s", ..) to skip over stuff you don't need  like xll corner 

//use a loop for i for j that calls fscanf(f, "%f", ....) and puts the value in data_rowmajor at the right place 
To test that you read the right values, write a printGrid (Grid g) method that prints the grid (header and values); it should print the same information that you see in the grid.asc file. You may want to break this function into two: printHeader and printValues.
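
Putting the pieces together, a reader for the whole file might look roughly like the sketch below. Several things here are assumptions of mine and not requirements: the function returns 0 on success and -1 on failure, the struct has a nodata field, and the xll, yll and cellsize values are read and discarded.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct _grid {
    int rows, cols;
    float nodata;            /* hypothetical field for NODATA_value */
    float *data_rowmajor;
    float *data_blocked;
} Grid;

/* Read an ascii grid into g->data_rowmajor. Returns 0 on success, -1 on error. */
int readGridfromFile(const char *fname, Grid *g) {
    char label[100];
    double xll, yll, cellsize;   /* read only to get past them */
    assert(fname != NULL && g != NULL);

    FILE *f = fopen(fname, "r");
    if (f == NULL) return -1;

    g->data_rowmajor = NULL;
    g->data_blocked = NULL;
    /* each header line is "<label> <value>", in this fixed order */
    if (fscanf(f, "%99s %d", label, &g->cols) != 2 ||
        fscanf(f, "%99s %d", label, &g->rows) != 2 ||
        fscanf(f, "%99s %lf", label, &xll) != 2 ||
        fscanf(f, "%99s %lf", label, &yll) != 2 ||
        fscanf(f, "%99s %lf", label, &cellsize) != 2 ||
        fscanf(f, "%99s %f", label, &g->nodata) != 2 ||
        g->rows <= 0 || g->cols <= 0) {
        fclose(f);
        return -1;
    }

    size_t n = (size_t)g->rows * g->cols;
    g->data_rowmajor = malloc(n * sizeof *g->data_rowmajor);
    if (g->data_rowmajor == NULL) { fclose(f); return -1; }

    /* the values are already in row-major order, so read them straight through */
    for (size_t k = 0; k < n; k++) {
        if (fscanf(f, "%f", &g->data_rowmajor[k]) != 1) {
            free(g->data_rowmajor);
            g->data_rowmajor = NULL;
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}
```

Checking the return value of every fscanf catches truncated or malformed files early, instead of leaving garbage in the grid.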

In addition to the printGrid() method, write a printInfo (Grid g) method that prints the important information about the grid: rows, cols, h_min and h_max (the minimum and maximum elevation in the grid).
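
For h_min and h_max, a single pass over the values is enough. The sketch below skips cells equal to the NODATA value; whether to skip them is my assumption, since otherwise -9999 would usually end up as the minimum. The function name elevation_range is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scan n values and store the smallest and largest valid elevation in
   *h_min and *h_max. Returns the number of valid (non-NODATA) values;
   if it returns 0, *h_min and *h_max are left untouched. */
size_t elevation_range(const float *data, size_t n, float nodata,
                       float *h_min, float *h_max) {
    assert(data != NULL && h_min != NULL && h_max != NULL);
    size_t valid = 0;
    for (size_t k = 0; k < n; k++) {
        float v = data[k];
        if (v == nodata) continue;               /* unknown elevation */
        if (valid == 0 || v < *h_min) *h_min = v;
        if (valid == 0 || v > *h_max) *h_max = v;
        valid++;
    }
    return valid;
}
```

printInfo could call this on data_rowmajor with n = rows * cols and print the two results next to rows and cols.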

Unit testing

Whenever you write code, you need to write functions that specifically test it. The more functions that test special edge cases, the better. For this project the only issue is the conversion to blocked layout. You need to write a function (or more) to check that the conversion to blocked layout was done properly. That is, we want to go through every element (i,j) in row-major order, and make sure that the data in row-major at (i,j) is the same as the data in blocked layout at (i,j). Your tester needs to iterate through the grid and make sure the two layouts give the same thing:
      for (i=0; i < nrows; i++)
          for (j=0; j < ncols; j++)
              assert( get_rowmajor(grid, i,j) == get_blocked(grid,i,j) );

Another test is that for
#define R 2
and the grid
ncols 3
nrows 4
...
1 2 5 
3 4 6
7 8 11
9 10 12
the blocked layout is :
1 2 3 4 5 6 7 8 9 10 11 12
Another test is the case when R=1: in this case the blocked order and row-major order coincide, and basically blockedindex(i,j) should return i*ncols+j.
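
A useful way to produce the expected answer for tests like the two above is to build the blocked order a second, independent way: walk the blocks one by one and append their elements, without using any index formula. This is only a sketch; the function name blocked_by_traversal is hypothetical, and it assumes edge blocks are stored compactly as in the R = 2 example.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a newly allocated array holding the nrows*ncols values of
   rowmajor[] rearranged in blocked order with r x r blocks, or NULL if
   allocation fails. The caller must free() the result. */
float *blocked_by_traversal(const float *rowmajor, int nrows, int ncols, int r) {
    assert(rowmajor != NULL && nrows > 0 && ncols > 0 && r > 0);
    float *out = malloc((size_t)nrows * ncols * sizeof *out);
    if (out == NULL) return NULL;
    size_t k = 0;   /* next free position in the blocked array */
    for (int bi = 0; bi < nrows; bi += r)          /* blocks in row-major order */
        for (int bj = 0; bj < ncols; bj += r)
            /* elements inside the block, also row-major, clipped at the edges */
            for (int i = bi; i < bi + r && i < nrows; i++)
                for (int j = bj; j < bj + r && j < ncols; j++)
                    out[k++] = rowmajor[(size_t)i * ncols + j];
    return out;
}
```

A test can then compare this array, element by element, against the data_blocked array that createBlockedLayout produced.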

What to turn in

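In addition to pushing the code to your GitHub repo, please bring to class and hand in a sheet of paper that (legibly) states your name, your GitHub username, whether you worked alone or with a partner, and the URL to clone your repository. So for example, if I were to turn in my helloworld project I would hand in one sheet with: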

Laura Toma 

Github username lauratoma

Worked alone

To clone: https://github.com/lauratoma/helloworld.git

Grading

Total 10 points

Enjoy!


Comments

Below are the diffs between this site as is now, and this site when it was assigned (Tue 9/19).