[Bowdoin Computer Science]

CS 340 Spring 2008: Project 1

In this project you will start working with TIGER/Line data (TIGER = Topologically Integrated Geographic Encoding and Referencing system). This is a database of line segments covering all counties in the U.S. The database is released and updated yearly by the U.S. Census Bureau. We will work with the 2006 TIGER/Line files.

Read the File Overview section carefully, then scroll down to the end of the page and download the data for your desired state. If you click on Maine, for instance, you will see a list of all counties in Maine and a zip file containing the data for each county. You will need to download all of these files, collect them into a directory, unzip them, etc.

To understand the file structure and data format you will need to read the documentation:

Please try to avoid printing as much as possible --- the full document has 300+ pages!

To get started, take a look at the associated FIPS codes for states and counties, given in Appendix A. For Maine, the state code is 23, and the county codes are as follows:

Now download the data for Maine. Note that for each county in Maine there is a corresponding .zip data file. For instance, tgr23005.zip contains TIGER/Line data for Cumberland County.

Data in the TIGER/Line files includes roads, hydrography, railroads, boundary lines and miscellaneous features. All these are topologically linked and create a topologically consistent network.

The information about spatial objects is organized in record files. The 2006 TIGER/Line data consists of 19 (check this) record types that collectively contain attributes like address ranges and ZIP codes, street names, classification codes, latitude and longitude coordinates, and so on.

Each county file will expand into a folder containing up to 19 files, each file corresponding to a record type (the file extension tells you the record type). Some counties do not require all 19 record types and will have fewer than 19 files. For instance, the data for Cumberland County looks like this:

The file TGR23005.RT1 contains record-type-1 data, TGR23005.RT2 contains record-type-2 data, and so on.
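The naming scheme can be sketched in a few lines. This is only an illustration in Python (the project itself may well be written in another language); the function names are made up, and the county code 005 is read off the file name tgr23005.zip mentioned earlier.

```python
def county_basename(state_fips: int, county_fips: int) -> str:
    """Base name shared by all record files of one county, e.g. 'TGR23005'."""
    # Two digits for the state code, three for the county code.
    return f"TGR{state_fips:02d}{county_fips:03d}"


def record_file(state_fips: int, county_fips: int, record_type: str) -> str:
    """File name for one record type, e.g. 'TGR23005.RT1' for record type '1'."""
    return f"{county_basename(state_fips, county_fips)}.RT{record_type}"


print(record_file(23, 5, "1"))
print(record_file(23, 5, "2"))
```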

Users of TIGER/Line files need to link information from several record types to find all attributes of interest that belong to one spatial object. For instance, RT1 contains address ranges and ZIP codes, RT2 contains latitude and longitude coordinate values for all points on a chain (road) that is not a straight line, RT7 describes landmark features, and so on. For an overview of the record types and what information they contain, read Chapter 1 in the Technical Documentation.

For this project, you will need to work with records/files RT1 and RT2. Together they describe chains (roads). Each chain has a unique ID and may be a straight line or may have several points in between (a polyline). You can find documentation for these in the Technical Documentation, Chapter 1.

The start and end points of a chain are called nodes. The intermediate points on a chain are called shape points. Remember, all chains are topologically linked and create a topologically consistent network. That is, any intersection of two segments, of whatever type, is marked with a node. In other words, the chains define a planar graph.

Record Type 1 contains a single record for each unique chain. Each record contains the chain ID (TLID), the feature type (road, hydrography, railroad, etc.), the coordinates of the start and end nodes of the chain, and other attributes.

Record Type 2 gives the coordinates of the intermediate points (the shape points) on the chains, matched to Record Type 1 by the chain ID (TLID).

The feature type in RT1 is encoded as a Census Feature Class Code (CFCC). CFCCs are described in Chapter 3. The CFCC is a three-character code. The first character is a letter that represents the major type: A for roads, B for railroad, C for pipeline,..., H for hydrography. The following two characters are digits that further describe the feature type. CFCC A11, for instance, means primary road, A34 secondary road, and so on. Thus, all roads begin with the letter A, railroads begin with the letter B, and so on.
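Since only the first letter matters for deciding what to draw, the test can be very small. Here is a hedged sketch in Python; the table and function names are invented for illustration, and only the three letters this project displays are listed (Chapter 3 has the full list).

```python
# Major types the project displays; every other letter is lumped into "other".
MAJOR_TYPES = {"A": "road", "B": "railroad", "H": "hydrography"}


def major_type(cfcc: str) -> str:
    """Map a CFCC such as 'A41' to its major type, or 'other' if not listed."""
    return MAJOR_TYPES.get(cfcc[:1].upper(), "other")


def wanted(cfcc: str) -> bool:
    """True if the chain is a road, a railroad, or hydrography."""
    return major_type(cfcc) != "other"


print(major_type("A41"), wanted("A41"))
print(major_type("C10"), wanted("C10"))
```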

More detailed information about chains and record types 1 and 2 is found in Chapter 3 in the documentation. It is pointed out that "Plotting a complete chain requires using the nodes from Record Type 1, and all of the shape points records in Record Type 2 with the same TLID, if any. Plot the start node first, then search RT2 for any matching records, and ...finally plotting the end node from RT1".

Now to see the exact formatting of RT1 and RT2, refer to the data record formats in Chapter 6. To extract just road data you would pay attention to the following fields/offsets:

In Record Type 1:

field name, offset start, offset end, length, description
...
TLID,   6,   15,  10, TIGER/Line ID
CFCC,   56,  58,  3,  Census Feature Class Code
...
FRLONG, 191, 200, 10, Start Longitude
FRLAT,  201, 209, 9,  Start Latitude
TOLONG, 210, 219, 10, End Longitude
TOLAT,  220, 228, 9,  End Latitude
...
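The offsets in this table count from 1 and include both ends (length = end - start + 1), so a field can be cut out with one slice. Below is a minimal Python sketch of that idea, not a full parser. The helper names are made up, and the sample record is synthetic: blanks everywhere except the six fields listed, filled with the Tulip Ave values used in this handout.

```python
# (start, end) offsets, 1-based and inclusive, copied from the RT1 table.
RT1_FIELDS = {
    "TLID": (6, 15),
    "CFCC": (56, 58),
    "FRLONG": (191, 200),
    "FRLAT": (201, 209),
    "TOLONG": (210, 219),
    "TOLAT": (220, 228),
}


def field(line: str, start: int, end: int) -> str:
    """Cut out one fixed-width field and trim the padding blanks."""
    return line[start - 1:end].strip()


def parse_rt1(line: str) -> dict:
    """Extract the fields above; everything except CFCC becomes an int."""
    rec = {name: field(line, s, e) for name, (s, e) in RT1_FIELDS.items()}
    for name in ("TLID", "FRLONG", "FRLAT", "TOLONG", "TOLAT"):
        rec[name] = int(rec[name])
    return rec


# Synthetic 228-character record: only the six fields of interest are filled.
sample = (" " * 5 + "  78043067" + " " * 40 + "A41" + " " * 132
          + " -70099664" + "+43892462" + " -70100631" + "+43891798")

print(parse_rt1(sample))
```

Keeping the offsets in one table means a wrong offset is fixed in one place.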

In Record Type 2:

field name, offset start, offset end, length, description
TLID,  6,  15, 10, TIGER/Line ID
RTSQ,  16, 18, 3,  Record Sequence Number
LONG1, 19, 28, 10, Point 1 Longitude
LAT1,  29, 37, 9,  Point 1 Latitude
LONG2, 38, 47, 10, Point 2 Longitude
LAT2,  48, 56, 9,  Point 2 Latitude
...

The records are in ASCII, so you can just open up the RT1 file and RT2 file and see some examples. For instance, open up TGR23005.RT1 and go to line 1.

11106  78043067 J  Tulip                         Ave   A41         65         99         64         9810100403204032              23230050052652526525                    00450000450060106010 -70099664+43892462 -70100631+43891798
You can go to the various offsets and see the following attributes:
TLID=  78043067
CFCC=A41 (Local, neighborhood, and rural road, city street, unseparated)
FRLONG=-70099664 (meaning 70.099664 degrees West)
FRLAT=+43892462  (meaning 43.892462 degrees North)
TOLONG=-70100631
TOLAT=+43891798
...
To read about the coordinates of nodes and shape points, go to Chapter 3.
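The stored values are signed integers with six implied decimal places, as the FRLONG example shows (-70099664 means 70.099664 degrees West). A small Python sketch of the conversion follows; the function name is made up.

```python
def to_degrees(raw: str) -> float:
    """Convert a stored coordinate such as '-70099664' to signed degrees."""
    # int() accepts the leading sign and any padding blanks.
    return int(raw) / 1_000_000


lon = to_degrees("-70099664")   # negative longitude: West
lat = to_degrees("+43892462")   # positive latitude: North
print(lon, lat)
```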

Then open file TGR23005.RT2 and search for TLID= 78043067 to find the intermediate points. You will find this line:

21106  78043067  1 -70099751+43892365 -70099884+43892260 -70099945+43892206 -70100216+43891991 -70100344+43891898 -70100456+43891834 -70100541+43891806+000000000+00000000+000000000+00000000+000000000+00000000
The first coordinate in this list corresponds to the first shape point in the chain, which is:
LONG1=-70099751
LAT1=+43892365
...and so on. The last two fields in this line are zero, as are the two points before them: unused point slots are filled with zeros.

If the last fields were non-zero, it would mean that this chain (road) may continue in the next entry. Note that the first RT2 entry for a chain has RTSQ=1, indicating that this is the first set of intermediate coordinates. The next entry (if there were one) would also have TLID=78043067, but RTSQ=2, and would give the remaining coordinates in the chain.
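Reading one RT2 line then amounts to walking over the point slots and stopping at the first all-zero pair. Here is a hedged Python sketch: it assumes ten slots of 19 characters per record (what the sample line above shows), and the names are invented for illustration.

```python
POINTS_PER_RECORD = 10   # assumption: ten slots, as in the sample line
FIRST_POINT = 18         # 0-based index of LONG1 (offset 19 in the table)
SLOT_WIDTH = 19          # 10 characters of longitude + 9 of latitude


def parse_rt2(line: str):
    """Return (TLID, RTSQ, [(lon, lat), ...]) with zero-filled slots dropped."""
    tlid = int(line[5:15])
    rtsq = int(line[15:18])
    points = []
    for i in range(POINTS_PER_RECORD):
        start = FIRST_POINT + SLOT_WIDTH * i
        lon_txt = line[start:start + 10]
        lat_txt = line[start + 10:start + SLOT_WIDTH]
        if not lon_txt.strip() or not lat_txt.strip():
            break                      # line shorter than expected
        lon, lat = int(lon_txt), int(lat_txt)
        if lon == 0 and lat == 0:
            break                      # unused slot: no more points
        points.append((lon, lat))
    return tlid, rtsq, points


# The RT2 line for TLID 78043067 from this handout.
sample = (
    "21106  78043067  1"
    " -70099751+43892365 -70099884+43892260 -70099945+43892206"
    " -70100216+43891991 -70100344+43891898 -70100456+43891834"
    " -70100541+43891806"
    "+000000000+00000000+000000000+00000000+000000000+00000000"
)

tlid, rtsq, pts = parse_rt2(sample)
print(tlid, rtsq, len(pts))
```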

So, if you wanted the chain for this whole road, you would start with the start node in RT1

FRLONG=-70099664
FRLAT=+43892462 
Sequentially list the points with the matching TLID and RTSQ=1 from the RT2 file
LONG1=-70099751
LAT1=+43892365
...

And possibly continue with remaining points in the next RT2 entry with RTSQ=2. And finish with the end node from RT1
 
TOLONG=-70100631
TOLAT=+43891798
So conceptually it is easy, but in practice it will be a bit messy to get all the data... but you can easily (don't you love it when people say easily?) extract the fields from the RT1 and RT2 files for each county.
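The assembly order described above (start node, shape points in RTSQ order, end node) can be sketched as follows. This is one possible shape for the function, in Python for brevity; the name and argument layout are assumptions. The demo splits the shape points over two RT2 entries purely to show the sorting; in the real file this chain has a single entry.

```python
def build_chain(start, end, rt2_entries):
    """Assemble all points of one chain, in plotting order.

    start, end:   (lon, lat) nodes from RT1
    rt2_entries:  list of (rtsq, [(lon, lat), ...]) for the same TLID,
                  in any order; may be empty for a straight chain
    """
    points = [start]
    for _, shape_points in sorted(rt2_entries, key=lambda entry: entry[0]):
        points.extend(shape_points)
    points.append(end)
    return points


start = (-70099664, 43892462)
end = (-70100631, 43891798)
# Made-up split into RTSQ=2 and RTSQ=1, deliberately out of order.
entries = [
    (2, [(-70100541, 43891806)]),
    (1, [(-70099751, 43892365), (-70099884, 43892260)]),
]

chain = build_chain(start, end, entries)
print(len(chain), chain[0], chain[-1])
```

With an empty list of entries the function returns just the two nodes, which matches the RT1-only first version of the project.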

OK...now coming back to the project: display in OpenGL the roads, railroads, and hydrography in an entire state in the U.S. of your choice.

To save, use default file names and the following format: chain TLID followed by number of points in the chain and a list of all points in the chain, in order. For instance:
475648756 
3
x1 y1
x2 y2
x3 y3
398568569
5
x1 y1
x2 y2
...
This says that the chain with TLID=475648756 has 3 points, etc. So that we are able to check for consistency, I ask that you save the chains in order of TLID.
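A writer for this format could look like the sketch below. It is only an illustration: the function name is made up, the coordinates in the demo are placeholders, and whether x and y are written as raw integers or as degrees is left open here.

```python
import io


def write_chains(chains, out):
    """Write chains in order of TLID: TLID, point count, then one 'x y' per line.

    chains: dict mapping TLID -> list of (x, y) points, already in chain order
    out:    any object with a write() method (an open file, for instance)
    """
    for tlid in sorted(chains):
        points = chains[tlid]
        out.write(f"{tlid}\n{len(points)}\n")
        for x, y in points:
            out.write(f"{x} {y}\n")


# Placeholder coordinates; the two TLIDs are the ones from the example above.
chains = {
    475648756: [(1, 1), (2, 2), (3, 3)],
    398568569: [(0, 0), (5, 5)],
}
buf = io.StringIO()
write_chains(chains, buf)
print(buf.getvalue())
```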

Some questions before you start:

I suggest you approach this project (any project) incrementally, that is, get something that works and gradually refine it.

For instance, start by ignoring RT2 and displaying each chain as a segment between the two endpoints using only the info in RT1. Only after you get it all working, add the intermediate shape points from RT2.

Also, if you were thinking of building an index on RT2 to help searching, refrain from doing it at the beginning. Wait until you have something that works, even if it is slow, and add the index afterwards.
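When you do get to that point, one simple possibility (among many) is a table from TLID to the RT2 entries of that chain. The sketch below is in Python and assumes RT2 entries have already been parsed into (TLID, RTSQ, points) triples; the names and the TLID 12345 are made up.

```python
from collections import defaultdict


def index_by_tlid(rt2_entries):
    """Group RT2 entries by TLID, each group sorted by RTSQ."""
    index = defaultdict(list)
    for tlid, rtsq, points in rt2_entries:
        index[tlid].append((rtsq, points))
    for group in index.values():
        group.sort(key=lambda entry: entry[0])
    return index


entries = [
    (78043067, 2, [(3, 3)]),
    (78043067, 1, [(1, 1), (2, 2)]),
    (12345, 1, [(9, 9)]),
]
index = index_by_tlid(entries)
print(index[78043067])
print(index.get(999, []))   # a chain with no shape points
```

The lookup replaces a scan over the whole RT2 file for every chain; the rest of the code does not need to change.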

We (that is, you) will build on this project further in the coming weeks, so code neatly, and...think before you start!

The most important part in this project is not whether it works, but how you approach it and how you structure your code; and how painful it is for you to get it done. Think of the structure and order in which you do things before you start. Add a small feature at a time. Write short functions. Define good data types. Separate your code into modules --- at least two in this case (one for graphics, one for handling the data); each should have its header.

Code nicely and clearly. This is not like a 210 lab, where you suffer for a week and then it's gone; it's much longer. And you'll build on it for the rest of the semester. You'll see that it can get painful fast.

If you get stuck on a piece of code and don't see the problem, email me. I very much prefer to help you debug and see your progress than to suffer through bad code and unhappy faces later.

What to hand in: Leave your code in your folder in microwave:/courses/csci340/. I'll look at it there.

I put some TIGER data in microwave:/courses/csci340/Materials/.

Start early, don't procrastinate. Good luck!