- Render a terrain in OpenGL:
A chance to learn and play with OpenGL features such as shading,
textures, lights, and animation. Allow the user to switch between 2D
and 3D views, and change camera angles. This will look really nice.
The other great thing about this project is that all other projects
will be able to use it as a visualization component for their
output, so the entire class will love you.
- Flow directions on flat areas: A flat area is a
connected part of the terrain consisting of cells that could not be
assigned flow direction. Extend your flow assignment to assign flow
directions across flat areas (to start: plateaus only; ideally both
plateaus and sinks) and study the qualitative difference that this
makes in terms of the river network. For example you could collect
stats on what percentage of cells are part of flat areas; how many
plateaus and sinks are in the terrain; and how many river trees (a
river tree is a connected component of the FD graph) are in the
terrain before and after routing flow on flat areas.
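Since each cell's FD points to at most one downstream cell, river trees are the connected components of that graph and can be counted, e.g., with union-find. A minimal Python sketch; the `fd` map and cell tuples are an illustrative interface, not a prescribed one:

```python
# Hypothetical sketch: count river trees (connected components of the
# FD graph) with union-find. `fd` maps each cell to the cell it flows
# into, or None for unassigned cells (flat areas).

def count_river_trees(fd):
    """fd: dict mapping (i, j) -> (i', j') or None."""
    parent = {c: c for c in fd}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for cell, downstream in fd.items():
        if downstream is not None:
            union(cell, downstream)
    return len({find(c) for c in fd})
```

Running this before and after routing flow on flat areas gives exactly the before/after tree counts mentioned above.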
Assigning flow on flat areas involves:
- start by assigning the FD of each cell in the grid towards its
steepest downslope neighbor
- the cells that cannot be assigned FD this way are part of flat
areas; determine the "connected components" of unassigned cells;
each one of these corresponds to a flat area.
- determine the cells lying on the boundary of each flat area,
and from there infer whether the flat area is a plateau or a sink
- for each plateau, find its spill points and run a multi-source
BFS starting from those spill points. Each plateau cell directs its
flow towards the spill point that reaches it first.
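The multi-source BFS step could look roughly like this (a sketch only; treating `plateau` as a set of cell tuples and `spills` as a list of boundary cells is my assumption):

```python
from collections import deque

# Hypothetical sketch of the multi-source BFS step: `plateau` is the
# set of flat cells, `spills` the spill points on its boundary. Each
# plateau cell gets an FD toward the cell it was discovered from, so
# flow drains to the spill point that reaches it first.

def route_plateau(plateau, spills):
    fd = {}                      # cell -> cell it flows into
    queue = deque(spills)
    seen = set(spills)
    while queue:
        cur = queue.popleft()
        i, j = cur
        for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if n in plateau and n not in seen:
                seen.add(n)
                fd[n] = cur      # flow back toward the BFS source
                queue.append(n)
    return fd
```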
- Alternate approach to flooding? A sink is a flat
area surrounded by higher grounds (no spill points on the
boundary). After routing flow on plateaus, the only remaining areas
with unassigned flow direction (FD) are the sinks.
Given a sink s, consider the set of all the points in the
terrain that flow into it: This is called the watershed of the
sink. If you drop a droplet of water anywhere in the watershed of s,
it will reach s. Every point on the terrain belongs to the watershed
of some sink. Thus the terrain can be partitioned into watersheds.
Each sink watershed will have its own river tree, which will be a
connected component of the FD graph.
In reality, a terrain might have a lot of sinks, many of them
spurious and due to tiny data errors. The problem is that sinks
"interrupt" the flow of water and do not let water flow to the
Sinks are handled by simulating flooding the terrain: when a
very large amount of water is poured over the terrain, the water
will start accumulating in sinks and rising. Consider two adjacent
watersheds: as the water level rises, it will eventually reach the
lowest point on the boundary of the two watersheds and will cause
the two watersheds to merge. The process continues until all
watersheds find a path to the outside. At that point the level of
the water has reached steady state and does not increase anymore.
In the process of flooding one computes the order in which
watersheds merge and the final raised level of water in each
watershed. If we imagine lifting each cell in the terrain to the
raised elevation of its watershed, this gives a terrain with no
sinks where every cell has a flow path to the outside. It can be
shown that assigning FD via flooding corresponds to routing each
cell in the direction of its lowest path towards the outside, where
the height of a path is the height of the highest cell along the path.
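The flooding computation is often implemented with a priority queue seeded from the terrain boundary (a "priority-flood" style approach). A minimal Python sketch, under the simplifying assumption of a square grid with 4-neighbors:

```python
import heapq

# Sketch of flooding ("priority-flood"): cells are popped in order of
# elevation from a min-heap seeded with the terrain boundary; any cell
# lower than the level at which it is reached is raised to that level.
# The result is the sink-free (filled) terrain described above.

def fill_sinks(elev):
    rows, cols = len(elev), len(elev[0])
    filled = [row[:] for row in elev]
    heap, seen = [], set()
    for i in range(rows):
        for j in range(cols):
            if i in (0, rows - 1) or j in (0, cols - 1):
                heapq.heappush(heap, (filled[i][j], i, j))
                seen.add((i, j))
    while heap:
        level, i, j = heapq.heappop(heap)
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols and (ni, nj) not in seen:
                seen.add((ni, nj))
                filled[ni][nj] = max(filled[ni][nj], level)
                heapq.heappush(heap, (filled[ni][nj], ni, nj))
    return filled
```

The difference `filled - elev` per cell is the raised water level in each sink's watershed.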
And this brings us to the point of the project: Perhaps you can
think of a different way to route water out of sinks, that does not
use flooding. Remember the goal is to compute, for each point, its
lowest path to the boundary of the terrain. Some AI heuristics?
Study the qualitative difference that routing FD out of sinks
makes to the river network; that is, look at the river network with
and without routing water out of the sinks. For example you could
collect stats on how many different sinks are in the terrain, and how
many river trees (a river tree is a connected component of the FD
graph) before and after.
- Compute main rivers with Pfafstetter labeling:
Assume you start with a flow direction (FD) grid that assigns flow
on flat areas such that there are no cycles (FD can be computed in
one of the GIS software packages). Compute the partition of the terrain into
watersheds --- basically this means compute the connected components
of the FD graph. For each river tree, compute the main river and
its main affluents. This can be done by following a path from the
mouth of the river network up, and whenever reaching a point with
more than one incoming FD, follow towards the one with larger flow
accumulation. This computes the backbone of the main river.
Determine the four largest tributaries and their watersheds.
Here you may want to go in the full recursion (ie do this for the
affluents), or not --- just the top level or perhaps the top two
levels is fine.
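The backbone-tracing step could be sketched like this, assuming an `upstream` map from each cell to its incoming cells and a flow-accumulation map `acc` (both names are illustrative):

```python
# Hypothetical sketch of tracing the main-river backbone: walk
# upstream from the mouth, and at each junction follow the incoming
# cell with the largest flow accumulation.

def main_river(mouth, upstream, acc):
    path = [mouth]
    cur = mouth
    while upstream.get(cur):
        cur = max(upstream[cur], key=lambda c: acc[c])
        path.append(cur)
    return path
```

Recursing on the cells that branch off this path would give the tributaries and their watersheds.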
- Rising seas: You start with an elevation grid of
an area on the coast. You can look into answering questions such
as: What will happen when the water level rises 10 ft?
There is a lot of interest in this topic. This will be a
very nice project.
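For instance, the 10 ft question can be sketched as a flood fill from the grid boundary, under the assumption that the ocean touches the boundary: a cell floods only if it is low enough and connected to the boundary through other low cells, so low-lying inland basins stay dry.

```python
from collections import deque

# Sketch: which cells flood when the sea rises by `rise`? BFS from
# boundary cells at or below the new sea level; a low cell walled off
# by higher ground is not reached and stays dry. 4-neighbor grid and
# ocean-at-the-boundary are simplifying assumptions.

def flooded_cells(elev, rise):
    rows, cols = len(elev), len(elev[0])
    seen, queue = set(), deque()
    for i in range(rows):
        for j in range(cols):
            if (i in (0, rows - 1) or j in (0, cols - 1)) \
                    and elev[i][j] <= rise:
                seen.add((i, j))
                queue.append((i, j))
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols \
                    and (ni, nj) not in seen and elev[ni][nj] <= rise:
                seen.add((ni, nj))
                queue.append((ni, nj))
    return seen
```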
- Flow accumulation on big data:
Investigate the efficiency of flow accumulation algorithms on very
large grids, and come up with a more efficient approach. The data
directory on microwave contains grids up to 7 billion points.
- Parallel flow accumulation? Investigate a parallel
algorithm for computing flow accumulation that will take advantage of
the multiple cores on any modern machine.
- Flooding simulation: Visualize a simulation of
flooding the terrain, following the flooding process described above
for handling sinks.
- Parallel multi-viewshed: Some applications need the
computation of viewsheds from every point in the terrain as
viewpoint. More precisely, we want to compute a multiviewshed grid,
which stores, at point (i,j), the size of the viewshed of point
(i,j). For a grid of size n, a single viewshed computation as we
discussed in class takes O(n \sqrt n) time, and thus computing it from all
points takes O(n^2 \sqrt n) time. In practice, even for modest values of n,
a quadratic algorithm is not feasible.
The first step in this project would be to extend your viewshed
assignment to compute a multiviewshed grid. Run experiments to see
how long it takes on set1.asc (which has less than 200k points).
The second step will be to write a parallel implementation. To
start, use threads. Eventually, you'll want to use MPI and run it on
multiple machines.
The problem is embarrassingly parallel (each viewshed can be computed
separately), so the focus of this project will be working with MPI and
a distributed environment. The speed-ups will be impressive! Very
nice project.
- Approximate viewshed: Come up with and implement a
different approach for computing the viewshed of a point on a grid
terrain that runs faster than the straightforward algorithm (which
runs in O( n \sqrt n)). You'll need new ways to think about it and
introduce some approximation.
- Speeding up multi-viewshed computation: This project
would start by computing a multi-viewshed grid using the
straightforward algorithm; and then doing everything you can to speed
up the implementation and bring it down by a factor of, say,
10. You'll need new ways to think about it and perhaps introduce
some approximation. I see a few ways to approach this:
- you compute an approximate viewshed count for each point in the grid
- you compute an exact viewshed count for some points in the grid and
interpolate the other values somehow
- a combination of the two
All our tests will be on set1. I do have the correct multi-viewshed
grid computed for set1 (yes, it took ~30 hours).
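The sample-and-interpolate idea might be sketched like this; `exact_viewshed_count` stands in for your real per-point computation, and nearest-sample interpolation is just one (assumed) choice:

```python
# Hypothetical sketch: compute exact viewshed counts only at every
# `stride`-th point and fill the rest by nearest-sample lookup.
# `exact_viewshed_count(i, j)` is a placeholder for the real
# computation (and should be cached so each sample is computed once).

def approx_multiviewshed(rows, cols, stride, exact_viewshed_count):
    amv = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # snap (i, j) to the nearest sampled point, clamped to grid
            si = min(round(i / stride) * stride, rows - 1)
            sj = min(round(j / stride) * stride, cols - 1)
            amv[i][j] = exact_viewshed_count(si, sj)
    return amv
```

Larger strides trade accuracy for time, which is exactly the 1-minute / 10-minute / 1-hour knob asked about below.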
Note for optimization: large data is not a problem here, and memory
is cheap: you have at least 4GB of RAM, 2 grids of 180k elements, and
there is no IO. CPU is what matters, so optimize the CPU; every cycle
counts.
The quality of the multi-viewshed grid will depend on how we
measure the quality of viewsheds, and on how long you let it
run. Naturally, the longer you are willing to let your program run,
the more precise results you can get. You'll have to trade off
quality-vs-time somehow. Assume you have: a. 1 minute b. 10 minutes
c. one hour of CPU time. What's the best multi-viewshed grid you can
get in these times?
A related problem is how to quantify how good an approximate
multi-viewshed grid (AMV) is. What is a good metric? Think of the
two grids as surfaces. What is a good metric for the distance between
two surfaces? Let's call this d(AMV, MV).
- the number of point2point differences between AMV and MV
- the number of p2p differences weighed by how big they are
- the sum of differences (we need to take absolute values of the
differences so that badness does not cancel out)
- the largest p2p difference
- the square root of the sum of the squares of the differences
- or maybe think of the 2 grids as vectors in a vector space?
- or think of the 2 grids as pixelized images. this problem must have been considered
in vision/image processing -- quantifying the difference between images.
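A few of these candidate metrics, sketched in Python over grids flattened to lists (names are illustrative):

```python
import math

# Sketch of several distance metrics between the exact grid MV and an
# approximation AMV, both given as flat lists of the same length.

def grid_metrics(mv, amv):
    diffs = [abs(a - b) for a, b in zip(mv, amv)]
    n = len(diffs)
    return {
        "num_diff": sum(d > 0 for d in diffs),           # p2p differences
        "sum_abs": sum(diffs),                           # L1 distance
        "max_diff": max(diffs),                          # L-infinity
        "rmse": math.sqrt(sum(d * d for d in diffs) / n) # root mean square
    }
```

The L1, L-infinity, and RMSE rows are exactly the "vectors in a vector space" view: they are vector norms of the difference grid.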
- Viewshed on big data: Investigate the performance
of the standard viewshed algorithm on very large grids (the data
directory contains grids up to 28GB) and come up with a more efficient
approach on big data. The idea here is to optimize the IO of the
algorithm, not the CPU.
- Horizon-based viewshed: Explore a horizon-based
algorithm for the computation of viewshed. Although this algorithm is
a little harder to understand and to implement, it has the advantage
that it is faster in practice, and it can be extended to deal with
large data rather easily.
- Grid to TIN simplification: Implement incremental
refinement and try to make it run as fast as possible. We'll time how
long it takes to simplify say set1.asc down to error 5%.
- to be continued
- Shortest path grids: Assume you have a cost
grid that gives you the cost of traversing each cell in a grid,
and a source grid that stores a set of sources (a cell is 1 if
it's a source, and 0 otherwise). Compute a shortest path grid where
each cell stores the least cost of reaching one of the sources from
that cell, traveling via the cost grid.
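This is a multi-source Dijkstra. A sketch, under the simplifying (assumed) cost model that stepping into a cell costs that cell's value; the exact cost model is a design choice:

```python
import heapq

# Sketch: multi-source Dijkstra over a cost grid. All source cells
# start at distance 0 in the heap; each other cell ends up with the
# least cost of reaching some source.

def shortest_path_grid(cost, sources):
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for i in range(rows):
        for j in range(cols):
            if sources[i][j] == 1:
                dist[i][j] = 0
                heapq.heappush(heap, (0, i, j))
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i][j]:
            continue                        # stale heap entry
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols:
                nd = d + cost[ni][nj]       # cost of entering neighbor
                if nd < dist[ni][nj]:
                    dist[ni][nj] = nd
                    heapq.heappush(heap, (nd, ni, nj))
    return dist
```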
- Shortest paths with TIGER data: Download US TIGER
data and implement shortest paths. For example, the user could input a
start point and an end point, and you would compute the shortest route
from start to end. This project gives you a very nice opportunity to
explore the performance of your code --- the goal is to get to compute
SP queries online, in real time --- in a few seconds per query, like
Bing Maps or Google Maps.
- LIDAR to TIN simplification: to be continued.
- Space filling curves for storing grids --- impact on
flow accumulation or viewshed algorithms: to be continued.
- Visualize large terrains using LOD and quadtrees: to be continued.