Project ideas

Render a terrain in OpenGL: Chance to learn and play with OpenGL features, such as shading and textures, lights and animation. Allow the user to switch between 2D and 3D views, and change camera angles. This will look really nice. The other great thing about this project is that all other projects will be able to use it as a visualization component for their output, so the entire class will love you.
Flow directions on flat areas: A flat area is a connected part of the terrain consisting of cells that could not be assigned a flow direction. Extend your flow assignment to assign flow directions across flat areas (to start: plateaus only; ideally both plateaus and sinks) and study the qualitative difference that this makes to the river network. For example, you could collect stats on what percentage of cells are part of flat areas; how many plateaus and sinks are in the terrain; and how many river trees (a river tree is a connected component of the FD graph) are in the terrain before and after routing flow on flat areas.
Assigning flow on flat areas involves:
1. start by assigning FD of each cell in the grid towards its steepest downslope neighbor
2. the cells that cannot be assigned FD this way are part of flat areas; determine the "connected components" of unassigned cells; each one of these corresponds to a flat area.
3. determine the cells lying on the boundary of each flat area, and from these infer whether the flat area is a plateau or a sink
4. for each plateau, find its spill points and run a multi-source BFS starting from its spill points. Each cell of the plateau is directed towards the source whose search reaches it first.
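The four steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the reference solution: the grid is a list of rows, FD is stored as the (row, col) of the downstream cell or None, and a spill point is taken to be a same-elevation neighbor of the flat area that already has an FD. The names neighbors, steepest_downslope, flat_areas and route_plateau are made up for illustration, and cells on the edge of the grid get no special treatment.

```python
import math
from collections import deque

# 8-neighbour offsets as (row, col) deltas.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbors(i, j, rows, cols):
    """Yield the in-grid 8-neighbours of cell (i, j)."""
    for di, dj in OFFSETS:
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj

def steepest_downslope(elev):
    """Step 1: fd[i][j] is the steepest strictly lower neighbour, or None."""
    rows, cols = len(elev), len(elev[0])
    fd = [[None] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            best_slope = 0.0
            for ni, nj in neighbors(i, j, rows, cols):
                slope = (elev[i][j] - elev[ni][nj]) / math.hypot(ni - i, nj - j)
                if slope > best_slope:
                    best_slope, fd[i][j] = slope, (ni, nj)
    return fd

def flat_areas(fd):
    """Step 2: connected components of cells whose FD is still None."""
    rows, cols = len(fd), len(fd[0])
    seen, areas = set(), []
    for i in range(rows):
        for j in range(cols):
            if fd[i][j] is not None or (i, j) in seen:
                continue
            seen.add((i, j))
            area, queue = [], deque([(i, j)])
            while queue:
                cell = queue.popleft()
                area.append(cell)
                for n in neighbors(*cell, rows, cols):
                    if fd[n[0]][n[1]] is None and n not in seen:
                        seen.add(n)
                        queue.append(n)
            areas.append(area)
    return areas

def route_plateau(elev, fd, area):
    """Steps 3-4: route one flat area in place; return False if it is a sink."""
    rows, cols = len(elev), len(elev[0])
    members = set(area)
    height = elev[area[0][0]][area[0][1]]
    # Spill points (assumed definition): same-height neighbours outside the
    # flat area; these already received an FD in step 1.
    spills = sorted({n for cell in area for n in neighbors(*cell, rows, cols)
                     if n not in members and elev[n[0]][n[1]] == height})
    queue = deque(spills)
    while queue:  # multi-source BFS: the first source to arrive claims the cell
        cell = queue.popleft()
        for n in neighbors(*cell, rows, cols):
            if n in members and fd[n[0]][n[1]] is None:
                fd[n[0]][n[1]] = cell
                queue.append(n)
    return bool(spills)
```

Call flat_areas once, right after step 1, and then route_plateau on each area; the areas for which it returns False are the sinks, which is also a convenient place to collect the plateau and sink counts.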
Alternate approach to flooding? A sink is a flat area surrounded by higher ground (no spill points on the boundary). After routing flow on plateaus, the only remaining areas with unassigned flow direction (FD) are the sinks.
Given a sink s, consider the set of all the points in the terrain that flow into it: This is called the watershed of the sink. If you drop a droplet of water anywhere in the watershed of s, it will reach s. Every point on the terrain belongs to the watershed of some sink. Thus the terrain can be partitioned into watersheds. Each sink watershed will have its own river tree, which will be a connected component of the FD graph.
In reality, a terrain might have a lot of sinks, many of them spurious and due to tiny data errors. The problem is that sinks "interrupt" the flow of water and do not let water flow to the bigger rivers.
Sinks are handled by simulating flooding of the terrain: when a very large amount of water is poured over the terrain, the water starts accumulating in sinks and rising. Consider two adjacent watersheds: as the water level rises, it will eventually reach the lowest point on the boundary between the two watersheds, causing them to merge. The process continues until all watersheds find a path to the outside. At that point the water level has reached a steady state and does not rise anymore. In the process of flooding, one computes the order in which watersheds merge and the final raised water level in each watershed. If we imagine lifting each cell in the terrain to the raised elevation of its watershed, this gives a terrain with no sinks, where every cell has a flow path to the outside. It can be shown that assigning FD via flooding corresponds to routing each cell along its lowest path towards the outside, where the height of a path is the height of the highest cell along the path.
And this brings us to the point of the project: Perhaps you can think of a different way to route water out of sinks, that does not use flooding. Remember the goal is to compute, for each point, its lowest path to the boundary of the terrain. Some AI heuristics?
Study the qualitative difference that routing FD out of sinks makes to the river network; that is, look at the river network with and without routing water out of the sinks. For example you could collect stats on how many different sinks are in the terrain, and how many river trees (a river tree is a connected component of the FD graph) before and after.
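As a baseline to compare any alternative against, the raised elevations produced by flooding can be computed with a priority-queue sweep from the grid boundary inwards (often called priority-flood). This is a sketch under assumptions, not necessarily the flooding algorithm from class: the grid is a list of rows, "the outside" is taken to be the edge of the grid, and the function name flood is made up. For each cell it returns the height of its lowest path to the outside, in the sense defined above.

```python
import heapq

def flood(elev):
    """Return, per cell, the height of its lowest path to the grid edge."""
    rows, cols = len(elev), len(elev[0])
    raised = [[None] * cols for _ in range(rows)]
    heap = []
    # Seed the queue with every edge cell; water can always leave from there.
    for i in range(rows):
        for j in range(cols):
            if i in (0, rows - 1) or j in (0, cols - 1):
                raised[i][j] = elev[i][j]
                heapq.heappush(heap, (elev[i][j], i, j))
    # Always grow from the lowest cell reached so far, so each new cell is
    # reached along a path whose highest point is as low as possible.
    while heap:
        level, i, j = heapq.heappop(heap)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni < rows and 0 <= nj < cols \
                        and raised[ni][nj] is None:
                    raised[ni][nj] = max(elev[ni][nj], level)
                    heapq.heappush(heap, (raised[ni][nj], ni, nj))
    return raised
```

Cells where raised differs from elev are exactly the ones that sit inside a sink, so comparing the two grids gives a quick check on whatever alternative routing you come up with.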
Compute main rivers with Pfafstetter labeling: Assume you start with a flow direction (FD) grid that assigns flow on flat areas such that there are no cycles (FD can be computed with one of the GIS software packages). Compute the partition of the terrain into watersheds --- basically this means compute the connected components of the FD graph. For each river tree, compute the main river and its main affluents. This can be done by following a path from the mouth of the river network up, and whenever reaching a point with more than one incoming FD, following the one with the larger flow accumulation. This computes the backbone of the main river. Determine the four largest tributaries and their watersheds.
Here you may want to go into the full recursion (i.e., do this for the affluents), or not --- just the top level or perhaps the top two levels is fine.
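The top level of this (one main river and its largest tributaries) might look like the sketch below. The representation is an assumption: fd[i][j] holds the (row, col) of the downstream cell or None at the mouth, acc is a flow accumulation grid of the same shape, and main_river is a made-up name. The recursion into affluents and the Pfafstetter digits themselves are left out.

```python
def main_river(fd, acc, mouth, top=4):
    """Walk upstream from mouth, always following the larger accumulation.

    Returns (path, tributaries): the main-river cells from the mouth up,
    and the `top` largest side inflows, each given by the cell where the
    tributary enters the main river.
    """
    rows, cols = len(fd), len(fd[0])
    # Invert the FD grid: for each cell, which cells flow into it?
    upstream = {}
    for i in range(rows):
        for j in range(cols):
            if fd[i][j] is not None:
                upstream.setdefault(fd[i][j], []).append((i, j))

    def accumulation(cell):
        return acc[cell[0]][cell[1]]

    path, tributaries, cell = [mouth], [], mouth
    while cell in upstream:  # terminates because FD is assumed acyclic
        inflows = sorted(upstream[cell], key=accumulation, reverse=True)
        tributaries.extend(inflows[1:])  # everything but the largest inflow
        cell = inflows[0]
        path.append(cell)
    tributaries.sort(key=accumulation, reverse=True)
    return path, tributaries[:top]
```

The watershed of a tributary is then everything upstream of its entry cell, which can be collected with a BFS over the same upstream map.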
Rising seas: You start with an elevation grid of an area on the coast. You can look into answering questions such as: What will happen when the water level rises by 10 ft? What will be under water?
There is a lot of interest in this topic. This will be a very nice project.
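One simple way to get started, sketched below, is a BFS from the current sea cells that spreads into every connected cell at or below the new water level. The modeling choices here are assumptions, not part of the project statement: sea cells are those at or below sea_level, water spreads through 4-neighbors only, elevations and the rise are in the same units (so 10 ft must be converted if the grid is in meters), and flooded_cells is a made-up name.

```python
from collections import deque

def flooded_cells(elev, rise, sea_level=0.0):
    """Mark the cells reached by the sea once it rises by `rise`."""
    rows, cols = len(elev), len(elev[0])
    level = sea_level + rise
    under = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Start from the cells that are already sea.
    for i in range(rows):
        for j in range(cols):
            if elev[i][j] <= sea_level:
                under[i][j] = True
                queue.append((i, j))
    # Spread into neighbouring cells that lie at or below the new level.
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols \
                    and not under[ni][nj] and elev[ni][nj] <= level:
                under[ni][nj] = True
                queue.append((ni, nj))
    return under
```

Note that a low area shielded from the sea by higher ground stays dry in this model, which differs from simply thresholding the grid at the new level; comparing the two would be an interesting part of the project.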
Flow accumulation on big data: Investigate the efficiency of flow accumulation algorithms on very large grids, and come up with a more efficient approach. The data directory on microwave contains grids up to 7 billion points.
Parallel flow accumulation? Investigate a parallel algorithm for computing flow accumulation that will take advantage of the multiple cores on any modern machine.
Flooding simulation: Visualize a simulation of flooding (OpenGL).
Parallel multi-viewshed: Some applications need the computation of viewsheds from every point in the terrain as viewpoint. More precisely, we want to compute a multiviewshed grid, which stores, at point (i,j), the size of the viewshed of point (i,j). For a grid of size n, a single viewshed computation as we discussed in class takes O(n \sqrt n), and thus computing it from all points takes O(n^2 \sqrt n) time. In practice, even for modest values of n, a quadratic algorithm is not feasible.
The first step in this project would be to extend your viewshed assignment to compute a multiviewshed grid. Run experiments to see how long it takes on set1.asc (which has less than 200k points) --- expect hours!
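To make the cost concrete, here is a rough sketch of a brute-force multiviewshed grid. It is not the viewshed algorithm from class: the line-of-sight test below samples the elevation of the nearest cell at evenly spaced points along the segment, which is only an approximation, and the names is_visible and multiviewshed as well as the eye (observer height) parameter are made up. It does have the same shape of cost: n viewpoints, n targets each, and O(\sqrt n) samples per line of sight.

```python
import math

def is_visible(elev, vi, vj, ti, tj, eye=0.0):
    """Approximate line-of-sight test from cell (vi, vj) to cell (ti, tj)."""
    steps = max(abs(ti - vi), abs(tj - vj))
    if steps <= 1:
        return True  # the viewpoint and its immediate neighbours
    total = math.hypot(ti - vi, tj - vj)
    h0 = elev[vi][vj] + eye
    target_slope = (elev[ti][tj] - h0) / total
    # The target is hidden if any sample in between rises above the sight line.
    for s in range(1, steps):
        i = round(vi + (ti - vi) * s / steps)  # nearest cell to the sample
        j = round(vj + (tj - vj) * s / steps)
        slope = (elev[i][j] - h0) / (total * s / steps)
        if slope > target_slope:
            return False
    return True

def multiviewshed(elev, eye=0.0):
    """Grid whose (i, j) entry counts the cells visible from (i, j)."""
    rows, cols = len(elev), len(elev[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    return [[sum(is_visible(elev, i, j, ti, tj, eye) for ti, tj in cells)
             for j in range(cols)] for i in range(rows)]
```

Each entry of the result depends only on the elevation grid and its own viewpoint, so the rows of the output can be computed independently of one another.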
The second step will be to write a parallel implementation. To start, use threads. Eventually, you'll want to use MPI and run it on the grid. The problem is embarrassingly parallel (each viewshed can be computed separately), so the focus of this project will be working with MPI and a distributed environment. The speed-ups will be impressive! Very cool project.
Approximate viewshed: Come up with and implement a different approach for computing the viewshed of a point on a grid terrain that runs faster than the straightforward algorithm (which runs in O(n \sqrt n)). You'll need new ways to think about it and introduce some approximation.
Speeding up multi-viewshed computation: This project would start by computing a multi-viewshed grid using the straightforward algorithm; and then doing everything you can to speed up the implementation and bring it down by a factor of, say, 10. You'll need new ways to think about it and perhaps introduce some approximation.
I see a few ways to approach this:
1. you compute an approximate viewshed count for each point in the grid
2. you compute an exact viewshed count for some points in the grid and interpolate the other values somehow
3. a combination of the two
All our tests will be on set1. I do have the correct multi-viewshed grid computed for set1 (yes, it took ~30 hours).
Note for optimization: large data is not a problem here. Memory is cheap: you have at least 4GB of RAM and only 2 grids of 180k elements each. There is no IO. CPU time is what matters, so optimize for CPU; every cycle counts.
The quality of the multi-viewshed grid will depend on how we measure the quality of viewsheds, and on how long you let it run. Naturally, the longer you are willing to let your program run, the more precise results you can get. You'll have to trade off quality-vs-time somehow. Assume you have: a. 1 minute b. 10 minutes c. one hour of CPU time. What's the best multi-viewshed grid you can get in these times?
A related problem is how to quantify how good an approximate multi-viewshed grid (AMV) is. What is a good metric? Think of the two grids as surfaces. What is a good metric for the distance between two surfaces? Let's call this d(AMV, MV). Some ideas:
- the number of point2point differences between AMV and MV
- the number of p2p differences weighted by how big they are
- the sum of differences (we need to take absolute values of the differences so that badness does not cancel out)
- the largest p2p difference
- the square root of the sum of the squares of the differences
- or maybe think of the 2 grids as vectors in a vector space?
- or think of the 2 grids as pixelized images. This problem must have been considered in vision/image processing --- quantifying the difference between images.
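Several of the candidates above are easy to compute side by side. The sketch below assumes the two grids are lists of rows with the same shape; the function name grid_distances and the dictionary keys are made up for illustration. Viewed as vectors, these are the counts of nonzero entries and the L1, L-infinity and L2 norms of AMV - MV.

```python
import math

def grid_distances(amv, mv):
    """Compute a few candidate values of d(AMV, MV) for same-shaped grids."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(amv, mv)
             for a, b in zip(row_a, row_b)]
    return {
        "num_different": sum(1 for d in diffs if d != 0),  # count of p2p differences
        "sum_abs": sum(diffs),                             # sum of |differences|
        "max_abs": max(diffs),                             # largest p2p difference
        "euclidean": math.sqrt(sum(d * d for d in diffs)), # sqrt of sum of squares
    }
```

For example, grid_distances([[1, 2], [3, 4]], [[1, 4], [0, 4]]) reports 2 differing points, a sum of 5 and a largest difference of 3. Dividing the sums by the number of cells makes the numbers comparable across grids of different sizes.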
Viewshed on big data: Investigate the performance of the standard viewshed algorithm on very large grids (the data directory contains grids up to 28GB) and come up with a more efficient approach on big data. The idea here is to optimize the IO of the algorithm, not the CPU.
Horizon-based viewshed: Explore a horizon-based algorithm for the computation of viewshed. Although this algorithm is a little harder to understand and to implement, it has the advantage that it is faster in practice, and it can be extended to deal with large data rather easily.
Grid to TIN simplification: Implement incremental refinement and try to make it run as fast as possible. We'll time how long it takes to simplify, say, set1.asc down to error 5%.
to be continued
Shortest path grids: Assume you have a cost grid that gives you the cost of traversing each cell in a grid, and a source grid that stores a set of sources (a cell is 1 if it's a source, and 0 otherwise). Compute a shortest path grid where each cell stores the least cost of reaching one of the sources from that cell, traveling via the cost grid.
Shortest paths with TIGER data: Download US TIGER data and implement shortest paths. For example, the user could input a start point and an end point and you would compute the shortest route from start to end. This project gives you a very nice opportunity to explore the performance of your code --- the goal is to answer SP queries online, in real time --- in a few seconds per query, like Bing or Google Maps.
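A natural starting point for the shortest path grid is Dijkstra's algorithm started from all sources at once, sketched below. The cost model is an assumption that the description above leaves open: moves go between 4-neighbors, and a move costs the average of the two cells' traversal costs (half of each cell is crossed). The function name shortest_path_grid is made up.

```python
import heapq

def shortest_path_grid(cost, sources):
    """Least cost from each cell to its nearest source (multi-source Dijkstra)."""
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    # Every source starts at distance 0.
    for i in range(rows):
        for j in range(cols):
            if sources[i][j] == 1:
                dist[i][j] = 0.0
                heapq.heappush(heap, (0.0, i, j))
    while heap:
        d, i, j = heapq.heappop(heap)
        if d > dist[i][j]:
            continue  # stale queue entry; a shorter path was already found
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols:
                # Assumed step cost: average of the two cells' costs.
                nd = d + (cost[i][j] + cost[ni][nj]) / 2
                if nd < dist[ni][nj]:
                    dist[ni][nj] = nd
                    heapq.heappush(heap, (nd, ni, nj))
    return dist
```

Cells that cannot reach any source keep the value infinity. Allowing diagonal moves would mean scaling their step cost by \sqrt 2.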
LIDAR to TIN simplification: to be continued.
Space filling curves for storing grids --- impact on flow accumulation or viewshed algorithms: to be continued.
Visualize large terrains using LOD and quadtrees: to be continued.