Project 4: The total viewshed: An experimental evaluation using Bowdoin's HPC grid


Overview

The goal of this project is to compute the total viewshed on a grid terrain, parallelize the computation using OpenMP, and assess the effect of this parallelization using the multicore servers on Bowdoin's HPC grid. You will also explore how the choice of row-major versus blocked grid layout affects overall performance. You will describe your results in a paper.

Bowdoin HPC grid

The Bowdoin Computing Grid (also known as "The Grid") is a group of Linux servers that appears as one big multiprocessor server that can run parallel jobs concurrently. As of Fall 2017, the Grid comprises a number of servers totaling 1088 CPU cores (the most recently added servers have 32 CPU cores and 256 GB of RAM each). Check out Bowdoin's HPC website.

The servers in the grid are not interactive machines --- you cannot interact with them the way you interact with dover and foxcroft, or with your laptop. The Grid is set up to run batch jobs only (not interactive or GUI applications).

The servers on the Grid run the Sun Grid Engine (SGE), a software environment that coordinates the resources on the grid. The grid has a headnode that accepts jobs, puts them in a waiting queue until they can be run, sends them to the computational node(s) on the grid to run them, manages them while they run, and notifies the owner when the job is finished. This headnode is a machine called moosehead. To interact with the Grid you need to log in to the headnode "moosehead.bowdoin.edu" via an SSH client program.

ssh moosehead.bowdoin.edu

Moosehead is an old server that was configured to run the Sun Grid Engine and do everything a headnode is supposed to do: accept jobs, queue them until they can be executed, send them to an execution machine, manage them during execution, and log a record of each execution when it finishes.

Moosehead runs Linux, so in principle you can run on it anything that you could run on dover. However, DJ (the sysadmin and Director of Bowdoin's HPC Grid) asks that you don't: moosehead is an old machine. Use it only to submit jobs to the grid and to interact with the grid; do your developing, compiling, and testing somewhere else (e.g., on dover).

The Grid uses the same shared filespace as all of the Bowdoin Linux machines, so you can access the same home directory and data space as on dover or foxcroft. (If you need to transfer files from a machine that is not part of the Bowdoin network, scp them to dover or foxcroft first.)

Running jobs on the grid

Below is a summary of the commands we'll use to start. For more detailed information on how to interact with the grid, check out the website maintained by DJ, which should be your go-to page.

To submit to the grid you have two options:

  1. Use hpcsub.
  2. Create a script and use qsub.

Submit using hpcsub.

The command hpcsub lets you submit a single command to the grid. For example:
ssh moosehead
cd [directory-where-your-code-is-compiled]
hpcsub -pe smp 8 -cmd [your-code] [arguments to pass to the program]
The arguments -pe smp 8 are optional (but if you are running OpenMP code, you should use them). They specify that your code is to be run in the SMP environment with 8 cores (8 here is only an example; you can request any number you want).

For example, to run the hellosmp program that we talked about in class (which you can find here) using 8 CPU cores in the SMP environment, I would do:

ssh moosehead
[ltoma@moosehead:~]$ pwd
/home/ltoma
[ltoma@moosehead:~]$ cd public_html/teaching/cs3225-GIS/fall17/Code/OpenMP/
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ ls
example1.c  example2.cpp  example3.c  example4.c   hellosmp  hellosmp.c  hellosmp.h  hellosmp.o Makefile
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ hpcsub -pe smp 8 -cmd hellosmp
Submitting job using:
qsub -pe smp 8 hpc.10866
Your job 236150 ("hpc.10866") has been submitted

The headnode puts this job in the queue and starts looking for 8 free cores. When 8 cores become available, it assigns them to your job. While your job is running, no other job can use the 8 cores assigned to it; they are exclusively yours until your job finishes. To check the jobs currently in the queue, do:

qstat
To check on all jobs running on the cluster, type
qstat -u "*"
For a full listing of all jobs on the cluster, type
qstat -f -u "*"
To display a list of all jobs belonging to user foo, type
qstat -u foo
After I submit a job I usually check the queue:
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236150 0.00000 hpc.10866  ltoma        qw    10/12/2016 15:53:20                                    8        
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236150 0.58278 hpc.10866  ltoma        r     10/12/2016 15:53:27 all.q@moose15                      8        
Note how the job initially shows as "qw" (queued and waiting) and then changes to "r" (running).

When the job is done you will get an email. If you list the files, you will notice a new file named "hpc.[number].o[job-number]" (here, hpc.10866.o236150). This file holds the standard output of your job; all the print commands are redirected to this file.

[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ 
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ ls
example1.c  example2.cpp  example3.c  example4.c   hellosmp  hellosmp.c  hellosmp.h  hellosmp.o   hpc.10866.o236150  Makefile

[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ cat hpc.10866.o236150
I am thread 1. Hello world!
I am thread 2. Hello world!
I am thread 7. Hello world!
I am thread 0. Hello world!
I am thread 5. Hello world!
I am thread 6. Hello world!
I am thread 4. Hello world!
I am thread 3. Hello world!
  

Submit using qsub.

A more general way to submit jobs is via a script. You will need to create a script to run your programs on the grid. A sample script myscript.sh might look like this:
#!/bin/bash
# The #$ lines below are directives read by qsub:
#   -cwd          run the job in the current working directory
#   -j y          merge the job's error stream into its output stream
#   -S /bin/bash  use bash to interpret the script
#   -M ... -m b -m e   email this address when the job begins and ends
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -M (my_login_name)@bowdoin.edu -m b -m e

./hellosmp 
To submit your job to the grid you will do:
ssh moosehead 
cd [folder-containing-myscript.sh]
qsub myscript.sh
Example:
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$  cat myscript.sh 
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -M ltoma@bowdoin.edu -m b -m e

#./hellosmp 
./example1
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qsub myscript.sh
Your job 236154 ("myscript.sh") has been submitted
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236154 0.00000 myscript.s ltoma        qw    10/12/2016 16:00:17                                    1        
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236154 0.50500 myscript.s ltoma        r     10/12/2016 16:00:27 all.q@moose22                      1        
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ 
Note how your job went from "qw" to gone from the queue entirely: it ran and finished so quickly that we never caught it in the "r" state.

Each job creates an output file by appending the job number to the script name; in our case this is a file called "myscript.sh.o[job-number]". These .o* files are the equivalent of what you would see on the console if you ran the program interactively.

[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ ls 
example1      example1.c    example2.cpp   example3.c   example4.c
hellosmp      hellosmp.c    hellosmp.h     hellosmp.o   hpc.10866.o236150
Makefile      myscript.sh   myscript.sh.o236154
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ cat myscript.sh.o236154 
Hello World from thread 0
There are 1 threads
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$

Looking at the output we see that example1 ran with just one thread. That's because when we submitted we did not specify the SMP environment or a thread count, so we got the default (a single slot, and hence a single thread). When running OpenMP code you need to submit with the arguments -pe smp [numberthreads]. For example:

[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qsub -pe smp 8 myscript.sh
Your job 236155 ("myscript.sh") has been submitted
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat 
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ cat myscript.sh.o236155
Hello World from thread 5
Hello World from thread 1
Hello World from thread 0
There are 8 threads
Hello World from thread 6
Hello World from thread 7
Hello World from thread 2
Hello World from thread 4
Hello World from thread 3
Ah, that's better.
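
One thing to be aware of: -pe smp N reserves N cores, but your program decides how many threads it actually creates. If you want the two to match, one option is a small helper like the sketch below; it assumes SGE's standard behavior of exporting the granted slot count to the job in the NSLOTS environment variable (the helper name is mine):

#include <omp.h>
#include <stdlib.h>

/* Sketch: set the OpenMP thread count to the number of slots granted
   by -pe smp N, which SGE exports in the NSLOTS environment variable.
   If NSLOTS is not set, keep the OpenMP default. */
static void set_threads_from_sge(void) {
    const char *nslots = getenv("NSLOTS");
    if (nslots != NULL) {
        omp_set_num_threads(atoi(nslots));
    }
}

Call this at the start of main(), before the first parallel region.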

Using a machine exclusively

If you run a job in the SMP environment requesting, say, 8 cores, the headnode will look for one machine that has 8 cores available. The other cores on that machine may already be in use, or, if not, may be given to other jobs in the future. Since all cores on a machine share the memory and also some caches, there will be some competition among the threads, so the timing of your job will be affected by whatever else is running on the same machine.

If you are running an experimental analysis and you care about the timings, you want to request the whole machine for yourself, even if your job is only going to use some of its processors. You can do that by including the flag -l excl=true:

[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qsub -l excl=true -pe smp 8 myscript.sh
Your job 236157 ("myscript.sh") has been submitted
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236157 0.00000 myscript.s ltoma        qw    10/12/2016 16:05:15                                    8        
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 236157 0.60500 myscript.s ltoma        r     10/12/2016 16:05:27 all.q@moose22                      8        
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ qstat
[ltoma@moosehead:~/public_html/teaching/cs3225-GIS/fall17/Code/OpenMP]$ 

The paper

You need to write a paper that describes the project. The paper should be structured like a research paper:
  1. Introduction/background

    (Briefly) describe the total viewshed problem, what the project is doing, and why (to support the why, bring in the running time of the total viewshed on one processor).

  2. Our approach

    Here you'll want to say that you use OpenMP, which conveniently provides a parallel for loop; the important part is to describe the details of your parallel for loop (see the first sketch after this list).

  3. Results: parallelization and speedup

    Describe the experiments you ran to assess the effect of parallelization: include the table with the running times and the plot of the speedup.

    Datasets: Use set1.asc. It would be great if you also ran experiments for kaweah.asc, but since the running times are larger, it's optional.

    For the experiments, include brief details on the command you used to submit the jobs, so that we can interpret your running times and compare them with those of your peers: for example, whether you used the -l excl=true flag. Also include info on which server ran your job.

    The table: the running time of your code on the grid with number of cores P = 1, 2, 4, 8, 12, 16, 20, 24, 32, 40, and the speedup obtained in each case (speedup is defined as T1/Tk, where T1 is the time to run with P=1 core and Tk is the time to run with P=k cores).

    The plot: the speedup as a function of the number of cores, for set1.asc.

    Also include a screenshot of the total viewshed computed by your code on set1.asc (use render2d to render it).

    Discuss your findings.

  4. Results: grid layout

    Describe the experiments you have done to assess the effect of the blocked layout, and discuss your findings. Describe how you chose the block size (see the second sketch after this list).

  5. Conclusion

    Describe your overall conclusions. This is usually a high-level overview of the discussion in Sections 3 and 4. Feel free to include any personal thoughts.
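
To make Sections 2 and 4 concrete, here are two minimal sketches. They are illustrations under assumptions, not the required implementation; in particular, the Grid struct, its field names, and the helper viewshed_size() are hypothetical stand-ins for whatever your own code uses.

First, the kind of OpenMP parallel for loop Section 2 should describe:

#include <omp.h>

typedef struct {
    int nrows, ncols;
    float *elev;   /* elevations, nrows * ncols values */
    float *tv;     /* output: total viewshed value of each cell */
} Grid;

/* Assumed to exist elsewhere in your code: the number of grid points
   visible from viewpoint (vi, vj). */
long viewshed_size(const Grid *g, int vi, int vj);

void total_viewshed(Grid *g) {
    /* Each viewpoint is independent of all the others, so the outer loop
       parallelizes with no synchronization.  schedule(dynamic) evens out
       the load, since viewshed cost varies a lot across the terrain. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < g->nrows; i++) {
        for (int j = 0; j < g->ncols; j++) {
            g->tv[(long)i * g->ncols + j] = (float)viewshed_size(g, i, j);
        }
    }
}

Second, for Section 4, the row-major versus blocked comparison comes down to the indexing function that maps a cell (i, j) to a position in the one-dimensional array holding the grid. A sketch, where B is the block side length you chose (again, the names are mine):

/* Row-major layout: position of cell (i, j) in an nrows x ncols grid. */
long rm_index(int i, int j, int ncols) {
    return (long)i * ncols + j;
}

/* Blocked layout with B x B blocks: blocks are stored in row-major order,
   and cells within each block are row-major too.  For simplicity this
   assumes ncols is a multiple of B. */
long blocked_index(int i, int j, int ncols, int B) {
    long blocks_per_row = ncols / B;
    long block = (long)(i / B) * blocks_per_row + (j / B);
    return block * (long)B * B + (long)(i % B) * B + (j % B);
}

The point worth discussing in the paper is why a blocked layout can improve cache behavior here: the line-of-sight rays of a viewshed walk the grid in arbitrary directions, not just along rows.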

What to turn in

Push your code and paper to GitHub; bring a hard copy of your paper to class.

Grading

Total 25 points

Enjoy!