Using the grid

To submit jobs to the grid, log in to moosehead.bowdoin.edu. It is a Linux machine that shares its filesystem with the other Linux machines, so you will see the same files as when you log in to dover.

Three queues are set up (cs1g, cs4g, and cs512m), and each queue has two machines.

I have made an example script that you can use as a reference when writing your own: http://www.bowdoin.edu/~ltoma/teaching/cs345/spring11/Src/grid1.sh It looks like this:

#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -M ltoma@bowdoin.edu -m be

/people/faculty4/cs/ltoma/gridtest/timertest > /people/faculty4/cs/ltoma/gridtest/test1.$QUEUE.$JOB_ID

When making your own scripts, change the line "#$ -M ltoma@bowdoin.edu -m be" to "#$ -M yourlogin@bowdoin.edu -m be". This gives you e-mail notifications when the job begins and when it ends (the "b" and "e" in "-m be"), along with statistics.
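If you prefer not to edit the line by hand, the change can be made with sed. This is only a sketch: the function name set_notify_address and the output file name mygrid.sh are made up for illustration, and it assumes the "#$ -M" line looks exactly like the one in grid1.sh.

```shell
#!/bin/bash
# set_notify_address ADDRESS
# Reads a Grid script on stdin and writes it to stdout with the address
# after "#$ -M" replaced by ADDRESS. All other lines pass through unchanged.
# (Hypothetical helper, not part of the Grid software.)
set_notify_address() {
    sed 's/^#[$] -M [^ ]*/#$ -M '"$1"'/'
}

# Example use (mygrid.sh is a placeholder name):
#   set_notify_address yourlogin@bowdoin.edu < grid1.sh > mygrid.sh
```

The "-m be" part of the line is left alone, so you still get mail at the beginning and end of the job.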

The two variables in the file name are $QUEUE, the queue the job is running in (for example, cs512m), and $JOB_ID, a job number that is unique within the Grid system. Use full paths to files rather than the shortcut "~". This example runs timertest once and redirects its output (what the program would normally print to the screen) into a file called test1.$QUEUE.$JOB_ID.
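The same naming idea extends to a script that runs a program several times, with one output file per run. The following is a minimal sketch, not the script used in class: the function run_trials, the trial-number suffix, and the fallback values "local" and "0" (used when $QUEUE and $JOB_ID are not set, that is, outside the grid) are all assumptions made for illustration.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -M yourlogin@bowdoin.edu -m be

# run_trials PROGRAM PREFIX COUNT
# Runs PROGRAM COUNT times. Trial i writes its output to
# PREFIX.$QUEUE.$JOB_ID.i, so no run overwrites another.
# QUEUE and JOB_ID are set by the Grid system; the fallbacks are
# assumptions so the function can be tried outside the grid.
run_trials() {
    local program="$1" prefix="$2" count="$3" i
    for ((i = 1; i <= count; i++)); do
        "$program" > "${prefix}.${QUEUE:-local}.${JOB_ID:-0}.${i}"
    done
}

# Example call (placeholder paths; use full paths to your own files):
#   run_trials /full/path/to/timertest /full/path/to/test1 3
```

Keeping $QUEUE and $JOB_ID in the name means results from different queues and different submissions never collide.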

To run a job on the grid:

qsub -q cs1g grid1.sh
or 
qsub -q cs512m sierra.sh
or 
qsub -q cs4g grid1.sh
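
To submit the same script to several queues in one go, a small loop is enough. This is a sketch under assumptions: the function submit_all and the QSUB variable are invented here (QSUB exists only so the loop can be dry-run with "echo qsub" before submitting anything for real). The queue names are the ones listed by qstat -f.

```shell
#!/bin/bash
# submit_all SCRIPT QUEUE...
# Runs "qsub -q QUEUE SCRIPT" once for each queue given.
# Set QSUB="echo qsub" to print the commands instead of running them
# (QSUB is a hypothetical variable used only by this sketch).
submit_all() {
    local script="$1" q
    shift
    for q in "$@"; do
        ${QSUB:-qsub} -q "$q" "$script"
    done
}

# Dry run, prints three qsub command lines:
#   QSUB="echo qsub" submit_all grid1.sh cs1g cs4g cs512m
# Real submission (on moosehead only):
#   submit_all grid1.sh cs1g cs4g cs512m
```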

You can submit many jobs at once, then log out and go home. The Grid system queues them and runs them one after the other, in the order received, so there is no need to remain logged in while the jobs are running. You will receive an e-mail notification when each job starts and another when it finishes, along with statistics about the job. Once the jobs are done, you can log in to any of the Linux machines and find the results in the output files, in the directory from which you submitted the scripts.

Since moosehead is a fairly minimal machine, please do not run any programs on it except the Grid "q" commands (qsub, qstat, etc.); use it only to submit Grid jobs and check on their status. Because it shares the same filesystem as the other Linux machines, you can edit files, tweak programs, and so on from a machine like dover or tuna, and then use moosehead just to submit the job for processing.

If you wish to see the jobs that are running, you can use the command: qstat -f -u "*"
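
When many jobs are queued, a summary by state can be easier to read than the full listing. The sketch below assumes the plain qstat layout shown in the sample session on this page (two header lines, with the state in the fifth column); count_states is a made-up helper name, not a Grid command.

```shell
#!/bin/bash
# count_states
# Reads plain `qstat` output on stdin and prints one line per job state
# (for example "qw" for waiting, "r" for running) with the number of jobs.
# Assumes two header lines and the state in column 5, as in the sample
# session on this page.
count_states() {
    awk 'NR > 2 && NF { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Example use on moosehead, for your own jobs only:
#   qstat -u "$USER" | count_states
```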

The quick info is:

  1. Log in to moosehead.bowdoin.edu using SSH.
  2. cd to the directory containing your Grid script.
  3. Use the qsub command to submit the Grid script.

More detailed information about the Grid environment can be found at . Here is a sample of what I did:

[ltoma@moosehead:~]$ pwd
/people/faculty4/cs/ltoma
[ltoma@moosehead:~]$ cd gridtest/
[ltoma@moosehead:~/gridtest]$ qsub -q cs512m grid1.sh
Your job 2875 ("grid1.sh") has been submitted
[ltoma@moosehead:~/gridtest]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
   2874 0.00000 grid1.sh   ltoma        qw    02/17/2011 15:06:15                                    1        
[ltoma@moosehead:~/gridtest]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@moose2                   BIP   0/0/8          0.00     lx24-amd64    
---------------------------------------------------------------------------------
all.q@moose3                   BIP   0/0/8          0.00     lx24-amd64    
---------------------------------------------------------------------------------
all.q@moose4                   BIP   0/0/4          0.00     lx24-amd64    
---------------------------------------------------------------------------------
cs1g@moosecs3                  BI    0/0/1          0.00     lx24-amd64    
---------------------------------------------------------------------------------
cs1g@moosecs4                  BI    0/0/1          0.00     lx24-amd64    
---------------------------------------------------------------------------------
cs4g@moosecs1                  BI    0/0/1          0.01     lx24-amd64    
---------------------------------------------------------------------------------
cs4g@moosecs2                  BI    0/0/1          0.00     lx24-amd64    
---------------------------------------------------------------------------------
cs512m@moosecs5                BI    0/0/1          0.00     lx24-amd64    
---------------------------------------------------------------------------------
cs512m@moosecs6                BI    0/0/1          0.00     lx24-amd64    
---------------------------------------------------------------------------------
mtest4.q@moose4                BIP   0/0/4          0.00     lx24-amd64    
---------------------------------------------------------------------------------
mtest8.q@moose3.bowdoin.edu    BIP   0/0/8          0.00     lx24-amd64    
[ltoma@moosehead:~/gridtest]$ 
[ltoma@moosehead:~/gridtest]$ ls -la
total 72
drwxr-xr-x  2 ltoma cs  4096 Feb 17 15:08 .
drwx-----x 37 ltoma cs 12288 Feb 17 15:06 ..
-rwxr-xr-x  1 ltoma cs   180 Feb 17 15:08 grid1.sh
-rwxr-xr-x  1 ltoma cs   162 Feb 17 15:04 grid1.sh~
-rw-r--r--  1 ltoma cs     0 Feb 17 15:08 grid1.sh.o2875
-rwxr-xr-x  1 ltoma cs   405 Feb 17 14:57 Makefile
-rwxr-xr-x  1 ltoma cs  3046 Feb 17 14:57 rtimer.c
-rwxr-xr-x  1 ltoma cs  5588 Feb 17 14:57 rtimer.h
-rw-r--r--  1 ltoma cs  3040 Feb 17 14:57 rtimer.o
-rw-r--r--  1 ltoma cs   350 Feb 17 15:08 test1.cs512m.2875
-rwxr-xr-x  1 ltoma cs 10674 Feb 17 14:57 timertest
-rwxr-xr-x  1 ltoma cs  1653 Feb 17 14:57 timertest.c
-rw-r--r--  1 ltoma cs  5816 Feb 17 14:57 timertest.o
[ltoma@moosehead:~/gridtest]$ 

Last modified: Thu Feb 17 15:20:01 EST 2011