The Academic Side of Big Data

By Amy Kerr
If high performance computing (HPC) seems like an unusual resource to find at a liberal arts college, you might be right. But Dj Merrill, Director of High Performance Computing, sees it as a natural fit and a valuable asset that sets Bowdoin apart from most other traditional liberal arts colleges.

Faculty and students who use the HPC environment call it “The Grid.” The Grid is a group of Linux servers that work together like a single, very large multiprocessor computer, so many computationally intensive jobs can run at the same time.

How is the Grid being used by students and faculty?

  • The Chemistry Department is using the Grid to simulate the molecular structures of pollutants.
  • Marine Biology students use the Grid to analyze data from next-generation DNA sequencing.
  • Students in the Computer Science Department are programming the Grid to process data more efficiently, greatly reducing the time it takes to analyze huge data sets.

Infrastructure

Because the Grid has special power and security needs, the servers currently live off campus at Oxford Networks, a facility located in Brunswick Landing.

Researchers at Bowdoin interact with the Grid through a dedicated computer that accepts jobs, places them in a holding area (a queue), manages each job while it runs, and sends a notification when the job is finished.

The Grid supports a wide range of jobs, from simple shell scripts to computationally heavy and parallel workloads. Jobs that would take several days on a typical desktop computer may finish within hours on the Grid.

The Grid's initial equipment, four compute nodes, was funded in fall 2008 through matching funds from the Department of Chemistry and IT. The first real (non-test) job on the Grid started on Christmas Day 2008; it performed calculations for 56 days before successfully returning the expected answer.