High Performance Computing
Linux High Performance Computing (HPC)
HPC at Bowdoin
Originally supporting a single faculty member in a single department, the Bowdoin HPC environment now supports dozens of faculty and students across a variety of disciplines, including biology, chemistry, computer science, digital and computational studies, environmental studies, economics, geology, history, sociology, math, and physics, as well as many individual student and faculty research projects.
A wide variety of software packages and programming languages are available within the Linux HPC environment, both commercial and Open Source, including but not limited to ADF, Beast, C, C++, CUDA, Fortran, Gamess, Gaussian, Grass, IDL, Java, Mathematica, Matlab, Mopac, NBO, NWChem, Perl, Python, R (along with RStudio), Ruby, Sage, SPSS, Stata-MP, SuperMongo, Topspin, and hundreds more.
There are two ways to use the Linux HPC resources: interactively, and via a batch scheduler that manages the HPC Grid.
Interactive computing is the way that most people use a computer: sitting at a single computer, running programs, and interacting with those programs through either a GUI (Graphical User Interface) or a command line interface. Checking e-mail, browsing the web, and composing a document are examples of interactive computing. You would use interactive computing if you were running any software that displays graphics or requires manual interaction while it is running, such as inputting more data, typing additional information, or clicking on an icon. Interactive computing is best used when you can accomplish your goals while you are sitting at the computer and can quit the program when you leave (i.e., you are not leaving the computer running your job when you walk away from it).
Bowdoin Computing Grid
The Bowdoin Computing Grid is a group of Linux servers which appear as one big, multiprocessor, compute server that can run many computationally intensive jobs concurrently. The Grid supports a wide range of jobs from simple shell scripts to heavy computational jobs and parallel processes. Jobs taking several days on a typical desktop machine might finish within hours using the Grid environment, thus freeing up the desktop computer for other tasks while the Grid resources process the job on dedicated computational nodes. People interact with the Bowdoin Computing Grid via the Son of Grid Engine (SGE), which is a software environment that coordinates the resources of multiple computers. SGE accepts jobs, puts them in a waiting queue until they can be run, sends them to a computational node, manages them during the run, and notifies the person when they are finished.
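As a rough illustration of what handing a job to the scheduler can look like, the sketch below generates the text of a minimal Grid Engine batch script. It is only a sketch under stated assumptions: the helper name `build_job_script`, the job name, the command, and the e-mail address are hypothetical, and the page does not describe Bowdoin's actual queue names, resource requests, or local submission policies, so none of those are shown. The `#$` lines use standard Grid Engine options (`-N`, `-S`, `-cwd`, `-j`, `-m`, `-M`).

```python
import shlex


def build_job_script(job_name, command, notify_email=None):
    """Return the text of a minimal Grid Engine batch script.

    job_name, command, and notify_email are illustrative inputs;
    site-specific queues and resource limits are deliberately omitted.
    """
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",     # interpret the script with bash
        f"#$ -N {job_name}",   # job name shown in the queue listing
        "#$ -cwd",             # run from the directory the job was submitted from
        "#$ -j y",             # merge standard error into standard output
    ]
    if notify_email:
        lines.append("#$ -m e")                # send mail when the job ends
        lines.append(f"#$ -M {notify_email}")  # address to notify (hypothetical)
    # Quote each argument so names containing spaces survive the shell.
    lines.append(" ".join(shlex.quote(part) for part in command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_job_script("demo", ["python3", "analysis.py"], "user@example.edu"))
```

Saved to a file such as `demo.sh`, a script like this would typically be submitted with `qsub demo.sh` and monitored with `qstat`; the scheduler then queues it, dispatches it to a computational node, and sends the end-of-job e-mail if one was requested.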
Typically, if a job takes more than a few hours to run on a desktop machine, or the desktop machine needs to be available for other tasks, you should consider running the job on the Grid.
Computational nodes within the Grid range from systems with 16 CPU cores and 128 GB of RAM to systems with 56 CPU cores and 256 GB of RAM. The Grid also offers GPU computational nodes that use NVIDIA GPU cards.
HPC Community @Bowdoin
"My collaborator, Tim Divoll from Indiana State University, and I use the HPC to conduct bioinformatic analyses of high throughput DNA sequencing data. The DNA sequences are a metabarcoding study to understand the diet of a Neotropical bat species, the frog-eating bat Trachops cirrhosus. T. cirrhosus has been well studied in captivity for its behavior of hunting frogs by eavesdropping on their calls, but their diet in the wild is largely unknown. We collected fecal samples from more than 100 T. cirrhosus individuals over multiple years and across dry and wet seasons in Panama. We then sequenced two gene regions, 16S and CO1 (using 454 sequencing and illumina sequencing respectively), from prey remains in the fecal samples. We are using the HPC to sort and assign taxonomy to the millions of sequences that result from these next-generation sequencing approaches. Our goal is to be able to comprehensively describe the diet of this bat for the first time, as well as assess how diet varies between adults and juveniles, across capture sites, and across seasons. Understanding a species' diet is a first step in better understanding its ecology, and potential approaches to conservation." -- Patricia Jones, Biology
"Several of my recent projects address how public assistance like welfare affect the long run outcomes of children. For example, one project focuses on the role of public assistance in delaying the onset of certain diseases or conditions like diabetes, asthma, and high blood pressure. This requires longitudinal data where we can follow people from childhood to adulthood. The project uses the Panel Study of Income Dynamics (PSID), an annual sample that follows households to today beginning in 1968. Several thousand families generate a lot of data when followed for 40 years with extensive information on demographics, income, labor, health, etc. The study estimates transition models of age at adverse health onset using computationally intensive latent variable methods that allow joint modeling of family income and public assistance. The HPC environment allows me to estimate models that use up to days of CPU time, and run more than one at a time . This would simply not be feasible on a desktop computer." --John Fitzgerald, Economics
"I am using a deep neural network to study a problem in theoretical mathematics: what is the minimum number of scalar multiplications necessary to multiply two matrices. This is a long-standing problem that has never been analyzed using machine learning. Based on some initial experiments using Bowdoin's GPUs, it looks the current known bound on the number of multiplications can be improved, perhaps substantially." --Thomas Pietraho, Mathematics
"I am a biologist studying Gulf of Maine marine organisms’ response to changing waters and the underlying genetic mechanisms of adaptation. In order to do my work, I use the Bowdoin HPC Grid to align and analyze large datasets of DNA and RNA sequences. In addition, I harness the computing power of the HPC Grid to link underlying genetic variation with traits that matter to the organisms in the wild, e.g. how well a mussel can continue to build its shell in acidic water.
During the Fall, my Bowdoin Marine Science Semester students learn to use the bioinformatic pipelines on the HPC Grid to investigate population genomic patterns in a native Gulf of Maine intertidal snail species. These snails exhibit physical differences between populations in sheltered shorelines versus wave-exposed shorelines; the BMSS students utilized next generation sequencing techniques and the HPC to discern the genetic architecture underlying these physical differences.
This Spring 2017 semester, I am teaching The Omics Revolution: Computational Genomics and Big Data in the Field of Biology, where students learn to use the HPC Grid to analyze a variety of large-scale datasets common in the ‘Omics’ fields. The Omics Revolution students will choose a question and existing omic-scale dataset of interest, analyze that data to test their hypothesis, and write a scientific paper summarizing their findings." --Sarah Kingston, Biology
"I have incorporated Amsterdam Density Functional (ADF) calculations into my research efforts with students since 2001. ADF calculations allow them to better understand the bonding and photochemical properties of molecules containing metals such as platinum, gold, iridium, ruthenium, and osmium. Beginning about 2005, I added a computational chemistry project as part of Chemistry 3400, an Advanced Inorganic Chemistry course. The project requires students to perform ADF calculations on a molecule of their choosing, and to write a report describing and interpreting the results of the calculations. This project complements the other parts of the course and, among other things, provides students interested in graduate work in chemistry with valuable computational chemistry experience. Since 2008, the Bowdoin HPC Grid has allowed us to run much larger jobs, and multiple jobs at the same time, which is a great improvement over running on desktop computers, and also frees up the desktop computer for other tasks." --Jeff Nagle, Chemistry
"The HPC Grid has been a great resource for teaching and doing research with computational text analysis. I was able to be up and running quickly; the process is not intimidating. Students in my First Year Seminar “How to Read 1,000,000 Books” use programs written in R to isolate language usage patterns in over 1.2 billion data points collected from Google Books. My different research projects use smaller data sets: 1 million tweets, nearly 46,000 journal articles, and nearly 1,000 books. Having the HPC allows me to run my analyses without tying up the computer that I use for teaching and day-to-day activities and my student research assistants can do the same, which helps with collaboration. Sometimes this means reconfiguring a matrix with billions of elements, other times it means creating thousands of smaller documents for comparison. Completing these jobs, and completing them in a reasonable time frame, simply wouldn’t be possible without the HPC." --Crystal Hall, Digital and Computational Studies
"My students and I use the Bowdoin Computing Grid to perform numerical relativity simulations of black holes. Einstein’s theory of gravity, general relativity, is encoded in Einstein’s equations, a complicated set of partial differential equations. The equations can be solved exactly only for special cases. In general, we have to rely on some approximation technique in order to study the behavior and interactions of black holes, as well as the gravitational radiation that they emit. A particular powerful technique are numerical simulations. In recent years we have used such ’numerical relativity’ simulations to study, for example, binaries of two orbiting black holes, as well as so-called ‘critical phenomena’ in the formation of black holes. These simulations require significant computational resources and can be performed only in high-performance computing environments. It is extremely useful to have such resources here at Bowdoin." --Thomas Baumgarte, Physics
"I am utilizing the HPC for my research on cricket's neuronal plasticity in response to injury. In Dr. Horch's wet lab, we extracted RNA from the terminal ganglion of adult male crickets and are now building a transcriptome from the sequences. Thus far, we have used the HPC Grid to run programs such as bowtie, Fastqc, trim galore! to build our transcriptome. To identify target candidates involved in neuronal plasticity such as guidance molecules slit and semaphorin, we will use the Trinity program suite on the HPC grid and conduct differential analysis. The HPC's high computing capacity make it easy for me to submit jobs that do not use local resources on my computer, nor take years to finish given the high volume files I am working with. It would be impossible for me to complete my project in a thorough or efficient manner without the HPC Grid." --Meera Prasad, Biology and Religious Studies double major
"I am an economics and mathematics major and the HPC has been a great resource for my research with Prof. Nelson in the economics department and for my independent study with Prof. Pietraho in the mathematics department.
For my economics research I use the HPC to apply a machine learning package to estimate the likelihood and to describe the factors that affect fishing spots in the Gulf of Maine. The dataset contains 21 biophysical and socio-economic variables for around 2.6 million coordinates in the Gulf of Maine. The processing power of the HPC allows for easy partitioning of this dataset to learn how these variables interact, which can then be used to classify the corresponding occurrence of fishing spots.
On the other hand, for my independent study in mathematics I am using a deep neural network to classify images. The objective is to use a pre-trained neural network that has been trained on a large dataset and multiple categories and then to retrain it on my smaller dataset. Such retraining allows us to explore how the accuracy of classification is affected.
Neither of my projects could have been implemented on my PC. The HPC has been a great facilitator for exploring the research questions I have and for demystifying the world of computing." --Parikshit Sharma, Economics and Mathematics Major
"I am a Neuroscience major and Mathematics minor using the HPC for the bioinformatics aspects of my research on the compensatory growth responses to injury in crickets. In lab, I collected the prothoracic ganglion from 21 male adult crickets for RNA extraction and sequencing. During the summer of 2016, using the Trinity program suite on the HPC Grid, I assembled a de novo transcriptome of RNAseq read data and began differential expression analysis using EdgeR. I have also used bowtie, FastQC, the BLAST suite, and Tophat on the HPC for further analysis of my dataset. After an an initial characterization of the guidance cue peptidome in my transcriptome, I will aim to identify novel candidate genes for involvement in the compensatory plasticity in response to injury using differential expression analysis. The high computing power of the Grid and ability to submit jobs that don’t require local resources on my computer have been hugely beneficial to my project." --Harris Fisher, Neuroscience Major
History of HPC at Bowdoin College
Spring 2003 - Creation of specialized Physics cluster (16 CPU cores total) supporting one computational application for Thomas Baumgarte
Spring 2008 - Hiring of Dj Merrill to support HPC / Research Computing
Fall 2008 - Creation of general purpose HPC Grid to support campus-wide research. Dan O'Leary, Chemistry, was the first faculty to use the new resource
Fall 2009 - HPC usage expands to the classroom teaching environment, supporting both research and academic use. Chemistry, Computer Science, Math, Physics, Geology, Economics, and Biology are actively using the HPC environment
Spring 2010 - Code ported from the old Physics cluster to the new HPC Grid
Summer 2010 - Specialized Physics cluster retired
Summer 2010 - Dhiraj Murthy, Sociology, starts using the HPC environment for analyzing Twitter content
Summer 2011 - GPU compute capabilities added to HPC Grid
Fall 2012 - Digital and Computational Studies Major created
Summer 2013 - Gluster high speed data storage solution added to HPC environment
Summer 2014 - Campus core server environment upgraded. "Best of" these systems re-purposed as HPC compute nodes, breaking the 500 CPU core barrier
Summer 2014 - Sarah Kingston, Biology, starts doing bioinformatics research using the HPC environment
Summer 2015 - Upgraded HPC core networking using Cisco's 10 Gb ultra-low latency Nexus 3548 switches, offering speeds comparable to InfiniBand networking
Summer 2016 - Sarah Kingston teaches seminar on using the Bowdoin Computing Grid to run bioinformatics analysis
Fall 2016 - Institutional Research, Analytics & Consulting Division starts using the HPC environment for data set analysis, representing the first business (non-Academic) usage
Winter 2017 - In addition to the normal research efforts, five classes, an Honors Thesis, and multiple independent studies are using the HPC environment this term
Fall 2018 - Current status: 1192 CPU cores, 9 GPU compute nodes, 15+ commercial applications, hundreds of Open Source applications