Biology Students + Crickets + High Performance Computers = Scientific Breakthrough

By Rebecca Goldfine
Students in Hadley Horch's molecular neurobiology class are not just getting an introduction to bioinformatics this semester; they're also applying their new computational skills to a project that will help decipher the mysterious genetics of crickets.
Hadley Horch, professor of biology and neuroscience, with a cricket.

The task that Horch has set before her class of twenty-four students is to create the largest and most complete transcriptome of the common field cricket. By April.

A transcriptome refers to all the transcripts, known as messenger RNAs (mRNAs), that exist in a tissue. These mRNAs, which are RNA copies of the genes located in the DNA, are the blueprint instructions for building proteins. 

"It is ambitious what we're trying to do," Horch said. "And we have more plans, too, after we build this." They will use the information they piece together to search for possible biological pathways behind the cricket's remarkable ability to compensate for the loss of a sensory organ.

Most of Horch's students have limited or no experience with the kind of computational biology needed to perform such a big job. To help bring them up to speed quickly, Horch has partnered with three bioinformaticians at the Bar Harbor-based Mount Desert Island Biological Laboratory (MDIBL). They Zoom in weekly to lead online labs (in fact, they teach four sessions of the same lab so that the class can split up into groups of six socially distanced participants).

In a recent lab session in Kanbar 107, six masked students sat at least six feet apart below a large screen projection of Joel Graber, a senior scientist and director of computational biology at MDIBL, who was teaching that day's lesson. Step by step, he and his colleagues are guiding students through the process of commanding a computer to string together billions of fragments of genetic material.

Though the students will use their own laptops to work on the project, the actual job of generating a transcriptome requires far more computational oomph than any one desktop or laptop can provide. So students are learning how to use their personal computers to connect to Bowdoin's high-performance computer cluster (HPC), which is overseen by Dj Merrill, Bowdoin's director of high-performance computing.

Merrill said he's excited to see the convergence of an outside research facility, a Bowdoin class, and the College's HPC. "This is the first time we've seen the three coming together to actively teach a class," he said. 

"I’m personally really interested in molecular biology; recent advances in molecular research methods have made it possible to pursue answers to so many cool and interesting questions. To understand modern research in neuroscience, it’s necessary to keep up with the cutting edge of these methods. I’m also really interested in neurological and psychiatric disorders, like Angelman’s, Alzheimer’s, and Parkinson’s; these are molecular disorders. To understand how to treat and cure them, I need to gain the skills to work in molecular biology. I’m gaining those skills in Molecular Neuro." — Anthony Yanez ’22 

The importance of transcriptomes for all creatures

For nonbiologists: a transcriptome, while related to a genome, is very different. (The word transcriptome is a blend of transcript and genome.) A genome is the full map of an organism's DNA and genes. A transcript refers to the information encoded in mRNA after it is copied from DNA and used to begin building proteins. And a transcriptome is a catalog of all possible mRNA transcripts in any given tissue at any given time, which means it can be quite large.

Scientists obtain transcripts by sequencing the mRNA in samples taken from organisms, such as embryos, brains, livers, and other organs. Tissues are dissected and homogenized, and the RNA is purified from the samples. Messenger RNA is then cut into short pieces and sequenced, which creates a jigsaw puzzle with many millions of pieces that needs to be put back together.
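
The jigsaw-puzzle idea can be illustrated with a toy sketch in Python. This is not the software the class uses: the function names and the three short reads are made up for illustration, and real assemblers cope with millions of error-prone reads using far more sophisticated graph-based methods. The sketch simply merges the two fragments with the longest overlap, again and again, until no overlaps remain.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Hypothetical helper for illustration only; it assumes error-free reads.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several pieces
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three invented fragments of one invented 15-letter sequence.
    fragments = ["ACCTGAAT", "ATGGCGTA", "GCGTACCTG"]
    print(greedy_assemble(fragments))  # prints ['ATGGCGTACCTGAAT']
```

Even in this tiny case the program compares every fragment with every other fragment on each pass, which hints at why the real job, with many millions of pieces, needs far more computing power than a laptop.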

A screenshot of a portion of the transcriptome assembled by students.

Horch would like to have an accurate, complete cricket transcriptome because it can help show her which genes, and how many of them, are being turned on or off (or "up- or down-regulated") in the insect under certain experimental conditions. This in turn can reveal the genes involved in the biological processes that result in the formation of a cricket leg or eye or, more applicable to her research, an auditory system.
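
One common way to put a number on "up- or down-regulated" is the log2 fold change: the base-2 logarithm of a gene's expression level in the experimental condition divided by its level in the control. The Python sketch below is a minimal illustration of that arithmetic, not a description of the analysis Horch's lab performs. The gene names and read counts are invented, and the pseudocount and the cutoff of 1.0 (a doubling or halving) are illustrative assumptions.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 of treated/control; the pseudocount avoids dividing by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(control, treated, threshold=1.0):
    """Label a gene by whether its expression at least doubled or halved."""
    lfc = log2_fold_change(control, treated)
    if lfc >= threshold:
        return "up-regulated"
    if lfc <= -threshold:
        return "down-regulated"
    return "unchanged"


if __name__ == "__main__":
    # Invented read counts as (control, injured); gene names are placeholders.
    counts = {"geneA": (99, 399), "geneB": (399, 99), "geneC": (100, 110)}
    for gene, (control, injured) in counts.items():
        print(gene, round(log2_fold_change(control, injured), 2), classify(control, injured))
```

A positive value means more transcripts were found after the change in conditions, a negative value means fewer, and a value near zero means little difference.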

The cricket is highly unusual because it can reorganize its auditory system after it's been injured. This means, basically, that after one of its ears has been cut off or damaged, the cricket can compensate by rewiring the disconnected side to respond to the remaining, intact ear. "We're hunting for the molecular basis of this reorganization," Horch said. 

Horch's type of research is often described as basic science, which means that, while there could be medical or therapeutic advancements in the future based on her work, her goals are purely to unearth knowledge. She borrows the term "preclinical" to describe basic research because, she explained, "You never know what will go on and become useful in another organism's system or become the basis of a big breakthrough in cancer or regenerative biology. Could it be possible we could recapitulate the cricket's plasticity somewhere else?"

"That is where we're trying to get to," Graber echoed Horch. "Ultimately, whether it's understanding a disease, looking for treatments, understanding developmental processes—what we're trying to do is get a picture at the molecular level of what the players are and how they are changing."

In past years, Mount Desert Island Biological Laboratory has invited Bowdoin students to its facilities for faculty-led research projects over spring break. This year that wasn't possible.

A partnership with MDIBL

Since the early 2000s, MDIBL and Bowdoin have been part of a federally funded network of educational and research institutions building up biomedical research and training in Maine. In 2019, this group—the IDeA Network of Biomedical Research Excellence (INBRE)—got a boost when the National Institutes of Health awarded it $18 million to continue creating research and training opportunities across the state for the next five years.

Horch has brought her students to MDIBL's Bar Harbor lab a handful of times since 2003, always over spring break. Her students typically spend ten long days in the seaside lab, usually on a project related to her cricket research. This year, because of COVID, these hours have been spread out over the course of the semester, and take place on campus, albeit remotely. Fortunately, bioinformatics, being computer based, lends itself easily to an online classroom.

In a typical year, MDIBL offers many lab-based trainings to students in genetics and molecular biology. In the past two years, it has also added a transcriptome analysis workshop to its inventory of offerings for Maine students, scientists, and educators.

"We had a number of different people looking at this common problem" of trying to make transcriptomes—but on different animals. "Sequencing has gotten very inexpensive and accessible, but we have people working with sea bass, earwigs, lobsters, or crickets, so the workshop grew out of the desire to get people working together, talking together, and sharing resources and efforts," Graber said.

And the bioinformatics skills he's teaching are only going to become more important in biology, certainly, but also in many other fields. "Our world is data intensive and it is only going to get more so, so developing the skills and mindset to deal with large data sets is a critical skill," he said. 

"I spoke to a few neuroscience PhD candidates and researchers recently, and one of the questions I asked was 'what is a skill that you didn’t get in college do you wish you had?' And a lot of them said gaining proficiency in computer coding, especially as technology advances, that you have to have to keep up with the technology." — Lucy O'Sullivan ’23

High-performance computing at Bowdoin


These images show the high-performance computing (HPC) racks housed at FirstLight on the former navy base. The computers are connected to Bowdoin's central campus with dedicated fiber-optic lines.

Bioinformatics is a fast-moving field. "There is more information coming out all the time," Horch said. Last summer, the cricket genome became publicly available—a major milestone. Currently five transcriptomes also exist (two from Horch's lab). "But they're all piecemeal," Horch said, "and made separately by different groups."

Horch wants her students to take these transcriptomes, these "pots of data," and build them into a unified transcriptome that would provide a "more complete and full resource for everybody in the cricket research community." 

But to do this, her class requires a high-powered computer to put together massive amounts of information in a logical way. "This is where the bioinformatics knowledge MDIBL has is helpful, to help us think about dividing these long sequences into bite-sized chunks which will result in something that hangs together as a whole. We couldn't do it on our own for sure," Horch said.

Dj Merrill oversees the high-performance computing cluster for Bowdoin.

And this is where Bowdoin's HPC comes in. "The students will tell it to run these enormous jobs, to do something difficult and computationally complex, time-consuming, and resource-consuming, and they can do it through the command line on their little laptop," Horch said. 

Bowdoin began investing in its HPC over twelve years ago, originally to support faculty research in the departments of chemistry, biology, and physics. Since then, the capacity of the computers, which live in a facility on the nearby former navy base, has grown as demand has increased and technology improved. "We've made quite a big jump in processing power and memory," Merrill said.

Even as Bowdoin's scientists have been relying on the HPC more, faculty in fields beyond the hard sciences—such as digital and computational studies, environmental studies, economics, and government—have been more frequently taking advantage of Bowdoin's computing power.

Just in the past three or four years, a growing number of students have also begun accessing the HPC for independent research and class projects. To help connect students to the HPC, Merrill set up a more intuitive web interface, and Bowdoin also began supporting Jupyter, a free, open-source web tool with a graphical interface that makes it easier to run programs on the HPC.

"Students can go into Jupyterhub with their laptop, run R, Python, or whatever environment they want to use, and their project will run on the HPC in the back end," Merrill said. "That has opened up a lot of opportunities, and we have multiple classes each term using the Jupyterhub web interface and running HPC jobs."

With Merrill's support, and the support of MDIBL, Horch's students will complete their cricket transcriptome in just a few weeks. Then they'll enter phase two, which is annotating and curating the resource. At the same time, they'll be working in the "wet lab," where they'll try to determine which genes are being expressed, and how many, both when the cricket auditory organ is intact (in control animals) and when it has been injured.

"It is great to be ambitious, and to complete the story," Horch said. "They'll pick one gene to focus on, dissect tissue and do measurements on that gene to see if it is up- or down-regulated as we predicted. These two strands will come together nicely—the bioinformatics and the wet lab research—so students can see how you build this resource and how you play with it experimentally as well."