By Rob Mitchum // November 24, 2014
For three years, the Beagle supercomputer has driven University of Chicago biology and medical research into new computational territories, fueling groundbreaking research in genomics, drug design, and personalized medicine. Now, with a $2 million grant from the National Institutes of Health, UChicago’s high-performance computing resource for biomedical research is ready for an upgrade that will enable the next wave of pioneering discoveries.
Beagle, operated through a partnership between the University of Chicago Biological Sciences Division and the Computation Institute, is a powerful Cray XE6 supercomputer unique for its focus on the newer computational fields of biology and medicine. Unlike national supercomputing resources where biomedical scientists must compete with physics, astronomy, and climate research for limited compute time, Beagle primarily supports advanced analysis and modeling for the life sciences.
“By providing biologists access to powerful, world-class computing, Beagle and its expert crew provide the foundation for a new community of computational biomedicine at the University of Chicago, resulting in innovative research and influential publications,” said Conrad Gilliam, Dean of Basic Science in the Biological Sciences Division at the University of Chicago. “Beagle-2 will further establish UChicago as a leader in applying computational techniques to the most important questions in biology and medicine today.”
The upgrade will expand Beagle, named for the ship that carried Charles Darwin on his pivotal expedition, from 18,000 to 24,000 computing cores, and raise peak performance from 150 to 250 teraflops, or 250 trillion calculations per second. Improvements in RAM, disk storage, and high-speed interconnect will help researchers perform larger and faster tasks, including analyses of whole genomes and brain activity, molecular simulations, and large-scale medical image analysis.
“The Computation Institute and the University of Chicago have pioneered the use of leading edge supercomputing systems for Next Generation Sequencing (NGS),” said Barry Bolding, Cray’s vice president of marketing and business development. “Cray is very excited to expand our collaboration with the University of Chicago to dramatically accelerate NGS workflows, and deliver the resulting benefits to researchers and medical practitioners.”
A more powerful Beagle-2 will also continue fertilizing computational biology and medicine in Chicago, offering research and educational opportunities rare for life sciences. In Beagle’s first iteration, more than 90 projects and 300 users used the supercomputer for work that would be challenging or impossible on smaller computing resources. A dedicated user support team from the Computation Institute — including Lorenzo Pesce, Ana Marija Sokovic, Michael Wilde, Paul Davé, and Joe Urbanski — helped researchers optimize code and scale up their work to take full advantage of the supercomputing resource. Many of those projects will continue on Beagle-2, exploring new frontiers in oncology, neurobiology, genomics, and radiology.
New Pipelines for Cancer Genomics
As the price tag for genetic sequencing drops and scientists routinely collect whole genomes, the bottleneck for translating genetic information into clinical applications shifts to analysis. The raw data for a whole human genome is roughly 200 gigabytes, and large sequencing projects, such as The Cancer Genome Atlas (TCGA), hope to analyze thousands of genomes to find new medical insights.
The laboratory of Kevin White, James and Karen Frank Professor of Human Genetics and Director of the Institute for Genomics and Systems Biology at the University of Chicago, has used Beagle to streamline and standardize these large analyses. Graduate student Jason Pitt and CI computational scientist Lorenzo Pesce constructed a computational pipeline called SwiftSeq, which uses the Swift parallel programming language to accelerate analysis.
Using SwiftSeq, Pitt has so far analyzed over 7,000 exomes from the TCGA and autism research, as well as whole genomes from breast cancer studies, creating an unprecedented catalog of uniformly analyzed results. With this new data, scientists can now compare variants associated with different cancer or disease types without concerns that the data was processed at different sites with different methods.
“Beagle allows us to ask questions that previously we would have had to discard. We just wouldn’t be able to operate on this scale,” Pitt said. “It’s such a powerful tool that now we’ve tailored everything we’re doing computationally in the lab to the fact that we have it there for us.”
With Beagle-2, the additional computing cores and disk space will allow Pitt to further expand his pipeline to analyze even more genomes, and will also boost other White lab projects searching for fusion genes and regulatory elements linked to various forms of cancer. The access to an elite in-house supercomputing resource establishes UChicago as a leader in genomic cancer research, White said.
“Beagle has allowed us to scale up our computations and optimized workflows on genome data, leading to better quality results and several exciting new discoveries including the identification of novel fusion genes and inherited risk factors in cancer,” White said. “Coupled with our use of the Bionimbus Protected Data Cloud system developed in the Institute for Genomics and Systems Biology, Beagle provides us with one of the most advanced platforms for genome analysis in the world.”
A Zoom Lens for Drugs and Proteins
In 2001, the drug Gleevec became one of the first targeted cancer therapies approved by the FDA, the fruition of a research thread that began in the 1970s with UChicago scientist Janet Rowley’s discovery of the gene fusion underlying a form of leukemia. Gleevec, also known as imatinib, works by inhibiting a protein called Abl tyrosine kinase that is overactive in some forms of cancer. But not all types of tyrosine kinases are inhibited by the drug, a mystery that slows the design of new, more effective drugs.
To study this important drug-protein interaction and gain insight into how Gleevec functions, University of Chicago scientists used Beagle to conduct one of the most difficult computations based on atomic models ever performed. Using the computational method of molecular dynamics, the laboratory of Benoît Roux, Professor of Biochemistry and Molecular Biophysics, closely compared how Gleevec binds to and affects Abl and a different tyrosine kinase called Src. The results offer a glimpse of the molecular basis of binding specificity that would be impossible to view experimentally, and suggest potential strategies for designing new drugs and preventing drug resistance.
“In the computer, you run an approximation of reality, but you have access to all the details. You’re basically like a perfect observer,” Roux said. “You can measure anything you want, or calculate anything you want, compare to experiments, and draw conclusions. This kind of study shows the power of these atomic models where you can really help clarify the hidden factors controlling experimental observations. Otherwise, it’s just left for speculation.”
More generally, access to Beagle-2 gives the Roux lab a computational testing ground for ideas more exploratory than those typically approved by national high-performance computing resources. Preliminary work studying ion channels, the sodium/potassium pump, and other essential cellular proteins on the original Beagle led to prestigious publications and successful grant proposals.
“Sometimes you want to ask different kinds of questions, and a computer like Beagle-2 is much more amenable to try out new strategies,” Roux said. “We’re doing things that are much more ambitious than in the past, and Beagle-2 is a good platform for that.”
Reading the Thousand Words in a Medical Image
Medical imaging is a key tool for cancer detection and treatment, allowing radiologists to catch tumors early, assess disease severity, and monitor the response to therapy. Maryellen Giger, A.N. Pritzker Professor of Radiology at University of Chicago Medicine and a pioneer of computer-aided diagnosis for cancer, believes that medical images, such as breast MRIs, contain far more information than is currently used in clinical interpretation.
“We’re interested in the relationships between quantitatively-extracted image phenotypes from images and the biology of cancer,” Giger said. “We can then potentially use such information along with other clinical, pathology, and genomics data to predict or enable personalized screening and therapy for patients.”
Giger’s research involves “image-omics” — computationally extracting predictive markers from medical images and building models that use images (alongside genetic and traditional radiologic methods) to more specifically classify tumors and guide treatment. Giger’s team can use Beagle-2 for “rapid, high-throughput image-based phenotyping,” mining thousands of images and using machine learning techniques to find features capable of differentiating between invasive and non-invasive cancers, as well as other cancer subtypes based on molecular classifications.
With Beagle-2, the researchers hope to train and develop image analysis algorithms that can be exported to less powerful machines, providing advanced patient-level diagnosis in the common clinic.
“The public might look at this work and say, ‘You need a supercomputer in every doctor’s office?’,” Giger said. “But here’s a nice example where the big data training is conducted on a supercomputer, and once the model is trained you can run it on a laptop.”