By Rob Mitchum // March 9, 2017
When Globus Genomics launched five years ago, biologists were just getting used to the idea of being a “big data” science. At that time, the rapidly falling costs of next-generation sequencing suddenly made large-scale genetics more accessible to life scientists. However, these new methods also brought new challenges, as researchers used to working with small datasets on their desktop computers were faced for the first time with the kind of hard-drive-flooding data streams more commonly seen by physicists and astronomers.
Today, with the help of tools such as Globus Genomics, biologists are much more comfortable keeping up with data demands. Combining the cloud-based data management services of Globus with the Galaxy bioinformatics platform, Globus Genomics made it easier for scientists, laboratories, and research groups to move, share, and analyze their sudden data boom. But the field isn’t sitting still: it is moving from the partial sequencing of protein-coding exomes to whole-genome sequencing (WGS), which produces even more data and more analytics hurdles.
To keep up with this fast-moving field, Globus Genomics is moving into its second phase, with a $1.58M award from the National Institutes of Health. New features will address suggestions by the original users of the platform and provide additional services that improve accessibility and expand its user base. While the central mission — using cloud computing to manage and analyze genomic data — remains the same, the next chapter of GG integrates the latest developments in both genetics and computer science.
“The science in understanding the genomic or genetic basis of diseases is constantly and rapidly evolving,” said CI Senior Fellow Ravi Madduri, who leads the project with CI Senior Fellow Ian Foster. “One of the things that we pride ourselves on is taking the great new analytic pipelines that research groups develop, making them computationally efficient, and running them at scale on Amazon.”
As more laboratories turn to whole-genome sequencing, they’ll need even more powerful tools to handle the terabytes of data these methods produce. While most WGS data analysis currently requires access to a large cluster or supercomputer, Globus Genomics will help researchers build temporary, on-demand high-performance computers with Amazon Web Services to handle these weighty tasks.
For both large and small datasets, an increasingly popular tool for analytics is the Docker “container,” which makes it easier to run software applications in the cloud. For each of the more than 1,500 tools currently available through Globus Genomics, the research team has created containers that use cloud resources more efficiently and allow for more scheduling flexibility, which can boost performance and reduce costs.
In addition, the GG team (which also includes Paul Davé, Dinanath Sulakhe, Alexis Rodriguez and Yukai Xiao) created computational profiles for these containers, which help automate the complicated process of choosing the right Amazon cloud resource for a given analysis job. After users select the tools they’d like to use, the platform will determine the best options for performance, cost, or a combination of the two. Users can also package jobs with other GG users, creating additional savings.
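To make the idea of choosing a resource by performance, cost, or a blend of the two more concrete, here is a minimal Python sketch. It is not the Globus Genomics implementation: the option names, prices, and runtimes are invented for illustration, and the weighted score is just one simple way such a trade-off could be expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str           # hypothetical label, not a real AWS instance type
    hourly_cost: float  # assumed price in dollars per hour
    est_hours: float    # assumed runtime for the job, as a profile might estimate

    @property
    def total_cost(self) -> float:
        return self.hourly_cost * self.est_hours


def choose_option(options, cost_weight=0.5):
    """Return the option with the lowest weighted score.

    cost_weight=1.0 optimizes for cost only, 0.0 for runtime only.
    Cost and runtime are each divided by the best value seen across
    the options so the two terms are on a comparable scale.
    """
    if not options:
        raise ValueError("at least one option is required")
    if not 0.0 <= cost_weight <= 1.0:
        raise ValueError("cost_weight must be between 0 and 1")

    min_cost = min(o.total_cost for o in options)
    min_hours = min(o.est_hours for o in options)

    def score(o):
        return (cost_weight * o.total_cost / min_cost
                + (1.0 - cost_weight) * o.est_hours / min_hours)

    return min(options, key=score)


# Made-up numbers: bigger machines finish sooner but cost more overall.
options = [
    InstanceOption("small", 0.10, 10.0),
    InstanceOption("medium", 0.40, 4.0),
    InstanceOption("large", 1.60, 2.0),
]

cheapest = choose_option(options, cost_weight=1.0)
fastest = choose_option(options, cost_weight=0.0)
balanced = choose_option(options, cost_weight=0.5)
```

With these invented numbers, a cost-only preference picks the small option, a speed-only preference picks the large one, and an even blend lands on the medium one.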
“We can pack in a lot of compute by packaging a huge machine with a lot of small jobs,” Madduri said. “This drives efficiency much higher and lowers the cost. It’s kind of like the Costco of genomic analysis — the more you do, the cheaper it gets, and we can pass savings on to the users.”
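The packing Madduri describes resembles the classic bin-packing problem. The sketch below uses the well-known first-fit decreasing heuristic to place small jobs onto as few large machines as it can. This is an illustrative assumption, not the scheduler Globus Genomics actually uses, and the core counts are made up.

```python
def pack_jobs(job_cores, machine_cores):
    """Pack jobs onto machines with first-fit decreasing.

    job_cores: list of CPU cores each job needs (hypothetical values).
    machine_cores: cores available on each identical machine.
    Returns a list of machines, each a list of the job sizes placed on it.
    """
    machines = []  # each entry is [free_cores, [job sizes]]
    for cores in sorted(job_cores, reverse=True):
        if cores > machine_cores:
            raise ValueError(f"job needing {cores} cores does not fit on any machine")
        for machine in machines:
            # Place the job on the first machine with enough free cores.
            if machine[0] >= cores:
                machine[0] -= cores
                machine[1].append(cores)
                break
        else:
            # No existing machine has room, so start a new one.
            machines.append([machine_cores - cores, [cores]])
    return [jobs for _, jobs in machines]


# Eight jobs totalling 48 cores, packed onto 32-core machines.
packed = pack_jobs([4, 8, 2, 2, 16, 8, 4, 4], machine_cores=32)
```

Here the eight jobs fit on two machines rather than eight, which is the kind of consolidation that lets the per-job cost fall as more work is submitted together.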
While Globus Genomics hopes to expand to serve 100 research groups by the end of phase two, Madduri credited the project’s original research partners with valuable assistance in getting it off the ground. In particular, he highlighted a collaboration with CI Senior Fellow Funmi Olopade, whose laboratory built a pipeline to analyze genomic data from Nigerian women at elevated risk of breast cancer.
“It helps them hone in on the drugs and therapies that can be developed in order to provide better treatment options than what exists today,” Madduri said. “That’s the kind of project that makes me feel we’re doing something to make the world a better place, that makes you feel like part of something much bigger than what you are.”
(Image: Arthrobacter arilaitensis Re117 genome atlas. Monnet C, Loux V, Gibrat J-F, Spinnler E, Barbe V, et al. (2010) The Arthrobacter arilaitensis Re117 Genome Sequence Reveals Its Genetic Adaptation to the Surface of Cheese. PLoS ONE 5(11): e15489.)