By Rob Mitchum // December 11, 2013
The annual re:Invent conference put on by Amazon Web Services (AWS) gathers people using cloud computing for businesses and information technology. At this year’s conference, held in November in Las Vegas, Ian Foster and Ravi Madduri of the Computation Institute presented their vision for expanding the cloud to another domain: scientific research. In their combined talk, “Globus Genomics: How Science-as-a-Service is Accelerating Discovery,” the pair discussed the computational challenges small science and clinical laboratories face in the era of big data, and how the range of AWS-assisted services developed by the Globus team has started to clear those obstacles and put scientists back on the fast track to discovery.
Across many fields, new technology has dramatically improved the collection of data, but it has shifted the bottleneck to the management, sharing and analysis of that information. Foster highlighted the rapidly declining cost of genome sequencing, a drop that now far outpaces the advances in computing speed and power described by Moore’s Law. For small laboratories made up of a principal investigator and a handful of students or post-docs, this accumulating mountain of data creates IT problems that are too expensive and time-consuming to solve on their own.
“How are we going to allow these people to continue to make progress, to innovate in genome science, material science, environmental science, in an era of big data?” asked Foster, the director of the Computation Institute.
Foster’s answer is to build a “discovery cloud,” a toolbox of scientific services that automate and outsource the tedious or expensive tasks that slow the pace of science. The CI’s Globus team has developed a number of tools to realize this vision, including Globus Online for moving, sharing and syncing data across institutions and global collaborations, and Globus Nexus for handling the identity management issues that plague large collaborations. Many of these services use AWS for hosting, elastic load balancing and data storage.
A recent expansion of the Globus vision is Globus Genomics, a platform designed specifically to help researchers with the transfer and analysis of large genetic datasets. Madduri, a CI fellow and project manager at Argonne National Laboratory, discussed how Globus Genomics improves upon the inefficient “FedEx or FTP” model of moving terabytes of data from sequencing centers to laboratories, or between collaborators. The platform also incorporates the open-source Galaxy genomic analysis software and the elastic computational power of Amazon Web Services to enable small labs to conduct big genomics projects — for rates as low as $5 for an exome and $20 for a whole genome.
“We’re trying to make it really easy for researchers to analyze large amounts of sequence data without having to build their own IT structure,” Madduri said. “Researchers who are working on finding cures for diseases like cancer or type 2 diabetes no longer have to worry about the cost of analyzing large amounts of sequence data they have been acquiring…and can accelerate the rate at which they find actionable biomarkers for diseases.”
Madduri also previewed similar platforms for climate science, proteomics and materials research, which are expected to launch soon. The slides are available online, and the full talk can be watched on the Amazon Web Services YouTube channel.