By Rob Mitchum // December 29, 2014
The back half of 2014 was filled with big announcements: partnerships with the White House and the City of Chicago, participation and leadership in some of the newest, biggest national scientific computing projects, a new home for cancer genomic data, and an upgrade for our pet Beagle. The CI hosted a hackathon, kicked off the second year of its Inside The Discovery Cloud Speaker Series, and toasted the second class of the Data Science for Social Good fellowship. Most importantly, across all the research centers of the CI, the stage was set for an even bigger 2015. Happy New Year!
A new graphic released by the UK’s Foreign and Commonwealth Office visualized the interrelated impacts of climate change upon the world, from several different perspectives. One portion of the poster-sized graphic focused on research from CI fellow Joshua Elliott, illustrating his research on how climate change will deplete global freshwater supplies, while others were based on research by the Agricultural Model Intercomparison and Improvement Project (AgMIP), an international collaboration including several RDCEP researchers.
RDCEP research was also included in an announcement from the White House’s Office of Science and Technology Policy, part of their massive Climate Data Initiative. Multiple RDCEP tools, including the climate emulator, FACE-IT, and the upcoming ATLAS, were highlighted as innovative tools for connecting the public and policymakers with results from large-scale climate and climate impact models.
The Data Science for Social Good fellowship caught the attention of WBEZ’s Chris Hagan, who featured the program in an episode of his podcast Tech Shift. DSSG was also highlighted by NPR’s Marketplace, VentureBeat, and New Scientist over the course of the summer.
In a corner of the Data Science for Social Good office, a group of Chicago high school students spent the summer working on their own creative data project — the construction of a data-powered, shoebox-sized device called the Looplamp. With instructors from RDCEP, Knowledge Lab, and Teamwork Englewood, the students learned about and used digital fabrication, programming, and data analysis methods to create their own custom designs of the “smart lamps,” producing Looplamps that change color and brightness in response to energy usage, bus tracking, and other data sources.
As one of the world’s leaders in meteorology and climate research, the National Center for Atmospheric Research (NCAR) both collects and generates a lot of data. To ease the sharing and management of these massive data stores, NCAR joined with Globus to create an “enhanced data service” that will fuel global collaboration. Earlier in the year, at the GlobusWorld conference, NCAR’s Pam Gillman contributed a video testimonial about the value of Globus to her organization.
Cloud computing is now an essential part of how people and businesses use the internet, though it remains a technology in its earliest stages. In pursuit of the next wave of cloud-based innovation, a team led by CI Senior Fellow Kate Keahey received $10 million from the National Science Foundation to launch Chameleon, a new testbed for large-scale cloud computing. The project is the heir to FutureGrid, a CI and Texas Advanced Computing Center effort that allowed researchers to develop and test new powerful computer systems.
At the end of August, the Data Science for Social Good fellowship wrapped up its 2014 program with a Data Fair before a packed house at Chicago’s technology hub, 1871. Successful projects included a new predictive model guiding city lead inspectors to the most hazardous homes, an early warning system for identifying struggling high school students in danger of dropping out, and a website for giving consumers more useful information from home “smart meters.” The program is currently accepting applications for fellows, mentors, and project partners who would like to join the 2015 program.
The Array of Things project, UrbanCCD’s ambitious effort to create a “fitness tracker for the city” with a network of interactive urban sensors, drew media attention all summer and fall in anticipation of its winter rollout. This month, AoT launched its website and was nominated for the Cooper Hewitt People’s Design Award, while Bloomberg TV and WIRED produced videos on the project and USA Today and Chicago Magazine wrote up its goals.
Another UrbanCCD project announced in September, Plenario, created a user-friendly online platform for accessing and using the kinds of citywide datasets that Array of Things will someday produce. By freeing open city data from static spreadsheets and allowing users to map and combine the information they hold, project leaders Charlie Catlett and Brett Goldstein hope that Plenario will kick off the next wave of urban data analysis and app development.
The current scientific citation system is poorly equipped to distribute credit among both co-authors and the tools and software they used in their research. So CI Senior Fellow Daniel S. Katz and GitHub’s Arfon Smith proposed a new system for assigning scientific credit where credit’s due: transitive credit. By attaching percentage contributions (using JSON-LD) to authors, citations, and “contriponents,” the system imagines a new, more easily quantifiable measure of the impact of a scientist, or of the scientist’s code.
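To make the idea concrete, here is a minimal Python sketch of how fractional credit might flow through a chain of products. It is an illustration only, not the authors' implementation: the product names, the percentages, and the `transitive_credit` helper are all hypothetical, plain dictionaries stand in for the JSON-LD records, and the sketch assumes the dependency graph has no cycles.

```python
from collections import defaultdict

# Hypothetical credit maps: each product assigns fractional credit
# (summing to 1.0) to the people and other products it relied on.
credit_maps = {
    "paper": {"alice": 0.5, "bob": 0.3, "toolkit": 0.2},
    "toolkit": {"carol": 0.75, "library": 0.25},
    "library": {"dave": 1.0},
}


def transitive_credit(product, maps, weight=1.0, totals=None):
    """Accumulate the credit that `product` passes to every contributor.

    Credit given to another product is also passed on, scaled by that
    product's own credit map. Assumes the graph of products is acyclic.
    """
    if totals is None:
        totals = defaultdict(float)
    for contributor, share in maps[product].items():
        passed = weight * share
        totals[contributor] += passed
        if contributor in maps:  # contributor is itself a product
            transitive_credit(contributor, maps, passed, totals)
    return dict(totals)


paper_credit = transitive_credit("paper", credit_maps)
for name, value in sorted(paper_credit.items()):
    print(f"{name}: {value:.2f}")
```

In this toy example, the toolkit receives 20% of the paper's credit directly, and the toolkit's own contributors receive a proportional slice of that 20%, which is the sense in which the credit is transitive.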
How does scientific jargon create barriers between research fields? Authors from CI’s Knowledge Lab and the Metaknowledge Research Network explored this question, using citations and text analysis to create a “map” of science, with disciplines that share language connected on the same land mass and highly jargonized fields balkanized into archipelagos of islands.
As both wealthy and developing countries rush to build larger and larger urban areas, the classical tools of architecture and city planning increasingly come up short. An UrbanCCD collaboration with the architects and developers of the 600-acre Chicago Lakeside Development on the city’s south side created a new platform, called LakeSim, that helps developers rapidly test out the energy, transportation, and wastewater demands of different designs. LakeSim was featured by Argonne this month, and back in March, LakeSim co-investigator Leah Guzowski spoke at the CI about the energy models embedded in the platform to help designers predict resource demand.
One way to weaken the obstructions created by jargon is to create thesauruses that capture the different names for shared concepts between fields. But an analysis by the CI’s James Evans and Andrey Rzhetsky found that current such references in biology and medicine miss more than 90% of these synonyms. The authors propose that a partnership of algorithms and user input — an effort on the scale of the Human Genome Project — could create a better scientific thesaurus to find previously undiscovered connections and insights.
The value of collaborations within and across national borders was quantified by the laboratory of CI fellow and faculty Stefano Allesina. By analyzing and calculating the scientific impact of over 1 million papers, graduate student Matthew Smith found that journal articles with authors from multiple countries out-performed papers with only a single country of origin. The study also found that the established scientific centers of the US and UK are losing ground to growing scientific communities in China, India, and South Korea.
When it comes to finding rare genetic variants associated with disease, putting multiple sets of eyes on the hunt can only help. In a collaboration between the Globus Genomics team and the laboratory of CI Senior Fellow Nancy Cox, researchers developed a new algorithm for finding these variants, a consensus caller that applies four different algorithms to the same sequence, then “holds a vote” for the most likely candidates.
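The voting step can be pictured with a short Python sketch. This is a simplified illustration and not the Globus Genomics pipeline: the four call sets, the variant tuples, the `consensus_calls` helper, and the three-vote threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical output of four variant callers run on the same sequence.
# Each variant is a (chromosome, position, reference, alternate) tuple.
caller_a = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
caller_b = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}
caller_c = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
caller_d = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}


def consensus_calls(call_sets, min_votes=3):
    """Keep variants reported by at least `min_votes` callers.

    Each caller gets one vote per variant; the threshold is an assumed
    parameter, not a value taken from the published method.
    """
    votes = Counter(variant for calls in call_sets for variant in set(calls))
    return sorted(variant for variant, count in votes.items() if count >= min_votes)


consensus = consensus_calls([caller_a, caller_b, caller_c, caller_d])
for variant in consensus:
    print(variant)
```

Here the variant on chr1 is reported by all four callers and the one on chr2 by three, so both pass the vote, while the chr3 variant, seen by only two callers, is dropped.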
An all-night hackathon organized by Hack@UChicago and hosted by the CI brought dozens of students to late-night seminars about visualization, mapping, agent-based modeling, and working with climate and urban data. The 24-hour competition also yielded impressive projects that visualize hospital quality data, map crime around the University of Chicago campus, or even generate automated rap lyrics.
The CI was a major participant in the annual Supercomputing conference, held this year in New Orleans. CI personnel including Director Ian Foster, Senior Fellows Rick Stevens and Michael Wilde, and fellow Pavan Balaji presented subjects ranging from exascale computing to parallel scripting for supercomputers to mapping the protein universe. A Globus GridFTP demonstration at the conference by Raj Kettimuthu moved 65 terabytes of data in under two hours.
CommunityRx is a unique health care project, led by University of Chicago Medicine’s Stacy Tessler Lindau, that gives patients personalized information about healthy resources in their local community. Partnering with CI Senior Fellows Jonathan Ozik and Charles Macal, Lindau is now using agent-based modeling to estimate the full impact of the intervention, studying how information travels through families and neighborhoods.
The Knowledge Lab completed its first major call for proposals, distributing $1.4 million to 15 projects about the creation and spread of scientific discovery. The projects will use scholarly data to investigate mentorship, technical evolution, peer review, citation networks, and other dimensions of the systems underlying science.
Since 2011, the Beagle supercomputer has brought biomedical researchers at the University of Chicago up to the terascale, enabling new discoveries on the vanguard of genomics, protein biochemistry, and image analysis. A new $2 million grant from the National Institutes of Health will allow Beagle to evolve into its second phase of life, with an upgrade to 250 teraflops supporting a deep new menu of research in biology and medicine.
Funded by the National Science Foundation, the new cloud-computing infrastructure Jetstream hopes to become an “on-ramp” for scientists new to high-performance computing. To ease this merge, the $6.6 million project will use Globus services for data management and publication, ensuring that researchers will be able to move their data to and from HPC resources and share the findings when their study is finished.
An explosion of cancer genomic data has made it difficult for researchers to understand the disease and develop new treatments against it. The Genomic Data Commons, a new National Cancer Institute effort led by CI Senior Fellow and Faculty Robert Grossman, seeks to unify and organize this spread-out data, creating a central resource for medical records and genomic data that can inform both research and patient care.
An Argonne feature described how CI’s Swift language for parallel computing is deployed at the laboratory’s Advanced Photon Source, helping researchers rapidly analyze their collected data from the resource and make adjustments on the fly.
RDCEP once again joined forces with the White House’s Climate Data Initiative, committing to improving the resolution of national data on irrigation to help fuel more complex models of the interplay between agriculture, climate, and hydrology.
The motto for the 2014-15 edition of our Inside the Discovery Cloud Speaker Series is “Catalyzing Collaboration,” highlighting the multi-disciplinary partnerships that the CI facilitates in biology, the humanities, urban research, and climatology. The first event focused on “Genomic Analysis in the Cloud,” and the collaboration between Globus Genomics and the laboratory of CI Senior Fellow Nancy Cox, who have developed new, cloud-based methods for analyzing sequencing results. You can watch videos of talks from Cox and from Paul Dave and Alex Rodriguez of Globus Genomics.