By Rob Mitchum // June 21, 2016
The traditional science research article doesn’t perform many tricks. When journals made the leap from paper to web, they largely settled on the static format of the PDF, a format that offers easy printing and little else. In the meantime, research across the spectrum of science has grown more data-intensive and computational, and programmers have developed exciting new ways to document, publish, share, and collaborate on projects. The ingredients are all there for a new kind of living, dynamic scientific publication.
Whole Tale, a new project from several partners including the Computation Institute, seeks to combine those ingredients and reinvent how scientific knowledge is created, shared, and discovered. While there have been many recent efforts to give journal articles a modern makeover, Whole Tale will pursue its goals not by creating an entirely new format from scratch, but by stitching together existing tools and adapting them to work for all forms of science.
The name Whole Tale simultaneously puns on two of the project’s missions: telling the “whole story” of a publication, and exhuming data from the “long tail” of science. The project recently received $5 million from the National Science Foundation’s Data Infrastructure Building Blocks program.
“The idea is really about rethinking the scientific publication process,” said CI Fellow Kyle Chard, one of the principal investigators on the Whole Tale team. “It’s about reducing barriers and integrating pieces we already have, making data easier to publish, find, and use, and capturing the research process to make computational science more reproducible.”
Whole Tale will integrate several tools that are already well-supported and popular, such as Jupyter (formerly IPython) notebooks for documenting analysis steps, Globus services for moving, sharing, and publishing datasets, the D3 library for data visualization, and many more. While a particularly savvy researcher may already be using many of these tools in their research, Whole Tale’s vision is to integrate them seamlessly into a single environment.
For example, an astrophysicist studying dark matter could log into their Whole Tale environment, open a Jupyter notebook, and access petabytes of simulation and experimental data from the Dark Energy Survey. Because these datasets live in distributed repositories in the cloud, the researcher doesn’t need to download them to a local machine and can conduct all analyses remotely as well. Once the work is finished, the researcher can use Whole Tale to publish the workflow (including any code or software used), the raw data, and the results, all with unique digital object identifiers (DOIs) for easier reference and discovery by other scientists.
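To give a flavor of that scenario, a notebook cell in such an environment might look something like the sketch below. The catalog URL, column name, and magnitude cut are purely illustrative placeholders, not a real Dark Energy Survey or Whole Tale interface.

```python
# Illustrative sketch only: the URL and column names below are hypothetical,
# not an actual Dark Energy Survey or Whole Tale data service.
import pandas as pd

# Read a (hypothetical) remotely hosted object catalog without downloading it
# by hand first; pandas streams the file over HTTP.
catalog = pd.read_csv("https://example.org/des/object_catalog.csv")

# Apply a simple magnitude cut and summarize the selected objects.
candidates = catalog[catalog["mag_r"] < 22.5]
print(candidates.describe())
```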
“The focus is not to build new things, but to integrate wherever possible,” Chard said. “We’re building an ecosystem, where we’re the glue joining the tools that allow researchers to use their front-end environment to access, analyze, and publish data in a standard way.”
A key piece of that puzzle will be the CI’s Globus, which will adapt many of its data management services to fit the Whole Tale platform. Data transfer and sharing will be integrated through the 40,000 (and counting) Globus endpoints worldwide, allowing scientists to easily access data from collaborators or public repositories. Globus Publication will help researchers publish and discover collections of data, and Globus authentication services will allow academics to use their institutional accounts to log in to Whole Tale.
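For readers curious what that integration looks like in code, the Globus Python SDK already exposes these transfer operations programmatically. The minimal sketch below moves a single file between two endpoints; the endpoint IDs, file paths, and access token are placeholders, and how Whole Tale itself will wrap these calls is not yet defined.

```python
# A minimal sketch using the Globus Python SDK (globus_sdk). Endpoint IDs,
# paths, and the access token are placeholder values, not real ones.
import globus_sdk

# Authenticate with an existing OAuth2 access token obtained via a Globus Auth flow.
authorizer = globus_sdk.AccessTokenAuthorizer("TRANSFER_ACCESS_TOKEN")
tc = globus_sdk.TransferClient(authorizer=authorizer)

# Describe a transfer from a collaborator's endpoint to the researcher's own.
task = globus_sdk.TransferData(
    tc,
    source_endpoint="SOURCE-ENDPOINT-UUID",
    destination_endpoint="DEST-ENDPOINT-UUID",
    label="Whole Tale example transfer",
)
task.add_item("/published/simulation/output.hdf5", "/scratch/output.hdf5")

# Submit the transfer; Globus manages the movement and retries asynchronously.
result = tc.submit_transfer(task)
print("Task ID:", result["task_id"])
```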
Like many other attempts to bring research articles into the 21st century, Whole Tale will ultimately succeed only if it gains uptake within the scientific community. In addition to outreach efforts with major journal publishers, the Whole Tale collaboration includes representatives from several research fields, including astrophysics, materials science, biology, earth sciences, disaster recovery, archeology, and social sciences. These “embedded” science teams will co-develop and “field test” Whole Tale to make sure it works across disciplines.
“Scientific fields are so different in terms of types of data and types of analyses, so for us to be successful, we need interest and personal investment from a broad range of scientists and research groups who come on board and really buy into the vision,” Chard said. “Establishing communities around our work is really important.”
Whole Tale is co-led by Matt Turk, Bertram Ludäscher, and Victoria Stodden, of the University of Illinois; Niall Gaffney of the Texas Advanced Computing Center; Matt Jones from the National Center for Ecological Analysis and Synthesis at University of California, Santa Barbara; and Jarek Nabrzyski of the Center for Research Computing at the University of Notre Dame.