A New Way to Publish and Discover Data

By Rob Mitchum // November 4, 2015

The traditional scientific journal format is running up against its limits as science grows more data-intensive. Common practice has long been to publish only summarized results of the described experiments, leaving the raw data undisclosed except at the discretion of the researcher. Even if journals wanted to publish the data, the technical infrastructure needed to do so is far beyond their capacity or expertise as datasets grow ever larger. But as open data and global collaboration become more common drivers of science, the demand for data publication grows.

Last week, CI research center Globus announced their solution to this vexing data problem with the trial launch of Globus data publication. Building upon their software-as-a-service model for data transfer and sharing, the new publication services make it easier for researchers to identify, describe, curate, verify, and preserve their data at the appropriate levels of durability. By storing metadata in the cloud — while the actual dataset is stored on campus or other local resources — Globus data publication also makes data more discoverable, so that researchers can more easily find relevant data from other laboratories and communities.

For instance, a scientist studying molecular engineering may seek simulation data on a particular material of interest. By searching Globus data publication, they would find the Nanoscale Materials collection at Argonne National Laboratory, one of the institutions that helped test the new service. If the data is marked public, the scientist can then browse or download it (using Globus) and incorporate it into their own research, saving valuable time and money.

Published data can also be marked private, accessible only to approved users such as the participants in a distributed collaboration. Each community can establish custom workflows for curating and approving datasets, controlling what data is published and how it is described and accessed. If a research group decides to make its data public upon the release of a journal article describing the experiments and results, it can create a DOI link that directs readers of the article to the location of the data for further inspection or download.

Though Globus data publication has only just become available to all Globus subscribers, it has been under trial at several volunteer institutions for months. Earlier this year, CI Director Ian Foster announced the service and demonstrated it with CI Deputy Director Steve Tuecke at Globusworld; you can watch the demo at the link below. For more information on the framework and functionality of Globus data publication, or to try out a “sandbox” version of the service, visit their website. The Globus team will also have a booth at the SC15 conference in Austin, November 16-19.