The Data Lifecycle

From Creation through Curation to Preservation

The OCHRE Data Service supports the archival needs of a research project by implementing a comprehensive strategy of capturing project data and seeing it through to long-term preservation, making such data available along the way for analysis and publication.

Working closely with the Digital Library Development Center (DLDC) at the University of Chicago Library, the OCHRE Data Service uses OCHRE to package project data and dispatch it to an archive that is created, secured, and maintained in the short term and sustained over the long term. Ultimately, OCHRE provides the tools to make itself redundant; that is, the preservation strategy assumes that OCHRE will not be needed to access archived information.

General Archival Strategy

An OCHRE project’s archiving strategy follows the standards recommended by the Open Archival Information System (OAIS) Reference Model for digital preservation practice. In the terminology of the OAIS Reference Model, OCHRE provides the means for creating (as the producer) the submission information packages (SIPs) to be delivered to archival storage (in the guise of the DLDC), where archival information packages (AIPs) will be prepared, ingested, and made accessible as needed.

Practically speaking, this involves a number of steps:

  • Ensuring that all data to be archived is within OCHRE, or within the reach of OCHRE; that is, consolidating all relevant data, including potentially vast collections of digital images, in such a way that all necessary files are accessible for describing and packaging.
  • Creating de-normalized forms of the highly granular data within OCHRE so that it can be independently understandable (to use OAIS terminology); that is, OCHRE delivers data in a format that can stand alone, outside of OCHRE.
  • Deciding what constitutes preservation-worthy content, and what should be its preservation description information (PDI).
  • Defining the structure and format of a submission information package; this represents the collection of information about a unit of preservation, including relevant metadata, that will be created and delivered to the archive.
  • Using tools in OCHRE to extract the necessary content and produce a collection of external data files representing information packages ready for submission to an archive.
  • Preparing the archive to accept and ingest information packages. This is done independently of OCHRE so that the formatted and fully described information has a long-term home apart from OCHRE.
  • Determining what content should be available to consumers from the archive and by what access methods. This is a decision to be made by the content managers and in cooperation with those developing finding aids to provide access to the archive.
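
The de-normalizing and packaging steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OCHRE's actual export tooling: the record layout, the `denormalize` and `build_sip` functions, the file names, and the use of SHA-256 checksums as fixity information are all hypothetical choices made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical, highly normalized records: items point at other records by ID.
RECORDS = {
    "period-1": {"label": "Iron Age II"},
    "person-7": {"label": "J. Smith"},
    "item-42": {"title": "Storage jar", "period": "period-1", "creator": "person-7"},
}


def denormalize(item_id, records):
    """Copy an item, replacing referenced record IDs with the labels they point to."""
    flat = {"id": item_id}
    for key, value in records[item_id].items():
        ref = records.get(value) if isinstance(value, str) else None
        flat[key] = ref["label"] if ref else value
    return flat


def build_sip(item_ids, records, out_dir):
    """Write one stand-alone JSON file per item plus a checksum manifest."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for item_id in item_ids:
        record = denormalize(item_id, records)
        data = json.dumps(record, indent=2, sort_keys=True).encode("utf-8")
        path = out_dir / f"{item_id}.json"
        path.write_bytes(data)
        # Fixity information, assumed here to form part of the package's PDI.
        manifest[path.name] = hashlib.sha256(data).hexdigest()
    manifest_text = json.dumps(manifest, indent=2)
    (out_dir / "manifest.json").write_text(manifest_text, encoding="utf-8")
    return manifest


# Example: package a single item into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    print(sorted(build_sip(["item-42"], RECORDS, tmp)))
```

Each file written this way can be read without the database it came from, which is the sense in which the data is "independently understandable"; a real package would carry far richer metadata than this toy manifest.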

The Europeana Data Model

OCHRE follows the data representation standard described by the Europeana Data Model (EDM) as a primary archival format. The EDM is a rich and flexible data representation standard that is gaining wide support, initially in Europe, and particularly among institutions and systems dealing with cultural heritage data, including the Digital Public Library of America (DPLA), Galleries, Libraries, Archives, and Museums (GLAM) partners, and the Library of Congress (through BIBFRAME). The EDM borrows widely from other common standards like the Dublin Core (DC) metadata standard and Simple Knowledge Organization System (SKOS) concepts, and supports the creation of linked open data for Web-based publication. Given our roots in managing cultural and historical information, this standard is particularly appropriate for data that originates in OCHRE.

OCHRE is based on a generic, “upper” ontology that is fully compatible with the Europeana Data Model. The object-centric approach employed by the EDM, which is based on cultural heritage objects (CHOs), is well matched by OCHRE’s underlying item-based data model. The EDM entities of Agent, Event, Place, and TimeSpan correspond directly to OCHRE’s Persons, Events, Locations, and Periods respectively. The SKOS Concept entities borrowed by the EDM map fully to OCHRE’s Concepts. Alternate labels for items (skos:altLabel) are managed by OCHRE’s Alias mechanism. Multilingual support is built into OCHRE’s character string management and so maps easily to the xml:lang attribute or to the RDF language notation (e.g. “…”@fr) recommended by the EDM.
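
As a rough illustration of these correspondences, the Python sketch below maps OCHRE category names to EDM and SKOS classes and writes labels using the RDF language notation. The dictionary restates the mapping given in the text; the `describe` and `turtle_literal` helpers, the example IRI, and the sample labels are hypothetical. Aliases are emitted as skos:altLabel, the SKOS property for alternative labels.

```python
# Category-to-class correspondences as stated in the text.
OCHRE_TO_EDM = {
    "Person": "edm:Agent",
    "Event": "edm:Event",
    "Location": "edm:Place",
    "Period": "edm:TimeSpan",
    "Concept": "skos:Concept",
}

# Standard namespace declarations for the prefixes used below.
PREFIXES = (
    "@prefix edm: <http://www.europeana.eu/schemas/edm/> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def turtle_literal(text, lang):
    """Return a language-tagged Turtle literal such as "Égypte"@fr."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def describe(iri, category, pref_labels, aliases=()):
    """Build one Turtle statement for an item.

    pref_labels maps a language code to the preferred label in that language;
    aliases is an iterable of (language code, alternate label) pairs.
    """
    lines = [f"<{iri}> a {OCHRE_TO_EDM[category]}"]
    for lang, text in pref_labels.items():
        lines.append(f"    skos:prefLabel {turtle_literal(text, lang)}")
    for lang, text in aliases:
        lines.append(f"    skos:altLabel {turtle_literal(text, lang)}")
    return " ;\n".join(lines) + " ."


# Hypothetical item: a Location with labels in two languages and one alias.
print(PREFIXES)
print(describe(
    "https://example.org/ochre/location/1",
    "Location",
    {"en": "Egypt", "fr": "Égypte"},
    [("en", "Kemet")],
))
```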

Many descriptive “metadata” elements, like those in the Dublin Core standard which are borrowed by the EDM, are built into OCHRE. These include, for example, creator (dcterms:creator), creation date (dcterms:created), title (dcterms:title), description, type, and so on. Additional project-defined properties attached to OCHRE items to describe them more fully are captured using the aggregate structures (e.g. edm:aggregatedCHO) of the EDM.

Web resources (edm:WebResource), or any other digital representation of an object, can be easily represented by OCHRE’s Resources category. Objects and resources can be related in OCHRE using assorted linking strategies. Items can be 1) linked directly (e.g. edm:hasView), 2) linked via properties based on relational variables (e.g. edm:dataProvider), 3) linked via hotspots on image resources, or 4) related hierarchically. The flexibility of the EDM provides the mechanism for representing all of these strategies.
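
Three of these linking strategies can be sketched as RDF triples. The Python below is a hedged illustration, not OCHRE's export code: the base IRI, the identifiers, and the `link_triples` and `to_ntriples` helpers are hypothetical, and using dcterms:isPartOf for the hierarchical case is an assumption about how such a relationship might be expressed. Hotspot links are not covered.

```python
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
BASE = "https://example.org/ochre/"  # hypothetical base IRI


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def lit(text):
    """Format a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def link_triples(item, image, parent, provider):
    """Return triples relating an item to an image, a provider, and a parent item."""
    cho = iri(BASE + "item/" + item)
    agg = iri(BASE + "aggregation/" + item)
    view = iri(BASE + "resource/" + image)
    return [
        (cho, iri(RDF + "type"), iri(EDM + "ProvidedCHO")),
        (view, iri(RDF + "type"), iri(EDM + "WebResource")),
        (agg, iri(RDF + "type"), iri(ORE + "Aggregation")),
        (agg, iri(EDM + "aggregatedCHO"), cho),
        # 1) direct link from the aggregation to a digital view of the object
        (agg, iri(EDM + "hasView"), view),
        # 2) link via a property value
        (agg, iri(EDM + "dataProvider"), lit(provider)),
        # 4) hierarchical relationship (assumed property choice)
        (cho, iri(DCTERMS + "isPartOf"), iri(BASE + "item/" + parent)),
    ]


def to_ntriples(triples):
    """Serialize already-formatted terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(link_triples("42", "photo-9", "7", "Example Project")))
```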

In short, the EDM provides a compatible, flexible and comprehensive data representation format suitable for creating archivable objects exported from core OCHRE project data.

Pilot Project

Our pilot implementation of this archiving strategy involves representing a subset of OCHRE project data as cultural heritage objects (CHOs) as defined by the EDM. The goal is to express these as RDF triples stored in a MarkLogic database and accessible via SPARQL. Please contact us for more details or if you are interested in participating in our pilot program.
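
To give a sense of what SPARQL access to such data might look like, the sketch below builds a query for cultural heritage objects and their titles and prepares an HTTP request following the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). It is an assumption-laden illustration only: the endpoint URL is a placeholder, the `build_request` and `run_query` helpers are hypothetical, and the query shape assumes that CHOs are typed as edm:ProvidedCHO and titled with dcterms:title. Nothing here describes the pilot's actual configuration.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint; a real deployment would supply its own URL.
ENDPOINT = "https://example.org/sparql"

QUERY = """\
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?cho ?title
WHERE {
  ?cho a edm:ProvidedCHO ;
       dcterms:title ?title .
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Prepare a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(endpoint, query):
    """Send the query and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        results = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


# Building the request needs no network access; run_query does.
request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url[:40])
```

Calling `run_query(ENDPOINT, QUERY)` against a live endpoint would return one dictionary per matching object, for example with `cho` and `title` keys.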