OCHRE System Design

OCHRE is designed to support all stages of the research data life cycle and to make it easy to move from one stage to the next—from initial data acquisition to integration, analysis, publication, and final archiving. The OCHRE system has four tiers, as shown in the diagram below.

The Linked Data Tier consists of data obtained from external Web servers via URLs, or from external online databases whose APIs (application programming interfaces) permit retrieval of specific pieces of information as needed. External resources obtained in this way (e.g., 2D images, 3D models, documents, table rows, Web pages, and geospatial data) can be linked dynamically into an OCHRE project, so that the data is fetched only when it is needed. This provides a further mechanism for data acquisition, alongside keying in data manually and importing it automatically into the data warehouse; automatic import is used for textual and numeric data contained in legacy text files (e.g., CSV files) and spreadsheets (e.g., Excel). A project either provides its own server to store its images and other external resources or pays for server space provided by the OCHRE Data Service. A project can also create live links to curated online databases maintained by other organizations (e.g., the Zotero bibliographic database maintained by George Mason University, or field-specific online databases).
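The idea of linking a resource dynamically, so that it is fetched only when needed, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OCHRE's implementation: the `LinkedResource` class and its `fetch` parameter are hypothetical names, and the default fetcher simply issues an HTTP GET with the standard library's `urllib.request.urlopen`. The project stores only the URL; the bytes are retrieved on first access and cached afterwards.

```python
from typing import Callable, Optional
from urllib.request import urlopen


def _http_fetch(url: str) -> bytes:
    """Default fetcher: download the resource over HTTP(S)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


class LinkedResource:
    """Hypothetical record for an external resource linked into a project.

    Only the URL and a little descriptive metadata are stored; the content
    is fetched the first time it is requested and then cached.
    """

    def __init__(self, url: str, kind: str,
                 fetch: Callable[[str], bytes] = _http_fetch) -> None:
        self.url = url
        self.kind = kind  # e.g. "image", "3D model", "web page"
        self._fetch = fetch
        self._content: Optional[bytes] = None

    @property
    def is_loaded(self) -> bool:
        """True once the content has been fetched at least once."""
        return self._content is not None

    def content(self) -> bytes:
        # Fetch on first access only; later calls reuse the cached bytes.
        if self._content is None:
            self._content = self._fetch(self.url)
        return self._content
```

Injecting `fetch` keeps the sketch testable without network access, for example by passing a stub that returns fixed bytes. A production version would also need error handling and some policy for refreshing cached content when the remote resource changes.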

The Client Tier consists of two kinds of end-user software applications. The first is a stand-alone Java application (not a browser applet) that runs on desktop and laptop computers under Windows, Mac OS X, or Linux. Its feature-rich graphical user interface (GUI) lets project team members build and manage their project’s data through password-protected user accounts, and display and analyze the data in many different ways. The Java GUI communicates over the Internet with the OCHRE data warehouse, which runs on a Tamino XML Server at the University of Chicago Library. The GUI is normally used online, but it also has an offline mode for situations where a fast Internet connection is not available (e.g., on an archaeological dig).
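One common way to structure an offline mode is to apply edits to a local copy immediately and queue them for the server until a connection returns. The Python sketch below shows that generic pattern; it is not taken from the OCHRE Java GUI, and `Edit`, `OfflineCapableClient`, and the `send` callback are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class Edit:
    """A single change to one field of one item (hypothetical representation)."""
    item_id: str
    field: str
    value: str


class OfflineCapableClient:
    """Sketch of a client that queues edits while offline and replays them later.

    `send` stands in for whatever call pushes an edit to the server; it is a
    hypothetical parameter, not an OCHRE API.
    """

    def __init__(self, send: Callable[[Edit], None]) -> None:
        self._send = send
        self.online = True
        self._pending: Deque[Edit] = deque()
        self.local_view: Dict[str, Dict[str, str]] = {}

    def go_offline(self) -> None:
        self.online = False

    def apply(self, edit: Edit) -> None:
        # Always update the local copy so the user sees the change at once.
        self.local_view.setdefault(edit.item_id, {})[edit.field] = edit.value
        if self.online:
            self._send(edit)
        else:
            self._pending.append(edit)

    def pending_count(self) -> int:
        return len(self._pending)

    def reconnect(self) -> int:
        """Replay queued edits in their original order, then go back online."""
        replayed = 0
        while self._pending:
            self._send(self._pending[0])  # may raise; the edit stays queued
            self._pending.popleft()       # drop it only after a successful send
            replayed += 1
        self.online = True
        return replayed
```

Queued edits are removed only after `send` returns, so a failure during replay leaves the remaining edits in the queue and the client offline. The sketch does not attempt to reconcile conflicting edits made by different team members while one of them was offline, which a real offline mode would have to address.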

The second kind of application in the Client Tier comprises Web browser apps (written in HTML, CSS, and JavaScript) and mobile apps (for iOS, Android, etc.) created to view OCHRE data that projects have made public. Project directors may decide at any time to make some, all, or none of their data public, using an option in the Java GUI. Published data is exposed to Web browser and mobile apps as self-describing XML documents, usually one per real-world entity (e.g., one document for each artifact in an archaeological project). These published XML documents are intended for data exchange between OCHRE and other applications, and they are constructed dynamically, on request, from the much more highly atomized data in the OCHRE warehouse. They are therefore “flattened” or “denormalized” compared with the underlying database items (although, technically speaking, the atomized database items are also XML documents; see the Database page of this website for further details). Published OCHRE data can be delivered to a browser or mobile app as XML, JSON, or HTML, depending on what the app developer prefers; conversion from XML to JSON or HTML is done automatically by an XSLT stylesheet that is provided and documented as part of the OCHRE Web API. Apps can be customized for particular projects or particular kinds of data. The OCHRE team at the University of Chicago is developing generic Web browser apps for various research domains (archaeological, textual, historical, etc.), which will be made freely available to all OCHRE users, but anyone may create an app to view published data using the OCHRE Web API.
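The flattening step can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch gathers hypothetical atomized property records for one entity into a single self-describing XML document, then converts that document to JSON. The element names (`entity`, `name`, `properties`, `property`) and the record shape are invented for illustration and do not reflect OCHRE's actual schema. Where OCHRE uses an XSLT stylesheet for the XML-to-JSON step, this sketch substitutes plain Python code, because the standard library has no XSLT processor.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def flatten_entity(entity_id: str, name: str,
                   atoms: Iterable[Dict[str, str]]) -> ET.Element:
    """Gather the atomized property items for one entity into one XML document.

    Each atom is a dict like {"entity": ..., "property": ..., "value": ...};
    this record shape is an illustrative assumption.
    """
    root = ET.Element("entity", {"id": entity_id})
    ET.SubElement(root, "name").text = name
    props = ET.SubElement(root, "properties")
    for atom in atoms:
        # Keep only the atoms that describe this entity.
        if atom["entity"] == entity_id:
            prop = ET.SubElement(props, "property", {"name": atom["property"]})
            prop.text = atom["value"]
    return root


def entity_to_json(root: ET.Element) -> str:
    """Convert a flattened entity document into a JSON string."""
    data = {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "properties": {p.get("name"): p.text for p in root.iter("property")},
    }
    return json.dumps(data, sort_keys=True)
```

Calling `flatten_entity` with records for several entities keeps only those whose `entity` field matches, so each call yields one document per real-world entity, mirroring the one-document-per-artifact convention described above.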

The Middle Tier is the layer of software that exposes published data from the data warehouse to browser and mobile apps via a RESTful Web API (or Web service); it is built on the HTTP Web service capability of Tamino XML Server. This API is currently under development; details and documentation will be coming soon. Apps use the API in two ways: to fetch published data (exposed as the “flattened” XML-based data-exchange documents described above) by means of persistent URLs, or to trigger Java routines and R functions, passing arguments to them, in order to execute pre-written queries and analytical workflows that a project has named and saved in the data warehouse for others to use.
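A client of such a RESTful API mostly needs to build stable URLs and parse the XML that comes back. The Python sketch below shows this under explicit assumptions: the base URL `https://api.example.org/ochre` and the `/items/...` and `/queries/...` path patterns are placeholders invented for illustration, since documentation for the OCHRE Web API is not yet available. Only standard-library calls (`urllib.parse.quote`, `urllib.parse.urlencode`, `urllib.request.urlopen`, and `xml.etree.ElementTree.fromstring`) are used.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder only: the real endpoint and path layout are hypothetical here.
API_BASE = "https://api.example.org/ochre"


def item_url(item_id: str, fmt: str = "xml") -> str:
    """Build a persistent URL for one published item (hypothetical pattern)."""
    return f"{API_BASE}/items/{quote(item_id)}?{urlencode({'format': fmt})}"


def saved_query_url(query_name: str, args: Dict[str, str]) -> str:
    """Build a URL asking the server to run a named, saved query with arguments."""
    return f"{API_BASE}/queries/{quote(query_name)}?{urlencode(args)}"


def fetch_xml(url: str, timeout: float = 30.0) -> ET.Element:
    """GET a URL and parse the response body as XML."""
    with urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())
```

For instance, `saved_query_url("pottery by phase", {"phase": "Iron II"})` percent-encodes the query name in the path and form-encodes the arguments in the query string; `fetch_xml` would then retrieve and parse the result. Real paths and parameter names would have to come from the API documentation once it is published.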

The Core Data and Analysis Tier consists of the OCHRE data warehouse and a separate server that executes R functions for statistical analysis and visualization. The data warehouse is highly scalable and extensively indexed, which permits fast queries. It is implemented with Tamino XML Server, a high-performance database management system from Software AG, and is maintained and backed up by professional system administrators in the Digital Library Development Center of the University of Chicago Library. The data warehouse is structured according to an innovative non-relational graph data model that is optimized for semistructured data represented as recursive hierarchies of spatial, temporal, linguistic, and taxonomic items. The data warehouse and its underlying data model are described further on the Database page of this website.
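The notion of a recursive hierarchy of items can be sketched with a small Python dataclass. Each hypothetical `Item` holds its children and a back-reference to its parent, which is enough to compute an item's full path and to traverse everything beneath it. This illustrates only the recursive-hierarchy aspect; it does not model the wider graph data model or the way OCHRE actually stores items, and the class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Item:
    """A node in a recursive hierarchy (e.g. site > area > locus > object)."""
    name: str
    category: str  # e.g. "spatial", "temporal", "linguistic", "taxonomic"
    children: List["Item"] = field(default_factory=list)
    # Excluded from repr/compare to avoid infinite recursion through the cycle.
    parent: Optional["Item"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Item") -> "Item":
        """Attach a child item and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root of the hierarchy down to this item."""
        names: List[str] = []
        node: Optional[Item] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def walk(self) -> Iterator["Item"]:
        """Depth-first traversal of this item and everything beneath it."""
        yield self
        for child in self.children:
            yield from child.walk()
```

With a spatial hierarchy such as site > area > locus > object, calling `path()` on the object returns the chain of names from the site down to the object, and `walk()` on the site visits every item depth-first.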