OCHRE: An Online Cultural and Historical Research Environment
A Computational Platform for All Stages of Research Data
The OCHRE platform is a tightly integrated suite of computational tools for working with all kinds of data through all stages of research, from initial acquisition to final archiving of the data. OCHRE provides a seamless environment in which it is easy to move from one stage of research to the next. The data is organized by project and is credited to and controlled by the members of the research project. OCHRE has a powerful graphical user interface (GUI) for entering, viewing, and organizing information from many different sources, and for combining, analyzing, publishing, and preserving the information.
Complementing this common GUI, which participating projects use to build and manage their data, are customized Web browser and mobile apps. These employ a Web API (currently under development) that anyone can use to write apps for retrieving and displaying published data from the OCHRE platform in whatever form may be needed. These apps can be tailored for a particular project or a group of similar projects to view and search their data in familiar ways.
The alternative is to employ an ad hoc collection of unrelated software for data management, data analysis, mapping, Web publication, etc., as most researchers currently do. But this requires cumbersome transfers of data from one piece of software to another using intermediate file formats—a time-consuming and error-prone process in which it is easy to lose track of the many pieces of information that one accumulates in a typical research project. By contrast, OCHRE users have a comprehensive view of their data in all its stages and a coherent user interface in which to:
- acquire relevant data from instruments, external online data sources, legacy data files, or by keying it in manually
- integrate large amounts of heterogeneous data within a common, searchable framework
- analyze and visualize the data using powerful statistical techniques to answer research questions
- publish data and research results on the Web in standard formats (XML, JSON) for online viewing and for other software to use
- archive data in an open, standard format (RDF) to preserve it for the long term and ensure its accessibility and reusability
OCHRE was originally developed for use in ancient studies, as an aid to archaeological research involving excavations and surveys, and as an aid to philological research on ancient inscriptions and languages. But its basic design and powerful features make it suitable for many other kinds of research in the humanities and social sciences, and in some branches of the natural sciences, as explained below.
OCHRE at the University of Chicago
The OCHRE platform is supported by experienced computing personnel at the University of Chicago. Technical support, user training, and legacy data conversion are provided by the staff of the OCHRE Data Service (based at the Oriental Institute of the University of Chicago). The OCHRE servers are professionally hosted and maintained at the University of Chicago Library, and the University of Chicago Research Computing Center provides expertise and support for aspects of OCHRE that require high-performance computing, especially for statistical analysis and visualization.
At the heart of the OCHRE platform is an innovative graph database that serves as a data warehouse to integrate diverse information from many different sources. The OCHRE database is implemented in an enterprise-class database management system that is monitored around the clock and backed up by system administrators in the University of Chicago Library, who ensure the security and accessibility of the data.
OCHRE has been engineered, tested, and refined over the past several years in close consultation with academic researchers in a variety of fields and with funding from a four-year, $1.75-million Scientific Software Integration grant from the U.S. National Science Foundation’s Office of Advanced Cyberinfrastructure (award no. 1450455). Additional funding for research projects to use and test the OCHRE platform has come from the Social Sciences and Humanities Research Council of Canada for an interdisciplinary and international collaboration entitled “Computational Research on the Ancient Near East” (CRANE), headed by Timothy Harrison of the University of Toronto.
OCHRE is now being used by more than 60 multi-person research projects representing over 40 universities and other institutions in the U.S., Canada, the Middle East, and Europe, including several large projects at the University of Chicago. As of mid-2020, OCHRE has over 700 user accounts and more than 9 million indexed database items representing 80 terabytes of data. This scale of usage has enabled rigorous real-world testing of the software for a wide range of academic use cases and has demonstrated the system’s sustainability. With testing now completed, the platform is being advertised more widely and new projects are being added. OCHRE can easily accommodate thousands of users and petabytes of data because it has been professionally engineered to be computationally efficient and scalable.
OCHRE Deals with Divergent Views of Space, Time, and Taxonomy
OCHRE, which stands for “Online Cultural and Historical Research Environment,” was initially developed for use in archaeology and philology. These fields of research have well-established empirical methods but they are characterized by a high degree of ontological heterogeneity, such that similar phenomena are described in different ways by different researchers. (An “ontology,” in the sense intended here, is a specification of the concepts and relations in a given domain of knowledge. A hierarchical taxonomy is a common, and relatively simple, kind of ontology.)
Ontological heterogeneity is not a problem in itself. Indeed, it is inherent in the practice of research, because different ontologies reflect different interpretive frameworks and research agendas—they are not just the result of sloppy thinking or individual quirks. Ontological heterogeneity is not a vice to be eliminated, in a misguided attempt to standardize human ways of knowing, but rather a defining virtue of a research community that is open to multiple perspectives. However, the non-standardized, heterogeneous ontologies embedded in traditional databases (e.g., in the table schemas of relational databases) cause problems for researchers because they inhibit the automated integration and comparison of data among different research projects that use different recording systems. A mechanism for automated querying and comparison across many diverse data sets would be of great benefit to researchers. What is needed is software that does not suppress ontological heterogeneity via forced standardization but instead embraces it and facilitates data integration by making it easy to create semantic mappings from one project to another. This is a basic aim of OCHRE.
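The idea of a semantic mapping between projects can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the project vocabularies, the item identifiers, and the `query_across_projects` function are hypothetical and are not part of OCHRE. The point is only that each project keeps its own terms, while a separate mapping lets one query reach both data sets.

```python
# Hypothetical vocabularies for two projects that record similar finds under
# different terms. None of these terms or ids come from OCHRE itself.
PROJECT_A_ITEMS = [
    {"id": "A-1", "category": "cooking pot"},
    {"id": "A-2", "category": "storage jar"},
]
PROJECT_B_ITEMS = [
    {"id": "B-7", "category": "olla"},
    {"id": "B-8", "category": "pithos"},
    {"id": "B-9", "category": "lamp"},
]

# A semantic mapping asserts that a term in one project corresponds to a term
# in another. Neither project's terminology is changed; only links are added.
MAPPING_A_TO_B = {
    "cooking pot": "olla",
    "storage jar": "pithos",
}


def query_across_projects(term_a):
    """Return ids from both projects that match a Project A term."""
    hits = [item["id"] for item in PROJECT_A_ITEMS if item["category"] == term_a]
    term_b = MAPPING_A_TO_B.get(term_a)
    if term_b is not None:
        hits += [item["id"] for item in PROJECT_B_ITEMS if item["category"] == term_b]
    return hits
```

Calling `query_across_projects("cooking pot")` returns the matching item from each project, even though Project B never uses that term.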
Archaeologists study the material traces of past human activity. Philologists study the historical development of languages, literatures, and systems of writing. These two disciplines exhibit not just ontological heterogeneity but also a high proportion of relatively unstructured or semistructured data in the form of qualitative descriptions and natural-language texts, which are best represented digitally as open-ended hierarchies (trees) rather than as rigid tables with rows and columns. Archaeology and philology also entail close attention to geographical and chronological variations in the phenomena being studied. Moreover, when dealing with spatial and temporal relations among entities, researchers need mechanisms for representing not just absolute locations in space and time, in terms of numeric coordinates and dates, but also the relative placement of objects or events with respect to other spatial and temporal phenomena.
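Relative temporal placement can be sketched as a set of "earlier than" relations that are ordered without any absolute dates. The example below is only an illustration under stated assumptions: the unit names are hypothetical, and the use of a topological sort from Python's standard `graphlib` module is a stand-in technique, not a description of how OCHRE itself stores or resolves such relations.

```python
from graphlib import TopologicalSorter

# Hypothetical stratigraphic units. Each key maps to the set of units known
# to be EARLIER than it. No absolute dates are recorded anywhere.
EARLIER_THAN = {
    "fill": set(),
    "floor": {"fill"},
    "wall": {"floor"},
    "pit": {"floor"},
}


def relative_sequence(relations):
    """Return one ordering of the units that respects every 'earlier than' link."""
    return list(TopologicalSorter(relations).static_order())


sequence = relative_sequence(EARLIER_THAN)
```

The resulting sequence always places "fill" before "floor" and "floor" before both "wall" and "pit". The order of "wall" and "pit" relative to each other is left open, because nothing in the data relates them.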
Innovative computational methods for dealing with ontological heterogeneity, semistructured data, and spatio-temporal relations are the hallmark of OCHRE. Several large collaborative projects in archaeology and philology have been test beds for OCHRE and provide examples of its use. But it turns out that the software tools developed to deal with the spatial, temporal, linguistic, and taxonomic complexity of archaeological and philological data are applicable to a much wider range of research. This is so because the software is based on powerful conceptual abstractions expressed in an innovative graph database structure characterized by overlapping recursive hierarchies of atomized data elements (described in the Database page of this website). OCHRE’s hierarchical and recursive data model can flexibly represent scholarly knowledge of all kinds without sacrificing the power of modern databases, because it is implemented by means of well-indexed and properly atomized database items that conform to a predictable schema, and so enable efficient queries. Accordingly, OCHRE is now being used, not just in archaeology and philology, but in other areas of the humanities and social sciences; and also in branches of the natural sciences where spatial, temporal, and taxonomic variation are key concerns, such as population genetics (comparing ancient and modern DNA), paleoclimatology and other kinds of paleoenvironmental research, and paleobiology.
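The notion of overlapping recursive hierarchies over atomized items can be made concrete with a small sketch. It assumes a deliberately simplified model: the item keys, names, and the `path_to` function are hypothetical, and the real OCHRE schema is described on the Database page rather than here. Each item is stored once under a unique key, and separate hierarchies refer to it by that key.

```python
# Atomized items: each is stored once, under a unique key (hypothetical data).
ITEMS = {
    "i1": {"name": "Site"},
    "i2": {"name": "Trench 1"},
    "i3": {"name": "Pottery sherd"},
    "i4": {"name": "Ceramics"},
}

# Two hierarchies over the same items, each mapping a parent key to child keys.
# The sherd (i3) appears in both, so the hierarchies overlap.
SPATIAL = {"i1": ["i2"], "i2": ["i3"]}
TAXONOMIC = {"i4": ["i3"]}


def path_to(target, hierarchy, node):
    """Recursively search below `node`; return the name path to `target` or None."""
    if node == target:
        return [ITEMS[node]["name"]]
    for child in hierarchy.get(node, []):
        sub_path = path_to(target, hierarchy, child)
        if sub_path is not None:
            return [ITEMS[node]["name"]] + sub_path
    return None
```

Here `path_to("i3", SPATIAL, "i1")` locates the sherd by findspot and `path_to("i3", TAXONOMIC, "i4")` locates the same item by classification, without duplicating the item's data.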
OCHRE Manages All Kinds of Research Data
OCHRE supports a wide range of digital formats and data types: textual, numeric, visual, sonic, geospatial, etc. A project’s textual and numeric data are ingested into the OCHRE data warehouse, where they are atomized and manipulated as database items (see the System Design page for a description of the data warehouse in relation to other components of the OCHRE system). OCHRE can automatically import textual and numeric data stored in Excel spreadsheets, Word documents, other XML documents (e.g., TEI texts), or plain text files (e.g., CSV files).
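As a rough illustration of what atomizing tabular data might involve, the sketch below breaks each row of a CSV file into separate item, variable, and value triples. The column names, sample data, and the `atomize` function are assumptions made for this example; OCHRE's actual import process is not documented here and may differ.

```python
import csv
import io

# Hypothetical legacy data in CSV form; the second row has no recorded weight.
SAMPLE_CSV = """id,material,weight_g
find-1,bronze,12.5
find-2,ceramic,
"""


def atomize(csv_text, key_column="id"):
    """Split each row into (item key, variable, value) triples, skipping blanks."""
    atoms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key identifies the item; empty cells carry no data
            atoms.append((row[key_column], column, value))
    return atoms
```

Unlike a table row, each triple stands on its own, so a missing value simply produces no triple instead of an empty cell.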
OCHRE Embraces Multiple Ontologies
Computational tools for working with a growing body of interconnected scholarly knowledge must cope with the practical reality that such knowledge is recorded by many different people using divergent ontologies. Each ontology reflects the nomenclature and conceptual distinctions relevant to a particular domain of research, and perhaps also reflects the idiosyncrasies of an individual researcher. No single ontology, no matter how complex and ramified, will be suitable for all purposes, because there is an endless array of conceptual possibilities depending on the subject matter and the questions being asked, not to mention the linguistic traditions and historically situated perspectives of the researchers involved.
OCHRE Conforms to Semantic Web Standards
OCHRE is based on the open, non-proprietary standards published by the World Wide Web Consortium (W3C), the organization responsible for the design of the Web itself. This is especially important for data publication and archiving. The OCHRE Web API publishes data for use by Web browsers and other software using the W3C’s Extensible Markup Language (XML), a self-describing tagged-text format, with stylesheets in Extensible Stylesheet Language Transformations (XSLT) to convert the XML data to HTML or JSON for use in Web apps.
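The XML-to-JSON step can be sketched as follows. Two assumptions should be noted. First, the XML element and attribute names are hypothetical and do not reproduce the format published by the OCHRE Web API. Second, the conversion is written with Python's standard `xml.etree.ElementTree` and `json` modules instead of an XSLT stylesheet, simply to keep the example self-contained.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical published item; the real OCHRE XML format may differ.
SAMPLE_XML = """
<item uuid="hypothetical-0001">
  <name>Bronze pin</name>
  <property variable="material" value="bronze"/>
  <property variable="length_mm" value="84"/>
</item>
"""


def item_xml_to_json(xml_text):
    """Convert one item's XML into a JSON string for use by a Web app."""
    root = ET.fromstring(xml_text.strip())
    record = {
        "uuid": root.get("uuid"),
        "name": root.findtext("name"),
        "properties": {
            prop.get("variable"): prop.get("value")
            for prop in root.findall("property")
        },
    }
    return json.dumps(record, sort_keys=True)
```

Because the XML is self-describing, the conversion needs no external schema: the element and attribute names supply the keys of the JSON record.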
Thus, OCHRE is fundamentally compatible with the Semantic Web. Because it is built on the open, non-proprietary standards of the W3C, it is not locked into any commercial software vendor. Its integrative data warehouse currently runs on Tamino XML Server, a native-XML database management system (DBMS) from Software AG that uses the W3C’s XML Schema language to define the database structure and the XQuery language to perform database queries. However, any XML DBMS that supports XQuery could be used instead, such as MarkLogic, IBM DB2 pureXML, or Oracle Berkeley DB XML. (To avoid confusion, please note that the XML “documents” in the OCHRE data warehouse are quite small and highly atomized, and function as indexed database objects with unique keys. They do not normally correspond to real-world documents, unlike TEI-XML documents, for example. See the Database page of this website for more details.)