The Life-Cycle of a Text

Introduction

OCHRE is a powerful database environment for editing textual content. Our unique data model has many advantages over other approaches and has far-reaching implications when conceptualizing a text, representing it digitally, capturing relationships with other relevant data, and presenting integrated, interactive views for scholarly analysis and discovery.

Follow the life cycle of a Text item in OCHRE from initial creation, through various stages of editing and linking, to ultimate publication and preservation.

OCHRE has a category dedicated to the collection and management of Texts. Conceptualizing a capital-T “Text” using the OCHRE data model often requires some adjustment of one’s preconceived notions of what the content and structure of a digital representation of a text should look like for research purposes.

Conception

Conceptual Manoeuvre: Tablet vs. Text

Our first move, conceptually, is to separate the textual content from the artifact, monument, papyrus, or other physical object upon which it is written. By doing this, the physical object can be recorded and analyzed in its own right, distinct from any epigraphic content written upon it.

Take, for example, the rather ordinary tablet from the Persepolis Fortification Archive shown in the accompanying image. This tablet has an Elamite text inscribed on multiple surfaces as well as an Aramaic text written (inverted) in ink at the bottom-left of its reverse surface. Additionally, there are three seal impressions on various parts of this tablet. All of these items of study (1 Tablet, 2 Texts, 3 Seal impressions) are given independent status as distinct items in OCHRE.

In separating these items of interest we do not want to lose sight of their original connection so we relate them via links. The description of the Tablet (Object PF-NN 0037) in OCHRE thus includes links to its two related Text items: the Elamite Text PF-NN 0037 and the Aramaic Text PF-NN 0037Ar.

[Image: Text, conception, the artifact description]
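
The separation of Tablet, Texts, and Seal impressions into distinct but linked items can be sketched as follows. This is a minimal illustration, not OCHRE's actual schema: the `Item` class, the `link` helper, and the seal-impression names are assumptions made for the example. Only the Tablet and Text names come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare items by identity, so cyclic links are safe
class Item:
    """Hypothetical stand-in for an OCHRE item (not OCHRE's real data model)."""
    category: str  # e.g. "Object", "Text", "Seal impression"
    name: str
    links: list["Item"] = field(default_factory=list, repr=False)


def link(a: Item, b: Item) -> None:
    """Record a two-way link so that each item can reach the other."""
    if b not in a.links:
        a.links.append(b)
    if a not in b.links:
        b.links.append(a)


tablet = Item("Object", "PF-NN 0037")
elamite = Item("Text", "PF-NN 0037")
aramaic = Item("Text", "PF-NN 0037Ar")
# Placeholder names: the source does not identify the three seal impressions.
seals = [Item("Seal impression", f"Seal impression {n}") for n in (1, 2, 3)]

for related in [elamite, aramaic, *seals]:
    link(tablet, related)

linked_texts = [item.name for item in tablet.links if item.category == "Text"]
print(linked_texts)
```

Because the links are recorded in both directions, starting from either Text leads back to the Tablet, and from there to the other Text and the Seal impressions.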

Conceptual Manoeuvre: Epigraphic vs. Discourse

Another important distinction we make in the analysis of textual content is the separation of the physical structure and layout of the text components from the meaning which they convey. That is, the “epigraphic” structure of the text (the lines of text, the signs or letters, and the surfaces upon which they are written) is represented separately from what we term the “discourse” structure: the sentences, phrases, and words being communicated.

Both of these structures, the epigraphic and discourse analyses, are represented in hierarchical terms.

For the epigraphic analysis we might have:

Tablet (PF-NN 0037)

  • Tablet surface (Obverse)
    • Line (01)
      • Sign 1
      • Sign 2
      • etc.
    • Line (02)
      • etc.
  • Tablet surface (Reverse)
    • etc.

The discourse hierarchy might look like this:

Tablet (PF-NN 0037)

  • Sentence (“840 qts. of flour, to my puhu, to them, issue as rations!”)
    • Phrase (“840 quarts of flour”)
      • Word (“840”)
        • Sign 1
        • Sign 2
        • etc.
      • Word (“quarts”)
      • Word (“flour”)
      • etc.

Notice that the epigraphic and discourse hierarchies overlap at the bottom, sharing and interpreting the same signs, just in different contexts.
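
One way to picture this overlap is as two trees whose leaves are the very same sign objects. The sketch below assumes a generic `Node` class and a `leaves` helper, both invented for illustration; OCHRE's internal representation may differ.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Generic hierarchy node (hypothetical; for illustration only)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[Node]:
    """Return the leaf nodes under `node`, in reading order."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# The signs are created once and placed in both hierarchies.
sign1, sign2 = Node("Sign 1"), Node("Sign 2")

epigraphic = Node("Tablet (PF-NN 0037)", [
    Node("Tablet surface (Obverse)", [
        Node("Line (01)", [sign1, sign2]),
    ]),
])

discourse = Node("Tablet (PF-NN 0037)", [
    Node("Sentence", [
        Node("Phrase", [
            Node("Word", [sign1, sign2]),
        ]),
    ]),
])

# Same objects, reached through two different contexts.
shared = all(a is b for a, b in zip(leaves(epigraphic), leaves(discourse)))
print(shared)
```

Since both trees point at the same sign objects, a note attached to a sign in one context is visible in the other without any copying.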

Born Digital

Nowadays, much data originates in digital format. We recognize, however, that most scholars have many texts in pre-existing formats, such as word-processing or database files, that must be born again, and often again and again, as systems are upgraded and formats are changed in the name of “progress” in the digital age. OCHRE has a number of features to support the reincarnation of data.

XML Data Format

XML (Extensible Markup Language) is at the heart of our database system. Whatever else we may claim for XML, it is unquestionably a data format suitable for sharing and transforming data. This makes it relatively easy for OCHRE to transform incoming data into OCHRE’s data structure, or to export data into other useful formats.
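
As a rough illustration of why XML lends itself to transformation, the sketch below reads a small XML fragment and exports it as plain line-by-line text using Python's standard library. The element and attribute names (`text`, `line`, `sign`, `n`) and the sign values are invented for the example and are not OCHRE's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; OCHRE's real XML structure is not shown in this article.
SAMPLE = """<text name="PF-NN 0037">
  <line n="01"><sign>x1</sign><sign>x2</sign></line>
  <line n="02"><sign>x3</sign></line>
</text>"""


def to_plain_lines(xml_source: str) -> list[str]:
    """Export each <line> as 'number: sign-sign-...'."""
    root = ET.fromstring(xml_source)
    return [
        f'{line.get("n")}: ' + "-".join(sign.text for sign in line.findall("sign"))
        for line in root.findall("line")
    ]


print(to_plain_lines(SAMPLE))
```

The same parsed tree could just as easily be walked to produce another target format, which is the property the paragraph above relies on.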

Pending Content

Every Text item has a Pending Content pane where data can be entered and stored while it is waiting to be processed — a work area, if you will. This content can be:

  • Typed directly into Pending Content using a familiar line-by-line format
  • Pasted in from the local clipboard
  • Loaded in from external files
  • Imported with the help of one of our research database specialists

Appropriate Representation

All content should be given to OCHRE as Unicode-based data. Appropriate fonts can be specified for different views of the Text. A hieroglyphic font might be used for display of the epigraphic content (the hieroglyphic signs), while a Unicode-based font containing characters with appropriate diacritical marks might be used for the discourse view (a normalization or vocalization). OCHRE also enables any number of virtual keyboards (with matching fonts) to facilitate the entering of special characters.
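
Unicode text can encode the same visible character in more than one way, for example as a single precomposed character or as a base letter plus a combining diacritic. A common preparation step, assumed here and not described by the source, is to normalize incoming text to one form before import. The sketch uses Python's standard `unicodedata` module.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize text to Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# "S" followed by a combining caron (U+030C), as some keyboards produce it.
decomposed = "GIS\u030c"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))
```

After normalization the sign name is stored with the precomposed character Š (U+0160), so string comparisons and searches behave consistently however the text was typed.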

Import Tool

A comprehensive import tool enables intelligent loading of external data with options for parsing, language-tagging, and character substitution of incoming content. This tool greatly enhances the conversion experience.
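
The three options mentioned here (parsing, language-tagging, and character substitution) might look something like the following in outline. This is a sketch under stated assumptions, not the OCHRE import tool: the `1. content` line format, the `SUBSTITUTIONS` table, and the record layout are all hypothetical.

```python
import re

# Hypothetical substitution table mapping an ASCII typing convention to Unicode.
SUBSTITUTIONS = {"sz": "š", "SZ": "Š"}

# Assumed input format: a line number, a period, then the line's content.
LINE_PATTERN = re.compile(r"\s*(\d+)\.\s+(.*\S)\s*$")


def import_lines(raw: str, language: str) -> list[dict]:
    """Parse numbered lines, substitute characters, and tag the language."""
    records = []
    for raw_line in raw.splitlines():
        match = LINE_PATTERN.match(raw_line)
        if not match:
            continue  # skip blank or unnumbered lines
        number, content = match.groups()
        for old, new in SUBSTITUTIONS.items():
            content = content.replace(old, new)
        records.append(
            {"line": int(number), "language": language, "words": content.split()}
        )
    return records


# Placeholder content; x, y, and z stand in for real sign readings.
records = import_lines("1. GISZ x\n\n2. y z", language="Elamite")
print(records)
```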

Egocentricity

An Item-based Approach

The basic structural elements in OCHRE’s data model are individual items of information that are of interest to particular researchers and are defined in relation to their research needs. Within OCHRE a Text comprises both epigraphic items like surfaces, lines, and signs or characters, and discourse items like sentences, phrases, and words.

Each item finds itself at the center of its own universe. Everything that can be known about an item is attached to that item. It knows what to call itself (its name), its own description (intrinsic properties), what it looks like (attached images), how to draw itself (attached shapefiles), what has happened to it (events), and who has studied it. Also, and very importantly, it understands its relationships to other items (links) and its context within the whole Text.

Extreme Atomization

Applying the item-based data model to our example, the Text is atomized into its most minimal parts. For most purposes, this extreme minimal part is the sign or letter. Projects dealing with syllabic writing systems treat each individual sign as a unique database item. For projects dealing with texts written in alphabetic writing systems, the minimal database item is the individual letter.

This approach allows the scholar to analyze each individual grapheme, adding notes, comments, links, and important metadata. For example, each sign (an epigraphic unit) can be tagged with metadata related to damage, certainty of reading, scribal emendation, sign placement, and various other characteristics. Each word (a discourse unit) can be described as to its meaning, grammatical form, or syntactic role.
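
To make the idea of atomization concrete, the sketch below splits a transliterated line into individual sign items and attaches metadata to each. The markup conventions it recognizes (square brackets for damage, a trailing question mark for an uncertain reading, hyphens between signs) are assumptions chosen for illustration, as are the `Sign` class and the metadata keys.

```python
from dataclasses import dataclass, field


@dataclass
class Sign:
    """One sign as its own item (hypothetical structure)."""
    reading: str
    metadata: dict = field(default_factory=dict)


def atomize(line: str) -> list[Sign]:
    """Split a line into signs, turning inline markup into metadata."""
    signs = []
    for word in line.split():
        for token in word.split("-"):
            metadata = {}
            if token.startswith("[") and token.endswith("]"):
                metadata["damage"] = "broken"
                token = token[1:-1]
            if token.endswith("?"):
                metadata["certainty"] = "uncertain"
                token = token[:-1]
            signs.append(Sign(token, metadata))
    return signs


# Placeholder line; x and y stand in for real sign readings.
signs = atomize("GIŠ-x? [y]")
for sign in signs:
    print(sign.reading, sign.metadata)
```

Once each sign is a separate record, further notes or links can be added to one sign without touching its neighbours.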

The Whole Self

OCHRE’s built-in import process performs the decomposition of textual elements, organizing them into their respective epigraphic and discourse hierarchies (see Conception). This hierarchical organization gives necessary structure to the individual component items and allows them to exist together, in the aggregate, as a single Text.

Here, again, is our “840 quarts of flour” in its epigraphic context (line 4 on the Obverse side of the tablet) and its discourse context (the words to be spoken to Bakadadda).

[Image: Text, egocentricity, hierarchies]

Growing Pains

Control Issues

As Text editions are added to an OCHRE project’s online archive, the constant challenge becomes one of staying organized. The analysis of each Text generates a host of related information pertaining to items such as the artifact on which it was written; related bibliographic entries; photographs, scans, and other images; editorial contributors and their contributions; and so on.

OCHRE prevents a project’s collection of data from burgeoning out of control by providing appropriate categories of primary data in which to store and manage the accumulated mass of related project content. OCHRE’s built-in categories are as follows:

  • Bibliographies
  • Concepts
  • Dictionaries
  • Locations & Objects
  • Chronological Periods
  • Persons (Users) & Organizations
  • Resources
  • Taxonomy (Variables, Values, Predefinitions)
  • Texts
  • Writing Systems

Identity Crisis

OCHRE does not present itself as an anonymous authority on the data contained within its framework. All data can be properly attributed to the scholar who has contributed the data.

OCHRE’s highly atomized, item-based approach also allows for scholarly disagreement at any level. For example, OCHRE allows for contrasting opinions on any letter, word, line, etc. If one scholar reads a sign as {d} and another reads it as {r}, the database can track both opinions and attribute them to their respective sources. [Behind the scenes, simply imagine new branches being formed on the epigraphic and discourse hierarchies to represent the diverging opinions.]

The Standard View of the Text in OCHRE allows one to choose which scholar’s edition to display, toggling between different editions from among those made available.

[Image: Text, growing pains, identity]
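
Attributed, competing readings and the ability to choose an edition can be sketched as below. The `SignItem` class, the `edition` function, the fallback behaviour, and the scholar names are all hypothetical; the {d} and {r} readings are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class SignItem:
    """A sign that keeps every scholar's reading (hypothetical structure)."""
    position: str
    readings: dict = field(default_factory=dict)  # scholar -> reading

    def record(self, scholar: str, reading: str) -> None:
        self.readings[scholar] = reading


def edition(signs: list, scholar: str, fallback: str) -> list:
    """Show one scholar's readings, using the fallback where they have none."""
    return [
        sign.readings.get(scholar, sign.readings.get(fallback))
        for sign in signs
    ]


first, second = SignItem("Sign 1"), SignItem("Sign 2")
first.record("Scholar A", "x")  # placeholder reading, not disputed
second.record("Scholar A", "d")
second.record("Scholar B", "r")  # a diverging opinion on the same sign

print(edition([first, second], "Scholar A", fallback="Scholar A"))
print(edition([first, second], "Scholar B", fallback="Scholar A"))
```

Neither reading overwrites the other: both stay attached to the sign, each under the name of the scholar who proposed it.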

Privacy Concerns

During that self-conscious phase when text editions are being created, edited, and polished, the project has the ability to keep them private. In fact, each project has complete control over which sets of data are made public, and when. When you are satisfied that your work is of publishable quality you can “flip the switch” and make your text editions available to the wider world.

Spreading of Wings

Once a Text has been entered, fully described, and tagged, it is time to reach out and make connections to related project data. Items within any of the categories of data in OCHRE can link to any other item within any category.

Linking to Images

Typically there will be an assortment of photographs, scans, specialized images (e.g. PTM/RTI format), and drawings of a Text. These items are, first, set up as items in their own right in OCHRE’s Resources category, properly attributed as to their source and contributor. They are then linked to the Text item.

Here is a partial view of the gallery of images linked to our sample Text.

[Image: Text, spreading wings, image gallery]

Hotspot Links to Images

Because of OCHRE’s highly atomized, item-based data model, each sign is its own item in OCHRE and can therefore be linked independently to a specified “hotspot” that defines that sign’s position on the Tablet. This provides a sign-by-sign linked view of the Tablet. Once again, the underlying relationships among the individual epigraphic and discourse units can be used to provide a synchronized view of the Text, allowing the user to click in any pane to see the related content in each of the other panes.

[Image: Text, spreading wings, hotspots]
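
A hotspot link can be thought of as a region on an image paired with the identifier of a sign. The sketch below assumes rectangular regions in pixel coordinates and invented sign identifiers; how OCHRE actually stores hotspot geometry is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hotspot:
    """Rectangular image region linked to one sign (hypothetical layout)."""
    sign_id: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def sign_at(hotspots: list, x: int, y: int) -> Optional[str]:
    """Return the sign whose hotspot contains the clicked point, if any."""
    for hotspot in hotspots:
        if hotspot.contains(x, y):
            return hotspot.sign_id
    return None


# Made-up coordinates for two neighbouring signs on a photograph.
hotspots = [
    Hotspot("sign-1", left=0, top=0, right=10, bottom=10),
    Hotspot("sign-2", left=10, top=0, right=20, bottom=10),
]

print(sign_at(hotspots, 15, 5))
print(sign_at(hotspots, 500, 500))
```

A click on the image resolves to a sign identifier, and from there the sign's place in the epigraphic and discourse hierarchies can be used to highlight the matching content in other panes.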

Linking to the Writing System

Did we say that the epigraphic unit was the lowest, most atomized unit of the textual analysis? Well, we lied. Each epigraphic unit can link to a sign in a specified Writing System defined in OCHRE. Here the sign “GIŠ” in line (04) of our sample Text is linked to its appropriate entry in the cuneiform writing system.

[Image: Text, spreading wings, scriptUnit]

Linking to Dictionaries

The words of each Text link to the corresponding entries of the appropriate dictionary or glossary in OCHRE’s Dictionaries category. In fact, OCHRE provides tools to parse the words of the Text and automatically link to existing dictionary entries, or create new entries as needed. In this way a project can build up a comprehensive corpus-based dictionary as Texts are added to OCHRE.

[Image: Text, spreading wings, dictionary]
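
The link-or-create behaviour described above can be sketched in a few lines. This is a simplification under stated assumptions: words are matched to entries by lowercased spelling only, whereas real lemmatization is far more involved, and the `link_words` function and entry layout are hypothetical. English words from the sample translation stand in for dictionary headwords.

```python
def link_words(words: list, dictionary: dict) -> list:
    """Link each word to a dictionary entry, creating entries as needed."""
    links = []
    for word in words:
        key = word.lower()  # naive matching; a stand-in for real lemmatization
        entry = dictionary.setdefault(key, {"lemma": key, "attestations": []})
        entry["attestations"].append(word)
        links.append((word, key))
    return links


# One entry already exists; the other is created during linking.
dictionary = {"flour": {"lemma": "flour", "attestations": []}}
links = link_words(["flour", "quarts", "Flour"], dictionary)

print(sorted(dictionary))
print(dictionary["flour"]["attestations"])
```

Each pass over a new Text both reuses existing entries and adds missing ones, which is how a corpus-based dictionary can accumulate as Texts are added.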

Prime of Life

Having successfully navigated the coming-of-age process, our Text is now ready for prime time. Publishing the Text is a simple matter of turning off the privacy setting and making it available through a public interface, along with all of the other content ready to be published.

Here, for example, is the Public View provided by the Persepolis Fortification Archive project, allowing users access to a large collection of published Texts, images, and related information.

[Image: Text, prime, public view]

OCHRE has many built-in Views of a Text, each presenting a different assortment of content. The Standard View is the most egocentric display, but provides links to a wide range of related items. The Description panel of this view, for example, links in:

  • Editors
  • Other comparable Text items (“Cf. …”)
  • The time Periods pertaining to this Text
  • The original Tablet artifact with its related Seal impressions
  • Other Texts found on the same Tablet (in this case the Aramaic epigraph)

[Image: Text, prime, description pane]

The Parallel View shows a familiar line-by-line format that includes the Transliteration (from the epigraphic analysis), the Phonemic view (from the discourse analysis), and the Translation. The Synchronized option, turned on in this example, uses the underlying relationships inherent in the epigraphic and discourse hierarchies to highlight the related components of each view.

[Image: Text, prime, parallel sync view]

Immortality

Working in close collaboration with the Digital Library Development Center (DLDC) at the University of Chicago Library, OCHRE is being adapted to support a comprehensive archival strategy for all OCHRE data items. OCHRE will package project data in a format suitable for dispatch to a secure archive where it can be preserved indefinitely.

We are still seeking immortality, but more details regarding the promise and the dream can be found here.
