OCHRE R Server for Data Analysis and Visualization

An important component of the OCHRE platform is an R server that is tightly integrated with the OCHRE database (see the System Design page of this website). R is a programming language and environment for statistical computing and graphics. It is free and open-source and is now widely used for data analysis. OCHRE users perform statistical analyses and create visualizations on an R server that is hosted and supported by the University of Chicago’s Research Computing Center, whose staff have expertise in high-performance computing and maintain the University’s computing cluster. The OCHRE R server is implemented with Rserve, a TCP/IP server that lets client programs use R's facilities without linking to the R library themselves.
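To make "the Rserve TCP/IP mechanism" concrete, the sketch below shows what a client sends when it asks Rserve to evaluate one R expression, following the QAP1 message layout in the Rserve protocol description: a 16-byte header (command, payload length, data offset, high length bits) followed by a typed parameter. It is not OCHRE's client; the OCHRE GUI is written in Java, and a Java program would typically use the Java client distributed with Rserve rather than raw sockets. The function names are hypothetical, port 6311 is Rserve's default, and `rserve_eval` assumes a server that does not require a login. The result is returned still encoded; decoding R objects is left out.

```python
import socket
import struct

# Codes as listed in the Rserve QAP1 protocol description.
CMD_EVAL = 0x003        # evaluate an R expression and return the result
DT_STRING = 4           # parameter type: null-terminated string
RESP_OK = 0x10001       # CMD_RESP | 0x0001

def encode_string_param(text: str) -> bytes:
    """DT_STRING parameter: 4-byte header (1-byte type, 24-bit length),
    then the string, null-terminated and zero-padded to a multiple of 4."""
    body = text.encode("utf-8") + b"\x00"
    body += b"\x00" * (-len(body) % 4)
    header = struct.pack("<I", DT_STRING | (len(body) << 8))
    return header + body

def build_eval_message(expr: str) -> bytes:
    """QAP1 message: little-endian header (command, length low 32 bits,
    data offset, length high 32 bits), then the payload."""
    payload = encode_string_param(expr)
    return struct.pack("<IIII", CMD_EVAL, len(payload), 0, 0) + payload

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return partial chunks."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by Rserve")
        buf += chunk
    return buf

def rserve_eval(host: str, expr: str, port: int = 6311) -> bytes:
    """Send one expression; return the raw (still encoded) result payload.
    Hypothetical helper; assumes no authentication is configured."""
    with socket.create_connection((host, port)) as sock:
        ident = recv_exact(sock, 32)            # server ID string, starts with b"Rsrv"
        if not ident.startswith(b"Rsrv"):
            raise ConnectionError("not an Rserve server")
        sock.sendall(build_eval_message(expr))
        cmd, lo, _offset, hi = struct.unpack("<IIII", recv_exact(sock, 16))
        body = recv_exact(sock, lo | (hi << 32))
        if cmd & 0xFFFFFF != RESP_OK:           # top byte carries an error code
            raise RuntimeError(f"Rserve error response 0x{cmd:08x}")
        return body
```

For example, `build_eval_message("mean(1:10)")` yields a 32-byte message: the 16-byte header plus a 16-byte string parameter (4-byte header, 10 characters, a terminating zero, and one padding byte).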

In addition to the built-in R functions, many pre-written R packages are available for a wide variety of procedures, ranging from simple univariate and bivariate statistics to complex multivariate statistics, as well as specialized kinds of data analysis such as natural language processing (NLP), social network analysis (SNA), spatial analysis, and machine learning. R packages can make use of code libraries written in other languages such as FORTRAN, C/C++, Java, or Python. Thus R also provides a way to run Python code (e.g., code that uses the NumPy and SciPy libraries) if a project wishes to do so.

The OCHRE Java GUI lets end users set up and run queries to retrieve data from the data warehouse and display it in various ways. They can analyze the query results by formatting them as R data frames and sending them to the R server together with the R commands that perform the desired analytical procedures on the server. The numerical and graphical results of the analysis are sent back from the R server to the OCHRE GUI, where the user can save them in the data warehouse as resources for later use, if desired. These interactions among the various components of the OCHRE platform are illustrated in the system diagram below.
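One simple way to send query results "formatted as R data frames" is to turn them into R source text that the server evaluates. The sketch below assumes the query results arrive as a dictionary mapping column names to lists of values; that shape and the names `r_literal` and `data_frame_command` are hypothetical, not OCHRE's actual format or code. It only shows the formatting step; Rserve can also receive binary-encoded R objects directly, which avoids re-parsing large tables.

```python
import math

def r_literal(value) -> str:
    """Render one Python value as an R literal; None and NaN become NA."""
    if value is None:
        return "NA"
    if isinstance(value, bool):          # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        if isinstance(value, float) and math.isnan(value):
            return "NA"
        if isinstance(value, float) and math.isinf(value):
            return "Inf" if value > 0 else "-Inf"
        return repr(value)               # R reads 3, 12.5 and 1e-05 as numeric
    # Strings: escape backslashes, quotes and newlines for an R string literal.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'

def data_frame_command(name: str, columns: dict) -> str:
    """Build an R assignment creating a data frame from {column: [values]}.
    Assumes column names contain no backticks."""
    parts = []
    for col, values in columns.items():
        vector = "c(" + ", ".join(r_literal(v) for v in values) + ")"
        parts.append(f"`{col}` = {vector}")   # backticks allow non-syntactic names
    return (f"{name} <- data.frame({', '.join(parts)}, "
            "stringsAsFactors = FALSE, check.names = FALSE)")

# Usage with made-up query results:
rows = {"locus": ["A1", "B2"], "weight_g": [12.5, None]}
print(data_frame_command("finds", rows))
# finds <- data.frame(`locus` = c("A1", "B2"), `weight_g` = c(12.5, NA), stringsAsFactors = FALSE, check.names = FALSE)
```

The generated command would then be sent to the server followed by the analysis commands, for example `summary(finds$weight_g)`.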

A Data-aware R Console and Analytical Workflow Scripts

Users who know R can enter commands directly into a data-aware R console inside the OCHRE Java GUI. They can save the R commands they have entered for repeated use. Commands in the console allow them to submit data to the R server from external CSV and Excel files or from dynamic OCHRE queries. Outputs from the R server are then displayed to the user in a separate window. These outputs can be named and saved in the data warehouse.

In addition to, or instead of, entering commands in the R console, a project can use YAML or JSON to script multi-step analytical workflow jobs that (1) perform OCHRE queries; (2) execute R functions to analyze the query results; and (3) specify the outputs to be returned from the R server (PDFs, images, etc.). These workflows can be named and saved by a project for use by people who do not know R or do not want to write their own scripts. When a workflow script is executed, the user is prompted to specify any external files to be used in the analysis and to supply run-time arguments to pass to the parameters of the chosen queries and R functions, in order to customize them for the current job. The progress of the job is echoed in the R console window. Scripted workflow jobs can be chained, such that the output of one job is the in-memory input (data frame) for the next. Both the workflow scripts and the outputs can be named and saved in the data warehouse as resource items for repeated use.
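The workflow mechanism described above (declared run-time parameters, a query step, an R step that consumes the query output, and requested outputs) can be sketched as a small interpreter. OCHRE's actual script schema is not given here, so the JSON layout (keys such as `params`, `steps`, `kind`, `input`, and `returns`) and the `run_query` and `run_r` callbacks are hypothetical. The sketch checks that every declared parameter was supplied, substitutes `$name` placeholders with run-time arguments, and passes each step's output to a later step as its in-memory input. JSON is used because Python parses it without third-party packages; a YAML script would produce the same structure once loaded.

```python
import json

# Hypothetical workflow schema; OCHRE's real script format may differ.
WORKFLOW_JSON = """
{
  "name": "weight-summary",
  "params": ["phase"],
  "steps": [
    {"id": "finds", "kind": "query", "query": "finds-by-phase",
     "args": {"phase": "$phase"}},
    {"id": "summary", "kind": "r", "function": "summary", "input": "finds",
     "returns": ["pdf"]}
  ]
}
"""

def bind(value, run_args):
    """Replace a "$name" placeholder with the matching run-time argument."""
    if isinstance(value, str) and value.startswith("$"):
        return run_args[value[1:]]
    return value

def run_workflow(spec, run_args, run_query, run_r):
    """Run steps in order; run_query and run_r are caller-supplied
    (hypothetical) callbacks that talk to the database and the R server."""
    missing = [p for p in spec.get("params", []) if p not in run_args]
    if missing:
        raise ValueError(f"missing run-time arguments: {missing}")
    results = {}
    for step in spec["steps"]:
        args = {k: bind(v, run_args) for k, v in step.get("args", {}).items()}
        if step["kind"] == "query":
            results[step["id"]] = run_query(step["query"], args)
        elif step["kind"] == "r":
            data = results[step["input"]]  # chaining: an earlier output is this input
            results[step["id"]] = run_r(step["function"], data, args,
                                        step.get("returns", []))
        else:
            raise ValueError(f"unknown step kind: {step['kind']}")
        print(f"[{spec['name']}] step '{step['id']}' finished")  # progress echo
    return results

# Usage with stub callbacks standing in for OCHRE and Rserve.
spec = json.loads(WORKFLOW_JSON)
out = run_workflow(
    spec, {"phase": "II"},
    run_query=lambda name, args: {"weight_g": [12.5, 8.0, 3.5]},
    run_r=lambda fn, data, args, returns: (fn, len(data["weight_g"]), returns),
)
print(out["summary"])  # ('summary', 3, ['pdf'])
```

Keeping every step's result in `results` by id, rather than only the last one, lets a later step name any earlier output as its input, which is one way to support chained jobs.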

In addition to performing analyses via the OCHRE Java GUI, saved queries and analytical workflows can be run on published data by means of separate Web apps that interact with the OCHRE data warehouse via the OCHRE Web API (this mechanism is currently under development). A project team can create, name, and save queries and analytical workflow scripts for repeated use and publish them for use in Web apps. App users will select the named query or workflow they wish to run and will then be prompted for the run-time arguments (e.g., search criteria) to be passed to the chosen queries and R functions. The results of the query or statistical analysis will then be passed back to the app for display to the user.