A New Algebra of Data Visualization | Computation Institute

By Rob Mitchum // March 16, 2015

Data visualization sits within the triangle of science, art, and communication. That’s part of what makes it an exciting field, particularly in today’s information-heavy age. But these split loyalties also create a delicate balance that must be struck to produce a successfully communicative visualization. Fall too far on the science side, and the graphic may be too technical or dry for a general audience. Too far on the art side, and the nuances and details of the underlying information are lost beneath the design.

Many design experts have attempted to plot a safe path through these hazards with theories about the proper grammar and practice of data visualization. But in a 2014 paper and presentation at the IEEE VIS meeting, CI fellow and faculty member Gordon Kindlmann proposed a new, mathematically driven approach to creating better representations of data: Algebraic Visualization.

Developed with Carlos Scheidegger from the University of Arizona, algebraic visualization is built on the central rule that changes in the data should be reflected by perceptually equivalent changes in the graphic. By providing a new “working vocabulary,” it gives scientists and other people who work with complex data a strategy for making sure their visualizations make sense.

“We’re tying together recent work on perception with the tools and applications of visualization,” Kindlmann said. “We hope this empowers everyone who uses data to create good visualizations, with a way of mathematical thinking that allows people to test different variations out.”

In an online primer on the theory, Kindlmann and Scheidegger give examples of what they call “confusers” and “jumblers”: visualization choices that fail to communicate changes in data accurately. Using a 2014 county-level election map of Virginia, they demonstrate how color schemes that represent the actual election results well can still fail under other circumstances, such as when the results are reversed or narrowed.

For example, a simple red-blue colormap looks the same whether the margin of victory in each county was 10% or 1%, a “confuser” that fails to communicate the closeness of each race. A red-purple-blue colormap fixes that problem, but when the data is “flipped” (e.g., a 55-45 race becomes 45-55), it is hard to tell the before and after apart, since close races are represented by hard-to-distinguish shades of purple. This dilemma represents a “jumbler,” where the data change is hard to track with human perception. (A red-gray-blue colormap, as seen at the top of this article, fixes both these problems.)
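
The confuser in this example can be checked mechanically. The Python sketch below is a minimal illustration under stated assumptions, not code from Kindlmann and Scheidegger's primer: the two colormap functions, the `narrow` transformation, the `is_confuser` helper, and the county margins are all hypothetical. It covers only the confuser case, since detecting a jumbler would require a model of human color perception.

```python
# Illustrative sketch only: colormaps, margins, and helper names are made up,
# not taken from Kindlmann and Scheidegger's primer.

def two_color(margin):
    """Red-blue map: the colour depends only on which side won."""
    return (0.0, 0.0, 1.0) if margin >= 0 else (1.0, 0.0, 0.0)

def red_purple_blue(margin):
    """Blend linearly from red (margin -1) through purple (0) to blue (+1)."""
    t = (margin + 1.0) / 2.0
    return (1.0 - t, 0.0, t)

def narrow(margins, factor=0.1):
    """Hypothetical data change: shrink every margin of victory."""
    return [m * factor for m in margins]

def render(colormap, margins):
    """The 'visualization': one RGB colour per county."""
    return [colormap(m) for m in margins]

def is_confuser(colormap, margins, change):
    """True if `change` alters the data but leaves the picture identical."""
    changed = change(margins)
    same_picture = render(colormap, changed) == render(colormap, margins)
    return changed != margins and same_picture

# Margin = blue share minus red share; 0.10 means a 55-45 blue win.
county_margins = [0.10, -0.10, 0.30, -0.02]

confused_two_color = is_confuser(two_color, county_margins, narrow)
confused_blend = is_confuser(red_purple_blue, county_margins, narrow)
print(confused_two_color, confused_blend)
```

Run as written, this prints `True False`: narrowing every race by a factor of ten leaves the two-color map untouched, while the blended map shifts toward purple.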

Algebraically, these flaws can be described as a mismatch between two changes, alpha and omega. A change in the data (alpha) should be accompanied by an equivalent change in the visualization (omega). If that’s not the case, then more work is needed to create an optimal visualization.
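
One way to write this relationship down is sketched here. The notation is assumed for illustration: d stands for a dataset and v for the mapping from data to picture, neither of which is named in this article. Alpha is read as the change applied to the data, and omega as the matching change in the picture.

```latex
% Assumed notation: d is a dataset, v maps data to a picture.
% Changing the data and then drawing it should match
% drawing it and then applying the corresponding visual change.
\[
  v(\alpha(d)) = \omega(v(d))
\]
% A "confuser": the data changes but the picture does not,
% so omega is forced to be the identity.
\[
  \alpha(d) \neq d \quad\text{but}\quad v(\alpha(d)) = v(d)
\]
```

In this reading, a jumbler is the case where omega is not the identity but is hard for a viewer to recognize as the counterpart of alpha.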

What may sound like common sense is a paradigm shift for the field of visualization, which has historically operated on subjective opinions of the “best” practices instead of mathematically based rules. Visualization software or online tools will often present users with predetermined color schemes and templates, without much explanation about why they were chosen or under what conditions they are best used. As a result, many people go with whichever designs look best, without considering the potential for confusers and jumblers to create inaccurate or misleading visualizations.

At the moment, algebraic visualization is just a theory: a new perspective and framework for the field, and a guiding principle for further research on human perception and the accurate communication of information. But some previous attempts to create theories of visualization have eventually resulted in software, such as Tableau, or programming libraries, such as ggplot2, that translate a philosophy of graphic design into a user-friendly tool for visualizing data. Kindlmann said that any such applications of his work with Scheidegger are in the distant future, but that new, mathematically based visualization tools will be increasingly important as researchers work with larger and more complex sets of data.

“We need visualizations not just for more data, but for more kinds of data,” Kindlmann said. “We need to create new kinds of visualizations and new tools to make them quickly and easily.”