
Data-Driven Policy Goes to Washington

By Rob Mitchum // April 10, 2014

Through civic hacking events and open data portals, the Obama administration has embraced the potential of data and programming to improve how government serves its citizens. As academia and industry increasingly move toward using computational techniques to inform policy decisions, these more ambitious efforts have also attracted the attention of the White House. On April 4th, the President’s Council of Advisors on Science and Technology (PCAST) convened a panel called “Analytical Techniques to Improve Public Policy Decision-Making” at its regular meeting, inviting CI Senior Fellow Charlie Catlett and three other experts to report on the promise of this young research area.

In front of a Washington, D.C., audience that included John Holdren, director of the White House Office of Science and Technology Policy, panelists discussed how simulation, analytics, machine learning, modeling, and large-scale social science experiments can revolutionize the way policy is created and tested, just as those techniques have already left their mark on physics, engineering, and biology.

PCAST vice-chair William Press set the tone by asking whether computational and analytic techniques originally developed to design spacecraft and rockets “can inform policymakers more directly, but also in some ways just have more relevance to the lives of individuals, of everybody.”

In his remarks, Catlett talked about how the CI’s Urban Center for Computation and Data was designed to address that challenge by building new partnerships between the physical and social sciences.

“There are a lot of people with passion for analyzing data that don’t have the training or the background to know the right questions to ask of that data when it comes to policy or in our case, cities,” Catlett said. “Then there’s a vast community of people who have just the right questions but are not aware of the data being available, and if they were they wouldn’t have the tools or methods to be able to analyze that data. So we’re trying to bridge between those two communities.”

To illustrate the potential of these collaborations, Catlett described the LakeSim project, which will provide modeling of energy, transportation, and other sectors for the designers of the 600-acre Chicago Lakeside Development. He also mentioned two projects with the City of Chicago: an upcoming effort to “instrument the city” with sensor boxes that collect new forms of data about urban life, and a predictive analytics platform that will help officials anticipate and address issues such as rat infestations or black-market cigarette sales before they happen.

Additionally, to train future researchers who can straddle the worlds of policy and analytics, Catlett described the Data Science for Social Good fellowship, which will bring in its second class of students this summer.

“There’s a generation of people here that are very interested in using data to make the world better, so we want to capitalize on that and train them to use these tools appropriately,” Catlett said. “This is a program that is not just about learning to use the tools, but trying to figure out how to take a business problem or opportunity and some data and put them together to learn something useful to a real organization.”

[You can watch the archived video of the entire panel at the PCAST website — Charlie Catlett’s remarks begin at 36:00.]

Other speakers on the panel talked about both the great potential of analytics for public policy and the tremendous difficulties this approach will face. In describing the challenges of modeling policy effects, Chris Barrett of the Virginia Bioinformatics Institute compared gene-environment interactions to the influence of society on individuals. Large-scale social dynamics, such as the economy, feed back into behavior at the individual level, a coupling that complicates the already complex agent-based models computer scientists build to simulate the behavior of millions or even billions of people.

“We’ll have to ask the question of how do we get populations of all these ‘you’ things that are not even local and interact them in all the ways they can interact for purposes of informing policy,” Barrett said. “Because you’re going to do things like have economic effects guide something else, which cascades to juvenile obesity. There’s a complicated chain of things going on there.”
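
The macro-to-micro coupling Barrett describes is straightforward to sketch, even if it becomes enormously demanding at the scale of millions or billions of agents. The toy Python model below is purely illustrative and not drawn from any of the panelists’ systems: each agent adjusts its own behavior in response to an aggregate, society-level signal that the agents themselves generate, so economy-wide conditions cascade back down to individual choices.

```python
# Toy agent-based model (illustrative only; not from the panel).
# A macro-level signal (an "economic index") is computed from all agents,
# then fed back into each agent's individual behavior -- the kind of
# macro-to-micro coupling Barrett describes.
import random

class Agent:
    def __init__(self):
        self.income = random.uniform(20_000, 120_000)
        self.activity = random.random()  # stand-in for an individual behavior

    def step(self, economic_index):
        # Individual behavior drifts slightly in response to the macro signal.
        drift = 0.01 if economic_index > 0.5 else -0.01
        noisy = self.activity + drift + random.gauss(0, 0.005)
        self.activity = min(1.0, max(0.0, noisy))

def economic_index(agents):
    # A crude aggregate: mean income, rescaled to roughly [0, 1].
    mean_income = sum(a.income for a in agents) / len(agents)
    return (mean_income - 20_000) / 100_000

def simulate(n_agents=10_000, n_steps=50):
    agents = [Agent() for _ in range(n_agents)]
    for _ in range(n_steps):
        index = economic_index(agents)   # macro state emerges from micro states
        for agent in agents:
            agent.step(index)            # macro state influences each agent
    return sum(a.activity for a in agents) / len(agents)

if __name__ == "__main__":
    print(f"mean activity after simulation: {simulate():.3f}")
```

Real models of the kind Barrett describes replace these hypothetical agents and signals with richer behavioral rules, social networks, and many interacting sectors, which is exactly where the computational and validation challenges arise.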

One industry that is already deep into research on the complex interactions between people and society is advertising, where there is an obvious profit motive to model and understand human behavior. Panelist Claudia Perlich, a scientist with digital advertising firm Dstillery, brought some of that industry’s perspective to the panel, talking about the use of machine learning techniques to both predict the future and test hypotheses about the past. But Perlich warned that data quality and domain expertise are very important for modeling, and that ever larger amounts of data don’t always make it easier to draw useful conclusions.

“Since the model is only as good as the data, you need to understand the data,” Perlich said. “If I drop a terabyte of data on you, there’s no understanding. There’s no way to really grasp and fully explain what’s going on.”

But even that cloudy situation is an improvement on the way social science has traditionally been performed, argued Duncan Watts of Microsoft Research. The proliferation of data about human life, much of it collected incidentally through communication, shopping, internet content, and other activity, offers exciting new ways to study policy and society and to examine theories that were previously impossible to test.

“There’s plenty of ideas about how the world works, but what you don’t see is a cohesive, cumulative, and empirically tested body of theoretical knowledge,” Watts said. “In some sense, we’ve generated an enormous amount of ideas about how the world might work, but we’re not coalescing on any cohesive ideas about how the world does work.”

While the recent explosion in data does create challenges, including siloed data, algorithmic confounds, and institutions not structured to conduct this type of research, the ability to run both “field experiments” and large-scale virtual experiments over the web may change social science as dramatically as landmark instruments have changed other fields.

“These two capabilities together, I think, have the ability to revolutionize not just society itself, but how we study society,” Watts said. “Those of us who are prone to metaphors point to the telescope as a potential analogy, but I also think the collider is another analogy of being able to do experiments on a large scale.”