By Rob Mitchum // April 21, 2017
Last summer, Data Science for Social Good fellows dug into the program’s first-ever infrastructure project: using data to predict water main breaks in Syracuse, New York. Failing pipes might not get the headlines of other DSSG topics, such as police misconduct, education, and lead poisoning, but the failure of these systems is a huge financial drain on cities and causes major headaches for businesses and residents. Partnering with the Syracuse i-team, DSSG and the Center for Data Science and Public Policy developed an algorithm that finds the water mains most likely to break in the near future, so that the city can make proactive repairs rather than responding to catastrophe.
In Politico Magazine, reporter Debra Bruno used this unique collaboration as an example of how cities increasingly turn to data and mathematicians to help solve complex and costly urban problems. The piece describes how the team gathered data from a variety of sources — including decades-old handwritten notebooks — and created a model that helps direct Syracuse crews.
This machine-learning system, an application of artificial intelligence, homed in on 50 (out of 5,263) of the city’s most break-prone blocks and pointed to 32 blocks that were most likely to break in the next three years.
To get to that formula, researchers applied a series of factors—age of pipes, construction material, previous breaks and pipe dimensions—to breaks that happened in the past as a way to “predict the past,” or test whether the formula, working blind, could accurately guess which mains would break. Rayid Ghani, director of the University of Chicago’s Data Science for Social Good summer fellowship, says, “If you have 10 years of data, you take nine years and hide the tenth year from the system. So you pretend it’s 2015 and you try to predict what would have happened.”
One surprise in the findings, notes Ghani, was that pipes that had broken recently tended to be more likely to break again, possibly because of some intrinsic flaw that hadn’t been corrected with a repair. Keep in mind that the city expects to see 500 to 600 breaks over the next few years, says data officer Edelstein. When the city does replace some mains in the 32 hotspots, “we’d be pretty sure we are replacing the ones most likely to break,” he says.
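The "predict the past" test Ghani describes can be sketched in a few lines of Python. This is only an illustration of the idea of hiding the most recent year and scoring against it, not the DSSG model: the break log, the block names, and the recency-weighted score are all made up for the example, and the real system used more factors (pipe age, material, dimensions) and a machine-learning model.

```python
# Hypothetical break log: (block_id, year of break). Not real Syracuse data.
BREAKS = [
    ("A", 2012), ("A", 2014), ("A", 2015),
    ("B", 2013),
    ("C", 2010),
    ("D", 2014), ("D", 2015),
    ("E", 2011), ("E", 2015),
]
ALL_BLOCKS = ["A", "B", "C", "D", "E", "F"]


def temporal_split(breaks, holdout_year):
    """Separate breaks before the held-out year from breaks during it."""
    history = [(b, y) for b, y in breaks if y < holdout_year]
    hidden = {b for b, y in breaks if y == holdout_year}
    return history, hidden


def score_blocks(history, blocks, holdout_year):
    """Toy risk score (an assumption): recent past breaks count for more."""
    scores = {b: 0.0 for b in blocks}
    for block, year in history:
        scores[block] += 1.0 / (holdout_year - year)
    return scores


def precision_at_k(scores, hidden, k):
    """Fraction of the k highest-scored blocks that broke in the hidden year."""
    top = sorted(scores, key=lambda b: (-scores[b], b))[:k]
    hits = sum(1 for b in top if b in hidden)
    return top, hits / k


# Pretend it is the start of 2015: the model only sees breaks before then.
history, hidden = temporal_split(BREAKS, 2015)
scores = score_blocks(history, ALL_BLOCKS, 2015)
top, precision = precision_at_k(scores, hidden, k=3)
print("flagged blocks:", top)
print("blocks that actually broke in 2015:", sorted(hidden))
print("precision at 3:", round(precision, 2))
```

The key design point is that the score is computed only from `history`, so the held-out year never leaks into the prediction. The flagged blocks are then compared with what really happened, much as the 32 hotspots are being compared with actual breaks.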
So far, the model has performed well, the story reports: 21 water main breaks occurred in the “hotspots” identified by the DSSG model. The team also shared the code on GitHub, so that other cities facing similarly aging infrastructure can apply it to their own data. “We should be able to solve this problem for any city,” 2016 fellow and current DSaPP postdoc Avishek Kumar told the reporter.
For more on the water main project, read a blog post and watch the presentation below.