Of course, states and localities already use data analytics for a lot of things. Departments of revenue, for instance, rely on it to curb tax fraud. Public schools use it to measure student performance and figure out how to boost grades and graduation rates. Cities turn to it to manage traffic congestion and monitor air pollution. But despite all of this, governments are still collecting vast amounts of data and, well, doing little or nothing with it. “A lot of time is spent and wasted trying to find the right data,” says Adnan Mahmud, founder and CEO of LiveStories, a firm that creates digital tools for visualizing data. “Very little time is spent exploring it.”
Mahmud estimates that government workers spend about 80 percent of their time trying to find data and only about 20 percent of their time analyzing it. “We need to flip that number,” he says. He and others argue that government needs a better way to sift through the data it collects and tell the story that lies behind it. Most important, it needs people who can analyze the data and diagnose what it means.
That’s where “citizen data scientists” come in. These people aren’t statisticians or analysts by training, nor are they coders -- the people who build apps from government data at hackathons. Rather, they are skilled workers who can generate predictive models or perform data analysis using new software tools and apps. The technology research firm Gartner predicts that as much as 40 percent of data science tasks will be either automated or conducted by these nonexperts by 2020.
In the public sector, citizen data scientists range from volunteers to government workers. Dr. Matt Willis, the public health officer for Marin County, Calif., uses citizen data scientists to tackle a range of problems, from finding better ways to manage the county’s emergency services to stemming the exploding opioid epidemic. He’s excited by the potential and says it’s increasingly important for governments to “provide tools that allow people who are not analysts to conduct analytical research.”
Marin County has begun using report cards that present data in what Willis calls a “storytelling” format so that everyone from county workers and government partners to policymakers can better understand the correlations among different sets of data. He hopes it’s a first step toward encouraging citizen volunteers to do their own analysis using the county’s open data and tools.
Tom Schenk, Chicago’s chief data officer, has similar hopes. He says that with the right planning, the city can get high-quality predictive models from nonexperts and volunteers. Already, Chicago has used citizen data scientists to predict, with a high degree of accuracy, when its beaches could be hit by an E. coli outbreak. The city also hopes to engage nonexperts in gauging how much rainwater runoff flows into its sewers and how much could be diverted by more environmentally friendly methods.
But some in government are wary about letting volunteers and nonexperts interpret data using dashboards and other analytical tools. These officials worry that citizen data scientists will see things the government doesn’t want them to see. Will a map, for instance, reveal awkward disparities in how rich and poor neighborhoods receive public funding? They also worry that the resulting correlations and predictions could be spurious or distracting. Already there’s a cottage industry of unusual and ridiculous correlations: one online meme jokingly links the release of Nicolas Cage movies to the number of swimming pool drownings in the U.S.
Mahmud, Willis, Schenk and others think these concerns can be addressed. If government presents the data in the proper context and sets the right parameters around a project, the prospect of something going wrong can be minimized, Willis says. “I believe we can benefit from the collective wisdom of the community.”
This story was originally published by Governing.