Despite well-publicized advantages, however, open data has many unknowns. For example, there are no unified publication standards yet, data formats vary by jurisdiction, and data comes with no guarantees that projects will do more than just provide transparency — a valuable trait, but an investment not always justifiable when budgets are tight.
On Sept. 24 Mark Headd, Philadelphia’s former chief data officer and now “developer evangelist” for Accela, offered advice at the 2014 Code for America Summit, an annual rally highlighting the group’s volunteer civic tech projects and campaigns. Headd delivered tips alongside fellow open data Developer Evangelist Seth Axthelm and Senior Product Manager Brian Gryth — a recent hire and founder of OpenColorado.
The team’s message focused on practical steps any government can use to start or improve open data programs — with primary emphasis given to problem-solving and open data value creation.
1. Start with Public Data
A fear for many entities publishing open data is the public or internal blowback that could result from unintentionally releasing private data, or data that's sensitive or too underdeveloped for release.
Axthelm said such dread is avoidable, especially for emerging programs, when data sets are chosen from information that has already been publicly released and are then converted into computer-friendly formats.
“One of the challenges that a lot of agencies run into when they’re trying to publish open data is the nervousness about getting that data out there,” Axthelm said. “The great thing about this [public data] is that someone has already gone through that approval process of saying ‘Yeah, it’s OK to put that data out there.’”
Examples of the practice include searching out commonly requested PDF documents, scanned images and other difficult-to-search document types. Meeting minutes, reports often locked in PDF form, are a common source of this information.
2. Identify Desire Lines
To explain the concept of a “desire line,” Headd projected an image of a green lawn bordered by sidewalks but with a trail cut diagonally down its center. The scarred grass, he said, is what happens when builders don’t think about, or anticipate, the behavioral demands of visitors.
“A desire line is what happens when there is a divergence between what you want people to do and what they want to do,” Headd said, crediting the idea to Andrew Turner, CTO at ESRI’s Research and Development Center.
In relation to open data, he said governments should try to find these separations, and then discover the underlying reasons why information is used. Governments can search their websites with basic analytics to identify which data sets may be in the highest demand. Indicators might be what pages are visited, links commonly clicked, frequent downloads and types of top referring sites. Google Analytics is a commonly used freemium tool for analysis, and there also are paid Web services like CrazyEgg that generate clicking heat maps for popular pages. Headd didn't endorse any specific analytics tool, but encouraged the study of open data efforts to prevent time-intensive demands on staff.
“The point that I want to leave you with here is that if you have information on your public website — and it’s valuable information and you don’t give people an easy way to get it — they will come and get it anyway,” he said. “And unfortunately, sometimes when they do that, they will be disruptive.”
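As a rough illustration of the analytics approach described above, the sketch below ranks pages by download count from an analytics export. The CSV layout, column names and figures are hypothetical and don't come from Google Analytics, CrazyEgg or any real site. A real export would need its own column mapping.

```python
import csv
import io
from collections import Counter

# Hypothetical analytics export: one row per page with view and download counts.
# Column names and numbers are illustrative only, not from any real tool or site.
SAMPLE_EXPORT = """page,views,downloads
/permits/search,12400,310
/council/minutes.pdf,8300,5200
/budget/2014.pdf,4100,2900
/parks/hours,9800,40
"""


def rank_by_demand(export_text, top_n=3):
    """Return the top_n (page, downloads) pairs, highest download count first."""
    downloads = Counter()
    for row in csv.DictReader(io.StringIO(export_text)):
        downloads[row["page"]] += int(row["downloads"])
    return downloads.most_common(top_n)


ranked = rank_by_demand(SAMPLE_EXPORT)
for page, count in ranked:
    print(f"{page}: {count} downloads")
```

In this made-up sample, the two PDF documents draw the most downloads, which would make them candidates for release in a machine-readable format.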
3. Look for Offline Indicators
Web metrics aren’t the only pointers for selecting open data sets. Indicators of high-value data can come from department staff themselves, such as data typically requested between departments, or from information requests arriving outside the Internet entirely.
The federal Freedom of Information Act (FOIA) and state laws such as New York’s Freedom of Information Law (FOIL) are two such outside sources. Both laws require the governments they cover to provide the public, on request, with records that aren’t exempt from disclosure, for example on privacy grounds.
Open data can help meet that demand by offering one repository for common requests. Reinvent Albany, a government transparency group in the state of New York, analyzed the State Department of Environmental Conservation’s 2013 FOIL requests. Of the 3,977 total requests, Headd said, 50 percent were attributed to only six data sets. The takeaway: Had those data sets been openly available online, the request workload (phone calls, research and analysis) could have been cut roughly in half.
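The same kind of tally can be sketched for any request log. The example below is a minimal sketch that computes the share of requests falling on the most-requested data sets. The data set names and sample counts are hypothetical. Only the 3,977 total and the 50 percent share come from the figures Headd cited.

```python
def top_n_share(request_counts, n):
    """Fraction of all requests that fall on the n most-requested data sets."""
    total = sum(request_counts.values())
    top = sorted(request_counts.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical tallies from a request log (names and counts are made up).
sample_counts = {
    "dataset_a": 400,
    "dataset_b": 300,
    "dataset_c": 200,
    "dataset_d": 100,
}
share = top_n_share(sample_counts, 2)
print(f"Top 2 data sets account for {share:.0%} of sample requests")

# Figures reported in the article.
TOTAL_REQUESTS = 3977
REPORTED_TOP_SHARE = 0.50
estimated_requests = int(TOTAL_REQUESTS * REPORTED_TOP_SHARE)
print(f"About {estimated_requests} of {TOTAL_REQUESTS} requests involved six data sets")
```

Applied to the reported figures, the arithmetic puts the six data sets at just under 2,000 requests for the year.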
“Think about all that time and money you could recoup just by paving over that desire line,” Headd said.