Now, it seems counterintuitive that a massive government count of citizens would lead to better privacy practices for cities, counties and states. A number of privacy advocates and experts in the space, however, say that outcome is not only possible but perhaps even likely. And it all has to do with something called differential privacy.
Differential privacy is an approach to releasing open data about the public in which small, deliberate changes, typically random statistical noise, are made so that individuals can’t be identified within the information. For example, a city might release data on how frequently rideshare programs are used in certain areas, while at the same time changing the information just enough to obscure who is using rideshares and where, without compromising how useful the data is for observers.
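To make the rideshare example concrete, here is a minimal sketch of one common way to do this, the Laplace mechanism, which adds random noise to each published count. It is an illustration under stated assumptions, not the method any particular city or the Census Bureau uses: the area names, the counts, the `epsilon` value and the function names are all made up, and the sketch assumes each person contributes at most one trip to one area.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, epsilon, rng=None):
    # Assumption: adding or removing one person changes a single count by at
    # most 1, so the noise scale is 1 / epsilon. A smaller epsilon means more
    # noise and stronger privacy, at the cost of less accurate counts.
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {area: count + laplace_noise(scale, rng)
            for area, count in true_counts.items()}


if __name__ == "__main__":
    # Hypothetical weekly rideshare pickups per area (made-up numbers).
    pickups = {"Area A": 1200, "Area B": 37, "Area C": 4}
    for area, value in noisy_counts(pickups, epsilon=0.5).items():
        print(area, round(value))
```

The trade-off shows up in the small areas. Noise of a few trips barely changes a count of 1,200, but it can swamp a count of 4, which is exactly where a single rider would otherwise be easiest to spot.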
Two factors have complicated this in recent years: accelerating technology and an increase in the available data, said Ginger Armbruster, chief privacy officer for Seattle, which is the only City Hall in the country to have such a role. Basically, Armbruster notes that the technology readily available to collect and aggregate data is far more advanced than it has ever been, and the number of open data sets is larger than ever, too. So, while a city might carefully consider and vet privacy for data it releases, that data can still ultimately identify individuals when combined with other available sets published by other entities.
“It’s not the data on its own,” Armbruster said. “It’s the data in conjunction with other data sets joined together by location data.”
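A toy sketch shows how that kind of joining can work. Everything here is hypothetical: the two data sets, the field names, the block codes and the `link_by_location` helper are invented for illustration, and real linkage attacks tend to draw on many more fields than this.

```python
# Hypothetical trip records released by a city, with no names attached.
trips = [
    {"trip_id": 1, "pickup_block": "B12", "hour": 8},
    {"trip_id": 2, "pickup_block": "B12", "hour": 23},
    {"trip_id": 3, "pickup_block": "C07", "hour": 8},
]

# Hypothetical second data set, published by a different entity, that
# happens to list names alongside a home block.
residents = [
    {"name": "Resident X", "home_block": "B12"},
    {"name": "Resident Y", "home_block": "C07"},
    {"name": "Resident Z", "home_block": "C07"},
]


def link_by_location(trips, residents):
    # Group names by block, then keep only the trips whose pickup block
    # matches exactly one resident, so the match is unambiguous.
    names_by_block = {}
    for person in residents:
        names_by_block.setdefault(person["home_block"], []).append(person["name"])
    linked = []
    for trip in trips:
        candidates = names_by_block.get(trip["pickup_block"], [])
        if len(candidates) == 1:
            linked.append((trip["trip_id"], candidates[0]))
    return linked


if __name__ == "__main__":
    print(link_by_location(trips, residents))
```

Neither data set names a rider on its own. Joined on location, the two trips from block B12 point to its only listed resident, while the trip from block C07 stays ambiguous because two people live there.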
It’s called the mosaic effect, and it’s a major challenge for the Census, which releases geographic data that is used for the next decade to determine federal funding and political representation, as well as for private-sector research factored into community-altering decisions such as where to locate new businesses. Essentially, during the 200-some years the Census has been constitutionally mandated, America as a nation has come to rely heavily on the insights the Census provides about our communities.
The complication now is the acceleration of tech and available data that Armbruster describes. When it comes to these two areas, we live in a vastly different world than we did when the last Census took place in 2010. Back then, the mosaic effect didn’t yet exist, or at least wasn’t discussed enough to have a name put to it. The risk with identifiable Census data is that it can be used to violate public trust, enabling things like landlords spotting tenants in violation of leases, or other government agencies violating legal Census confidentiality to find people who live in the country without citizenship.
danah boyd is a technology and social media scholar who is also a partner researcher at Microsoft Research, as well as a visiting professor on such matters at New York University. boyd is one of the many privacy experts paying close attention as the Census Bureau builds its differential privacy protections for 2020. boyd said the Census approach is still coming together before the April count, but there is curiosity and optimism around the finished product.
The reason is that the Census Bureau has far more resources than local governments that work on differential privacy, as well as a clearer sense of purpose than those who do so within the private sector. The Census also has a long history of commitment to confidentiality, albeit one that is yet to be tested by today’s high-tech climate. Our online access to information in 2020 is a far cry from 1920, when those who wanted Census data had to send a physical letter to the government asking for it.
Still, for boyd it is optimism that’s currently winning the day.
“This is going to be transformative,” boyd said. “It will open the possibility of making available data sets across government that have never been available before. From an innovation place, I really want to see the Census Bureau succeed, because the possibility for using data more generally is really profound.”
boyd is not alone in this optimism.
Kelsey Finch is the senior counsel for the Future of Privacy Forum, a think tank based in Washington, D.C., and she has advised Seattle on its sector-leading local government privacy policies. Finch described the Census’ identification disclosure avoidance as a “gold standard” within government, in that it has long struck the right balance between keeping its information useful and protecting individual privacy.
Since the last Census, Finch said companies like Apple, Google and Microsoft have engaged in relatively minor experiments related to differential privacy, just not on the level of what’s coming from the Census in 2020.
“This will be the biggest thing we’ve seen at scale,” Finch said, “the most important rollout of differential privacy in the world. There’s a great chance of success and lots of lessons being learned. They’ve been thoughtfully working toward this for a number of years now.”
What is perhaps most exciting for local, state and county governments — as well as the citizens and communities they serve — is that the lessons learned by the Census will be a first step toward making unprecedented differential privacy tools accessible to smaller agencies, too.
There’s certainly a need for that. When Finch’s organization worked with Seattle a few years back, differential privacy was key to the recommendations it offered that city. As open data work has accelerated in places like Philadelphia, Boston and Chicago, chief data officers have increasingly factored it into projects there as well.
While guidance in the matter can come from a number of sources, Finch said the Census is easily the most practical example of what works and what doesn’t. What it all adds up to is perhaps the most exciting time for folks standing at the intersection of the vast potential offered by open data and the caution inherent to matters of individual privacy.
“We have been talking about this for ages,” Finch said. “I used to get together with other privacy wonks, and we’d cross our fingers and dream of the day.”