Project Aims to Better Use Municipal Open Data, Boost Equity

The EquiTensors project from the University of Washington takes the abundance of open data produced by government and transforms it so that not only is it useful, it's also equitable and promotes privacy.

MetroLab Network has partnered with Government Technology to bring its readers a segment called the MetroLab Innovation of the Month Series, which highlights impactful tech, data and innovation projects underway between cities and universities. To learn more or to reach the project leads, contact MetroLab at info@metrolabnetwork.org.

In this month’s installment of the Innovation of the Month series, we highlight EquiTensors, a project that reflects on and raises awareness of the applications, opportunities and potential misuses of data science and AI in mobility and transportation, specifically as they relate to race, equity and diversity. MetroLab’s Josh Schacht spoke with the leader of the project, Bill Howe, University of Washington associate professor in the Information School, adjunct associate professor in Computer Science and Engineering, and associate director and senior data science fellow at the UW eScience Institute.

Josh Schacht: Can you tell us about the origin of this project and the big-picture problem it is addressing?

Bill Howe: We began the EquiTensors project after observing that most open data is severely underused in practice. There are a lot of reasons for this — the data is of variable reliability, provenance and structure, so it’s often hard to use. So we started thinking about what people typically wanted to do with the data, and increasingly people want to train some predictive model.

But cities are complex systems, so everything interacts with everything else. Waste management services can depend on traffic, which can depend on income distributions, and even the weather. A trip planner app may need to work with hundreds of data sets to train a good model.

So we started thinking about the idea that we had thousands of data sets that were all different windows on city dynamics — what if we could gather these dynamics more directly? If so, we could enable predictive applications without requiring everyone to find, download and process hundreds of interrelated data sets. Instead, they could just work with our learned features, which we call EquiTensors.

Schacht: How does this project improve upon existing prediction methods?

Howe: Companies and agencies that use EquiTensors can reduce effort in downloading and processing multiple data sets, protect against making unfair predictions, reduce training time and improve accuracy by including signals from data sets they would not otherwise have access to. Moreover, it is sometimes surprising which data sets will have predictive power; the “kitchen sink” approach of EquiTensors eliminates this decision.

Schacht: Can you tell us about how this project focuses on equity and privacy?

Howe: A critical requirement was to combat discrimination. All city data reflects decades or more of historical discrimination. For example, racist redlining 100 years ago influences housing prices and racial demographics today. We can’t allow predictive models to propagate these signals. So, we adopt fair machine learning methods to reduce unwanted correlations between a sensitive attribute and other attributes. The result is that our learned features better represent data from the world we wish we had rather than the world we have.
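To make the idea of reducing unwanted correlations concrete, here is a minimal sketch in Python. It is not the EquiTensors method, which the interview does not detail; it swaps in a much simpler technique, linear residualization, that removes only the straight-line relationship between one feature and one sensitive attribute. The function names and the synthetic data are assumptions made for illustration.

```python
import random


def residualize(values, sensitive):
    """Remove the linear component of `values` explained by `sensitive`.

    Illustrative stand-in only: fair representation learning methods are
    generally more sophisticated than this single regression step.
    """
    n = len(values)
    mean_v = sum(values) / n
    mean_s = sum(sensitive) / n
    cov = sum((v - mean_v) * (s - mean_s) for v, s in zip(values, sensitive))
    var_s = sum((s - mean_s) ** 2 for s in sensitive)
    if var_s == 0:
        # A constant attribute carries no signal, so only center the values.
        return [v - mean_v for v in values]
    slope = cov / var_s  # least-squares slope of the feature on the attribute
    return [(v - mean_v) - slope * (s - mean_s)
            for v, s in zip(values, sensitive)]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Synthetic data (hypothetical): a feature that leaks a binary attribute.
random.seed(0)
sensitive = [random.randint(0, 1) for _ in range(500)]
feature = [2.0 * s + random.gauss(0, 1) for s in sensitive]
cleaned = residualize(feature, sensitive)

print("correlation before:", correlation(feature, sensitive))
print("correlation after: ", correlation(cleaned, sensitive))
```

After residualization the linear correlation with the sensitive attribute is zero up to floating-point error. Nonlinear dependence can remain, which is one reason learned approaches are used in practice.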

Although we do not provide strong privacy guarantees in the current iteration, the single point of control offered by EquiTensors provides an opportunity to expose private information safely without having to release the raw data publicly. That is, agencies and companies can train EquiTensors using private data, where only the learned features are exposed rather than the personally identifiable information. For an even stronger guarantee, we can adapt differential privacy methods to this setting (at the expense of utility).
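The trade-off between privacy and utility mentioned above can be illustrated with the Laplace mechanism, a standard building block of differential privacy. The sketch below is an assumption about how the idea works on a single count; the interview does not say which differential privacy method would be adapted to EquiTensors, and the function names and numbers are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when one person is added or
    removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)
true_count = 100  # hypothetical number of trips in one zone

# Smaller epsilon means stronger privacy and noisier answers.
for epsilon in (0.1, 1.0, 10.0):
    released = private_count(true_count, epsilon, rng)
    print(f"epsilon={epsilon}: released count = {released:.2f}")
```

Lower values of `epsilon` give stronger protection but push the released number further from the truth on average, which is the loss of utility the passage refers to.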

Schacht: What are the next steps for this project? Where do you see it going from here?

Howe: Our next steps include interpolating missing data in space and time using deep learning techniques, studying transfer learning so that EquiTensors trained in Seattle can be applied in Chicago, using EquiTensors to synthesize data in new contexts, and exploring the relationship between explainability and fairness.

Josh Schacht is the director of technology and strategy at MetroLab Network. He works to support MetroLab members and the civic research community as a whole in promoting evidence-based policy and local community engagement. Prior to his role at MetroLab, Josh was a solutions architect on the Master Data Management team at Katerra, working to leverage sustainable building materials to create efficient and affordable housing.