In this month’s installment of the Innovation of the Month series, we highlight EquiTensors, a project that reflects on and raises awareness of the applications, opportunities and potential misuses of data science and AI in mobility and transportation, specifically as they relate to race, equity and diversity. MetroLab’s Josh Schacht spoke with the project’s leader, Bill Howe, an associate professor in the University of Washington Information School, adjunct associate professor in Computer Science and Engineering, and associate director and senior data science fellow at the UW eScience Institute.
Josh Schacht: Can you tell us about the origin of this project and the big-picture problem it is addressing?
Bill Howe: We began the EquiTensors project after observing that most open data is severely underused in practice. There are many reasons for this: the data varies in reliability, provenance and structure, so it’s often hard to use. So we started thinking about what people typically want to do with the data, and increasingly, what they want is to train a predictive model.
But cities are complex systems, so everything interacts with everything else. Waste management services can depend on traffic, which can depend on income distributions and even the weather. A trip planner app may need to work with hundreds of data sets to train a good model.
So we started from the idea that our thousands of data sets were all different windows onto the same city dynamics. What if we could capture those dynamics more directly? If so, we could enable predictive applications without requiring everyone to find, download and process hundreds of interrelated data sets. Instead, they could just work with our learned features, which we call EquiTensors.
Schacht: How does this project improve upon existing prediction methods?
Howe: Companies and agencies that use EquiTensors can reduce the effort of downloading and processing multiple data sets, protect against making unfair predictions, reduce training time and improve accuracy by including signals from data sets they would not otherwise have access to. Moreover, it is sometimes surprising which data sets turn out to have predictive power; the “kitchen sink” approach of EquiTensors removes the need to make that choice up front.
Schacht: Can you tell us about how this project focuses on equity and privacy?
Howe: A critical requirement was to combat discrimination. All city data reflects decades or more of historical discrimination. For example, racist redlining from 100 years ago still influences housing prices and racial demographics today. We can’t allow predictive models to propagate these signals. So we adopt fair machine learning methods to reduce unwanted correlations between a sensitive attribute and other attributes. The result is that our learned features better represent data from the world we wish we had rather than the world we have.
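The interview doesn’t name the specific fair machine learning method EquiTensors uses. As a rough illustration of what “reducing unwanted correlations between a sensitive attribute and other attributes” can mean, the Python sketch below uses a much simpler stand-in technique: it regresses each feature on the sensitive attribute and keeps only the residual, which removes the linear correlation (but not any nonlinear dependence). The function names and the toy census-tract numbers are hypothetical, invented for illustration.

```python
from statistics import fmean


def decorrelate(features, sensitive):
    """Remove the part of each feature column that is linearly explained by
    the sensitive attribute, keeping ordinary least-squares residuals.

    Hypothetical helper: a simple linear stand-in, not the EquiTensors method.
    """
    s_mean = fmean(sensitive)
    s_centered = [s - s_mean for s in sensitive]
    s_var = sum(c * c for c in s_centered)
    cleaned = [list(row) for row in features]
    if s_var == 0:
        # A constant attribute carries no linear signal to remove.
        return cleaned
    for j in range(len(features[0])):
        column = [row[j] for row in features]
        # Slope of the least-squares fit of this feature on the sensitive attribute.
        beta = sum(c * x for c, x in zip(s_centered, column)) / s_var
        for i, c in enumerate(s_centered):
            # Subtract the fitted linear component; the column mean is preserved.
            cleaned[i][j] = column[i] - beta * c
    return cleaned


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy data: share of a demographic group per tract, and per-tract
    # [median rent, transit stops]. Numbers are made up for illustration.
    sensitive = [0.1, 0.4, 0.35, 0.8, 0.6]
    features = [[1200, 3], [950, 5], [1000, 4], [700, 9], [820, 6]]
    cleaned = decorrelate(features, sensitive)
    for j, name in enumerate(["rent", "stops"]):
        before = correlation([row[j] for row in features], sensitive)
        after = correlation([row[j] for row in cleaned], sensitive)
        print(f"{name}: correlation {before:+.3f} -> {after:+.3f}")
```

Linear residualization is only a first approximation; fair representation learning methods typically also try to remove nonlinear dependence, usually at some cost in predictive accuracy.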
Although we do not provide strong privacy guarantees in the current iteration, the single point of control that EquiTensors offers creates an opportunity to share the signal in private data safely, without having to release the raw data publicly. That is, agencies and companies can train EquiTensors using private data, so that only the learned features are exposed rather than the personally identifiable information. For an even stronger guarantee, we can adapt differential privacy methods to this setting (at the expense of utility).
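Howe doesn’t specify which differential privacy method would be adapted. As a minimal sketch of the privacy/utility trade-off he mentions, the classic Laplace mechanism below adds noise with scale sensitivity/epsilon to a single released value: a smaller epsilon gives a stronger guarantee but noisier output. Applying differential privacy to trained features would take more machinery than this. The function name and the trip count are hypothetical.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon.

    This gives epsilon-differential privacy for a query whose output changes
    by at most `sensitivity` when one person's record is added or removed.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise


if __name__ == "__main__":
    rng = random.Random(0)
    true_trips = 1_250  # hypothetical private count of trips in one zone
    for eps in (0.1, 1.0, 10.0):
        released = laplace_mechanism(true_trips, sensitivity=1, epsilon=eps, rng=rng)
        print(f"epsilon={eps:>4}: released {released:.1f}")
```

Since one person changes a count by at most one, the sensitivity is 1 here; the expected absolute error equals the noise scale, so dropping epsilon from 1.0 to 0.1 makes the released count about ten times noisier.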
Schacht: What are the next steps for this project? Where do you see it going from here?
Howe: Our next steps include interpolating missing data in space and time using deep learning techniques, studying transfer learning so that EquiTensors trained in Seattle can be applied in Chicago, using EquiTensors to synthesize data in new contexts, and exploring the relationship between explainability and fairness.