
Do Data Releases Based on ZIP Codes Endanger Patient Privacy?

Indicating the number of COVID-19 cases in a ZIP code does not necessarily identify the patients. But the chances that a patient could be identified increase as health departments release additional information.

(TNS) — Health departments, including the Ohio Department of Health and Summit County Public Health, have said they are hesitant to release the number or range of coronavirus cases in ZIP codes because that data could identify patients, thus violating privacy laws.

But is that true?

For an answer, cleveland.com reached out to Richard Steiner, a biostatistician at the University of Akron. Here is what he has to say:

“The short, and a bit naïve, answer is, ‘No,’” Steiner said. However, for people who live in ZIP codes with smaller populations, and in areas where health departments give out additional information about infected people, the answer is “quite likely.”

The HIPAA privacy rules that allow the release of public health information call for additional “de-identification” measures for patients who live in a ZIP code with fewer than 20,000 people. Those measures can include grouping such ZIP codes into a larger area with a population greater than 20,000.

Fifteen of the 36 ZIP codes located entirely within Summit County and included on the health department’s map have populations greater than 20,000, according to census data from 2018, the most recent year available.

Simply indicating the number of COVID-19 cases in a ZIP code does not necessarily identify the patients included in the total. But the chances that a patient could be identified increase as health departments release additional information about cases, including their age, sex and race.

Steiner gave an example of a fictitious ZIP code with 20,000 residents, where 51% are males and 49% are females. Each resident was randomly assigned a birthday, an age and a race, so that the demographics had roughly the same distributions as Summit County, Steiner said.

“By knowing a person’s sex, birthday and age, nearly 62% of the residents could be uniquely identified, and another 28% could be narrowed down to just two people,” Steiner said. “If, in addition to a person’s sex, birthday and age, the person’s race could be determined, 72% of the residents could be uniquely identified, and another 22% could be narrowed down to just two people.”
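The kind of simulation Steiner describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not his actual method: ages are drawn uniformly from 0 to 89, birthdays uniformly from 365 days, and the four race categories and their weights are hypothetical placeholders rather than Summit County census figures. Because of those simplifications, the percentages it produces will differ from the ones he reported.

```python
import random
from collections import Counter


def simulate_uniqueness(n_residents=20_000, include_race=False, seed=0):
    """Return (share uniquely identifiable, share narrowed to two people).

    Assumptions (not from the article): uniform ages 0-89, uniform
    birthdays over 365 days, and hypothetical race weights.
    """
    rng = random.Random(seed)
    race_labels = ["A", "B", "C", "D"]         # hypothetical categories
    race_weights = [0.78, 0.15, 0.04, 0.03]    # hypothetical shares

    residents = []
    for _ in range(n_residents):
        sex = "M" if rng.random() < 0.51 else "F"  # 51% male, 49% female
        birthday = rng.randrange(365)              # day of the year
        age = rng.randrange(90)                    # age in years
        traits = (sex, birthday, age)
        if include_race:
            traits += (rng.choices(race_labels, race_weights)[0],)
        residents.append(traits)

    # Count how many residents share each exact combination of traits.
    group_sizes = Counter(residents)
    unique = sum(1 for t in residents if group_sizes[t] == 1)
    in_pairs = sum(1 for t in residents if group_sizes[t] == 2)
    return unique / n_residents, in_pairs / n_residents


if __name__ == "__main__":
    for race in (False, True):
        unique, pairs = simulate_uniqueness(include_race=race)
        print(f"race known={race}: unique={unique:.0%}, one of two={pairs:.0%}")
```

Calling `simulate_uniqueness(n_residents=4_000)` shows the same effect for a smaller ZIP code: with fewer people sharing the same pool of trait combinations, a larger share of residents ends up unique.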

In a ZIP code of about 4,000 residents, and Summit County has three of that size, 90% of residents can be uniquely identified based on sex, birthday and age.

Overall, when a health department releases the number of coronavirus cases in a ZIP code, the chances of that number identifying a patient are relatively low. However, with each additional piece of information, including age and race, the chances of identifying patients increase.

Summit County has not released racial data or birthdays for coronavirus cases, instead providing only an age range and the number of male and female cases. But Steiner notes that other information about people can be obtained from other sources.

“Demographics such as sex, age, birthday and possibly race may be easily obtainable,” he said. “For example, for people on Facebook, sex and birthday are just a few clicks away.”

Summit County Public Health officials have indicated that they are going to wait until the number of cases across the county increases before they release ZIP code-level data.

That, however, is flawed logic. Steiner said an increase in the number of coronavirus cases will also increase the chance of guessing accurately who is a patient.

For example, in a ZIP code of 20,000 people with 100 coronavirus cases, there is a 0.5% chance that any person chosen at random will be a COVID-19 patient. But if that same ZIP code reaches 200 cases, the chance doubles to 1%.
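The arithmetic behind that example is a simple ratio of cases to residents. A minimal sketch, in which the function name is a hypothetical helper introduced only for illustration:

```python
def chance_random_resident_is_case(cases, population):
    """Probability that one resident picked at random is a known case."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must be between 0 and population")
    return cases / population


if __name__ == "__main__":
    for cases in (100, 200):
        chance = chance_random_resident_is_case(cases, 20_000)
        print(f"{cases} cases among 20,000 residents: {chance:.1%}")
    # 100 cases among 20,000 residents: 0.5%
    # 200 cases among 20,000 residents: 1.0%
```

This is only the chance of guessing correctly with no other information; it says nothing about how much easier the guess becomes once age, sex or race is also known.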

In Summit County, with a total of 235 cases as of Friday, there are likely far fewer than 100 cases in even the largest ZIP codes, making the chances of randomly identifying a patient much less than 0.5%. So, in order to identify a patient, “you’d almost certainly have to have some additional information about the person or the cases in general, like if they reported the number of infections by race and gender and age,” Steiner said.

©2020 Advance Ohio Media, Cleveland, Distributed by Tribune Content Agency, LLC.