California Doesn’t Really Have 40 Times More Hispanic/Latino COVID-19 Cases Than Texas—They Just Have Better Data

The ongoing coronavirus pandemic has highlighted the challenge of data gaps in tracking the pandemic’s impact on racial and ethnic populations across the United States. This challenge was underscored when UnidosUS asked PRB for help tracking trends in COVID-19 cases and deaths for the Hispanic/Latino population in six states (Arizona, California, Colorado, Florida, Nevada, Texas).

To tackle this project, we needed reliable time-series data on COVID-19 cases and deaths by race and ethnicity. These data would be important for trend analysis, serve as a benchmark for projections, and allow us to calculate COVID-19 case fatality rates (CFRs).

These data didn’t seem like too much to ask for, but what we found was a hodgepodge of reporting and data quality standards that varied widely from state to state. And we’re not alone in uncovering less-than-ideal state-level demographic reporting. Ishaan Pathak and his colleagues found that among the 50 states, only California, Illinois, and Ohio had sufficient age and racial/ethnic detail to investigate disparities in CFRs, controlling for age.

California’s Comprehensive Data Reporting

Throughout the pandemic, California has offered comprehensive, timely reporting on coronavirus cases and deaths by race/ethnicity. As of January 2021, nearly all California coronavirus-related deaths were reported by race/ethnicity, and three-quarters of cases were reported with racial/ethnic detail (see Table 1).

TABLE 1. Comprehensive Racial/Ethnic Reporting for California COVID-19 Cases and Deaths

Race/Ethnicity                                        Cases  Deaths
Latino                                            1,000,959  12,827
White                                               366,351   8,619
Asian                                               117,227   3,171
African American                                     73,573   1,853
Multi-Race                                           23,457     291
American Indian/Alaska Native                         5,750      90
Native Hawaiian/Pacific Islander                     10,337     147
Other                                               224,198     316
Total with race/ethnicity                         1,828,872  27,314
Total (including records missing race/ethnicity)  2,482,226  27,462
Percent with race/ethnicity reported                    74%     99%

Note: Racial and ethnic-group titles appear as they are reported by the source and are mutually exclusive.
Source: California Department of Public Health, COVID-19 Race and Ethnicity Data, as of Jan. 7, 2021.

If we look at California data for early January 2021, we see considerable racial/ethnic detail, with coverage for 99% of death records and 74% of case records. While the case coverage isn’t perfect (26% missing), the resulting CFRs aren’t wildly out of alignment with what we’d expect. For example, dividing the number of deaths by the number of cases among African Americans yields a CFR of around 2,500 deaths per 100,000 cases.
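The coverage shares and the CFR quoted above follow directly from Table 1. As a minimal sketch (the function names `coverage_pct` and `cfr_per_100k` are hypothetical, introduced only for this illustration):

```python
def coverage_pct(with_detail: int, total: int) -> float:
    """Percent of records that carry race/ethnicity detail."""
    return with_detail / total * 100


def cfr_per_100k(deaths: int, cases: int) -> float:
    """Case fatality rate, expressed as deaths per 100,000 reported cases."""
    return deaths / cases * 100_000


# California totals from Table 1 (as of Jan. 7, 2021)
case_coverage = coverage_pct(1_828_872, 2_482_226)
death_coverage = coverage_pct(27_314, 27_462)

# African American row of Table 1
ca_african_american_cfr = cfr_per_100k(deaths=1_853, cases=73_573)

print(f"Case coverage:  {case_coverage:.0f}%")
print(f"Death coverage: {death_coverage:.0f}%")
print(f"African American CFR: {ca_african_american_cfr:,.0f} per 100,000 cases")
```

Note that this CFR uses reported cases as the denominator, so it is only as good as the case reporting behind it.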

Texas, on the other hand… has some egregious problems with missing data.

Data Problems Are Bigger in Texas

Looking at the same time period (early January 2021), we see that in Texas the death data are reasonably complete (96% of deaths are identified by race/ethnicity). But there’s almost no tracking of race/ethnicity for cases (see Table 2). Texas reports racial/ethnic case detail for fewer than 5% of cases, and even within that small share, a substantial proportion are labeled “Unknown.” Based on these reported numbers, the CFR for Black Texans would be estimated at almost 25,000 deaths per 100,000 cases—10 times the estimated rate for African Americans in California.

TABLE 2. Almost No Race/Ethnicity Reporting for Texas COVID-19 Cases

Race/Ethnicity                                        Cases  Deaths
Asian                                                   907     504
Black                                                11,024   2,753
Hispanic                                             26,363  13,956
Other                                                   342     144
White                                                20,892  10,395
Unknown                                               8,085      19
Total with race/ethnicity                            67,613  27,771
Total (including records missing race/ethnicity)  1,563,758  28,877
Percent with race/ethnicity reported                     4%     96%

Note: Racial and ethnic group titles appear as they are reported by the source and are mutually exclusive.
Source: Texas Department of State Health Services, Texas COVID-19 Data, as of Jan. 8, 2021.
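To see how badly the missing case records distort the picture, the Table 2 figures can be taken at face value and turned into implied CFRs. The sketch below is illustrative only; the dictionary layout and variable names are assumptions, and the California comparison value comes from the African American row of Table 1.

```python
# Texas rows from Table 2 (as of Jan. 8, 2021): group -> (cases, deaths)
texas = {
    "Asian": (907, 504),
    "Black": (11_024, 2_753),
    "Hispanic": (26_363, 13_956),
    "Other": (342, 144),
    "White": (20_892, 10_395),
}
TOTAL_CASES_WITH_DETAIL = 67_613  # includes the 8,085 "Unknown" records
TOTAL_CASES = 1_563_758

# Share of all Texas cases that carry any race/ethnicity detail
case_coverage = TOTAL_CASES_WITH_DETAIL / TOTAL_CASES * 100

# Implied CFR per 100,000 cases, taking the reported numbers at face value
implied_cfr = {
    group: deaths / cases * 100_000 for group, (cases, deaths) in texas.items()
}

# California African American CFR from Table 1, for comparison
ca_cfr = 1_853 / 73_573 * 100_000
ratio = implied_cfr["Black"] / ca_cfr

print(f"Texas case coverage: {case_coverage:.1f}%")
for group, cfr in implied_cfr.items():
    print(f"{group:<9}{cfr:>8,.0f} per 100,000 cases")
print(f"Black Texans vs. African Americans in California: {ratio:.1f}x")
```

Every group's implied rate comes out far above the California figure, which is what we would expect when the case counts are missing most of their records while the death counts are nearly complete.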

How Did We Deal With the Data Gap?

To produce realistic case trend data for UnidosUS, we tested a variety of estimation methods, ranging from presenting the data as reported to using the (more reliable) death data to reverse engineer an estimate of what the case numbers might have been. Each of the alternatives has pros and cons, as illustrated in our decision matrix.

Option 1. Use reported totals
  Pro: None
  Con: Massive underestimate

Option 2. Apply % reported race/ethnicity cases to total cases
  Pro: Straightforward; consistent with reported case totals
  Con: Doesn’t account for bias in reported race/ethnicity data

Option 3. Apply % reported race/ethnicity deaths to total cases
  Pro: Straightforward; consistent with reported case totals
  Con: Doesn’t account for differences in mortality by race/ethnicity

Option 4. Reverse engineer number of cases based on death data and other-state case fatality rates
  Pro: Maintains consistency between case and death data
  Con: Other-state case fatality rates are not the same; may over- or under-estimate cases in Texas
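Options 3 and 4 can be sketched with the Hispanic/Latino rows of Tables 1 and 2. This is a rough illustration under stated assumptions, not necessarily the calculation tested for the project: it uses California as the only "other state" in option 4, keeps the "Unknown" deaths in the option 3 denominator, and all variable names are hypothetical.

```python
TX_TOTAL_CASES = 1_563_758       # all Texas cases (Table 2)
TX_HISPANIC_DEATHS = 13_956      # Table 2
TX_DEATHS_WITH_DETAIL = 27_771   # Table 2, includes 19 "Unknown" deaths

CA_LATINO_CASES = 1_000_959      # Table 1
CA_LATINO_DEATHS = 12_827        # Table 1

# Option 3: apply the Hispanic share of reported deaths to total cases.
hispanic_death_share = TX_HISPANIC_DEATHS / TX_DEATHS_WITH_DETAIL
option_3 = hispanic_death_share * TX_TOTAL_CASES

# Option 4: back out cases from deaths using another state's CFR.
# Assumption: California's Latino CFR stands in for the Texas rate.
ca_latino_cfr = CA_LATINO_DEATHS / CA_LATINO_CASES
option_4 = TX_HISPANIC_DEATHS / ca_latino_cfr

print(f"Option 3 estimate: {option_3:,.0f} Hispanic cases")
print(f"Option 4 estimate: {option_4:,.0f} Hispanic cases")
```

Under these assumptions the two estimates differ by hundreds of thousands of cases, which reflects the cons listed for each option: option 3 inherits any difference in mortality by race/ethnicity, and option 4 is only as good as the borrowed CFR.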

We quickly ruled out using the data as is. Presenting Hispanic/Latino COVID-19 case counts based on reporting that covers just 4% of the universe would be misleading at best. Of the remaining alternatives, we chose option 2: estimate Hispanic/Latino cases by taking the racial/ethnic distribution of reported cases and applying that distribution to all cases. While this method has some potential for bias, further analysis (presented at the Population Association of America Applied Demography Conference in February 2021) suggested the approach was reasonable.
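The arithmetic behind option 2 is a simple proportional allocation. The sketch below applies it to the Table 2 figures; the helper name `estimate_cases` is hypothetical, and how the "Unknown" records are treated is an assumption here, so both variants are shown.

```python
TX_TOTAL_CASES = 1_563_758        # all Texas cases (Table 2)
TX_HISPANIC_CASES = 26_363        # reported Hispanic cases (Table 2)
TX_CASES_WITH_DETAIL = 67_613     # reported cases, includes "Unknown"
TX_UNKNOWN_CASES = 8_085          # reported cases labeled "Unknown"


def estimate_cases(group_cases: int, reported_cases: int, total_cases: int) -> float:
    """Scale a group's share of reported cases up to the full case total."""
    share = group_cases / reported_cases
    return share * total_cases


# Variant A: "Unknown" records stay in the denominator.
with_unknown = estimate_cases(
    TX_HISPANIC_CASES, TX_CASES_WITH_DETAIL, TX_TOTAL_CASES
)

# Variant B: "Unknown" records are dropped before computing the share.
without_unknown = estimate_cases(
    TX_HISPANIC_CASES, TX_CASES_WITH_DETAIL - TX_UNKNOWN_CASES, TX_TOTAL_CASES
)

print(f"Hispanic share incl. Unknown: {TX_HISPANIC_CASES / TX_CASES_WITH_DETAIL:.1%}")
print(f"Estimated cases (Variant A): {with_unknown:,.0f}")
print(f"Estimated cases (Variant B): {without_unknown:,.0f}")
```

Either variant yields an estimate more than 20 times the 26,363 Hispanic cases reported directly, which is why presenting the data as reported was not an option.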

The main takeaway here is that regardless of the specific technique, estimating to fill in data gaps is a band-aid solution at best. Ideally, we would have better demographic data on COVID-19 cases.

Visit the UnidosUS website to learn more about the methods we used for these trends and view an interactive data visualization of the results.