How much do COVID-19 case counts underestimate the size of the pandemic?

By our estimates, total infections in the U.S. early in the pandemic were 9 times higher than case counts suggest.

Early in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, scientists and the general public alike watched with growing alarm as the number of people who tested positive climbed. Charts of daily positive test counts made it clear that we were likely experiencing a once-in-a-generation pandemic and that the public health implications would be drastic. However, as epidemiologists, we were far more concerned about what those charts did not show us.

Very few SARS-CoV-2 tests were available in the U.S. at the time, and the U.S. Centers for Disease Control and Prevention recommended that doctors primarily test people with moderate-to-severe coronavirus disease 2019 (COVID-19) symptoms. There was growing evidence that individuals could be infected and potentially spread the virus without showing symptoms. For these reasons, we suspected that the number of positive tests vastly underestimated the size of the pandemic — but by how much?

At the time, our group was working together on epidemiologic research focused on malaria eradication. Although the two diseases are clearly very different, the epidemiology of malaria in eradication settings shares some important features with that of SARS-CoV-2 and other novel pathogens. For both, asymptomatic transmission is possible, and diagnostic tests miss many cases. We wanted to answer the question: To what extent do SARS-CoV-2 testing practices (e.g., testing primarily symptomatic people) and test accuracy influence whether an infected person gets tested and tests positive? As epidemiologists, we are trained to identify and minimize bias, the gap between the numbers we measure and the truth. If we corrected for this bias in the entire population, we could estimate the total number of infections and better understand the magnitude of the pandemic. We quickly assembled a team of students and scientists with whom we were already working on studies of malaria, influenza, and enteric pathogens to help us answer this question.

First, we examined the daily testing rates in different states and found substantial regional variation. Overall, by mid-April 2020, testing rates were highest in the northeast (e.g., 31 per 1,000 in Rhode Island) and lower in the Midwest and South (e.g., 6 per 1,000 in Kansas). The variation in state-level testing rates likely meant that the amount of bias in case counts would vary by state as well.

We reviewed the available evidence about testing probabilities among people with and without COVID-19 symptoms and about the accuracy of the tests used (how likely were false positives and false negatives?). Using this evidence and a tool called probabilistic bias analysis (Lash et al., 2011), we created a probabilistic model that corrected the number of confirmed COVID-19 cases for bias. We ran our model using daily data on the number of tests and confirmed cases in each U.S. state, compiled by the COVID Tracking Project. The model generated an estimate of the total number of SARS-CoV-2 infections we would expect if everyone had been able to get tested with 100% accurate tests.
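The general idea of a probabilistic bias analysis can be sketched in a few lines of Python. This is a simplified illustration, not the published model: the function name, the two-step structure, and every prior range below are assumptions chosen for illustration only. The sketch repeatedly draws plausible values for the bias parameters, corrects the observed count under each draw, and summarizes the spread of the corrected estimates.

```python
import random
import statistics


def estimate_total_infections(population, n_tested, n_positive,
                              n_draws=5000, seed=0):
    """Monte Carlo sketch of a probabilistic bias analysis of case counts.

    All prior ranges are hypothetical placeholders, not published values.
    """
    rng = random.Random(seed)
    positivity = n_positive / n_tested      # observed share of positive tests
    n_untested = population - n_tested
    draws = []
    for _ in range(n_draws):
        # Hypothetical priors on the bias parameters.
        p_symptomatic = rng.uniform(0.005, 0.05)  # untested people with symptoms
        alpha = rng.uniform(0.8, 1.0)    # positivity of untested symptomatic
                                         # people, relative to observed
        beta = rng.uniform(0.002, 0.2)   # same, for untested asymptomatic people
        sensitivity = rng.uniform(0.65, 1.0)
        specificity = rng.uniform(0.9998, 1.0)

        # Step 1: correct for incomplete testing by estimating how many
        # untested people would have tested positive.
        untested_positive = n_untested * positivity * (
            p_symptomatic * alpha + (1 - p_symptomatic) * beta
        )
        would_test_positive = n_positive + untested_positive

        # Step 2: correct for imperfect test accuracy using the
        # Rogan-Gladen estimator, expressed in counts.
        corrected = (
            would_test_positive - (1 - specificity) * population
        ) / (sensitivity + specificity - 1)
        draws.append(min(max(corrected, 0.0), population))

    draws.sort()
    return {
        "median": statistics.median(draws),
        "lower": draws[int(0.025 * n_draws)],       # approx. 2.5th percentile
        "upper": draws[int(0.975 * n_draws) - 1],   # approx. 97.5th percentile
    }
```

Calling `estimate_total_infections(1_000_000, 20_000, 2_000)` on made-up inputs returns a median estimate with a simulation interval around it. The width of that interval reflects how uncertain the assumed priors are, which is the main reason to sample the bias parameters rather than fix them at single values.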

By mid-April 2020, there were 721,245 confirmed COVID-19 cases in the U.S., but our model estimated 6,454,951 total infections, about 9 times as many. Our results suggested that at that time, 1.9% of the U.S. population had been infected, whereas confirmed case totals would have implied that only 0.2% were infected. The discrepancy was largest in Puerto Rico, where the case counts underestimated total infections by a factor of 33. Our analysis suggested that incomplete testing was responsible for the majority of this gap, while the remainder was due to imperfect test accuracy.
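As a quick arithmetic check, the figures quoted above are consistent with one another. The short Python snippet below uses only the numbers in the text; the variable names are illustrative.

```python
confirmed_cases = 721_245
estimated_infections = 6_454_951

# Ratio of estimated infections to confirmed cases (the "9 times" figure).
underestimation_ratio = estimated_infections / confirmed_cases

# If the estimated infections correspond to 1.9% of the population, the
# confirmed cases correspond to this percentage of the same population.
confirmed_share_pct = 1.9 / underestimation_ratio

print(f"ratio: {underestimation_ratio:.2f}")
print(f"confirmed share: {confirmed_share_pct:.2f}%")
```

The ratio comes out just under 9, and the implied confirmed share rounds to 0.2%, matching the figures reported in the text.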

In many ways, these results aren’t surprising. But for the public, the difference between hearing there are 100 new cases a day vs. 900 new cases a day may be enough to motivate better social distancing practices. For policymakers, our findings underscore the urgent, months-long calls by scientists and physicians for more complete and equitable SARS-CoV-2 testing, including testing of individuals without symptoms.

For more details, please see our paper.


Lash, T. L., Fox, M. P. & Fink, A. K. Applying Quantitative Bias Analysis to Epidemiologic Data. (Springer Science & Business Media, 2011).

Jade Benjamin-Chung

Research Scientist, UC Berkeley

My research applies cutting-edge causal inference and machine learning techniques to study interventions to control, eliminate, or eradicate infectious diseases, including interventions to prevent malaria, diarrhea, soil-transmitted helminths, and influenza. I have conducted research in Haiti, Thailand, Myanmar, Bangladesh, and Kenya.