BMC Research Notes. 2021 Jul 8;14:262. doi: 10.1186/s13104-021-05652-2

Estimating the wave 1 and wave 2 infection fatality rates from SARS-CoV-2 in India

Soumik Purkayastha 1, Ritoban Kundu 2, Ritwik Bhaduri 2, Daniel Barker 1, Michael Kleinsasser 1, Debashree Ray 3,4, Bhramar Mukherjee 1,5,6
PMCID: PMC8264482  PMID: 34238344

Abstract

Objective

There has been much discussion and debate around the underreporting of COVID-19 infections and deaths in India. In this short report, we first estimate the underreporting factor for infections from publicly available data released by the Indian Council of Medical Research on the reported number of cases and national seroprevalence surveys. We then use a compartmental epidemiologic model to estimate the undetected numbers of infections and deaths, yielding estimates of the corresponding underreporting factors. We compare the serosurvey-based ad hoc estimate of the infection fatality rate (IFR) with the model-based estimate. Since the first and second waves in India are intrinsically different in nature, we carry out this exercise for two periods: the first wave (April 1, 2020–January 31, 2021) and part of the second wave (February 1, 2021–May 15, 2021). The latest national seroprevalence estimate is from January 2021 and is thus only relevant to our wave 1 calculations.

Results

Both wave 1 and wave 2 estimates qualitatively show a large degree of “covert infections” in India. For wave 1, the model-based underreporting factor is estimated at 11.11 (95% credible interval (CrI) 10.71–11.47) for infections and 3.56 (95% CrI 3.48–3.64) for deaths. For wave 2, the underreporting factor escalates to 26.77 (95% CrI 24.26–28.81) for infections and 5.77 (95% CrI 5.34–6.15) for deaths. If we rely only on reported deaths, the IFR estimate is 0.13% for wave 1 and 0.03% for part of wave 2. Taking underreporting of deaths into account, the IFR estimate is 0.46% for wave 1 and 0.18% for wave 2 (through May 15). Combining waves 1 and 2, as of May 15, while India reported a total of nearly 25 million cases and 270 thousand deaths, the estimated numbers of infections and deaths stand at 491 million (36% of the population) and 1.21 million, respectively, yielding an estimated (combined) infection fatality rate of 0.25%. There is considerable variation in these estimates across Indian states. Up-to-date seroprevalence studies and mortality data are needed to validate these model-based estimates.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13104-021-05652-2.

Keywords: Case fatality rate, Excess deaths, False negative rates, India, RT-PCR test, SEIR model, Underreporting

Main text

Introduction

In late August 2020, India was predicted to surpass the United States in reported case counts of SARS-CoV-2 infections. To the surprise of many modelers, the curve turned the corner in late September, after the highest number (97,894) of daily new cases was reported on 16 September 2020 [1]. After a steady decline for nearly five months, the curve started rising again, growing into an enormous second wave. The highest number (414,280) of daily new cases in wave 2 was reported on May 6, 2021. As of May 15, 2021, India had reported 24.7 million cases, the second highest in the world, and nearly 270 thousand deaths, the third highest in the world. In this brief report, we reconcile estimates of the infection fatality rate (IFR) inferred from seroprevalence studies with epidemiologic model-based estimates that account for underreporting of infections and deaths in India for wave 1. We then compute, compare and combine the wave 1 and wave 2 IFR estimates.

Methods

Synthesizing evidence from seroprevalence studies

We review available seroprevalence results, which vary across states and, in particular, between rural and urban areas. Whereas seroprevalence was reported to exceed 50% in many major metros and slum areas, rural areas show wide variation (Table 1). The latest national serosurvey (17 December 2020 to 8 January 2021) reports that 21.4% of Indians above age 18 have antibodies indicating past SARS-CoV-2 infection [2]. Since approximately 59% [3] of India’s 1.36 billion citizens are above age 18, and 10.45 million infections had been reported as of 8 January 2021, this points to approximately 172.47 million infections, with an implied underreporting factor of 16.5 (172.47/10.45). In other words, only about 6% of India’s COVID-19 infections were reported, while 94% remained undetected or unreported. We use this estimated number of infections to calculate the IFR. Regional studies based on crematorium data and obituary counts in India have suggested an underreporting factor in the range of 2 to 5 for COVID-19 deaths; these estimates are at best ad hoc and anecdotal, and no rigorous quantification of missing deaths is currently available [4].
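The back-of-envelope calculation above can be reproduced in a few lines. This is a minimal Python sketch; the population value of 1.366 billion is an assumption (it is the unrounded figure that reproduces 172.47 million, which the text rounds to 1.36 billion). Like the text, it applies the adult seroprevalence to adults only, so infections among people under 18 are not counted.

```python
# Sketch of the serosurvey-based underreporting calculation described above.
# Assumption: population = 1.366 billion (the text rounds this to 1.36 billion).

def implied_infections(population, adult_share, adult_seroprevalence):
    """Infections implied by applying adult seroprevalence to the adult population only."""
    return population * adult_share * adult_seroprevalence

population = 1.366e9    # assumed unrounded population of India
adult_share = 0.59      # share of the population above age 18 [3]
seroprevalence = 0.214  # adult seroprevalence, national serosurvey III [2]
reported = 10.45e6      # reported cumulative infections as of 8 January 2021

infections = implied_infections(population, adult_share, seroprevalence)
urf = infections / reported  # infection underreporting factor

print(f"Implied infections: {infections / 1e6:.2f} million")
print(f"Underreporting factor: {urf:.1f}")
print(f"Share of infections reported: {100 / urf:.1f}%")
```

Because only adults are counted, this figure is best read as a lower bound on total infections implied by the serosurvey.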

Table 1.

Summary of results from various serological surveys conducted in India during 2020–21

Part A: State-level results from serological surveys conducted by the Indian Council of Medical Research in 2020–21
State[a] | Population (2011 Census)[b] | Serosurvey I[c] (May–June 2020): # of people tested | Serosurvey I: # of positive samples (%) | Serosurvey II[d] (August–September 2020): # of people tested | Serosurvey II: # of positive samples (%) | Observed cumulative cases per million, February 7[e] | Observed cumulative deaths per million, February 7[e]
Maharashtra | 112,374,333 | 2385 | 19 (0.80) | 2681 | 348 (12.98) | 18,189.84 | 456.6
Kerala | 33,406,061 | 1193 | 4 (0.34) | 1282 | 11 (0.86) | 28,989.92 | 115.79
Karnataka | 61,095,297 | 1199 | 3 (0.25) | 1287 | 186 (14.45) | 15,427.01 | 200.28
Andhra Pradesh | 49,577,103 | 1192 | 8 (0.67) | 1245 | 352 (28.27) | 17,920.03 | 144.4
Tamil Nadu | 72,147,030 | 1200 | 16 (1.34) | 1259 | 207 (16.44) | 11,667.8 | 171.64
Uttar Pradesh | 199,812,341 | 3616 | 15 (0.42) | 3628 | 226 (6.23) | 3009.75 | 43.48
West Bengal | 91,276,115 | 2000 | 22 (1.10) | 2097 | 219 (10.44) | 6259.81 | 111.83
Odisha | 41,974,219 | 1202 | 7 (0.58) | 1223 | 294 (24.04) | 7995.86 | 46.74
Rajasthan | 68,548,437 | 1188 | 8 (0.67) | 1212 | 27 (2.23) | 4641.87 | 40.44
Chhattisgarh | 25,545,198 | 1210 | 4 (0.33) | 1199 | 34 (2.84) | 12,038.47 | 146.45
India | 1,210,193,422 | 28,000 | 156 (0.56) | 29,082 | 3135 (10.8) | 9027.74 | 128.77
Part B: Results from some other serological surveys conducted in India in 2020–21
Region | Study setting | Study period | # of people tested | % of positive samples
Delhi (Round 1)[f] | Urban | June–July 2020 | 21,387 | 22.9
Delhi (Round 2)[g] | Urban | August 1–7, 2020 | 15,046 | 28.4
Delhi (Round 3)[g] | Urban | September 1–7, 2020 | 17,049 | 24.1
Delhi (Round 4)[g] | Urban | October 15–21, 2020 | 15,015 | 24.7
Delhi (Round 5)[h] | Urban | January 2021 | 28,000 | 56.1
Tamil Nadu[i] | Rural and urban | October–November 2020 | 26,640 | 26.9 (rural areas); 36.9 (urban areas)
Mumbai (Round 1)[j] | Urban | First half of July 2020 | 4234 (slum areas); 2702 (non-slum areas) | 57.0 (slum areas); 16.0 (non-slum areas)
Mumbai (Round 2)[k] | Urban | Second half of August 2020 | 3024 (slum areas); 2176 (non-slum areas) | 45.2 (slum areas); 17.1 (non-slum areas)
Pune[l] | Urban | July 20–August 5, 2020 | 1659 | 51.3
Chennai (Round 1)[c] | Urban | July 17–28, 2020 | 12,405 | 18.4
Chennai (Round 2)[c] | Urban | October 8–15, 2020 | 6366 | 30.1
Indore[d] | Urban | August 11–23, 2020 | 7100 | 7.75
Karnataka[m] | Rural and urban | June 15–August 29, 2020 | 15,624 | 44.1 (rural areas); 53.8 (urban areas)
Jammu and Kashmir[c] | Rural and urban | October 2020 | 6230 | 38.8
India (Serosurvey III)[n] | Rural and urban | December 17, 2020–January 8, 2021 | 28,589 (general population); 7171 (healthcare workers) | 21.4 (adults); 25.3 (children ≥ 10 years); 25.7 (healthcare workers)

[a] The ten states with the highest cumulative COVID-19 case counts (as of 31 January 2021) are included in this table.

[b] Information sourced from Wikipedia (https://en.wikipedia.org/wiki/List_of_states_and_union_territories_of_India_by_population).

[c] The first national serosurvey conducted by the Indian Council of Medical Research (ICMR) began on May 11 and ended on June 4, 2020. A randomly sampled, community-based survey was conducted in 700 villages/wards selected from 70 districts of 21 chosen states of India, categorized into four strata based on the incidence of reported COVID-19 cases. Four hundred adults per district were enrolled from 10 clusters, with one adult per household. A total of 28,000 adults were enrolled in the survey (Murhekar, Manoj V., et al. "Prevalence of SARS-CoV-2 infection in India: Findings from the national serosurvey, May–June 2020." Indian Journal of Medical Research 152.1 (2020): 48. 10.4103/ijmr.IJMR_3290_20).

[d] The second national serosurvey conducted by the ICMR began on August 18 and ended on September 20, 2020. A stratified sampling design similar to that of the first serosurvey (see [c] above) was used. A total of 29,082 individuals aged 10 years or older were enrolled in the survey (Murhekar, Manoj V., et al. "SARS-CoV-2 antibody seroprevalence in India, August–September, 2020: findings from the second nationwide household serosurvey." The Lancet Global Health (2021). 10.1016/S2214-109X(20)30544-1).

[e] As of February 7, 2021. Information sourced from Coronavirus Outbreak In India—COVID-19 tracker (www.covid19india.org).

[f] Data from media reports (The Hindu. Published online July 22, 2020. https://www.thehindu.com/news/cities/Delhi/percentage-of-people-with-antibodies-high/article32156162.ece).

[g] Data from a preprint on a repeated, cross-sectional, multi-stage sampling serosurvey conducted across all districts and wards of Delhi, with two-stage allocation proportional to population size (Sharma, Nandini, et al. "The seroprevalence and trends of SARS-CoV-2 in Delhi, India: A repeated population-based seroepidemiological study". medRxiv (2021). https://doi.org/10.1101/2020.12.13.20248123).

[h] Data from media reports (Hindustan Times. Published online February 02, 2021. https://www.hindustantimes.com/cities/delhi-news/delhis-5th-sero-survey-over-56-people-have-antibodies-against-covid19-101612264534349.html).

[i] Data from a preprint on a population-representative serological survey conducted in all districts of Tamil Nadu, India in October–November 2020 (Malani, Anup, et al. "SARS-CoV-2 Seroprevalence in Tamil Nadu in October–November 2020." medRxiv (2021). 10.1101/2021.02.03.21250949).

[j] Data collected by a consortium of government organisations (NITI Aayog and Municipal Corporation of Greater Mumbai) and research institutes (Tata Institute of Fundamental Research and IDFC Institute) (https://www.tifr.res.in/TSN/article/Mumbai-Serosurvey%20Technical%20report-NITI.pdf).

[k] Data collected by a consortium of government organisations (NITI Aayog and Municipal Corporation of Greater Mumbai) and research institutes (Tata Institute of Fundamental Research and IDFC Institute) (https://www.tifr.res.in/TSN/article/Mumbai-Serosurvey%20Technical%20report-NITI_BMC-Round-2%20for%20TIFR%20website.pdf).

[l] Data from a preprint on multi-stage cluster random sampling of participants recruited from Pune sub-wards classified as high-incidence settings for a serosurvey (Ghose, Aurnab, et al. "Community prevalence of antibodies to SARS-CoV-2 and correlates of protective immunity in an Indian metropolitan city". medRxiv (2021). https://doi.org/10.1101/2020.11.17.20228155).

[m] Data from a research letter on a population-representative serological survey conducted in all districts of Karnataka, India from June 15 to August 29, 2020 (Mohanan M, Malani A, Krishnan K, Acharya A. Prevalence of SARS-CoV-2 in Karnataka, India. JAMA. Published online February 04, 2021. https://doi.org/10.1001/jama.2021.0332).

[n] Data from media reports (PTI. Published online February 04, 2021. https://www.ndtv.com/india-news/over-21-of-indias-population-may-have-had-covid-19-shows-sero-survey-2363166).

Model-based estimates

Using a compartmental epidemiologic model (explained in the Supplementary Methods) with compartments for unascertained cases and deaths, which accounts for the false negative rates of the RT-PCR and rapid antigen tests used in India [5], we estimate the national and state-level IFR in India by inferring underreporting factors for cases and deaths. We assume that the estimated total infections (deaths) comprise reported and unreported infections (deaths). The model divides the population into ten disjoint compartments: S (Susceptible), E (Exposed), T (Tested), U (Untested), P (Tested positive), F (Tested false negative), RR (Reported recovered), RU (Unreported recovered), DR (Reported deaths) and DU (Unreported deaths), as described in Additional file 1: Figure S1. A set of nine differential equations governs the transmission dynamics; these are approximated by discrete recurrence relations. For any compartment X, the instantaneous rate of change at time t, $\frac{dX}{dt}$, is approximated by the difference between the counts in that compartment on day t+1 and day t, i.e., $X_{t+1} - X_t$. Parameters are estimated in a Bayesian framework by drawing samples from the posterior distribution with a Metropolis–Hastings algorithm using a Gaussian proposal density, and 95% credible intervals (CrI) quantify the uncertainty of the estimates. Additional file 1: Table T4 presents an overview of the parameter descriptions and settings for this model.
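The two computational ingredients described above, a daily forward-difference recurrence and random-walk Metropolis–Hastings sampling with a Gaussian proposal, can be illustrated on a much smaller problem. The Python sketch below is a hypothetical five-compartment toy (S, E, detected infectious, undetected infectious, removed), not the ten-compartment model used in this report. The transmission parameters, the single unknown detection probability, the Poisson likelihood, the uniform prior and the synthetic data are all assumptions made for illustration. The reciprocal of the detection probability plays the role of an infection underreporting factor.

```python
import math
import random

# Hypothetical toy model, NOT the ten-compartment SEIR-fansy model:
# S -> E -> I_det (detected) or I_und (undetected) -> R.
# One step = one day: X_{t+1} = X_t + (dX/dt evaluated at day t).
def simulate(p_detect, beta=0.3, sigma=1 / 5, gamma=1 / 7, n=1_000_000, e0=100, days=200):
    s, e, i_det, i_und, r = n - e0, float(e0), 0.0, 0.0, 0.0
    reported, total = [], []
    for _ in range(days):
        new_inf = beta * s * (i_det + i_und) / n  # flow S -> E
        new_inc = sigma * e                       # flow E -> I_det or I_und
        s, e, i_det, i_und, r = (
            s - new_inf,
            e + new_inf - new_inc,
            i_det + p_detect * new_inc - gamma * i_det,
            i_und + (1 - p_detect) * new_inc - gamma * i_und,
            r + gamma * (i_det + i_und),
        )
        reported.append(p_detect * new_inc)  # daily reported infections
        total.append(new_inc)                # daily total infections
    return reported, total, s + e + i_det + i_und + r


TRUE_P = 0.1  # illustrative detection probability, i.e. a true URF of 10
reported, total, pop = simulate(TRUE_P)
observed = [round(x) for x in reported]  # synthetic "reported cases"


def log_posterior(p_detect):
    """Uniform(0, 1) prior plus a Poisson log-likelihood (log y! dropped)."""
    if not 0.0 < p_detect < 1.0:
        return -math.inf
    expected, _, _ = simulate(p_detect)
    return sum(y * math.log(m) - m for y, m in zip(observed, expected))


def metropolis_hastings(log_post, start, proposal_sd, n_iter, seed=1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_post(start)
    samples = []
    for _ in range(n_iter):
        proposal = rng.gauss(current, proposal_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp));
        # 1 - rng.random() lies in (0, 1], so its log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis_hastings(log_posterior, start=0.2, proposal_sd=0.001, n_iter=3000)
kept = sorted(samples[1000:])  # discard burn-in
mean_p = sum(kept) / len(kept)
lo, hi = kept[int(0.025 * len(kept))], kept[int(0.975 * len(kept)) - 1]

print(f"Population conserved: {math.isclose(pop, 1_000_000, rel_tol=1e-9)}")
print(f"Posterior mean of p_detect: {mean_p:.4f} (95% CrI {lo:.4f} to {hi:.4f})")
print(f"Implied infection underreporting factor: {1 / mean_p:.2f}")
```

In this toy, the reported counts identify the detection probability only because the transmission parameters are held fixed. In a realistic fit, transmission and reporting parameters would be estimated jointly, and the proposal scale would need tuning.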

Comparing and combining waves 1 and 2

Because of the stress on the healthcare and reporting infrastructure, the fatality and underreporting processes were very different across the two waves. We therefore consider two separate phases of the pandemic, with wave 1 from April 1, 2020 to January 31, 2021 and wave 2 starting on February 1, 2021. This cut-off is artificial; it is guided by the fact that the national effective reproduction number (Reff) first crossed unity in 2021 on February 14, and we allow a two-week incubation period before that date. Using daily time series of case, death and recovery counts, we compare the fatality rates and underreporting factors associated with the two time periods using the compartmental models. Further, using observed data from the two waves and the model-based underreporting factor estimates, we compute cumulative case and death counts for the total duration of waves 1 and 2: we multiply the wave-specific cumulative counts by the relevant underreporting factors and sum over both waves to obtain combined counts of cases and deaths. The estimated numbers of cumulative deaths and infections provide a combined IFR estimate for India as of May 15.
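The combination step is simple arithmetic. Below is a minimal Python sketch that assumes the rounded national totals quoted elsewhere in the text (10.76 million cases and 154,428 deaths by January 31, 2021; about 24.7 million cases and 270 thousand deaths by May 15, 2021) and the point estimates of the underreporting factors, using the Abstract's value of 26.77 for wave 2 cases. Wave 2 counts are taken as the difference between the two dates. Because the inputs are rounded, the outputs only approximately match the reported combined estimates.

```python
# Sketch of combining waves: multiply wave-specific cumulative counts by
# wave-specific underreporting factors (URFs), then sum over waves.
# Inputs are rounded figures quoted in the text, so results are approximate.

def combine_waves(waves):
    """Return (estimated infections, estimated deaths) summed over waves."""
    infections = sum(w["cases"] * w["urf_cases"] for w in waves)
    deaths = sum(w["deaths"] * w["urf_deaths"] for w in waves)
    return infections, deaths

cases_jan31, deaths_jan31 = 10.76e6, 154_428  # wave 1 totals (Jan 31, 2021)
cases_may15, deaths_may15 = 24.7e6, 270_000   # approximate totals (May 15, 2021)

waves = [
    {"cases": cases_jan31, "deaths": deaths_jan31, "urf_cases": 11.11, "urf_deaths": 3.56},
    {"cases": cases_may15 - cases_jan31, "deaths": deaths_may15 - deaths_jan31,
     "urf_cases": 26.77, "urf_deaths": 5.77},
]

infections, deaths = combine_waves(waves)
ifr1 = deaths_may15 / infections  # reported deaths / estimated infections
ifr2 = deaths / infections        # estimated deaths / estimated infections
population = 1.36e9               # population of India as quoted in the text

print(f"Estimated infections: {infections / 1e6:.1f} million "
      f"({100 * infections / population:.0f}% of the population)")
print(f"Estimated deaths: {deaths / 1e3:.1f} thousand")
print(f"Combined IFR1: {100 * ifr1:.3f}%, IFR2: {100 * ifr2:.3f}%")
```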

Results

IFR estimates for wave 1 using seroprevalence surveys

The observed case fatality rate (CFR) in India was low. With 154,428 deaths and 10.76 million cases reported as of January 31, 2021, the estimated CFR for wave 1 is 1.435% (95% confidence interval 1.428–1.442%) [1]. The estimated number of infections from the January seroprevalence survey implies an approximate infection fatality rate of 0.09% (i.e., 154,428/172.47 million). The anecdotal underreporting factor for deaths (in the range of 2–5) implies an ad hoc IFR estimate in the range of 0.18–0.45%.
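The wave 1 figures in this paragraph can be checked with a short Python sketch. The text does not say how the CFR confidence interval was computed; a Wald (normal-approximation) binomial interval is assumed here, and with these inputs it gives 1.428–1.442%. The same inputs put the ad hoc IFR range at roughly 0.18–0.45%.

```python
import math

deaths = 154_428       # reported deaths as of January 31, 2021
cases = 10.76e6        # reported cases as of January 31, 2021
infections = 172.47e6  # serosurvey-implied infections

# Case fatality rate with an assumed Wald 95% interval: p +/- 1.96 * sqrt(p(1 - p) / n)
cfr = deaths / cases
half_width = 1.96 * math.sqrt(cfr * (1 - cfr) / cases)
cfr_lo, cfr_hi = cfr - half_width, cfr + half_width

# Serosurvey-based IFR using reported deaths, then scaled by the
# anecdotal death underreporting factors of 2 and 5
ifr = deaths / infections
ifr_low, ifr_high = 2 * ifr, 5 * ifr

print(f"CFR: {100 * cfr:.3f}% (95% CI {100 * cfr_lo:.3f}-{100 * cfr_hi:.3f}%)")
print(f"IFR (reported deaths): {100 * ifr:.3f}%")
print(f"Ad hoc IFR range: {100 * ifr_low:.2f}-{100 * ifr_high:.2f}%")
```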

Estimates from epidemiological models

For wave 1, our estimate of the national IFR1 (observed cumulative deaths/estimated total cumulative infections) is 0.129% (95% CrI 0.125–0.134%), and of IFR2 (estimated total cumulative deaths/estimated total cumulative infections) is 0.461% (95% CrI 0.455–0.468%), with the underreporting factor for cases estimated at 11.11 (95% CrI 10.71–11.47) and for deaths at 3.56 (95% CrI 3.48–3.64). These model-based estimates for wave 1 are largely consistent with the estimates from the third and latest nationwide seroprevalence study.

In wave 2, the same model shows a stark contrast with wave 1: the case and death underreporting factor estimates escalate to 26.73 (95% CrI 24.26–28.81) and 5.77 (95% CrI 5.34–6.15), respectively, leading to an IFR1 estimate of 0.032% (95% CrI 0.029–0.035%) and an IFR2 estimate of 0.183% (95% CrI 0.18–0.186%). This pattern is consistent with the wave 2 CFR, estimated at 0.845% (95% CrI 0.840–0.849%), which is 59% of the wave 1 estimate.
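The two IFR definitions are linked to the CFR by simple identities. Since CFR = observed deaths/observed cases and the case underreporting factor is estimated infections/observed cases, IFR1 = CFR / URF(cases); since the death underreporting factor is estimated deaths/observed deaths, IFR2 = IFR1 × URF(deaths). The Python sketch below applies these identities to the point estimates quoted in the text (using 26.77, the wave 2 case underreporting factor given in the Abstract). It approximately reproduces the reported IFRs but is only a consistency check, since ratios of point estimates need not equal the posterior summaries reported above.

```python
def ifr_from_cfr(cfr_pct, urf_cases, urf_deaths):
    """IFR1 = CFR / URF_cases; IFR2 = IFR1 * URF_deaths (all rates in %)."""
    ifr1 = cfr_pct / urf_cases
    ifr2 = ifr1 * urf_deaths
    return ifr1, ifr2

# (CFR %, case URF, death URF) point estimates quoted in the text
inputs = {"wave 1": (1.435, 11.11, 3.56), "wave 2": (0.845, 26.77, 5.77)}

results = {name: ifr_from_cfr(*vals) for name, vals in inputs.items()}
for name, (ifr1, ifr2) in results.items():
    print(f"{name}: IFR1 ~ {ifr1:.3f}%, IFR2 ~ {ifr2:.3f}%")
```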

Figure 1 shows underreporting factors and estimated infections and deaths in waves 1 and 2 for India, while Fig. 2 highlights state-level variation in IFR1, IFR2 and CFR for waves 1 and 2 across 20 Indian states with large case/death counts.

Fig. 1. Comparison of observed and estimated case and death counts and associated underreporting factors from waves 1, 2 and both waves combined

Fig. 2. Forest plot of wave 1 and wave 2 infection fatality rates (IFR) and case fatality rates (CFR) associated with SARS-CoV-2 in various states in India. IFR1 is based on reported deaths only, whereas IFR2 also includes estimated unreported deaths

Combining waves 1 and 2

The composite CFR as of May 15 stands at 1.1%. The estimated total (reported + unreported) cumulative case count for waves 1 and 2 combined is 491.73 (95% CrI 453.03–524.56) million, while the estimated total (reported + unreported) number of deaths is 1216.35 (95% CrI 1154.21–1272.70) thousand. This leads to a combined IFR1 estimate of 0.06% and a combined IFR2 estimate of 0.25%. Detailed numerical estimates of underreporting factors across states for waves 1 and 2 are presented in Additional file 1: Tables T1, T2 and T3 and Additional file 1: Figures S2 and S3.

Discussion

Despite accounting for underreported deaths, the large number of asymptomatic or undetected infections (more than 90% by any calculation) indicates a lower IFR in India than in Western countries. A meta-analysis of studies worldwide places the pooled mean IFR at 0.68% (95% CI: 0.53–0.82%) [6], while another meta-analysis places the median at 0.27% [7] (range 0–1.63%). Seroprevalence surveys and epidemiologic models qualitatively agree on the estimated IFR for India for wave 1. Up-to-date serosurvey and excess death/mortality data are needed to validate the wave 2 and combined estimates. The estimated number of total infections as of May 15 suggests that roughly 36% of Indians have had a current or past infection, a number that will need to be verified with synchronous serosurveys.

The reduction in fatality rates that we currently observe in wave 2 could be due primarily to two factors. First, we do not have the same length of follow-up or complete data on the decay phase of the wave 2 curve. Second, the age composition of the infected population may differ between the two waves: younger people have reportedly been infected in larger numbers in wave 2, and they have a lower risk of COVID-19 mortality. A fraction of the older population (aged 65+ years) was also vaccinated during wave 2. However, this hypothesis about reduced fatality rates in wave 2 cannot be verified without more granular, age- and sex-stratified nationwide time-series data on case and death counts, which are currently unavailable.

Limitations

We do not have a rigorous way to validate the extent of underreporting of deaths. An excess death calculation based on historical mortality data is infeasible at this point because all-cause mortality data for the last three years are unavailable for India. India has a very young population, with only 6.4% aged 65+ (compared with 16.5% in the US), so comparing the overall IFR of India with that of, say, the US is not fair; only age-specific IFRs should be calculated and compared when more data become available. We recognize that wave 2 information is appreciably incomplete, and the estimates will change as more complete information on deaths becomes available. For example, while our wave 2 analysis period ended on May 15, the highest daily number of deaths (4529 new deaths) was reported shortly afterwards, on May 18. Thus, our analysis presents an updated but incomplete picture of wave 2.

Supplementary Information

13104_2021_5652_MOESM1_ESM.docx (13.4MB, docx)

Additional file 1: Figure S1. Schematic diagram for the SEIR-fansy model with imperfect testing and misclassification. Figure S2. Estimated first wave underreporting factors for cases and deaths associated with SARS-CoV-2 for states in India. Figure S3. Estimated first wave underreporting factors for cases and deaths associated with SARS-CoV-2 for states in India. Table T1. Summary of the different metrics for the states and the nation for wave 1, on 31st January, 2021. Table T2. Summary of the different metrics for the states and the nation for wave 2, on 15th May, 2021. Table T3. Summary of the different metrics for the states and the nation for waves 1 and 2 combined, on 15th May, 2021. Table T4. Parameter values and descriptions for the SEIRfansy model.

Acknowledgements

The authors are grateful for the computational resources available to them via the advanced research computing center at the University of Michigan.

Abbreviations

CFR

Case fatality rate

CI

Confidence interval

CrI

Credible interval

IFR

Infection fatality rate

RT-PCR

Real-time reverse transcription polymerase chain reaction

SEIR

Susceptible-exposed-infected-recovered

URF

Underreporting Factor

Authors’ contributions

SP created the initial draft, conducted the literature review and collaborated on the analysis. RK and RB created the R package SEIR-fansy and implemented the epidemiologic models. DB and MK collaborated on analysing data from the second wave. DR helped create and modify the final draft and address issues raised by the review panel. BM conceived the project, planned the analysis and wrote the draft of the paper. All authors read, reviewed, edited and approved the manuscript for submission.

Funding

The research was supported by an internal pilot grant at the University of Michigan, awarded by the Michigan Institute of Data Science (MIDAS).

Availability of data and materials

All data are publicly available at covid19india.org. We used reported daily case, death and recovery counts for India and its states and union territories from April 1, 2020 to May 15, 2021. The statistical package SEIR-Fansy developed by the authors is available at covind19.org.

Declarations

Ethics approval and consent to participate

Not applicable. The analysis is based on publicly available, completely de-identified aggregate counts. The study is exempt from IRB review because it involves no patient participation or contact.

Consent for publication

Not applicable.

Competing interests

There are no conflicts of interest perceived or declared by any of the authors.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Soumik Purkayastha, Email: soumikp@umich.edu.

Ritoban Kundu, Email: ritoban.kundu@gmail.com.

Ritwik Bhaduri, Email: ritwik.bhaduri@gmail.com.

Daniel Barker, Email: danbarke@umich.edu.

Michael Kleinsasser, Email: mkleinsa@umich.edu.

Debashree Ray, Email: dray@jhu.edu.

Bhramar Mukherjee, Email: bhramar@umich.edu.

References
