Public Health. 2021 May 19;196:114–116. doi: 10.1016/j.puhe.2021.05.010

A forensic analysis of SARS-CoV-2 cases and COVID-19 mortality misreporting in the Brazilian population

D Galvêas, F Barros Jr, CA Fuzo
PMCID: PMC8133487  PMID: 34182256

Abstract

Objective

This study aimed to investigate misreporting in the numbers of individuals who tested positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and of deaths from coronavirus disease 2019 (COVID-19) in Brazil at the city, state, and national scales, using statistical forensic analysis.

Study design

This is a register-based study of public health data collected, organized, and maintained by the Brazilian Ministry of Health, covering the Brazilian population.

Methods

We evaluated Brazilian notifications of confirmed SARS-CoV-2 cases and COVID-19 deaths from February 26 to September 7, 2020, at the city, state, and national scales for conformity to the first-digit distribution expected under Benford's law (BL).

Results

Statistical analyses significantly rejected the hypothesis of conformity to BL for the number of SARS-CoV-2 cases at the city level and for the number of COVID-19 deaths at all regional levels.

Conclusion

Using BL, which has been widely applied to assess the quality and reliability of numerical data from different sources, we demonstrated misreporting in the number of cases and deaths throughout the SARS-CoV-2 pandemic in Brazil. We thus brought to light evidence that raises questions about the reliability of SARS-CoV-2 data in Brazil. This situation may have led to inconsistencies in public health policy actions and recommendations, with drastic humanitarian, social, and economic consequences such as intensive care unit overload in some Brazilian regions.

Keywords: COVID-19, Brazil SARS-CoV-2 pandemic, Benford's law, SARS-CoV-2 misreporting

Introduction

The ongoing pandemic caused by SARS-CoV-2 has produced increasing numbers of infected people worldwide, pushing health systems to the limit of their capacity or to collapse. Recently, an unfortunate example of this disaster occurred in Manaus (Brazil), despite the expectation of herd immunity given the high infection rate in the local population,1 reinforcing the idea that great care must be taken in managing this disease. Brazil has a population of more than 200 million people, and more than 70% depend on public health services.2 Currently, Brazil has the second-highest number of COVID-19 infections and deaths globally according to the Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/).

Specialized public health researchers have pointed out problems in the response of Brazilian authorities to COVID-19,3 such as evidence of significant underreporting of cases and deaths in Brazil.4 This is a severe problem because, in critical situations such as this pandemic, the data generated by the health system are essential for crafting good responses, allocating resources, and measuring the effectiveness of interventions such as social distancing. Incomplete or incorrect data can lead to suboptimal decisions, which is especially problematic in a developing economy with a fragile health system like Brazil's. The SARS-CoV-2 pandemic has generated a considerable amount of data that must be carefully curated by government agencies to be useful in crisis management, while health authorities simultaneously try to process all this new information and fight the impacts of the pandemic.

Confidence in the reported numbers of cases and deaths is an important factor for ensuring robust epidemiological studies and, consequently, adequate public health strategies. In this respect, adherence to the distribution predicted by Benford's law5 (BL) has been used as a criterion for the reliability of large numerical data sets. BL states that the leading digits of numbers from many real-life sources are distributed in a specific non-uniform way, with smaller leading digits occurring more often than larger ones.6 This counter-intuitive result applies to disparate lists that somehow follow an exponential pattern, such as stock prices, death rates, population numbers, and epidemiological data, and it is used in forensic fraud analysis.6 Therefore, when such a data set does not conform to BL, this constitutes evidence that some misreporting has occurred. In this context, a previous study by Silva and Figueiredo Filho7 questioned the quality of nationwide Brazilian COVID-19 data and found no conformity to BL. Given these previous inconsistencies and the evidence of misreporting, we extend the analysis of Silva and Figueiredo Filho7 by testing the agreement of Brazilian COVID-19 data with BL at different regional levels.
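As a minimal sketch of the law itself (not the authors' code, which is available only on request), the expected proportion of values whose leading digit is d under BL is P(d) = log10(1 + 1/d) for d = 1, ..., 9. The snippet below computes these proportions and the empirical first-digit proportions of a list of non-negative integer counts; the function names are hypothetical, and zero counts are skipped because they have no leading digit.

```python
import math
from collections import Counter

def benford_expected():
    """Expected first-digit proportions under BL: P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Leading digit of a positive integer count."""
    if n <= 0:
        raise ValueError("only positive counts have a leading digit")
    return int(str(int(n))[0])

def first_digit_proportions(counts):
    """Empirical first-digit proportions of a list of counts, skipping zeros."""
    digits = [first_digit(n) for n in counts if n > 0]
    if not digits:
        raise ValueError("no positive counts to analyze")
    tally = Counter(digits)
    return {d: tally[d] / len(digits) for d in range(1, 10)}

for digit, share in benford_expected().items():
    print(digit, round(share, 3))
```

Rounded to three decimals, the printed proportions (0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046) are the BL values listed in Table 1.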

Brazilian public health data

Our analysis covers both the number of deaths and the number of confirmed cases of SARS-CoV-2 infection, using data at the country, state, and city levels. Our results for confirmed infections are mixed: some tests reject conformity to BL, and others do not. We analyzed the reports of COVID-19 cases and deaths in the Brazilian data maintained by the Ministry of Health from February 26 to September 7, 2020 (https://covid.saude.gov.br); these dates mark, respectively, the first case registered in Brazil and the end of the exponential growth phase in both Ceará and Goiás (Brazilian states). The data were obtained at the national, state, and city levels on October 14, 2020. Although we collected a larger data set, we trimmed it at the highest registered number of COVID-19 cases and deaths at both the national and state levels to represent the initial exponential growth. After trimming, we analyzed the three data sets separately: we first tested the national numbers of cases and deaths, then the merged data set of all states, and, with the same method, the merged data set of all cities. Although pooling may seem counter-intuitive, it is a valid way of testing,6 especially because larger data sets give these tests more statistical power. Supplementary Table 1 provides descriptive statistics of the analyzed data set, which shows an exponential pattern and was cut on the same day, i.e., 156 days after the first case registered by Brazilian authorities. Details of the statistical methods are given in the Extended Methods in the Supplementary Information.
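The trimming and pooling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: each regional unit is assumed to contribute a list of daily new counts aligned on the same calendar days, the common cut day is assumed to be the day of the national peak, and zero counts are dropped because they have no leading digit. The names `peak_day` and `trim_and_pool` and all numbers are hypothetical.

```python
def peak_day(series):
    """Index of the first day on which the series reaches its maximum."""
    return max(range(len(series)), key=lambda i: series[i])

def trim_and_pool(series_by_unit, cut_day):
    """Keep days 0..cut_day (inclusive) of every unit and pool the positive counts."""
    pooled = []
    for series in series_by_unit.values():
        pooled.extend(n for n in series[: cut_day + 1] if n > 0)
    return pooled

# Hypothetical toy data: daily new cases for two states, aligned by day.
states = {
    "A": [0, 1, 3, 8, 20, 15, 9],
    "B": [0, 0, 2, 5, 12, 30, 11],
}
# National series as the day-by-day sum over states (an assumption of this sketch).
national = [sum(day) for day in zip(*states.values())]

cut = peak_day(national)            # common cut day for every level
pooled_states = trim_and_pool(states, cut)
print(cut, pooled_states)
```

The pooled list is then what a first-digit test would receive at the state level; the city level would be handled the same way with one series per city.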

Results

We computed the first-digit frequencies at the national, state, and city levels (Table 1 and Supplementary Fig. 1). Although the empirical first-digit distribution of confirmed cases appears to fit BL, the hypothesis of conformity is easily rejected for confirmed deaths. Before moving to more detailed tests, we point out a first issue with the reliability of the Brazilian data. Even though it is difficult to estimate precisely the fatality rate (deaths/cases) of SARS-CoV-2 infection, as Atkeson points out,8 deaths are commonly assumed to be a constant proportion of cases. Thus, in accordance with Theorem 1 of Nigrini,6 if the confirmed case data conform to BL, we should expect the same for confirmed deaths, because multiplying every value in a Benford-distributed data set by the same positive constant leaves its first-digit distribution unchanged. However, deaths in Brazil do not follow the distribution expected under BL (Supplementary Fig. 1).
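The scale-invariance argument can be checked numerically. The sketch below is an illustration, not the authors' analysis: it draws synthetic "case" values whose base-10 logarithms are uniform over five whole decades (a standard way to generate Benford-distributed numbers), multiplies them by an assumed constant fatality fraction of 0.03, and compares both first-digit distributions with BL using the mean absolute deviation (MAD). Values are kept as real numbers to isolate the effect of scaling; the seed, sample size, and fraction are arbitrary choices.

```python
import math
import random

def benford_expected():
    """Expected first-digit proportions P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """Leading digit of a positive real number, read from its scientific notation."""
    return int(f"{x:e}"[0])

def proportions(values):
    """Share of values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 9
    for x in values:
        counts[first_digit(x) - 1] += 1
    return [c / len(values) for c in counts]

def mad(observed, expected):
    """Mean absolute deviation between observed and expected proportions."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)

rng = random.Random(2020)  # fixed seed so the run is reproducible
# log10 of each synthetic "case" value is uniform over five whole decades,
# which makes the sample Benford-distributed by construction.
cases = [10 ** rng.uniform(0, 5) for _ in range(50_000)]
FATALITY_FRACTION = 0.03  # hypothetical constant deaths/cases ratio
deaths = [FATALITY_FRACTION * x for x in cases]

bl = benford_expected()
mad_cases = mad(proportions(cases), bl)
mad_deaths = mad(proportions(deaths), bl)
print(f"MAD cases:  {mad_cases:.4f}")
print(f"MAD deaths: {mad_deaths:.4f}")
```

Both MAD values should stay small at this sample size, so a constant fatality fraction on its own does not break conformity; a nonconforming death series therefore points to something other than a fixed deaths/cases ratio.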

Table 1.

Significance tests and empirical first-digit distributions of the numbers of SARS-CoV-2 cases and COVID-19 deaths at different regional levels in Brazil, compared with the distribution expected under BL.

1st digit | BL | Cases: Brazil | Cases: States | Cases: Cities | Deaths: Brazil | Deaths: States | Deaths: Cities
1 | 0.301 | 0.349 | 0.295 | 0.296∗∗∗ | 0.182∗∗∗ | 0.258∗∗∗ | 0.156∗∗∗
2 | 0.176 | 0.233∗ | 0.169 | 0.187∗∗∗ | 0.167 | 0.212∗∗∗ | 0.303∗∗∗
3 | 0.125 | 0.089 | 0.132 | 0.122∗∗∗ | 0.098 | 0.138∗∗∗ | 0.170∗∗∗
4 | 0.097 | 0.075 | 0.104 | 0.107∗∗∗ | 0.121 | 0.104 | 0.117∗∗∗
5 | 0.079 | 0.055 | 0.072∗ | 0.085∗∗∗ | 0.114 | 0.082 | 0.077∗∗∗
6 | 0.067 | 0.055 | 0.073 | 0.063∗∗∗ | 0.091 | 0.064 | 0.062∗∗∗
7 | 0.058 | 0.048 | 0.059 | 0.052∗∗∗ | 0.106∗∗ | 0.055 | 0.049∗∗∗
8 | 0.051 | 0.041 | 0.053 | 0.044∗∗∗ | 0.083 | 0.047 | 0.036∗∗∗
9 | 0.046 | 0.055 | 0.043 | 0.044∗∗∗ | 0.038 | 0.042 | 0.031∗∗∗
MAD | – | 0.025N | 0.006A | 0.006A | 0.036N | 0.013M | 0.043N
Chi-squared | – | 8.233∗∗∗ | 11.71∗∗∗ | 17.97∗∗∗ | 19.07∗∗∗ | 68.61∗∗∗ | 815.46∗∗∗
Kuiper V | – | 1.398 | 1.044 | 20.05∗∗∗ | 1.901∗∗ | 3.762∗∗∗ | 158.66∗∗∗

Notes. ∗∗∗P < 0.01, ∗∗P < 0.05, ∗P < 0.1.

A = acceptable; M = marginally acceptable; N = nonconformity.
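The MAD row can be cross-checked against the digit proportions reported in Table 1. The sketch below assumes the first-digit MAD ranges commonly cited from Nigrini6 (close conformity up to 0.006, acceptable up to 0.012, marginally acceptable up to 0.015, nonconformity above); the function names are hypothetical. Because the published proportions are rounded to three decimals, recomputed values can differ slightly from the table for columns near a threshold, so only the national case column and the three death columns are included here.

```python
import math

BL = [math.log10(1 + 1 / d) for d in range(1, 10)]

# First-digit proportions for digits 1..9, copied from Table 1.
TABLE1 = {
    "cases_brazil": [0.349, 0.233, 0.089, 0.075, 0.055, 0.055, 0.048, 0.041, 0.055],
    "deaths_brazil": [0.182, 0.167, 0.098, 0.121, 0.114, 0.091, 0.106, 0.083, 0.038],
    "deaths_states": [0.258, 0.212, 0.138, 0.104, 0.082, 0.064, 0.055, 0.047, 0.042],
    "deaths_cities": [0.156, 0.303, 0.170, 0.117, 0.077, 0.062, 0.049, 0.036, 0.031],
}

def mad(observed, expected=BL):
    """Mean absolute deviation between observed and expected digit proportions."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)

def nigrini_first_digit_class(value):
    """Assumed first-digit MAD ranges (as commonly cited from Nigrini)."""
    if value <= 0.006:
        return "close conformity"
    if value <= 0.012:
        return "acceptable conformity"
    if value <= 0.015:
        return "marginally acceptable conformity"
    return "nonconformity"

for name, props in TABLE1.items():
    m = mad(props)
    print(f"{name}: MAD = {m:.3f} ({nigrini_first_digit_class(m)})")
```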

Table 1 shows the significance tests for the adherence of cases and deaths to BL. First, for the Z test of individual digits, only some Z statistics reject the null hypothesis of conformity to BL at the more aggregate levels (country and states), whereas at the city level the opposite occurs: every digit rejects it. This pattern holds for both confirmed cases and deaths. Regarding the overall tests of conformity, none of the three (MAD, chi-squared, and Kuiper) supports a BL distribution for confirmed deaths in Brazil. On the other hand, the tests on confirmed case data give mixed results: although the chi-squared test rejects the null hypothesis in all cases, the MAD test indicates nonconformity only for country-level data, and the Kuiper V test indicates that only city-level data do not follow a BL distribution.
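The per-digit Z test and the overall chi-squared and Kuiper tests can be sketched from first-digit counts as follows. This is a hedged illustration of the textbook forms of these statistics, not the authors' code: the Z statistic uses the 1/(2N) continuity correction found in the Benford literature (e.g., Nigrini6), the chi-squared statistic has 8 degrees of freedom, and the Kuiper statistic is given in Stephens' modified form V(√N + 0.155 + 0.24/√N), with approximate critical values 1.620, 1.747, and 2.001 at the 10%, 5%, and 1% levels. Whether Table 1 uses exactly this scaling is an assumption, and both count vectors below are hypothetical.

```python
import math

BL = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Two-sided Z, chi-squared (8 df), and modified-Kuiper critical values at 10%, 5%, 1%.
CRITICAL = {
    "z": (1.645, 1.960, 2.576),
    "chi2": (13.362, 15.507, 20.090),
    "kuiper": (1.620, 1.747, 2.001),
}

def z_statistics(counts, expected=BL):
    """Per-digit Z statistics with a 1/(2N) continuity correction."""
    n = sum(counts)
    zs = []
    for c, p in zip(counts, expected):
        diff = abs(c / n - p)
        correction = 1 / (2 * n)
        if correction < diff:  # apply the correction only when it is smaller than the gap
            diff -= correction
        zs.append(diff / math.sqrt(p * (1 - p) / n))
    return zs

def chi_squared(counts, expected=BL):
    """Pearson chi-squared statistic; 8 degrees of freedom for the nine first digits."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, expected))

def kuiper_modified(counts, expected=BL):
    """Kuiper V = D+ + D- on cumulative proportions, scaled by sqrt(N) + 0.155 + 0.24/sqrt(N)."""
    n = sum(counts)
    cum_obs = cum_exp = 0.0
    d_plus = d_minus = 0.0
    for c, p in zip(counts, expected):
        cum_obs += c / n
        cum_exp += p
        d_plus = max(d_plus, cum_obs - cum_exp)
        d_minus = max(d_minus, cum_exp - cum_obs)
    root_n = math.sqrt(n)
    return (d_plus + d_minus) * (root_n + 0.155 + 0.24 / root_n)

# Hypothetical first-digit counts (digits 1..9), 1000 observations each.
benford_like = [301, 176, 125, 97, 79, 67, 58, 51, 46]
skewed = [200, 200, 100, 100, 100, 100, 100, 50, 50]

for label, counts in (("benford_like", benford_like), ("skewed", skewed)):
    zs = z_statistics(counts)
    print(label,
          "max Z =", round(max(zs), 2),
          "chi2 =", round(chi_squared(counts), 2),
          "Kuiper V* =", round(kuiper_modified(counts), 3))
```

Comparing each statistic with the matching entry in `CRITICAL` gives the significance stars used in Table 1.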

Discussion

Epidemiological studies are fundamental to public health: through mathematical models and population data, they guide governmental agencies' strategies for managing and coping with epidemics and pandemics such as COVID-19. Detecting anomalies in the reported data used in these epidemiological models, which approaches such as BL can do, is essential for the models' reliability and predictive power (see the extended discussion in the Supplementary Information for more details on the importance of BL for anomaly detection in COVID-19 data). In general, our results show that Brazilian COVID-19 data do not follow BL. One possible explanation is the underreporting of confirmed cases and deaths.4 It is important to note that if misreporting amounted to a constant fraction of the actual numbers of cases and deaths in Brazil, Theorem 1 of Nigrini6 ensures that the reported data would still conform to BL; hence, even a specific kind of misreporting can fit the BL proportions. At the municipality level, however, our results unanimously indicate no conformity to BL. This contrasts with the results of Koch and Okamura,9 who found that confirmed infections in China match the distribution expected under BL and are similar to those observed in the United States and Italy. Our analysis of the number of deaths rejects conformity to BL at all regional levels.

The reliability of reported COVID-19 data is extremely important, as it is the core of any reasonable response to this health crisis, which began in early 2020. In a large developing country such as Brazil, data accountability is especially important because it should drive an adequate policy response to the chaotic situation of a continental country with a vulnerable population. Although the city data seem to fit BL, aggregating them and comparing the result with the national series yields different figures and nonconformity of the aggregate, suggesting that the number of observations plays an important role in statistical significance. Confirmed cases and deaths are raw inputs for many studies that investigate policy responses to the COVID-19 pandemic. For example, many investigations use these numbers to calibrate SIR (Susceptible, Infected, Recovered) models and trace different scenarios under alternative policies.10 Hence, our study is a warning to those who rely heavily on COVID-19 data to study or confront the effects of the pandemic in Brazil.
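One mechanism by which the number of observations affects significance can be shown with a short calculation. For fixed first-digit proportions, the chi-squared statistic equals N times a sum that does not depend on N, so it grows linearly with sample size, whereas the MAD does not change. The sketch below uses the city-level case proportions from Table 1 together with hypothetical sample sizes chosen for illustration; it is not a reanalysis of the Brazilian data.

```python
import math

BL = [math.log10(1 + 1 / d) for d in range(1, 10)]

# City-level case proportions for digits 1..9, from Table 1.
CITIES_CASES = [0.296, 0.187, 0.122, 0.107, 0.085, 0.063, 0.052, 0.044, 0.044]

CRITICAL_1PCT_DF8 = 20.090  # chi-squared critical value, 8 df, 1% level

def chi_squared_from_proportions(props, n, expected=BL):
    """Chi-squared statistic for n observations with the given digit proportions."""
    return n * sum((o - e) ** 2 / e for o, e in zip(props, expected))

def mad(props, expected=BL):
    """Mean absolute deviation; independent of the number of observations."""
    return sum(abs(o - e) for o, e in zip(props, expected)) / len(expected)

for n in (100, 1_000, 10_000, 100_000):  # hypothetical sample sizes
    chi2 = chi_squared_from_proportions(CITIES_CASES, n)
    verdict = "reject" if chi2 > CRITICAL_1PCT_DF8 else "do not reject"
    print(f"N={n:>7}: chi2={chi2:8.2f} ({verdict} at 1%), MAD={mad(CITIES_CASES):.4f}")
```

The same proportions pass the chi-squared test at small N and fail it at large N while the MAD stays fixed, which is why pooled city-level data can be rejected by count-based tests even when the MAD looks acceptable.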

Finally, even though we could not identify the sources of the data's nonconformity to BL, our results reinforce the warning about Brazilian COVID-19 data that have been used to study or build public health strategies for managing the pandemic in Brazil. Since other studies have found COVID-19 data from other countries to conform to BL, one should be careful when drawing conclusions from Brazilian data.

Author statements

Acknowledgments

The authors thank Prof. Dr. Andrew W. Horowitz for the helpful comments.

Ethical approval

Not required.

Funding

During the course of this work, C.A. Fuzo was supported by a postdoctoral fellowship from the Coordination for the Improvement of Higher Education Personnel (CAPES, Finance Code 001).

Competing interests

The authors declare no conflict of interest.

Author contributions

FBJ designed the study; DG and FBJ performed the data acquisition, analysis, and interpretation and drafted the article; CAF revised it critically for intellectual content and wrote the final version of the article. All authors have approved the final article.

Data availability

The raw data set analyzed during the present study is freely available from the Brazilian Ministry of Health COVID-19 data portal at https://covid.saude.gov.br. Code and related data will be made available by the authors upon request.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.puhe.2021.05.010.

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1

Extended Methods; Supplementary Table 1, containing the statistical description of the analyzed data set; and Supplementary Fig. 1, which depicts the proportion of each first digit in the numbers of SARS-CoV-2 cases and COVID-19 deaths.

mmc1.doc (436KB, doc)

References

  • 1. Sridhar D., Gurdasani D. Herd immunity by infection is not an option. Science. 2021;371(6526):230–231. doi: 10.1126/science.abf7921.
  • 2. Paim J.S. Sistema Único de Saúde (SUS) aos 30 anos. Ciencia Saude Coletiva. 2018;23(6):1723–1728. doi: 10.1590/1413-81232018236.09172018.
  • 3. Freitas C.M., Silva I.V.M., Cidade N.C. COVID-19 as a global disaster: challenges to risk governance and social vulnerability in Brazil. Ambiente Sociedade. 2020;23. doi: 10.1590/1809-4422asoc20200115vu2020l3id.
  • 4. Silva L.V., Harb M.P.A.A., Santos A.M.T.B., Teixeira C.A.M., Gomes V.H.M., Cardoso E.H.S., et al. COVID-19 mortality underreporting in Brazil: analysis of data from government internet portals. J Med Internet Res. 2020;22(8). doi: 10.2196/21413.
  • 5. Benford F. The law of anomalous numbers. Proc Am Phil Soc. 1938;78(4):551–572.
  • 6. Nigrini M.J. Benford's Law: applications for forensic accounting, auditing, and fraud detection. John Wiley & Sons; New Jersey: 2012.
  • 7. Silva L., Filho D.F. Using Benford's law to assess the quality of COVID-19 register data in Brazil. J Public Health. 2020;43(1):107–110. doi: 10.1093/pubmed/fdaa193.
  • 8. Atkeson A. How deadly is COVID-19? Understanding the difficulties with estimation of its fatality rate. National Bureau of Economic Research Working Paper No. 26965; 2020.
  • 9. Koch C., Okamura K. Benford's Law and COVID-19 reporting. Econ Lett. 2020;196:109573. doi: 10.1016/j.econlet.2020.109573.
  • 10. Brotherhood L., Kircher P., Santos C., Tertilt M. An economic model of the COVID-19 pandemic with young and old agents: behavior, testing and policies. Banco de Portugal, Economics and Research Department Working Paper No. 202014; 2020.

Articles from Public Health are provided here courtesy of Elsevier
