Digital Health. 2023 Apr 2;9:20552076231164098. doi: 10.1177/20552076231164098

Digital dashboards with paradata can improve data quality where disease surveillance relies on real-time data collection

Sanjeev K Gupta 1, Himmat Singh 1,2, Mahesh C Joshi 3, Amit Sharma 1,2,4
PMCID: PMC10074606  PMID: 37034306

Abstract

Dealing with the threats of vector-borne diseases necessitates robust disease surveillance systems. The information gathered from surveillance studies is used to evaluate the effectiveness of control measures. It also guides the allocation of resources within the healthcare system. Disease surveillance data also identify high-risk populations or geographic areas to target with interventions. Because surveillance informs decision-making and must be timely, real-time data collection is vital. Real-time data collection apps offer several advantages: they support building powerful digital forms, export data in various formats for quick analysis, and are open-source. These apps automate data collection and can capture data even without an internet connection, transferring it to an online server once a connection is available. One crucial aspect often lacking when disease surveillance data are collected digitally is data quality. This paper aims to present the importance of dashboards that include paradata in improving data quality when real-time data collection tools are used in disease surveillance. Various types of paradata, such as timestamps, geo-referencing, and audio recordings, help enhance the quality of data and can help monitor and evaluate surveillance staff. The outcomes of the paradata analysis may lead to the retraining of the surveillance team and even re-planning of surveillance. Undoubtedly, real-time data collection is the way of the future in any field-based study, and studies should be planned in conjunction with paradata to ensure that high-quality data are recorded.

Keywords: Data quality, real-time data collection, dashboard, vector-borne diseases, mobile app

Introduction

Vector-borne diseases such as malaria, dengue, chikungunya fever, Zika virus fever, yellow fever, West Nile fever, and Japanese encephalitis, transmitted by mosquitoes, account for around 17% of all infectious diseases worldwide and are caused by parasites and viruses. 1 They claim the lives of more than 700,000 people annually worldwide. 2 Malaria is a parasitic infection transmitted by Anopheles mosquitoes, and an estimated 247 million cases occurred in 84 malaria-endemic countries in 2021. 3 The World Health Organization (WHO) South-East Asia Region carried 2% of the malaria burden, with India accounting for ∼83% of the region's cases. 3 Dengue is another mosquito-borne disease that has expanded rapidly in the last two decades, with 5.2 million cases reported to WHO in 2019, affecting 129 countries. 2 The Zika virus had been found in 89 countries and territories worldwide as of February 2022. 4 Since 2005, almost 2 million chikungunya cases have been recorded worldwide. 5

In India, the National Center for Vector Borne Diseases Control is the national-level nodal agency under the government of India for vector-borne diseases control and elimination. Among vector-borne diseases, dengue fever was not common in India before 1990, but the number of cases has risen dramatically, with dengue prevalent in almost all states. 6 Many chikungunya outbreaks were reported in India in the 1960s and 1990s and there were 0.13 million clinically suspected cases in 2016 and 2017. 7 Zika infections, an emerging vector-borne disease, have been recorded sporadically in India since 2017. 8 Malaria-related morbidity and mortality have decreased considerably worldwide, but further efforts are needed to eliminate the disease in malaria-endemic nations like India. 9

In both developed and developing countries, dealing with the danger of vector-borne diseases demands comprehensive surveillance systems. Disease surveillance is a data-driven activity that requires acquiring, interpreting, and analyzing enormous volumes of data from several sources, as well as providing public health responses to data providers, stakeholders, and decision-makers. 10 The information acquired is then used to assess the efficacy of control and preventive health measures, distribute resources within the healthcare system based on need, and identify high-risk groups or geographic locations for interventions.

The need for improved communication and information resources has resulted in the development of healthcare apps, especially for the surveillance and control of mosquito-borne diseases. 11 COVID-19 has demonstrated the worldwide applicability of app-based data collection technologies, which are helpful for timely reporting and for raising community knowledge and attentiveness. 12 Several mobile “apps” that help authorities monitor and limit the spread of COVID-19 have been actively deployed. 13 For vector-borne diseases, the “FeverTracker” integrated graphical surveillance app was developed to enable digital monitoring of community and health care workers and thereby support malaria control and elimination programs. 12 Based on an estimated 6.4 billion smartphone subscriptions worldwide and a global population of around 7.8 billion, 14 global smartphone penetration in 2020 has been estimated at above 78%. Android continued to be the leading smartphone operating system worldwide (∼71%). 15 The internet has also become increasingly important, and almost two-thirds of the world's population was connected to the internet by 2021. 16 Given the widespread use of Android-based mobile devices, the importance of surveillance in decision-making when more than half the world's population is estimated to be at risk of vector-borne diseases (VBDs), 17 and the demand for real-time disease data, mobile-based surveillance is the way of the future.

In India, three vector-borne diseases, namely malaria, visceral leishmaniasis, and lymphatic filariasis, are targeted for elimination. 18 Every single case matters in the elimination phase, so there is an acute need to move surveillance systems from aggregated data to near real-time case-based surveillance, which will help in identifying the drivers of disease transmission in India. 19 China's malaria elimination surveillance system underwent a significant transformation with the introduction of a real-time information reporting management system, which included individual case tracking and focus clearing. 20 Another crucial aspect often lacking in studies that collect disease surveillance data digitally is data quality. This refers to missing or duplicate data, consistency of information, and cogent responses to survey questions. Relevance, accuracy, reliability, clarity, coherence, comparability, completeness, and validity are a few dimensions of quality data. 21 In digital data collection tools, paradata are the supplementary information recorded during a survey that describes the data collection process. Paradata support regular monitoring and evaluation so that quality data are collected, and they can subsequently help to guide the surveillance staff. This paper aims to present the importance of paradata in improving data quality when real-time data collection tools are used in disease surveillance. Because the time lag between reporting and reacting can be costly, the current work aims to close the gap by adopting real-time data surveillance, preserving data quality with paradata, and visualizing data in real time with a dashboard. The data may therefore be used not only to focus interventions but also to alert the public about potential outbreaks and guide them through essential preventive measures.

Near real-time disease surveillance tools

In the past few years, many mobile applications such as Open Data Kit, 22 KoboToolbox (Kobo), 23 REDCap, 24 and Epi Info 25 have revolutionized field data collection and made it more secure, reliable, and scalable. These mobile applications are secure, web-based software that supports data capture for research studies, providing validated data capture and audit trails for tracking data manipulation and export procedures. 26,27 Electronic data collection devices like mobile phones and tablets have several advantages over paper-based data collection. Firstly, data from these mobile devices can be captured and exported in near real-time for analysis and reporting. Secondly, in the absence of an internet connection, these applications allow data to be captured on a mobile phone and uploaded afterwards. This method automates data collection, allowing users to submit text, numeric data, latitude and longitude, images, barcodes, video, and audio to an online server. The captured data may be downloaded in a variety of formats, including comma-separated values (CSV), Keyhole Markup Language (KML), and JavaScript Object Notation (JSON), and then connected to external applications such as Google Data Studio, 28 Microsoft Power BI, 29 and Tableau 30 to develop a dashboard. Other advantages of these apps include building powerful digital forms, collecting data even offline, exporting data in various formats for quick analysis, and being open-source.
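As an illustration of the export step, the following minimal Python sketch converts a JSON export into CSV text that a dashboard tool could ingest. The field names (`team`, `startTime`, `endTime`, `latitude`, `longitude`) and the sample records are hypothetical; they are assumptions for illustration and do not reflect the export schema of any particular tool.

```python
import csv
import io
import json

# Hypothetical JSON export: a list of submitted forms (illustrative values only).
SAMPLE_EXPORT = """
[
  {"team": "T1", "startTime": "2023-01-05T09:00:00", "endTime": "2023-01-05T09:12:00",
   "latitude": 28.61, "longitude": 77.21},
  {"team": "T2", "startTime": "2023-01-05T09:30:00", "endTime": "2023-01-05T09:41:00",
   "latitude": 28.65, "longitude": 77.19}
]
"""

COLUMNS = ["team", "startTime", "endTime", "latitude", "longitude"]


def json_export_to_csv(json_text, columns):
    """Return CSV text with one row per submission, keeping only `columns`."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells rather than raising an error.
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


csv_text = json_export_to_csv(SAMPLE_EXPORT, COLUMNS)
print(csv_text)
```

Keeping the column list explicit means that a form change on the server does not silently alter the layout of the file the dashboard reads.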

Open-source tools allow data collection using Android-based mobile devices and data submission to an online server. This data collection process is replacing traditional paper forms with electronic forms that allow not only traditional data types but also global positioning system (GPS) location, videos, audio, and paradata to be sent to the server in real time. Paper forms can be digitized using a spreadsheet or online drag-and-drop designers and can be uploaded to the server in Extensible Markup Language (XML). Data collectors can download these digital forms on mobile devices and begin collecting and submitting completed forms to the server. Data on the server are aggregated from multiple devices located at multiple sites and may be accessed or shared with stakeholders to monitor and evaluate the study's progress and for further analysis (Figure 1).

Figure 1. Near real-time data collection method.

Paradata in improving data quality

In addition to gathering questionnaire-based information through these apps, sensors on mobile devices may be used to collect metadata. Wenz, Jackle, and Couper (2019) divide these sensors into passive and active sensors. 31 Active sensors include camera use, text messages, and apps to answer questions, whereas passive sensors include device usage monitoring applications, GPS, Bluetooth linkage to external devices, and so on. 31 Sensors in the mobile device enable the gathering of data outside of the actual survey; such data are known as auxiliary data or paradata and are used to monitor and inform the collection process. 32

Several types of paradata are available in mobile-based applications to help in improving data quality; 14 a few of them are as follows:

  1. Timestamps. The time taken to complete a survey and the travel time between two successive interviews are critical indicators for investigating and analyzing interviewer characteristics. Most mobile device-based applications provide timestamps that record the start and finish time of the questionnaire or of a specific section as variables that aid in calculating the interview length. According to Bell et al. (2016), 33 extensive unscripted conversations between the interviewer and the respondent can pose issues for survey reliability and interview ethics, as well as raise more fundamental epistemological concerns. Reduced reading time affects respondents’ response behavior to varying degrees, and deviations from structured interviewing may influence the comparability of survey responses and reduce data quality. 34 These indicators may call for extensive monitoring and training of the interviewer, as well as re-planning of the study or mobile application. 35

  2. GPS coordinates. Spatio-temporal analysis of a study, including trend analysis, clustering of cases, and the exact location of participants, requires geo-referenced data. GPS coordinates consist of latitude and longitude, which determine a precise location on Earth. Another variable is altitude (or elevation, or z value), which is the height of a point on Earth with respect to sea level. Researchers may use GPS coordinates to compute the distance traveled by interviewers between successive interviews. The data can help track team movement, improve survey implementation strategy, and reduce survey coverage bias. Most mobile-based applications with real-time data capture capabilities can export data in KML format, so surveyed locations can be loaded into geographic information system (GIS) software without much effort. GIS-based analysis aids in identifying the spatial correlation and clustering required in the planning, managing, and monitoring of public health programs. 36

  3. Skip pattern. A skip pattern is a question or sequence of questions linked to a conditional response. With a paper questionnaire, data collectors may forget the skip pattern and ask follow-up questions that should have been skipped, whereas a mobile device provides an automated skip pattern feature that controls the order of questions. Complex skip-pattern questions are a significant source of inaccuracies in paper-based questionnaires. 37 In smartphone-based questionnaires, it is also easier to update the survey in the app than to change a question after printing a large number of questionnaires, which saves the interviewer time and effort and is eco-friendly. Borkotoky et al. (2014) 38 identified several reasons for questions being skipped, either by the respondent or by the interviewer, including the respondent acting as a proxy for another person, interviewers failing to ask questions properly either intentionally or because the instructions were not clear, and poor questionnaire design. To handle this challenge, another key paradata indicator, which compares the total number of sections missed by an interviewer in a tool with the total number of sections provided in the questionnaire, is essential for improving data quality.

  4. Audio recording. Non-response bias caused by non-contact or rejection by the respondent is another challenge in data quality. It can be due to various factors, including difficulty in obtaining sample units, failure to reach respondents, and failure to acquire cooperation from respondents. In mobile device-based surveys, random audio and video recordings may be utilized to determine the cause of non-response. These recordings of the interviewer's and respondent's interactions can aid in determining the reason for non-response and evaluating the interview. However, safeguarding individual privacy and preventing data breaches in smartphone data for health surveillance is a challenge as ethical consent for audio and video recordings is necessary. The respondent must be aware of the use of paradata, such as audio recording and location data, before the survey, but research found that such explicit consent may reduce survey participation if survey respondents are not fully informed about paradata. 39

  5. Dashboard. In data quality evaluation, a dashboard allows stakeholders to visualize and track certain aspects of data. 40 A digital dashboard can assess and interpret data based on the user's inputs, making it a useful resource for the research community. 41 A quality dashboard uses various information visualizations to depict data quality metrics to identify missing, inconsistent, incorrect, or non-informative data. 42 Further, it aids data collectors in receiving continual feedback and drives them to be cautious when collecting data. Real-time monitoring minimizes the time and effort to evaluate and monitor a long communication chain, which might obstruct quick input to field teams.
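The timestamp, GPS, and skip-pattern indicators described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in any specific tool: the record fields (`startTime`, `endTime`, `latitude`, `longitude`) and the sample values are hypothetical, and the distance uses the standard haversine great-circle formula with a mean Earth radius of 6371 km, which is an approximation.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def minutes_between(earlier, later):
    """Minutes elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(later) - datetime.fromisoformat(earlier)
    return delta.total_seconds() / 60


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def skipped_section_ratio(sections_missed, sections_total):
    """Share of questionnaire sections that the interviewer missed."""
    if sections_total <= 0:
        raise ValueError("sections_total must be positive")
    return sections_missed / sections_total


# Two hypothetical successive interviews by the same interviewer.
first = {"startTime": "2023-01-05T09:00:00", "endTime": "2023-01-05T09:12:00",
         "latitude": 28.610, "longitude": 77.210}
second = {"startTime": "2023-01-05T09:30:00", "endTime": "2023-01-05T09:41:00",
          "latitude": 28.650, "longitude": 77.190}

interview_length = minutes_between(first["startTime"], first["endTime"])
travel_time = minutes_between(first["endTime"], second["startTime"])
distance = haversine_km(first["latitude"], first["longitude"],
                        second["latitude"], second["longitude"])

print(f"Interview length: {interview_length:.1f} min")
print(f"Travel time to next interview: {travel_time:.1f} min")
print(f"Distance to next interview: {distance:.2f} km")
print(f"Skipped sections: {skipped_section_ratio(2, 8):.0%}")
```

Comparing travel time with distance for the same pair of interviews is one way to spot implausible movement, for example several interviews recorded at the same coordinates within a few minutes.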

Figure 2 shows part of a sample dashboard, developed in Google Data Studio, for monitoring and assessing surveillance teams. Interactive filters in the dashboard allow users to explore and analyze data independently at any level, such as state, district, village, or even team, making dashboards far more dynamic and effective than creating separate pages and graphics for each possible variation (Figure 2(a)). Scorecards are helpful for showing key performance indicators or variables that evaluate how well the research is being performed. For example, a scorecard can highlight completed surveys, non-response rates, etc. (Figure 2(b)). Other data visualization styles provided on the dashboard to monitor and evaluate the study and staff performance include bar charts, line charts, and pie charts. Information such as the number of surveys completed each day, the gender and age distribution of respondents, and the time it takes to complete each survey can be presented (Figure 2(c), (e), and (f)). Users of the dashboard will be able to estimate how long it will take to finish a round of surveys, identify surveyors who are performing poorly, and identify extreme values present in the data. Variables that track survey progress are not always included in the data, but they may be generated with the help of a few metadata variables like startTime and endTime. For any GIS-based data analysis and tracking of the surveyor, the actual location of the respondent is essential. The actual location of the respondent can be visualized on the dashboard using the latitude and longitude collected during the study (Figure 2(d)).
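The idea of generating progress variables from startTime and endTime can be sketched as follows. This is a minimal Python example under stated assumptions: the `surveyor` field and the sample rows are hypothetical, and flagging a survey as extreme when its duration is below half or above twice the median is an illustrative rule, not a threshold taken from this paper.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical submissions carrying only metadata variables.
submissions = [
    {"surveyor": "A", "startTime": "2023-01-05T09:00:00", "endTime": "2023-01-05T09:10:00"},
    {"surveyor": "A", "startTime": "2023-01-05T09:30:00", "endTime": "2023-01-05T09:42:00"},
    {"surveyor": "B", "startTime": "2023-01-05T10:00:00", "endTime": "2023-01-05T10:02:00"},
    {"surveyor": "B", "startTime": "2023-01-06T09:00:00", "endTime": "2023-01-06T09:11:00"},
    {"surveyor": "A", "startTime": "2023-01-06T10:00:00", "endTime": "2023-01-06T10:40:00"},
]


def duration_minutes(row):
    """Survey duration in minutes derived from startTime and endTime."""
    start = datetime.fromisoformat(row["startTime"])
    end = datetime.fromisoformat(row["endTime"])
    return (end - start).total_seconds() / 60


def surveys_per_day(rows):
    """Count of surveys started on each calendar day (for a line chart)."""
    return Counter(datetime.fromisoformat(r["startTime"]).date().isoformat() for r in rows)


def flag_extreme_durations(rows, low_factor=0.5, high_factor=2.0):
    """Rows whose duration is far from the median (illustrative thresholds)."""
    typical = median(duration_minutes(r) for r in rows)
    return [r for r in rows
            if duration_minutes(r) < low_factor * typical
            or duration_minutes(r) > high_factor * typical]


per_day = surveys_per_day(submissions)
flagged = flag_extreme_durations(submissions)

print(dict(per_day))
for row in flagged:
    print(row["surveyor"], duration_minutes(row), "min")
```

The flagged rows are candidates for review rather than proof of poor performance; a very short survey may simply reflect a household with few eligible members.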

Figure 2. Dashboard for monitoring and evaluating the study and surveillance teams with near real-time data: (a) filtering of data at the state, district, PHC, village, and team level; (b) scorecards depicting key indicators such as the total number of surveys and non-response; (c) line chart depicting daily surveys conducted; (d) exact location of the respondents; (e) pie chart of the gender-wise distribution of the respondents; (f) bar chart showing the age distribution of respondents; and (g) data in tabular format, such as qualification and occupation, with the count and percentage of each option.

The paradata mentioned above, such as the average time of the survey, the number of surveys conducted per day, the travel time between successive interviews, the exact location of the survey, and even outliers and extreme values, can be displayed on the dashboard. The number of missing values, observations that are duplicates or likely duplicates, the negative screening rate, and the list of questions with the most “don't know” or “no answer” responses should be part of a dashboard to obtain quality data. With the introduction of real-time data collection via mobile devices, a massive amount of personal health-related data is being created and gathered continually.
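The data quality indicators listed above can be computed with a short script before they are shown on a dashboard. The following Python sketch is illustrative only: the question names, the sample responses, and the rule that two rows with identical household and answers are likely duplicates are all assumptions made for this example.

```python
from collections import Counter

# Answer labels treated as non-informative (assumed wording).
NON_INFORMATIVE = {"don't know", "no answer"}

# Hypothetical survey responses; None marks a missing value.
responses = [
    {"id": "r1", "household": "H1", "fever": "yes", "bednet": "don't know"},
    {"id": "r2", "household": "H2", "fever": None, "bednet": "no"},
    {"id": "r3", "household": "H1", "fever": "yes", "bednet": "don't know"},
    {"id": "r4", "household": "H3", "fever": "no answer", "bednet": "yes"},
]
QUESTIONS = ["fever", "bednet"]


def count_missing(rows, questions):
    """Number of question cells that are empty or absent."""
    return sum(1 for r in rows for q in questions if r.get(q) in (None, ""))


def likely_duplicates(rows, key_fields):
    """Ids of rows whose key fields repeat those of an earlier row."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row.get(f) for f in key_fields)
        if key in seen:
            duplicates.append(row["id"])
        else:
            seen.add(key)
    return duplicates


def non_informative_ranking(rows, questions):
    """Questions ordered by how often they received a non-informative answer."""
    counts = Counter(q for r in rows for q in questions
                     if str(r.get(q)).lower() in NON_INFORMATIVE)
    return counts.most_common()


missing = count_missing(responses, QUESTIONS)
duplicates = likely_duplicates(responses, ["household", "fever", "bednet"])
ranking = non_informative_ranking(responses, QUESTIONS)

print("Missing values:", missing)
print("Likely duplicates:", duplicates)
print("Non-informative answers by question:", ranking)
```

Each of these values maps naturally to a dashboard element: the counts to scorecards and the ranking to a table of questions that may need rewording or interviewer retraining.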

The need for real-time data capture applications will probably keep increasing as technology develops and becomes more embedded into our daily lives. These are a few developments and trends for real-time data collection apps in the future:

  1. Artificial Intelligence and machine learning. The use of Artificial Intelligence in real-time data-capturing apps can improve the speed and accuracy of data analysis, enable predictive insights, and provide personalized recommendations to users. 43 In the case of an enormous volume of data, visual examination through the dashboard may not be sufficient, therefore, a data quality inspection module based on machine learning techniques may be required. Further, deep learning networks may be used to make suitable decisions by fusing huge and complicated sets of data, and Denotational Mathematics can serve as a formal foundation for modeling and regulating deep learning networks, therefore improving decision-making quality. 44 The deep learning modalities based on convolutional neural networks and convolutional long short-term memory were used in the coronavirus disease 2019 (COVID-19) detection system. 45

  2. Internet of Things (IoT). Real-time data capture applications will grow more powerful and adaptable as more devices, including cars, household appliances, and other products embedded with electronics, software, and sensors, are connected to the internet. Sensors, cameras, and GPS systems are just a few examples of the sources from which IoT devices may collect data. Big data technology may be able to assist in the real-time analysis and administration of vast volumes of data. The data gathered by sensors and actuators could be transmitted via a network to a cloud server, where they can be processed. 46

  3. Data privacy and security. Real-time data collection may generate a large number of personal health records, and data security is a concern; therefore, it is recommended to have secure data-sharing mechanisms so that data can be shared with stakeholders while maintaining privacy. 47 The focus on data privacy and security will grow as more sensitive data are gathered and examined in real time. Apps that capture data in real time must have strong security mechanisms in place to safeguard user information and prevent its misuse. Blockchain technology can play a significant role in enhancing the privacy and security of real-time data-capturing apps. Some of the key characteristics of blockchain technology are decentralization (data are stored on a distributed network of nodes, making it harder for hackers to breach the system), encryption of data (even if an attacker succeeds in accessing the data, they will be unreadable without the proper decryption key), and immutability of data (it is difficult for attackers to tamper with or modify the data). 48

  4. Customized real-time data capturing apps. The need for customized apps geared to specific areas such as healthcare, logistics, or retail will increase as the demand for real-time data gathering apps grows. Hybrid systems that combine traditional surveillance data with data from search queries, social media posts, and crowdsourcing are potential advancements for the future. 49 In general, real-time data-collecting applications have a bright future, and we can anticipate further development and innovation in this field in the years to come.
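The immutability property mentioned under data privacy and security can be illustrated with a hash chain, in which each record's hash depends on the record before it. The Python sketch below is a simplified illustration of tamper evidence only; it is not a blockchain, since it has no decentralization, consensus, or encryption, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first record


def record_hash(record, previous_hash):
    """SHA-256 over the record contents combined with the previous hash."""
    payload = json.dumps(record, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so that each entry commits to all earlier entries."""
    chain = []
    previous = GENESIS_HASH
    for record in records:
        current = record_hash(record, previous)
        chain.append({"record": record, "previous_hash": previous, "hash": current})
        previous = current
    return chain


def chain_is_valid(chain):
    """Recompute every hash; any edited record breaks the chain."""
    previous = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != previous:
            return False
        if block["hash"] != record_hash(block["record"], previous):
            return False
        previous = block["hash"]
    return True


# Hypothetical surveillance records.
chain = build_chain([
    {"case_id": 1, "result": "positive"},
    {"case_id": 2, "result": "negative"},
])
print(chain_is_valid(chain))
```

Changing any stored record after the chain is built makes `chain_is_valid` return `False`, which is the sense in which such a structure makes tampering detectable.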

Conclusion

Surveillance is a primary intervention for vector-borne diseases because local and environmental factors govern these diseases. Effective vector-borne disease management requires a comprehensive mobile device-based surveillance system connected with paradata. The current study highlights the necessity of near real-time data capture in vector-borne disease surveillance and elimination. In a country like India, real-time data capture is not a big challenge because Android-based mobile devices and internet access are widely available. Also, with the availability of open-source, secure, reliable, and scalable mobile apps to handle real-time data, mobile device-based data capture provides fast, accurate, largely inexpensive, and automated data collection that can save time and money while also being environmentally friendly. However, data quality is a concern regardless of whether data are gathered in real time or on paper. By utilizing mobile device paradata and audio and video recording tools, data quality indicators such as timestamps, geo-location of the surveys, non-response, and skipped questions may be tracked, and data quality can be improved. Integrating real-time data with dashboards can assist stakeholders in monitoring the progress of a study and of individual data-collection teams, as well as data quality indicators. Paradata also aid in checking for data distortion caused by human interaction during data collection, lowering the risk of errors. Regular monitoring improves data dependability and ensures that the study's objectives are met. The collected data might be used to re-plan the targets, re-train data collectors, and even re-design the app. Researchers may decide on a course of action based on the observed data quality, such as training field employees or changing the study plan.
Undoubtedly, real-time data collection is the way of the future in any questionnaire-based study, but the study should be planned in conjunction with paradata to collect quality data.

Acknowledgments

The authors thank the National Institute of Malaria Research for providing infrastructure support.

Footnotes

Author contributions: AS and SG conceived the study that was completed by HS and MJ. All authors wrote the paper. AS is supported by JC Bose fellowship from DST and his laboratory is funded by GHIT, NIH, DBT, and ICMR grants. All authors read and approved the final manuscript.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

Guarantor: Amit Sharma.

Articles from Digital Health are provided here courtesy of SAGE Publications