The Centers for Disease Control and Prevention (CDC) and the broader public health system safeguard the health security of people in the United States through science and innovative practices.1,2 Obtaining high-quality, timely data enables public health partners to learn about emerging pathogens, track trends, and identify adversely affected populations. However, the COVID-19 pandemic and other public health emergencies have revealed a fragmented landscape of data and data infrastructure at all levels that limits data access and use, creates security risks, and impedes science, innovation, and collaboration.3,4 Sustainable progress is needed for effective collection, management, and sharing of diverse volumes of data across the public health system to inform timely surveillance, epidemiologic, and laboratory activities. Improving data readiness to link data, decisions, and action may require public health agencies and their constituents to adopt new practices and innovations, build a culture around data, implement common policies and standards, develop decision-support tools, and expand the capacity of the data science workforce.
The patchwork of data systems and practices established over decades hampers the ability of the public health system to transition from day-to-day activities into emergency response operations. Single disease- or program-specific data solutions are designed to address routine needs, such as case surveillance. These solutions often lack the shared processes and flexibility needed to adapt to the volume and unexpected applications of data that large-scale or evolving public health emergencies demand. They are also often developed with a fixed set of data sources and nonstandardized technologies, which limits their ability to accept data from diverse sources or share data with other systems. Combined with a documented need to improve the capacity of the data science workforce,5 these limitations create challenges with access, system interoperability, and scalability within existing fragmented data environments, further compromising rapid and effective decision-making during public health emergency response.
While this commentary is informed by a project initiated before the pandemic, COVID-19 response efforts have heightened the need for change. This commentary describes a data-ready ecosystem framework that, coupled with change management and governance, is designed to overcome existing challenges by focusing on the complete life cycle of data management and use. The framework highlights dependencies among foundational and advanced data capabilities and promotes the use of common systems and tools. Adoption of the ecosystem framework can shift CDC and public health organizations toward sustainable progress in accordance with CDC’s Data Modernization Initiative,6 a national effort committed to transforming public health data readiness. This commentary intends to support modernization activities underway at CDC and across the field of public health.
Rethinking Data Practices Through an Ecosystem Framework
From October 2019 through January 2021, CDC, in collaboration with Georgia Tech Research Institute (GTRI), facilitated 4 meetings with CDC subject matter experts in data science to establish a foundational understanding of CDC’s data environment across 4 general areas: data sources used to facilitate decision-making, existing data management and analysis tools, workflow barriers or bottlenecks, and data-related resource limitations experienced during public health emergency response. In addition, CDC subject matter experts reviewed 25 CDC After Action Reports dated 2005-2019 (unpublished reports, CDC, 2005-2019) that focused on small- to large-scale public health emergencies and functional exercises conducted by the US Department of Health and Human Services.
The information gathered through the subject matter expert engagements and review revealed challenges in 3 areas: (1) holistic planning across the life cycle of data management and use, (2) limited data access and flexibility resulting from the lack of interoperable or scalable functionality, and (3) the capacity of the public health data science workforce. Building on the findings, we reviewed the work of multiple scholars who have adapted Maslow’s Hierarchy of Needs7 pyramid to the data science context. We applied the principles of the adaptations available in the gray literature8-11 and aligned our approach with priorities of CDC’s Data Modernization Initiative12 to further develop the pyramid into a data-ready ecosystem. The revised 5-tier pyramid (tier 1: collect; tier 2: store; tier 3: prepare; tier 4: analyze; and tier 5: learn/insights) incorporates user data needs and dependencies among various data activities (Figure, Table).
Figure.

A data-ready ecosystem framework focused on the complete life cycle of data management and use. A data-ready ecosystem is a systems-level framework for defining core competencies of data readiness based on existing adaptations of Maslow’s Hierarchy of Needs pyramid to the data science context.8-11 Although the data-ready ecosystem tiers build on each other, an arrow through the ecosystem indicates recurring activities that may arise during the data management process as situational needs change. The ecosystem builds functionality across a common set of tools at each tier to foster interoperability and data access across day-to-day activities and emergency response operations.
Table.
Description of data ecosystem tiers to improve how data are collected, stored, prepared, and analyzed for enhanced learning and insights in public health emergency response
| Tier | Description |
|---|---|
| Tier 5: Learn/insights | Data learning and insights through innovation may be best achieved through cutting-edge technology, nontraditional sources of data, and novel approaches to challenging problems. |
| Tier 4: Analyze | Data analysis is how data are linked, especially in support of answering specific research questions. It includes observation of data and overlaying or segmenting (although not modifying) data in novel ways. The information and reports gathered may then be shared within or outside the organization. |
| Tier 3: Prepare | Data preparation is when data are modified and mapped into new formats or structures. It includes data cleaning, validation, and integration to prepare data for later analysis. Data prep may include combining datasets of different sources, dimensions, and formats. |
| Tier 2: Store | Data storage encompasses handling diverse media and sources via provisioning, storage, and exchange. Accurate data permissions, accessibility, and security are chief concerns for data exchange. |
| Tier 1: Collect | Data collection includes identifying sources of data, the timely capture of relevant data, and the standardization of data formats. Data may be collected internally or externally to the organization. |
Applying the Hierarchy of Needs concept to improve data readiness requires advancements across defined stages of technological maturity within each tier (eg, implementing cloud-based solutions hosted on the internet, such as applications, storage, or computer networks, to support system scalability and data access). Advanced technical capacity can help achieve the highest level (tier 5), where programs can use data for learning and insights. In the ecosystem context, an organization that has not met the foundational data collection and storage needs located at the structure’s base will have difficulty achieving goals at the top of the pyramid. As we considered our approach and existing data science hierarchies of needs, we developed 3 principles to support adoption of data readiness practices.
Placing a Lens on the Complete Life Cycle of Data Management and Use
Although the 5 data science tiers build on each other, a revolving arrow through the ecosystem indicates recurring activities that arise from the data management process as needs change. As an example, the pyramid depicts that data analysis capabilities (tier 4) depend directly on how data are collected (tier 1), stored (tier 2), and prepared (tier 3). Similarly, the evolution of data analysis needs may influence ongoing data collection practices. A complete view of data management also allows programs to consider how to efficiently store and consolidate data for rapid access to diverse datasets, use existing interoperable and scalable solutions, and securely share data between partners to support day-to-day surveillance and emergency response operations.
Highlighting Explicit Dependencies Among Foundational Data Activities and Advanced Capabilities
The functions described in upper tiers (analyze and learn/insights) depend on achieving and maintaining competency at lower tiers (collect and store). In contrast, bypassing lower tiers to use advanced analytic capabilities (eg, machine learning) may risk developing poor insight from unreliable or incomplete datasets. In the absence of a data-ready ecosystem, raw data may be collected, stored, and analyzed to answer common epidemiological questions but may not be usable for answering emerging questions that focus on disease trends and populations at risk. Although data are flexible and usable in innovative ways and may be integrated with other datasets to consider appropriate questions to derive new conclusions,13,14 the ability to do so accurately depends on paying careful attention to the foundational tiers of collection and storage. Standardizing collection practices or implementing a centralized unstructured and structured data repository can improve data quality and increase access to diverse datasets for actionable analysis and insight.
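The dependency among tiers described above can be illustrated with a minimal Python sketch. The tier names come from the framework; the 0-to-1 readiness scores, the 0.7 threshold, and the function name `highest_supported_tier` are illustrative assumptions and are not part of the CDC/GTRI project.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Tier(IntEnum):
    """The 5 ecosystem tiers, ordered from the base of the pyramid up."""
    COLLECT = 1
    STORE = 2
    PREPARE = 3
    ANALYZE = 4
    LEARN_INSIGHTS = 5


def highest_supported_tier(
    scores: Mapping[Tier, float], threshold: float = 0.7
) -> Optional[Tier]:
    """Return the highest tier for which it and every lower tier meet the threshold.

    The scores and threshold are hypothetical self-assessment values.
    A strong score in an upper tier does not count if a lower tier falls short.
    """
    supported = None
    for tier in Tier:  # IntEnum members iterate in definition order (base first)
        if scores.get(tier, 0.0) < threshold:
            break
        supported = tier
    return supported


# Hypothetical program: strong analytics tooling but weak data preparation.
example_scores = {
    Tier.COLLECT: 0.9,
    Tier.STORE: 0.8,
    Tier.PREPARE: 0.4,
    Tier.ANALYZE: 0.95,
    Tier.LEARN_INSIGHTS: 0.9,
}
print(highest_supported_tier(example_scores).name)  # prints "STORE"
```

In this sketch, the high analyze score does not raise the result above the store tier, which mirrors the point that advanced analytics built on weak foundational tiers may yield unreliable insight.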
Promoting Use of Common Systems or Tools at Each Tier of the Ecosystem
Cross-organizational use of common or shared systems or tools at each tier of the ecosystem allows flexible integration and management of large volumes of data to meet emerging public health requirements. Should unexpected data needs or gaps be identified, newer interoperable tools may continually be introduced into the ecosystem to expand available options in current and future responses. Essential to this system is ensuring that tools are readily accessible to programs, that programmatic needs (including routine surveillance) are supported, and that shared investments expand functionality and support interoperable, scalable operations. An interoperable ecosystem approach can help drive efficiency, reduce costs, and generate higher-quality data. After a common set of tools and practices is established, coordinated deployment, tailored training to build workforce capacity, and provision of other enterprise support can facilitate their integration into program activities. Even more promising, an interoperable ecosystem makes it easier to capitalize on the benefits of automation, artificial intelligence, machine learning, and predictive analytics. During COVID-19, HHS Protect, a centralized real-time data collection, integration, and secure data-sharing system, helped establish a common operating picture of national and county-level data among government partners.15 This system also contributed to increasing public awareness of disease impact and spread. As HHS Protect shared data with other databases, such as CDC’s COVID Data Tracker,16 improved modeling predictions to forecast disease trends through future cases, hospitalizations, and deaths became possible. As more information became available, policy, state, and federal partners used the information to inform operations and decision-making on public health strategies needed to curb community transmission and mortality rates.17 In this context, adopting the ecosystem creates a “systems of systems” environment,18 where multisystem interoperability enables easier access to complete datasets for timely decision-making and response outcomes. Development of newer, open-source solutions for the ecosystem may benefit from cross-sector partnerships among users, developers, academia, and industry.
The Shift to a Data-Ready Ecosystem Framework Needs Additional Support
Overcoming technical and cultural barriers across public health and within an organization requires change management and governance to improve learning and insights. Adoption of data readiness practices may be more successful if targeted investments are made in data use agreements with jurisdictions in advance of an emergency response. These agreements should ensure that reliable, complete data are captured at the source. Within a public health organizational context, we identified the need for, and developed, customer-centric decision-support tools and standardized collection forms. Decision-support tools are intended to help public health programs ascertain barriers, address weaknesses, and identify growth opportunities in data environments. The tools include (1) a diagnostic worksheet to assess and improve current readiness levels, (2) an interactive systems assessment catalog that compares and contrasts functionality to address unmet needs (eg, interoperability and scalability), and (3) a set of core forms to standardize initial data collection or accommodate newer data elements as response needs evolve. The forms may be accessed through commonly used data collection systems or platforms, such as Epi Info (CDC). Preliminary drafts of these tools were developed with input from subject matter experts for use in the CDC environment. Refinement and pilot testing of these tools were paused during the COVID-19 pandemic but will resume as emergency response demands on the public health workforce abate.
Successful adoption of the ecosystem coupled with decision-support tools requires governance through common policies and standards. For example, CDC has an existing, centralized data governance process that oversees how investments are made and how information technology and data systems are maintained, used, and shared. Consistent with its charge, this governance process can also enable adoption to enhance organizational data readiness and reinforce continuous process improvement.19
Implications for Public Health
The successes and missed opportunities of large-scale public health emergencies have drawn attention to the fragmented nature of public health data systems, the resulting negative effect on timely decision-making, and the need to develop a new and collaborative way to conduct business through a systems-level perspective. Given the role of data in guiding interventions and policies needed to support equitable response and recovery activities, the risk of not improving our readiness posture could leave public health officials in the position of making crucial response decisions without the best available data.20,21 The following insights, produced as an outcome of the CDC/GTRI project, provide options for public health programs to consider as they evaluate practices needed to pivot from the current state to a data-ready future.
Adopt a Data-Ready Culture
Establishing a data-ready ecosystem is crucial for all public health agencies. Governance is one component of overcoming technical and cultural barriers to achieving organizational data readiness but must be complemented by change management to create a new data culture.22 Leveraging the full potential of the ecosystem requires public health programs to adopt a culture that prioritizes the coordination of data activities. The shift in mindset also calls for agreeing on shared investments and the use of a common set of systems and tools at each tier of the ecosystem. This change in practice will foster reusability, improved functionality, and system scalability as emergency response activities expand and contract.
Implement Common Policies and Standards
As part of institutional governance, improved policies and standards can reduce existing silos across data processes and systems and cross-agency communications.23 Mitigating interoperability and data access challenges presented by isolated practices is particularly crucial when data origins and types vary during an emergency response. Entities considering governance and policies should also address those agreements required to transfer data into internal systems. Basic elements needed to support flexible data-use agreements may streamline the transfer, use, and integrity of data during an emergency response. In addition, resources and capital are often shifted into developing newer, high-capacity systems to meet evolving requirements during an emergency response. Establishing policies that promote the purchase and use of flexible and multiuse platforms for day-to-day activities that are scalable for surge operations can help overcome common data-related challenges. Likewise, supporting enterprise adoption of common data standards provides open, freely available, and system-agnostic approaches to interoperability for improved access to data. Ongoing monitoring and communication to foster adherence to common policies and standards should generate consistent, accurate, and reliable data for informed and effective decision-making.23
Develop Decision-Support Tools to Make the Best Choice, the Easy Choice
Organizations seeking improved data readiness should develop program-specific decision-support tools. These tools may include a diagnostic worksheet with a set of embedded decision trees to identify limitations and strengths in data-related capabilities and capacities across the ecosystem tiers. By drawing attention to where opportunities for improvement exist, the worksheet can link to an interactive systems assessment catalog that assesses organizational preparedness and response data systems and tools based on interoperability, scalability, and ease-of-use features. In addition, a systems assessment can target inclusion of tools that are better suited to be used in the ecosystem. In the next phase of data readiness work, CDC will continue to develop and pilot the diagnostic worksheet and systems assessment catalog. These tools will help CDC pinpoint weaknesses within existing workflows and identify enterprise solutions to meet evolving data needs. Generic versions of these tools may be tailored to allow users to create locally applicable decision-support tools for use in their respective organizations, systems, and data environments.
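One way to picture how a diagnostic worksheet might link to a systems assessment catalog is the following minimal Python sketch. It is an assumption-laden illustration, not the CDC tools themselves: the worksheet is simplified to yes/no answers per tier rather than embedded decision trees, and the `CatalogEntry` fields, the tool names, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical row of a systems assessment catalog."""
    name: str
    tier: str
    interoperable: bool
    scalable: bool


def tiers_needing_attention(worksheet: Dict[str, List[bool]]) -> List[str]:
    """Return tiers where at least one worksheet question was answered 'no'."""
    return [tier for tier, answers in worksheet.items() if not all(answers)]


def candidate_tools(catalog: List[CatalogEntry], tiers: List[str]) -> List[str]:
    """Return names of catalog tools for the given tiers that are both
    interoperable and scalable."""
    return [
        entry.name
        for entry in catalog
        if entry.tier in tiers and entry.interoperable and entry.scalable
    ]


# Hypothetical worksheet responses (True = capability in place).
worksheet = {
    "collect": [True, True],
    "store": [True, False],
    "prepare": [False, False],
    "analyze": [True],
    "learn/insights": [True],
}

# Hypothetical catalog entries; the tool names are placeholders.
catalog = [
    CatalogEntry("Tool A", "store", interoperable=True, scalable=True),
    CatalogEntry("Tool B", "store", interoperable=True, scalable=False),
    CatalogEntry("Tool C", "prepare", interoperable=True, scalable=True),
    CatalogEntry("Tool D", "collect", interoperable=True, scalable=True),
]

gaps = tiers_needing_attention(worksheet)
print(gaps)                            # prints ['store', 'prepare']
print(candidate_tools(catalog, gaps))  # prints ['Tool A', 'Tool C']
```

A real worksheet would likely weigh ease of use and other features as well; the sketch filters only on interoperability and scalability to keep the linkage between the two tools visible.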
Strengthen the Public Health Data Science Workforce
Successful adoption of the ecosystem requires coordination of people, skills, and tools to collect, manage, and use data effectively. This level of coordination necessitates an independent investment in the public health data science workforce to achieve learning and insights needed for public health impact. Recruiting individuals with highly specialized technical competencies is insufficient to meet demand. Instead, employing individuals with strategic skills (eg, systems thinking, change management) to complement specialized data skills is essential.24 In addition, training the current data science workforce in the adaptable knowledge and skills needed to support emergency response (eg, coding, statistical, or machine-learning methods) creates opportunities to streamline capacity across the organization and use emerging approaches to maximize the potential of data to meet the needs of the public.25
Conclusion
With public health emergencies evolving, reliable and accessible data are essential to support rapid decision-making under high-stress conditions to reduce human and material losses.20,26 These circumstances augment existing opportunities to rethink business practices from a systems-level, data-ready ecosystem perspective. Given that CDC (and other federal) data systems largely depend on state and local systems, sustained improvements are also needed across the public health enterprise for a data-ready ecosystem framework to be effective. If directed properly and implemented in advance of an emergency, a trained data science workforce, optimal and efficient workflows, and consistent use of interoperable and scalable systems or tools could address weaknesses across current data environments and augment effects of data readiness. Over time, a disciplined approach should facilitate easier transition into emergency response and enable timely and actionable decision-making to safeguard lives against 21st-century health threats.
Acknowledgments
The authors acknowledge the following people from the Georgia Tech Research Institute who made important contributions to the development and delivery of the conceptual systems-level approach for improving data readiness for emergency response and, thereby, this commentary: Jon Duke, MD, MS; Rachel Cook, BA; Hannah Foster, MPH; Alek Francescangeli, BS; Raymond Hebard, BS; Connor McKelvey, BS; Courtney Reid, MS; John Rose, MS; and LaKesha Spikes, MBA.
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was supported by the Centers for Disease Control and Prevention, Office of Readiness and Response, Office of Science and Public Health Practice, Office of Applied Research—IAA #19FED1916863SSA. When this project was conducted, M. Kothari was sponsored in part by an appointment to the Research Participation Program at the US Department of Health and Human Services administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US Department of Health and Human Services. The findings and conclusions in this commentary are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention or Georgia Tech Research Institute.
ORCID iD: Mimi Kothari, MS
https://orcid.org/0000-0001-7503-0763
References
- 1. Centers for Disease Control and Prevention. 2020 Annual Report—A Bold Promise to the Nation: CDC Strategic Framework & Priorities. US Department of Health and Human Services; 2021. Accessed March 3, 2022. https://stacks.cdc.gov/view/cdc/101655
- 2. CDC Foundation. What is public health? 2022. Accessed June 19, 2022. https://www.cdcfoundation.org/what-public-health
- 3. Miri A, O’Neill DP. Accelerating data infrastructure for COVID-19 surveillance and management. Health Affairs Forefront. 2020. doi: 10.1377/forefront.20200413.644614
- 4. Foraker RE, Lai AM, Kannampallil TG, Woeltje KF, Trolard AM, Payne PRO. Transmission dynamics: data sharing in the COVID-19 era. Learn Health Syst. 2020;5(1):e10235. doi: 10.1002/lrh2.10235
- 5. McFarlane TD, Dixon BE, Grannis SJ, Gibson PJ. Public health informatics in local and state health agencies: an update from the Public Health Workforce Interests and Needs Survey. J Public Health Manag Pract. 2019;25(Suppl 2):S67-S77. doi: 10.1097/PHH.0000000000000918
- 6. Centers for Disease Control and Prevention. Data modernization initiative. Updated 2022. Accessed July 7, 2022. https://www.cdc.gov/surveillance/data-modernization/index.html
- 7. Maslow AH. Motivation and Personality. Harper; 1954.
- 8. Rogati M. The AI hierarchy of needs. June 12, 2017. Accessed September 30, 2021. https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
- 9. Arsenault M-O. The data science pyramid. September 26, 2018. Accessed September 30, 2021. https://towardsdatascience.com/the-data-science-pyramid-8a018013c490
- 10. Seroussi Y. Data hierarchy of needs. KDnuggets. 2015. Accessed September 30, 2021. https://www.kdnuggets.com/2015/08/data-hierarchy-needs.html
- 11. Mangini F. Implementing a data science process in your company. February 9, 2019. Accessed September 30, 2021. https://www.thinkingondata.com/implementing-data-science-process-in-your-company
- 12. Centers for Disease Control and Prevention. Data modernization initiative strategic implementation plan. Updated December 22, 2021. Accessed August 2, 2022. https://www.cdc.gov/surveillance/pdfs/FINAL-DMI-Implementation-Strategic-Plan-12-22-21.pdf
- 13. Klievink B, Romijn BJ, Cunningham S, de Bruijn H. Big data in the public sector: uncertainties and readiness. Inf Syst Front. 2017;19:267-283. doi: 10.1007/s10796-016-9686-2
- 14. Ostrow KS, Wang Y, Heffernan NT. How flexible is your data? A comparative analysis of scoring methodologies across learning platforms in the context of group differentiation. J Learn Anal. 2017;4(2):91-112. doi: 10.18608/jla.2017.42.9
- 15. US Department of Health and Human Services. HHS Protect—a common operating picture for COVID-19. 2023. Accessed April 3, 2023. https://www.hhs.gov/sites/default/files/hhs-protect-faqs.pdf
- 16. Centers for Disease Control and Prevention. COVID data tracker. Updated 2023. Accessed January 31, 2023. https://covid.cdc.gov/covid-data-tracker/#datatracker-home
- 17. McBryde ES, Meehan MT, Adegboye OA, et al. Role of modelling in COVID-19 policy development. Paediatr Respir Rev. 2020;35:57-60. doi: 10.1016/j.prrv.2020.06.013
- 18. Dahmann JS. Systems of systems characterization and types. The Mitre Corporation; 2013. Accessed February 7, 2023. https://www.sto.nato.int/publications/STO%20Educational%20Notes/STO-EN-SCI-276/EN-SCI-276-01.pdf
- 19. Centers for Disease Control and Prevention. Information technology strategic plan (ITSP) FY 2021-FY 2023: goal 5: enhance IT investment management and governance. Page last reviewed February 24, 2022. Accessed August 2, 2022. https://www.cdc.gov/od/ocio/it-strategic-plan/2021-23/overview/goal5/index.html
- 20. Morgan O. How decision makers can use quantitative approaches to guide outbreak responses. Philos Trans R Soc B Biol Sci. 2019;374(1776):20180365. doi: 10.1098/rstb.2018.0365
- 21. Austin JM, Kachalia A. The need for standardized metrics to drive decision-making during the COVID-19 pandemic. J Hosp Med. 2021;16(1):56-58. doi: 10.12788/jhm.3549
- 22. Public Health National Center for Innovations. Discussing change management in public health. October 25, 2017. Accessed January 15, 2022. https://phnci.org/journal/discussing-change-management-in-public-health
- 23. Pan American Health Organization, World Health Organization. Information Systems for Health Toolkit: Knowledge Capsules—Data Governance in Public Health. May 2019. Accessed January 15, 2022. https://www3.paho.org/ish/images/toolkit/IS4H-KCDG-EN.pdf
- 24. de Beaumont Foundation, National Consortium for Public Health Workforce Development. Building skills for a more strategic public health workforce: a call to action. July 2017. Accessed January 15, 2022. https://debeaumont.org/wp-content/uploads/2019/04/Building-Skills-for-a-More-Strategic-Public-Health-Workforce.pdf
- 25. Goldsmith J, Sun Y, Fried LP, Wing J, Miller GW, Berhane K. The emergence and future of public health data science. Public Health Rev. 2021;42:1604023. doi: 10.3389/phrs.2021.1604023
- 26. Kowalski-Trakofler KM, Vaught C, Scharf T. Judgment and decision-making under stress: an overview for emergency managers. Int J Emerg Manag. 2003;1(3):278-289. doi: 10.1504/IJEM.2003.003297
