PLOS Global Public Health. 2023 Aug 15;3(8):e0002044. doi: 10.1371/journal.pgph.0002044

From biorepositories to data repositories: Open-access resources accelerate early R&D and validation of equitable diagnostic tools

Roger Peck 1,‡,*, Helen L Storey 1,‡,*, Becky Barney 1, Shirli Israeli 1,¤, Olivia Halas 1, Deborah Oroszlan 1, Shiri Brodsky 1, Neha Agarwal 1, Eileen Murphy 1, Mariana Sagalovsky 1, Jessica Cohen 1, Elizabeth Trias 1, Aaron Schutzer 1, David S Boyle 1,
Editor: Dan Kajungu 2
PMCID: PMC10426984  PMID: 37582061

Abstract

Diagnostics are critical tools that guide clinical decision-making for patient care and support disease surveillance. Despite their importance, developers and manufacturers often note that access to specimen panels and essential reagents is one of the key challenges in developing quality diagnostics, particularly in low-resource settings. As a recent example, when the COVID-19 pandemic unfolded, there was a need for clinical samples across the globe to support the rapid development of diagnostics. To address these challenges and gaps, PATH, a global nonprofit, collaborated with its partners to create a COVID-19 biorepository to improve access to biological samples. Since then, the need for data resources to advance universal rapid diagnostic test (RDT) readers and noninvasive clinical measurement tools for screening children has also been identified, and work to build those resources has begun. From biospecimens to data files, there are more similarities than differences in creating open-access repositories. To ensure equitable technologies are developed, diverse sample panels and datasets are critical in the development process. Here we share one experience in creating open-access repositories as a case study to describe the steps taken, the key factors required to establish a biorepository, the ethical and legal frameworks that guided the initiative, and the lessons learned. As diagnostic tools are evolving, more forms of data are critical to de-risk and accelerate early research and development (R&D) for products serving low-resource settings. Creating physical and virtual repositories of freely available, well-characterized, and high-quality clinical and electronic data resources defrays development costs to improve equitable access and test affordability.

Introduction

According to the World Health Organization, diagnostics “are essential for advancing universal health coverage, addressing health emergencies, and promoting healthier populations” [1]. These critical tools guide clinical decision-making for patient care, reduce the use of prescription drugs, curb the spread of drug resistance, support disease surveillance programs, and raise early alerts of a potential pandemic [2, 3]. As in the COVID-19 response, accurate and timely diagnosis has been a critical cornerstone of control efforts [4], as these tools reveal disease location and its scale, contribute to contact tracing, guide vaccination rollouts, and advance global surveillance efforts. Despite their great importance, equitable access to much-needed diagnostic tools has been an ongoing challenge [5]. In some low-resource settings, diagnostics may be unavailable in the region, too expensive, or too resource-intensive for local health systems to acquire and use. For example, at the beginning of the COVID-19 pandemic, the only diagnostic tools available were lab-based tests, which were complex, costly, and infrastructure intensive. Additionally, the relative lack of regional manufacturing capacity further limits the supply of affordable and appropriate products to low- and middle-income country (LMIC) markets [6, 7].

Developers and manufacturers often note that access to specimen panels and essential reagents is one of the key challenges in developing quality diagnostics, particularly during epidemics. This has historically delayed the development and evaluation of critical diagnostic technologies. Creating open-access biorepositories has been an essential solution to catalyzing early research on many diseases and accelerating the development of new technologies. There are disease and population specific biorepositories all over the world [8]. An additional consideration for biorepositories focused on diseases predominately occurring in global south settings is ensuring appropriate ownership and attribution to scientists and communities contributing invaluable samples to research and biobanks when they are housed and overseen in global north geographies.

Additionally, diagnostic tools are rapidly evolving to utilize more forms of clinical samples and data to better detect health conditions, as well as translate data to action faster through connected capabilities, particularly with the development of smartphone-based screening technologies [9, 10]. For example, there is a growing interest by larger diagnostic companies in developing RDT readers and applications to deploy alongside COVID-19 RDTs. When designed with an understanding of users and local context, RDT readers have the potential to digitize diagnostic data faster and more reliably, allowing facilities and health systems to better inform decision making, as well as providing training and streamlined data entry support for users [11].

PATH, a nonprofit organization committed to global health equity, enables product development for low resource settings by building partnerships between researchers in disease endemic regions and quality manufacturers interested in LMIC markets. To accomplish this goal, PATH maintains long-standing and trusted relationships with a wide range of partners including but not limited to entities in academia, national and subnational governments, nonprofit organizations, civil society organizations, global normative bodies, and industry. When COVID-19 started in Seattle, PATH along with its partners addressed the diagnostic gap by creating one of the first COVID-19 open-access biorepositories [12]. PATH leveraged existing local resources to create the biorepository with scientists and communities in the region to accelerate tools for use globally. In sharing the key steps taken, and lessons learned from the experience, the aim is to further discussions on best practices for developing, operating, and sustaining repositories for clinical samples and data. Additionally, there is a critical need and challenge with expanding to broader data sources as diagnostic tools leverage information in new formats such as images and videos.

Materials and methods

Ethics statement

The Office of Research Affairs (ORA) at PATH reviewed and provided ethical approval for this work (IRBNetID 1584172). Written consent was required for samples from human subjects’ research (HSR) studies. Samples were collected from 2020–2022, and the repository continues to accept requests. No identifiable information for any samples was received by PATH.

The Washington COVID-19 biorepository

When COVID-19 was detected in the Seattle area of Washington State—the first location in the United States—PATH had a unique opportunity to accelerate the development and validation of quality in-vitro diagnostics for COVID-19 by creating a biorepository of COVID-19 clinical specimens. With PATH’s laboratories located in Seattle, in-house assets were leveraged, such as its Biosafety Level (BSL) 2 laboratory, local partnerships, ethics review committee, and legal counsel, to rapidly scale the COVID-19 biorepository.

Checklist to launch the biorepository*:

  • ✓ Ethical oversight

  • ✓ Collaboration requirements

  • ✓ Governance structure (legal, ethical and scientific)

  • ✓ Processing and handling of biospecimens

  • ✓ Administrative and logistical operations

  • ✓ Communication and dissemination

*Not necessarily specific to COVID-19

Ethical oversight

One of the earliest considerations in setting up the biorepository was ensuring proper ethical oversight for the purpose of the biorepository. PATH has an in-house program that provides scientific and ethical review for all research, with an emphasis on human subject research. Early discussions identified the need for an ethical review and approval process for the activities to create the biorepository, which would include creation of a biorepository governance plan.

Collaboration

Existing relationships with clinical partners across Seattle were utilized to create a repository of inactivated virus and clinical samples, including nasal swabs, tongue swabs, nasopharyngeal swabs, serum, and plasma. These partners included The Everett Clinic—Part of Optum, UW School of Medicine, FidaLab, Northwest Pathology, Washington State Public Health Laboratories and Bloodworks Northwest. Existing partnerships were critical to obtaining samples when specimens were limited, as well as at no cost when specimens were too costly for smaller diagnostic developers. The details of the partnership also varied slightly across partners. Some differences include how the samples were obtained, data accompanying the samples, the population from where the samples came, and ongoing communication with the partner about biorepository activities. The collaboration was named the Washington COVID-19 biorepository to represent the statewide effort that enabled the biorepository to scale rapidly and open its doors on March 26, 2020.

Governance

PATH’s Office of Research Affairs (ORA) had processes for Biorepository Governance Plans (BGP) to outline the scientific, ethical, and legal oversight mechanisms that govern sample collection, storage, and distribution for future research. The project team and ORA worked collaboratively to establish the BGP framework, including key tenets for accepting samples from different types of sources. These source types included, but were not limited to, samples from human subjects’ research (HSR) studies, from laboratories conducting clinical testing including clinical discards [13], and from research laboratories manipulating infectious material of COVID-19. The BGP also outlined the type of research the samples could be used to support, with a focus on COVID-19 diagnostic development, assessment of diagnostic performance, and basic research into the human immune response to COVID-19. Biospecimens from the repository cannot be sold to any recipients. To transfer samples, a process of material transfer agreements (MTAs) was used. Additional mechanisms pertaining to the ethical and scientific oversight of the biorepository samples, such as a governance committee, were defined in the BGP to facilitate efficient processing of sample requests. A version of the BGP is available in the supplemental material (S1 Text).

Biospecimens

The PATH BSL2 laboratory in Seattle had capacity to add the biorepository, utilizing about 10% of its freezer space to host COVID-19 specimens compared to the other collections of specimens for product development research. Samples were collected with a confirmed diagnosis of COVID-19 in two distinct cohorts. The first cohort was specimens to support the development of virus-based diagnostics (e.g., antigen or nucleic acid tests) that were collected from routine clinical care at clinics throughout the region. These clinical samples were from patients with a confirmed diagnosis of COVID-19 using a Food and Drug Administration (FDA) Emergency Use Authorization (EUA) COVID-19 reverse transcription-polymerase chain reaction (RT PCR) diagnostic test and shared by the following partners: the Everett Clinic- Part of Optum, FidaLab, Northwest Pathology, Bloodworks Northwest, Washington State Public Health Laboratories and the UW School of Medicine. The nasal specimens, which made up most of the biorepository, were frozen by partnering labs then sent to PATH for cataloging and further freezer storage. All samples were deidentified prior to delivery to PATH’s labs to ensure patient privacy. Additionally, freeze/thaw cycles of any specimens were tracked through the specimen ID.
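The freeze/thaw tracking mentioned above can be illustrated with a minimal sketch. This is not PATH's inventory system; the class name `FreezeThawLog`, the specimen ID format, and the idea of a maximum-cycle check are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FreezeThawLog:
    """Counts thaw events per de-identified specimen ID (hypothetical structure)."""

    thaw_counts: dict = field(default_factory=lambda: defaultdict(int))

    def record_thaw(self, specimen_id: str) -> int:
        """Register one freeze/thaw cycle and return the running total."""
        self.thaw_counts[specimen_id] += 1
        return self.thaw_counts[specimen_id]

    def cycles(self, specimen_id: str) -> int:
        """Return how many cycles a specimen has been through (0 if never thawed)."""
        return self.thaw_counts.get(specimen_id, 0)

    def within_limit(self, specimen_id: str, max_cycles: int) -> bool:
        """Check a specimen against an assumed, requester-defined cycle limit."""
        return self.cycles(specimen_id) <= max_cycles


# Usage with made-up specimen IDs
log = FreezeThawLog()
log.record_thaw("NS-0001")
log.record_thaw("NS-0001")
print(log.cycles("NS-0001"), log.within_limit("NS-0001", max_cycles=1))
```

Keying the count to the specimen ID alone keeps the record free of patient information, which matches the de-identification requirement described in this section.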

The second cohort of samples (serum and plasma specimens collected from 6 patient visits over 12 weeks) were derived from 63 individuals who were infected with COVID-19 between February and March 2020. These specimens support the development of COVID-19 antibody tests. Because the second cohort was part of a research study, participant consent was required to allow their samples to be collected and stored in the COVID-19 biorepository. Collected blood samples were packed on ice, couriered directly to the PATH laboratory, and immediately cataloged and processed with universal precautions, even during the citywide lock-down (Fig 1). Some of the modifications to procedures that were implemented due to the lock-downs include alternative transportation logistics as city buses were disrupted, use of additional personal protective equipment, social distancing procedures to limit the number of people in the lab at one time and to space out individuals in the lab, and the creation of designated work groups to assist in contact tracing if an exposure occurred.

Fig 1. Biospecimens were immediately received, cataloged, and processed for freezer storage.


Logistics

A logistics plan for managing requests was created including a dedicated email alias (specimenrepository@path.org) to triage and ensure timely responsiveness to inquiries, as well as streamline the process for review and fulfillment of specimens. The intake process started with an intake form linked on the Washington COVID-19 biorepository landing page for interested parties to submit their requests, provide information on their primary research goals, specify the types of specimens requested, and detail the commercialization of technology and tools to be developed. The intake form also noted our BGP’s scientific and ethical requirements. Completed intake forms were reviewed by the governance committee with technical, ethical, and commercial considerations for approval. Following request approval, and execution of an MTA, the appropriate specimens were pulled from the laboratory freezer and packaged for biological material shipment including proper containment and dry ice. An existing shipper provided overnight shipping in compliance with applicable regulations. Recipients of specimens from the biorepository were responsible for shipping fees.

Dissemination

To generate awareness and demand for the biorepository, particularly reaching developers in LMIC settings, a landing webpage was launched to promote the biorepository and its goals. Traffic was driven to the landing page via email campaigns, social media, a press release, and a webinar. Digital marketing efforts included social media, blog posts, and media outreach. Email distributions were targeted to cohorts by geography, translated into various languages, and further distributed by PATH country offices.

From biospecimens to RDT images, expanding repositories to include new forms of clinical information

Because of PATH’s existing projects and partnerships, access to COVID-19 RDT manufacturers and products is currently being leveraged to advance an RDT image repository [14]. Large diagnostic companies (e.g., Abbott and Quidel) are starting to develop RDT readers and apps to be deployed alongside RDTs for COVID-19. These combination diagnostic test and digital technology products present unique challenges from a regulatory pathway perspective, especially for smaller manufacturers who may lack the resources and regulatory experience to navigate critical requirements for submissions such as World Health Organization (WHO) Emergency Use Listing (EUL) and Prequalification (PQ). Several technical documents are required as part of the PQ submission process including the Design History File, Device Master Record, quality management system, device specification, and description of manufacturing site, to name a few. Understanding the regulatory landscape better and sharing these processes and learnings to enable more rapid WHO EUL or PQ approval of combination products helps accelerate new test designs.

Additionally, libraries of well-curated, catalogued images of RDTs are critical to enable a range of developers to design, train, and verify machine learning algorithms to correctly score positive and negative samples. Images are being collected and annotated using a range of image capture methods such as Android and iOS devices, under a range of lighting conditions to enable a more robust collection. Contrived (in silico or synthetic) images are also being made to further increase and diversify the repository with varying signal intensity, background, or lighting conditions. The development of the structure of the image repository is drawing on previous biorepository experiences generating, annotating, storing, and granting access to clinical specimens. Working closely with developers to identify how the image libraries should be organized for ideal utilization in app development and verification is important. The repository architecture is also being designed to allow third-party contributions to enable expansion and adaptation as product development evolves. The intent of all of PATH’s repositories is to make resources available for free to developers who are qualified, competent, and committed to Global Access provisions.

Adding video data to repositories advances development of remote sensing of clinical information

Hypoxemia, a common symptom of COVID-19, can be measured using a non-invasive pulse oximeter (PO), and while pulse oximeters have been around for decades, access to and awareness of pulse oximetry increased worldwide due to the COVID-19 pandemic [15]. Development of next-generation, or multimodal, PO devices was also accelerated, primarily focused on adults. Multimodal PO device manufacturers are leveraging the photoplethysmography (PPG) data used for oxygen saturation measurement to add functionality such as respiratory rate measurement, though other potential parameters include heart rate, anemia, and blood pressure [16]. The first multimodal pulse oximeter to measure oxygen saturation and respiratory rate received FDA approval and listing on the United Nations Children’s Fund supply catalogue recently, and additional competitors are in the process of entering the market (https://www.masimo.com/products/continuous/rad-g/).

In addition to medical device sensors, there is interest in advancing research on using existing smartphone sensors to measure PPG directly from a video. As with other medical devices, a key barrier that manufacturers have flagged in advancing multimodal POs is the difficulty of validating new devices or algorithms, particularly among children. Respiratory rate is an important clinical measurement in detecting pneumonia, especially in children, but it is not done as often as it should be and is highly variable when manually counted [17–20]. Though children would benefit greatly from better triage and disease management tools, these validation studies are costly, the potential demand for this class of products is unproven, and the global health market is highly price sensitive. As part of an ongoing study among children under five years, PATH is developing an open-access data repository to support the development of integrated primary healthcare clinical measurement tools, particularly for children. Among the data being collected across four countries are reference measurements for pulse oximetry, heart rate, respiratory rate, hemoglobin, and PPG, along with deidentified video and audio data files (Fig 2). Additionally, the data repository will follow FAIR Principles (Findability, Accessibility, Interoperability, and Reusability) [21]. The structure will be developed to follow general data modeling rules that reflect best practices for creating effective and efficient data models. This will include considerations such as identifying the entities and relationships that need to be represented, choosing appropriate data types and constraints, and defining clear and consistent naming conventions so the dataset could be merged with a larger system if required.
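The data modeling considerations above (entities and relationships, data types and constraints, consistent naming) can be sketched as follows. This is an illustrative model only, not the schema used in the PATH study: every class name, field name, unit label, and the `XX` country placeholder are assumptions, and only the measurement types and the under-five age range come from the text.

```python
from dataclasses import dataclass

# Assumed unit labels for the reference measurements named in the text.
ALLOWED_MEASUREMENTS = {
    "spo2": "percent",
    "heart_rate": "beats_per_min",
    "respiratory_rate": "breaths_per_min",
    "hemoglobin": "g_per_dl",
}
ALLOWED_MEDIA_TYPES = {"video", "audio", "ppg"}


@dataclass(frozen=True)
class Participant:
    participant_id: str  # de-identified study ID, never a name
    country_code: str    # placeholder code; the four countries are not named here
    age_months: int

    def __post_init__(self):
        # Constraint: the study enrolls children under five years.
        if not 0 <= self.age_months < 60:
            raise ValueError("age_months must be in the range 0-59")


@dataclass(frozen=True)
class ReferenceMeasurement:
    participant_id: str  # relationship: refers to Participant.participant_id
    measurement_type: str
    value: float
    unit: str

    def __post_init__(self):
        expected_unit = ALLOWED_MEASUREMENTS.get(self.measurement_type)
        if expected_unit is None:
            raise ValueError(f"unknown measurement_type: {self.measurement_type}")
        if self.unit != expected_unit:
            raise ValueError(f"{self.measurement_type} must use unit {expected_unit}")


@dataclass(frozen=True)
class MediaFile:
    participant_id: str  # relationship: refers to Participant.participant_id
    media_type: str
    relative_path: str

    def __post_init__(self):
        if self.media_type not in ALLOWED_MEDIA_TYPES:
            raise ValueError(f"unknown media_type: {self.media_type}")


def orphan_records(participants, records):
    """Return records whose participant_id matches no known participant."""
    known_ids = {p.participant_id for p in participants}
    return [r for r in records if r.participant_id not in known_ids]
```

Using one snake_case key (`participant_id`) across every entity is the kind of naming convention that makes a later merge with a larger system simpler, and the `orphan_records` check is one way to confirm that relationships stay intact before files are released.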

Fig 2. Example of an open-access data repository structure to ensure that high-quality and responsibly collected data are available for future development of public health products.


There are key considerations as clinical samples or data are transferred from collection to storage to users. This flow diagram maps details for a specific PATH-led study; however, the attributes are relevant to other repository efforts as well. Early planning for the sources, structure, and purpose of the repository helps inform later decisions on execution of the work.

Results and discussion

Early outcomes of the COVID-19 biorepository

Since March 2020, the Washington COVID-19 biorepository has provided qualified sets of clinical samples to assist in the verification and validation of diagnostic tests. To date, 63 total submissions have been received from 12 different countries, of which 45 were approved, supporting 5 FDA EUA and 1 WHO EUL regulatory pathways. Pre-approval, some reasons for declining requests included a lack of inventory of the requested sample, misalignment between technology stage of development and value/scarcity of sample type, and the requesting organization not having a BSL-2 laboratory, which was a safety requirement. Post-approval, some requests were later “declined” due to an organization’s inability to meet legal terms around global access and international shipping restrictions, which in some cases prevented import of samples. Improving access to qualified clinical samples allowed developers to better compare tests and identify those best suited for use in a particular environment. This has enabled public health leaders to determine how tests might work together or help troubleshoot challenges as they are deployed in the health system. Some partners who received specimens were able to detect and resolve required modifications for several of their diagnostic tests before deployment in a public health setting, for example ensuring detection of newer variants, and confirming performance of tests across transport media types and cycle thresholds.

Lessons learned in developing a biorepository for clinical specimens

Despite experience in developing and managing biorepositories, creating one at the start of a pandemic to be utilized as quickly as possible presented challenges. Even with the existing structures and guiding frameworks available as a resource, the team had to work closely with colleagues in facilities, research, and legal departments to address issues quickly and to ensure the safety of staff, the protection of human subjects, adherence to applicable laws, and appropriate compliance with partner and donor obligations with respect to specimen transfer, use, ownership, and global access requirements. A risk assessment plan was drafted to allow individuals back in the lab in small groups, requiring approval from PATH institutional leadership and further training for team members on updated COVID-19 processing requirements and lab safety before onsite lab access was re-granted.

Careful consideration of human subjects’ protection and legal ownership of specimens was also central to this effort, and each was individually evaluated for each source of samples to ensure alignment with regulatory, legal, and ethical requirements. For samples collected prospectively by other institutions through HSR, consent documentation in those studies had to allow for future use of participants’ biological samples for COVID-19 research. Even though PATH was not engaged in the research, ORA provided recommendations aligned with FDA and Common Rule regulations and guidance to ensure samples could be used for product development and commercialization efforts. As a result of this work, PATH also developed a process for utilizing clinical discard samples in future research that falls under FDA guidance.

Legal agreements set the terms for how PATH could receive and share partner samples, so developing appropriate legal instruments was an integral step in building the functioning framework of the biorepository. Both receiving and sharing samples required an MTA, which needed to have aligned terms, including donor obligations and global access terms, to ensure the biorepository supported access and availability of high-quality diagnostics to LMICs. Despite the usefulness of a template MTA, many partners did not have legal teams to advise on the documents. To address this challenge, an approach was adopted to make MTAs as flexible and accessible as possible for organizations with limited legal tools. This right-sizing approach also increases inclusiveness by supporting partners to fill gaps in capabilities, without sacrificing quality or safety.

Further challenges of scaling and ensuring a sustainable approach to managing a biorepository were highlighted in this effort, including long-term storage and funding, ensuring visibility to those who can best utilize the materials, and the potential value of global virtual biorepository networks. Significant operational agility, collaboration, and coordination across several stakeholders as well as local partnerships built over years were key to the development of this global good.

Critical data beyond biospecimens

Even in 2023, the biorepository continues to receive a handful of requests each year. And while new clinical samples have not been added recently, the focus has shifted to electronic data in the form of RDT images. Access to this information enables partners to develop digital readers and machine learning algorithms that support test result interpretation (i.e., controlled images taken under varying conditions can be used to train computer vision models). This work aims to support manufacturers as they incorporate RDT readers into diagnostic products for COVID-19, as well as other diseases like HIV and eventually malaria. Currently, the COVID-19 dataset contains 32,000 images and associated metadata, including a library of 24,000 images that can be used by developers to train machine learning algorithms and around 8,000 images available for algorithm validation purposes. The HIV dataset contains 12,000 images, each with an accompanying annotation file containing 12 attributes. The recently disseminated target product profile for readers of rapid diagnostic tests, led by WHO, speaks to the importance of this companion tool in promoting more consistent, accurate test performance, interpretation, and reporting [22].
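The split between training and validation images described above (24,000 of 32,000 images, or 75%, for training) can be illustrated with a deterministic partitioning sketch. This is not the method PATH used to build its libraries; the function name `assign_partition`, the image ID format, and the hash-based approach are assumptions chosen for illustration.

```python
import hashlib


def assign_partition(image_id: str, train_fraction: float = 0.75) -> str:
    """Map an image ID to 'train' or 'validation' using a stable hash.

    The same ID always lands in the same partition, so adding new images
    later does not move existing ones between partitions.
    """
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "train" if bucket < train_fraction else "validation"


# Usage with made-up image IDs
image_ids = [f"covid-rdt-{n:05d}" for n in range(32000)]
partitions = [assign_partition(image_id) for image_id in image_ids]
train_share = partitions.count("train") / len(partitions)
print(f"share assigned to training: {train_share:.3f}")
```

A hash-based assignment gives only an approximate 75/25 split, but it keeps validation images from drifting into training sets when third parties contribute additional images. In practice, images of the same physical test would likely need to be assigned together to avoid leakage between partitions.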

A critical issue in diagnostics and AI model development is that the tools developed are only as good as the panels or data used to create them. During the COVID-19 pandemic, a previously known issue around the accuracy of pulse oximeters across varying skin pigmentation caused renewed concern among healthcare providers, as well as the FDA [23–25]. Technologies that use light reflection and absorption sensors may be affected by skin pigmentation. If manufacturers have not validated their devices among diverse populations of users, then that variability is not well defined and adjusted for. This speaks to the need to ensure diverse user groups are included in device development and validation. Particularly for machine learning, diverse algorithms require diverse datasets to train them. And when those populations are harder to reach, such as children, collaborations are key to ensuring access to high quality and responsibly obtained data.

Another challenge that arises as we transition from freezer storage to cloud storage is how to store and share large data files most efficiently. Larger datasets are needed for machine learning advances, and those datasets will likely include multimodal data sources. A PPG waveform is one example of a data source that includes many data signals per second, and AI will be useful to find meaningful patterns in the signals. Similar to accelerating open access publications, there may be a role for donors to play in accelerating the access and benefits of open access data repositories. For example, offsetting the costs of cloud storage or developing common platforms for use by researchers could make these global public goods accessible more widely and more quickly.

As newer diagnostic technologies advance, such as connected medical devices or smartphone-based tools, which detect PPG measurement through existing sensors, new ways to accelerate the development, validation, and refinement of important public health products by more manufacturers will ensure a healthy ecosystem for innovation and that the potential benefits of these new tools are able to reach everyone. For example, developer groups are driving cutting edge research in health sensing technologies that use a standard smartphone-based platform. While consumer products are not replacements for medical devices, basic health information that is being derived from these tools is critical for frontline health workers to inform decision-making [26]. For community and primary health care settings, easily accessible and objective health information is valuable for identifying more critically ill or at-risk patients, referring them to necessary care faster, and directing scarce health resources to where they are needed most. Additionally, as smartphone-based sensors become more advanced [27], it is important to consider the potential role of smartphone-based clinical screening tools to digitize health data, enable risk-based stratification of patients, and more rapidly contribute to decision-making on individual and population-based care. With a market penetration of more than 6 billion users, smartphones could be the most readily available health tool already in the pockets of most providers, health workers, and patients globally [28].

Conclusion

The right diagnostic tools save lives if they are made widely accessible, especially in under-resourced settings across all countries. Through repositories of specimens and data, PATH cost-effectively increases access to resources that are central to the development of promising diagnostic tests for COVID-19 and other health areas. As diagnostic technologies evolve, more diverse and data-rich repositories will be key to continuing to expand diagnostic product classes and ensuring these tools perform as expected everywhere. Strengthening resources to develop appropriate diagnostic tools for high-priority pathogens is critical to more effectively controlling epidemics and pandemics in the future, as well as addressing endemic health conditions. Building partnerships to offer annotated and curated collections for free defrays early R&D risks and costs to support global access and test affordability, ultimately contributing to better health equity.

Supporting information

S1 Text. A version of the biorepository governance plan as an example.

(PDF)

Data Availability

Biospecimens referred to in this manuscript are available upon request through the PATH repository website: https://www.path.org/programs/diagnostics/washington-covid-19-biorepository/.

Funding Statement

This work was funded by the Bill and Melinda Gates Foundation (https://www.gatesfoundation.org/) via the following grants: INV-016821 (RP, HLS, BB, SI, OH, DO, SB, NA, EM, MS, JC, ET, AS, DB) and INV-048193 (HLS, DO, SB). The funder did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002044.r001

Decision Letter 0

Dan Kajungu

24 Apr 2023

PGPH-D-23-00417

From biorepositories to data repositories: open-access resources accelerate early R&D and validation of equitable diagnostic tools.

PLOS Global Public Health

Dear Dr. Storey,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 24 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Dan Kajungu, PhD

Academic Editor

PLOS Global Public Health

Journal Requirements:

1. Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

a. State the initials, alongside each funding source, of each author to receive each grant.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Authors can respond to the comment from reviewers.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors of this manuscript (more a research methods paper, not a research paper per se) present a use case describing the considerations in developing a coordinated open-access specimen biorepository and data registry that can prioritize access and use for innovative research, development and validation of diagnostic tools. Distinguishing these efforts from other ‘academic biorepositories’ is, in part, the desire to support research applications amenable to translation to low and middle income countries, which in turn prompts special considerations in the implementation strategy. A critical point, made in several sections, is the importance of existing partnerships and thereby implicitly sustainability as a Gates Foundation-supported non-profit. More specific details on implementation and operations and on success in achieving its mission goals would be very helpful. The value of a capacity like PATH is an important message for the Global Health community.

STRENGTHS OF THE PAPER; Major Points for improvement

• The paper underscores the importance of pre-existing partners in addressing emergent public health emergencies. Through such relationships, organizations like PATH have developed the internal workflows and governance considerations that offer agility and speed in response.

o The partnerships established by PATH and reflected in this paper are quite diverse (academic health centers, non-profit labs, health departments, local clinics). What was the incentive to send their specimens to PATH? Articulating both the financial and non-financial incentives could help readers approach building similar relationships and infrastructure.

• The authors emphasize the importance of diversity at every step, including specimens and data themselves as well as in the approaches to test and validate technology. Collaborations are key to accessing ‘hard to reach’ and vulnerable populations to make sure that research discoveries pertain to diverse populations. Were these relationships already in place prior to COVID-19?

• The authors identified key considerations in the establishment, implementation and sustainability of the resources (biorepository, data registry) – e.g., IRB, freezer space, MTAs, laboratory capacity (BSL2) –

o More specific insight into the branch point decisions that must be made in developing the structure if a group wished to implement at their site would be helpful. For example, are there different documentation requirements for accepting prospective research specimens vs remnant clinical specimens? Is there a ‘go/no go’ determination of whether a new partner is onboarded? Are there ‘checks’ in assessing the rigor and consistency in collecting & handling specimens (the latter could be really important in downstream use and could represent variability in results)? Are there staffing considerations that stem from decisions about strategy and scale? Are there infrastructure considerations? More specificity on the “issues” which can be anticipated and the authors’ team approaches to safety, IRB, legal, MTA, etc. (lines 227-230) would be helpful.

▪ To the latter point about “issues” – which of these were uniquely COVID-19 related (and may substantiate a “pandemic preparedness special consideration” subset) and which are generalizable to repositories/registries built outside of a public health emergency? How will these insights offer PATH and everyone that reads this paper greater agility to navigate the issues more effectively next time?

o The comment that “many partners did not have legal teams” (line 248) was especially meaningful and important as the U.S. research enterprise, broadly writ, is encouraged to expand the reach to underrepresented populations (e.g., via rural hospitals, federally qualified health centers, community locations).

• The authors illustrate the flexible value proposition of their repository by pointing to end users that were able to “detect and resolve required modifications”. More detail would again be helpful as it would set up nicely the premise that “but for the use of PATH biospecimens, these technologic innovations would have been delayed, diminished or deterred”. Detail would elevate the claim from sweeping generalization to evidence-based insight.

• PATH appears to prioritize research and innovation that will make diagnostic tools (and other clinical applications) more accessible to low and middle income countries. The core messages of this manuscript are just as relevant to rural settings in all countries and reinforce important considerations for the research infrastructure everywhere as our ability to conduct experiments at greater scale continues to grow.

WEAKNESSES OF THE PAPER: Additional points for improvement

• Figure 1 provides little useful information and can be deleted.

• The general framework of the first half of the paper is straightforward; however, the pivot (line 185) to applications (pulse oximeters) seems awkward. Can the authors make the transition from biospecimen to physiological measurement devices easier to grasp?

o COVID-19 is caused by the SARS-CoV-2 virus… COVID-19 can lead to hypoxemia. The latter is not the cause of the illness, but is symptomatic of it (line 186)

• Figure 2 needs a legend in order to be clear about the message intended.

o It is also not explained how the approach “ensures high-quality and responsibly collected data” – by what methods is this ensured? What metrics are used to determine quality? How are issues like missing data or interoperability addressed?

• In the context of the data registry description (and data repository of Figure 2), the absence of comment about data models (e.g., OMOP) and FAIR principles that ensure that data from multiple sources can be integrated meaningfully is surprising. This deserves comment.

• Every biorepository of this nature has a limited number of samples. The authors mention a request process (63 asks, 45 approved), but there was little explanation of if/how the group prioritized requests to maximize/extend the lifetime of the repository. On what basis were ~1/3 of the requests rejected, if other than lack of mission alignment to address LMIC goals?

o Did mechanical issues like freeze/thaw cycles factor in the dissemination process?

• The specimens were required to be de-identified and remnant specimens (the largest fraction of the biorepository) would have very little accompanying metadata. So, the process for characterization of “qualified, pedigreed” samples is very unclear… did PATH collect information about familial health?

FORMATTING CONSIDERATIONS

• The authors use a lot of acronyms that are not defined (e.g., RDT, LMIC, ORA) which makes reading very difficult. The acronyms should be defined.

• For those unfamiliar with PATH, it reads like another undefined acronym (its mission is not defined until line 70). The authors should acknowledge that PATH is a non-profit organization committed to global health equity issues by fostering innovative and accessible solutions in lines 23 and 70.

• The quote in the first sentence of the introduction does not have a ‘close quote’ demarcation, making it unclear how much of that sentence is attributed to the reference.

Reviewer #2: The work shares experiences and lessons learnt relevant for future development efforts of Biorepositories and some immediate outcomes. I commend the authors for the well written article. I have a few comments here below.

1. Ethics statement – Given that the ethical clearance was for the repository being developed, it would be more meaningful to present it as one of the key steps needed for the development of a Biorepository.

2. Lines 135-136: The authors state that “samples were packed, couriered directly to PATH and immediately cataloged and processed with universal precautions even during citywide lockdown”. They do not, however, state how they dealt with these restrictions to be able to go about their work. These experiences are important to highlight to inform future work amidst similar pandemics.

2. Although the establishment leveraged on existing PATH resources, it would be important to indicate cost considerations and how these were managed. For example, how much of the total establishment costs, at least an estimate, were covered by existing infrastructure etc.

3. I find that the information on stakeholders and their roles have been presented in very broad terms. The governance plan provides some more overview but again this has its own appendices that are not accessible. I would have wanted to see more details on key stakeholders in the development process beyond collaborations, governance etc. for example who they were and what were their roles in the establishment of the repository.

4. lines 185-208. There is a long narrative under the section “Advancing PPG-derived technology through COVID-19” -that describes this new technology and how relevant it is to COVID19 testing. I however, did not understand how it fits with the “experience for repository development” that can be learnt. I would think that it is diagnostic technology related work not repository development.

5. Finally, I also think that one of the early outcomes of the Biorepository development is the data and metadata being created that can be used by developers to train their machine learning algorithms. This perhaps should have been highlighted as one of the low hanging fruit from this effort.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002044.r003

Decision Letter 1

Dan Kajungu

1 Jun 2023

From biorepositories to data repositories: open-access resources accelerate early R&D and validation of equitable diagnostic tools.

PGPH-D-23-00417R1

Dear Helen,

We are pleased to inform you that your manuscript 'From biorepositories to data repositories: open-access resources accelerate early R&D and validation of equitable diagnostic tools.' has been provisionally accepted for publication in PLOS Global Public Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact globalpubhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Global Public Health.

Best regards,

Dan Kajungu, PhD

Academic Editor

PLOS Global Public Health

***********************************************************

Reviewer Comments (if any, and for reference):

    Attachment

    Submitted filename: Response to reviewers May2023.docx


    Articles from PLOS Global Public Health are provided here courtesy of PLOS
