The drive toward open clinical trial data sharing promises to accelerate scientific discovery,9 yet shared datasets often lack the context needed for accurate interpretation. Although the National Institutes of Health (NIH) and international research organizations champion data sharing, incomplete documentation creates measurable downstream impacts. A systematic solution must ensure that shared datasets include comprehensive documentation of their nuances and limitations.6
Complex trial data stripped of critical context poses significant challenges for secondary analysis. Randomized trials comparing multiple interventions often contain specific design features that preclude certain types of analyses, yet only 30% of shared trial datasets include sufficient metadata for replication.10 The scope extends beyond documentation issues: in a striking example, 70 independent teams analyzing an identical neuroimaging dataset reached widely divergent conclusions.7 Crucial context is even more likely to be lost when data dictionaries are created as an afterthought following study completion. This challenge grows as automated analyses of public datasets increase, raising the risk of misapplied methods.8
This is no longer theoretical: in 1 study, 40% of shared trial datasets were analyzed in ways that violated their primary design constraints.19 Clinical trial data labeled as “open” but lacking interoperability led to misinterpretation in 15% of secondary analyses because of missing context.24 Although scientific communities eventually correct errors through letters to editors or retraction requests, retracted articles continue to be cited as valid evidence long after retraction.12 The problem extends to journal policies as well: in a recent global review, data sharing was required by only 19% (52/273) of health sciences journals, with widely varying conditions for implementation.23
Clinical trial datasets embody complex development processes shaped by specific aims, design decisions, and operational realities. These elements determine which inferences the data can validly support. Original study teams understand these nuances, yet data dictionaries created after study completion rarely capture these essential constraints.21 The documentation gap becomes particularly problematic as repositories accumulate datasets for secondary use.
Truly interoperable clinical trial datasets hold immense potential. Large data collections aim to create FAIR (Findable, Accessible, Interoperable, and Reusable) resources that accelerate discovery through secondary analysis.25 Poor curation and insufficient guidance about dataset strengths and vulnerabilities jeopardize this goal. One analysis found that 12% of shared radiology datasets led to misinterpretation from lack of context, although with proper documentation the benefits of verification and meta-analysis outweighed these risks.18
Our NIH-funded BACPAC BEST study illustrates these challenges. This multisite trial uses a Sequential Multiple Assignment Randomized Trial (SMART) design to identify phenotypic characteristics of patients with chronic low back pain who respond to common treatments.15 We deliberately selected treatments from different treatment classes (eg, physical therapy, cognitive behavioral therapy, duloxetine), ensuring that each intervention contained only single-discipline content (ie, drew purely from 1 treatment modality without combining elements from multiple approaches). This design choice, crucial for precision medicine inferences, explicitly precludes using the data for comparative effectiveness analyses. Without proper documentation, analysts might misuse BEST data for inappropriate treatment comparisons that could mislead clinical practice.
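One way to make such a constraint harder to overlook is to express it in machine-readable form and distribute it alongside the dataset. The sketch below is a minimal Python illustration of that idea; the record structure, field names, and wording are assumptions made for this example, not the BEST study's actual metadata schema or any existing standard.

```python
from dataclasses import dataclass, field


@dataclass
class DesignConstraint:
    """One documented limit on how a shared trial dataset may be analyzed."""
    constraint_id: str
    description: str
    rationale: str
    prohibited_analyses: list = field(default_factory=list)


# Illustrative record for a SMART-design dataset; field names and wording
# are hypothetical, not drawn from the BEST study's documentation.
smart_constraint = DesignConstraint(
    constraint_id="DC-001",
    description="Interventions were chosen from distinct, single-discipline "
                "treatment classes to support precision-medicine phenotyping.",
    rationale="Arms were not designed or powered for head-to-head comparison.",
    prohibited_analyses=[
        "comparative effectiveness of treatment A vs treatment B",
        "network meta-analysis pooling arms as if independently randomized",
    ],
)

if __name__ == "__main__":
    # A repository or analysis pipeline could surface this record before
    # granting access, so analysts see prohibited uses up front.
    print(f"{smart_constraint.constraint_id}: {smart_constraint.description}")
    for analysis in smart_constraint.prohibited_analyses:
        print(f"  prohibited: {analysis}")
```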
Different analytical approaches can legitimately yield varying results from identical datasets.13 This acceptable variation differs fundamentally from analyses that violate core design principles. Better documentation and standards,5 not access restrictions, offer solutions: studies show that improved documentation and structured analysis environments reduced misinterpretation by 35% while maintaining data utility.22
Addressing these challenges demands multiple approaches. Community standards must expand beyond basic codebooks and curation, and NIH funding should support technical and clinical guides that capture trial complexity. Data dictionaries must evolve into comprehensive documentation covering design constraints, analytical boundaries, and key assumptions.16 Current guidelines such as CONSORT, SPIRIT, and CDISC provide valuable frameworks, yet none fully addresses how to prepare trial data for sophisticated secondary use. FAIR principles offer high-level guidance, but implementation requires expertise and resources that many research teams lack.25
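As one illustration of how an expanded data dictionary could be checked rather than merely encouraged, the minimal sketch below verifies that a dictionary entry carries the documentation elements discussed above. The required fields and the example variable are hypothetical; they are not taken from CONSORT, SPIRIT, CDISC, or any NIH common data element specification.

```python
# Minimal documentation-completeness check, assuming a simple dictionary-based
# data dictionary format; all field names below are illustrative.

REQUIRED_FIELDS = {
    "variable_name",
    "description",
    "units",
    "collection_timepoints",
    "design_constraints",      # eg, "not valid for between-arm comparisons"
    "analytical_boundaries",   # eg, populations or models the variable supports
    "key_assumptions",         # eg, imputation or derivation assumptions
}


def missing_documentation(entry: dict) -> set:
    """Return required documentation fields that are absent or empty."""
    return {f for f in REQUIRED_FIELDS if not entry.get(f)}


# Hypothetical entry for a pain-interference outcome at 12 weeks.
example_entry = {
    "variable_name": "pain_interference_week12",
    "description": "Pain interference score at 12 weeks",
    "units": "0-10 scale",
    "collection_timepoints": ["week 12"],
    "design_constraints": "",          # left blank: flagged below
    "analytical_boundaries": "primary analysis population only",
    "key_assumptions": "missing scores not imputed",
}

print(missing_documentation(example_entry))  # {'design_constraints'}
```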
Open science and data sharing in clinical trials present opportunities and challenges.1,3 Sensitive health data shared under proper protocols show reidentification risks as low as 1.2% after anonymization.20 These technical protections, however, are only 1 aspect of data governance; the administrative aspects also present challenges. Manual data access committees have proven ineffective: evaluations of clinical data warehouses reveal inconsistent governance, with only 8% having transparent approval criteria.17 Furthermore, “data available on request” policies result in actual sharing rates below 20% because of administrative bottlenecks.11 Although manual oversight shows clear limitations, structured governance approaches such as federated analysis offer more promising solutions.22
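To make the federated-analysis idea concrete, the sketch below shows its simplest form: each site computes summary statistics locally, and only those aggregates, never participant-level records, leave the institution. The site values and function names are invented for illustration and do not represent any specific federated platform.

```python
# Minimal sketch of federated analysis: sites exchange aggregates, not rows.

def site_summary(outcomes: list) -> dict:
    """Compute the sufficient statistics for a pooled mean at one site."""
    return {"n": len(outcomes), "sum": sum(outcomes)}


def pooled_mean(summaries: list) -> float:
    """Combine per-site aggregates into a single pooled estimate."""
    total_n = sum(s["n"] for s in summaries)
    total_sum = sum(s["sum"] for s in summaries)
    return total_sum / total_n


# Each site runs site_summary() behind its own governance; only these small
# dictionaries cross institutional boundaries.
site_a = site_summary([4.1, 5.0, 3.8, 6.2])
site_b = site_summary([5.5, 4.9, 6.0])

print(round(pooled_mean([site_a, site_b]), 2))
```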
We propose unifying existing principles into practical guidance for trial data documentation (Fig. 1). This framework must address both technical interoperability and the contextual knowledge critical for valid secondary analyses.4 Sustained funding for data preparation and stewardship remains essential. Robust documentation standards, sustained curation funding, and maintained scientific oversight can realize the promise of open science while protecting research integrity. These datasets increasingly guide clinical practice and research directions, demanding infrastructure that matches their complexity.14 The scientific community has a responsibility to make data sharing not just possible, but meaningful and reliable.2
Figure 1.
Framework for clinical trial data curation and sharing with 6 essential components that preserve scientific integrity while enabling secondary analysis. Data Quality and Validation: CDISC (Clinical Data Interchange Standards Consortium) SDTM/ADaM (Study Data Tabulation Model/Analysis Data Model) standards and FDA (Food and Drug Administration) quality guidelines. Metadata and Documentation: CONSORT (Consolidated Standards of Reporting Trials) extensions, NIH Pain Common Data Elements, and comprehensive data dictionaries documenting analytical constraints. Data Harmonization and Standardization: PCORnet (Patient-Centered Outcomes Research Network) and OMOP (Observational Medical Outcomes Partnership) common data models. Data Provenance and Attribution: DataCite identifiers, ORCID (Open Researcher and Contributor ID) integration, and W3C (World Wide Web Consortium) PROV standards. Data Preservation and Accessibility: DataVerse repositories, Zenodo archiving, and DataMed indexing services. Training and Support: FORCE11 (Future of Research Communications and e-Scholarship) FAIR (Findable, Accessible, Interoperable, and Reusable) data training and NIH Data Science initiatives.
Conflict of interest statement
The authors have no conflicts of interest to declare.
Acknowledgements
Research reported in this publication was supported by the NIH HEAL Initiative, the National Institute on Drug Abuse, the NIH Office of the Director, and the National Institute of Neurological Disorders and Stroke under grant numbers R24DA055306, R24DA055306-01S1, R24DA055306-02S1, U24DA057612, U24DA058606, R25DA061740, and U19AR07634.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Footnotes
Sponsorships or competing interests that may be relevant to content are disclosed at the end of this article.
References
- [1].Adams MCB, Hurley RW, Siddons A, Topaloglu U, Wandner LD. NIH HEAL clinical data elements (CDE) implementation: NIH HEAL initiative IMPOWR network IDEA-CC. Pain Med 2023;24:743–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Adams MCB, Bann CM, Bayman EO, Chao M, Hergenroeder GW, Knott C, Lindquist MA, Luo ZD, Martin R, Martone ME, McCarthy J, McCumber M, Meropol SB, Ridenour TA, Saavedra LM, Sarker A, Anstrom KJ, Thompson WK. Building community through data: the value of a researcher driven open science ecosystem. Pain Med 2025;26:295–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Adams MCB, Hassett AL, Clauw DJ, Hurley RW. The NIH HEAL pain common data elements (CDE): a great start but a long way to the finish line. Pain Med 2025;26:146–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Adams MCB, Perkins ML, Hudson C, Madhira V, Akbilgic O, Ma D, Hurley RW, Topaloglu U. Breaking digital health barriers through a large language model-based tool for automated observational medical Outcomes partnership mapping: development and validation study. J Med Int Res 2025;27:e69004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Adams MCB, Sward KA, Perkins ML, Hurley RW. Standardizing research methods for opioid dose comparison: the NIH HEAL morphine milligram equivalent calculator. PAIN 2025;166:1729-37. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Boeckhout M, Zielhuis GA, Bredenoord AL. The FAIR guiding principles for data stewardship: fair enough? Eur J Hum Genet 2018;26:931–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R, Mumford JA, Adcock RA, Avesani P, Baczkowski BM, Bajracharya A, Bakst L, Ball S, Barilari M, Bault N, Beaton D, Beitner J, Benoit RG, Berkers RMWJ, Bhanji JP, Biswal BB, Bobadilla-Suarez S, Bortolini T, Bottenhorn KL, Bowring A, Braem S, Brooks HR, Brudner EG, Calderon CB, Camilleri JA, Castrellon JJ, Cecchetti L, Cieslik EC, Cole ZJ, Collignon O, Cox RW, Cunningham WA, Czoschke S, Dadi K, Davis CP, Luca AD, Delgado MR, Demetriou L, Dennison JB, Di X, Dickie EW, Dobryakova E, Donnat CL, Dukart J, Duncan NW, Durnez J, Eed A, Eickhoff SB, Erhart A, Fontanesi L, Fricke GM, Fu S, Galván A, Gau R, Genon S, Glatard T, Glerean E, Goeman JJ, Golowin SAE, González-García C, Gorgolewski KJ, Grady CL, Green MA, Guassi Moreira JF, Guest O, Hakimi S, Hamilton JP, Hancock R, Handjaras G, Harry BB, Hawco C, Herholz P, Herman G, Heunis S, Hoffstaedter F, Hogeveen J, Holmes S, Hu CP, Huettel SA, Hughes ME, Iacovella V, Iordan AD, Isager PM, Isik AI, Jahn A, Johnson MR, Johnstone T, Joseph MJE, Juliano AC, Kable JW, Kassinopoulos M, Koba C, Kong XZ, Koscik TR, Kucukboyaci NE, Kuhl BA, Kupek S, Laird AR, Lamm C, Langner R, Lauharatanahirun N, Lee H, Lee S, Leemans A, Leo A, Lesage E, Li F, Li MYC, Lim PC, Lintz EN, Liphardt SW, Losecaat Vermeer AB, Love BC, Mack ML, Malpica N, Marins T, Maumet C, McDonald K, McGuire JT, Melero H, Méndez Leal AS, Meyer B, Meyer KN, Mihai G, Mitsis GD, Moll J, Nielson DM, Nilsonne G, Notter MP, Olivetti E, Onicas AI, Papale P, Patil KR, Peelle JE, Pérez A, Pischedda D, Poline JB, Prystauka Y, Ray S, Reuter-Lorenz PA, Reynolds RC, Ricciardi E, Rieck JR, Rodriguez-Thompson AM, Romyn A, Salo T, Samanez-Larkin GR, Sanz-Morales E, Schlichting ML, Schultz DH, Shen Q, Sheridan MA, Silvers JA, Skagerlund K, Smith A, Smith DV, Sokol-Hessner P, Steinkamp SR, Tashjian SM, Thirion B, Thorp JN, Tinghög G, Tisdall L, Tompson SH, Toro-Serey C, Torre Tresols JJ, Tozzi L, Truong V, Turella L, van 't Veer AE, Verguts T, Vettel JM, Vijayarajah S, Vo K, Wall MB, Weeda WD, Weis S, White DJ, Wisniewski D, Xifra-Porxas A, Yearling EA, Yoon S, Yuan R, Yuen KSL, Zhang L, Zhang X, Zosky JE, Nichols TE, Poldrack RA, Schonberg T. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 2020;582:84–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Else H. “Papermill alarm” software flags potentially fake papers. Nature 2022. doi:10.1038/d41586-022-02997-x. [DOI] [PubMed] [Google Scholar]
- [9].Flanagin A, Curfman G, Bibbins-Domingo K. Data sharing and the growth of medical knowledge. JAMA 2022;328:2398–9. [DOI] [PubMed] [Google Scholar]
- [10].Geifman N, Bollyky J, Bhattacharya S, Butte AJ. Opening clinical trial data: are the voluntary data-sharing portals enough? BMC Med 2015;13:280. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Hamilton DG, Hong K, Fraser H, Rowhani-Farid A, Fidler F, Page MJ. Prevalence and predictors of data and code sharing in the medical and health sciences: systematic review with meta-analysis of individual participant data. BMJ 2023;382:e075767. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Hsiao TK, Schneider J. Continued use of retracted papers: temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine. Quant Sci Stud 2022;2:1144–69. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Kummerfeld E, Jones GL. One data set, many analysts: implications for practicing scientists. Front Psychol 2023;14:1094150. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Leonelli S. Locating ethics in data science: responsibility and accountability in global and distributed knowledge production systems. Philos Trans A Math Phys Eng Sci 2016;374:20160122. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Mauck MC, Lotz J, Psioda MA, Carey TS, Clauw DJ, Majumdar S, Marras WS, Vo N, Aylward A, Hoffmeyer A, Zheng P, Ivanova A, McCumber M, Carson C, Anstrom KJ, Bowden AE, Dalton D, Derr L, Dufour J, Fields AJ, Fritz J, Hassett AL, Harte SE, Hue TF, Krug R, Loggia ML, Mageswaran P, McLean SA, Mitchell UH, O'Neill C, Pedoia V, Quirk DA, Rhon DI, Rieke V, Shah L, Sowa G, Spiegel B, Wasan AD, Wey HYM, LaVange L. The back pain consortium (BACPAC) research program: structure, research priorities, and methods. Pain Med 2023;24(suppl 1):S3–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Ohmann C, Banzi R, Canham S, Battaglia S, Matei M, Ariyo C, Becnel L, Bierer B, Bowers S, Clivio L, Dias M, Druml C, Faure H, Fenner M, Galvez J, Ghersi D, Gluud C, Groves T, Houston P, Karam G, Kalra D, Knowles RL, Krleža-Jerić K, Kubiak C, Kuchinke W, Kush R, Lukkarinen A, Marques PS, Newbigging A, O'Callaghan J, Ravaud P, Schlünder I, Shanahan D, Sitter H, Spalding D, Tudur-Smith C, van Reusel P, van Veen EB, Visser GR, Wilson J, Demotes-Mainard J. Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 2017;7:e018647. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Pavlenko E, Strech D, Langhof H. Implementation of data access and use procedures in clinical data warehouses. A systematic review of literature and publicly available policies. BMC Med Inform Decis Mak 2020;20:157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Sardanelli F, Alì M, Hunink MG, Houssami N, Sconfienza LM, Di Leo G. To share or not to share? Expected pros and cons of data sharing in radiological research. Eur Radiol 2018;28:2328–35. [DOI] [PubMed] [Google Scholar]
- [19].National Academies of Sciences Engineering and Medicine. In: Shore C, ed. Reflections on sharing clinical trial data: challenges and a way forward: proceedings of a workshop. Washington, DC: National Academies Press; 2020. [PubMed] [Google Scholar]
- [20].Simon GE, Shortreed SM, Coley RY, Penfold RB, Rossom RC, Waitzfelder BE, Sanchez K, Lynch FL. Assessing and minimizing re-identification risk in research data derived from health care records. EGEMS (Wash DC) 2019;7:6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, Fletcher J, Frizelle FA, Groves T, Haileamlak A, James A, Laine C, Peiperl L, Pinborg A, Sahni P, Wu S. Sharing clinical trial data—a proposal from the international committee of medical journal editors. N Engl J Med 2016;374:384–6. [DOI] [PubMed] [Google Scholar]
- [22].Tamuhla T, Lulamba ET, Mutemaringa T, Tiffin N. Multiple modes of data sharing can facilitate secondary use of sensitive health data for research. BMJ Glob Health 2023;8:e013092. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Tan AC, Webster AC, Libesman S, Yang Z, Chand RR, Liu W, Palacios T, Hunter KE, Seidler AL. Data sharing policies across health research globally: cross-sectional meta-research study. Res Synth Methods 2024;15:1060–71. [DOI] [PubMed] [Google Scholar]
- [24].Watson H, Gallifant J, Lai Y, Radunsky AP, Villanueva C, Martinez N, Gichoya J, Huynh UK, Celi LA. Delivering on NIH data sharing requirements: avoiding open data in appearance only. BMJ Health Care Inform 2023;30:e100771. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, t Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018. [DOI] [PMC free article] [PubMed] [Google Scholar]