Journal of Diabetes Science and Technology. 2025 Jan 9:19322968241310896. Online ahead of print. doi: 10.1177/19322968241310896

Advancing Monogenic Diabetes Research and Clinical Care by Creating a Data Commons: The Precision Diabetes Consortium (PREDICT)

Michael E McCullough 1, Lisa R Letourneau-Freiberg 1, Rochelle N Naylor 1,2, Siri Atma W Greeley 1,2, David T Broome 3, Mustafa Tosur 4,5, Raymond J Kreienkamp 6, Erin Cobry 7, Neda Rasouli 8, Toni I Pollin 9, Miriam S Udler 10, Liana K Billings 11,12, Cyrus Desouza 13, Carmella Evans-Molina 14, Suzi Birz 2, Brian Furner 2, Michael Watkins 2, Kaitlyn Ott 2, Samuel L Volchenboum 2, Louis H Philipson 1
PMCID: PMC11713946  PMID: 39781649

Abstract

Monogenic diabetes mellitus (MDM) is a group of relatively rare disorders caused by pathogenic variants in key genes that result in hyperglycemia. A lack of identified cases, absent data standards, and limited collaboration across institutions have hindered research progress. To address this, the UChicago Monogenic Diabetes Registry (UCMDMR) and UChicago Data for the Common Good (D4CG) created a national consortium of MDM research institutions called the PREcision DIabetes ConsorTium (PREDICT). Following the D4CG model, PREDICT has successfully established a multicenter MDM data commons. PREDICT has created a consensus data dictionary that will be used to address critical gaps in the understanding of these rare types of diabetes. This approach may be useful for other rare conditions that would benefit from access to harmonized pooled data.

Keywords: monogenic diabetes, data commons, data sharing, data standards, data dictionary, data governance

Introduction

Monogenic diabetes mellitus (MDM) is a group of disorders caused by a growing number of rare variants in key genes that result in impaired insulin action, insulin production, glucose-sensing, or insulin secretion, causing hyperglycemia. Although relatively uncommon, MDM is estimated to collectively represent approximately 1% to 5% of diabetes cases diagnosed before age 30 to 35, with rates varying based on screening factors and geographic region.1-6 MDM is associated with multiple syndromes beyond diabetes, and because several subtypes have distinct treatment approaches, it is an exemplar for advancing precision medicine in diabetes.7,8 However, most cases remain unidentified and are therefore inappropriately treated.3,9 Even when cases are successfully identified, data standards do not exist for MDM in the United States. Because each subtype is uncommon, uncertainty remains about subtype-specific optimal treatment options and complications. The relative paucity of identified patients with MDM subtypes, the lack of uniform data standards, and the absence of data sharing across research groups have created a challenging environment for advancing MDM research and clinical management.

We identified the need to develop a structured collaboration with MDM clinical care and research organizations. The goal is to create a streamlined process for collecting data, combining data, and sharing resources; this will produce a comprehensive controlled-access data set. Ultimately, the collaboration will help to accelerate efficient diagnosis, advance understanding, and inform future clinical guidelines for the identification, diagnosis, and management of MDM.

To this end, we engaged UChicago Data for the Common Good (D4CG) to assist in forming a partnership with MDM research sites. Headquartered in the UChicago Department of Pediatrics, the D4CG team of experts works with collaborators worldwide to share high-quality data between institutions and increase opportunities for discovery in pediatric cancers.10 D4CG now applies the streamlined, scalable infrastructure and processes it developed to disease groups beyond pediatric cancer. Using this approach, an MDM data commons was created, now called the Precision Diabetes Consortium (PREDICT). PREDICT is the first MDM data commons in the country, and with the current set of engaged experts from multiple academic institutions, it is well poised to make a significant worldwide impact.

Methods

Origin

Prior to PREDICT, MDM research had remained siloed at individual institutions. This resulted in limited sample sizes and was a barrier to research advances. The UChicago MDM Registry (UCMDMR), established in 2008, is home to the largest collection of MDM cases in the Western Hemisphere and has demonstrated success at collecting comprehensive longitudinal participant data.11,12 Leveraging this extensive experience and the opportune institutional colocation with D4CG, the UCMDMR spearheaded the move to launch an MDM data commons (now called PREDICT).

With the guidance of D4CG, PREDICT was established to foster MDM research and collaboration. In April 2022, monthly meetings between D4CG experts and UCMDMR members were initiated, serving as brainstorming environments to refine ideas for organizational structure and data models. These foundational meetings laid the early groundwork for consortium leadership, data decision-making, technical infrastructure, and creating a data dictionary for uniform data capture among the collaborators.

The project was then introduced to groups outside of UChicago. The UCMDMR team emailed a project overview to other U.S.-based MDM research organizations. An introductory virtual meeting was held in December 2022. The meeting was attended by MDM experts from 15 large academic referral centers. Subsequently, a monthly virtual meeting cadence was established for 2023 onward.

Following the D4CG model (Figure 1), the early 2023 meetings established the organizational structure of PREDICT and overarching objectives. Subsequent meetings focused on establishing priority research questions and the development of a standardized data dictionary.

Figure 1.


D4CG data commons model with current PREDICT milestones.

Abbreviations: D4CG, UChicago Data for the Common Good; PREDICT, Precision Diabetes Consortium.

Establishing a Data Model

The first step in harmonizing MDM data across wide-ranging data types and organizational protocols was the development of a consensus data dictionary that defines a standardized format into which site data can be transformed. The initial data dictionary draft was modeled primarily on the data capture forms used by the UCMDMR. From there, several important challenges arose.

First, most sites did not have existing patient data repositories or Institutional Review Board approval allowing for systematic data capture and sharing. Most of these sites had known MDM patients listed in local electronic medical record (EMR) systems, but these data would need to be extracted from the EMR, harmonized to the data dictionary, de-identified, and shared to the PREDICT data portal. Each site is responsible for following the institutional policies and regulatory requirements necessary for local data collection and de-identified data sharing. Institutions approached this in a variety of ways, including a) obtaining approval for a consent waiver together with a retrospective EMR chart review and de-identified data sharing protocol, or b) obtaining informed consent with language that informs participants of, and asks for their agreement to, the sharing of de-identified data. After regulatory requirements were met, each site extracted and harmonized data using IT-facilitated EMR data pulls, manual chart review, REDCap data entry forms, or some combination of these methods.
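The extract, harmonize, and de-identify steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the site-to-dictionary mapping, the identifier list, and the study ID format are all hypothetical and are not taken from the PREDICT data dictionary. Real de-identification also has to handle dates, free text, and other indirect identifiers under each institution's policies.

```python
from typing import Any

# Hypothetical mapping from one site's local column names to consensus
# data dictionary field names. None of these names come from PREDICT.
SITE_TO_DICTIONARY = {
    "dx_age_yrs": "age_at_diagnosis_years",
    "gene": "gene_symbol",
    "hba1c_pct": "hba1c_percent",
}

# Hypothetical direct identifiers that must never leave the site.
DIRECT_IDENTIFIERS = {"mrn", "name", "date_of_birth"}


def harmonize_and_deidentify(local_record: dict[str, Any], study_id: str) -> dict[str, Any]:
    """Rename mapped fields, drop identifiers and unmapped fields, attach a study ID."""
    shared: dict[str, Any] = {"participant_id": study_id}
    for local_name, value in local_record.items():
        if local_name in DIRECT_IDENTIFIERS:
            continue  # direct identifiers are never shared
        target = SITE_TO_DICTIONARY.get(local_name)
        if target is not None:  # fields with no dictionary mapping are left behind
            shared[target] = value
    return shared


# Example local record, as it might come out of an EMR pull or chart review.
record = {
    "mrn": "12345",
    "name": "Jane Doe",
    "dx_age_yrs": 14,
    "gene": "GCK",
    "hba1c_pct": 6.4,
    "clinic_note": "free text",
}
shared_record = harmonize_and_deidentify(record, study_id="SITE01-0001")
```

Using an allow-list (only mapped fields are shared) rather than a block-list is a deliberate choice in this sketch: a new local column is excluded by default until someone maps it to a dictionary field.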

A second challenge was the diversity of site data, which included a mix of EMR and participant-reported survey data. These data differences were accounted for in the data model by using inclusive data dictionary fields and data descriptors to guide end users.

A key factor driving data dictionary decisions was the research questions identified at the project outset as being a high priority for the data commons. These priority research questions were used to properly scope the data dictionary to the critical data elements necessary to address those questions. It was understood that additional data fields could be added at future version releases.

A first data dictionary iteration has been completed, which includes a relational database schema with over 75 data elements encompassing a wide variety of pertinent clinical and genetic variables. Simultaneously, the underlying technical infrastructure that will maintain the harmonized data set has been developed.

Given the labor-intensive nature of data harmonization, the current data dictionary serves as a template for data capture forms for future research groups joining the consortium. This ensures that newly collected data will conform to established consensus data standards, eliminating the need for additional data harmonization.
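One way to picture a data dictionary acting as a template for data capture is as a small set of element definitions that incoming records are checked against. The sketch below assumes a handful of hypothetical elements, including a `data_source` descriptor that distinguishes EMR data from participant-reported data; the actual PREDICT dictionary has over 75 elements, and its field names and rules are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DataElement:
    """One entry in a (hypothetical) data dictionary."""
    name: str
    dtype: type
    required: bool = False
    allowed: Optional[frozenset] = None  # permitted values, if enumerated


# Illustrative elements only; not the PREDICT data dictionary.
DICTIONARY = [
    DataElement("participant_id", str, required=True),
    DataElement("gene_symbol", str, required=True),
    DataElement("age_at_diagnosis_years", float),
    DataElement("data_source", str, required=True,
                allowed=frozenset({"EMR", "participant_reported"})),
]


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {element.name for element in DICTIONARY}
    for element in DICTIONARY:
        value = record.get(element.name)
        if value is None:
            if element.required:
                problems.append(f"missing required field: {element.name}")
            continue
        # Accept whole numbers where a float is expected (but not booleans).
        if element.dtype is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, element.dtype):
            problems.append(f"wrong type for {element.name}")
        elif element.allowed is not None and value not in element.allowed:
            problems.append(f"value not allowed for {element.name}: {value!r}")
    for name in record:
        if name not in known:
            problems.append(f"unknown field: {name}")
    return problems


good = {"participant_id": "SITE01-0001", "gene_symbol": "GCK",
        "age_at_diagnosis_years": 14, "data_source": "EMR"}
bad = {"participant_id": "SITE01-0002", "data_source": "chart", "zip": "60637"}
```

A capture form generated from the same definitions would reject the second record at entry time, which is the sense in which conforming forms remove the need for later harmonization.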

Governance

The governance structure underlying the workflow processes and decision-making rules of the consortium was modeled on previous D4CG-established consortia in the Pediatric Cancer Data Commons (PCDC) and is shown in Figure 2.13,14

Figure 2.


PREDICT governance structure.

Abbreviation: PREDICT, Precision Diabetes Consortium.

Memorandum of understanding

A nonlegally binding Memorandum of Understanding (MOU) establishes the consortium and details the Executive Committee (EC) membership and responsibilities. All institutions with data to contribute are parties to the MOU. While the MOU is nonlegally binding, legal agreements are executed between the University of Chicago and data contributors and receivers of line-level data. PREDICT is not a legal entity and is not directly providing data and therefore it is not a party to these legal agreements.

Executive committee (EC)

Responsibilities of the EC described in the MOU include strategic planning, managing the data commons service provider, amending the MOU, approving membership, approving data commons access, approving data commons data contributions, and approving funding applications related to the consortium. The PREDICT EC is composed of one representative from each data contributor, at least two statisticians, at least two members-at-large, and one Chief Information Officer representing the data commons service provider. A consortium manager plans, organizes, and facilitates meetings, handles project requests, and acts as a liaison between the consortium and the D4CG.

Members of the EC make decisions and handle disputes by mutual, unanimous agreement and consensus. These responsibilities and the decision-making framework have bolstered trust among data contributors willing to bring their data into PREDICT, as they provide reassurance that the contributed data will be used responsibly and only for projects approved by the data contributors through their representation on the EC.
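The composition and decision rules above are simple enough to express as checks. The following is a minimal sketch, assuming hypothetical role labels and member names; it encodes only what the text states (one representative per data contributor, at least two statisticians, at least two members-at-large, one Chief Information Officer, and unanimous approval) and does not describe any software PREDICT actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    role: str  # "contributor_rep", "statistician", "member_at_large", or "cio"
    institution: str = ""


def composition_problems(members, contributors):
    """List ways the committee departs from the composition described in the MOU."""
    problems = []
    reps = [m for m in members if m.role == "contributor_rep"]
    for site in contributors:
        count = sum(1 for m in reps if m.institution == site)
        if count != 1:
            problems.append(f"{site}: expected 1 representative, found {count}")
    if sum(m.role == "statistician" for m in members) < 2:
        problems.append("fewer than two statisticians")
    if sum(m.role == "member_at_large" for m in members) < 2:
        problems.append("fewer than two members-at-large")
    if sum(m.role == "cio" for m in members) != 1:
        problems.append("expected exactly one Chief Information Officer")
    return problems


def is_approved(votes, members):
    """Unanimous rule: every member has voted, and every vote is in favor."""
    return bool(members) and all(votes.get(m.name) is True for m in members)


# Hypothetical two-site committee.
contributors = ["Site A", "Site B"]
committee = [
    Member("rep_a", "contributor_rep", "Site A"),
    Member("rep_b", "contributor_rep", "Site B"),
    Member("stat_1", "statistician"),
    Member("stat_2", "statistician"),
    Member("at_large_1", "member_at_large"),
    Member("at_large_2", "member_at_large"),
    Member("cio_1", "cio"),
]
all_yes = {m.name: True for m in committee}
one_no = {**all_yes, "rep_b": False}
```

Under the unanimous rule, a single dissenting or missing vote blocks approval, which is what gives each data contributor an effective veto over how its data are used.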

Data contributor agreement

To meet regulatory requirements, D4CG (through the University of Chicago) entered into data contributor agreements (DCAs) with each collaborating institution. The University of Chicago, as the data commons service provider, enters into this agreement directly with each data contributor. The DCA provides the mechanism for the data contributor to outline what data are being transferred to D4CG and any restrictions. The DCA also assures the contributor that D4CG will release the data only to authorized users.

Data access and project request forms

Once local data from each data contributor have been de-identified, harmonized, and combined into the data commons, policies and procedures are necessary to govern access to the data. Modeled on previous PCDC efforts and technologies,5 a web-based cohort discovery tool is being developed that will be freely available to any user who registers to use it (Figure 3). The cohort discovery tool can be used to explore a limited amount of aggregate data to identify whether suitable cohorts of participant data are available for specific research projects. Line-level de-identified data can be accessed only through a project request and approval process.

Figure 3.


D4CG PCDC data cohort explorer.

Abbreviations: D4CG, UChicago Data for the Common Good; PCDC, UChicago Pediatric Cancer Data Commons.
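The difference between aggregate exploration and line-level access can be illustrated with a small counting function that returns only group totals and never the underlying rows. This is a sketch, not the PCDC or PREDICT implementation: the field names are hypothetical, and the small-cell suppression threshold is an added assumption that is common in aggregate reporting but is not described in the text.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed suppression threshold, not a stated PREDICT policy


def cohort_counts(records, group_by, **filters):
    """Count records matching the filters, grouped by one field.

    Only aggregate counts are returned; groups smaller than MIN_CELL_SIZE
    are reported as a "<N" string instead of an exact number.
    """
    matching = [r for r in records if all(r.get(k) == v for k, v in filters.items())]
    counts = Counter(r.get(group_by, "unknown") for r in matching)
    return {
        key: (n if n >= MIN_CELL_SIZE else f"<{MIN_CELL_SIZE}")
        for key, n in counts.items()
    }


# Hypothetical harmonized records held in the commons.
records = (
    [{"gene_symbol": "GCK", "data_source": "EMR"}] * 6
    + [{"gene_symbol": "HNF1A", "data_source": "EMR"}] * 2
    + [{"gene_symbol": "GCK", "data_source": "participant_reported"}]
)
emr_by_gene = cohort_counts(records, "gene_symbol", data_source="EMR")
```

A researcher could use output like this to judge feasibility before submitting a project request for line-level data.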

External investigator-initiated project requests

A process was needed to identify overlapping project requests from groups outside of PREDICT, ensure project feasibility, regulate access to the data, and ensure the data are handled responsibly. The PREDICT EC developed a project request form that can be completed and submitted for review by any investigator worldwide. Submitted requests will be reviewed and discussed by EC members, and the consensus decision will then be provided to the requesting investigator. Participants agreed that including PREDICT members with an intimate understanding of the clinical topic and research studies would greatly enhance the research conducted. A statistician with relevant expertise may also be nominated to facilitate data analysis.

Data use agreement

To meet regulatory requirements, D4CG (through the University of Chicago) enters into data use agreements (DUAs) with each investigator who will receive data for an approved project. The University of Chicago, as the data commons service provider, enters into this agreement directly with the data user and their institution. The DUA provides the mechanism for PREDICT to detail the terms of use of the data being transferred, limiting its use to the approved project. These agreements include any specific requirements from the DCA under which the data being provided were contributed.
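The way DCA requirements flow through to a DUA can be sketched as a simple lookup: the requirements attached to a release are those of every contributor whose data are included in it. The site names and requirement strings below are hypothetical and serve only to illustrate the relationship described above.

```python
def dua_requirements(release_sites, dca_requirements):
    """Collect the DCA requirements that a DUA must carry for a given release.

    release_sites: contributors whose data are included in the approved project.
    dca_requirements: mapping of contributor -> set of requirement strings.
    """
    required = set()
    for site in release_sites:
        required |= dca_requirements.get(site, set())
    return sorted(required)


# Hypothetical DCA terms for three contributors.
dca = {
    "Site A": {"no commercial use"},
    "Site B": {"acknowledge contributor", "no commercial use"},
    "Site C": set(),
}
terms = dua_requirements(["Site A", "Site B"], dca)
```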

Results

Remarkable progress has been achieved in the last 2 years. Ten institutions (Appendix A) have been actively engaged with the project on a volunteer basis, meeting monthly as a group and working individually to ensure internal protocols comply with the requirements necessary to support the project. We have adopted and signed the MOU and nominated the EC representatives. PREDICT has established a uniform data dictionary that harmonizes data fields across wide-ranging data types and local protocols. We estimate that approximately 3000 MDM cases will be pooled into the data commons from the initial data contributors. This data set will allow PREDICT to address currently unanswered and important research questions.

PREDICT has already generated interest from additional research institutions that want to join the consortium. We aim to expand by adding institutions within the United States and internationally.

We will develop and launch a publicly available web-based data platform. This will allow any researcher to use our cohort discovery tool and other analysis tools, explore available data, and assess study feasibility. Interested researchers can then submit a project request form for review by the EC. The data platform will be modeled on the existing D4CG-run Pediatric Cancer Data Commons (Figure 3).10

The current data dictionary can provide a template for future research groups focused on MDM. The overall consortium and data dictionary structure may be a useful model for the diabetes community more generally.

Conclusions

PREDICT has successfully established a multicenter MDM data commons that is reducing barriers to data sharing and laying the groundwork for advancing critical research. Considerable milestones have been achieved, including stakeholder buy-in, MOU signature, and data model development. We aim to expand the consortium both domestically and internationally and to launch the cohort discovery tool. This process has highlighted how consensus can facilitate advances in clinical care, the conduct of future collaborative trials, and the combination of new data sets. Other research areas that would benefit from access to harmonized and pooled data, particularly rare forms of common conditions, could use this approach.

Acknowledgments

The authors would like to acknowledge the following individuals for their assistance with this project: Colby Chase, MS, CGC, Brigid E. Gregg, MD, Elif A. Oral, MD, MSc, William H. Herman, MD, MPH, Brett McKinney, BS, Katharine Garvey, MD, MPH, Andrea Steck, MD, Maria J. Redondo, MD, PhD, MPH, Anh D. Nguyen, BS, Sara Cromer, MD, Evelyn Greaux, BS, CCRC, and Varinderpal Kaur, BA.

Appendix A

PREDICT Institution List

  • University of Chicago Kovler Diabetes Center.

  • Barbara Davis Center/University of Colorado.

  • Baylor College of Medicine.

  • Boston Children’s Hospital.

  • Indiana University School of Medicine.

  • Massachusetts General Hospital.

  • University of Maryland School of Medicine.

  • University of Michigan.

  • University of Nebraska.

  • Endeavor Health (Previously NorthShore University HealthSystem).

Footnotes

Abbreviations: DCA, data contributor agreement; DUA, data use agreement; EMR, electronic medical record; EC, executive committee; MOU, memorandum of understanding; MDM, monogenic diabetes; PREDICT, Precision Diabetes Consortium; D4CG, UChicago Data for the Common Good; UCMDMR, UChicago Monogenic Diabetes Registry; PCDC, UChicago Pediatric Cancer Data Commons.

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MSU is involved in a research collaboration between Novo Nordisk and the Broad Institute.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Gray Foundation.

ORCID iDs: Michael E. McCullough https://orcid.org/0009-0006-0470-4429

Lisa R. Letourneau-Freiberg https://orcid.org/0000-0001-9465-4870

Erin Cobry https://orcid.org/0000-0002-8494-1814



Articles from Journal of Diabetes Science and Technology are provided here courtesy of Diabetes Technology Society
