2019 Aug 21;5(8):eaaw7195. doi: 10.1126/sciadv.aaw7195

Fig. 1. Workflow overview of creating and mining the FMR1-informed biobank.

Of the 20,353 PMRP participants recruited, 19,996 were genotyped. We identified premutation carriers and controls and matched them on year of birth and duration of receiving care from Marshfield Clinic. Diagnostic codes were used to examine whether the overall health profile of premutation carriers differs from that of controls. To filter possible noise and error in the EHR data, we applied the rule of 2 and limited our dataset to health conditions that were observed in more than five participants. We applied random forest to create a model representing the health conditions that differentiate the two groups. Further examination of the model showed that, for the differentiating conditions, premutation carriers suffer from a higher burden of disease than controls throughout the life span. In a separate set of analyses, we used PheWAS to identify individual clinical conditions that are primary phenotypes of the premutation. The resulting phenotypes are unconfounded by concerns about one’s own genetic status, stressful parenting, or clinical ascertainment bias.
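
The EHR filtering step in this workflow can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the "rule of 2" means a diagnostic code counts for a participant only if it was recorded on at least two distinct dates, and that records arrive as `(participant_id, code, date)` tuples. The function name `filter_codes` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def filter_codes(records, min_dates=2, more_than_participants=5):
    """Return {participant_id: set of retained codes}.

    records: iterable of (participant_id, code, date) tuples (assumed layout).
    min_dates: assumed reading of the "rule of 2": a code counts for a
        participant only if it appears on at least this many distinct dates.
    more_than_participants: keep a code only if it is observed in more than
        this many participants (the caption says "more than five").
    """
    # Collect the distinct dates on which each participant received each code.
    dates_seen = defaultdict(set)
    for participant_id, code, date in records:
        dates_seen[(participant_id, code)].add(date)

    # Rule of 2 (assumed definition): drop codes seen on fewer than min_dates dates.
    confirmed = {pair for pair, dates in dates_seen.items() if len(dates) >= min_dates}

    # Count how many participants carry each confirmed code.
    participants_per_code = Counter(code for _, code in confirmed)
    frequent = {
        code
        for code, n in participants_per_code.items()
        if n > more_than_participants
    }

    # Build the per-participant profile from codes that pass both filters.
    profile = defaultdict(set)
    for participant_id, code in confirmed:
        if code in frequent:
            profile[participant_id].add(code)
    return dict(profile)
```

The resulting per-participant code sets could then be encoded as a binary feature matrix for a classifier such as random forest, or tested one code at a time in a PheWAS-style analysis. Neither step is shown here.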