
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Apr 5:2024.04.03.587948. [Version 1] doi: 10.1101/2024.04.03.587948

Learning Gaussian Graphical Models from Correlated Data

Zeyuan Song, Sophia Gunn, Stefano Monti, Gina Marie Peloso, Ching-Ti Liu, Kathryn Lunetta, Paola Sebastiani
PMCID: PMC11014549  PMID: 38617340

Abstract

Gaussian Graphical Models (GGMs) have been widely used in biomedical research to explore complex relationships among many variables. There are well-established procedures for building GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that produce correlated observations, and ignoring this correlation can lead to inflated Type I error. In this paper, we propose a Bootstrap algorithm to infer GGMs from correlated data. Using extensive simulations of correlated data from family-based studies, we show that the Bootstrap method does not inflate the Type I error and retains statistical power compared to alternative solutions. We apply our method to learn the GGM that represents complex relations among 47 Polygenic Risk Scores generated from genome-wide genotype data in a family-based study, the Long Life Family Study. By comparing the resulting GGM with those from conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well in this real example.
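The abstract does not spell out the Bootstrap algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that an edge in the GGM corresponds to a nonzero partial correlation (equivalently, a nonzero off-diagonal entry of the precision matrix), that there are more observations than variables so the sample covariance can be inverted directly, and that an edge is kept when a percentile bootstrap interval for its partial correlation excludes zero. The point it illustrates is that whole clusters (for example, families) are resampled with replacement, so within-cluster correlation is carried into every bootstrap replicate instead of being ignored. All function names (`cluster_bootstrap_edges`, `partial_correlations`, and so on) are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance(rows: List[List[float]]) -> Matrix:
    """Unbiased sample covariance; each row is one observation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def partial_correlations(rows: List[List[float]]) -> Matrix:
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), omega = inverse covariance."""
    omega = invert(covariance(rows))
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(p)] for i in range(p)]


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile of an already sorted list, rounding to the nearest index."""
    idx = min(len(sorted_vals) - 1, max(0, round(q * (len(sorted_vals) - 1))))
    return sorted_vals[idx]


def cluster_bootstrap_edges(clusters: Dict[str, List[List[float]]],
                            n_boot: int = 200, alpha: float = 0.05,
                            seed: int = 0) -> List[Tuple[int, int]]:
    """Keep edge (i, j) when the bootstrap percentile interval of rho_ij excludes 0."""
    rng = random.Random(seed)
    ids = list(clusters)
    p = len(next(iter(clusters.values()))[0])
    draws: Dict[Tuple[int, int], List[float]] = {
        (i, j): [] for i in range(p) for j in range(i + 1, p)}
    for _ in range(n_boot):
        # Resample whole clusters (e.g. families) with replacement and keep their
        # members together, so within-cluster correlation survives in each replicate.
        picked = rng.choices(ids, k=len(ids))
        sample = [row for cid in picked for row in clusters[cid]]
        pc = partial_correlations(sample)
        for i, j in draws:
            draws[(i, j)].append(pc[i][j])
    edges = []
    for (i, j), vals in draws.items():
        vals.sort()
        lo, hi = percentile(vals, alpha / 2), percentile(vals, 1 - alpha / 2)
        if lo > 0 or hi < 0:  # interval excludes zero, so keep the edge
            edges.append((i, j))
    return edges
```

Resampling individual rows instead of clusters would treat relatives as independent observations, which tends to understate the sampling variability of the partial correlations; that is the mechanism behind the inflated Type I error described above. A practical implementation for many variables relative to the sample size would typically replace the direct matrix inverse with a regularized estimator.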

Full Text Availability

The license terms selected by the author(s) for this preprint version do not permit archiving in PMC. The full text is available from the preprint server.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
