Figure - PMC


2022 Oct 4;109(11):1986–1997. doi: 10.1016/j.ajhg.2022.09.009


© 2022 The Authors

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


MagicalRsq workflow

MagicalRsq starts from the "training dataset array data", that is, the array genotypes used as imputation input for the training individuals. Imputation on these data yields a standard Rsq and an estimated MAF for each marker in the training dataset. We then calculate the true R² for each marker by comparing imputed dosages with truth genotypes, which are established from additional genotype data in the training set. We combine these three metrics (standard Rsq, estimated MAF, and true R²) with external MAF, alternative allele count (AC), and population genetics summary statistics, and train MagicalRsq models with the XGBoost method, building supervised models that predict true R² from all the other features. In the testing dataset, we follow the same imputation workflow, again starting from array genotype data and obtaining estimated MAF and standard Rsq after imputation. We then calculate MagicalRsq in the testing dataset by plugging the predictor features into the MagicalRsq models built from the training dataset. Finally, we evaluate the performance of MagicalRsq (and of standard Rsq) by comparing each with true R² in the testing dataset. Yellow highlights mark items specific to the training dataset, light blue highlights mark items specific to the testing dataset, green highlights mark external information used in both training and testing, and red rectangles mark the statistics used in the final evaluation and comparison of MagicalRsq and standard Rsq, with true R² as the gold standard.
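The "true R²" step in the caption can be illustrated with a minimal sketch. It assumes true R² is the squared Pearson correlation between imputed dosages and truth genotypes for one marker; the caption only says the two are "compared", so this definition, the function name `true_r2`, and the toy values are illustrative assumptions rather than the authors' code.

```python
from typing import Sequence


def true_r2(dosages: Sequence[float], genotypes: Sequence[int]) -> float:
    """Squared Pearson correlation between imputed dosages and truth genotypes.

    Assumption: this is how "true R2" is defined. Returns NaN when either
    vector has zero variance (for example, a monomorphic marker).
    """
    if len(dosages) != len(genotypes):
        raise ValueError("dosages and genotypes must have the same length")
    n = len(dosages)
    if n == 0:
        raise ValueError("at least one individual is required")

    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n

    # Sums of centered cross-products and squares (the 1/n factors cancel).
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)

    if var_d == 0 or var_g == 0:
        return float("nan")
    return cov * cov / (var_d * var_g)


# Hypothetical toy marker: five individuals, dosages in [0, 2], genotypes in {0, 1, 2}.
dosages = [0.1, 0.9, 1.8, 0.2, 1.1]
genotypes = [0, 1, 2, 0, 1]
r2 = true_r2(dosages, genotypes)
```

In the workflow described above, a per-marker value like this would be the training target and the evaluation reference, while standard Rsq, estimated MAF, and the external features would be the predictors.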