2022 Jan 13;11:e70382. doi: 10.7554/eLife.70382

Figure 1. Schematic summary of the automatic mitochondrial copy (AutoMitoC) pipeline.

The AutoMitoC pipeline comprises four major steps: (i) preprocessing, (ii) background correction, (iii) detection of probe cross-hybridization, and (iv) final derivation of mitochondrial DNA copy number (mtDNA-CN) estimates. First, preprocessing is simplified by restricting analysis of autosomal variants to those with low minor allele frequency (<0.01) and low genotype missingness (<0.05). For probes passing quality control, MT and autosomal log2ratio (L2R) values undergo an initial correction for guanine-cytosine (GC) waves using the method of Diskin et al., 2008. Samples exhibiting high genomic waviness after GC correction (L2R SD >0.35) are removed. Second, background correction consists of performing principal component analysis of the autosomal probe L2R values and selecting the top k principal components (PCs) that correspond to the ‘elbow’ of the scree plot. In our case, ~70% of the variance in autosomal L2R values was explained by the top k PCs in both the UK Biobank and INTERSTROKE datasets. GC-corrected MT and autosomal L2R values are then further adjusted for the top autosomal PCs (representing technical background noise) by taking the residuals from regressing the L2R values on the k autosomal PCs. Third, we derive a ‘clean’ set of autosomal and MT probes without signs of off-target probe cross-hybridization by empirically testing the GC-corrected and background-corrected L2R values for association with either the sample medians of off-target genome L2R values or self-reported gender (to capture off-target hybridization to sex chromosomes). Fourth, using the ‘clean’ probe set, we repeat the autosomal background correction, extract the top MT PC as a crude measure of mtDNA-CN, orient the sign of the MT PC according to its association with commonly reported predictors of mtDNA-CN (sex or age), and, last, standardize the MT PC values as the final AutoMitoC estimate.
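The background-correction and derivation steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the SVD-based PCA, and the `expected_sign` parameter (the direction in which the chosen predictor is assumed to associate with mtDNA-CN) are assumptions made for illustration, and the GC correction, sample quality control, and cross-hybridization filtering are omitted.

```python
import numpy as np

def top_pcs(X, k):
    """Scores of the samples (rows of X) on the top k principal components."""
    Xc = X - X.mean(axis=0)                      # centre each probe
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def residualize(Y, pcs):
    """Residuals of each column of Y after least-squares regression on pcs."""
    design = np.column_stack([np.ones(len(pcs)), pcs])  # intercept + PCs
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ beta

def automitoc_sketch(auto_l2r, mt_l2r, predictor, k, expected_sign=1):
    """Hypothetical helper: crude mtDNA-CN estimate from GC-corrected L2R.

    auto_l2r, mt_l2r: arrays of shape (samples, probes).
    predictor: per-sample covariate (e.g. sex or age) used to orient the sign.
    expected_sign: assumed direction of the predictor/mtDNA-CN association.
    """
    auto_pcs = top_pcs(auto_l2r, k)              # technical background noise
    mt_resid = residualize(mt_l2r, auto_pcs)     # background-corrected MT L2R
    mt_pc = top_pcs(mt_resid, 1)[:, 0]           # top MT PC
    r = np.corrcoef(mt_pc, predictor)[0, 1]
    if r * expected_sign < 0:                    # PC sign is arbitrary; orient it
        mt_pc = -mt_pc
    return (mt_pc - mt_pc.mean()) / mt_pc.std()  # standardize (mean 0, SD 1)
```

In this sketch, k would be chosen by inspecting the scree plot of the autosomal PCs, as the caption describes; the sign step is needed because the direction of a principal component is arbitrary.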

Figure 1—figure supplement 1. Intuition behind differentiation of genotypes and determination of mitochondrial DNA copy number (mtDNA-CN).

Contrast in the intensities of mitochondrial probes ‘X’ and ‘Y’ discriminates genotypes (left). Intracluster variation in signal intensities may reflect differences in mtDNA-CN (right). (Adapted from Lane, 2014).
Figure 1—figure supplement 2. Overview of the MitoPipeline (Source: http://genvisis.org/MitoPipeline/) (Lane, 2014).

Figure 1—figure supplement 3. Minor allele frequency (MAF)-stratified analyses demonstrating utility of rare vs common autosomal variants for signal normalization.

(A) Density plot illustrating the correlation (R²) between autosomal probe log2ratio (L2R) values and median MT L2R, stratified by common (MAF >0.01) and rare (MAF <0.01) variant status. (B) Cumulative variance explained by inclusion of top eigenvectors for sets of common (M = 86,677) and rare (M = 79,611) autosomal probes.
Figure 1—figure supplement 4. Distribution of log10-transformed coefficients of determination (r²) from the association between autosomal probe intensities and median mitochondrial (MT) signal with (blue) or without (red) correction for background noise (i.e. 120 autosomal principal components [PCs]).

The dashed vertical line represents the threshold corresponding to ‘moderate’ correlation (|r| > 0.05 or r² > 0.0025), which is used to remove outlying probes that are associated with MT signal. Without correction for top PCs, most autosomal probes exhibit some correlation with MT signal.
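The thresholding described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and it assumes the inputs are background-corrected L2R matrices with samples in rows and probes in columns, with no constant-valued probe (which would give an undefined correlation).

```python
import numpy as np

def flag_mt_associated_probes(auto_l2r, mt_l2r, r2_threshold=0.0025):
    """Boolean mask of autosomal probes whose r^2 with the per-sample
    median MT L2R exceeds the threshold (|r| > 0.05 by default)."""
    median_mt = np.median(mt_l2r, axis=1)        # one MT summary per sample
    a = auto_l2r - auto_l2r.mean(axis=0)         # centre each autosomal probe
    m = median_mt - median_mt.mean()
    # Pearson r between every autosomal probe and the median MT signal
    r = (a.T @ m) / (np.linalg.norm(a, axis=0) * np.linalg.norm(m))
    return r ** 2 > r2_threshold
```

Probes flagged `True` would be the candidates for removal from the ‘clean’ probe set.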
Figure 1—figure supplement 5. Validation of automatic mitochondrial copy in an ethnically diverse cohort with qPCR measurements.

Both qPCR and array-based mitochondrial DNA copy number estimates are presented as standardized units (mean = 0; SD = 1). The sample consisted of 2431 Europeans, 1704 Latin Americans, 542 Africans, 471 South East Asians, 186 South Asians, and 360 participants of other ancestry. Correlations between array and qPCR estimates were comparable for European (r = 0.60; p = 2.7 × 10⁻²³⁸), Latin American (r = 0.70; p = 3.9 × 10⁻²⁵¹), African (r = 0.66; p = 1.8 × 10⁻⁶⁸), South East Asian (r = 0.59; p = 6.2 × 10⁻⁴⁶), South Asian (r = 0.53; p = 4.2 × 10⁻¹⁵), and other (r = 0.72; p = 5.4 × 10⁻⁵⁹) ethnic groups. The blue line indicates the linear trendline and the surrounding shaded region indicates the 95% CI for the trendline.
Figure 1—figure supplement 6. Bland Altman plots illustrating the extent of agreement between array and qPCR measurements.

The solid black line indicates perfect agreement. The dashed blue line indicates the mean difference (or bias) between estimates. The horizontal red lines correspond to the 95% upper and lower limits of agreement (U/L LOA) for the observed data. The dashed black lines indicate the 95% U/L LOA expected under the null for two unrelated variables.
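For readers unfamiliar with Bland-Altman plots, the bias and observed limits of agreement can be computed as in the sketch below. It uses the conventional definition (mean difference ± 1.96 sample standard deviations of the differences); the function name is hypothetical, and the source does not state the exact formula the authors used.

```python
import numpy as np

def bland_altman(a, b):
    """Return (bias, lower LOA, upper LOA) for paired measurements a and b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()                 # mean difference between methods
    sd = diff.std(ddof=1)              # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Here `a` and `b` would be the standardized array-based and qPCR estimates for the same samples.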