2024 Jan 26;25:115. doi: 10.1186/s12864-023-09935-9

Table 4.

Ability of LUSTR to estimate allele fraction by in silico mixture of samples

| Loci | Original genotype^a: Father, Ashkenazim Trio library 1 (32.8% of mixture) | Original genotype^a: Son, Chinese Trio MGI library 1 (67.2% of mixture) | Expectation | LUSTR estimated allelic fraction (520837211 pair-ends) | Result |
|---|---|---|---|---|---|
| ATN1 (CAG) 12:7045880-938 | 0/0 | 0/+4 | 0 (67%); +4 (33%) | 0 (74 ± 26%); +4 (26 ± 26%) | *** |
| ATXN1 (TGC) 6:16327865-955 | 0/+1 | -1/-1 | -1 (67%); 0 (17%); +1 (17%) | -1 (57 ± 26%); 0 (43 ± 26%) | - |
| ATXN2 (GCT) 12:112036754-823 | -1/+7 | -1/-1 | -1 (83%); +7 (17%) | -1 (61 ± 48%); +7 (21 ± 51%) | * |
| ATXN3 (CTG) 14:92537355-96 | +6.7/+9 | 0/0 | 0 (67%); +6.7 (17%); +9 (17%) | 0 (68 ± 11%); +6.7 (21 ± 11%); +9 (11 ± 11%) | *** |
| ATXN7 (GCA) 3:63898361-423 | 0/0 | 0/+2 | 0 (67%); +2 (33%) | 0 (100 ± 0%) | - |
| ATXN10 (ATTCT) 22:46191235-304 | 0/+2 | 0/+7 | 0 (50%); +2 (17%); +7 (33%) | 0 (40 ± 20%); +2 (36 ± 20%); +7 (24 ± 24%) | * |
| C9ORF72 (GCCCCG) 9:27573483-544 | -1/-1 | -1/-1 | -1 (100%) | -1 (100 ± 0%) | *** |
| CACNA1A (CTG) 19:13318673-712 | -2/-1 | 0/0 | -2 (17%); -1 (17%); 0 (67%) | -2 (32 ± 48%); -1 (33 ± 47%); 0 (34 ± 47%) | * |
| CBL (CGG) 11:119077000-33 | 0/+5 | 0/0 | 0 (83%); +5 (17%) | 0 (76 ± 24%); +5 (24 ± 24%) | *** |
| DMPK (CAG) 19:46273463-524 | -9/-7 | -7/-4 | -9 (17%); -7 (50%); -4 (33%) | -9 (17 ± 9%); -7 (51 ± 10%); -4 (32 ± 10%) | *** |
| HTT (CAG) 4:3076604-67 | -2/-2 | -1/-1 | -2 (33%); -1 (67%) | -2 (30 ± 23%); -1 (70 ± 23%) | *** |
| JPH3 (GCT) 16:87637889-935 | 0/+2 | 0/+2 | 0 (50%); +2 (50%) | 0 (35 ± 8%); +2 (65 ± 8%) | * |
| PPP2R2B (GCT) 5:146258291-322 | 0/0 | +3/+6 | 0 (33%); +3 (33%); +6 (33%) | 0 (17 ± 10%); +3 (40 ± 11%); +6 (43 ± 12%) | * |

| Loci | Original genotype^a: Father, Ashkenazim Trio library 1 (8.9% of mixture) | Original genotype^a: Son, Chinese Trio MGI library 1 (91.1% of mixture) | Expectation | LUSTR estimated allelic fraction: Mixture 1 (384221009 pair-ends) | LUSTR estimated allelic fraction: Mixture 2 (384225464 pair-ends) | LUSTR estimated allelic fraction: Mixture 3 (384223036 pair-ends) | Result |
|---|---|---|---|---|---|---|---|
| ATN1 (CAG) 12:7045880-938 | 0/0 | 0/+4 | 0 (55%); +4 (45%) | 0 (63 ± 37%); +4 (37 ± 37%) | 0 (67 ± 33%); +4 (33 ± 33%) | 0 (63 ± 37%); +4 (37 ± 37%) | *** |
| ATXN1 (TGC) 6:16327865-955 | 0/+1 | -1/-1 | -1 (90%); 0 (5%); +1 (5%) | -1 (100 ± 0%) | -1 (77 ± 34%); 0 (23 ± 34%) (lq) | -1 (77 ± 34%); 0 (23 ± 34%) | - |
| ATXN2 (GCT) 12:112036754-823 | -1/+7 | -1/-1 | -1 (95%); +7 (5%) | -1 (78 ± 52%) | -1 (78 ± 52%) | -1 (73 ± 73%) | - |
| ATXN3 (CTG) 14:92537355-96 | +6.7/+9 | 0/0 | 0 (90%); +6.7 (5%); +9 (5%) | 0 (86 ± 13%); +6.7 (9 ± 13%); +9 (5 ± 14%) | 0 (90 ± 14%); +7 (5 ± 13%); +9 (5 ± 14%) | 0 (90 ± 14%); +6.7 (5 ± 13%) (lq); +9 (5 ± 14%) | *** |
| ATXN7 (GCA) 3:63898361-423 | 0/0 | 0/+2 | 0 (55%); +2 (45%) | 0 (100 ± 0%) | 0 (100 ± 0%) | 0 (100 ± 0%) | - |
| ATXN10 (ATTCT) 22:46191235-304 | 0/+2 | 0/+7 | 0 (50%); +2 (5%); +7 (45%) | 0 (53 ± 47%); +7 (47 ± 47%) | 0 (53 ± 47%); +7 (47 ± 47%) | 0 (60 ± 40%); +7 (40 ± 40%) | - |
| C9ORF72 (GCCCCG) 9:27573483-544 | -1/-1 | -1/-1 | -1 (100%) | -1 (100 ± 0%) | -1 (100 ± 0%) | -1 (100 ± 0%) | *** |
| CACNA1A (CTG) 19:13318673-712 | -2/-1 | 0/0 | -2 (5%); -1 (5%); 0 (90%) | 0 (100 ± 0%) | -2 (49 ± 51%); 0 (51 ± 51%) | 0 (100 ± 0%) | - |
| CBL (CGG) 11:119077000-33 | 0/+5 | 0/0 | 0 (95%); +5 (5%) | 0 (87 ± 35%); +5 (13 ± 35%) | 0 (87 ± 35%); +5 (13 ± 35%) | 0 (89 ± 32%); +5 (11 ± 32%) | *** |
| DMPK (CAG) 19:46273463-524 | -9/-7 | -7/-4 | -9 (5%); -7 (50%); -4 (45%) | -9 (11 ± 11%); -7 (49 ± 12%); -4 (41 ± 12%) | -7 (54 ± 14%); -4 (46 ± 14%) | -7 (58 ± 12%); -4 (42 ± 12%) | *** |
| HTT (CAG) 4:3076604-67 | -2/-2 | -1/-1 | -2 (10%); -1 (90%) | -1 (100 ± 0%) | -1 (100 ± 0%) | -2 (10 ± 29%) (lq); -1 (90 ± 29%) | * |
| JPH3 (GCT) 16:87637889-935 | 0/+2 | 0/+2 | 0 (50%); +2 (50%) | 0 (39 ± 10%); +2 (61 ± 10%) | 0 (33 ± 10%); +2 (67 ± 10%) | 0 (30 ± 10%); +2 (70 ± 10%) | * |
| PPP2R2B (GCT) 5:146258291-322 | 0/0 | +3/+6 | 0 (10%); +3 (45%); +6 (45%) | 0 (7 ± 12%); +3 (44 ± 13%); +6 (48 ± 13%) | 0 (11 ± 11%); +3 (43 ± 12%); +6 (47 ± 13%) | 0 (4 ± 12%); +3 (46 ± 13%); +6 (50 ± 13%) | *** |

Two sets of in silico mixtures with different sample proportions were processed and tested by LUSTR. In the second test, mixing was repeated for three runs to ensure that reads from the non-dominant source were represented for alleles at extremely low fractions. Libraries were taken from the father in the Ashkenazim trio and the son in the Chinese trio, both sequenced on the MGISEQ platform (Supplementary Table 1). The list of tested STRs was identical to the one used in the test for GIAB libraries (Tables 2 and 3 and Supplementary Table 1). The name, repeat unit, and genomic location (build 37) of each STR are shown in the first column. The expected STR alleles and their fractions in the in silico mixtures (column 4) were calculated from the original genotypes of the samples and their mixing proportions (columns 2 and 3). The LUSTR estimation results are marked as *** (matching expectation), * (mismatching fraction estimation, > 10%), or - (loss of allele). Alleles called but deemed low quality by LUSTR are marked with "lq". Low-quality calls are likely due to low representation of minor alleles in the mixture.

^a The original genotypes of the two samples were determined by both GIAB calls and LUSTR estimations (Supplementary Table 1), assuming germline patterns that are either homozygous or heterozygous (i.e., no mosaicism at the targeted loci).
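The expectation column can be reproduced with a short calculation. The sketch below is illustrative only and is not part of LUSTR: the function names `expected_fractions` and `classify` are hypothetical, each sample is assumed to contribute its two germline alleles equally, and the legend's "> 10%" is read here as an absolute difference of more than 10 percentage points for any expected allele. Because it uses the exact mixing proportions (32.8% and 67.2%), rounded values may differ by about one percentage point from those printed in the table.

```python
from collections import defaultdict


def expected_fractions(genotypes, proportions):
    """Expected allele fractions in a mixture of diploid samples.

    genotypes   -- one tuple of alleles per sample, e.g. (0, 2) for 0/+2
    proportions -- each sample's share of the mixture (should sum to 1)
    """
    fractions = defaultdict(float)
    for alleles, share in zip(genotypes, proportions):
        for allele in alleles:
            # Assumption: every allele of a sample gets an equal part
            # of that sample's share (no mosaicism).
            fractions[allele] += share / len(alleles)
    return dict(fractions)


def classify(expected, estimated, tolerance=0.10):
    """Hypothetical reading of the legend's marks for a single mixture."""
    if any(allele not in estimated for allele in expected):
        return "-"      # an expected allele was not called at all
    if any(abs(estimated[a] - expected[a]) > tolerance for a in expected):
        return "*"      # called, but a fraction is off by more than 10 points
    return "***"        # all expected alleles called within tolerance


# First mixture set: father 32.8%, son 67.2%.
shares = [0.328, 0.672]

# (father genotype, son genotype, LUSTR estimate copied from the table)
rows = {
    "DMPK":  ((-9, -7), (-7, -4), {-9: 0.17, -7: 0.51, -4: 0.32}),
    "JPH3":  ((0, 2),   (0, 2),   {0: 0.35, 2: 0.65}),
    "ATXN7": ((0, 0),   (0, 2),   {0: 1.00}),
}

for locus, (father, son, estimate) in rows.items():
    expectation = expected_fractions([father, son], shares)
    print(locus, expectation, classify(expectation, estimate))
```

Under these assumptions, the three example rows receive the same marks as in the first half of the table (DMPK `***`, JPH3 `*`, ATXN7 `-`). The second half of the table reports three mixtures per locus with a single mark, and the rule for combining runs is not stated in the legend, so this sketch does not attempt it.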