Author manuscript; available in PMC: 2016 Jun 30.
Published in final edited form as: J Thromb Haemost. 2015 Oct 20;13(11):2031–2040. doi: 10.1111/jth.13144

Complexity and diversity of F8 genetic variations in the 1000 genomes

J N Li, I G Carrero, J F Dong, F L Yu
PMCID: PMC4928474  NIHMSID: NIHMS792295  PMID: 26383047

Summary

Background

Hemophilia A (HA) is an X-linked bleeding disorder caused by deleterious mutations in the coagulation factor VIII gene (F8). To date, F8 mutations have been documented predominantly in European subjects and in American subjects of European descent. Information on F8 variants in individuals of more diverse ethnic backgrounds is limited.

Objectives

To discover novel and rare F8 variants, and to characterize F8 variants in diverse population backgrounds.

Patients/methods

We analyzed F8 variants and their potential functional impact in 2535 subjects from 26 ethnic groups whose data were available in the 1000 Genomes Project (1000G) phase 3 dataset.

Results

We identified 3030 single-nucleotide variants (SNVs), 31 short insertions and deletions (Indels) and one large (497 kb) deletion. Among all variants, 86.4% were rare and 55.6% were novel. Eighteen variants previously associated with HA were found in our study. Most of these ‘HA variants’ were ethnicity specific with low allele frequency; however, one variant (p.M2257V) had a minor allele frequency of 27.7% in African subjects. The ‘HA variants’ p.E132D, p.T281A, p.A303V and p.D422H were identified only in males. Twelve novel missense variants were predicted to be deleterious. The large deletion was found in eight female subjects and did not significantly affect the transcription of F8 or of genes on the X chromosome.

Conclusion

Characterizing F8 in the 1000G project highlighted the complexity of F8 variants and the importance of interrogating genetic variants across multiple ethnic backgrounds for associations with bleeding and thrombosis. Haplotype analysis and the orientation of the duplicons flanking the large deletion suggested that the deletion is recurrent and arose through homologous recombination.

Keywords: deletion Xq28, ethnic groups, factor VIII, genetic variation, human genome project

Introduction

Hemophilia is an X-linked recessive genetic disorder adversely affecting blood coagulation. The most common form of the disease is hemophilia A (HA), which occurs in approximately 1 in 5000–10 000 males [1], and is caused by various types of pathological defects in the coagulation factor VIII gene (F8).

The F8 gene encodes coagulation factor VIII (FVIII). It contains 26 exons spanning over 186 kb of DNA in the most distal band of the long arm of the X chromosome (Xq28) [2]. FVIII plays an essential role in the coagulation cascade: activated FVIII serves as a cofactor for activated coagulation factor IX (FIXa), enabling it to activate factor X (FX). FVIII was previously thought to be synthesized in hepatic sinusoidal cells, but recent studies have identified endothelial cells as the primary site of FVIII synthesis [3,4]. FVIII has a very short half-life in the circulation owing to proteolytic degradation, and its survival is significantly prolonged by forming a complex with the adhesive ligand von Willebrand factor.

To date, more than 2000 F8 variants have been identified, corresponding to over 5000 individual cases, mostly in patients of European ancestry (http://www.factorviiidb.org). A detailed baseline survey of genetic variants in large numbers of non-diseased individuals from diverse ethnic backgrounds provides a critical foundation for interpreting the functional implications of F8 variants found in studies of patients. The 1000 Genomes Project (1000G) offers an opportunity to expand our knowledge of genetic variants in the F8 gene, especially across multiple ethnicities. Next-generation sequencing allows the detection of rare variants with minor allele frequencies (MAFs) as low as 0.03% (i.e. singletons), as well as multiple types of variants in F8, including single-nucleotide variations (SNVs), short insertions or deletions (Indels) and structural variants (SVs).
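Because F8 is X-linked, allele frequencies are computed over X chromosomes: each female contributes two alleles and each hemizygous male one. The minimal sketch below (hypothetical helper names, not the 1000G pipeline) shows how the MAF of a singleton depends on the sex composition of the 2535 subjects. The 0.03% quoted above lies between the all-female and all-male bounds it computes.

```python
from __future__ import annotations


def x_allele_number(n_female: int, n_male: int) -> int:
    """Number of X chromosomes sampled: two per female, one per hemizygous male."""
    return 2 * n_female + n_male


def x_linked_maf(alt_count: int, n_female: int, n_male: int) -> float:
    """Minor allele frequency (as a fraction) of an X-linked biallelic site."""
    af = alt_count / x_allele_number(n_female, n_male)
    return min(af, 1.0 - af)


def singleton_maf_bounds_pct(n_subjects: int) -> tuple[float, float]:
    """MAF (%) of an allele seen once, if the cohort were all female vs. all male."""
    all_female = 100.0 / x_allele_number(n_subjects, 0)
    all_male = 100.0 / x_allele_number(0, n_subjects)
    return all_female, all_male


if __name__ == "__main__":
    low, high = singleton_maf_bounds_pct(2535)
    print(f"singleton MAF between {low:.3f}% and {high:.3f}%")
    # prints: singleton MAF between 0.020% and 0.039%
```

Counting males as hemizygous matters for every X-linked frequency in this study; treating them as diploid would understate the MAF of variants carried by men.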

We obtained the latest 1000G variant release (phase 3). Functional annotation of SNVs and Indels in F8 was then performed, followed by characterization of the loci using a population genetics statistic (FST) and variant conservation scores. The discovery of a large number of novel variants, especially rare ones that are putatively detrimental, could lead to new biomedical hypotheses. Genetic analysis of the 497 kb deletion found in the 1000G subjects points to an underlying molecular mechanism driven by segmental duplications, which also accounts for inversions and duplications involving F8 [5,6].

Materials and methods

The 1000G samples and variant datasets

We obtained F8 variants from the 1000G project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130723_phase3_wg/shapeit2). The project sequenced 2535 non-diseased subjects from 26 ethnic groups originating from five continental regions (Table S1): Europe (EUR), America (AMR), Africa (AFR), East Asia (EAS) and South Asia (SAS) (http://1000genomes.org). The project’s ethical framework requires that sample donors be non-vulnerable adults (over 18 years of age) who are able to consent to participation in the project. RNA-seq data were obtained from the Genetic European Variation in Health and Disease (GEUVADIS) project (ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experimant/GEUV/E-GEUV-1/processed/). GEUVADIS includes 421 samples that overlap with the 1000G project [7].

Genetic variation annotation and functional impact analysis

We used the Cassandra v14.2.5 software package [8] to annotate F8 SNVs and Indels. The nomenclature of F8 variants follows the recommendations of Goodeve et al. [9]. The reference was NCBI human genome build 37 (GRCh37/hg19). The F8 gene spans positions 154 064 063 to 154 250 998 on chromosome X, and its 5′ and 3′ untranslated regions (UTRs) are 171 bp and 1806 bp long, respectively. We also included 4217 bp upstream of F8 to cover the alternative transcripts. Within this alternative transcript 1 region, SNVs < 1 kb from the F8 5′ UTR were classified as upstream variants, and SNVs > 1 kb from the 5′ UTR as intergenic variants. Variants found in the 1000G project but absent from dbSNP138 were considered novel.
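The region definitions above can be expressed as a small classifier. This is a sketch under stated assumptions: it uses the GRCh37 coordinates given in the text; it assumes F8 lies on the minus strand, so that its 5′ end is the higher coordinate (consistent with the 497 kb deletion at chrX: 154 110 804–154 607 929 spanning exons 1–22 and the upstream genes); it treats a distance of exactly 1 kb as intergenic; and it lumps exons, introns and UTRs into one "genic" class because exon coordinates are not listed here. `classify_position` is a hypothetical helper, not part of Cassandra.

```python
from __future__ import annotations

# GRCh37 coordinates quoted in the text.
F8_START = 154_064_063
F8_END = 154_250_998
UPSTREAM_WINDOW = 4_217   # bp added upstream of F8 to cover alternative transcript 1
UPSTREAM_CUTOFF = 1_000   # < 1 kb from the 5' end -> "upstream", otherwise "intergenic"


def classify_position(pos: int, minus_strand: bool = True) -> str:
    """Assign a chrX position to genic / upstream / intergenic / outside."""
    if F8_START <= pos <= F8_END:
        return "genic"
    # Distance upstream of the 5' end; on the minus strand the 5' end is F8_END.
    dist = pos - F8_END if minus_strand else F8_START - pos
    if dist <= 0:
        return "outside"      # downstream (3') of the gene
    if dist < UPSTREAM_CUTOFF:
        return "upstream"
    if dist <= UPSTREAM_WINDOW:
        return "intergenic"
    return "outside"          # beyond the 4217 bp window analysed
```

For example, `classify_position(154_251_500)` falls 502 bp beyond the assumed 5′ end and is labeled "upstream".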

In silico criteria were employed for functional prioritization. Sorting Intolerant From Tolerant (SIFT) identifies critical amino acids from their conservation across a specific protein family [10]. Polymorphism Phenotyping v2 (PolyPhen-2) also uses a normalized cross-species conservation score, which it combines with a variety of protein structural features when available [11]. Both SIFT and PolyPhen-2 use non-redundant protein databases. Both methods have high sensitivity, but their false-positive rates are relatively high because sequence conservation is difficult to model accurately. PolyPhen-2 achieves a true-positive rate of 73% on the HumVar dataset (used for Mendelian disease prediction) with a false-positive rate of 20% [11]; the corresponding rates for SIFT are 69% and 13% [12]. The concordance between the two programs is approximately 56%; this low overlap arises mainly from differences in the sequences and/or alignments used to identify evolutionarily conserved amino acids [12]. To maximize sensitivity (i.e. to detect as many variants as possible that may affect an individual’s genetic susceptibility to hemophilia), we took the union of the PolyPhen-2 and SIFT predictions. Genomic Evolutionary Rate Profiling (GERP++) was used to examine the evolutionary conservation of nucleotides; it identifies constrained elements in multiple-species alignments by quantifying substitution deficits. GERP++ rejected-substitution (RS) scores greater than 2.0 are typically regarded as indicating evolutionary conservation [13].
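The union rule and the conservation threshold can be sketched as follows, using the cut-offs reported in the Results (SIFT < 0.05, PolyPhen-2 > 0.447) and the GERP++ RS > 2.0 threshold. The `Missense` record and the scores in the usage lines are hypothetical placeholders; in practice the scores would come from the annotation output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

SIFT_CUTOFF = 0.05        # SIFT score below this -> predicted deleterious
POLYPHEN2_CUTOFF = 0.447  # PolyPhen-2 score above this -> predicted damaging
GERP_RS_CUTOFF = 2.0      # GERP++ RS above this -> evolutionarily constrained


@dataclass
class Missense:
    name: str
    sift: Optional[float]       # None when no score is available
    polyphen2: Optional[float]
    gerp_rs: Optional[float]


def predicted_by(v: Missense) -> set[str]:
    """Names of the predictors that flag this variant as deleterious."""
    tools: set[str] = set()
    if v.sift is not None and v.sift < SIFT_CUTOFF:
        tools.add("SIFT")
    if v.polyphen2 is not None and v.polyphen2 > POLYPHEN2_CUTOFF:
        tools.add("PolyPhen-2")
    return tools


def is_deleterious(v: Missense) -> bool:
    """Union rule: flagged if at least one predictor calls the variant deleterious."""
    return bool(predicted_by(v))


def is_constrained(v: Missense) -> bool:
    """Evolutionary constraint by GERP++ rejected-substitution score."""
    return v.gerp_rs is not None and v.gerp_rs > GERP_RS_CUTOFF


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    variants = [
        Missense("var_a", sift=0.01, polyphen2=0.20, gerp_rs=3.1),
        Missense("var_b", sift=0.30, polyphen2=0.95, gerp_rs=-1.2),
        Missense("var_c", sift=0.60, polyphen2=0.10, gerp_rs=4.0),
    ]
    for v in variants:
        print(v.name, sorted(predicted_by(v)), is_deleterious(v), is_constrained(v))
```

Taking the union rather than the intersection trades specificity for sensitivity, which matches the stated goal of not missing potentially relevant variants.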

Population differentiation analysis

The pairwise fixation index (FST) was calculated between continental groups to measure the degree of genetic differentiation of variants [14]. FST was calculated separately for male and female subjects using vcftools [15].
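For intuition about what FST measures, the sketch below computes a simple two-population, multi-locus estimate, FST = Σ(HT − HS) / ΣHT, from allele frequencies. It is not the Weir and Cockerham estimator that vcftools implements (for example via `--weir-fst-pop`), it applies no sample-size correction, and for X-linked loci the input frequencies are assumed to have been computed with males counted as hemizygous.

```python
from __future__ import annotations

from typing import Sequence


def fst_two_pops(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Multi-locus FST = sum(HT - HS) / sum(HT) for two equally weighted populations.

    p1[i] and p2[i] are the alternative-allele frequencies at locus i.
    Frequencies are treated as known (no sample-size correction).
    """
    if len(p1) != len(p2):
        raise ValueError("both populations must cover the same loci")
    num = den = 0.0
    for a, b in zip(p1, p2):
        hs = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
        p_bar = (a + b) / 2
        ht = 2 * p_bar * (1 - p_bar)                   # heterozygosity of the pooled population
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    print(round(fst_two_pops([0.2], [0.4]), 4))    # 0.0476
    print(round(fst_two_pops([0.001], [0.0]), 6))  # rare private variant: about 0.0005
```

The second call illustrates why rare variants private to one population keep FST low: even complete absence from the other group yields an FST close to zero when the allele is rare.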

Results

More than 50% of the F8 variants found in the 1000G project were novel

From the 1000G phase 3 release (2535 individuals), we extracted 2981 SNVs within the F8 gene and 49 SNVs in the 4217 bp region upstream of the gene that covers alternative transcript 1, for a total of 3030 SNVs. Among all SNVs observed, 2.81% (85/3030) were exonic, 94.55% (2865/3030) were intronic, 0.86% (26/3030) were in the 3′ UTR and 0.17% (5/3030) were in the 5′ UTR. In the alternative transcript 1 region, 31 SNVs were in the upstream region and 18 SNVs were intergenic (Table 1). The large sample size and ethnic diversity of the 1000G project allowed us to detect several variant types: in addition to SNVs, we identified 31 intronic Indels ranging from 1 to 12 nucleotides in size (data not shown) and a 497 kb deletion (chrX: 154 110 804–154 607 929) in subjects from three populations: African, American and European (Table 1).

Table 1.

F8 variants in the 1000G subjects

Variant region | Total | AFR (n = 669)* | AMR (n = 352) | EAS (n = 515) | EUR (n = 505) | SAS (n = 494) | Shared† (% with MAF < 1%)
3′ UTR | 26 | 8 | 9 | 8 | 6 | 7 | 2 (50)
5′ UTR | 5 | 2 | 3 | 1 | 3 | 2 | 1 (0)
Intronic‡ | 2865 | 1292 | 756 | 706 | 636 | 747 | 106 (0.9)
Silent | 29 | 10 | 5 | 9 | 6 | 6 | 1 (0)
Missense | 56 | 18 | 7 | 15 | 8 | 15 | 0
Intergenic | 18 | 8 | 3 | 4 | 2 | 5 | 0
Upstream | 31 | 14 | 8 | 10 | 4 | 8 | 0
Indels§ | 31 | 29 | 29 | 16 | 23 | 24 | 15 (6.7)
497 kb deletion | 1 | 1 | 1 | 0 | 1 | 0 | 0
Total | 3062 | 1382 | 821 | 769 | 689 | 814 | 125 (2.4)

AFR, Africa; AMR, America; EAS, East Asia; EUR, Europe; SAS, South Asia.

* The number of subjects in the population.
† Variants discovered in all five continental populations.
‡ The Intronic row counts SNVs only.
§ All indels were observed in intronic regions.

We next examined the number of variants in each FVIII domain: just over half of the exonic variants (43/85) were located in the B domain. However, the variant density of the B domain (i.e. the number of variants normalized by domain length) was not the highest among the domains (Table S2). The B domain is the largest domain and functions only in the intracellular processing, transport and secretion of FVIII [16]. The large number of variants in the B domain therefore suggests that it is relatively tolerant of variation.

The density of SNVs was lower in exons (mean = 60 per 5 kb) than in introns (mean = 80 per 5 kb), consistent with stronger selective constraint on exonic than on intronic sequences. We also observed a 10 kb region of low SNV density within F8 intron 22 (Fig. 1). This window encompasses a 9.5 kb low-copy repeat, intron 22 homologous region 1 (int22h-1; chrX: 154 109 091–154 118 601), in which accurate variant calling is difficult [6,17]; the 1000G project therefore applied post hoc filters that masked such regions from the final VCF release [17]. One of the breakpoints of the large deletion lies within int22h-1 (Fig. 1).

Fig. 1.


Single-nucleotide variation (SNV) density plot for F8. All 2981 SNVs within F8 are plotted onto the F8 gene (green). SNV density (black) was calculated by counting the number of variants in a 5 kb sliding window. F8 exons are shown in navy blue and introns in pink; the 5′ and 3′ UTRs are shown in red (the 171 bp 5′ UTR is masked by the first exon). The int22h-1 region and part of the large deletion in F8 are indicated with arrows.
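A density track like the one in Fig. 1 can be approximated by counting variant positions in a sliding window. In this sketch the 5 kb window comes from the caption, while the 1 kb step and the restriction to windows lying fully inside the region are our assumptions; `sliding_window_density` is a hypothetical helper.

```python
from __future__ import annotations

import bisect
from typing import Sequence


def sliding_window_density(positions: Sequence[int], start: int, end: int,
                           window: int = 5_000, step: int = 1_000) -> list[tuple[int, int]]:
    """Count variants in windows [ws, ws + window) that lie fully within [start, end]."""
    pos = sorted(positions)
    counts: list[tuple[int, int]] = []
    ws = start
    while ws + window - 1 <= end:
        lo = bisect.bisect_left(pos, ws)           # first variant at or after ws
        hi = bisect.bisect_left(pos, ws + window)  # first variant past the window
        counts.append((ws, hi - lo))
        ws += step
    return counts
```

Applied to the F8 interval (154 064 063 to 154 250 998) with the SNV positions from the VCF, windows overlapping the masked int22h-1 region would show the dip described above.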

We found that 86.38% of all variants (2645/3062) were rare (MAF < 1%), and 49.0% (1499/3062) were singletons, i.e. variant alleles observed only once in the 1000G subjects (Fig. 2A). By definition, singletons are found in only one ethnic group and are therefore considered ‘ethnicity specific’. Furthermore, we observed 1686 novel variants (i.e. not found in the dbSNP138 dataset), accounting for 55.6% (1686/3062) of all variants. When comparing MAFs of novel and known variants, 98.6% (1663/1686) of novel variants had an MAF < 1%, compared with only 71.4% (983/1376) of known variants (Fig. 2B). Among these rare novel variants (MAF < 1%), 68.9% (1146/1663) were singletons (Fig. 2B). Comparing female and male subjects, we did not detect a difference in the number of variants per X chromosome (Fig. S1A), but the number of exonic variants per X chromosome was higher in male subjects (Fig. S1B). This excess of exonic SNVs per X chromosome was driven by high-MAF (> 10%) variants that are more frequent in men; for example, p.M2257V has a higher MAF in men than in women (Table 3).

Fig. 2.


(A) MAF distributions. The inset shows the detailed distribution of alleles with MAF < 1%. (B) MAF distributions of novel and known F8 single-nucleotide variations (SNVs). Novel F8 SNVs were defined as those absent from the dbSNP138 dataset, which predates the inclusion of the 1000G phase 3 data. The inset shows the detailed distribution of novel variants with MAF < 1%.

Table 3.

Known F8 mutations associated with hemophilia A

Minor allele frequencies (%) are listed overall (G* E*) and for each continental group (♀ ♂ Total E*).

Amino acid change | Exon | Domain | Overall: G* E* | AFR: ♀ ♂ Total E* | AMR: ♀ ♂ Total E* | EUR: ♀ ♂ Total E* | EAS: ♀ ♂ Total E* | SAS: ♀ ♂ Total E*
p.E132D | 4 | A1 | 0.03 0.01 | 0 0 0 0 | 0 0 0 0 | 0 0.42 0.12 0.02 | 0 0 0 0 | 0 0 0 0
p.H330R | 7 | A1 | 0.03 0.002 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.43 0 0.28 0.02
p.A303V | 7 | A1 | 0.03 N/A | 0 0 0 N/A | 0 0.58 0.19 N/A | 0 0 0 N/A | 0 0 0 N/A | 0 0 0 N/A
p.T281A | 7 | A1 | 0.03 0.002 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0.38 0.14 0.01
p.D422H | 8 | A2 | 0.03 0.01 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0.38 0.14 0.10
p.A362= | 8 | a1 | 1.10 0.34 | 4.30 3.44 4.03 3.7 | 0 0 0 0.2 | 0 0 0 0.003 | 0 0 0 0.01 | 0.22 0 0.14 0.04
p.E340K | 8 | A1 | 0.08 0.02 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.43 0.38 0.41 0.16
p.E1642G | 14 | B | 0.03 0.002 | 0 0 0 0 | 0 0 0 0 | 0.19 0 0.12 0.002 | 0 0 0 0.01 | 0 0 0 0
p.V1511I | 14 | B | 0.05 0.02 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.38 0 0.26 0.23 | 0 0 0 0
p.R1126W | 14 | B | 0.10 0.09 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0.003 | 0 0 0 0 | 0.65 0.76 0.69 0.67
p.E1057K | 14 | B | 0.08 0.06 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0.001 | 0.57 0 0.38 0.78 | 0 0 0 0.02
p.P947R | 14 | B | 0.03 0.003 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.19 0 0.13 0.03 | 0 0 0 0.009
p.D845E | 14 | B | 0.05 0.01 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.19 0.40 0.26 0.11 | 0 0 0 0
p.R795G | 14 | B | 0.16 0.04 | 0 0 0 0 | 0.28 0 0.19 0.01 | 0 0 0 0 | 0.94 0.80 0.9 0.47 | 0 0 0 0
p.N1824T | 16 | A3 | 0.03 0.0008 | 0 0 0 0 | 0 0 0 0 | 0.19 0 0.12 0.002 | 0 0 0 0 | 0 0 0 0
p.I1901= | 17 | A3 | 0.18 0.01 | 0 0 0 0 | 0 0 0 0.01 | 0 0 0 0 | 0 0 0 0 | 0.22 2.28 0.97 0.85
p.M2257V | 25 | C2 | 7.74 2.20 | 26.22 30.94 27.7 24.6 | 2.49 2.34 2.44 0.89 | 0 0.42 0.12 0.04 | 0 0 0 0 | 0 0 0 0.04
p.V2242M | 25 | C2 | 0.05 0.04 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0.19 0.40 0.26 0.58 | 0 0 0 0

Based on the Factor VIII Variant Database (http://www.factorviii.org).

E*, MAF from ExAC Browser; G*, overall MAF in this study; N/A, not found in ExAC browser; ♀, female; ♂, male.

When the 3062 variants were stratified into the five continental groups (Africa, America, East Asia, Europe and South Asia), the total number of genetic variants was higher in African subjects than in non-African subjects (Table 1). The proportion of variants shared among all five groups varied greatly by variant type, from none for missense SNVs and the large deletion to 20.0% (1/5) for 5′ UTR SNVs. Most of the shared variants had an allele frequency above 1% (Table 1). In contrast, 48.4% (15/31) of Indels were shared among subjects from all five continents (Table 1). This high sharing rate could reflect the more stringent method applied to Indels, which filtered out variants with an MAF < 0.5%. For variants with an MAF > 5%, linkage disequilibrium patterns were similar in four continental groups but not in East Asia, where the variants formed a single block. This is possibly due to genetic drift or a stronger population bottleneck that reduced the number of common variants shared with other populations (Fig. S2).

Twelve novel missense variants were predicted to be deleterious by computational methods

Functional prioritization of genetic variants is a central theme in current biomedical research [18]. In this study, we applied well-established principles to systematically prioritize the functional impact of variants: (i) in silico functional predictors such as SIFT and PolyPhen-2; (ii) evolutionary conservation of genomic sequences; (iii) the congruence between allele frequencies and phenotypic prevalence; and (iv) genetic diversity among different ethnicities.

Overall, 18 out of 56 missense variants were predicted to be deleterious by at least one of the two computational methods using cut-offs of < 0.05 for SIFT and > 0.447 for PolyPhen-2 [10,11] (Table S3). Twelve out of the 18 variants were not found in the HA database (http://www.factorviii-db.org). Among the 12 novel variants, three were predicted to be deleterious by SIFT, nine by PolyPhen-2, and one by both methods (Tables 2 and S3). All 12 variants were rare with MAFs < 1%, and the MAFs were consistent with the ExAC dataset (http://exac.broadinstitute.org) (Table 2). The majority of these predicted variants (7/12; p.T94I, p.Q324K, p.D371N, p.Q462R, p.Q1129R, p.P1560R and p.G2102S) were singletons (MAF = 0.03% in the 1000G) and were only found in female subjects in the heterozygous state (Table 2), suggesting that they are plausible de novo mutations.

Table 2.

Deleterious F8 mutations predicted by SIFT or PolyPhen-2

Minor allele frequencies (%) are listed for each continental group (♀ ♂ E*) and overall (G* E*).

Amino acid change | Exon | Domain | GERP++ RS | AFR: ♀ ♂ E* | AMR: ♀ ♂ E* | EUR: ♀ ♂ E* | EAS: ♀ ♂ E* | SAS: ♀ ♂ E* | Overall: G* E*
p.T94I | 3 | A1 | 5.03 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.22 0 0.006 | 0.03 0.0008
p.Q324K | 7 | A1 | 3.75 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.22 0 0.06 | 0.03 0.008
p.D371N | 8 | a1 | 3.3 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.22 0 0.01 | 0.03 0.001
p.Q462R | 9 | A2 | 3.99 | 0.14 0 0.01 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.03 0.001
p.R503H | 10 | A2 | 3.14 | 2.01 1.56 1.80 | 0 0.58 0.04 | 0 0 0 | 0 0 0 | 0 0 0.01 | 0.51 0.16
p.P1560R* | 14 | B | 3.49 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.22 0 0.13 | 0.03 0.02
p.V1478G* | 14 | B | −3.89 | 0 0 N/A | 0 0 N/A | 0 0 N/A | 0 0 N/A | 0 0.38 N/A | 0.04 N/A
p.Q1129R* | 14 | B | 1.19 | 0 0 N/A | 0 0 N/A | 0 0 N/A | 0 0 N/A | 0.22 0 N/A | 0.03 N/A
p.T818I | 14 | B | 4.84 | 0.72 0.63 0.36 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.18 0.03
p.P1765T | 15 | A3 | 4.73 | 0.29 0.31 0.13 | 0 0 0 | 0 0 0 | 0 0 0 | 0 0 0 | 0.08 0.01
p.G2102S | 22 | C1 | 3.63 | 0.14 0 0.01 | 0 0 0 | 0 0 0.001 | 0 0 0 | 0 0 0.01 | 0.03 0.003
p.T2256S | 25 | C2 | 4.11 | 0 0 0 | 0.83 0.58 0.43 | 0 0 0 | 0 0 0 | 0 0 0 | 0.10 0.004

♀, female; ♂, male; E*, MAF from ExAC browser; G*, overall MAF from 1000 Genome Project Phase3 data; N/A, not found in ExAC browser.

* SIFT predicted.

PolyPhen-2 predicted.

SIFT and PolyPhen-2 both predicted.

Evolutionary conservation analysis using GERP++ identified 110 evolutionarily constrained variants [13] (Table S4, also see Table S5 for the full list of the RS scores of the 110 variants). Among them, 26 were non-synonymous, eight were synonymous and 73 were intronic (Table S4). Overall, 10 variants overlapped with the variants that were predicted to be deleterious by SIFT or PolyPhen-2 (Tables 2 and S3). However, they were not the top ranking ones in the RS scores (Table S5), presumably due to the algorithmic differences among the computational methods.

The FST of non-synonymous SNVs was very low (< 0.02), indicating a low degree of genetic differentiation (the accumulation of differences in allele frequencies between completely or partially isolated populations) among subjects from the five continents (Table S6). Notably, many non-synonymous variants were rare and unique to particular populations, which lowers FST values [19]. When we stratified the data by sex, male subjects had a smaller FST value than female subjects (Table S6). This is expected because variants on the X chromosome show stronger penetrance in men than in women owing to hemizygosity, and are therefore presumably under stronger negative selection [20].

Characterization of known F8 variants using the 1000G data

In the 1000G data, we found 18 variants that have been identified in HA case studies (Table 3; see also Table S7). Most had an MAF below 0.5%, consistent with their reported role as disease-causing mutations. Nine variants (p.E132D, p.T281A, p.A303V, p.D422H, p.E340K, p.R795G, p.D845E, p.R1126W and p.V2242M) were found in the hemizygous state in male subjects. Four of these (p.E132D, p.T281A, p.A303V and p.D422H) were singletons (MAF = 0.03%) and ethnicity specific (Table 3). p.E132D, p.D845E, p.E340K, p.R1126W and p.V2242M were identified in ethnic groups matching the ancestries of the patients in whom they were originally reported [21] (Table S7). Four variants (p.E132D, p.A362=, p.R795G and p.M2257V) have been reported in F8 variant screening studies of non-hemophilic individuals [22,23].

In our study, p.M2257V was found in three populations (AFR, AMR and EUR), with MAFs > 1% in African and American subjects (Table 3). In African subjects, p.M2257V had a high MAF of 26.22% in female subjects and an even higher MAF of 30.94% in male subjects (Table 3). We further examined whether p.M2257V affects F8 mRNA transcription using RNA-seq data from the GEUVADIS project [7] and observed no significant changes in F8 RNA levels in African subjects or in European male subjects (Fig. S3), suggesting that p.M2257V does not affect F8 transcription. p.M2257V was first reported in 1995 in two Brazilian patients with moderate HA [24] and was re-identified in a European patient with severe HA in 1998 [25] (Table S7). In both studies, the authors raised the concern that p.M2257V (previously described as p.M2238V) was a polymorphic variant of African origin. Viel et al. also identified p.M2257V in an F8 variant screening study of 137 unrelated non-hemophilic individuals [23]. Together with our results, these findings indicate that p.M2257V is likely to be a benign variant in the African population.

Two synonymous variants in the F8 mutation database, p.A362= and p.I1901=, were found in the 1000G subjects with MAFs of 1.10% and 0.18%, respectively (Table 3). To examine whether these synonymous variants affect F8 transcription or mRNA stability, we analyzed the RNA levels of two p.A362= carriers with available RNA-seq data (no RNA-seq data were available for carriers of p.I1901=) and observed no reduction in F8 RNA expression (Fig. S4), suggesting that p.A362= is a benign variant.

The 497 kb deletion appears to be recurrent and does not affect RNA levels in the carriers

The large deletion spans F8 exons 1–22 and several upstream genes, including FUNDC2, CMC4, MTCP1, BRCC3, CLIC2, H2AFB3, RAB39B, VBP1 and F8A1 (http://genome.ucsc.edu). It is situated between two well-characterized segmental duplications, int22h-1 and int22h-2, which also flank the previously reported duplication and one of the two known inversions [6,26] (Figs. 3A and S6).

Fig. 3.


(A) Schematic representation of structural variations mediated by int22h duplicons. Snapshot of the Xq28 region displayed using the UCSC genome browser. The ~0.5 Mb large deletion (154 110 804–154 607 929) is indicated with a blue bar. The recurrent ~0.5 Mb duplication is represented as a red bar and the two types of inversion as green bars. Genes shown at the bottom are encompassed by the deletion; non-protein-coding genes are hidden. Segmental Dups: the int22h region track shows the int22h-1, -2 and -3 regions and their orientation (arrowheads). This track is modified from the UCSC Segmental Dups track, which indicates highly similar segments of DNA (> 1 kb and > 90% similarity; light to dark orange segments have > 99% similarity). (B) The haplotype backgrounds of the 497 kb deletion carriers. The haplotypes cover 50 kb upstream and 50 kb downstream of the large deletion (homozygous variants were removed). Orange blocks represent the reference allele and blue blocks the non-reference allele. The deletion is marked in red.

In our dataset, the large deletion was observed in eight female individuals, all in the heterozygous state (MAF = 0.21%). The eight carriers came from three different populations, European, African and American (Table S8), suggesting that the deletion is not associated with any specific ethnic or geographic origin. Haplotype background analysis showed that only three individuals (two from EUR and one from AFR) grouped together, whereas the remaining carriers had very different haplotype patterns (Fig. 3B), indicating that the deletion did not originate on a common ancestral haplotype.
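One simple way to compare haplotype backgrounds such as those in Fig. 3B is to encode each carrier's flanking alleles as a string (0 = reference, 1 = non-reference) and link carriers whose haplotypes differ at no more than a chosen number of sites. This is an illustrative sketch, not the procedure used in the study; phased input, the tolerance `max_diff` and the toy haplotypes in the test are assumptions.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of sites at which two equal-length haplotype strings differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same sites")
    return sum(x != y for x, y in zip(a, b))


def group_haplotypes(haps: dict[str, str], max_diff: int = 0) -> list[list[str]]:
    """Single-linkage grouping: carriers are linked if haplotypes differ at <= max_diff sites."""
    parent = {name: name for name in haps}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(haps, 2):
        if hamming(haps[a], haps[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in haps:
        groups.setdefault(find(name), []).append(name)
    # Largest groups first; ties broken by member names.
    return sorted(groups.values(), key=lambda g: (-len(g), g))
```

Under this kind of analysis, a deletion inherited from a single ancestor would place most carriers in one group, whereas several small groups are what one would expect from recurrent events.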

We also examined whether the 497 kb deletion affects mRNA transcription. RNA-seq data were available for four deletion carriers: one Finnish in Finland (FIN), one Yoruba in Ibadan, Nigeria (YRI) and two Toscani in Italia (TSI) subjects. We compared the average F8 mRNA level of the four deletion carriers with that of non-carrier subjects of the same three ethnicities (TSI, FIN and YRI; 145 individuals in total) and observed no significant difference (P = 0.15) (Fig. S5A). Because the deletion also encompasses several genes upstream of F8, we further compared the average mRNA levels of all genes on the X chromosome and again found no significant difference between carriers and non-carriers (P = 0.48) (Fig. S5B).
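The text does not state which test produced P = 0.15 and P = 0.48. As one assumption-light option for a very small carrier group (four carriers versus 145 non-carriers), a two-sided permutation test on the difference in mean expression is sketched below; it is an illustration, not the authors' analysis, and the function name is hypothetical.

```python
from __future__ import annotations

import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_p_value(carriers: Sequence[float], non_carriers: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in mean expression between groups."""
    rng = random.Random(seed)
    observed = abs(_mean(carriers) - _mean(non_carriers))
    pooled = list(carriers) + list(non_carriers)
    k = len(carriers)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of carriers vs non-carriers
        diff = abs(_mean(pooled[:k]) - _mean(pooled[k:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

Because the null distribution is built from the data themselves, this approach does not require the expression values of four carriers to be normally distributed.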

Discussion

In this study, we observed 18 F8 mutations previously associated with HA in the 1000G subjects (Tables 3 and S7). The presence of these ‘F8 mutations’ in the 1000G subjects, especially in male subjects, raises the question of whether these variants alter the cellular and biochemical properties of FVIII in a manner consistent with the HA phenotype. Possible explanations include the following. (i) The variants are pathogenic only on ethnicity-specific haplotypes or in the presence of additional environmental factors. This possibility is supported not only by this study but also by other studies. For instance, cystic fibrosis (CF) is an autosomal recessive disorder of epithelial ion transport caused by mutations in the CF transmembrane conductance regulator gene (CFTR). Although CF has been observed in all races, it predominantly affects people of northern European ancestry; for North Americans of northern European ancestry without a family history of CF, the empirical risk of being a carrier is approximately 1 in 29 [27]. Similarly, Usher syndrome type 2, a recessive disorder characterized by combined deafness and blindness, is caused by deleterious variants in the USH2A gene, and an USH2A variant previously reported as causal (c.2138G>C; p.G713R) had a high frequency in the YRI population [28]. In addition, African-American HA patients have a higher incidence of neutralizing anti-FVIII antibodies (‘inhibitors’) than Caucasian patients; Devi Gunasekera et al. recently reported that African-American subjects with an F8 intron 22 inversion had a ~3-fold higher incidence of inhibitors than Caucasian subjects with the same mutation [29]. (ii) The clinical phenotype of some variants is mild or even within the range of normal health variation, so they have low penetrance and may require additional causal variants in trans or in cis to alter the phenotype. For example, two carriers of the p.V2242M variant also had a splice site mutation in intron 24 ([30] and http://www.factorviii-db.org). Among three patients with severe HA carrying the p.R1126W variant, one also had the recurrent intron 1 inversion (IVS1) and the other two carried the nonsense variant p.Arg1966* [31]. (iii) Alternatively, these variants may not be pathogenic at all, especially those found in male subjects in the hemizygous state.

The 497 kb deletion is likely to have been caused by non-allelic homologous recombination between the int22h-1 and int22h-2 segmental duplications. Three other SVs (one duplication and two inversions) have been reported in the same region (Fig. 3A), all arising through ectopic homologous recombination among three well-defined segmental duplications: int22h-1, int22h-2 and int22h-3 (Fig. S6) [6,26]. Because both breakpoints of the 497 kb deletion lie within segmental duplications that are in the same orientation (Fig. 3A), the deletion was likely generated by inter- or intra-chromosomal ectopic recombination (Fig. S6). El-Hattab et al. reported not only the duplication but also the reciprocal deletion, in a girl and her mother; the mother carrying the deletion had two spontaneous abortions [6]. In the homozygous or hemizygous state, the deletion also removes several genes other than F8, five of which (MTCP1, BRCC3, VBP1, RAB39B and CLIC2) are well characterized in Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov/omim/). It is therefore possible that the deletion segregates in families and is potentially lethal in male fetuses, which may explain why only female carriers were found in our dataset and in previous studies. A possible mechanism for the lack of clinical symptoms in female carriers is preferential inactivation of the X chromosome harboring the F8-inclusive deletion [32]. This hypothesis is consistent with our RNA-seq analysis, which showed no significant difference in F8 mRNA levels between subjects with and without the deletion. In addition, El-Hattab et al. reported that both carriers of the reciprocal deletion, the girl and her mother, exhibited completely skewed X-chromosome inactivation (XCI) and no clinical signs of HA or other disease [6].
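The orientation argument above follows the general rule for non-allelic homologous recombination (NAHR): recombination between directly oriented repeat copies deletes or duplicates the intervening segment, whereas recombination within a chromatid between inversely oriented copies inverts it. The minimal sketch below encodes that rule; the function name is hypothetical, and exchanges between inverted copies on different chromatids are deliberately not modeled.

```python
from __future__ import annotations


def nahr_products(direct_orientation: bool, intrachromatid: bool) -> tuple[str, ...]:
    """Expected outcome of NAHR between two highly similar repeat copies.

    direct_orientation: True if the two copies point the same way (direct repeats).
    intrachromatid: True if the copies recombine within one chromatid (looping out),
        False if the exchange is between sister chromatids or homologous chromosomes.
    """
    if direct_orientation:
        # Looping out within a chromatid removes the segment between the copies;
        # an unequal exchange between chromatids/homologs gives a deletion and a duplication.
        return ("deletion",) if intrachromatid else ("deletion", "duplication")
    if intrachromatid:
        # Pairing of inverted copies within a chromatid flips the intervening segment.
        return ("inversion",)
    raise ValueError("exchanges between inverted copies on different chromatids are not modeled here")
```

Treating int22h-1 and int22h-2 as directly oriented, as described above, `nahr_products(True, False)` returns `("deletion", "duplication")`, matching the reciprocal deletion and duplication reported by El-Hattab et al. [6].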

Our study provides a baseline for identifying additional pathogenic F8 mutations that can be used in genetic testing, and may also help in understanding the development of neutralizing anti-FVIII antibodies in HA patients who receive FVIII. In addition, the characterization of heterozygous causative HA variants in female subjects can contribute to genetic counseling and prenatal diagnosis [33].

Supplementary Material


Acknowledgments

We thank C. R. Beck and J. R. Bartanus for comments on earlier versions of this manuscript, M. N. Bainbridge for sharing the Cassandra software, and the participants of the 1000 Genomes Project for their work and contributions.

Footnotes

Addendum

J. N. Li and I. G. Carrero designed the study, analyzed the data and wrote the manuscript. F. L. Yu and J. F. Dong developed the hypothesis, designed the study and wrote the manuscript.

Disclosure of Conflict of Interests

The authors state that they have no conflict of interest.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Table S1. Number of samples from 26 ethnic groups from five continents available from the 1000G project

Table S2. Variant density of different domains of FVIII

Table S3. Chromosomal location, predicted impact and frequency of F8 missense variants for subjects from five continents. Blank space: variants were not found in this population

Table S4. Summary of evolutionarily constrained variants with GERP++ RS score > 2.0

Table S5. Chromosomal locations and RS scores of the 110 variants

Table S6. FST of non-synonymous variants among the five continents, split by gender

Table S7. Description and sources of F8 variants previously reported in hemophilia A case studies

Table S8. The ethnicities of the 497 kb large deletion carriers

Fig. S1. The number of F8 variants for the female X chromosome compared with the male X chromosome.

Fig. S2. F8 LD plots for subjects from five continents.

Fig. S3. The F8 transcription was not affected by p.M2257V.

Fig. S4. The p.A362= did not affect F8 mRNA level.

Fig. S5. F8 transcription levels in subjects with and without the large deletion.

Fig. S6. Previously proposed mechanism causing genomic structural variation between int22h copies [1].

References

1. Graw J, Brackmann H-H, Oldenburg J, Schneppenheim R, Spannagl M, Schwaab R. Haemophilia A: from mutation analysis to new therapies. Nat Rev Genet. 2005;6:488–501. doi: 10.1038/nrg1617.
2. Toole JJ, Knopf JL, Wozney JM, Sultzman LA, Buecker JL, Pittman DD, Kaufman RJ, Brown E, Shoemaker C, Orr EC. Molecular cloning of a cDNA encoding human antihaemophilic factor. Nature. 1984;312:342–347. doi: 10.1038/312342a0.
3. Everett LA, Cleuren ACA, Khoriaty RN, Ginsburg D. Murine coagulation factor VIII is synthesized in endothelial cells. Blood. 2014;123:3697–3705. doi: 10.1182/blood-2014-02-554501.
4. Fahs SA, Hille MT, Shi Q, Weiler H, Montgomery RR. A conditional knockout mouse model reveals endothelial cells as the principal and possibly exclusive source of plasma factor VIII. Blood. 2014;123:3706–3713. doi: 10.1182/blood-2014-02-555151.
5. Andersen EF, Baldwin EE, Ellingwood S, Smith R, Lamb AN. Xq28 duplication overlapping the int22h-1/int22h-2 region and including RAB39B and CLIC2 in a family with intellectual and developmental disability. Am J Med Genet A. 2014;164A:1795–1801. doi: 10.1002/ajmg.a.36524.
6. El-Hattab AW, Fang P, Jin W, Hughes JR, Gibson JB, Patel GS, Grange DK, Manwaring LP, Patel A, Stankiewicz P, Cheung SW. Int22h-1/int22h-2-mediated Xq28 rearrangements: intellectual disability associated with duplications and in utero male lethality with deletions. J Med Genet. 2011;48:840–850. doi: 10.1136/jmedgenet-2011-100125.
7. Lappalainen T, Sammeth M, Friedländer MR, ‘t Hoen PAC, Monlong J, Rivas MA, Gonzàlez-Porta M, Kurbatova N, Griebel T, Ferreira PG, Barann M, Wieland T, Greger L, van Iterson M, Almlöf J, Ribeca P, Pulyakhina I, Esser D, Giger T, Tikhonov A. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
8. Bainbridge MN, Hu H, Muzny DM, Musante L, Lupski JR, Graham BH, Chen W, Gripp KW, Jenny K, Wienker TF, Yang Y, Sutton VR, Gibbs RA, Ropers HH. De novo truncating mutations in ASXL3 are associated with a novel clinical phenotype with similarities to Bohring-Opitz syndrome. Genome Med. 2013;5:11. doi: 10.1186/gm415.
9. Goodeve AC, Reitsma PH, McVey JH. Nomenclature of genetic variants in hemostasis. J Thromb Haemost. 2011;9:852–855. doi: 10.1111/j.1538-7836.2011.04191.x.
10. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509.
11. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
12. Flanagan SE, Patch A-M, Ellard S. Using SIFT and PolyPhen to predict loss-of-function and gain-of-function mutations. Genet Test Mol Biomarkers. 2010;14:533–537. doi: 10.1089/gtmb.2010.0036.
13. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010;6:e1001025. doi: 10.1371/journal.pcbi.1001025.
14. Cockerham CC, Weir BS. Covariances of relatives stemming from a population undergoing mixed self and random mating. Biometrics. 1984;40:157–164.
15. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
16. Pipe SW. Functional roles of the factor VIII B domain. Haemophilia. 2009;15:1187–1196. doi: 10.1111/j.1365-2516.2009.02026.x.
17. Li H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics. 2014;30:2843–2851. doi: 10.1093/bioinformatics/btu356.
18. MacArthur DG, Manolio TA, Dimmock DP, Rehm HL, Shendure J, Abecasis GR, Adams DR, Altman RB, Antonarakis SE, Ashley EA, Barrett JC, Biesecker LG, Conrad DF, Cooper GM, Cox NJ, Daly MJ, Gerstein MB, Goldstein DB, Hirschhorn JN, Leal SM. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014;508:469–476. doi: 10.1038/nature13127.
19. Bhatia G, Patterson N, Sankararaman S, Price AL. Estimating and interpreting FST: the impact of rare variants. Genome Res. 2013;23:1514–1521. doi: 10.1101/gr.154831.113.
20. Schaffner SF. The X chromosome in population genetics. Nat Rev Genet. 2004;5:43–51. doi: 10.1038/nrg1247.
21. Repessé Y, Slaoui M, Ferrandiz D, Gautier P, Costa C, Costa JM, Lavergne JM, Borel-Derlon A. Factor VIII (FVIII) gene mutations in 120 patients with hemophilia A: detection of 26 novel mutations and correlation with FVIII inhibitor development. J Thromb Haemost. 2007;5:1469–1476. doi: 10.1111/j.1538-7836.2007.02591.x.
22. Tarpey PS, Smith R, Pleasance E, Whibley A, Edkins S, Hardy C, O’Meara S, Latimer C, Dicks E, Menzies A, Stephens P, Blow M, Greenman C, Xue Y, Tyler-Smith C, Thompson D, Gray K, Andrews J, Barthorpe S, Buck G. A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat Genet. 2009;41:535–543. doi: 10.1038/ng.367.
23. Viel KR, Machiah DK, Warren DM, Khachidze M, Buil A, Fernstrom K, Souto JC, Peralta JM, Smith T, Blangero J, Porter S, Warren ST, Fontcuberta J, Soria JM, Flanders WD, Almasy L, Howard TE. A sequence variation scan of the coagulation factor VIII (FVIII) structural gene and associations with plasma FVIII activity levels. Blood. 2007;109:3713–3724. doi: 10.1182/blood-2006-06-026104.
24. Arruda VR, Pieneman WC, Reitsma PH, Deutz-Terlouw PP, Annichino-Bizzacchi JM, Briët E, Costa FF. Eleven novel mutations in the factor VIII gene from Brazilian hemophilia A patients. Blood. 1995;86:3015–3020.
25. Williams IJ, Abuzenadah A, Winship PR, Preston FE, Dolan G, Wright J, Peake IR, Goodeve AC. Precise carrier diagnosis in families with haemophilia A: use of conformation sensitive gel electrophoresis for mutation screening and polymorphism analysis. Thromb Haemost. 1998;79:723–726.
26. Bagnall RD, Giannelli F, Green PM. Int22h-related inversions causing hemophilia A: a novel insight into their origin and a new more discriminant PCR test for their detection. J Thromb Haemost. 2006;4:591–598. doi: 10.1111/j.1538-7836.2006.01840.x.
27. Ferec C, Cutting GR. Assessing the disease-liability of mutations in CFTR. Cold Spring Harb Perspect Med. 2012;2:a009480. doi: 10.1101/cshperspect.a009480.
28. Xue Y, Chen Y, Ayub Q, Huang N, Ball EV, Mort M, Phillips AD, Shaw K, Stenson PD, Cooper DN, Tyler-Smith C. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am J Hum Genet. 2012;91:1022–1032. doi: 10.1016/j.ajhg.2012.10.015.
29. Gunasekera D, Ettinger RA, Nakaya Fletcher S, James EA, Liu M, Barrett JC, Withycombe J, Matthews DC, Epstein MS, Hughes RJ, Pratt KP. Factor VIII gene variants and inhibitor risk in African American hemophilia A patients. Blood. 2015;126:895–904. doi: 10.1182/blood-2014-09-599365.
30. Lin S-Y, Su Y-N, Hung C-C, Tsay W, Chiou S-S, Chang C-T, Ho H-N, Lee C-N. Mutation spectrum of 122 hemophilia A families from Taiwanese population by LD-PCR, DHPLC, multiplex PCR and evaluating the clinical application of HRM. BMC Med Genet. 2008;9:53. doi: 10.1186/1471-2350-9-53.
31. Jayandharan G, Shaji RV, Baidya S, Nair SC, Chandy M, Srivastava A. Identification of factor VIII gene mutations in 101 patients with haemophilia A: mutation analysis by inversion screening and multiplex PCR and CSGE and molecular modelling of 10 novel missense substitutions. Haemophilia. 2005;11:481–491. doi: 10.1111/j.1365-2516.2005.01121.x.
32. Pegoraro E, Whitaker J, Mowery-Rushton P, Surti U, Lanasa M, Hoffman EP. Familial skewed X inactivation: a molecular trait associated with high spontaneous-abortion rate maps to Xq28. Am J Hum Genet. 1997;61:160–170. doi: 10.1086/513901.
33. De Brasi C, El-Maarri O, Perry DJ, Oldenburg J, Pezeshkpoor B, Goodeve A. Genetic testing in bleeding disorders. Haemophilia. 2014;20:54–58. doi: 10.1111/hae.12409.
