Abstract
Introduction
Esophageal atresia with or without tracheoesophageal fistula (EA/TEF) occurs in approximately 1 in 3,500 live births and represents the most common malformation of the upper digestive tract. Only half a century ago, EA/TEF was fatal among affected newborns, suggesting that the steady birth prevalence might in part be due to mutational de novo events in genes involved in foregut development.
Methods
To identify mutational de novo events in EA/TEF patients, we surveyed the exomes of 30 case-parent trios. Identified and confirmed de novo variants were prioritized using in silico prediction tools. To investigate the embryonic role of genes harboring prioritized de novo variants, we performed targeted analysis of mouse transcriptome data of esophageal tissue obtained at embryonic days (E) E8.5 and E12.5 and postnatally.
Results
In total, we prioritized 14 novel de novo variants in 14 different genes (APOL2, EEF1D, CHD7, FANCB, GGT6, KIAA0556, NFX1, NPR2, PIGC, SLC5A2, TANC2, TRPS1, UBA3, and ZFHX3) and eight rare de novo variants in eight additional genes (CELSR1, CLP1, GPR133, HPS3, MTA3, PLEC, STAB1, and PPIP5K2). Through personal communication during the project, we identified an additional EA/TEF case-parent trio with a rare de novo variant in ZFHX3. In silico prediction analysis of the identified variants and comparative analysis of mouse transcriptome data of esophageal tissue obtained at E8.5, E12.5, and postnatally prioritized CHD7, TRPS1, and ZFHX3 as EA/TEF candidate genes. Re-sequencing of ZFHX3 in 192 additional EA/TEF patients did not identify further putative EA/TEF-associated variants.
Conclusion
Our study suggests that rare mutational de novo events in genes involved in foregut development contribute to the development of EA/TEF.
Introduction
Esophageal atresia with or without tracheoesophageal fistula (EA/TEF) occurs in approximately 1 in 3,000 to 3,500 live births and represents the most common malformation of the upper digestive tract [1; 2; 3; 4]. According to the European network of population-based registries for the epidemiological surveillance of congenital anomalies (EUROCAT; https://eu-rd-platform.jrc.ec.europa.eu/eurocat), EA/TEF accounts for 1% of all birth defects in Europe every year. With 5.075 million babies born in the EU in 2017, an estimated 1,237 babies were born with EA/TEF that year.
In about 40–50% of cases, EA/TEF occurs together with additional anomalies, mostly belonging to the VATER/VACTERL association (OMIM #192350) spectrum. This acronym refers to the rare, nonrandom co-occurrence of the following component features (CFs): vertebral defects (V), anorectal malformations (A), cardiac defects (C), tracheoesophageal fistula with or without esophageal atresia (TE), renal malformations (R), and limb defects (L) [5]. Only half a century ago, EA/TEF was fatal among affected newborns, suggesting that the steady birth prevalence might in part be due to mutational de novo events in genes involved in foregut development. Support for this hypothesis comes from early reports of chromosomal de novo aberrations in 6–10% of syndromic EA/TEF cases [6]. Furthermore, using copy number variation (CNV) analysis in 375 EA/TEF patients, we identified eight rare CNVs in six patients, all of which occurred de novo, including one CNV previously associated with EA/TEF [7]. Hence, 1.55% of patients with isolated EA/TEF and 1.62% of patients with additional congenital anomalies carried de novo CNVs. Moreover, several monogenic EA/TEF-associated syndromes are caused by smaller de novo changes such as single-nucleotide variants or small indels, e.g., in MYCN (N-MYC) in Feingold syndrome (OMIM #164280), GLI3 in Pallister-Hall syndrome (OMIM #146510), CHD7 in CHARGE syndrome (OMIM #214800), and SOX2 in AEG syndrome (OMIM #206900) [8; 9; 10; 11].
To further explore the involvement of small genetic de novo events in the etiology of EA/TEF, we profiled 30 case-parent trios using exome sequencing (ES). Prior to ES, chromosomal microarray analysis had been negative in all cases [7; 12]. All confirmed de novo variants were prioritized using in silico prediction tools. To investigate the embryonic role of genes harboring prioritized de novo variants, we performed targeted analysis of mouse transcriptome data of esophageal tissue obtained at embryonic days (E) E8.5 and E12.5 and postnatally.
Materials and methods
Patients and DNA isolation
In 2011, the authors JS and HR founded the scientific network “great” (genetic risk for esophageal atresia; www.great-konsortium.de) in order to initiate a nationwide investigation into the genetic causes of EA/TEF. Prior to the commencement of recruitment, the network partners generated a unique standardized case report form (CRF). The CRF comprises an epidemiological questionnaire and a clinical assessment battery. The epidemiological questionnaire is based on: (i) the National Birth Defect Prevention Study questionnaire of the U.S. Centers for Disease Control and Prevention (www.nbdpn.org); and (ii) the questionnaire of the European Surveillance of Congenital Malformations (EUROCAT) network (www.eurocat-network.eu). The clinical assessment battery comprises the classification system of the EA/TEF phenotype according to Gross (1953) and the ICD10 coding with the British Pediatric Association one-digit extension (www.eurocat-network.eu/content/EUROCAT-Guide-1.3.pdf) for classification of additional congenital anomalies. The great cohort is being recruited with the support of pediatric surgical departments across Germany and the German self-help organization for patients and families with EA/TEF (KEKS e.V.; www.keks.org). KEKS e.V. is the largest self-help organization for EA/TEF families in Europe and supports both the ongoing great investigations and the present study.
The study described here fulfilled the requirements of the Declaration of Helsinki, and ethical approval was obtained from the local ethics committee of the Medical Faculty of Bonn (Lfd. Nr. 073/12). Every participating family provided written informed consent. The 30 case-parent trios reported here, as well as the EA/TEF cohort used for re-sequencing of ZFHX3, were recruited through the efforts of the scientific network “great”. In 14 of the 30 case-parent trios, EA/TEF occurred as an isolated (nonsyndromic) anomaly. In the remaining 16 case-parent trios, EA/TEF co-occurred with additional phenotypic features (syndromic cases), mostly belonging to the VATER/VACTERL spectrum (S1 Table). EDTA blood samples were obtained from each case-parent trio. Genomic DNA was isolated using the Chemagic DNA Blood Kit special (Chemagen, Baesweiler, Germany). Through personal communication, we identified another patient with EA/TEF as part of VATER/VACTERL association (patient 750_501).
Exome Sequencing (ES) and data analysis
Exome capture was performed using the NimbleGen SeqCap EZ Human Exome Library v2.0 enrichment kit, followed by Illumina 2×100 bp paired-end sequencing (protocol v1.2). Primary data were filtered according to signal purity by the Illumina Real-Time Analysis (RTA) software v1.8. Subsequently, reads were mapped to the human reference genome build hg19 using the bwa-aln [13] alignment algorithm. GATK v1.6 [14] was used to mark duplicate reads, to perform local realignment around short insertions and deletions, to recalibrate base quality scores, and to call SNVs (incorporating variant quality score recalibration) and short indels [15]. Scripts developed in-house at the Cologne Center for Genomics (unpublished) were used to incorporate allele frequencies reported by the ESP6500 database [Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/)] and to detect changes in protein structure. Acceptor and donor splice site mutations were analyzed with a maximum entropy model [16]. De novo variant calling was performed with DeNovoGear (v0.5.1) [17]. The Varbank GUI (unpublished, https://varbank.ccg.uni-koeln.de) was used to filter for high-quality (coverage >15; quality >25), rare (MAF <0.005), de novo (posterior probability of a de novo mutation, PP_DNM >0.5) variants predicted to alter protein structure or splicing. We also filtered against an in-house database containing all variants from 511 exomes of epilepsy patients to exclude pipeline-related artefacts (MAF <0.004). Variants with MAF <0.004 that have been described in the homozygous state in gnomAD were also excluded. Finally, we excluded all variants with MAF ≥0.0003, since the EA/TEF birth prevalence has been reported to be 1 in 3,500 live births (a frequency of ≈0.0003). Hence, fully penetrant monoallelic variants with MAF ≥0.0003 cannot account for the occurrence of EA/TEF.
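The filtering cascade described above can be summarized as a single predicate. The sketch below is illustrative only and is not the Varbank implementation: the `Variant` record and its field names are hypothetical, we assume a variant absent from the population database is encoded with a MAF of 0.0, and we simplify the gnomAD homozygote criterion to "no homozygotes reported". The final cutoff follows from the stated birth prevalence, 1/3,500 ≈ 0.000286, rounded to 0.0003 as in the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the Methods; the record layout is hypothetical.
MIN_COVERAGE = 15            # keep coverage > 15
MIN_QUALITY = 25             # keep quality > 25
MAX_RARE_MAF = 0.005         # keep rare variants, MAF < 0.005
MIN_PP_DNM = 0.5             # DeNovoGear posterior probability of a de novo mutation
MAX_INHOUSE_MAF = 0.004      # in-house artefact filter (511 epilepsy exomes)
BIRTH_PREVALENCE = 1 / 3500  # EA/TEF birth prevalence (~0.000286)
MAX_FINAL_MAF = round(BIRTH_PREVALENCE, 4)  # 0.0003, rounded as in the text


@dataclass
class Variant:
    gene: str
    coverage: int
    quality: float
    population_maf: float       # assumed 0.0 if absent from the database
    pp_dnm: float
    inhouse_maf: float
    gnomad_homozygotes: int
    alters_protein_or_splicing: bool


def passes_filters(v: Variant) -> bool:
    """Return True if a candidate survives every filtering step in order."""
    return (
        v.coverage > MIN_COVERAGE
        and v.quality > MIN_QUALITY
        and v.population_maf < MAX_RARE_MAF
        and v.pp_dnm > MIN_PP_DNM
        and v.alters_protein_or_splicing
        and v.inhouse_maf < MAX_INHOUSE_MAF
        and v.gnomad_homozygotes == 0
        and v.population_maf < MAX_FINAL_MAF
    )
```

For example, a variant with a gnomAD MAF of 0.002 (as reported for the TPP2 variant in the Results) passes the "rare" cutoff of 0.005 but is removed by the final prevalence-based cutoff, whatever its (here hypothetical) quality values.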
Variant validation and classification
Variants identified by ES were validated by polymerase chain reaction (PCR) followed by Sanger sequencing using standard procedures. In brief, primers were designed for all observed variants, and the resulting PCR products were subjected to direct automated BigDye Terminator sequencing (3130XL Genetic Analyzer, Applied Biosystems, Foster City, California, USA). Both strands of each amplicon were sequenced to test for the presence of these variants in the respective case-parent trio. To further prioritize the identified and confirmed de novo variants, we analyzed them using ten in silico prediction tools included in dbNSFP v3.0 (https://sites.google.com/site/jpopgen/dbNSFP): SIFT, LRT, MutationTaster, Mutation Assessor, FATHMM, PROVEAN, MetaSVM, MetaLR, fathmm-MKL coding, and CADD [18; 19] (details about these prediction tools are given in S1 Data).
Re-sequencing of ZFHX3 in EA/TEF patients
All three human ZFHX3 protein-coding transcripts (ENST00000641206.2, ENST00000268489.10, and ENST00000397992.5) listed in the Ensembl database (www.ensembl.org; Ensembl Release 98, September 2019) were sequenced in 192 unrelated EA/TEF patients. PCR-amplified DNA products (primer sequences available upon request) were sequenced on a 3130XL Genetic Analyzer (Applied Biosystems, Foster City, USA).
Structural modeling and in-silico analysis of ZFHX3 protein variants
Secondary structure prediction of the human ZFHX3 protein sequence was performed using PSIPRED in I-TASSER. Three-dimensional structural models of ZFHX3 were built using SWISS-MODEL (https://swissmodel.expasy.org/). Because SWISS-MODEL cannot handle very large protein sequences, for the de novo changes p.Pro534Arg and p.Ala2126Val we trimmed the sequence to 60 amino acids upstream and 20 amino acids downstream of the mutated site. The trimmed sequences were then modeled with SWISS-MODEL. Structural comparison between the wild-type and mutant models was performed in Chimera after superimposing the mutant structure onto the wild-type structure with SuperPose using default parameters (superpose.wishartlab.com).
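The trimming step can be sketched as a small helper that cuts a window of 60 residues upstream and 20 residues downstream around a 1-based residue position and applies the substitution. The function names are hypothetical, we assume the window is simply clipped at the protein termini, and the test sequence is a placeholder, not the real ZFHX3 sequence; this does not call SWISS-MODEL itself.

```python
def trim_window(sequence: str, position: int, upstream: int = 60, downstream: int = 20):
    """Return (window, index_of_mutated_residue) for a 1-based residue position.

    The window spans `upstream` residues before and `downstream` residues after
    the mutated residue, clipped at the protein termini.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence")
    start = max(0, position - 1 - upstream)            # 0-based, inclusive
    end = min(len(sequence), position + downstream)    # 0-based, exclusive
    return sequence[start:end], position - 1 - start


def mutate(sequence: str, position: int, ref: str, alt: str) -> str:
    """Apply a single amino-acid substitution (1-based), checking the reference."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at {position}, found {sequence[position - 1]}")
    return sequence[: position - 1] + alt + sequence[position:]
```

Applied to a substitution such as p.Pro534Arg, the wild-type window and the mutated window would have the same length (81 residues away from the termini), so both models cover exactly the same stretch of protein.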
RNA isolation and mRNA library preparation of mouse embryonic esophageal tissue
All animals used in this study were anesthetized with isoflurane and killed by cervical dislocation. The animals used in this study are documented, and their usage was reported to the local authorities (Regierungspräsidium Darmstadt). Embryos and neonates from pregnant females of the C57BL/6J strain were harvested at embryonic days (E) E8.5 and E12.5 and postnatally. The E8.5 embryos were determined to be at Theiler stage 13 (TS13) and the E12.5 embryos at TS21. From E8.5 embryos, the pharyngeal pouch containing endoderm and adjacent mesoderm tissue was surgically isolated and transferred into QIAzol®. Multiple embryos were pooled for each embryonic time point: for E8.5 we pooled biopsies from five embryos to prepare the RNA, and for E12.5 and neonates we pooled two each. From E12.5 embryos and postnatal animals, the esophagus and trachea were surgically isolated, combined, and transferred into QIAzol®. RNA was isolated from these tissues with the RNeasy Plus Mini Kit (Qiagen) according to the manufacturer’s protocol. The transcriptome profile was assessed by RNA sequencing with the 3’ mRNA-Seq Library Preparation Kit from Lexogen (Lexogen, Vienna, Austria). This protocol generates only one single-end, strand-specific fragment per transcript for sequencing at the 3’ end of poly(A) RNA. Libraries were quality checked on a TapeStation 2200 (Agilent, Santa Clara, USA). Sequencing was performed on a HiSeq 2500 (Illumina, San Diego, USA) with two technical replicates per sample.
Transcriptome analysis
After demultiplexing with bcl2fastq (Illumina, San Diego, USA), FastQC v0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/) was used for quality control of the FASTQ files. Read alignment was performed using STAR 2.6.1d [20] against the primary assembly of the murine reference genome build GRCm38 according to the manufacturer’s analysis protocol. Read counting was also performed with STAR (“--quantMode GeneCounts”) using the Ensembl gene annotation (Release 97). Quality metrics were gathered with MultiQC [21].
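With `--quantMode GeneCounts`, STAR writes a `ReadsPerGene.out.tab` file per sample: four leading summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) followed by one row per gene with unstranded, sense-strand and antisense-strand counts. The sketch below parses such a file and sums technical replicates gene-wise. Two points are assumptions, not stated in the text: that the sense-strand column fits a forward-stranded 3' library, and that "combining" technical replicates means summing counts (as DESeq2's `collapseReplicates` does).

```python
from collections import Counter

# Column indices in STAR's ReadsPerGene.out.tab (--quantMode GeneCounts):
# 0 = gene ID, 1 = unstranded, 2 = read strand matches RNA, 3 = reverse strand.
UNSTRANDED, SENSE, ANTISENSE = 1, 2, 3


def read_gene_counts(lines, column: int = SENSE) -> dict:
    """Parse the lines of one ReadsPerGene.out.tab into {gene_id: count}.

    The four leading summary rows (N_unmapped, N_multimapping, N_noFeature,
    N_ambiguous) are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def sum_technical_replicates(*replicates: dict) -> dict:
    """Sum counts gene-wise across technical replicates of one sample."""
    total = Counter()
    for rep in replicates:
        total.update(rep)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

An open file handle can be passed directly as `lines`, since iterating over it yields one line at a time.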
Statistical analyses were performed with the programming language R (R Core Team, 2019) and the DESeq2 R package [22]. Technical replicates were combined, and differential gene expression between embryonic time points was assessed with DESeq2’s Wald test as described in Love et al. [23]. We required an alpha level of 0.01 and a minimum absolute log2 fold change of log2(1.5) (≈0.585). Cumulative expression distributions were calculated from rlog-normalized expression values for each time point separately. We identified the mouse homologs of our human genes of interest using the biomaRt R package [24].
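The two cutoffs can be expressed as a post-hoc filter on one row of DESeq2 results (adjusted p-value and log2 fold change). This is a simplified sketch under an assumption: DESeq2 can alternatively test against a fold-change threshold directly through the `lfcThreshold` argument of `results()`, and the text does not state which of the two approaches was used.

```python
import math

ALPHA = 0.01
MIN_ABS_LOG2FC = math.log2(1.5)  # ~0.585, i.e. at least a 1.5-fold change


def is_differentially_expressed(log2_fold_change: float, padj: float) -> bool:
    """Apply the alpha and minimum fold-change cutoffs to one DESeq2 result row."""
    return padj < ALPHA and abs(log2_fold_change) >= MIN_ABS_LOG2FC
```

With the values reported in the Results, for example, Chd7 (log2FC 2.243, adjusted p 1.85E-29) and Npr2 (log2FC -2.268, adjusted p 5.18E-04) both pass, because the fold-change criterion is applied to the absolute value.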
Results
ES analysis
ES analysis identified 25 apparent de novo variants in 25 genes in 18 unrelated case-parent trios. Sanger sequencing validated all of these variants and confirmed 23 of them as de novo in the patients. Fourteen of these variants were novel according to the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org/; November 2019). In addition, eight of the confirmed de novo variants were rare, with a minor allele frequency (MAF) between 0.000003 and 0.00006 (Table 1). One confirmed de novo variant in TPP2 (c.1534G>A, p.Val512Ile, NM_003291.2, rs73578896) had been previously reported in gnomAD with a MAF of 0.002 (303/275,928) and was therefore filtered out. Through personal communication during the project (Dr. Julia Höfele, Institute of Human Genetics, Klinikum Rechts der Isar, Technical University of Munich, School of Medicine, Munich, Germany), we identified an additional EA/TEF case-parent trio (750_501) in which the patient carries a rare de novo variant in ZFHX3 (c.6377C>T, p.Ala2126Val, allele frequency 0.000019) (marked with an asterisk in Table 1).
Table 1. Prioritized de novo variants.
Ext-Code | Phenotype | Variant no. | Gene (HGNC) | RefSeq | gnomAD (MAF) | cDNA change | Protein change | Mm | Gg | Dr | Xt |
---|---|---|---|---|---|---|---|---|---|---|---|
4_501 | VATER/VACTERL-like association | 1 | EEF1D | NM_032378.4 | n.f.i. | c.874C>T | p.Arg292* | K | E | K | |
4_501 | VATER/VACTERL-like association | 2 | CELSR1 | NM_014246.1 | 3/282,594 (0.00001) | c.4357G>A | p.Val1453Ile | V | I | I | |
21_501 | nonsyndromic | 3 | HPS3 | NM_032383.3 | 10/282,776 (0.00004) | c.1189C>T | p.Arg397Trp | H | R | R | R |
27_501 | nonsyndromic | 4 | PIGC | NM_153747.1 | n.f.i. | c.716C>T | p.Ala239Val | A | G | A | |
35_501 | VATER/VACTERL-like association | 5 | NFX1 | NM_002504.4 | n.f.i. | c.1723G>A | p.Val575Met | V | V | | |
36_501 | nonsyndromic | 6 | ZFHX3 | NM_006885.3 | n.f.i. | c.1601C>G | p.Pro534Arg | P | P | P | N |
41_501 | VATER/VACTERL-like association | 7 | MTA3 | NM_020744.2 | 1/237,600 (0.000004) | c.393C>A | p.Phe131Leu | F | F | | |
46_501 | nonsyndromic | 8 | FANCB | NM_152633.2 | n.f.i. | c.782G>A | p.Arg261Gln | R | Q | S | |
46_501 | nonsyndromic | 9 | PLEC | NM_201379.1 | 17/272,690 (0.00006) | c.6704G>A | p.Arg2394His | R | R | K | R |
63_501 | VATER/VACTERL-like association | 10 | PPIP5K2 | NM_015216.2 | 2/247,732 (0.000008) | c.686G>A | p.Arg229Gln | R | R | R | R |
88_501 | nonsyndromic | 11 | CLP1 | NM_006831.2 | 1/251,486 (0.000003) | c.814C>A | p.His272Asn | H | H | H | H |
88_501 | nonsyndromic | 12 | GPR133 | NM_198827.3 | 6/282,534 (0.00002) | c.1033G>A | p.Ala345Thr | A | | | |
88_501 | nonsyndromic | 13 | SLC5A2 | NM_003041.3 | n.f.i. | c.644T>C | p.Leu215Pro | L | L | L | |
90_501 | VATER/VACTERL-like association | 14 | KIAA0556 | NM_015202.2 | n.f.i. | c.3730C>T | p.His1244Tyr | H | H | H | H |
141_501 | VATER/VACTERL-like association | 15 | STAB1 | NM_015136.2 | 9/278,948 (0.00003) | c.6145C>T | p.Arg2049Cys | R | S | | |
154_501 | VATER/VACTERL association | 16 | GGT6 | NM_153338.2 | n.f.i. | c.1045A>G | p.Ser349Gly | S | | | |
167_501 | nonsyndromic | 17 | CHD7 | NM_017780.3 | n.f.i. | c.4187C>G | p.Ala1396Gly | A | A | A | |
172_501 | VATER/VACTERL-like association | 18 | NPR2 | NM_003995.3 | n.f.i. | c.952C>G | p.Arg318Gly | R | K | T | |
174_501 | nonsyndromic | 19 | UBA3 | NM_198195.1 | n.f.i. | c.1088C>T | p.Ser363Phe | S | S | T | P |
181_501 | nonsyndromic | 20 | TANC2 | NM_025185.3 | n.f.i. | c.2357C>T | p.Pro786Leu | P | P | P | |
288_501 | VATER/VACTERL association | 21 | TRPS1 | NM_014112.2 | n.f.i. | c.1630C>T | p.Arg544* | R | R | R | |
288_501 | VATER/VACTERL association | 22 | APOL2 | NM_145637.1 | n.f.i. | c.319G>C | p.Glu107Gln | D | | | |
750_501* | VATER/VACTERL association | 23 | ZFHX3 | NM_006885.3 | 5/250,880 (0.00002) | c.6377C>T | p.Ala2126Val | A | T | T | A |
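Annotations marked in bold red represent known disease genes involved in the formation of congenital malformations, variants with a truncating consequence, variants in highly conserved regions of the protein, or novel variants (not found in gnomAD; n.f.i.). Mm, Mus musculus; Gg, Gallus gallus; Dr, Danio rerio; Xt, Xenopus tropicalis (residue at the orthologous position). *Additional case-parent trio identified through personal communication.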
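The gnomAD entries in Table 1 are given as allele count / allele number, with the resulting frequency in parentheses. The arithmetic is sketched below; the helper names are hypothetical, and the six-decimal rendering is our choice (the table rounds further).

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency = observed alternate alleles / total alleles called."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele_count must lie between 0 and allele_number")
    return allele_count / allele_number


def describe(allele_count: int, allele_number: int) -> str:
    """Label a variant absent from gnomAD as novel, otherwise give count and frequency."""
    if allele_count == 0:
        return "n.f.i."
    af = allele_frequency(allele_count, allele_number)
    return f"{allele_count}/{allele_number:,} ({af:.6f})"
```

For the ZFHX3 variant in trio 750_501, 5/250,880 ≈ 0.0000199, which the Results report as 0.000019 and Table 1 rounds to 0.00002.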
Among the novel variants, (i) five reside within previously described disease genes (CHD7, FANCB, TRPS1, KIAA0556, and ZFHX3), (ii) two are truncating (c.874C>T (p.Arg292*) in EEF1D and c.1630C>T (p.Arg544*) in TRPS1), and (iii) three amino acid changes (p.Arg229Gln in PPIP5K2; p.His272Asn in CLP1; p.His1244Tyr in KIAA0556) reside in highly conserved regions of the respective protein (Table 1). Of the novel de novo missense variants, four amino acid changes (p.Pro534Arg in ZFHX3; p.Phe131Leu in MTA3; p.Leu215Pro in SLC5A2; p.Ala1396Gly in CHD7) were called deleterious by at least seven out of nine in silico prediction tools (written in bold in Table 2). Similarly, the rare de novo variant in ZFHX3 (c.6377C>T, p.Ala2126Val, allele frequency 0.000019) found in the additional case-parent trio (750_501) was also called deleterious by seven out of nine in silico prediction tools (written in bold in Table 2).
Table 2. Classification of de novo variants using in silico prediction programs.
Ext-Code | Variant no. | Gene (HGNC) | cDNA change | gnomAD (MAF) | SIFT | LRT | MutationTaster | Mutation Assessor | FATHMM | PROVEAN | MetaSVM | MetaLR | fathmm-MKL coding | CADD score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4_501 | 1 | EEF1D | c.874C>T | n.f.i. | - | N | A | - | - | - | - | - | N | 28.5 |
4_501 | 2 | CELSR1 | c.4357G>A | 3/282,594 (0.00001) | T | U | N | N | T | N | T | T | D | 15.3 |
21_501 | 3 | HPS3 | c.1189C>T | 10/282,776 (0.00004) | D | D | A | M | T | D | D | T | D | 35 |
27_501 | 4 | PIGC | c.716C>T | n.f.i. | T | D | D | M | T | N | T | T | D | 11.6 |
35_501 | 5 | NFX1 | c.1723G>A | n.f.i. | D | N | N | L | T | N | T | T | D | 20.8 |
36_501 | 6 | ZFHX3 | c.1601C>G | n.f.i. | D | N | D | L | T | N | T | T | D | 22.3 |
41_501 | 7 | MTA3 | c.393C>A | 1/237,600 (0.000004) | D | D | D | H | D | D | D | D | D | 26 |
46_501 | 8 | FANCB | c.782G>A | n.f.i. | T | N | N | N | T | N | T | T | N | 7.2 |
46_501 | 9 | PLEC | c.6704G>A | 17/272,690 (0.00006) | D | U | D | N | T | N | T | T | D | 26.5 |
63_501 | 10 | PPIP5K2 | c.686G>A | 2/247,732 (0.000008) | D | D | D | M | T | D | T | T | D | 34 |
88_501 | 11 | CLP1 | c.814C>A | 1/251,486 (0.000003) | T | D | D | L | T | N | T | T | D | 17.4 |
88_501 | 12 | GPR133 | c.1033G>A | 6/282,534 (0.00002) | T | N | N | N | T | N | T | T | N | 0.016 |
88_501 | 13 | SLC5A2 | c.644T>C | n.f.i. | D | D | D | H | D | D | D | D | D | 27.6 |
90_501 | 14 | KIAA0556 | c.3730C>T | n.f.i. | D | N | N | L | T | N | T | T | N | 1.9 |
141_501 | 15 | STAB1 | c.6145C>T | 9/278,948 (0.00003) | T | N | N | M | T | D | T | T | N | 24.1 |
154_501 | 16 | GGT6 | c.1045A>G | n.f.i. | D | N | N | N | T | D | T | T | N | 5.9 |
167_501 | 17 | CHD7 | c.4187C>G | n.f.i. | D | D | D | H | T | D | D | D | D | 33 |
172_501 | 18 | NPR2 | c.952C>G | n.f.i. | T | N | D | L | D | D | T | T | D | 22.2 |
174_501 | 19 | UBA3 | c.1088C>T | n.f.i. | D | D | D | L | T | D | T | T | D | 27.8 |
181_501 | 20 | TANC2 | c.2357C>T | n.f.i. | T | D | D | L | T | D | T | T | D | 22.7 |
288_501 | 21 | TRPS1 | c.1630C>T | n.f.i. | - | D | A | - | - | - | - | - | D | 36 |
288_501 | 22 | APOL2 | c.319G>C | n.f.i. | T | N | N | N | T | N | T | T | N | 0.004 |
750_501* | 23 | ZFHX3 | c.6377C>T | 5/250,880 (0.00002) | D | D | D | L | T | D | D | D | D | 19.2 |
A: automatic disease causing; D: disease causing; H: high functional impact; M: medium functional impact; L: low functional impact; N: neutral; T: tolerated; U: unknown; -: no prediction; n.f.i.: not found in gnomAD; *additional case-parent trio identified through personal communication. Annotations marked in bold red represent variants that are classified as disease causing by at least eight out of ten in silico prediction programs (except for truncating variants) used by dbNSFP v3.0 (https://sites.google.com/site/jpopgen/dbNSFP).
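Counting how many of the nine categorical predictors in Table 2 call a variant deleterious can be sketched as follows. The set of codes treated as deleterious is our assumption, since the text does not define it: A and D (disease causing) and H (high functional impact) are counted; M, L, N, T, U and "-" are not.

```python
# Which codes count as "deleterious" is an assumption (see lead-in).
DELETERIOUS = {"A", "D", "H"}

TOOLS = ("SIFT", "LRT", "MutationTaster", "Mutation Assessor", "FATHMM",
         "PROVEAN", "MetaSVM", "MetaLR", "fathmm-MKL coding")


def count_deleterious(calls: dict) -> tuple:
    """Return (deleterious calls, tools that made a call) for one variant.

    A "-" entry means the tool gave no prediction (e.g. for stop-gain variants).
    """
    made = [calls[t] for t in TOOLS if calls.get(t, "-") != "-"]
    return sum(c in DELETERIOUS for c in made), len(made)
```

Under this rule, the CHD7 row (D, D, D, H, T, D, D, D, D) yields 8 deleterious calls out of 9, matching the count given in the Discussion, and the ZFHX3 p.Ala2126Val row yields 7 out of 9.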
One of the novel de novo variants (PIGC, c.716C>T, CADD score 11.6) and three of the rare de novo variants (CELSR1, c.4357G>A, CADD score 15.3; CLP1, c.814C>A, CADD score 17.4; ZFHX3, c.6377C>T, CADD score 19.2) reached CADD scores between 10 and 20, indicating that these variants are predicted to be among the 10% most deleterious substitutions in the human genome. Nine of the novel de novo variants (EEF1D, c.874C>T, CADD score 28.5; NFX1, c.1723G>A, CADD score 20.8; ZFHX3, c.1601C>G, CADD score 22.3; SLC5A2, c.644T>C, CADD score 27.6; CHD7, c.4187C>G, CADD score 33; NPR2, c.952C>G, CADD score 22.2; UBA3, c.1088C>T, CADD score 27.8; TANC2, c.2357C>T, CADD score 22.7; TRPS1, c.1630C>T, CADD score 36) and five of the rare de novo variants (HPS3, c.1189C>T, CADD score 35; MTA3, c.393C>A, CADD score 26; PLEC, c.6704G>A, CADD score 26.5; PPIP5K2, c.686G>A, CADD score 34; STAB1, c.6145C>T, CADD score 24.1) reached CADD scores above 20, indicating that these variants are predicted to be among the 1% most deleterious substitutions in the human genome (written in bold in Table 2).
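The 10% and 1% thresholds follow from how scaled CADD scores are defined: the scaled score is -10·log10 of the fraction of all possible substitutions ranked at least as deleterious, so every 10 points correspond to a tenfold smaller fraction. A minimal sketch (function names are ours):

```python
def top_fraction(cadd_phred: float) -> float:
    """Fraction of possible substitutions at least as deleterious (scaled CADD)."""
    return 10 ** (-cadd_phred / 10)


def cadd_band(cadd_phred: float) -> str:
    """Bands used in the text: >= 20 is the top 1%, >= 10 the top 10%."""
    if cadd_phred >= 20:
        return "top 1%"
    if cadd_phred >= 10:
        return "top 10%"
    return "below top 10%"
```

For instance, PIGC (11.6) falls into the top-10% band, ZFHX3 c.1601C>G (22.3) into the top-1% band, and FANCB (7.2) into neither.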
Re-sequencing of ZFHX3 in EA/TEF patients
Re-sequencing of ZFHX3 in 192 EA/TEF patients did not identify additional putative disease-causing variants.
Structural modeling and in-silico analysis of ZFHX3 protein variants
SWISS-MODEL used template 3wbj.1.A to model the ZFHX3 region containing the amino acid change p.Pro534Arg. The structural models were obtained with a sequence identity of 14.89%, a coverage of 58.75%, and a normalized Z-score of -2.90. These values were considered indicative of correctly folded, well-modeled structures close to the native structure. For the amino acid change p.Ala2126Val, structural models were obtained with a sequence identity of 19.15%, a coverage of 27.48%, and a normalized Z-score of -1.76. Structural modeling indicated that neither amino acid change distorts the native protein: the RMSD was 0.02 Å at the α-carbons and 0.03 Å for the backbone for p.Pro534Arg, and 0.05 Å at the α-carbons and 0.06 Å for the backbone for p.Ala2126Val (S1–S4 Figs).
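The RMSD values above measure the average displacement between paired atoms (e.g. the α-carbons) of the wild-type and mutant models after superposition. The sketch below computes RMSD for coordinates that are already superposed; it does not perform the superposition step itself (done with SuperPose in this study), and the toy coordinates in the usage are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between paired, already superposed atoms.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples in Å,
    e.g. the α-carbons of the wild-type and mutant models.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Shifting every atom of a toy two-atom model by 0.1 Å gives an RMSD of exactly 0.1 Å, so values of 0.02 to 0.06 Å correspond to sub-ångström average displacements.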
Transcriptome analysis
Evaluation of the transcriptome data showed that all murine homologs of the candidate genes were expressed at E8.5, E12.5, and postnatally, except for some APOL2 orthologs. Differential gene expression analysis revealed that four of the 24 genes were differentially expressed (transcriptome-wide) between E8.5 and E12.5 (Chd7: logFC 2.243, p.adj 1.85E-29; Npr2: logFC -2.268, p.adj 5.18E-04; Trps1: logFC -2.927, p.adj 1.17E-28; Eef1d: logFC 1.248, p.adj 6.41E-05), and two between E12.5 and postnatal (Apol7a: logFC -9.273, p.adj 5.18E-07; Plec: logFC -2.352, p.adj 9.09E-24). Interestingly, most of the candidate genes were highly expressed at each time point (Fig 1, Fig 2, S2 Table). The candidate genes Zfhx3, Ppip5k2, Chd7, and Eef1d were even expressed above the 95th percentile of all genes at E8.5. In addition, Ppip5k2, Trps1, Zfhx3, and Eef1d were expressed above the 93rd percentile at E12.5.
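A percentile statement such as "above the 95th percentile at E8.5" can be read off the cumulative expression distribution of one time point. The sketch below uses the empirical cumulative distribution of rlog values; the function name is ours, and the convention of counting ties as "at or below" is an assumption.

```python
from bisect import bisect_right


def percentile_rank(value: float, all_values) -> float:
    """Percentage of genes whose expression is <= value (empirical CDF x 100)."""
    ordered = sorted(all_values)
    if not ordered:
        raise ValueError("all_values must not be empty")
    return 100.0 * bisect_right(ordered, value) / len(ordered)
```

Applied per time point, a candidate gene with a rank above 95 would be expressed more highly than 95% of all genes measured at that stage.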
Discussion
The etiology of EA/TEF is heterogeneous. Previously, disease-causing monoallelic mutations of variable genomic size have been reported among EA/TEF patients [7; 25]. Here, we identified 23 single-nucleotide de novo variants in 23 different genes using 30 unrelated case-parent trios and ES. All confirmed de novo variants were prioritized using in silico prediction tools. The embryonic role of genes harboring prioritized de novo variants was further investigated by targeted analysis of mouse transcriptome data of esophageal tissue obtained at E8.5, E12.5, and postnatally.
After prioritization of variants using in silico prediction tools, targeted analysis of mouse transcriptome data, and review of the literature, we prioritize TRPS1 and ZFHX3 as new EA/TEF candidate genes and provide further support for CHD7 as a key player in esophageal development.
The de novo amino acid change identified in CHD7 has not been previously described. CHD7 has been established as the major disease gene for CHARGE syndrome (OMIM #214800) [26]. Eight out of nine in silico prediction programs used by dbNSFP v3.0 (https://sites.google.com/site/jpopgen/dbNSFP) classified the de novo amino acid change p.Ala1396Gly in CHD7 as deleterious. As about 20% of patients with CHARGE syndrome present with EA/TEF [27], we consider the identified variant disease-causing in our patient (167_501, Table 1), even though patient 167_501 did not present with congenital anomalies besides EA/TEF that would have suggested a clinical diagnosis of CHARGE syndrome. None of the other identified de novo variants resided within a gene previously linked to the formation of EA/TEF.
In patient 288_501, we identified a novel de novo truncating variant, p.Arg544*, in TRPS1, the gene associated with tricho-rhino-phalangeal syndrome I (OMIM #190350). Maas et al. (2015) previously reported the same truncating variant in three unrelated patients with tricho-rhino-phalangeal syndrome [25]. Unlike our patient 288_501, these patients presented with neither a congenital anomaly of the esophagus or trachea nor a congenital anomaly of the heart (personal communication with Dr. Raoul C. Hennekam). Interestingly, in the expression data generated here, Trps1 shows consistently high expression in the esophageal area of mouse embryos (67th percentile at E8.5) and a log2 fold change of -2.92 between E8.5 and E12.5, suggestive of an involvement of Trps1 in vertebrate foregut development. This in turn suggests that the de novo TRPS1 variant identified here might contribute to EA/TEF in patient 288_501.
In patient 36_501 with nonsyndromic EA/TEF, we identified a novel de novo variant in ZFHX3. Through personal communication during the project, we identified another patient with EA/TEF as part of VATER/VACTERL association (patient 750_501) who carries a de novo variant in ZFHX3. While the novel variant p.Pro534Arg resides in a well-conserved region of ZFHX3 and has not been reported in gnomAD, the variant p.Ala2126Val has been reported five times in the heterozygous state among 250,880 alleles in gnomAD (MAF 0.00002) (Table 1) and resides in a less well-conserved region of ZFHX3. Prompted by this finding, we re-sequenced ZFHX3 in 192 additional EA/TEF patients but did not find any further putative EA/TEF-associated variants. To further analyze the two amino acid changes in ZFHX3, we performed structural modeling and in silico analysis of the ZFHX3 protein. The substitution c.1601C>G results in the amino acid change p.Pro534Arg, with an RMSD of 0.02 Å at the α-carbons and 0.03 Å for the protein backbone. Similarly, the substitution c.6377C>T, resulting in p.Ala2126Val, showed comparable RMSD values of 0.05 Å at the α-carbons and 0.06 Å for the protein backbone. While the structural modeling suggests that neither change distorts the native protein, a possible functional impact of both variants warrants further functional testing. According to our transcriptome analysis, Zfhx3 is not differentially expressed between E8.5 and E12.5 or between E12.5 and postnatal. However, Zfhx3 is among the top expressed genes at E8.5 (>95th percentile) and E12.5 (>97th percentile).
Interestingly, Thisse and Thisse (2004) also reported expression of zfhx3 in zebrafish larvae 24 hours post fertilization in the region of the pharyngeal arches, a series of paired bony or cartilaginous arches that develop along the lateral walls of the foregut, supporting a role of ZFHX3 in vertebrate foregut development [28]. Taken together, the de novo variants detected here in human EA/TEF patients, the high Zfhx3 expression at E8.5 and E12.5 in embryonic foregut tissue of mouse embryos, and the previously reported expression of zfhx3 in the pharyngeal arch region of zebrafish larvae suggest ZFHX3 as a putative EA/TEF candidate gene.
Overall, interpretation of the data is limited by the lack of animal models, at least for the findings in CHD7, TRPS1, and ZFHX3. To the best of our knowledge, no animal model has been described that investigates embryonic foregut development upon deletion of these genes. To conclude definitively that the de novo variants in CHD7, TRPS1, and ZFHX3 were directly causative for the EA/TEF phenotype in the respective patients, in vivo experiments including animal models would be necessary, which were beyond the scope of the present study.
Conclusion
In summary, we detected 23 de novo mutations in 23 genes in 17 unrelated patients. Human exome and mouse embryonic expression analyses suggest ZFHX3 and TRPS1 as putative EA/TEF candidate genes and support CHD7 as a key player in esophageal development.
Supporting information
Acknowledgments
We thank all patients and their families for their participation, as well as the German self-help organizations for individuals with anorectal malformations (SoMA e.V.) and tracheoesophageal fistula with or without esophageal atresia (TE) (KEKS e.V.) for their assistance with recruitment. We also thank Prof. Raoul Hennekam for fruitful discussion on the manuscript.
Data Availability
All relevant data are within the manuscript and its Supporting Information files.
Funding Statement
F.K. was supported by a stipend of the University of Bonn, BONFOR (O-149.0115.1; https://www.medfak.uni-bonn.de/de/forschung/foerderung/interne-foerderung/bonfor). H.R., J.S., M.L., and P.G. are supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) (BE 3910/6-1, RE 1723/4-1, Exc147-2; https://www.dfg.de/). H.R., J.S., H.T., and E.J. are further supported by a grant of the Else-Kröner-Fresenius-Stiftung (EKFS, 2014_A14; https://www.ekfs.de/). The exome analysis was performed on CHEOPS, a high-performance computer cluster of the regional data center (RRZK) of the University of Cologne, funded by the DFG (215828658). The transcriptome analysis was performed on the de.NBI cloud, a national infrastructure supported by the German Federal Ministry of Education and Research (FKZ 031A532-0331A540 and 031L0101-0310108). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. David T.J., and O’Callaghan S.E. (1975). Oesophageal atresia in the South West of England. J. Med. Genet. 12, 1–11. doi:10.1136/jmg.12.1.1
- 2. Depaepe A., Dolk H., and Lechat M.F. (1993). The epidemiology of tracheo-oesophageal fistula and oesophageal atresia in Europe. EUROCAT Working Group. Arch. Dis. Child. 68, 743–748. doi:10.1136/adc.68.6.743
- 3. Pedersen R.N., Calzolari E., Husby S., Garne E., and EUROCAT Working group (2012). Oesophageal atresia: prevalence, prenatal diagnosis and associated anomalies in 23 European regions. Arch. Dis. Child. 97, 227–232. doi:10.1136/archdischild-2011-300597
- 4. Torfs C.P., Curry C.J., and Bateson T.F. (1995). Population-based study of tracheoesophageal fistula and esophageal atresia. Teratology 52, 220–232. doi:10.1002/tera.1420520408
- 5. Quan L., and Smith D.W. (1973). The VATER association. Vertebral defects, Anal atresia, T-E fistula with esophageal atresia, Radial and Renal dysplasia: a spectrum of associated defects. J. Pediatr. 82, 104–107. doi:10.1016/s0022-3476(73)80024-1
- 6. Geneviève D., de Pontual L., Amiel J., Sarnacki S., and Lyonnet S. (2007). An overview of isolated and syndromic oesophageal atresia. Clin. Genet. 71, 392–399. doi:10.1111/j.1399-0004.2007.00798.x
- 7. Brosens E., Marsch F., de Jong E.M., Zaveri H.P., Hilger A.C., Choinitzki V.G., et al. (2016). Copy number variations in 375 patients with oesophageal atresia and/or tracheoesophageal fistula. Eur. J. Hum. Genet. 24, 1715–1723. doi:10.1038/ejhg.2016.86
- 8. van Bokhoven H., Celli J., van Reeuwijk J., Rinne T., Glaudemans B., van Beusekom E., et al. (2005). MYCN haploinsufficiency is associated with reduced brain size and intestinal atresias in Feingold syndrome. Nat. Genet. 37, 465–467. doi:10.1038/ng1546
- 9. Jongmans M.C.J., Admiraal R.J., van der Donk K.P., Vissers L.E.L.M., Baas A.F., Kapusta L., et al. (2006). CHARGE syndrome: the phenotypic spectrum of mutations in the CHD7 gene. J. Med. Genet. 43, 306–314. doi:10.1136/jmg.2005.036061
- 10.Motoyama J., Liu J., Mo R., Ding Q., Post M., and Hui C. (1998). Essential function of Gli2 and Gli3 in the formation of lung, trachea and oesophagus. Nat. Genet. 20, 54–57. 10.1038/1711 [DOI] [PubMed] [Google Scholar]
- 11.Williamson K.A., Hever A.M., Rainger J., Rogers R.C., Magee A., Fiedler Z., et al. (2006). Mutations in SOX2 cause anophthalmia-esophageal-genital (AEG) syndrome. Hum. Mol. Genet. 15, 1413–1422. 10.1093/hmg/ddl064 [DOI] [PubMed] [Google Scholar]
- 12.Zhang R., Marsch F., Kause F., Degenhardt F., Schmiedeke E., Märzheuser S., et al. (2017). Array-based molecular karyotyping in 115 VATER/VACTERL and VATER/VACTERL-like patients identifies disease-causing copy number variations. Birth Defects Res. 109, 1063–1069. 10.1002/bdr2.1042 [DOI] [PubMed] [Google Scholar]
- 13.Li H., and Durbin R. (2009). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760. 10.1093/bioinformatics/btp324 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., et al. (2010). The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. 10.1101/gr.107524.110 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Kawalia A., Motameny S., Wonczak S., Thiele H., Nieroda L., Jabbari K., et al. (2015). Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow. PLOS ONE 10, e0126321 10.1371/journal.pone.0126321 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Yeo G., and Burge C.B. (2004). Maximum Entropy Modeling of Short Sequence Motifs with Applications to RNA Splicing Signals. J. Comput. Biol. 11, 377–394. 10.1089/1066527041410418 [DOI] [PubMed] [Google Scholar]
- 17.Ramu A., Noordam M.J., Schwartz R.S., Wuster A., Hurles M.E., Cartwright R.A., et al. (2013). DeNovoGear: de novo indel and point mutation discovery and phasing. Nat. Methods 10, 985–987. 10.1038/nmeth.2611 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Liu X., Jian X., and Boerwinkle E. (2011). dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat. 32, 894–899. 10.1002/humu.21517 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Liu X., Wu C., Li C., and Boerwinkle E. (2016). dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 37, 235–241. 10.1002/humu.22932 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., et al. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinforma. Oxf. Engl. 29, 15–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Ewels P., Magnusson M., Lundin S., and Käller M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinforma. Oxf. Engl. 32, 3047–3048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Love M.I., Huber W., and Anders S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 10.1186/s13059-014-0550-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Love M.I., Anders S., Kim V., and Huber W. (2015). RNA-Seq workflow: gene-level exploratory analysis and differential expression. F1000Research 4, 1070 10.12688/f1000research.7035.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Durinck S., Spellman P.T., Birney E., and Huber W. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191. 10.1038/nprot.2009.97 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Maas S.M., Shaw A.C., Bikker H., Lüdecke H.-J., van der Tuin K., Badura-Stronka M., et al. (2015). Phenotype and genotype in 103 patients with tricho-rhino-phalangeal syndrome. Eur. J. Med. Genet. 58, 279–292. [DOI] [PubMed] [Google Scholar]
- 26.Janssen N., Bergman J.E.H., Swertz M.A., Tranebjaerg L., Lodahl M., Schoots J., et al. (2012). Mutation update on the CHD7 gene involved in CHARGE syndrome. Hum. Mutat. 33, 1149–1160. 10.1002/humu.22086 [DOI] [PubMed] [Google Scholar]
- 27.Kutiyanawala M., Wyse R.K., Brereton R.J., Spitz L., Kiely E.M., Drake D., et al. (1992). CHARGE and esophageal atresia. J. Pediatr. Surg. 27, 558–560. 10.1016/0022-3468(92)90445-d [DOI] [PubMed] [Google Scholar]
- 28.Thisse B., and Thisse C. (2004). Fast Release Clones: A High Throughput Expression Analysis. ZFIN Direct Data Submiss. [Google Scholar]