Abstract
Background
Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely characterized for their role in restricting infection by diverse enveloped and non-enveloped viruses. The chicken IFITM (chIFITM) genes are clustered on chromosome 5 and to date four genes have been annotated, namely chIFITM1, chIFITM3, chIFITM5 and chIFITM10. However, because this locus is poorly assembled in the Gallus gallus v4 genome, accurate characterization has so far proven problematic. Recently, a new chicken reference genome assembly, Gallus gallus v5, was generated using Sanger, 454, Illumina and PacBio sequencing technologies, revealing considerable differences in the chIFITM locus relative to previous genome releases.
Methods
We re-sequenced the locus using both Illumina MiSeq and PacBio RS II sequencing technologies and we mapped RNA-seq data from the European Nucleotide Archive (ENA) to this finalized chIFITM locus. Using SureSelect capture probes designed to the finalized chIFITM locus, we sequenced the locus of a different chicken breed, namely a White Leghorn, and of a turkey.
Results
We confirmed the Gallus gallus v5 consensus except for two insertions of 5 and 1 base pair within the chIFITM3 and B4GALNT4 genes, respectively, and a single base pair deletion within the B4GALNT4 gene. The pull-down revealed a single amino acid substitution, A63V, in the CIL domain of IFITM2 compared to Red Jungle Fowl, and 13, 13 and 11 differences between IFITM1, 2 and 3 of chickens and turkeys, respectively. RNA-seq shows chIFITM2 and chIFITM3 expression in numerous tissue types of different chicken breeds and avian cell lines, while the expression of the putative chIFITM1 is limited to the testis, caecum and ileum tissues.
Conclusions
Locus resequencing using these capture probes and RNA-seq based expression analysis will allow the further characterization of genetic diversity within Galliformes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-017-3801-8) contains supplementary material, which is available to authorized users.
Keywords: PacBio RSII, Illumina MiSeq, Chicken IFITM, Genetic characterization, RNA-seq
Background
Poultry accounts for almost half of all meat consumed in the UK, with 875 million chickens, 17 million turkeys, 16 million ducks and 250,000 geese a year supplied by over 2500 poultry farms [1]. Production can be adversely affected by infection with avian-specific viruses such as infectious bursal disease virus (IBDV), infectious bronchitis virus (IBV) and Newcastle disease virus (NDV) [2–7]. Poultry can also serve as the source of zoonotic, or potentially zoonotic, infections with viruses such as H5N1 and H7N9, transmitted to humans through contact with poultry. To reduce the threat to the global food supply and to minimize the risk of zoonotic events, there is an ongoing need to better understand the biology of avian viral infections, the mechanisms of natural resistance (viral intrinsic and innate immunity) and the biological factors that might be involved.
Interferon inducible transmembrane (IFITM) proteins are effectors of the immune system widely involved in restricting entry into cells of a broad range of viruses including Influenza viruses, Ebola and Zika [8–14]. In chickens, four IFITM genes have been annotated to date by the chicken gene nomenclature consortium (CGNC), namely chIFITM1 (LOC422993-3-like), chIFITM3 (LOC770612-1-like), chIFITM5, and chIFITM10 [15–22]. Although not yet annotated by the CGNC, we have previously shown the existence of chIFITM2 (putative LOC107053353-dispanin-2b-like) and suggested a hypothetical genetic structure of the locus based on the human syntenic genome region [23].
As IFN-stimulated genes (ISGs), IFITM abundance within a cell increases following activation of the type 1 IFN signaling pathway in response to the detection of pathogen associated molecular patterns (PAMPs) such as viral nucleic acid in the cytoplasm of the infected cell. In addition, binding of the IFNα/β to their cell surface receptors induces translocation of the transcription factor complex IFN-stimulated gene factor 3 (ISGF3) into the nucleus [24]. This induces the transcription of several ISGs, among which are the IFITM genes. The IFITM proteins target the final stages of viral entry by preventing fusion of the viral and cellular membranes [25]. This mechanism also reflects the localization of the human IFITM2 and IFITM3 which are found predominantly in intracellular membrane compartments such as late endosomes and lysosomes [21]. It is suggested that the membrane-defined site of fusion, namely plasma membrane and endosomes, is critical for the antiviral activity of these proteins [15].
While the genetics and cell biology of the human IFITMs have been extensively characterized, the lack of an accurate and complete reference genome sequence has hampered progress in characterizing the locus in diverse vertebrates, including birds. The genetic structure of the chicken locus was proposed based on the human locus; however, critical differences suggest that the current chIFITM nomenclature might be incorrect [23]. Indeed, the relative intracellular localizations of chIFITM1 and 2 as defined by genome synteny are the opposite of those of their human counterparts. This prompted Smith et al. to suggest that an inversion might have occurred within the locus [23]. Subsequently, it was shown that duck IFITM1 localizes to the plasma membrane, like human IFITM1, highlighting a further difficulty in classifying avian IFITMs [16]. In addition, in an effort to explain the conserved antiviral activities of the different human IFITM genes, Compton et al. have recently suggested that these differences, which reflect their localization and abundance in a cell, are a sign of duplication and mutational events in the IFITM genes that arose millions of years ago [26]. Although their studies focused on the evolution of the IFITM genes in various non-human primates, they underline the need to reconsider how avian IFITM genes are named, as their nomenclature does not necessarily reflect their human orthologues. In this scenario, while ancestral IFITM3 is clearly syntenic with hIFITM3, more studies are required to elucidate the relationship between the other two IFITM proteins.
The most recent version of the chicken genome (v5) has incorporated long PacBio sequencing reads. This new sequencing has improved the chicken genome, including the IFITM locus. However, when sequencing an entire genome and performing whole genome assembly, minor assembly errors can occur, often due to lack of coverage or because paralogous sequences at other loci compromise accurate assembly. The IFITM gene family is one of the most paralogous families known with multiple copies of both IFITM genes and pseudogenes. For this reason, we sequenced just a small region of chromosome 5 containing the IFITM locus at high coverage with PacBio and with Illumina MiSeq.
The average PacBio read length is >10 kb, depending only on the activity of the polymerase [27–29], and although PacBio raw reads have a higher error rate than other technologies (14% versus 0.1 to 1% for Illumina), a high quality consensus sequence can be obtained from overlapping reads. To complement the new Gallus gallus reference we have focused solely on the chIFITM locus and better elucidated its genetic structure by sequencing a bacterial artificial chromosome (BAC) from the BAC library used to generate the original Gallus gallus genome. The 203 kb BAC (CH261-109H20 [30]), containing the chIFITM locus, does not include chIFITM10. Given that the current literature focuses mainly on the antiviral activity of chIFITM1, 2, 3 and 5, we present a high confidence, high coverage sequence of this locus and the expression of these four genes, determined by mapping publicly available RNA-seq data, to define each of the chIFITM proteins at the transcriptional level. Further, we describe the design of hybrid capture (SureSelect) probes and their use in genome capture and sequencing of other Galliform IFITM loci.
Results
De novo assembly of PacBio and MiSeq sequencing reads
In order to obtain a consensus reference sequence from the raw sequencing data, PacBio reads derived from the BAC clone sequencing were quality filtered and de novo assembled with HGAP using the protocols available on the SMRT portal (Additional file 1). Summaries of assembly and mapping statistics for PacBio (and also Illumina, see below) reads are shown in Table 1. Because of the length of the PacBio reads, the PacBio de novo assembly consisted of 6 assembled fragments (compared to 13 with Illumina). Of these, one contig (number 2, Table 2) contained the chicken genome sequence; the others represented genomic sequences from the E. coli BAC vector (Additional file 2). Contig 2, containing chicken sequences, had the highest base coverage and its length suggested it represented the full-length BAC clone. To confirm the identity of this de novo assembled fragment, we used ACT and sequence similarity plots to compare contig 2 with the chromosome 5 reference sequence from both Gallus gallus v4 and Gallus gallus v5 (Fig. 1a-c). Contig 2 contained the full chIFITM locus and highlights the substantial deficiencies in the Gallus gallus v4 genome assembly (Fig. 1a and b). This contrasts with the Gallus gallus v5 genome assembly, where fewer large gaps are observed, but where a small INDEL is present (Fig. 1c-d). Inspection of the similarity plot shows that these nucleotide-level differences fall in the genomic region of the chIFITM3 gene, within the intronic region (Fig. 1c, bottom dot plot, and sequence alignment in Fig. 1d). To further analyse this region we also screened the full locus for repeats and low complexity DNA sequences, as shown in Additional file 3. We attempted de novo assembly of Illumina MiSeq paired-end reads using three software packages (namely IVA, SGA and HGAP), resulting in only partial consensus sequences covering between 50 and 70% of the full chIFITM locus (including the flanking genes ATHL1 and B4GALNT4) (data not shown).
The best assembly was generated using IVA, which produced the fewest contigs (13). In order to identify Illumina contigs that contained the BAC, and specifically the chIFITM locus, sequence similarity was used to compare the Illumina MiSeq contigs with the PacBio contig 2 (Additional file 4). All of the Illumina MiSeq contigs covered either portions of the PacBio contig 2, or just the chIFITM locus. These results suggest that while the longer PacBio reads map well to the reference genome (Additional file 5), Illumina MiSeq raw reads on their own are not sufficient to assemble this region de novo, although they do map accurately to the de novo PacBio reference.
Table 1.
Metric | PacBio RSII | | Illumina MiSeq | |
---|---|---|---|---|
Number of reads | 78,140 | | 665,450 | |
Number of bases | 401,758,407 | | 199,635,000 | |
Mean read length (bp) | 5141 | | 300 | |
De novo assembly | | | | |
Assembly software | HGAP | | IVA | |
Polished contigs | 6 | | 13 | |
Sum of contig lengths | 4,818,915 bp | | 277,830 bp | |
Largest fragment | 2,323,934 bp | | 73,284 bp | |
N50a | 1,102,549 bp | | NA | |
Mapping | | | | |
Reference | Mapped reads | Mean coverage | Mapped reads | Mean coverage |
Chr.5 Gallus gallus v4 | 33,892 | 193 | 586,297 | 607 |
Chr.5 Gallus gallus v5 | 34,068 | 196 | 693,474 | 440 |
PacBio_contig N.2 | NA | NA | 606,994 | 599 |
aN50 read length metric: The read length at which 50% of the bases are in reads longer than, or equal to, this value
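As an illustration of the N50 metric defined in the footnote above, the following minimal Python sketch computes it from a list of lengths. The function name `n50` is illustrative and is not part of HGAP or the SMRT Portal; the example input uses the six contig lengths listed in Table 2, on the assumption that the statistic is taken over those lengths.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downwards, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Contig lengths copied from Table 2 (assumed input, for illustration only)
TABLE2_CONTIG_LENGTHS = [2323934, 223345, 1102486, 623652, 537146, 17862]

if __name__ == "__main__":
    print(n50(TABLE2_CONTIG_LENGTHS))
```

On the Table 2 lengths this sketch returns 1,102,486, which is close to but not identical with the 1,102,549 bp reported in Table 1, suggesting that the published value was computed over slightly different sequence lengths.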
Table 2.
Contig | Length (bp) | Bases calleda | Consensus accuracyb | Base coveragec |
---|---|---|---|---|
1 | 2323934 | 1.0 | 0.99 | 38.8 |
2 | 223345 | 0.99 | 0.99 | 419.37 |
3 | 1102486 | 1.0 | 0.99 | 40.56 |
4 | 623652 | 0.99 | 0.99 | 38.0 |
5 | 537146 | 0.99 | 0.99 | 36.7 |
6 | 17862 | 1.0 | 0.99 | 28.56 |
aBases Called: The percentage of reference sequence that has ≥ 1x coverage. % Bases Called + % Missing Bases should equal 100; bConsensus Accuracy: The accuracy of the consensus sequence compared to the reference; cBase Coverage: The mean depth of coverage across the reference sequence
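The two coverage metrics defined in this footnote can be made concrete with a short Python sketch. It assumes a hypothetical list of per-base read depths for one reference sequence (for example, as exported from a depth report); the function name `coverage_summary` and the toy depths are illustrative and are not taken from the SMRT Portal output.

```python
def coverage_summary(depths):
    """Summarise per-base read depths for one reference sequence.

    Returns (bases_called, mean_coverage) where
    bases_called  = fraction of positions with depth >= 1, and
    mean_coverage = mean depth across all positions.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n_positions = len(depths)
    covered = sum(1 for d in depths if d >= 1)
    return covered / n_positions, sum(depths) / n_positions


if __name__ == "__main__":
    # Toy example: four reference positions, one of them uncovered
    called, mean_depth = coverage_summary([0, 2, 4, 6])
    print(called, mean_depth)
```

Table 2 reports bases called as a fraction (for example 0.99) rather than a percentage, which is why the sketch returns a fraction as well.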
Organization of the chIFITM locus in PacBio contig 2 and Gallus gallus v4 and v5 reference sequences
To study in more detail the gene order of the v4 and v5 assembled locus relative to our assembly we used Artemis. Concentrating on the chIFITM genes, we show that combined reads from both sequencing technologies mapped well to v4 or v5 assemblies, covering the locus to significant depths and aligning to all the regions of interest (Fig. 2 and Additional file 5A-D). The deep and accurate sequence of the chIFITM locus allows us to be confident that the chIFITM1 and 2 genes as named and annotated in the v5 genome are indeed inverted in comparison to the human locus with chIFITM1, 2, 3 and 5 genes having their transcriptional units in the same direction (Table 3) [23].
Table 3.
Gene | Location in contig 2 |
---|---|
chIFITM1 | 162068..163611 |
chIFITM2 | 164151..165395 |
chIFITM3 | 158589..159917 |
chIFITM5 | 165955..167524 |
ATHL1 | 168807..177724 |
B4GALNT4 | 138150..157395 |
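The coordinates in Table 3 can be used to derive the gene order along contig 2 and the size of the region they span. The Python sketch below is only an illustration: it assumes the `start..end` coordinates are 1-based and inclusive (the usual feature-table convention), and the helper names are hypothetical.

```python
# Coordinates copied from Table 3 (start, end) on PacBio contig 2.
# Assumption: 1-based, inclusive coordinates.
GENES = {
    "chIFITM1": (162068, 163611),
    "chIFITM2": (164151, 165395),
    "chIFITM3": (158589, 159917),
    "chIFITM5": (165955, 167524),
    "ATHL1": (168807, 177724),
    "B4GALNT4": (138150, 157395),
}


def gene_length(start, end):
    """Length in bp of a feature with inclusive coordinates."""
    return end - start + 1


def order_by_start(genes):
    """Gene names sorted by their start coordinate on the contig."""
    return sorted(genes, key=lambda name: genes[name][0])


def locus_span(genes):
    """Length in bp from the first gene start to the last gene end."""
    first = min(start for start, _ in genes.values())
    last = max(end for _, end in genes.values())
    return last - first + 1


if __name__ == "__main__":
    print(order_by_start(GENES))
    print(locus_span(GENES))
```

With the names as annotated in Table 3, sorting by start coordinate gives B4GALNT4, chIFITM3, chIFITM1, chIFITM2, chIFITM5, ATHL1, and the span from the start of B4GALNT4 to the end of ATHL1 is 39,575 bp, consistent with the roughly 40 kb region referred to in the text.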
SureSelect probes design and pull down of the IFITM locus from turkey breast tissue and DF1 cells
The consensus sequence we generated was used to design Agilent SureSelect probes covering the 40 kb region encompassing the IFITM locus. Our primary purpose is to use these probes to study possible IFITM variants in different chicken breeds and more widely across the phylogeny of Galliformes. We successfully pulled down the IFITM locus from DF1 cells (chicken embryonic fibroblasts) as well as from turkey breast tissue (Fig. 3), showing that chicken (Phasianinae, a sub-family of Galliformes) IFITM probes can be used to pull down and sequence the locus in a different Galliform sub-family, namely the Meleagridinae, to which the turkey belongs. The BAC clone, like Gallus gallus v5 of the chicken genome, is from a Red Jungle Fowl, inbred line UCD001 (Inbred 256, female), while the DF1 cells are derived from a White Leghorn (East Lansing line-0, 10-day old eggs). Mapping of PacBio reads from DF1 cells against either v5 of the chicken genome sequence or our PacBio contig 2 gives good coverage, but with low coverage gaps detected in IFITM3 and B4GALNT4 (Fig. 3a-b). The IFITM3 gap could be closed with the low frequency PacBio reads and the PacBio contig 2 reference, yielding an accurate IFITM locus sequence for DF1 cells. Illumina sequence of the turkey IFITM locus aligns more poorly to the turkey reference genome (Fig. 3c), suggesting that the current turkey genome is in need of improvement with long read PacBio sequences, as achieved for the chicken genome. We were, however, able to identify all four IFITM genes in the turkey locus. We constructed multiple sequence alignments for the IFITMs of the two chicken genomes and the turkey genome (Fig. 4). Amino acid sequence alignment of the IFITM proteins of DF1, turkey and Gallus gallus v5 shows substantial differences (Fig. 4). For the known antiviral IFITMs, one amino acid change was found between Red Jungle Fowl and White Leghorn, namely A63V in the CIL domain of IFITM2.
More amino acid substitutions were seen between turkey and chicken IFITMs, with 13, 13 and 11 differences in IFITM1, 2 and 3, respectively. Variation in one of the chicken IFITMs is maintained in the turkey gene, namely at amino acid 63 of IFITM2: A (Red Jungle Fowl), V (White Leghorn) and V (turkey).
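Counting substitutions of this kind from a pairwise alignment can be sketched as follows. This is a minimal illustration in Python, not the procedure used to produce Fig. 4: the function name is hypothetical, positions are numbered on the ungapped reference sequence, and the short sequences in the example are invented toy strings rather than real IFITM sequences.

```python
def substitutions(ref_aligned, alt_aligned, gap="-"):
    """List amino acid substitutions between two aligned sequences.

    Each substitution is written as <ref residue><position><alt residue>,
    for example 'A63V'. Positions count residues of the reference only,
    so gap columns in the reference do not advance the numbering.
    Columns where either sequence has a gap are not reported.
    """
    if len(ref_aligned) != len(alt_aligned):
        raise ValueError("aligned sequences must have equal length")
    found = []
    position = 0
    for ref_res, alt_res in zip(ref_aligned, alt_aligned):
        if ref_res != gap:
            position += 1
        if ref_res != alt_res and gap not in (ref_res, alt_res):
            found.append(f"{ref_res}{position}{alt_res}")
    return found


if __name__ == "__main__":
    # Invented toy alignment: one substitution at reference position 3
    print(substitutions("M-KA", "MQKV"))
```

The number of differences between two orthologues is then the length of the returned list; insertions and deletions would need to be counted separately.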
Mapping RNA-seq data to the PacBio contig 2 reference containing the chIFITM locus
The generation of a high quality de novo assembly of the IFITM locus sequence allows accurate mapping of RNA-seq data from previously published studies for qualitative and quantitative analysis. To validate which chIFITM transcripts were expressed, and to assess their level of expression, we first used RNA-seq reads from 293T cells engineered to constitutively express only chicken IFITM proteins. Reads from the control cells (wild type 293T) do not map to the chIFITM locus (Table 4). Focusing on the 40 kb region containing the chIFITM locus, including the flanking genes ATHL1 and B4GALNT4, we observed RNA-seq reads from 293T cell lines stably expressing chIFITM1, 2 or 3, with the expected peaks of expression at exon locations (Additional file 6). The number of mapped reads, and by implication the expression level, was higher for chIFITM3 than for chIFITM2, and both were higher than for chIFITM1 (Additional file 6). We analysed 26 RNA-seq studies, totaling 293 sequenced chicken tissues and avian cell lines, that were identified in the ENA database. The samples were examined for constitutive expression levels of the chIFITMs in a subset of each study covering at least one immune relevant tissue type (Table 5). To analyze constitutive expression, RNA-seq data from liver, spleen, lung and trachea samples taken from the studies listed in Table 6 were mapped against the PacBio contig 2. To these, we added expression data from commonly used laboratory cell lines (DF1, CEF, HD11, DT40).
Table 4.
Cell line | Average coverage |
---|---|
293T | 35 |
293T - chIFITM1 | 34 |
293T - chIFITM2 | 339 |
293T - chIFITM3 | 746 |
Table 5.
Table 6.
N. | Tissue | Condition | Species |
---|---|---|---|
1 | Lung | H5N3 AIV | Fayoumi and leghorn |
2 | DF-1 | IRF 7 overexpression and knockdown assays/poly I:C | East Lansing Line (ELL-0) White Leghorn |
3 | DF-1 | Cell-adapted Infectious Bursal Disease Virus (ca-IBDV) infection | East Lansing Line (ELL-0) White Leghorn |
4 | Trachea | Infectious laryngotracheitis virus vaccine | 15-day-old SPF white leghorn chickens |
5 | DT40 CL18 chicken B lymphoma cells | Basal | Bursal lymphoma cell line derived from a Hyline SC chicken |
6 | Caecal tissue | C.jejuni strain NCTC11168v1 | Barred Rock chickens |
7 | Breast muscle | Basal | White rock/Xinghua chickens |
8 | Abdominal adipose tissue | Body weight | 7 week old broiler chickens |
9 | Primary hepatocellular carcinoma epithelial cell line | Heat stress response | Chicken male white leghorn hepatocellular (LMH) cell line |
10 | Spleen | J Subgroup Avian Leukosis Virus (ALV-J) Infection | White Recessive Rock |
11 | Facial | Talpid2 heterozygous carriers | HH25 chickens |
12 | DT40 cells | Splicing factor SRSF10 | Bursal lymphoma cell line derived from a Hyline SC chicken |
13 | MSB1 cell line | Marek’s disease virus 1 | Chicken lymphoblastoid cell line |
14 | Liver | Heat stress response | Broiler chickens |
15 | Endocardial cells | Endocardial EMT | HH18 chicken/embryo |
16 | Brain (cerebral cortex/whole brain without cerebellum), cerebellum, heart, kidney, liver and testis | Basal | Red jungle fowl |
17 | Liver/muscle | Basal | 7 day red jungle fowl and broiler |
18 | CEF/HD11 | Lipopolysaccharide | 11-day white leghorn |
19 | Mid shaft tibial bone | Basal | White leghorn |
20 | Ileum/lung | H5N2/H5N1 | White leghorn/Domestic Gray Mallards |
21 | Adrenal gland, adipose, cerebellum, testis, ovary, heart, hypothalamus, kidney, liver, lung, breast muscle, sciatic nerve, proventriculus, spleen | Basal | Red Jungle Fowl |
22 | Whole embryo | Basal | UE1295 PEAT/F-37380 cross |
23 | Testis | | New Hampshire |
24 | Spleen | IBDV | Gallus gallus |
25 | CEF | chIFNα | CEF |
26 | Chicken embryo | Basal | Gallus gallus |
ChIFITM3 is constitutively expressed (both exons) in all tissues and cell lines analysed, at levels higher than the putative chIFITM1 and chIFITM2. Indeed, putative chIFITM1 is barely detectable in most of the tissues, and much lower than the other IFITM transcripts, as also shown by the RPKM values in Table 5. Further, upon infection or cellular stress chIFITM2 and chIFITM3 are abundantly expressed, again with little IFITM1 expression. Indeed, it is not possible to detect convincing levels of IFITM1 expression at any time except in caecal tissue and ileum tissue infected by influenza A H5N2 or H5N1 (Figs. 5 and 6, Additional file 7 and Table 5). In addition, the coverage graphs confirm that the typical genetic structure of the chIFITM genes is maintained, with two exons separated by a single intron in all cases, although reads were observed to map beyond the boundaries of the annotated genes, particularly in the stretch of genomic region between IFITM2 and 5 (Figs. 5 and 6).
Discussion
In this study we have sequenced a BAC clone containing the complete chIFITM locus using both PacBio and Illumina MiSeq sequencing technologies producing an accurate assembly of the locus. We analysed expression levels of the chIFITM genes using publicly available RNA-seq data from different chicken lines and tissues, and produced hybrid capture probes for ‘pull-down’ sequencing of another chicken line and the more distant turkey IFITM locus.
The chIFITM locus showed several gaps in version 4 of the chicken genome release (Gallus gallus v4). It was improved in version 5 by sequencing the same DNA reference source (female Red Jungle Fowl, UCD001 inbred line) with PacBio technology. Comparison of the two public versions of the chIFITM locus with the one generated in our study (PacBio contig 2) still revealed differences, despite all being derived from the same inbred line. We believe these discrepancies in the public genome assemblies might be a consequence of the genome-wide assembly required for the full chicken genome, suggesting that our BAC sequence (203 kb) is likely to be more accurate, particularly in GC-rich regions. In addition, the quality control analysis and the type of assembler used will influence the final consensus sequence generated for any region of the chicken genome, leading to the differences observed in the sequences. To produce our sequence, we employed both PacBio RSII and Illumina MiSeq technologies because they have complementary properties that met our requirements for covering gaps and maintaining sequence integrity. Sequencing within Gallus gallus domesticus lines, more outbred chickens and more divergent Galliformes is now possible using hybrid capture genome sequencing. Indeed, we have been able to document many amino acid sequence changes between chickens and turkeys in the antiviral IFITMs, in regions of the proteins known to be important for their antiviral activity (Fig. 4).
An accurate sequence is vital to understand the genetic structure and confirm the identity of the IFITM locus, and thus to annotate the genes correctly. Hypothetical structures of the chIFITM locus have been suggested, based on the human locus, but inconsistencies remain between alignments for the putative chIFITM1 and chIFITM2 [16, 17, 23]. Based on the literature and current annotation, the four genes are clustered on chromosome 5, which also contains the chIFITM10 gene (the function of which remains to be elucidated). Following the discovery of chIFITM2, Smith et al. proposed an organizational structure for the locus, based on features such as membrane localization and lack of an N-terminal extension (both characteristic of the IFITM2 and IFITM3 proteins), suggesting that chIFITM2 is actually analogous to human IFITM1 [23]. Our immunofluorescence analysis of the localization of the chicken proteins expressed in stable human (293T) cell lines is in agreement with Smith et al. [23] (data not shown, Bassano et al. in preparation). Indeed, chIFITM2 is membrane-bound, while chIFITM1 localizes to the early endosomes. Here our RNA-seq analysis of the ENA dataset shows that chIFITM1 basal expression levels are very low compared to chIFITM2 and chIFITM3. The analysis of samples in the presence of IFN, H5N2, H5N1, H5N3, IBDV, IRF7, ALV or lipopolysaccharide, or under heat stress, also shows higher expression levels for chIFITM3 and chIFITM2, suggesting a key role for these two proteins as antiviral IFITMs compared to chIFITM1, which is expressed only in the intestinal tract and in the testis. Although immunofluorescence staining seems to suggest that chIFITM2 is analogous to hIFITM1 (both are plasma membrane-bound), the genome organisation supported by long read PacBio sequences now unambiguously confirms that the chIFITM2 and chIFITM1 locus is inverted compared to the human locus.
We therefore propose, based on gene expression, genome architecture and published functional data, that the gene order in the chicken locus on chromosome 5q should be renamed: centromeric – B4GALNT4 – chIFITM3 – chIFITM2 – chIFITM1 – chIFITM5 – ATHL1 – telomeric.
Conclusions
In this report we have produced an updated genomic map of the chIFITM locus that includes the two flanking genes ATHL1 and B4GALNT4, by combining and analyzing sequencing data derived from PacBio RS II and Illumina MiSeq sequencing technologies. The only difference detected in our assembled locus sequence relative to Gallus gallus v5 is a 5 bp insertion in the intronic region of chIFITM3. This change in sequence may not have any influence on the function and expression of the chIFITM3 gene. RNA-seq analysis shows expression of all IFITMs from this locus, but chIFITM1 has a different pattern of expression from the other antiviral IFITMs. Initial analysis shows IFITM amino acid variation between different chicken breeds and turkeys.
Methods
Bacterial Artificial Chromosome (BAC) construct recovery
The BAC clone (CHORI-261) from Red Jungle Fowl strain UCD001 covering the predicted IFITM locus was purchased from BACPAC Resources Centre. The BAC clone, delivered as a stab culture was streaked directly on Luria Broth (LB) agar (chloramphenicol 12.5 μg/mL) to isolate single colonies and incubated overnight at the designated growth temperature. Single colonies were picked and cultured in LB media. Plasmid DNA was then extracted and purified according to Qiagen Plasmid DNA kit manufacturer’s protocol.
Sequencing, assembly and alignment
A total of 3 μg of isolated plasmid DNA was sequenced across two platforms, the Illumina MiSeq and the PacBio RSII. Library preparation and quality control were undertaken by The Wellcome Trust Sanger Institute’s core sequencing facility. Assembly of PacBio sequencing reads was performed using protocols available on the SMRT® Portal. Briefly, sequencing fragments were first filtered to remove reads that did not meet read quality and length thresholds, then de novo assembled using HGAP [31]. Errors in the re-circularization of the BAC, as well as in sequence consensus generation for the DF1 cell line, were corrected using iCORN v2 (Iterative Correction of Reference Nucleotides) [32]. MiSeq reads were first checked for quality with FastQC [33] and low quality reads were trimmed using Trimmomatic [34]. De novo assembly of MiSeq reads was attempted using IVA [35], SGA [36] and HGAP from the SMRT® Analysis package [31]. SMALT (http://www.sanger.ac.uk/science/tools/smalt-0), a pairwise sequence alignment program, was used to map MiSeq reads onto genomic reference sequences, either chromosome 5 of Gallus gallus (v4 and v5) or the consensus sequence generated from de novo assembly of PacBio sequencing reads. The SAM files generated were converted into indexed BAM files using Samtools 0.1.19 [37]. Artemis (v13.0) and ACT (Artemis Comparison Tool) [38] were used to analyse locus coverage and accuracy of the alignment. Comparison files required to run ACT were generated with megablast [39]. Dot plots were generated by calling dotter from the command line [40]. Annotation for the PacBio consensus sequence was generated by RATT (Rapid Annotation Transfer Tool) [41], using the annotation from Gallus gallus version 4 as scaffold.
All sequences produced in this manuscript are deposited in the ENA under the accession numbers ERS556272, ERS565108, ERS1276179, PRJNA361311.
SureSelect pull down of the IFITM locus
SureSelect probes covering the chicken IFITM locus (40 kb region) were purchased from Agilent and samples were processed for targeted pull-down according to the Illumina and PacBio protocols.
Cell culture
Both 293T and DF1 cells were cultured in DMEM medium supplemented with 10% FCS, in the absence of antibiotics. Stable transfections were performed using Fugene (Promega) according to the manufacturer’s instructions and cells were maintained in culture in the presence of puromycin for positive selection. RNA extraction was performed using a Qiagen RNA extraction kit according to the manufacturer’s instructions. Up to 5 μg of extracted RNA was reverse transcribed and sequenced using Illumina MiSeq. DNA extraction from turkey breast tissue was performed using a Qiagen tissue and blood DNA extraction kit, according to the manufacturer’s protocol.
European Nucleotide Archive (ENA) sequencing data download and RNA-seq analysis
RNA-seq datasets for this study were retrieved from ENA records (Table 5). A total of 26 studies containing chicken sequencing datasets were identified. Reads that had been quality checked with FastQC were aligned to the PacBio-derived consensus sequence using BWA version 0.7.12-r1039, Samtools 0.1.19 and MAFFT version 7.205. The BAM files generated were then visualized using Artemis. To quantify transcript expression, RPKM (Reads Per Kilobase per Million mapped reads) values were calculated using Artemis by selecting the feature of interest. Read depth for RNA-seq alignment was calculated using Ugene v1.25.0.
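The RPKM normalization named above can be written out explicitly. The sketch below uses the standard definition (reads mapped to a feature, divided by the feature length in kilobases and by the total number of mapped reads in millions); it is not Artemis's own code, and the function name and the example numbers are hypothetical.

```python
def rpkm(feature_reads, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads for one feature.

    RPKM = feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads on a 1,000 bp feature,
    # in a library with 10 million mapped reads
    print(rpkm(500, 1000, 10_000_000))
```

Because the value is scaled by both feature length and library size, RPKM allows expression of genes of different lengths to be compared across samples sequenced to different depths.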
Additional files
Acknowledgements
The authors thank Michael Skinner and Michael Quail for useful comments on the manuscript and Thomas D. Otto for help with Artemis and ACT software.
Funding
This research was supported by the BBSRC (Animal Health Research Club) grant numbers BB/L003996/1, BB/L00397X/1 and BB/L00397X/2.
Availability of data and materials
All sequences produced in this manuscript are deposited in the ENA under the accession numbers ERS556272, ERS565108, ERS1276179, PRJNA361311.
Authors’ contributions
IB designed and performed the experiments, interpreted the results and wrote the manuscript. SHO designed methodology for RNA-seq data analysis and wrote the manuscript. NL wrote the manuscript. TW prepared the BAC for sequencing. MF and PK wrote the manuscript. All authors read and approved the final manuscript.
Authors’ information
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Irene Bassano, Email: i.bassano@imperial.ac.uk.
Swee Hoe Ong, Email: so7@sanger.ac.uk.
Nathan Lawless, Email: nathan.b.lawless@gmail.com.
Thomas Whitehead, Email: thomas.whitehead@pirbright.ac.uk.
Mark Fife, Email: mark.fife@pirbright.ac.uk.
Paul Kellam, Email: p.kellam@imperial.ac.uk, Email: Paul.Kellam@kymab.com.
Data Availability Statement
All sequences produced in this study are deposited in the European Nucleotide Archive (ENA) under the accession numbers ERS556272, ERS565108, ERS1276179 and PRJNA361311.