Abstract
Background
Humans have adapted to widespread changes during the past 2 million years in both environmental and lifestyle factors. This is evident in overall body alterations such as average height and brain size. Although we can appreciate the uniqueness of our species in many aspects, molecular variations that drive such changes are far from being fully known and explained. Comparative genomics is able to determine variations in genomic sequence that may provide functional information to better understand species-specific adaptations. A large number of human-specific genomic variations have been reported, but no currently available dataset comprises all of them, which hinders progress in the field.
Results
Here we critically update high confidence human-specific genomic variants that mostly associate with protein-coding regions and find 856 related genes. Events that create such human-specificity are mainly gene duplications, the emergence of novel gene regions and sequence and structural alterations. Functional analysis of these human-specific genes identifies adaptations to brain, immune and metabolic systems to be highly involved. We further show that many of these genes may be functionally associated with neural activity and generating the expanded human cortex in dynamic spatial and temporal contexts.
Conclusions
This comprehensive study contributes to the current knowledge by considerably updating the number of human-specific genes following a critical bibliographic survey. Human-specific genes were functionally assessed for the first time to such extent, thus providing unique information. Our results are consistent with environmental changes, such as immune challenges and alterations in diet, as well as neural sophistication, as significant contributors to recent human evolution.
Electronic supplementary material
The online version of this article (10.1186/s12859-019-2886-2) contains supplementary material, which is available to authorized users.
Keywords: Human-specific, Brain, Neuron, Glia, Metabolism, Gene expression
Background
Since humans split from the chimpanzee around 6 million years ago, the different species of the genus Homo (of which modern humans are now the sole representative) have evolved very rapidly, apparently superseding all other events of evolutionary novelty accumulation [1]. Especially prominent differences are observed in aspects such as height, brain size and changes to our gut and skeleton. Environmental alterations such as diet and immune challenges are thought to have played a major role in human-specific adaptations [2, 3]. Although these phenotypic traits, which have a whole-body effect, are more readily noticeable, one can easily assume humans have also undergone significant change at the microscopic scale. The question of what makes humans unique at a molecular level is now being more broadly addressed as new and advanced laboratory and bioinformatics tools are enabling comparisons between species from genetic and functional perspectives. Genetic differences between species may have distinct mechanisms of origin, such as alterations in the cytogenetic architecture, local chromosomal rearrangements, gene family duplications, single gene modifications, creations or losses, differences in gene transcription levels and/or patterns and alternative splicing. Functional differences can be observed in general behaviour or tissue and organ development and function, and molecularly in circuits, pathways or cellular variation.
Historically, genomic comparisons in this context date back to the 1970s, when studies comparing humans with non-human primates at the karyotype level were first published, revealing a very close organization of chromosome banding and identical euchromatin [4]. Later, at the chromosome level, translocation and fission events were reported as the first detectable differences between humans and their closest relatives and these were the known genomic landmarks for the origin of Anthropoids [5, 6]. Further, using fluorescent in situ hybridization and comparative genomic hybridization arrays, human-specific segmental duplications and genes displaying human-specific copy number variation were identified [7]. The first human-chimpanzee comparative genome map was published in 2002 and further updated in 2005 [8]. Also in 2005 [9], the first attempt to comprehensively identify human-specific segmental duplications was published from comparisons with the chimpanzee genome, revealing the extent of such alterations, which account for ~ 2.7% of the genomic differences between these species. For comparison, at the nucleotide level, the human and chimpanzee genomes are estimated to differ by > 30 million single substitutions (or ~ 1.2% of the human genome) [8].
Although functional differences between humans and other primates are evident in major morphological features such as the skeleton (e.g. jaws [10] and hands [11]), hair (humans have thinner hair) and muscle tissue [12], and global functions including speech [13] and language [14], changes in the brain have presumably had the most significant impact on the human lineage. The size of the human brain tripled over a period of approximately 2 million years, which overlaps with the estimated period of transition from Australopithecus to Homo [15]. Comparative neuroanatomy has revealed a specific expansion of both the neocortex, with an increase in size and neuronal interconnectivity during hominid evolution, and the right side of the human brain compared to chimpanzee [16]. While this expansion is believed to be important to the emergence of human language and other high-order cognitive functions, its genetic basis remains largely unknown.
In the two decades following the first discoveries of genomic differences between humans and other species, numerous studies have identified events that generated human-specific genetic features, such as gene duplications, structural gene alterations and accumulation of significant nucleotide substitutions. Although many authors have worked to identify the genes associated with such human-specific genetic features (hereafter referred to as ‘human-specific genes’), no comprehensive and structured list is currently available and the published literature is redundant (in the sense that the same event or gene is reported many times in multiple studies) as well as diverse (in the sense that authors frequently direct their work to different aspects and subsets of genes, thus producing limited results). In summary, current knowledge on the subject is scattered and there is an inherent lack of standardization, given the diversity of studies in which one or more human-specific genes are described. Such limitations hinder the study of human-specific genes at a genomic scale, even though the information is publicly available. Through an extensive bibliographic survey, we gathered, curated and critically assessed the human-specific genes reported in the literature to provide the most comprehensive list to date. We further use this dataset as a platform to explore the general impact of these human-specific genes, assessing their biological impact through functional network and pathway analyses. Finally, we investigate differential gene expression in subpopulations of glial cells and in active versus inactive neurons to examine whether the human-specific genes are involved in specialized neural functions such as cortical development or neuronal activation.
Our results highlight the importance of rapid adaptations in immunological, neurological and metabolomic areas that likely contribute to human evolution and identify human-specific genes that are differentially expressed in the brain.
Results
The generation of a high confidence structured dataset for human-specific genes
Before describing the obtained results, it is necessary to define our object of study. In this report we use the term human-specific gene when referring to a gene impacted by one or more genetic alterations, which seem to have happened after divergence from non-human primates (usually proposed by genomic comparison with chimpanzee) and result in the emergence of human-specific features. The event causing these genetic alterations may change the gene itself or its regulatory region, as we report in detail.
An extensive bibliographic survey (described within the Materials and Methods section) of the literature published since 2000 resulted in a selective list of 54 scientific articles describing thousands of human-specific features. After triage and manual curation of the data we obtained a set of 982 associated gene descriptors. A descriptor was the most accurate term used by the original author(s) to describe the gene of interest (e.g. name, acronym, database entry number, etc.). To standardize notation, for each gene we retrieved information from the human genome version GRCh38. Automatic annotation based on gene descriptor was carried out against the genome and 676 of these genes were directly annotated. Additionally, some gene names contained typos or were slightly modified from their actual name and over 100 other genes had been renamed or restructured since their first annotation. For such genes we carried out manual curation and further annotation when possible. In addition to these individual genes, there are 19 gene families, comprising at least 10 members each, with reported human-specific features that could not be individually attributed to a single gene (Additional file 1 Table S1). Although these gene families were treated separately (to avoid introducing bias given the high number of genes they encompass), when specific genes were described in the literature these were included in the main dataset.
Approximately 130 of the original descriptors could not be associated with any particular gene or gene family, many of these representing genomic fragments as opposed to specific genes and others obsolete or untraceable gene identifiers (IDs). A total of 856 genes (or 871 gene IDs, as some names map to multiple gene IDs, e.g. HAR1A and OR5AL1) with reported human-specific characteristics were curated and annotated and, to the best of our knowledge, comprise the most complete dataset of human-specific genomic features (Additional file 1 Table S1). This number is considerably higher than previously predicted or reported in the literature. For example, the genetics domain of the Matrix of Comparative Anthropogeny (MOCA), which is a repository for available information on human features that differ from great apes, lists only 103 genes known from literature. Of these, over 70% are represented in our dataset and most of the remainder were either absent in the current version of the human genome or were filtered out during our manual curation process for lacking strong evidence of human specificity at the gene level.
Associated with these genes are many types of human-specific genetic features, which we grouped into broader classes according to their causative events, while also keeping the original description from the corresponding publication from which they were retrieved. All human-specific genes were allocated to one of the following 10 classes (in order of abundance): gene amplification, human-specific gene (undefined feature), gene sequence alteration, gene structure alteration, gene loss, regulatory region alteration, de novo origin, new non-coding gene, lost in chimpanzee and human accelerated region (Fig. 1a). Most genes reported in these articles are protein-coding and thus the resulting database is mainly composed of such genes (588). There is also a large proportion of pseudogenes (186) and non-coding RNAs (55 long ncRNAs and 27 small ncRNAs).
Regarding chromosomal distribution, the 856 genes with human-specific features come from all 22 autosomes and both sex chromosomes. No gene was listed from the mitochondrial chromosome. When proportionally compared, the distribution of protein-coding genes with human-specific features and the distribution of all human protein-coding genes per chromosome were relatively similar. A few chromosomes, however, bear a significantly higher number of protein-coding genes with human-specific features. Chromosomes X and 7 seem to be particularly enriched in protein-coding genes with human-specific features (Additional file 1 Table S2).
Although this report successfully listed hundreds of genes, it was limited not only by the current availability of studies regarding human-specific genes, but also by poorly defined terminology (the term ‘human-specific’ per se is the object of debate, being ambiguously used to describe different levels of specificity). The field itself is especially limited by technical difficulties, such as the lack of a high-quality genome for archaic hominins, complexity of our gene architecture, poorly defined non-coding elements, problems faced when defining genomic correspondence between species, availability of functional data and complications of subsequent validation of predicted variation.
Functional analyses highlight neuronal, immunological and metabolic features
In possession of the newly generated dataset of genes with human-specific features, we set out to investigate the general biological impact that altering their characteristics may have had on our species. To this end we focused on the functional analysis of each human-specific gene, searching further for overall patterns and relationships. Functional enrichment analysis was performed by FGNet [17] using GeneTerm Linker [18] as the underlying algorithm. The resulting network represents the links and associations between metagroups of genes and enriched terms. In total, 295 genes (~ 35%) were successfully functionally annotated by FGNet and assigned to 25 metagroups, two of which were automatically filtered out based on silhouette width. The comprehensive network of metagroups comprising 225 genes is provided as Additional file 1 Figure S1A and the description of each metagroup as Additional file 1 Table S3. Reported p-values for all metagroups are lower than 0.0006 (thus orders of magnitude lower than the threshold of 0.05) and each metagroup has at least 10 genes. Since the full network is highly complex, we manually selected 12 metagroups that we consider to represent interesting functional classes at the systemic level (as opposed to broad molecular or cellular level features). This sub-network clustered into 3 broad functional categories: neural function, immunological function and metabolic function (Fig. 1b and Additional file 1 Figure S1B).
Although FGNet provides a broad overview of the biological impact of human-specific genetic alterations by clustering functional terms in metagroups and establishing relationships between such clusters, it lacks the detail achieved by analyzing each functional class separately. Also, the subset of genes for which GeneTerm Linker could attribute information was only around 35% of the total. Therefore, to examine functional aspects of a higher number of human-specific genes and at a finer scale, we turned to gene ontology (GO) analysis. In total, 596 gene IDs were assigned to at least one human protein sequence, obtained from the Ensembl database (~ 70% of the 871 gene IDs), as a first step for GO annotation. Among the gene IDs for which no protein sequence was retrieved, 187 (~ 70%) are pseudogenes, 84 (~ 30%) are ncRNAs and only 4 are currently annotated as protein-coding (although no corresponding protein sequences were found). We then assigned functional attributes at the gene level, both for the set of human-specific genes and for the entire set of human proteins, which was used to provide expected abundances. Attributes were assigned to each gene based on the GOSlim catalogue of ontologies for biological processes. We calculated the percentage abundance of each term among human-specific genes and compared it with the expected abundance based on observations in all human proteins. Numeric and statistical comparisons indicate the functional terms which are most significantly differentially represented among human-specific genes. Only 3 of the 70 broad GOSlim terms assigned to the entire set of known human proteins were completely absent among the human-specific genes. Among the remaining terms, 11 were significantly over-represented (p-values lower than or equal to 0.05) within human-specific genes when compared with the entire set of human proteins (15 other terms had p-values lower than or equal to 0.1; Fig. 2).
Enriched terms were involved with the neurological system, carbohydrate metabolism, structural growth and functions at the cell level, such as cytoskeleton organization, motility, morphogenesis, locomotion, cell signaling, protein targeting, protein modification and cellular component assembly (Fig. 2). Additionally, interesting terms such as reproduction and symbiosis (encompassing mutualism through parasitism) were highly represented among the human-specific genes (although their p-values were 0.06 and 0.1, respectively). It is worth mentioning that the term symbiosis in this context was almost entirely related to parasite-host relationships (50% of the occurrences of this umbrella term related to virus-host interactions) and the term reproduction mostly refers to male reproduction (with 40% of occurrences, while the remaining 60% are almost equally shared between female reproduction, general development of the reproductive system and pregnancy-related processes, which encompass fertilization, embryonic and placental development and birth). In summary, based on ontology assignments and subsequent statistical analysis, we highlight that the higher order categories of neural function, carbohydrate metabolism, reproduction and parasite-host relationships are highly correlated with human-specific gene features.
Focusing on pathways as opposed to individual categories or broad clusters of functions, we further analyzed human-specific genes using Ingenuity Pathway Analysis (IPA; [19]). In summary, IPA used 729 out of the 845 genes (~ 85%) and supported the importance of neuronal (e.g. nNOS signaling in neurons, Huntington’s disease signaling), immunological (e.g. phagosome formation, phagocytosis in macrophages) and metabolic (e.g. inositol pyrophosphates biosynthesis, adipogenesis pathway, glutamate biosynthesis and degradation) functions (Additional file 1 Figure S2). Taken together, multiple functional analysis tools converge to implicate neuronal, immunological and metabolic systems in human evolution and species-specific characteristics.
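The comparison of observed and expected term abundances described above can be illustrated with a short Python sketch. It is an illustration only, not the code used in this study: the function names and the toy counts are hypothetical, and it assumes a one-sided (over-representation) Fisher's exact test in which the human-specific genes are treated as a subset of the background.

```python
from math import comb


def percent_abundance(term_count, total_count):
    """Percentage of annotated genes that carry a given GOSlim term."""
    return 100.0 * term_count / total_count


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (upper tail of the hypergeometric).

    k: human-specific genes annotated with the term
    n: human-specific genes with any annotation
    K: background genes annotated with the term
    N: background genes with any annotation
    Returns P(X >= k) when n genes are drawn at random from the N.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy counts (hypothetical, not taken from the study).
observed = percent_abundance(3, 5)    # term abundance in the subset
expected = percent_abundance(4, 20)   # term abundance in the background
p_value = fisher_enrichment_p(3, 5, 4, 20)
```

A term would be reported as over-represented when the observed abundance exceeds the expected one and the p-value falls below the chosen threshold.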
Highly expressed human-specific genes are cell-type enriched across different radial glial cell populations
Since the human brain has such remarkable properties, with many cognitive traits being postulated to be unique to our species [20], we turned to investigate the expression profile of human-specific genes within glial cell subpopulations (which ensure homeostasis and provide support and protection to neuronal cells in the brain). The cell populations we selected as the object of study are distinctively located at the subventricular zone, a well known center for neuronal cell production in primates. The expression of human-specific genes in such a location could be related to the unique enlargement and folding of the human brain, driven by neocortical expansion (see [21] for further information). Using publicly available samples retrieved from the Sequence Read Archive (SRA) we assessed transcript abundance for the set of human embryonic radial glial cells, outer radial glial cells, intermediate progenitor cells and neuronal cells (study SRP094417). We used FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values calculated with RSEM as expression measures. A consistent number of transcripts was shown to be expressed (at any level) across all 4 sets of samples and these represent approximately 10% of the ~ 200,000 transcripts in the reference transcriptome. We defined highly expressed transcripts as the top ~ 10% of the expressed transcripts, i.e. the 2000 transcripts with highest average FPKM values for each set of samples. Highly expressed transcripts were then mapped to their gene of origin (on average ~ 1580 genes were characterized as highly expressed) and compared with the set of 856 genes with human-specific features. We retrieved 23 highly expressed human-specific genes from the radial glial cell samples, 17 from the outer radial glial cell samples, 26 from the progenitor cell samples and 24 from the neuronal cell samples.
The list of transcripts related to these human-specific genes as well as their estimated expression in each cell population is available as Additional file 1 Table S4. From the non-redundant total of 61 genes overall, 43 (> 70%) were highly expressed in a cell-specific manner (Fig. 3). The heatmap represents expression levels for the set of transcripts associated with these 43 genes (which generate 52 transcripts) across all 4 cell populations (Fig. 3b). We have thus uncovered sets of human-specific genes for which all transcripts are highly expressed in specific cells and have very low expression across the other 3 cell types (i.e. are virtually cell-specific). Many of these genes have been previously implicated in human phenotypes, including developmental delay (e.g. ASPM [22], AFF3 [23] and MAPT [24]) and intellectual disability (e.g. NEMF [25], PI4KA [26] and KANSL1 [27]).
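The selection of highly expressed transcripts and their comparison with the human-specific gene set can be sketched as follows. This is a minimal Python illustration rather than the pipeline used here: the function name, the transcript and gene identifiers and the toy FPKM values are all hypothetical, and the ranking cut-off is passed as a parameter (2000 in the text, 2 in the toy example).

```python
from statistics import mean


def highly_expressed_genes(fpkm_by_transcript, transcript_to_gene, top_n=2000):
    """Genes with at least one transcript among the top_n by mean FPKM.

    fpkm_by_transcript: transcript ID -> list of FPKM values (one per replicate)
    transcript_to_gene: transcript ID -> gene name
    Transcripts with zero FPKM in every replicate count as not expressed.
    """
    expressed = {
        tx: mean(values)
        for tx, values in fpkm_by_transcript.items()
        if any(v > 0 for v in values)
    }
    ranked = sorted(expressed, key=expressed.get, reverse=True)[:top_n]
    return {transcript_to_gene[tx] for tx in ranked}


# Toy data for one cell population (hypothetical identifiers and values).
fpkm = {
    "tx1": [10.0, 12.0],
    "tx2": [0.0, 0.0],
    "tx3": [5.0, 7.0],
    "tx4": [1.0, 3.0],
}
tx2gene = {"tx1": "GENE_A", "tx2": "GENE_B", "tx3": "GENE_A", "tx4": "GENE_C"}
human_specific = {"GENE_A", "GENE_C"}

top_genes = highly_expressed_genes(fpkm, tx2gene, top_n=2)
overlap = top_genes & human_specific
```

Running the same function once per cell population yields one set of highly expressed human-specific genes for each population.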
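One simple way to operationalize cell-specific high expression is to keep the genes that appear in the highly expressed set of exactly one cell population. The Python sketch below takes that approach as an assumption; the criterion used for Fig. 3 may be stricter (for example, also requiring very low expression elsewhere), and the population labels and gene names are hypothetical.

```python
from collections import Counter


def cell_specific(high_by_population):
    """Genes that are highly expressed in exactly one population.

    high_by_population: population label -> set of highly expressed genes
    Returns population label -> set of genes unique to that population.
    """
    counts = Counter(
        gene for genes in high_by_population.values() for gene in genes
    )
    return {
        pop: {gene for gene in genes if counts[gene] == 1}
        for pop, genes in high_by_population.items()
    }


# Toy input (hypothetical labels and gene names).
high = {
    "radial_glia": {"G1", "G2"},
    "outer_radial_glia": {"G2", "G3"},
    "progenitor": {"G4"},
    "neuron": {"G2", "G5"},
}
specific = cell_specific(high)
non_redundant = set().union(*high.values())
n_specific = sum(len(genes) for genes in specific.values())
```

In this toy example, 4 of the 5 non-redundant genes are specific to a single population, while G2 is shared by three populations and is therefore excluded.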
Multiple human-specific genes are differentially expressed upon activation in neurons derived from induced pluripotent stem cells (iPSC)
As another example of the roles human-specific genes may perform in the brain, we carried out RNA-Seq analyses of neurons differentiated from human iPSC before and after cell activation (50 mM KCl for 3 h) to investigate differential expression of human-specific genes upon neuronal activation. As a result, 798 transcripts were shown to be differentially expressed, 407 being under-expressed upon activation and 391 over-expressed. These transcripts correspond to 755 genes, 12 of which have human-specific features (Fig. 4a, b). These 12 genes have multiple roles and some are implicated in synaptic function (e.g. SEPT7 [28] and CAPN1 [29]) and neurological diseases (e.g. AFF3 [30], NLGN4X [31], CAPN1 [32] and KIAA0319L [31]). We performed RT-qPCR to validate the expression profile of these genes and found 4 of them to be significantly altered after 3 h of KCl activation: AFF3, KIAA0319L, PPIP5K2 and SLC7A6 (Fig. 4c).
Discussion
We set out to survey the scientific literature for genes previously reported as human-specific, knowing that a better understanding of how these genes have mechanistically impacted our evolution would be broadly beneficial for the study of human physiology and disease. The resulting dataset of genes associated with human-specific variants is, to the best of our knowledge, the most detailed, structured and comprehensive to date. Here we highlight higher order functional areas which house a large number of human-specific genes and are likely to be impacted by these genes and their products. Functional assessment of more than 850 human-specific genes emphasized the significance of brain, immune and metabolic adaptations.
In hindsight these findings may not be completely unexpected as infections, dietary alterations (coincident with the discovery of tools and the domestication of fire for cooking) and extraordinary brain expansion have been well documented.
Although humans possess a great degree of plasticity for adaptation, it is likely that the real origin of the human adaptations that truly ignited human uniqueness occurred during the time of Australopithecus and early Homo species [33, 34]. At this time there was widespread movement, the emergence of tools, an enlargement of the brain and a decrease in masticatory apparatus relative to an increasing body size. The human brain has evolved rapidly in the past 2 million years (coincidental with the emergence of Homo species) and continues to do so through highly unstable, or rather adaptable, regions in our genome, tissue-specific and function-specific gene expression and reorganized circuitry [35]. Nevertheless, it was very likely a conjunction of factors that enabled human evolution to occur at such a rapid rate. For example, newly formed regions of the human brain such as the prefrontal cortex seem to have far higher energy requirements than more conserved regions [36]. It may be that it was only possible to meet such requirements through modifications to food preparation methods that ultimately resulted in higher energy intake [37]. This example could illustrate a crosstalk between different aspects of human evolution which may have resulted in emergent properties of our species. Significant changes are also observed in local adaptations in recent human populations to environmental and behavioral factors such as diet, infections, altitude and temperature [38]. Emerging pathogens that specifically infect humans have to some degree been impacted by our own innovations, such as agriculture, and continue to shape our immune evolution through host-pathogen interactions [39].
Conclusions
Despite limitations, our comprehensive study contributes to the current knowledge by considerably updating the number of human-specific genes and further emphasizing the importance of brain, immune and metabolic adaptation in defining our species. It also highlights the potential significance of considering metabolism in conjunction with brain function to fully understand human-specific function and disease.
Materials and methods
Database of genes with human-specific features
We extensively scanned and curated the current literature and searched for articles describing human-specific genetic features and their associated genes. PubMed (www.ncbi.nlm.nih.gov/pubmed) was used as the search platform with the criteria “Search human specific gene Filters: Publication date from 2000/01/01 to 2017/12/31” (further expanded to 2019/12/31), which resulted in over 218,000 publications. From these articles, we selected for terms such as “human-specific”, “duplication”, “de novo”, “evolution” among other terms of interest. Studies were also assessed regarding their relevance/direct relation to the topic, design of the study, type of publication and whether or not the publication was peer reviewed. An initial subset of 36 highly relevant and non-redundant studies was selected and further expanded (mainly through citation relationships) to 54 references from which data were retrieved. These articles report human-specific genetic features, i.e. gene-related molecular characteristics that have been reported to differ between humans and other species and are likely to impact the associated gene (such as changes to the sequence of a gene promoter, exon losses, gene duplications, etc.). The genetic features are related to specific genes, which are the object of study of the present work. Gene names were listed and duplicated entries were collapsed. Ambiguities were assessed in as much detail as possible to clarify the specific gene authors referred to. The initial list was mapped back to the GRCh38 version of the human genome and remaining non-annotated entries mainly represented genes that have been renamed or excluded since their first annotation. The final set of genes was categorized according to the reported human-specific feature and grouped by biotypes as proposed in the Ensembl glossary (publicly available at ensembl.org/Help/Glossary).
Chromosomal distribution of human-specific protein-coding genes
There are 596 gene IDs associated with protein-coding genes. These were listed according to their chromosome of origin and the proportion of entries per chromosome was calculated. The same was performed with the entire set of protein-coding genes annotated in the human genome, for comparison. In parallel, we used the R package GeneOverlap (version 1.12.0) to infer the significance of overlapping genes. The Fisher’s exact test implemented in this package determined the respective p-values (which were not corrected for multiple hypothesis testing).
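The per-chromosome proportions described above amount to a simple count-and-normalize step, sketched here in Python. The function names and the toy gene-to-chromosome assignments are hypothetical; the sketch covers only the proportional comparison, not the significance test.

```python
from collections import Counter


def chromosome_proportions(gene_to_chrom):
    """Fraction of genes located on each chromosome."""
    counts = Counter(gene_to_chrom.values())
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def fold_over_background(subset, background):
    """Ratio of subset proportion to background proportion per chromosome."""
    return {chrom: subset.get(chrom, 0.0) / background[chrom] for chrom in background}


# Toy assignments (hypothetical genes; chromosome labels as strings).
human_specific = {"g1": "X", "g2": "X", "g3": "7", "g4": "1"}
all_coding = {
    "b1": "X", "b2": "X",
    "b3": "7", "b4": "7",
    "b5": "1", "b6": "1", "b7": "1", "b8": "1",
}

subset_prop = chromosome_proportions(human_specific)
background_prop = chromosome_proportions(all_coding)
fold = fold_over_background(subset_prop, background_prop)
```

A ratio above 1 indicates a chromosome that carries proportionally more human-specific protein-coding genes than expected from the background.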
Functional analysis of genes with human-specific features
Genes were also subjected to functional analyses for the generation of a protein-protein interaction network and functional clustering using the Bioconductor package FGNet version 3.10.0 [17] and GeneTerm Linker [18] for functional enrichment analysis. Metagroups with silhouette width of less than 0 were excluded and a minimum support of 3 genes was required for cluster validation.
Human protein sequences were obtained from Ensembl GRCh38 [40] and genes with human-specific features had their respective protein sequence(s) retrieved. The retrieved sequences were submitted to AgBase GoAnna version 2.0.0 [41] for GO assignment based on sequence homology. Blastp was used as the underlying algorithm and search parameters were an E-value cutoff of 10e-50, BLOSUM62 as the substitution matrix, a minimum of 80% sequence identity plus 75% coverage and default word size and gap penalty values. GoAnna results were submitted to AgBase GOSlim [41] to obtain high-level summaries of functions for the given dataset and further analyses were restricted to categories of biological processes, which involve pathways and activities of multiple genes. The same protocol was used to assign GOSlim terms to the entire set of human proteins obtained from Ensembl. Results report the percentage of each term both in the set of human-specific proteins and in all human proteins, which was used as background. Against this background of expected abundance, significance for differential representation of functional terms within the human-specific subset of proteins was calculated using Fisher’s exact test (implemented in the R package GeneOverlap version 1.12.0) to determine the respective p-values (which were not corrected for multiple hypothesis testing).
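The homology thresholds listed above can be expressed as a simple filter over BLAST hits. The Python sketch below is illustrative only: the Hit record, its field names and the example hits are hypothetical, and coverage is assumed to mean alignment length as a percentage of query length, which may differ from the definition applied by GoAnna.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Minimal record for one blastp hit (hypothetical field names)."""
    query: str
    subject: str
    evalue: float
    identity_pct: float
    align_len: int
    query_len: int


def passes(hit, max_evalue=10e-50, min_identity=80.0, min_coverage=75.0):
    """Apply the E-value, identity and coverage thresholds to one hit."""
    coverage = 100.0 * hit.align_len / hit.query_len
    return (
        hit.evalue <= max_evalue
        and hit.identity_pct >= min_identity
        and coverage >= min_coverage
    )


# Toy hits (hypothetical identifiers and values).
hits = [
    Hit("q1", "s1", 1e-80, 95.0, 300, 320),   # passes all three thresholds
    Hit("q2", "s2", 1e-80, 70.0, 300, 320),   # identity too low
    Hit("q3", "s3", 1e-20, 95.0, 300, 320),   # E-value too high
    Hit("q4", "s4", 1e-80, 95.0, 100, 320),   # coverage too low
]
kept = [h.query for h in hits if passes(h)]
```

Only hits that satisfy all three thresholds would contribute GO terms to the query protein.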
SRA samples of radial glial cells
We retrieved fastq files from the SRA-deposited study SRP094417, which contains 18 runs from samples of prenatal human brain, representing data with replicates from radial glial cells, outer radial glial, intermediate progenitor and mature neuronal cells. Reads are paired-end and were generated from cDNA with the Illumina HiSeq2000 platform in 2016.
RNA-Seq of iPSC
The generation and activation of human iPSC-derived neurons and RNA isolation, preparation and sequencing were described in a previous report by our group [42].
RNA-Seq analysis
Both the set of iPSC and SRA-retrieved RNA-Seq samples were treated with the same bioinformatics pipeline, which is composed of 5 main steps: (1) Pre-trimming quality control with FastQC version 0.11.5 (bioinformatics.babraham.ac.uk/projects/fastqc); (2) Read trimming with Trimmomatic version 0.36 [43]; (3) Post-trimming quality control with FastQC; (4) Alignment or pseudoalignment to reference transcriptome and read counting for transcript abundance estimation with Kallisto version 0.43.0 or STAR-RSEM versions 2.5.2a and 1.2.30 [44–46]; (5) Measurement of differential expression of transcripts with EdgeR version 3.18.1 [47]. Each step is generally described below.
FastQC was used for quality control of raw reads and a comparative round of quality control after running Trimmomatic, to ensure overall quality was either maintained or increased after read trimming. The set of default parameters was used for this step. Trimmomatic was employed for cleaning reads from sequencing artifacts. The set of Illumina adapters for the TruSeq paired-end library preparation kit was used as database for adapter trimming. Reads were scanned with a 4-base wide sliding window and trimmed when the average quality per base was lower than 20. Reads shorter than 40 bases after trimming were further excluded. Kallisto and STAR-RSEM were used as different alternatives to generate read counts. Kallisto performs pseudoalignments and read counts within the same command line, while STAR performs alignments to the reference transcriptome and the result is used by RSEM to generate read counts. Kallisto indexing tool was used to generate an index for the FASTA formatted file of the human transcriptome with k-mer size of 31. Reads were counted for transcript quantification using default parameters and a number of bootstrap samples of 100. As an alternative to estimate transcript abundance, STAR was used to perform alignments between the paired-end reads and the reference human transcriptome. An index was built with default parameters and the alignment was performed discarding multimappers and defining parameters for splicing treatment. Resulting bam alignment files were further converted to sam files using Samtools (samtools.sourceforge.net) and sorted with Novosort (novocraft.com/products/novosort), as an intermediate step. RSEM was used to prepare a reference file from the human transcriptome and count reads to provide transcript abundance in the paired-end mode. EdgeR was used to perform statistical analysis and define differentially expressed genes. Kallisto and STAR-RSEM results were compared to evaluate data robustness. 
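The sliding-window trimming rule described above (4-base window, minimum average quality of 20, minimum length of 40) can be illustrated with a simplified Python re-implementation. It is comparable in spirit to Trimmomatic's SLIDINGWINDOW and MINLEN steps but is not Trimmomatic's actual code, and the exact command line used in this study is not reproduced here; the function names and the toy read are hypothetical.

```python
def sliding_window_keep(quals, window=4, min_avg=20):
    """Number of leading bases to keep.

    The read is scanned from the 5' end; it is cut at the start of the
    first window whose mean quality falls below min_avg.
    """
    if len(quals) < window:
        return len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, min_avg=20, min_len=40):
    """Trim one read; return None when it ends up shorter than min_len."""
    keep = sliding_window_keep(quals, window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


# Toy read: 50 high-quality bases followed by 10 low-quality bases.
toy_quals = [30] * 50 + [10] * 10
toy_seq = "A" * 60
trimmed = trim_read(toy_seq, toy_quals)

# Toy read that becomes too short after trimming and is discarded.
short = trim_read("A" * 40, [30] * 30 + [10] * 10)
```

In the first toy read the cut falls at position 49, because the window starting there is the first whose mean quality drops below 20.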
In summary, when results were qualitatively similar, parameters were considered well adjusted. After assessing different thresholds, a minimum of 5 reads per transcript before normalization was required to validate expression. Read counts generated by STAR-RSEM were used for differential expression assessment. Samples were normalized based on sample sizes and data variability was estimated according to a negative binomial dispersion parameter. Differential expression was reported for transcripts with a p-value of less than 0.001 and a false discovery rate of less than 0.01.
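The expression and significance thresholds above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' EdgeR code: the false discovery rate is computed here with a plain Benjamini-Hochberg adjustment, the 5-read minimum is assumed to apply to at least one sample (the text does not specify this), and the function names are hypothetical.

```python
from typing import List, Sequence


def has_min_reads(raw_counts: Sequence[int], min_reads: int = 5) -> bool:
    """Assumption: a transcript counts as expressed if at least one sample
    has min_reads or more raw (pre-normalization) reads."""
    return max(raw_counts) >= min_reads


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_significant(
    names: Sequence[str],
    pvalues: Sequence[float],
    p_cut: float = 0.001,
    fdr_cut: float = 0.01,
) -> List[str]:
    """Keep transcripts passing both the p-value and the FDR cutoffs."""
    fdr = benjamini_hochberg(pvalues)
    return [
        name
        for name, p, q in zip(names, pvalues, fdr)
        if p < p_cut and q < fdr_cut
    ]


if __name__ == "__main__":
    # Hypothetical transcripts and p-values, for illustration only.
    transcripts = ["t1", "t2", "t3", "t4"]
    pvals = [0.0001, 0.0005, 0.02, 0.5]
    print(select_significant(transcripts, pvals))
```

Applying both cutoffs together means a transcript is reported only if it is significant both before and after multiple-testing correction.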
Quantitative RT-PCR for differentially expressed human-specific genes in iPSC data
Quantitative RT-PCR was used to validate expression patterns for the subset of genes with human-specific features shown to be differentially expressed in iPSC. cDNA synthesis was performed using the SuperScript III First-Strand Synthesis System (ThermoFisher Scientific, USA). Briefly, 500 ng of total RNA was used and the random hexamer-primed protocol was followed. Each cDNA sample was amplified in triplicate using SYBR Green PCR Master Mix (ThermoFisher Scientific, USA). Primer pairs used for this analysis are described in Additional file 1 Table S5.
Additional file
Acknowledgements
Not applicable.
About this supplement
This article has been published as part of BMC Bioinformatics, Volume 20 Supplement 9, 2019: Italian Society of Bioinformatics (BITS): Annual Meeting 2018. The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-9.
Abbreviations
- FPKM: Fragments Per Kilobase of transcript per Million mapped reads
- GO: gene ontology
- IDs: identifiers
- IPA: Ingenuity Pathway Analysis
- iPSC: induced pluripotent stem cells
- MOCA: Matrix of Comparative Anthropogeny
- ncRNAs: non-coding RNAs
- SRA: Sequence Read Archive
Authors’ contributions
MB and GB conceived and designed the study and drafted and revised the manuscript. MB carried out the bibliographic survey, data collection and processing, statistical analyses and gene expression studies. SK participated in the bioinformatics analyses and data collection and processing. EAOB carried out experimental validation and contributed to drafting the manuscript. GB coordinated all instances of the project. All authors read and approved the final manuscript.
Funding
The authors declare that they received no specific funding, additional to their salaries, to perform the study. All laboratory supplies used for experimental validations were provided as basic infrastructure by QIMR Berghofer Medical Research Institute.
This article did not receive sponsorship for publication. Publication costs were covered by the authors.
Availability of data and materials
The authors state that all data used to generate the set of human-specific genomic regions can be found within the manuscript text and/or in the Additional Material (mainly in Additional file 1 Table S1).
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
All authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Mainá Bitar, Phone: +61 7 3362 0138, Email: Maina.Bitar@qimrberghofer.edu.au, Email: bitar.maina@gmail.com.
Stefanie Kuiper, Email: stefanie.kuiper@griffithuni.edu.au.
Elizabeth A. O’Brien, Email: Elizabeth.O'Brien@qimrberghofer.edu.au
Guy Barry, Email: Guy.Barry@qimrberghofer.edu.au.
References
- 1. Tattersall I. Why was human evolution so rapid? In: Human Paleontology and Prehistory. 2017:1:1–9.
- 2. Zink KD, Lieberman DE. Impact of meat and lower Palaeolithic food processing techniques on chewing in humans. Nature. 2016;531:500–503. doi: 10.1038/nature16990.
- 3. Weyrich LS, Duchene S, Soubrier J, Arriola L, Llamas B, Breen J, Morris AG, Alt KW, Caramelli D, Dresely V, et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature. 2017;544:357–361. doi: 10.1038/nature21674.
- 4. Dutrillaux B. Chromosomal evolution in primates: tentative phylogeny from Microcebus murinus (prosimian) to man. Hum Genet. 1979;48:251–314. doi: 10.1007/BF00272830.
- 5. Muller S, Stanyon R, Finelli P, Archidiacono N, Wienberg J. Molecular cytogenetic dissection of human chromosomes 3 and 21 evolution. Proc Natl Acad Sci U S A. 2000;97:206–211. doi: 10.1073/pnas.97.1.206.
- 6. Long M. Origin and evolution of new gene functions. Dordrecht. Boston: Kluwer Academic Publishers; 2003.
- 7. Franchini LF, Pollard KS. Genomic approaches to studying human-specific developmental traits. Development. 2015;142:3100–3112. doi: 10.1242/dev.120048.
- 8. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87. doi: 10.1038/nature04072.
- 9. Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, Church D, DeJong P, Wilson RK, Paabo S, et al. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005;437:88–93. doi: 10.1038/nature04000.
- 10. Humphrey LT, Dean MC, Stringer CB. Morphological variation in great ape and modern human mandibles. J Anat. 1999;195:491–513. doi: 10.1046/j.1469-7580.1999.19540491.x.
- 11. Almecija S, Smaers JB, Jungers WL. The evolution of human and ape hand proportions. Nat Commun. 2015;6:7717. doi: 10.1038/ncomms8717.
- 12. O'Neill MC, Umberger BR, Holowka NB, Larson SG, Reiser PJ. Chimpanzee super strength and human skeletal muscle evolution. Proc Natl Acad Sci U S A. 2017;114:7343–7348. doi: 10.1073/pnas.1619071114.
- 13. Fitch WT. The evolution of speech: a comparative review. Trends Cogn Sci. 2000;4:258–267. doi: 10.1016/s1364-6613(00)01494-7.
- 14. Carreiras M, Seghier ML, Baquero S, Estevez A, Lozano A, Devlin JT, Price CJ. An anatomical signature for literacy. Nature. 2009;461:983–986. doi: 10.1038/nature08461.
- 15. Dennis MY, Nuttle X, Sudmant PH, Antonacci F, Graves TA, Nefedov M, Rosenfeld JA, Sajjadian S, Malig M, Kotkiewicz H, et al. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell. 2012;149:912–922. doi: 10.1016/j.cell.2012.03.033.
- 16. Preuss TM. The human brain: rewired and running hot. Ann N Y Acad Sci. 2011;1225(Suppl 1):E182–E191. doi: 10.1111/j.1749-6632.2011.06001.x.
- 17. Aibar S, Fontanillo C, Droste C, De Las Rivas J. Functional gene networks: R/bioc package to generate and analyse gene networks derived from functional enrichment and clustering. Bioinformatics. 2015;31:1686–1688. doi: 10.1093/bioinformatics/btu864.
- 18. Fontanillo C, Nogales-Cadenas R, Pascual-Montano A, De las Rivas J. Functional analysis beyond enrichment: non-redundant reciprocal linkage of genes and biological terms. PLoS One. 2011;6:e24289. doi: 10.1371/journal.pone.0024289.
- 19. Krämer A, Green J, Pollard J, Jr, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30(4):523–530. doi: 10.1093/bioinformatics/btt703.
- 20. Enard W. The molecular basis of human brain evolution. Curr Biol. 2016;26:R1109–R1117. doi: 10.1016/j.cub.2016.09.030.
- 21. Liu J, Liu W, Yang L, Wu Q, Zhang H, Fang A, Li L, Xu X, Sun L, Zhang J, Tang F, Wang X. The primate-specific gene TMEM14B Marks outer radial glia cells and promotes cortical expansion and folding. Cell Stem Cell. 2017;21(5):635–649.e8. doi: 10.1016/j.stem.2017.08.013.
- 22. Passemard S, Titomanlio L, Elmaleh M, Afenjar A, Alessandri JL, Andria G, de Villemeur TB, Boespflug-Tanguy O, Burglen L, Del Giudice E, et al. Expanding the clinical and neuroradiologic phenotype of primary microcephaly due to ASPM mutations. Neurology. 2009;73:962–969. doi: 10.1212/WNL.0b013e3181b8799a.
- 23. Metsu S, Rooms L, Rainger J, Taylor MS, Bengani H, Wilson DI, Chilamakuri CS, Morrison H, Vandeweyer G, Reyniers E, et al. FRA2A is a CGG repeat expansion associated with silencing of AFF3. PLoS Genet. 2014;10:e1004242. doi: 10.1371/journal.pgen.1004242.
- 24. Shaw-Smith C, Pittman AM, Willatt L, Martin H, Rickman L, Gribble S, Curley R, Cumming S, Dunn C, Kalaitzopoulos D, et al. Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability. Nat Genet. 2006;38:1032–1037. doi: 10.1038/ng1858.
- 25. Anazi S, Maddirevula S, Faqeih E, Alsedairy H, Alzahrani F, Shamseldin HE, Patel N, Hashem M, Ibrahim N, Abdulwahab F, et al. Clinical genomics expands the morbid genome of intellectual disability and offers a high diagnostic yield. Mol Psychiatry. 2017;22:615–624. doi: 10.1038/mp.2016.113.
- 26. Tucker T, Zahir FR, Griffith M, Delaney A, Chai D, Tsang E, Lemyre E, Dobrzeniecka S, Marra M, Eydoux P, et al. Single exon-resolution targeted chromosomal microarray analysis of known and candidate intellectual disability genes. Eur J Hum Genet. 2014;22:792–800. doi: 10.1038/ejhg.2013.248.
- 27. Arbogast T, Iacono G, Chevalier C, Afinowi NO, Houbaert X, van Eede MC, Laliberte C, Birling MC, Linda K, Meziane H, et al. Mouse models of 17q21.31 microdeletion and microduplication syndromes highlight the importance of Kansl1 for cognition. PLoS Genet. 2017;13:e1006886. doi: 10.1371/journal.pgen.1006886.
- 28. Yadav S, Oses-Prieto JA, Peters CJ, Zhou J, Pleasure SJ, Burlingame AL, Jan LY, Jan YN. TAOK2 kinase mediates PSD95 stability and dendritic spine maturation through Septin7 phosphorylation. Neuron. 2017;93:379–393. doi: 10.1016/j.neuron.2016.12.006.
- 29. Diepenbroek M, Casadei N, Esmer H, Saido TC, Takano J, Kahle PJ, Nixon RA, Rao MV, Melki R, Pieri L, et al. Overexpression of the calpain-specific inhibitor calpastatin reduces human alpha-Synuclein processing, aggregation and synaptic impairment in [A30P]alphaSyn transgenic mice. Hum Mol Genet. 2014;23:3975–3989. doi: 10.1093/hmg/ddu112.
- 30. Moore JM, Oliver PL, Finelli MJ, Lee S, Lickiss T, Molnar Z, Davies KE. Laf4/Aff3, a gene involved in intellectual disability, is required for cellular migration in the mouse cerebral cortex. PLoS One. 2014;9:e105933. doi: 10.1371/journal.pone.0105933.
- 31. Platt MP, Adler WT, Mehlhorn AJ, Johnson GC, Wright KA, Choi RT, Tsang WH, Poon MW, Yeung SY, Waye MM, et al. Embryonic disruption of the candidate dyslexia susceptibility gene homolog Kiaa0319-like results in neuronal migration disorders. Neuroscience. 2013;248:585–593. doi: 10.1016/j.neuroscience.2013.06.056.
- 32. Wang Y, Hersheson J, Lopez D, Hammer M, Liu Y, Lee KH, Pinto V, Seinfeld J, Wiethoff S, Sun J, et al. Defects in the CAPN1 gene result in alterations in cerebellar development and cerebellar Ataxia in mice and humans. Cell Rep. 2016;16:79–91. doi: 10.1016/j.celrep.2016.05.044.
- 33. Anton SC, Potts R, Aiello LC. Human evolution. Evolution of early Homo: an integrated biological perspective. Science. 2014;345:1236828. doi: 10.1126/science.1236828.
- 34. Spoor F, Gunz P, Neubauer S, Stelzer S, Scott N, Kwekason A, Dean MC. Reconstructed Homo habilis type OH 7 suggests deep-rooted species diversity in early Homo. Nature. 2015;519:83–86. doi: 10.1038/nature14224.
- 35. Sousa AMM, Meyer KA, Santpere G, Gulden FO, Sestan N. Evolution of the human nervous system function, structure, and development. Cell. 2017;170:226–247. doi: 10.1016/j.cell.2017.06.036.
- 36. Fu X, Giavalisco P, Liu X, Catchpole G, Fu N, Ning ZB, Guo S, Yan Z, Somel M, Paabo S, et al. Rapid metabolic evolution in human prefrontal cortex. Proc Natl Acad Sci U S A. 2011;108:6181–6186. doi: 10.1073/pnas.1019164108.
- 37. Pontzer H, Brown MH, Raichlen DA, Dunsworth H, Hare B, Walker K, Luke A, Dugas LR, Durazo-Arvizu R, Schoeller D, et al. Metabolic acceleration and the evolution of human brain size and life history. Nature. 2016;533:390–392. doi: 10.1038/nature17654.
- 38. Fan S, Hansen ME, Lo Y, Tishkoff SA. Going global by adapting local: a review of recent human adaptation. Science. 2016;354:54–59. doi: 10.1126/science.aaf5098.
- 39. Wolfe ND, Dunavan CP, Diamond J. Origins of major human infectious diseases. Nature. 2007;447:279–283. doi: 10.1038/nature05775.
- 40. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, García Girón C, Hourlier T, et al. The Ensembl gene annotation system. Database. 2016;2016:baw093. doi: 10.1093/database/baw093.
- 41. McCarthy FM, Wang N, Magee GB, Nanduri B, Lawrence ML, Camon EB, Barrell DG, Hill DP, Dolan ME, Williams WP, et al. AgBase: a functional genomics resource for agriculture. BMC Genomics. 2006;7:229. doi: 10.1186/1471-2164-7-229.
- 42. Roussos P, Guennewig B, Kaczorowski DC, Barry G, Brennand KJ. Activity-dependent changes in gene expression in schizophrenia human-induced pluripotent stem cell neurons. JAMA Psychiatry. 2016;73:1180–1188. doi: 10.1001/jamapsychiatry.2016.2575.
- 43. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 44. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525–527. doi: 10.1038/nbt.3519.
- 45. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 46. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
- 47. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.