Abstract
To account for sex as a biological variable, it is sometimes necessary to identify the sex of an embryo or embryonic cell that was used to generate libraries for RNA sequencing, without the sex being known a priori. The preferred approach for this would take advantage of the mRNA data, rather than relying on other methods that require separation and analysis of genomic DNA or diversion of limiting RNA for other assays. We describe here a method that has been optimized for this purpose in samples of rhesus monkey and mouse embryos. This method is broadly applicable to any species for which a sufficiently well characterized genome and knowledge of polymorphisms are available, and for embryos that are transcriptionally active and expressing their genome.
Keywords: embryo sexing, nonhuman primate, RNAseq
INTRODUCTION
The advent of efficient methods for production of sequencing-ready cDNA libraries from samples as small as single cells provides powerful new opportunities for discovering essential gene regulatory responses in such samples. These methods offer quantitative resolution and an in-depth view of splicing variants and other RNA features that were previously unattainable for reproductive and developmental biologists studying mammalian oocytes, preimplantation stage embryos, or single somatic cells. With the ability to determine gene expression profiles in single embryos or cells has come the need to discern the sex of each individual sample, so that sex as a biological variable can be addressed. The preferred approach would be to make this determination on the basis of the mRNA expression data, rather than relying on separate genetic testing that would require separation of the cellular DNA. PCR has previously been used to determine the sex of cDNA libraries. For example, PCR of Sry, Eif2s3y, or other male-specific markers (1, 10, 12) and of Xist (female-specific) (1, 6) was used in mouse. But when the cDNA library is to be processed for deep sequencing, reserving an aliquot of the library and performing a separate PCR analysis can be avoided. We report here the development and validation of a new method for determining the sex of single embryos or cells from which RNA-Seq libraries are produced and demonstrate its applicability in rhesus monkey and mouse embryos.
We investigated three potential indicators of sex in RNA-Seq data sets for the rhesus monkey: expression of male-specific Y-chromosome linked mRNAs, expression of female-specific X-chromosome linked mRNAs, and heterozygosity for X-chromosome single nucleotide polymorphisms (SNPs) to reveal embryos with two copies of the X chromosome. For mouse embryos, we used a combination of Xist expression, Y-chromosome marker gene expression, and X-chromosome heterozygosity. We identified a combination of measurements that allows unambiguous determination of sex of individual embryos used to generate RNA-Seq libraries, so long as the embryos are expressing their own genomes (Fig. 1).
METHODS
Data sets employed.
Two series of libraries in Gene Expression Omnibus (GEO) were selected for the analysis and validation phase of the study for rhesus monkey samples (GSE85377 (9) and GSE96731 (7)). These data sets were derived from rhesus monkey brain tissue, with information about animals’ sex and with sufficient sequencing depth to quantify the necessary X and Y chromosome expressed mRNAs. GSE85377 contains libraries from both female and male rhesus monkeys and was used to identify genes with sex-specific expression and X chromosome SNPs appropriate for detection of biallelic expression. GSE96731 contains only libraries from male rhesus monkeys and was used for validation.
To demonstrate applicability of the sexing method once validated, we used rhesus monkey oocyte and preimplantation embryo samples from three collections of libraries. Two of these were previously described. GEO series GSE86938 (15) consists of six oocyte (3 × GV, and 3 × MII oocytes) and 20 embryo derived libraries (3 × pronucleate, 3 × 2-cell, 2 × 4-cell, 5 × 8-cell, 3 × morula, and 4 × blastocyst embryos). GEO series GSE103313 (4) consists of seven oocyte- (2 × GV, 2 × MI, and 3 × MII oocytes) and 15 embryo-derived libraries (3 × pronucleate, 2 × 2-cell, 3 × 4-cell, 2 × 8-cell, 3 × morula, and 2 × blastocyst embryos). In addition to the two published data sets, we used a collection of eight oocyte- (8 × MII) and 36 embryo-derived libraries (11 × 8-cell, 10 × morula, 2 × delayed blastocyst, and 13 × expanded blastocyst embryos) that we generated using the NuGen Ovation RNA-Seq System v2 kit (NuGen, San Carlos, CA), followed by amplification, barcoding, and multiplex sequencing on the Illumina 4000 platform as described (11). These data will be made available in GEO.
For mouse embryo studies, we used data from GEO series GSE80810. Sex of these libraries was previously determined by comparing expression of Xist and Eif2s3y with PCR (1).
Data processing and SNP analysis.
Libraries were aligned to the Mmul8.0.1 genome assembly with HISAT2 (8). The expression of individual genes on the X and Y chromosomes was quantified by the number of reads aligned to each gene's exons, expressed in RPKM (reads per kilobase of transcript per million mapped reads) to account for differences in library sizes and in transcript lengths. Male-specific genes were identified as those with a high ratio of the lowest expression among male libraries to the highest expression among female libraries. Similarly, female-specific genes were identified as those with a high ratio of the lowest expression among female libraries to the highest expression among male libraries.
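The two quantities described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the small `pseudocount` (added only to avoid division by zero when the other sex shows no reads) are assumptions introduced here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = exon_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return read_count / kilobases / millions


def specificity_ratio(expr_in_sex, expr_in_other_sex, pseudocount=0.01):
    """Lowest expression in the sex of interest divided by the highest
    expression in the other sex. `pseudocount` is a hypothetical guard
    against division by zero; it is not part of the published method."""
    return min(expr_in_sex) / (max(expr_in_other_sex) + pseudocount)


# Example: 1,000 reads on 2 kb of exons in a library of 10 million mapped reads.
print(rpkm(1000, 2000, 10_000_000))

# Example with made-up values: a gene expressed in every male library and
# nearly absent from every female library yields a large ratio.
print(specificity_ratio([160.0, 230.0, 270.0], [0.0, 0.2, 0.3]))
```

A gene ranks as male-specific when this ratio is large, which requires both consistent expression in every male library and near-absence in every female library.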
To examine X-linked SNP loci, we identified 746 SNPs on the X chromosome located in the exons of genes previously reported to escape X-inactivation in humans (3). We quantified the reads aligned at these SNPs in female and male rhesus monkey brain RNA-Seq libraries (GSE85377) and excluded SNP loci with fewer than 10 aligned reads. For each library and each SNP locus that passed this depth-of-coverage threshold, we calculated a “heterozygosity score” that we defined as $\mathrm{HetScore} = -\sum_{b \in \{A,C,G,T\}} f(b)\,\log_2 f(b)$. HetScore is based on Shannon’s entropy, where $f(b)$ is the frequency of base $b$ at the locus and the term $f(b)\,\log_2 f(b)$ is taken to be 0 when $f(b) = 0$. It is an indicator of two alleles being equally expressed at a SNP locus in one library. Its value is 0 for SNPs with only one observed allele and 1 for SNPs where two alleles are observed at the same frequency of 0.5. In theory, for male-derived libraries all X chromosome SNPs should have a HetScore = 0, and for female-derived libraries a substantial number of X chromosome SNPs should have a HetScore ≈ 1. For the selected SNPs we used the ranking formula $\sum_{\text{female libraries}} \mathrm{HetScore} - 5 \cdot \sum_{\text{male libraries}} \mathrm{HetScore}$ to find the SNPs with the highest HetScores in female libraries and the lowest HetScores in male libraries. A subset of SNPs was thus selected for inclusion in the sexing protocol.
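The heterozygosity score and the SNP ranking formula can be illustrated with a short sketch. It assumes that per-locus base counts have already been extracted from the alignments; the function names and the dictionary input format are hypothetical, while the 10-read depth filter and the penalty factor of 5 come from the text.

```python
import math


def het_score(base_counts, min_depth=10):
    """Shannon entropy (base 2) of base frequencies at one SNP locus.

    `base_counts` maps each base (A, C, G, T) to its read count.
    Returns None when the locus has fewer than `min_depth` reads.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None
    score = 0.0
    for count in base_counts.values():
        if count > 0:  # a base with zero reads contributes nothing
            f = count / depth
            score -= f * math.log2(f)
    return score


def snp_rank_score(female_scores, male_scores, male_penalty=5):
    """Sum of HetScores over female libraries minus 5x the sum over male
    libraries; SNPs that look heterozygous in males are pushed down."""
    return sum(female_scores) - male_penalty * sum(male_scores)


print(het_score({"A": 10, "G": 10}))   # two alleles at equal frequency
print(het_score({"A": 20}))            # a single observed allele
print(het_score({"A": 3, "G": 2}))     # below the depth threshold
```

Because the sum runs over four bases, the score could in principle exceed 1 if three or four bases were observed at a locus; for a biallelic SNP it stays between 0 and 1.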
RESULTS
Our overall strategy for sexing of embryos from which individual RNA-Seq libraries were obtained was to employ a series of criteria based on expression of Y-linked and X-linked genes in a hierarchical fashion. This strategy consists of examining three parameters of gene expression in the RNA-Seq profile: expression of Y-linked marker genes, expression of XIST homolog, and expression of two alleles for X-linked gene SNPs. These data are interpreted according to the decision tree shown (Fig. 1). This approach avoids basing sex determination solely on negative results (e.g., using a single Y-linked marker would require absence of detection to be taken as evidence of an XX genotype). Measurements were made in the following order: 1) quantitation of Y-linked markers as evidence of an XY genotype; 2) quantification of human XIST homolog as evidence of an XX genotype; 3) determination of heterozygosity at X-linked SNPs as evidence of an XX genotype. For initial validation of the method, we first developed the assays for these three steps and validated them on RNA-Seq data sets of rhesus monkey brain tissue where sex of samples was known. We then applied the method to rhesus monkey oocytes as additional validation and a test of the limits of the assay. Finally, the validated method was then used on rhesus monkey preimplantation embryos of different stages to demonstrate application and efficiency and to determine assay limitations.
Validation of three-step assay on samples of known sex.
We first used libraries in series GSE85377 to identify the genes that are expressed from the Y chromosome and free of confounding homologous non-Y sequences. We identified RPS4Y1 and RPS4Y2 as two male-specific Y-linked transcripts that were expressed in all male-derived libraries (Table 1). Expression levels were 159–271 RPKM for RPS4Y1 and 69–113 RPKM for RPS4Y2. Expression signal was negligible in the female-derived libraries (<0.3 RPKM for RPS4Y1 and <0.1 RPKM for RPS4Y2), representing a low level of incorrect alignment.
Table 1.
Study | Library ID | Sample Type | Aligned Reads, M | XIST FPKM | RPS4Y1 FPKM | RPS4Y2 FPKM | CountHet | SNPs w/≥10 Reads, n |
---|---|---|---|---|---|---|---|---|
GSE85377 | SRR4015393 | F CC | 30.54 | 43.7 | 0.0 | 0.1 | 10 | 25 |
GSE85377 | SRR4015394 | M CC | 43.69 | 0.0 | 271.5 | 113.1 | | 25 |
GSE85377 | SRR4015395 | F CC | 41.12 | 52.2 | 0.3 | 0.1 | 13 | 25 |
GSE85377 | SRR4015396 | M CC | 39.54 | 0.0 | 230.4 | 107.4 | | 25 |
GSE85377 | SRR4015397 | F CC | 35.16 | 35.8 | 0.2 | 0.0 | 8 | 25 |
GSE85377 | SRR4015398 | M CC | 40.26 | 0.0 | 243.6 | 109.5 | | 25 |
GSE85377 | SRR4015399 | F CC | 36.81 | 33.9 | 0.0 | 0.1 | 7 | 25 |
GSE85377 | SRR4015400 | M CC | 37.17 | 0.0 | 228.5 | 112.9 | | 25 |
GSE85377 | SRR4187444 | M OC | 47.82 | 0.0 | 159.4 | 68.6 | | 25 |
GSE85377 | SRR4187445 | F OC | 47.03 | 18.6 | 0.0 | 0.0 | 10 | 25 |
GSE85377 | SRR4187446 | F OC | 40.38 | 16.5 | 0.0 | 0.1 | 12 | 25 |
GSE85377 | SRR4187447 | M OC | 39.32 | 0.0 | 218.8 | 82.3 | | 25 |
GSE85377 | SRR4187448 | F OC | 41.39 | 22.1 | 0.0 | 0.1 | 7 | 25 |
GSE85377 | SRR4187449 | M OC | 31.98 | 0.0 | 212.7 | 88.0 | | 25 |
GSE85377 | SRR4187450 | F OC | 51.03 | 19.6 | 0.0 | 0.0 | 7 | 25 |
GSE85377 | SRR4187451 | M OC | 43.82 | 0.0 | 216.4 | 88.1 | | 25 |
GSE96731 | SRR5351411 | M CA 32 | 8.08 | 0.0 | 78.9 | 38.7 | | 23 |
GSE96731 | SRR5351412 | M CA 32 | 9.01 | 0.0 | 78.5 | 37.8 | | 20 |
GSE96731 | SRR5351413 | M CA 32 | 9.42 | 0.0 | 39.7 | 18.3 | | 17 |
GSE96731 | SRR5351414 | M CA 32 | 7.64 | 0.0 | 46.5 | 26.9 | | 17 |
GSE96731 | SRR5351415 | M CA 32 | 7.80 | 0.0 | 67.1 | 39.1 | | 21 |
GSE96731 | SRR5351416 | M CA 32 | 9.15 | 0.0 | 39.1 | 21.7 | | 13 |
GSE96731 | SRR5351417 | M CA 32 | 7.55 | 0.0 | 41.2 | 20.0 | | 15 |
GSE96731 | SRR5351418 | M CA 32 | 9.15 | 0.0 | 70.8 | 32.6 | | 21 |
FPKM, fragments per kilobase of transcript per million mapped reads; SNP, single nucleotide polymorphism; F, female; M, male. Sample types: cingulate cortex (CC), occipital cortex (OC), hippocampus (CA).
We next sought X-linked transcripts that would be diagnostic of an XX genotype. One potential marker is the X-linked XIST RNA, which exhibits female-specific transcription in mice. XIST is not annotated in the current rhesus monkey Mmul8.0.1 genome assembly, but LOC106995245 appears to be its homolog, both in terms of sequence similarity and the position relative to other genes on X chromosome. This RNA displays high expression in all female-derived libraries (17–52 RPKM) and negligible expression in male-derived libraries (< 0.02 RPKM, Table 1).
Because XIST expression has been reported in human preimplantation stage embryos and may not be entirely female specific (5, 13), we sought a second means of revealing an XX genotype in the RNA-Seq data: the presence of two alleles of SNPs in genes that escape X-chromosome inactivation and that are expressed in embryos at a high enough level to enable discernment between XX and XY individuals. We identified 25 SNPs satisfying these criteria (Table 2). For classification of new libraries, we defined CountHet as the number of SNPs (out of the selected 25) with HetScore ≥ 0.75. A positive (above 0) CountHet is an indication that the library is of female origin, and the confidence increases as CountHet increases. CountHet scores ranged from 7 to 13 for the known female brain samples and remained zero for the male samples (Table 1).
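As a rough illustration of the CountHet definition, the sketch below counts the loci whose score meets the 0.75 threshold given in the text. The function names are hypothetical, and loci that failed the depth filter are represented here as `None`, which is an assumption about how missing values might be carried.

```python
import math


def binary_het_score(minor_allele_fraction):
    """HetScore for a biallelic locus, given the minor-allele read fraction."""
    score = 0.0
    for f in (minor_allele_fraction, 1.0 - minor_allele_fraction):
        if f > 0:
            score -= f * math.log2(f)
    return score


def count_het(het_scores, threshold=0.75):
    """Number of selected SNPs with HetScore >= threshold.
    Entries of None (locus below the read-depth cutoff) are skipped."""
    return sum(1 for s in het_scores if s is not None and s >= threshold)


# For a biallelic SNP, a minor-allele fraction of 0.25 scores about 0.81
# (counted), whereas a fraction of 0.20 scores about 0.72 (not counted).
scores = [binary_het_score(x) for x in (0.5, 0.25, 0.20, 0.0)] + [None]
print(count_het(scores))
```

In practice the threshold therefore tolerates moderately unequal allele expression while rejecting the very low minor-allele fractions expected from sequencing errors.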
Table 2.
Locus (Gene Symbol) | ||
---|---|---|
X:9,306,730 (CLCN4) | X:43,833,732 (LOC708371) | X:79,961,444 (CHM) |
X:9,306,973 (CLCN4) | X:47,379,911 (CDK16) | X:79,963,004 (CHM) |
X:12,888,480 (TCEANC) | X:47,382,274 (CDK16) | X:79,963,019 (CHM) |
X:12,989,092 (GPM6B) | X:52,647,563 (IQSEC2) | X:79,963,546 (CHM) |
X:12,989,794 (GPM6B) | X:52,734,504 (IQSEC2) | X:87,721,222 (NAP1L3) |
X:13,235,718 (GEMIN8) | X:52,830,865 (SMC1A) | X:87,722,429 (NAP1L3) |
X:15,143,891 (AP1S2) | X:73,342,193 (ITM2A) | X:147,424,720 (HCFC1) |
X:15,144,734 (AP1S2) | X:73,342,285 (ITM2A) | |
X:16,190,433 (RBBP7) | X:75,171,614 (SH3BGRL) |
We performed careful selection of SNPs to prevent male-derived libraries being called female. We observed some SNPs with HetScores slightly above 0 (indicative of possible heterozygosity), but with a minor allele observed at a very low frequency, for which the most likely explanation is that this was due to sequencing errors. However, we also observed some SNPs with high HetScores (≈1) in male brain-derived libraries. SNPs were therefore penalized if they obtained a high HetScore in any of the male libraries to push them down in the ranking of SNPs and effectively exclude them from further use. We carefully examined aligned reads to exclude sequencing errors or high-mismatch alignments as possible causes of observing such SNPs in male-derived libraries. Furthermore, we performed alignment to the whole genome but could not find any similar sequences. Consequently, the only explanation for SNPs with high HetScores in male libraries is the existence of highly similar gene(s) in parts of genome that are not included in the assembly.
Based on the results from analysis of the brain samples of known sex, our proposed sexing strategy was to assign sex to embryo samples as follows: 1) if RPS4Y1 and RPS4Y2 are expressed (at nonnegligible level), the library is male-derived regardless of expression of XIST, unless CountHet >0, in which case the embryo sex cannot be unambiguously determined for that sample, and 2) if RPS4Y1 and RPS4Y2 are not expressed, and expression of XIST is substantial and/or CountHet >0, the library is female derived. Embryos not conforming to either of these two situations were judged questionable.
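The assignment rule just described can be written as a small function. This is a sketch under stated assumptions: the text does not give numeric cutoffs for "nonnegligible" or "substantial" expression, so the three thresholds below (`y_expressed`, `y_negligible`, `xist_substantial`, in FPKM) are hypothetical placeholders that would need tuning per data set, and they will not reproduce every call made by inspection in the study.

```python
def call_sex(rps4y1, rps4y2, xist, count_het=0,
             y_expressed=10.0, y_negligible=1.0, xist_substantial=1.0):
    """Return "M", "F", or "?" from marker expression (FPKM) and CountHet.

    All three thresholds are illustrative assumptions, not published values.
    """
    het_positive = count_het is not None and count_het > 0
    y_on = rps4y1 >= y_expressed and rps4y2 >= y_expressed
    y_off = rps4y1 <= y_negligible and rps4y2 <= y_negligible

    if y_on:
        # Rule 1: Y markers expressed means male regardless of XIST,
        # unless biallelic X-linked SNPs contradict that call.
        return "?" if het_positive else "M"
    if y_off and (xist >= xist_substantial or het_positive):
        # Rule 2: no Y expression plus XIST and/or heterozygous X SNPs.
        return "F"
    # Everything else, including intermediate Y expression, is questionable.
    return "?"


print(call_sex(271.5, 113.1, 0.0, 0))   # brain library with high Y markers
print(call_sex(0.3, 0.1, 52.2, 13))     # brain library with XIST and CountHet
print(call_sex(4.6, 7.6, 48.7, 0))      # Y markers in the ambiguous range
```

Keeping a gap between the "expressed" and "negligible" cutoffs is a deliberate design choice: libraries whose Y-marker signal falls between noise and real expression are left uncalled rather than forced into either category.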
Assay application and limits in oocytes and preimplantation embryos.
The ability of the above three-step assay and individual markers to allow sexing of embryos will depend on level of expression and aspects of library quality that may selectively affect diagnostic RNA expression. Analyses of oocytes (Table 3), which are known to be of female origin, showed the expected negligible expression of RPS4Y1 and RPS4Y2 mRNAs. XIST expression was not detected in GSE86938 or GSE103313 oocyte libraries but was detected in our in-house oocyte libraries (≈6.5 RPKM). This difference in detection of mRNAs may be due to differences in protocols for RNA-Seq library preparation, and in the case of GSE86938 a generally lower depth of coverage in that data set (average of 6.6M aligned reads, compared with 16.3M for oocyte libraries in GSE103313 and 31.3M for oocyte libraries in our collection). CountHet scores were informative for six of the oocyte samples in GSE103313, and not for many other oocyte samples. The limited depth of coverage for GSE86938 may have contributed to this, but this explanation does not hold for the other sets of libraries, where SNP-bearing mRNAs were detected at reasonable frequencies (up to 7 SNPs with ≥10 reads). Low XIST expression and low expression of RPS4Y1 and RPS4Y2 mRNAs were also seen in pronucleate, 2-cell and 4-cell stage embryos (Table 3). Positive CountHet results were obtained for seven of eight pronucleate to 4-cell stage samples, again echoing results for oocytes in GSE103313. The similarity between these early stages and oocytes was expected due to the occurrence of the major embryonic transcription activation event at the 6–8 cell stage in primates (2, 4, 14), which a priori renders RNA-Seq-based sexing untenable at those early stages.
Table 3.
Study | Library ID | Sample Type | Aligned Reads, M | XIST FPKM | RPS4Y1 FPKM | RPS4Y2 FPKM | CountHet | SNPs w/≥10 Reads, n* |
---|---|---|---|---|---|---|---|---|
GSE86938 oocytes | SRR4242895 | GV | 6.83 | 0.0 | 1.3 | 0.4 | | 0 |
GSE86938 oocytes | SRR4242896 | GV | 6.87 | 0.0 | 0.3 | 0.0 | | 0 |
GSE86938 oocytes | SRR4242897 | GV | 7.05 | 0.0 | 1.0 | 0.0 | | 0 |
GSE86938 oocytes | SRR4242898 | MII | 7.97 | 0.0 | 0.3 | 0.0 | | 0 |
GSE86938 oocytes | SRR4242899 | MII | 5.03 | 0.0 | 0.5 | 0.0 | | 0 |
GSE86938 oocytes | SRR4242900 | MII | 5.68 | 0.0 | 0.0 | 0.0 | | 0 |
GSE103313 oocytes | SRR5991098 | GV | 16.45 | 0.0 | 0.0 | 0.0 | 1 | 3 |
GSE103313 oocytes | SRR5991099 | GV | 17.74 | 0.0 | 0.0 | 0.0 | 3 | 3 |
GSE103313 oocytes | SRR5991100 | MI | 10.74 | 0.0 | 0.6 | 0.2 | 2 | 4 |
GSE103313 oocytes | SRR5991101 | MI | 15.22 | 0.0 | 0.1 | 0.2 | 2 | 5 |
GSE103313 oocytes | SRR5991102 | MII | 18.68 | 0.0 | 0.0 | 0.0 | 1 | 3 |
GSE103313 oocytes | SRR5991103 | MII | 14.76 | 0.0 | 0.0 | 0.0 | | 1 |
GSE103313 oocytes | SRR5991104 | MII | 20.41 | 0.0 | 0.0 | 0.0 | 1 | 2 |
In-house oocytes | 020516.05 | MII | 27.11 | 6.9 | 0.0 | 0.0 | 1 | 7 |
In-house oocytes | 020516.06 | MII | 34.28 | 0.9 | 0.1 | 0.0 | 1 | 7 |
In-house oocytes | 021916.01 | MII | 35.15 | 8.6 | 0.1 | 0.0 | | 7 |
In-house oocytes | 021916.02 | MII | 31.07 | 4.5 | 0.0 | 0.0 | | 5 |
In-house oocytes | 110515.02 | MII | 29.83 | 6.3 | 0.2 | 0.3 | | 7 |
In-house oocytes | 111815.02 | MII | 30.28 | 2.0 | 0.2 | 0.2 | | 5 |
In-house oocytes | 120115.01 | MII | 30.49 | 10.6 | 0.0 | 0.0 | | 6 |
In-house oocytes | 120115.02 | MII | 31.98 | 11.9 | 0.0 | 0.0 | | 6 |
GSE86938 early embryos | SRR4242901 | PN | 5.61 | 0.0 | 0.4 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242902 | PN | 6.50 | 0.0 | 0.0 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242903 | PN | 6.60 | 0.0 | 0.0 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242904 | 2C | 4.93 | 0.0 | 0.0 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242905 | 2C | 5.75 | 0.0 | 0.4 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242906 | 2C | 7.46 | 0.0 | 0.3 | 0.7 | | 0 |
GSE86938 early embryos | SRR4242907 | 4C | 5.47 | 0.1 | 0.0 | 0.0 | | 0 |
GSE86938 early embryos | SRR4242908 | 4C | 4.41 | 0.0 | 0.5 | 0.0 | | 0 |
GSE103313 early embryos | SRR5991105 | PN | 8.00 | 0.0 | 0.0 | 0.0 | 1 | 5 |
GSE103313 early embryos | SRR5991106 | PN | 5.18 | 0.0 | 0.0 | 0.2 | 1 | 5 |
GSE103313 early embryos | SRR5991107 | PN | 8.70 | 0.0 | 0.9 | 0.9 | 1 | 6 |
GSE103313 early embryos | SRR5991108 | 2C | 12.31 | 0.0 | 0.2 | 0.4 | 1 | 6 |
GSE103313 early embryos | SRR5991109 | 2C | 8.30 | 0.0 | 0.0 | 0.0 | 1 | 4 |
GSE103313 early embryos | SRR5991110 | 4C | 10.37 | 0.0 | 0.3 | 0.1 | 1 | 4 |
GSE103313 early embryos | SRR5991111 | 4C | 7.24 | 0.0 | 0.5 | 1.2 | 1 | 4 |
GSE103313 early embryos | SRR5991112 | 4C | 11.14 | 0.0 | 0.1 | 0.1 | | 5 |
Sample types are germinal vesicle stage oocyte (GV), meiosis metaphase I and II stage oocytes (MI and MII), and pronucleate 1-cell stage (PN), two-cell (2C), and four-cell (4C) stage embryos.
*Series GSE86938 has low depth of coverage, limiting detection of mRNAs with informative SNPs.
Beginning at the 8-cell stage, RNA-Seq-based sexing should become feasible, and reliability should increase with progressive development. This expectation was borne out (Table 4), but differences between brain and oocyte/embryo samples were evident. We note that there were slightly more embryos scored as male than female (24 vs. 19) from the 8-cell stage onward, but this difference was not statistically significant.
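The text does not state which test was used to compare the numbers of male and female calls. One simple option, shown here purely as an assumption, is an exact two-sided binomial test of 24 male calls out of 43 sexed embryos against an expected proportion of 0.5. The function name is hypothetical and only the Python standard library is used.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are included despite rounding.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


p_value = binom_two_sided_p(24, 24 + 19)
print(p_value > 0.05)  # True: no evidence of a skewed sex ratio
```

With a difference this small relative to the sample size, the p-value is well above 0.05, in line with the statement that the difference was not significant.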
Table 4.
Study | Library ID | Sample Type | Aligned Reads, M | XIST FPKM | RPS4Y1 FPKM | RPS4Y2 FPKM | CountHet | SNPs w/≥10 Reads, n* | Deduced Sex |
---|---|---|---|---|---|---|---|---|---|
GSE86938 | SRR4242909 | 8C | 5.35 | 0.4 | 5.6 | 1.9 | | 0 | ? |
GSE86938 | SRR4242910 | 8C | 5.58 | 0.1 | 0.4 | 0.0 | | 0 | ? |
GSE86938 | SRR4242911 | 8C | 5.82 | 0.2 | 0.8 | 17.8 | | 0 | M |
GSE86938 | SRR4242912 | 8C | 4.17 | 0.0 | 0.5 | 0.6 | | 0 | ? |
GSE86938 | SRR4242913 | 8C | 4.48 | 0.0 | 0.3 | 0.0 | | 0 | ? |
GSE86938 | SRR4242914 | MOR | 5.94 | 18.8 | 0.8 | 0.0 | | 0 | F |
GSE86938 | SRR4242915 | MOR | 6.50 | 7.5 | 1734.3 | 2006.4 | | 0 | M |
GSE86938 | SRR4242916 | MOR | 4.66 | 27.3 | 33.1 | 31.5 | | 0 | ? |
GSE86938 | SRR4242917 | BLA | 7.24 | 1.8 | 3028.2 | 2599.0 | | 0 | M |
GSE86938 | SRR4242918 | BLA | 5.36 | 2.2 | 3013.7 | 1583.6 | | 0 | M |
GSE86938 | SRR4242919 | BLA | 8.82 | 1.9 | 3125.4 | 1396.0 | | 0 | M |
GSE86938 | SRR4242920 | BLA | 7.52 | 3.6 | 3016.7 | 1659.0 | | 0 | M |
GSE103313 | SRR5991113 | 8C | 15.08 | 0.5 | 0.2 | 0.3 | | 7 | ? |
GSE103313 | SRR5991114 | 8C | 3.46 | 0.0 | 3.0 | 34.7 | 1 | 1 | ? |
GSE103313 | SRR5991115 | MOR | 10.28 | 3.3 | 1009.4 | 1367.5 | | 3 | M |
GSE103313 | SRR5991116 | MOR | 14.12 | 7.2 | 307.8 | 340.3 | | 0 | M |
GSE103313 | SRR5991117 | MOR | 12.15 | 2.5 | 0.0 | 0.0 | | 3 | F |
GSE103313 | SRR5991118 | BLA | 9.59 | 0.7 | 0.0 | 0.0 | 1 | 7 | F |
GSE103313 | SRR5991119 | BLA | 8.34 | 7.3 | 1488.0 | 1049.5 | | 4 | M |
In-house | 011516.02 | 8C | 13.88 | 33.6 | 0.0 | 0.0 | | 1 | F |
In-house | 011516.03 | 8C | 24.20 | 1.8 | 0.0 | 0.1 | | 3 | F |
In-house | 020816.06 | 8C | 23.09 | 19.7 | 39.3 | 30.8 | | 2 | M |
In-house | 020816.07 | 8C | 32.18 | 19.0 | 56.3 | 28.9 | | 5 | M |
In-house | 020816.08 | 8C | 8.50 | 5.9 | 24.9 | 72.5 | | 2 | M |
In-house | 020816.09 | 8C | 22.02 | 3.5 | 0.1 | 0.0 | | 4 | F |
In-house | 020816.10 | 8C | 25.47 | 90.0 | 0.1 | 0.0 | | 5 | F |
In-house | 112015.01 | 8C | 25.76 | 0.2 | 0.0 | 0.0 | | 4 | ? |
In-house | 112015.02 | 8C | 27.18 | 0.3 | 0.0 | 4.5 | | 5 | ? |
In-house | 112815.01 | 8C | 12.61 | 9.4 | 0.0 | 0.0 | | 4 | F |
In-house | 112815.02 | 8C | 11.58 | 16.6 | 68.4 | 73.4 | | 3 | M |
In-house | 020816.02 | MOR | 9.80 | 10.6 | 45.7 | 97.0 | | 0 | M |
In-house | 020816.03 | MOR | 8.22 | 13.0 | 103.0 | 52.7 | | 2 | M |
In-house | 020816.04 | MOR | 25.46 | 17.7 | 46.5 | 14.4 | | 6 | M |
In-house | 021016.02 | MOR | 11.55 | 14.9 | 164.1 | 162.5 | | 4 | M |
In-house | 021016.03 | MOR | 11.93 | 55.5 | 0.2 | 0.0 | | 4 | F |
In-house | 021016.04 | MOR | 12.10 | 41.6 | 0.0 | 0.0 | | 1 | F |
In-house | 022416.02 | MOR | 13.39 | 25.4 | 0.0 | 0.1 | | 2 | F |
In-house | 022416.03 | MOR | 13.87 | 36.3 | 0.0 | 0.0 | | 2 | F |
In-house | 022916.01 | MOR | 8.88 | 48.7 | 4.6 | 7.6 | | 4 | ? |
In-house | 022916.03 | MOR | 24.75 | 7.0 | 0.1 | 0.0 | | 5 | F |
In-house | 112515.01 | DEL. BLA | 8.23 | 2.4 | 183.8 | 120.2 | | 1 | M |
In-house | 112515.02 | DEL. BLA | 7.70 | 42.4 | 0.4 | 0.2 | 1 | 5 | F |
In-house | 111215.01 | EXP. BLA | 9.50 | 8.2 | 152.2 | 100.8 | | 2 | M |
In-house | 111215.02 | EXP. BLA | 17.76 | 11.7 | 260.2 | 153.4 | | 7 | M |
In-house | 111215.03 | EXP. BLA | 9.08 | 153.2 | 0.3 | 0.0 | | 2 | F |
In-house | 111715.01 | EXP. BLA | 9.42 | 118.8 | 0.5 | 0.3 | | 5 | F |
In-house | 111715.02 | EXP. BLA | 9.77 | 20.3 | 165.1 | 73.8 | | 3 | M |
In-house | 111715.03 | EXP. BLA | 8.02 | 70.3 | 2.8 | 8.4 | | 3 | ? |
In-house | 111715.04 | EXP. BLA | 11.50 | 79.2 | 0.1 | 0.1 | | 4 | F |
In-house | 112415.01 | EXP. BLA | 9.36 | 5.9 | 115.6 | 59.4 | | 2 | M |
In-house | 112415.02 | EXP. BLA | 9.69 | 66.8 | 0.6 | 0.4 | | 3 | F |
In-house | 112415.03 | EXP. BLA | 11.76 | 9.2 | 73.3 | 80.5 | | 2 | M |
In-house | 112415.04 | EXP. BLA | 10.70 | 81.9 | 2.5 | 4.4 | | 1 | ? |
In-house | 112515.03 | EXP. BLA | 14.05 | 18.7 | 45.6 | 55.0 | | 1 | M |
In-house | 112515.04 | EXP. BLA | 12.98 | 52.9 | 0.0 | 0.0 | | 1 | F |
Sample types are eight-cell (8C), morula (MOR), and blastocyst (BLA) stage embryos, delayed blastocysts (DEL. BLA), and expanded blastocysts (EXP. BLA).
*Series GSE86938 has low depth of coverage, limiting detection of mRNAs with informative SNPs.
One key difference between the brain and embryo libraries is that, whereas XIST was expressed exclusively in female-derived brain libraries, it was detectable in some embryos with high levels of RPS4Y1 and RPS4Y2 expression (i.e., male embryos), albeit at lower levels compared with samples flagged as female by X-linked SNP markers. Additionally, several libraries labeled as 8-cell stage appear to be derived from early 8-cell stage embryos, lacking sufficient expression of XIST, RPS4Y1, or RPS4Y2 to be judged as male or female. Where the sequencing depth was insufficient to apply the SNP-based markers, these samples were left uncharacterized for sex. Most of the remaining libraries fell into one of two categories: 1) high expression of RPS4Y1 and RPS4Y2 and low expression of XIST, which we characterized as male; and 2) expression of XIST and no (or negligible) expression of RPS4Y1 and RPS4Y2, which we characterized as female. In no library did a positive HetScore result contradict a male designation, even when a low XIST signal accompanied high Y-linked gene expression. Some libraries fell outside of these two categories, and we left them uncharacterized. SRR5991114 (8-cell), with high expression of RPS4Y2 and suspiciously low expression of RPS4Y1, also had biallelic expression in one of the 25 selected X-chromosome SNPs. SRR4242916 (morula) has expression of RPS4Y1 and RPS4Y2 that is not negligible but is orders of magnitude lower than in the other male morula-derived library in the same series. Several libraries (022916.01, 111715.03, 112415.04) showed high XIST expression together with RPS4Y1 and RPS4Y2 expression in the ambiguous range between negligible (noise) and substantial (real expression); lacking SNP-based evidence, we were not confident in characterizing these libraries as female derived.
Overall, 43 of 55 samples from 8-cell, morula, and blastocyst stages were successfully identified as male or female, for an overall success rate of 78%. Excluding the seven early 8-cell stage embryos, the success rate is 90% for embryos that were likely to be fully transcriptionally active.
Sexing of murine embryos based on RNA-Seq transcriptome profiles.
We applied a similar methodology to sexing of mouse embryo-derived RNA-Seq libraries (1). We made several small adjustments to the methodology: we used Eif2s3y as the male (Y-chromosome) marker instead of RPS4Y1/RPS4Y2, and we calculated HetScore directly from the reported (paternal) allelic ratio = paternal reads/(paternal + maternal) reads, by using the formula $\mathrm{HetScore} = -\sum_{b \in \{P,M\}} f(b)\,\log_2 f(b)$, where $f(P)$ = allelic ratio and $f(M)$ = 1 – allelic ratio. The unit used to report the level of gene expression in (1) was RPRT (reads per retro-transcribed length per million mapped reads). The definition of RPRT is essentially the same as for RPKM/FPKM, except that the read counts are normalized to the amplification size of each transcript instead of the whole transcript size, to account for the fact that the RNA reverse transcription method that was used only allowed sequencing up to an average of 3 kilobases from the 3′-untranslated region.
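For the mouse data, the score reduces to the entropy of a two-outcome distribution. The sketch below shows the calculation from paternal and maternal read counts and the per-library summary (percentage of SNPs with HetScore ≥ 0.75). Function names are hypothetical, and returning `None` for loci without reads is an assumption about how missing data might be handled.

```python
import math


def het_score_from_reads(paternal_reads, maternal_reads):
    """HetScore from allele-specific counts: entropy of the allelic ratio."""
    total = paternal_reads + maternal_reads
    if total == 0:
        return None  # no informative reads at this SNP
    ratio = paternal_reads / total  # the (paternal) allelic ratio
    score = 0.0
    for f in (ratio, 1.0 - ratio):
        if f > 0:
            score -= f * math.log2(f)
    return score


def percent_het(scores, threshold=0.75):
    """Percentage of scored SNPs with HetScore >= threshold."""
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    return 100.0 * sum(s >= threshold for s in valid) / len(valid)


# Hypothetical library: two balanced SNPs, one maternal-only, one unscored.
snps = [het_score_from_reads(50, 50), het_score_from_reads(30, 45),
        het_score_from_reads(0, 80), het_score_from_reads(0, 0)]
print(percent_het(snps))
```

A male embryo carries only the maternal X, so its allelic ratios should sit at or near 0 and the percentage should be close to zero, whereas a female embryo from a cross between divergent strains should show many balanced SNPs.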
GSE80810 consists of two groups of embryo libraries: wild-type (WT) embryos and heterozygous mutant Xist WT/KO embryos [obtained by crossing WT females and Xist knockout (KO) males]. Scatter-plots in Fig. 2A (Xist WT/WT) and Fig. 2B (Xist WT/KO) show the expression of Eif2s3y and Xist for these libraries, with markers (“o” for female and “x” for male) representing sex as determined by the PCR-based method (1). Inset scatterplots provide enhanced visualization of the region close to the origin, where libraries with lower or no expression of markers Eif2s3y and Xist are clustered.
The majority of embryos in Fig. 2A can be clearly classified as either females (high Xist expression) or males (high Eif2s3y expression). However, as seen in the inset of Fig. 2A, we observed several embryos for which the markers Xist and Eif2s3y are inconclusive: either their expression is too low, or both of them are expressed at a substantial level. Conversely, although sex can be clearly determined for a few Xist WT/KO embryos in Fig. 2B, the majority of embryos yielded inconclusive results using only Xist and Eif2s3y as markers. For these embryos, X chromosome SNP heterozygosity proved highly informative. Figure 2C shows the expression levels of the two markers for the ambiguous embryos from Fig. 2, A and B, as well as the percentages of X-chromosome SNPs for which HetScore ≥ 0.75, which help resolve the ambiguities. This is followed by the sex of the embryo as previously determined with the PCR method (Sex/PCR), as well as our call based on the RNA-Seq data (Sex/RNA-Seq). Each of our calls is supported by one or more of the following: high expression of Xist (“X”), high expression of Eif2s3y (“Y”), and a high (female) or low/negligible (male) percentage of SNPs with HetScore ≥ 0.75 (“SNPs”). The ambiguity could be resolved for most (81/87) of the listed embryos. The sex of only six embryos remained unresolved; these had both a high percentage of SNPs with HetScore ≥ 0.75 (indicating femaleness) and high expression of Eif2s3y (indicating maleness), which may indicate a technical problem with the sample or library.
Of note, for all 88 libraries that were labeled as female or male based on the high expression of Xist or Eif2s3y marker (Fig. 2, A and B), the SNP-based criterion was consistent with that classification, i.e., the percentage of SNPs with HetScore ≥ 0.75 was high for female and zero/negligible for male libraries (data not shown).
DISCUSSION
The use of a three-step process for determining embryo sex (Fig. 1) by measuring expression of Y-linked genes, assessing XIST expression, and detecting two distinct expressed alleles of X-linked genes provides a robust and reliable new way of determining the sex of single rhesus monkey or mouse embryos from which RNA-Seq libraries are created, without need of separate DNA isolation and analysis, or a separate RT-PCR analysis. We validated this method here with a mixture of published and unpublished rhesus monkey RNA-Seq libraries and published mouse embryo libraries, and demonstrate its utility for both species. Furthermore, our data indicate that approaches employed previously in rodents may not readily translate to nonhuman primates, because of differences in genome structure or genome assembly and annotation quality, highlighting the importance of developing such assays for application in nonhuman primates. Additionally, the general approach employed here should be applicable to any species for which a sufficiently well characterized genome and knowledge of SNPs are available.
We emphasize that the use of both X- and Y-linked marker genes provides high confidence in determining the sex of an embryo that yielded an RNA-Seq library. One obvious requirement for applying an RNA-Seq based assay for sexing is that the embryo must be transcriptionally active, and thus expressing RNA reflective of its own genotype. The assay identifies embryo sex on the basis of the embryo’s own transcriptome, and before transcriptional activation there is no embryonic transcriptome to assay. Because the stages of major transcriptional activation are well known for major model organisms (e.g., 2-cell stage for mouse and rat, 6–8 cell stage for other major mammalian model organisms), users will be aware of which embryo stages are affected in this manner. Moreover, an inability to identify the sex of an embryo before transcriptional activation will not constitute a significant barrier to studies that seek to determine the role of embryo sex in early embryo processes and phenotype, because before transcriptional activation, phenotype and processes are driven by the maternal (oocyte) genotype. In other words, embryo sex will likely not be a significant biological variable affecting embryo phenotype until the embryo is expressing its own genome, including its sex chromosomes.
Application of the method to a series of mouse embryo-derived libraries used a different Y-linked male marker (Eif2s3y). Interestingly, several factors likely made the sexing of these mouse embryos more efficient compared with the rhesus monkey embryos. Earlier activation of embryonic transcription in mouse embryos makes Xist and Eif2s3y markers more useful at earlier stages. A high number of heterozygous X-chromosome SNPs, due to the use of crosses between genetically distant strains (C57BL/6J and Cast/EiJ) contributed to the success of the SNP-based criterion. Notably, even though the Xist WT/KO female embryos lacked a high level of Xist expression, most of these embryos were successfully identified as female based on the SNP analysis. The SNP-based method obviously will be less useful for embryos derived by crossing less distant strains and would not be applicable for homozygous inbred embryos.
The combined use of Y-linked markers, XIST, and X-linked SNPs provides a high degree of discernment between male and female embryos. The markers can also be used individually with some level of success, but this is best done for embryos within a single set of similarly processed samples: relative levels of expression can vary between data sets, embryo stages, cell types, or tissue types, and thus a hard threshold cannot readily be identified. Additionally, a low rate of false detection of male-specific marker mRNAs in females, and low but nonzero HetScores for X-linked SNPs in male embryos, may be encountered. Prevailing background rates of error in sequencing or erroneous assignment of reads from homologous genes as marker gene transcripts make this unavoidable with the current technology. However, these technical problems are essentially negated when the three sets of markers are used together. We observed several limitations to the SNP-based approach. Aside from low or absent expression, only a fraction of SNPs are heterozygous in any given rhesus monkey embryo. Furthermore, many SNPs may not display equal allele expression due to partial X-chromosome inactivation or some other genetic factor, causing the HetScore to stay below the threshold. Therefore, the SNP-based criterion should not be used alone, as female embryo-derived libraries sequenced at insufficient depth would likely be (incorrectly) called male. Biological variation in marker gene regulation and expression may also affect the ability to unambiguously identify the sex of the embryo for occasional libraries, but overall our success rate, once embryonic transcription has begun, is quite high. The markers and three-step approach described here thus provide a valuable new tool to allow sex to be assigned to RNA-Seq libraries of single embryos.
This same approach is applicable to RNA-Seq libraries from cells or tissues of previous studies, for which donor sex is unknown or was not recorded.
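For readers applying the approach to existing data sets, the way the three sets of markers complement one another can be sketched as a simple voting scheme. This is an illustration under stated assumptions, not the exact decision procedure used in this study. The function `call_sex`, its cutoff parameters, and the numeric values are hypothetical; in practice, cutoffs would need to be chosen within each set of similarly processed samples. A low heterozygosity score is treated as uninformative rather than as evidence for male, because it may simply reflect low sequencing depth.

```python
from typing import Optional


def call_sex(
    y_expr: float,               # expression of a Y-linked marker
    xist_expr: float,            # expression of XIST/Xist
    het_score: Optional[float],  # X-linked SNP heterozygosity score, if any
    y_cut: float,                # hypothetical per-data-set cutoffs
    xist_cut: float,
    het_cut: float,
) -> str:
    """Combine three criteria into 'male', 'female', or 'ambiguous'."""
    votes = []
    # Criterion 1: Y-linked marker expression supports male.
    votes.append("M" if y_expr >= y_cut else "F")
    # Criterion 2: high XIST expression supports female.
    votes.append("F" if xist_expr >= xist_cut else "M")
    # Criterion 3: a high heterozygosity score supports female; a low or
    # missing score abstains instead of voting male.
    if het_score is not None and het_score >= het_cut:
        votes.append("F")

    male_votes = votes.count("M")
    female_votes = votes.count("F")
    if male_votes > female_votes:
        return "male"
    if female_votes > male_votes:
        return "female"
    return "ambiguous"


# Made-up values; cutoffs of 5, 5, and 0.3 are assumptions.
print(call_sex(50.0, 0.0, 0.0, 5.0, 5.0, 0.3))    # Y marker high, XIST low
print(call_sex(0.0, 80.0, 0.6, 5.0, 5.0, 0.3))    # XIST high, SNPs heterozygous
print(call_sex(0.0, 1.0, 0.5, 5.0, 5.0, 0.3))     # low XIST, SNPs heterozygous
print(call_sex(50.0, 80.0, None, 5.0, 5.0, 0.3))  # conflicting markers
```

The third call mirrors the situation of a female embryo with low Xist expression, in which the SNP evidence decides the call. The fourth shows conflicting marker evidence being reported as ambiguous rather than forced into a call.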
GRANTS
This work was supported in part by grants from the Office of Research Infrastructure Programs Division of Comparative Medicine Grants R24 OD-012221 (to K. E. Latham), and OD-011107/RR-00169 (California National Primate Research Center) and OD-010967/RR025880 (to C. A. VandeVoort), and by Michigan State University (MSU) AgBioResearch (K. E. Latham), and MSU (K. E. Latham).
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
U.M. and K.E.L. conceived and designed research; U.M. and C.A.V. performed experiments; U.M. analyzed data; U.M. interpreted results of experiments; U.M. prepared figures; U.M. and K.E.L. drafted manuscript; U.M., C.A.V., and K.E.L. edited and revised manuscript; U.M., C.A.V., and K.E.L. approved final version of manuscript.
REFERENCES
- 1. Borensztein M, Syx L, Ancelin K, Diabangouaya P, Picard C, Liu T, Liang JB, Vassilev I, Galupa R, Servant N, Barillot E, Surani A, Chen CJ, Heard E. Xist-dependent imprinted X inactivation and the early developmental consequences of its failure. Nat Struct Mol Biol 24: 226–233, 2017. doi: 10.1038/nsmb.3365.
- 2. Braude P, Bolton V, Moore S. Human gene expression first occurs between the four- and eight-cell stages of preimplantation development. Nature 332: 459–461, 1988. doi: 10.1038/332459a0.
- 3. Carrel L, Willard HF. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434: 400–404, 2005. doi: 10.1038/nature03479.
- 4. Chitwood JL, Burruel VR, Halstead MM, Meyers SA, Ross PJ. Transcriptome profiling of individual rhesus macaque oocytes and preimplantation embryos. Biol Reprod 97: 353–364, 2017. doi: 10.1093/biolre/iox114.
- 5. Daniels R, Zuccotti M, Kinis T, Serhal P, Monk M. XIST expression in human oocytes and preimplantation embryos. Am J Hum Genet 61: 33–39, 1997. doi: 10.1086/513892.
- 6. Hartshorn C, Rice JE, Wangh LJ. Developmentally-regulated changes of Xist RNA levels in single preimplantation mouse embryos, as revealed by quantitative real-time PCR. Mol Reprod Dev 61: 425–436, 2002. doi: 10.1002/mrd.10037.
- 7. Iancu OD, Colville A, Walter NA, Darakjian P, Oberbeck DL, Daunais JB, Zheng CL, Searles RP, McWeeney SK, Grant KA, Hitzemann R. On the relationships in rhesus macaques between chronic ethanol consumption and the brain transcriptome. Addict Biol 23: 196–205, 2017. doi: 10.1111/adb.12501.
- 8. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360, 2015. doi: 10.1038/nmeth.3317.
- 9. Liu S, Wang Z, Chen D, Zhang B, Tian RR, Wu J, Zhang Y, Xu K, Yang LM, Cheng C, Ma J, Lv L, Zheng YT, Hu X, Zhang Y, Wang X, Li J. Annotation and cluster analysis of spatiotemporal- and sex-related lncRNA expression in rhesus macaque brain. Genome Res 27: 1608–1620, 2017. doi: 10.1101/gr.217463.116.
- 10. McClive PJ, Sinclair AH. Rapid DNA extraction and PCR-sexing of mouse embryos. Mol Reprod Dev 60: 225–226, 2001. doi: 10.1002/mrd.1081.
- 11. Midic U, Vincent KA, VandeVoort CA, Latham KE. Effects of long-term endocrine disrupting compound exposure on Macaca mulatta embryonic stem cells. Reprod Toxicol 65: 382–393, 2016. doi: 10.1016/j.reprotox.2016.09.001.
- 12. Prantner AM, Ord T, Medvedev S, Gerton GL. High-throughput sexing of mouse blastocysts by real-time PCR using dissociation curves. Mol Reprod Dev 83: 6–7, 2016. doi: 10.1002/mrd.22595.
- 13. Ray PF, Winston RM, Handyside AH. XIST expression from the maternal X chromosome in human male preimplantation embryos at the blastocyst stage. Hum Mol Genet 6: 1323–1327, 1997. doi: 10.1093/hmg/6.8.1323.
- 14. Schramm RD, Bavister BD. Onset of nucleolar and extranucleolar transcription and expression of fibrillarin in macaque embryos developing in vitro. Biol Reprod 60: 721–728, 1999. doi: 10.1095/biolreprod60.3.721.
- 15. Wang X, Liu D, He D, Suo S, Xia X, He X, Han JJ, Zheng P. Transcriptome analyses of rhesus monkey preimplantation embryos reveal a reduced capacity for DNA double-strand break repair in primate oocytes and early embryos. Genome Res 27: 567–579, 2017. doi: 10.1101/gr.198044.115.