Genomics Data. 2015 Jul 10;5:309–313. doi: 10.1016/j.gdata.2015.06.026

Piwi proteins and piRNAs in mammalian oocytes and early embryos: From sample to sequence

David Rosenkranz a, Chung-Ting Han c, Elke F Roovers b, Hans Zischler a, René F Ketting b
PMCID: PMC4583685  PMID: 26484274

Abstract

The role of the Piwi/piRNA pathway during mammalian oogenesis has remained enigmatic, especially since experiments with Piwi knockout mice did not reveal any phenotypic defects in females. This is in striking contrast to results obtained from other species, including flies and zebrafish. In mouse oocytes, however, only low levels of piRNAs are found, and they are not required for oocyte function. We recently demonstrated dynamic expression of PIWIL1, PIWIL2, and PIWIL3 during mammalian oogenesis and early embryogenesis. In addition, small RNA analysis of human, crab-eating macaque and cattle revealed that piRNAs are also expressed in the female germline and closely resemble piRNAs from testis. Here, we describe in detail the experimental and computational methods that we applied for the generation, processing and analysis of next generation sequencing (NGS) data associated with our study on Piwi proteins and piRNAs in mammalian oocytes and embryos (Roovers et al., 2015). The complete sequence data are available at NCBI's Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under the accession GSE64942.

Keywords: Piwi, piRNA, Oocytes, Ovary, Embryo


Specifications
Organism/cell line/tissue: Human (adult and fetal ovary), crab-eating macaque (adult testis and ovary), cattle (adult ovary, oocytes, cumulus and in vitro fertilized 2–4 cell stage embryos)
Sex: Male and female
Sequencer or array type: Illumina HiSeq 2500
Data format: Raw and analyzed
Experimental factors: Normal and untreated tissues/cells
Experimental features: Sequencing of small RNA transcriptomes. Comparison of small RNA libraries with and without a NaIO4 oxidation step (for selected samples).
Consent: Human adult ovary samples were from cancer patients who underwent unilateral oophorectomy for fertility preservation and signed informed consent. The human fetal material was from elective abortions and was donated for research with informed consent. The research on human material was approved by the Medical Ethical Committee of the Leiden University Medical Center (CME P08.087 and CME 05/03 K/YR).
Sample source location: Leiden, Netherlands (human samples), Göttingen, Germany (macaque samples), Utrecht, Netherlands (bovine samples)

1. Direct link to deposited data

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE64942.

2. Experimental design, materials and methods

The Gene Expression Omnibus (GEO) Series GSE64942 comprises 41 high-throughput sequencing datasets (raw and processed) associated with the Roovers et al. study [1]. The following table lists all samples and provides the most relevant data associated with each sample (Table 1).

Table 1.

Key features of GEO Series GSE64942 samples (hsap: Homo sapiens, mfas: Macaca fascicularis, and btau: Bos taurus).

GEO id | Sample name | Species | Tissue or cell type | NaIO4 treatment
GSM1584495 | Bovine_GV_a | btau | Oocytes at GV stage | No
GSM1584496 | Bovine_GV_b1 | btau | Oocytes at GV stage | No
GSM1584497 | Bovine_MI_1 | btau | Oocytes at MI stage | No
GSM1584498 | Bovine_MI_2 | btau | Oocytes at MI stage | No
GSM1584499 | Bovine_MII_a1 | btau | Oocytes at MII stage | No
GSM1584500 | Bovine_MII_a2 | btau | Oocytes at MII stage | No
GSM1584501 | Bovine_MII_b1 | btau | Oocytes at MII stage | No
GSM1584502 | Bovine_MII_b2 | btau | Oocytes at MII stage | No
GSM1584503 | Bovine_ovary_1 | btau | Adult ovary | No
GSM1584504 | Bovine_ovary_2 | btau | Adult ovary | No
GSM1584505 | Bovine_oxidized_1 | btau | Adult ovary | Yes
GSM1584506 | Bovine_oxidized_2 | btau | Adult ovary | Yes
GSM1584507 | Bovine_testis_1 | btau | Adult testis | No
GSM1584508 | Bovine_testis_2 | btau | Adult testis | No
GSM1584509 | Bovine_cumulus_MI_1 | btau | Cumulus cells from MI oocytes | No
GSM1584510 | Bovine_cumulus_MI_2 | btau | Cumulus cells from MI oocytes | No
GSM1584511 | Bovine_cumulus_MII_1 | btau | Cumulus cells from MII oocytes | No
GSM1584512 | Bovine_cumulus_MII_2 | btau | Cumulus cells from MII oocytes | No
GSM1584513 | Bovine_IVF_1 | btau | IVF embryos in 2–4 cell stages | No
GSM1584514 | Bovine_IVF_2 | btau | IVF embryos in 2–4 cell stages | No
GSM1584515 | Macaque_ovary_1 | mfas | Adult ovary | No
GSM1584516 | Macaque_ovary_2 | mfas | Adult ovary | No
GSM1584517 | Macaque_oxidized_1 | mfas | Adult ovary | Yes
GSM1584518 | Macaque_oxidized_2 | mfas | Adult ovary | Yes
GSM1584519 | Macaque_testis_1 | mfas | Adult testis | No
GSM1584520 | Macaque_testis_2 | mfas | Adult testis | No
GSM1584521 | Human_ovary_1 | hsap | Adult ovary | No
GSM1584522 | Human_ovary_2 | hsap | Adult ovary | No
GSM1584523 | Human_ovary_1_oxidized | hsap | Adult ovary | Yes
GSM1584524 | Human_ovary_2_oxidized | hsap | Adult ovary | Yes
GSM1584525 | Fetal_1st_1 | hsap | Ovary from 1st trimester fetus | No
GSM1584526 | Fetal_1st_2 | hsap | Ovary from 1st trimester fetus | No
GSM1584527 | Fetal_1st_1_oxidized | hsap | Ovary from 1st trimester fetus | Yes
GSM1584528 | Fetal_1st_2_oxidized | hsap | Ovary from 1st trimester fetus | Yes
GSM1584529 | Fetal_2nd_1 | hsap | Ovary from 2nd trimester fetus | No
GSM1584530 | Fetal_2nd_2 | hsap | Ovary from 2nd trimester fetus | No
GSM1584531 | Fetal_2nd_1_oxidized | hsap | Ovary from 2nd trimester fetus | Yes
GSM1584532 | Fetal_2nd_2_oxidized | hsap | Ovary from 2nd trimester fetus | Yes
GSM1614231 | Bovine_oxidized_2_repeat | btau | Adult ovary | Yes
GSM1614232 | Bovine_cumulus_GV_a | btau | Cumulus cells from oocytes at GV stage | No
GSM1614233 | Bovine_cumulus_GV_b | btau | Cumulus cells from oocytes at GV stage | No

2.1. RNA isolation and NaIO4 treatment

RNA was extracted from ovary and testis tissues by adding 750 μl TRIzol LS reagent (Ambion), after which the tissues were mashed and sonicated (3 × 30 s, low power). The oocyte samples of all stages (~1000 oocytes per sample) and the IVF samples (450 embryos per sample) were treated similarly, except that only 325 μl TRIzol LS was used (half amounts were used in the following steps as well). First trimester fetal samples were taken up in 325 μl TRIzol LS and sonicated 3 × 30 s, whereas second trimester ovary samples were first ground under liquid N2 and then taken up in 325 μl TRIzol LS. RNA extraction was performed according to the manufacturer's instructions with two small adjustments. First, precipitation was performed overnight at −80 °C with the addition of 1 μl GlycoBlue (Ambion). Second, samples were spun down for 1 h at 16,000 × g at 4 °C before washing with 70% ethanol.

Samples that were sequenced directly without NaIO4 treatment were taken up in 6 μl RNase-free MQ, followed by library preparation. Samples that were oxidized were either first enriched for > 200 nt RNA molecules (bovine, macaque and human ovary) using the mirVana kit (Ambion), or were oxidized directly (low-input samples: GV oocytes and fetal ovary tissues). In the case of GV oocytes, to control for successful oxidation and subsequent RNA isolation, we added 1/10th concentration of macaque testis RNA, since testis piRNAs are methylated and the methylation state of piRNAs from GV stage oocytes was still unknown. For the oxidation, we divided each sample in two and performed NaIO4 treatment or mock treatment as follows. Prepare 5× borate buffer (148 mM borax, 148 mM boric acid, adjusted to pH 8.6) and freshly prepare 200 mM NaIO4. Mix 4 μl 5× borate buffer, 2.5 μl 200 mM NaIO4 (or 2.5 μl MQ for mock samples) and the RNA, and add MQ to 20 μl. Leave the reaction at RT for 10 min, add 2 μl glycerol and incubate for 1 more minute at RT. Continue with RNA precipitation by adding 2.2 μl 3 M NaAc pH 5.5, 25 μl isopropanol and 1 μl GlycoBlue. Mix and precipitate overnight at −80 °C. Spin down the samples for 1 h at 16,000 × g at 4 °C and wash once with 70% ethanol. Spin for 20 more minutes, remove the supernatant, take up the pellet in 6 μl RNase-free MQ and continue with library preparation.
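As a quick check of the oxidation mix described above, the final concentrations in the 20 μl reaction follow from C1·V1 = C2·V2. The sketch below only restates the volumes and stock concentrations given in the text; the function name is illustrative and not part of the deposited scripts.

```python
def final_concentration(stock, volume_added_ul, total_volume_ul):
    """Concentration after dilution, from C1 * V1 = C2 * V2.

    The result carries the same unit as `stock` (mM, or "fold" for a 5x buffer).
    """
    return stock * volume_added_ul / total_volume_ul


TOTAL_UL = 20.0  # reaction volume stated in the protocol

# 4 ul of 5x borate buffer in 20 ul gives a 1x buffer
borate_fold = final_concentration(5.0, 4.0, TOTAL_UL)

# the 148 mM borax in the 5x stock ends up at 29.6 mM
borax_mM = final_concentration(148.0, 4.0, TOTAL_UL)

# 2.5 ul of 200 mM NaIO4 in 20 ul gives 25 mM periodate
periodate_mM = final_concentration(200.0, 2.5, TOTAL_UL)

if __name__ == "__main__":
    print(f"borate buffer: {borate_fold:g}x, borax: {borax_mM:g} mM, NaIO4: {periodate_mM:g} mM")
```

The mock reaction differs only in that the 2.5 μl NaIO4 is replaced by MQ, so the buffer concentration is identical in both arms.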

2.2. Library construction and high-throughput sequencing

Total RNA was run on a 15% TBE-urea gel for size selection of the 15–35 nt fraction. The excised gel fraction was eluted in 0.3 M NaCl for > 16 h and precipitated with 100% isopropanol and GlycoBlue for > 1 h at −20 °C. The precipitated RNA pellet was washed once with 75% ethanol and dissolved in nuclease-free water. The purified RNA fraction was checked with the Bioanalyzer Small RNA assay (Agilent). Library preparation was based on the NEBNext® Small RNA Library Prep Set for Illumina® (New England Biolabs) with minor modifications. To counteract ligation bias and to allow removal of PCR duplicates, small RNA was ligated first to the 3′ adapter and then to the 5′ adapter, which contained four random bases at the 5′ and 3′ end, respectively. Adapters with random bases were chemically synthesized by Bioo Scientific. Adapter-ligated RNA was reverse-transcribed and PCR-amplified for 14 cycles using index primers. The PCR-amplified cDNA construct was checked on the Bioanalyzer (Agilent) using the High Sensitivity DNA assay. We performed a size selection of the small RNA library on a LabChip XT instrument (PerkinElmer) using the DNA 300 assay kit. All libraries were pooled to obtain 10 nM, which was denatured to 9 or 10 pmol with 5% PhiX spiked in and sequenced as single reads for 50 cycles on an Illumina MiSeq or HiSeq 2500 instrument in either rapid or high-output mode.

2.3. Computational processing and analysis of sequence data sets

2.3.1. Software deposition

For the processing, filtering and analysis of sequence datasets we applied a set of Perl scripts that are available at the following download link: http://www.smallrnagroup-mainz.de/software/scripts_Roovers-et-al.zip. Many of these Perl scripts are part of the NGS toolbox, which is subject to constant updating and debugging. The NGS toolbox collection as well as the latest versions of the sRNAmapper and proTRAC [2] software, including detailed documentation, are available at http://www.smallrnagroup-mainz.de/software.html.

2.3.2. Data filtering, mapping and annotation

The 5′ and 3′ RNA adapters used for library construction were tagged with a short stretch of four random nucleotides at their 3′ or 5′ end, respectively. As a result, all 50 nt raw sequence reads comprise the original small RNA sequence flanked on each side by four random nucleotides, and end with the 3′ adapter sequence (Fig. 1).

Fig. 1. Scheme of sequencing construct. The cloned RNA molecule is flanked by 5′ and 3′ random tags. The read length is 50 nt and therefore, depending on the cloned RNA length, typically ends with an incomplete 3′ adapter sequence.
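The read layout shown in Fig. 1 can be illustrated with a minimal sketch that splits a raw read into 5′ tag, small RNA insert and 3′ tag. This is not the deposited cliplinker_random.pl script. The adapter sequence is not given in this article, so the one used in the example (a commonly used Illumina-type 3′ adapter start) is an assumption, as are the function names and the minimum overlap of 6 nt for truncated adapters.

```python
TAG_LEN = 4  # four random nucleotides on each side of the insert (from the text)


def find_adapter_start(read, adapter, min_overlap=6):
    """Return the index where the (possibly truncated) 3' adapter begins, or -1."""
    idx = read.find(adapter)  # complete adapter somewhere in the read
    if idx != -1:
        return idx
    # Otherwise look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return -1


def split_read(read, adapter, tag_len=TAG_LEN):
    """Split a raw read into (5' tag, insert, 3' tag); None if no adapter is found."""
    start = find_adapter_start(read, adapter)
    if start < 2 * tag_len + 1:  # also covers start == -1
        return None
    construct = read[:start]
    return construct[:tag_len], construct[tag_len:-tag_len], construct[-tag_len:]


if __name__ == "__main__":
    ADAPTER = "AGATCGGAAGAGCACACGTCT"  # assumed adapter sequence, for illustration only
    read = "ACGT" + "TGAGGTAGTAGGTTGTATAGTTACGA" + "GGTT" + ADAPTER[:16]
    print(len(read), split_read(read, ADAPTER))
```

Reads in which no adapter can be located are returned as None here; how such reads should be handled in practice depends on the insert length distribution.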

To avoid PCR-induced bias, we clipped adapter sequences and excluded raw sequence reads with identical small RNA sequences and identical flanking random tags from the sequence data sets, as these may constitute PCR duplicates that each represent the same original molecule (cliplinker_random.pl). Remaining sequence reads within the size range of 18–34 nt (NGS toolbox: length-filter) were collapsed to non-identical sequences while storing the read count for each sequence in the FASTA headers (NGS toolbox: collapse). We then mapped the sequences to the genome of Homo sapiens (assembly: GRCh38, GenBank accession: GCA_000001405.15), Macaca fascicularis (assembly: Macaca_fascicularis_5.0, GenBank accession: GCA_000364345.1) and Bos taurus (assembly: UMD3.1, GenBank accession: GCA_000003055.3) using piRmapper_1.0.pl, which has since been replaced by the sRNAmapper software available at http://www.smallrnagroup-mainz.de/software.html. Mapping required a perfect match for positions 1 to 18 and allowed one internal mismatch in the remaining part of the sequence. In addition, we allowed up to two non-template 3′ nucleotides in order to map piRNAs that may carry post-transcriptionally modified 3′ ends. Fig. 2 displays the results of the initial processing and mapping procedures.
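The duplicate removal, length filtering and collapsing steps described above can be sketched as follows. This is an illustrative re-implementation, not the NGS toolbox code. It assumes that exactly one read is retained per unique combination of 5′ tag, insert and 3′ tag, and that the collapsed FASTA header consists of the read count alone; both points are assumptions about details the text does not spell out.

```python
from collections import Counter


def deduplicate(parsed_reads):
    """Keep one read per unique (5' tag, insert, 3' tag) combination.

    Reads that share insert and both random tags are treated as PCR duplicates.
    """
    return set(parsed_reads)


def collapse_inserts(unique_reads, min_len=18, max_len=34):
    """Length-filter the inserts and count how often each sequence occurs."""
    return Counter(
        insert
        for _tag5, insert, _tag3 in unique_reads
        if min_len <= len(insert) <= max_len
    )


def to_fasta(counts):
    """Write collapsed sequences as FASTA with the read count as header (assumed format)."""
    lines = []
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f">{n}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"
    reads = [
        ("ACGT", insert, "GGTT"),
        ("ACGT", insert, "GGTT"),  # same tags and insert: counted once
        ("TTGA", insert, "GGTT"),  # different 5' tag: independent molecule
    ]
    print(to_fasta(collapse_inserts(deduplicate(reads))))
```

Because the tags are compared together with the insert, two reads of the same small RNA are only merged when both random tags agree, which is what distinguishes PCR duplicates from independently ligated molecules.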

Fig. 2. Results of initial processing and mapping.
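The mapping criteria stated above (perfect match for positions 1 to 18, at most one internal mismatch afterwards, and up to two non-template 3′ nucleotides) can be expressed as a simple acceptance test for a read aligned to a candidate genomic window. The sketch below is one possible reading of these criteria, not the logic of piRmapper or sRNAmapper; the function name and the way 3′ non-template nucleotides are handled (by ignoring 0, 1 or 2 terminal positions) are assumptions.

```python
def is_valid_hit(read, ref, seed_len=18, max_internal_mm=1, max_3p_nontemplate=2):
    """Check a read against a reference window that starts at the same position.

    Positions 1..seed_len must match perfectly. Up to `max_3p_nontemplate`
    nucleotides at the 3' end may be non-templated (ignored), and the remaining
    positions after the seed may contain at most `max_internal_mm` mismatches.
    """
    if len(read) < seed_len or len(ref) < len(read):
        return False
    if read[:seed_len] != ref[:seed_len]:
        return False
    for tail in range(max_3p_nontemplate + 1):
        end = len(read) - tail
        if end < seed_len:
            break
        mismatches = sum(1 for i in range(seed_len, end) if read[i] != ref[i])
        if mismatches <= max_internal_mm:
            return True
    return False


if __name__ == "__main__":
    ref = "ACGTACGTACGTACGTACGTACGTAC"
    untemplated = ref[:24] + "TT"  # two non-template 3' nucleotides
    print(is_valid_hit(ref, ref), is_valid_hit(untemplated, ref))
```

A real mapper would additionally search the genome for candidate windows on both strands; only the per-candidate acceptance rule is shown here.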

Sequences that were successfully mapped to the genome were also mapped (in sense orientation) to non-coding (nc) RNA datasets from the Ensembl database (release 77), miRBase (release 21 [3]), the Genomic tRNA database [4], and the Silva rRNA database (release 119 [5]) using SeqMap [6] in order to identify miRNAs or fragments of miRNA precursors, lincRNA (no lincRNAs are described for Bos taurus), miscRNA, rRNA, snoRNA, snRNA and tRNA. In addition, we scanned for low-complexity reads and discarded reads in which ≥ 75% of the sequence was composed of the same 1–4 nt motif (kill_simples.pl, equivalent to NGS toolbox: duster). Where a sequence had multiple annotations, we apportioned its read count among the different ncRNA types. Fig. 3 displays the results of this annotation step. Exact values for the information provided in Fig. 2 and Fig. 3 are available in supplementary file 1 (Excel sheet). A set of FASTA-formatted sequence files with sequences sorted according to the described annotation is available for each sample in supplementary file 2 (zip-compressed folder).
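The low-complexity criterion mentioned above can be illustrated with a short sketch. It is not the kill_simples.pl or duster code; it assumes that motif coverage is measured as the number of non-overlapping occurrences of a motif times the motif length, divided by the read length, which is one plausible reading of the 75% rule.

```python
def is_low_complexity(seq, max_motif_len=4, threshold=0.75):
    """Return True if a single 1-4 nt motif covers at least `threshold` of the read.

    Coverage is estimated from non-overlapping motif occurrences (str.count).
    """
    for k in range(1, max_motif_len + 1):
        motifs = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for motif in motifs:
            covered = seq.count(motif) * k
            if covered / len(seq) >= threshold:
                return True
    return False


if __name__ == "__main__":
    for s in ("AAAAAAAAAAAAAAAAAAAA", "ACACACACACACACACACGT", "TGAGGTAGTAGGTTGTATAGTT"):
        print(s, is_low_complexity(s))
```

Reads flagged by such a filter would be removed before annotation, since simple repeats map to many genomic loci and carry little information about their origin.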

Fig. 3. Results of ncRNA annotation for successfully mapped reads.
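Where a sequence matches more than one ncRNA class, its read count has to be divided among those classes before the totals shown in Fig. 3 are computed. The text does not state the exact weighting, so the sketch below assumes an equal split among all matching classes; the class labels and the "unannotated" fallback are illustrative.

```python
from collections import defaultdict


def apportion(read_counts, annotations):
    """Sum read counts per ncRNA class, splitting multi-annotated reads equally.

    read_counts: dict mapping sequence -> read count
    annotations: dict mapping sequence -> set of ncRNA classes it matched
    """
    totals = defaultdict(float)
    for seq, count in read_counts.items():
        classes = annotations.get(seq) or {"unannotated"}
        share = count / len(classes)  # equal split (assumption)
        for ncrna_class in classes:
            totals[ncrna_class] += share
    return dict(totals)


if __name__ == "__main__":
    counts = {"seq1": 10, "seq2": 4, "seq3": 3}
    hits = {"seq1": {"miRNA"}, "seq2": {"tRNA", "rRNA"}}
    print(apportion(counts, hits))
```

With this scheme the per-class totals always add up to the total number of mapped reads, so the class fractions remain comparable between samples.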

2.3.3. Cloning efficiency of different random tags

Besides minimizing a putative sequence bias induced by PCR amplification, randomly tagged RNA adapters are also meant to ensure high cloning efficiency: they provide all possible ligation substrates and thereby counteract ligation bias resulting from unknown or obscure sequence preferences of the RNA ligase. Indeed, we observed a bias for specific random tags in both 5′ and 3′ adapters across the different sequence datasets (Fig. 4).

Fig. 4. Positional nucleotide composition of 5′ and 3′ random tags.

While all positions of the 5′ random tag are biased towards G, only the first position of the 3′ random tag is biased towards G, whereas the following positions are biased towards T (U). Interestingly, this pattern is essentially the same when comparing 3′ uridylated testis-derived sequences with 3′ adenylated oocyte- and IVF-derived sequences [1]. This suggests equal ligation efficiency and adapter bias for 3′ U and 3′ A RNA populations. However, we cannot exclude a difference in the population composition of cloned RNAs for different random adapters, as observed by Jayaprakash and coworkers [7].
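A positional nucleotide composition of the kind summarized in Fig. 4 can be computed from the extracted random tags with a few lines of code. The sketch below is illustrative only; it assumes that all tags have the same length and counts each tag once, without weighting by read count.

```python
from collections import Counter


def positional_composition(tags, alphabet="ACGT"):
    """Return, for each tag position, the fraction of each nucleotide."""
    if not tags:
        return []
    composition = []
    for pos in range(len(tags[0])):
        counts = Counter(tag[pos] for tag in tags)
        total = sum(counts[base] for base in alphabet)
        composition.append(
            {base: (counts[base] / total if total else 0.0) for base in alphabet}
        )
    return composition


if __name__ == "__main__":
    example_tags = ["GGGT", "GAGT", "GCGT", "ACGT"]  # made-up tags for illustration
    for i, fractions in enumerate(positional_composition(example_tags), start=1):
        print(i, fractions)
```

For unbiased ligation each nucleotide would be expected at a fraction of roughly 0.25 per position, so deviations from that value indicate the kind of tag preference described above.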

2.3.4. Prediction of genomic piRNA clusters

We predicted genomic piRNA clusters with proTRAC version 2.0.4 using the following command line arguments: -clstrand 0.5, -pimin 24, -pimax 31, -distr 1-90, -nr, -nh, and -mmr. The option -mmr (per million mapped reads) was renamed to -rpm (reads per million) in proTRAC version 2.0.5 and later. Because these results are not attached to the original study, the original proTRAC output is available for download at http://www.smallrnagroup-mainz.de/data/proTRAC_Roovers-et-al.zip. Selected datasets from this GEO Series have also been added to the piRNA cluster database, which can be accessed at the following URL: http://www.smallrnagroup-mainz.de/piRNAclusterDB.html.
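When re-running the cluster prediction with a newer proTRAC release, the renamed normalization option has to be taken into account. The helper below is a hypothetical convenience function that only assembles the options listed above; the script name and the input file arguments are deliberately left out because they are not given in this article and should be taken from the proTRAC documentation.

```python
def protrac_options(version):
    """Return the proTRAC options used in this study for a given version tuple.

    `version` is a tuple such as (2, 0, 4). From version 2.0.5 onwards the
    option -mmr is called -rpm, as stated in the text.
    """
    normalization_flag = "-rpm" if version >= (2, 0, 5) else "-mmr"
    return [
        "-clstrand", "0.5",
        "-pimin", "24",
        "-pimax", "31",
        "-distr", "1-90",
        "-nr",
        "-nh",
        normalization_flag,
    ]


if __name__ == "__main__":
    print(" ".join(protrac_options((2, 0, 4))))
    print(" ".join(protrac_options((2, 0, 5))))
```

The returned list can be appended to whatever command line the installed proTRAC version expects for its input files.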

Acknowledgments

This work was supported by a VICI grant (724.011.001) from the Dutch Organization for Scientific Research (to R.F.K.), by the Rhineland-Palatinate Forschungsschwerpunkt GeneRED (to H.Z.) and by the research funding program MAIFOR of the University Medical Center Mainz (to D.R.).

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gdata.2015.06.026.

Supplementary material 1: mmc1.xlsx (21.2 KB, xlsx)

Supplementary material 2: mmc2.zip (11.9 MB, zip)

References

1. Roovers E.F., Rosenkranz D., Mahdipour M., Han C.T., He N., Chuva de Sousa Lopes S.M., van der Westerlaken L.A.J., Zischler H., Butter F., Roelen B.A.J., Ketting R.F. Piwi proteins and piRNAs in mammalian oocytes and early embryos. Cell Rep. 2015;10:2069–2082. doi: 10.1016/j.celrep.2015.02.062.
2. Rosenkranz D., Zischler H. proTRAC - a software for probabilistic piRNA cluster detection, visualization and analysis. BMC Bioinformatics. 2012;13:5. doi: 10.1186/1471-2105-13-5.
3. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res. 2004;32:D109–D111. doi: 10.1093/nar/gkh023.
4. Chan P.P., Lowe T.M. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37:D93–D97. doi: 10.1093/nar/gkn787.
5. Quast C., Pruesse E., Yilmaz P., Gerken J., Schweer T., Yarza P., Peplies J., Glöckner F.O. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
6. Jiang H., Wong W.H. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics. 2008;24:2395–2396. doi: 10.1093/bioinformatics/btn429.
7. Jayaprakash A.D., Jabado O., Brown B.D., Sachidanandam R. Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res. 2011. doi: 10.1093/nar/gkr693.
