Abstract
Natural antisense transcripts (NATs), one type of regulatory RNA, occur prevalently in plant genomes and play significant roles in physiological and pathological processes. Although their important biological functions have been widely reported, a comprehensive database has so far been lacking. Consequently, we constructed a plant NAT database (PlantNATsDB) comprising approximately 2 million NAT pairs in 69 plant species. GO annotation and currently available high-throughput small RNA sequencing data were integrated to investigate the biological function of NATs. PlantNATsDB provides various user-friendly web interfaces to facilitate the presentation of NATs and an integrated, graphical network browser to display the complex networks formed by different NATs. Moreover, a ‘Gene Set Analysis’ module based on GO annotation was designed to identify the statistically significantly overrepresented GO categories in a specific NAT network. PlantNATsDB is currently the most comprehensive resource of NATs in the plant kingdom and can serve as a reference database for investigating the regulatory function of NATs. PlantNATsDB is freely available at http://bis.zju.edu.cn/pnatdb/.
INTRODUCTION
Gene regulation at the RNA level has progressively been shown to be more important and prevalent than previously presumed (1,2). With advances in high-throughput experimental technologies and bioinformatics methods, an explosion of recent findings underscores both the predominance and the complexity of regulatory RNA molecules in eukaryotes. These include ubiquitous regulatory short non-coding RNAs (ncRNAs) (3), such as microRNAs (miRNAs), endogenous short interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs), as well as functional long ncRNAs (1,4). Natural antisense transcripts (NATs), a new member of the regulatory RNA family, occur prevalently in prokaryotic and eukaryotic genomes and play significant roles in physiological and/or pathological processes (5). NATs are endogenous RNA molecules containing sequences that are complementary to other transcripts (5–7), and they include both protein-coding and non-coding transcripts. NATs can be grouped into two categories, cis-NATs and trans-NATs, based on whether they act in cis or in trans. Cis-NAT pairs are transcribed from opposing DNA strands at the same genomic locus and show a variety of orientations and overlap lengths, with perfect sequence complementarity in the overlapping region, whereas trans-NAT pairs are transcribed from different loci and form partial complementarity (5). Although the underlying mechanisms are largely unknown, NATs have been implicated in many aspects of gene regulation, including genomic imprinting, transcriptional interference, RNA masking, RNA editing, RNA interference (RNAi) and translational regulation (5,7,8).
However, since the discovery of the founding example of cis-NATs, SRO5 and P5CDH, which are involved in the regulation of salt tolerance through the RNAi pathway in Arabidopsis (Arabidopsis thaliana) (9), more and more NATs have been shown to act together with endogenous siRNAs (nat–siRNAs) derived from the overlapping regions in both plant and animal species (10–16). Moreover, deep sequencing of small RNAs (sRNAs) combined with bioinformatics analysis has revealed that the overlapping portions of NATs are hotspots for siRNA generation (12,13,17,18), further indicating that NATs are an important source of endogenous siRNAs. These recent discoveries reveal the unexpected complexity of the regulatory networks formed by NATs (17).
Whole-genome searches based on computational analysis have identified thousands of NAT pairs in multiple eukaryotes. Standardized applications or databases are therefore required for data description, deposition, organization, parsing and analysis, and for functional discovery through integration with other biological data. To date, only a few freely available NAT databases exist, one of which, NATsDB (19), comprises 10 animal species. However, the existing databases mainly focus on cis-NATs and none of them covers any plant species, although both cis-NATs and trans-NATs have been reported in several plant species, including two model plants, the monocot rice (Oryza sativa) (17,18,20) and the eudicot Arabidopsis (18,21–23). Furthermore, the functional annotation and graphical visualization of NATs in these databases are limited.
In this study, we developed a genome-scale computational pipeline to identify NATs in plant species. A database of plant NATs (PlantNATsDB) was constructed, which covers 69 plant species and provides the most comprehensive data set to date. PlantNATsDB serves the plant research community by providing easy access to a large collection of NAT resources as well as a variety of tools for browsing, searching, viewing and downloading the data. In addition, it integrates Gene Ontology (GO) annotation (24) and sRNA high-throughput sequencing data sets to evaluate and investigate the function of NATs. Moreover, a ‘Gene Set Analysis’ module based on GO annotation was implemented to identify the statistically significantly overrepresented GO categories in the complex network formed by different NATs. PlantNATsDB provides an information-rich, user-friendly interface and an integrated, graphical network browser to facilitate the mining of specific functional NAT pairs (Figure 1). Detailed information is provided at the PlantNATsDB website (http://bis.zju.edu.cn/pnatdb/).
DATABASE CONSTRUCTION
Data source
Of the 69 plant species, 27 have genomic information. For these 27 species with sequenced genomes, the annotated transcription units (TUs) used for NAT prediction and other annotation information were downloaded from the respective genome-sequencing projects. Because pseudogenes and transposons can form NATs with protein-coding genes (10,11,17), all pseudogenes and transposons were retained for NAT prediction. For the remaining 42 plant species, tentative consensus sequences (TCs), which provide putative genes with functional annotation similar to TUs, were used for NAT prediction, and their related information was downloaded from The Gene Index Project (25).
sRNA high-throughput sequencing data sets for each species were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) (26). All the sRNA data sets retrieved for this study are summarized in Table 1.
Table 1.
No. | Species | GEO data setsᵃ: Series | GEO data setsᵃ: Samples |
---|---|---|---|
1 | Arabidopsis thaliana | 15 | 80 |
2 | Arabidopsis lyrata | 3 | 8 |
3 | Brachypodium distachyon | 2 | 4 |
4 | Chlamydomonas reinhardtii | 3 | 6 |
5 | Citrus sinensis | 1 | 2 |
6 | Gossypium hirsutum | 2 | 6 |
7 | Glycine max | 2 | 5 |
8 | Medicago truncatula | 2 | 5 |
9 | Nicotiana benthamiana | 2 | 6 |
10 | Oryza sativa subsp. indica | 1 | 2 |
11 | Oryza sativa subsp. japonica | 6 | 38 |
12 | Physcomitrella patens | 3 | 10 |
13 | Prunus persica | 1 | 2 |
14 | Solanum lycopersicum | 1 | 2 |
15 | Selaginella moellendorffii | 1 | 1 |
16 | Triticum aestivum | 2 | 2 |
17 | Vitis vinifera | 2 | 5 |
18 | Zea mays | 4 | 12 |
Total | 54 | 196 |
ᵃNumber of GEO Series or GEO Samples in each species, including biological and technical replicates. Detailed information on the data sets for each species is provided at the PlantNATsDB website.
Note that all small RNA data sets in this study were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) (26).
Prediction of NAT pairs
Prediction of NAT pairs was performed as previously described (17,18,22). Specifically, the following criteria were used to identify cis-NATs and trans-NATs, respectively.
Cis-NATs can be grouped into five categories according to their relative orientation and degree of overlap (Figure 1A) (27): (i) Divergent (head-to-head or 5′ to 5′ overlap); (ii) Convergent (tail-to-tail or 3′ to 3′ overlap); (iii) Containing (full overlap); (iv) Nearby head-to-head (5′ close to 5′) and (v) Nearby tail-to-tail (3′ close to 3′). A pair of transcripts was considered a cis-NAT pair if the two transcripts are located on opposite strands at adjacent genomic loci and either overlap by at least 1 nt or lie no more than 100 nt apart on the chromosome. In total, 27 plant species were subjected to cis-NAT prediction.
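The cis-NAT criteria above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used for PlantNATsDB: the `Transcript` class, the function name and the coordinate convention (1-based, inclusive, with distance counted as the number of bases lying strictly between the two transcripts) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

MAX_GAP_NT = 100  # pairs up to 100 nt apart still count, per the text


@dataclass(frozen=True)
class Transcript:  # hypothetical container, not a PlantNATsDB type
    chrom: str
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, 1-based inclusive
    strand: str  # "+" or "-"


def classify_cis_nat(a: Transcript, b: Transcript) -> Optional[str]:
    """Return the cis-NAT category of a transcript pair, or None."""
    # A cis-NAT pair must lie on opposite strands of the same chromosome.
    if a.chrom != b.chrom or {a.strand, b.strand} != {"+", "-"}:
        return None
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    overlap = min(plus.end, minus.end) - max(plus.start, minus.start) + 1
    if overlap >= 1:
        plus_in_minus = minus.start <= plus.start and plus.end <= minus.end
        minus_in_plus = plus.start <= minus.start and minus.end <= plus.end
        if plus_in_minus or minus_in_plus:
            return "containing"
        # The 3' end of a plus-strand transcript is its right end; the 3' end
        # of a minus-strand transcript is its left end. If the plus-strand
        # transcript starts first, the two 3' ends overlap (convergent).
        return "convergent" if plus.start < minus.start else "divergent"
    gap = -overlap  # bases strictly between the two transcripts (assumed definition)
    if gap > MAX_GAP_NT:
        return None
    # Plus-strand transcript upstream: the 3' ends face each other.
    return "nearby tail-to-tail" if plus.end < minus.start else "nearby head-to-head"
```

For instance, `classify_cis_nat(Transcript("chr1", 1, 100, "+"), Transcript("chr1", 50, 150, "-"))` falls in the convergent category because the two 3′ ends overlap.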
For trans-NATs, BLASTN (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/, Release 2.2.20) (28) was used to search for transcript pairs with high sequence complementarity to each other. Each pair was classified according to the following criteria: (i) if the complementary region identified by BLAST covered more than half the length of either transcript, the pair was designated a ‘high-coverage’ (HC) trans-NAT pair; (ii) if the two transcripts had a continuous complementary region >100 nt, the pair was classified as a ‘100-nt’ pair. Functional trans-NATs should form RNA–RNA duplexes in vivo. We therefore used DINAMelt (29) to verify in silico whether the transcript pairs could hybridize into RNA–RNA duplexes in the complementary regions. All the trans-NAT pairs identified by the BLAST search were subjected to DINAMelt hybridization validation. A trans-NAT pair was retained if (i) the paired region identified by DINAMelt coincided with that found by the BLAST search and (ii) no bubble in the paired region predicted by DINAMelt was longer than 10% of the region. BLAST-based trans-NAT pairs containing transcripts >10 kb were not subjected to DINAMelt validation because of the heavy computational cost. Instead, such a pair was considered a verified trans-NAT if the paired region identified by BLAST was >10% of the length of its longer transcript.
All the NAT pairs predicted in this study are summarized in Table 2.
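The trans-NAT thresholds can be expressed as two small predicates. The sketch below is a simplified illustration under stated assumptions: the function and parameter names are hypothetical, the complementary region is assumed to have the same length on both transcripts, and the BLAST and DINAMelt results are assumed to have been parsed beforehand (no BLAST or DINAMelt call is made here).

```python
from typing import Set

LONG_TRANSCRIPT_NT = 10_000  # pairs with a transcript above 10 kb skip the duplex check


def classify_blast_hit(len_a: int, len_b: int,
                       region_len: int, longest_continuous: int) -> Set[str]:
    """Label a BLAST-derived pair as 'HC' and/or '100-nt' (possibly neither)."""
    labels = set()
    # "More than half the length of either transcript" holds as soon as the
    # region exceeds half of the shorter transcript.
    if region_len > 0.5 * min(len_a, len_b):
        labels.add("HC")
    if longest_continuous > 100:
        labels.add("100-nt")
    return labels


def passes_validation(len_a: int, len_b: int, region_len: int,
                      duplex_matches_blast: bool = False,
                      longest_bubble: int = 0) -> bool:
    """Apply the duplex filter, or the length rule for long transcripts."""
    longer = max(len_a, len_b)
    if longer > LONG_TRANSCRIPT_NT:
        # No hybridization check: require the region to exceed 10% of the longer transcript.
        return region_len > 0.10 * longer
    # Otherwise the predicted duplex must agree with BLAST and contain
    # no bubble longer than 10% of the paired region.
    return duplex_matches_blast and longest_bubble <= 0.10 * region_len
```

Keeping classification and validation separate mirrors the two-stage description in the text: a pair is first labelled from the BLAST hit and only then filtered.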
Table 2.
No. | Species | Genesᵃ | NATs (cis, trans)ᵇ | No. | Species | Genesᵃ | NATs (cis, trans)ᵇ |
---|---|---|---|---|---|---|---|
1 | Allium cepa | 4063 (10) | 5 (NA, 5) | 36 | Nicotiana benthamiana | 7712 (429) | 564 (NA, 564) |
2 | Aquilegia coerulea | 13 556 (655) | 141 (NA, 141) | 37 | Nicotiana tabacum | 45 554 (3962) | 3521 (NA, 3521) |
3 | Arabidopsis lyrata | 32 670 (4841) | 6757 (918, 5839) | 38 | Oryza sativa subsp. indica | 40 745 (10 153) | 144 088 (387, 143 701) |
4 | Arabidopsis thaliana | 33 239 (8049) | 7788 (3005, 4783) | 39 | Oryza sativa subsp. japonica | 57 624 (30 799) | 409 789 (1186, 408 603) |
5 | Beta vulgaris | 4785 (249) | 192 (NA, 192) | 40 | Ostreococcus lucimarinus CCE9901 | 7805 (2773) | 1498 (1482, 16) |
6 | Brachypodium distachyon | 25 532 (5363) | 107 933 (36, 107 897) | 41 | Ostreococcus tauri | 7725 (3030) | 1790 (1620, 170) |
7 | Brassica napus | 50 542 (20 771) | 45 930 (NA, 45 930) | 42 | Panicum virgatum | 52 936 (4631) | 4802 (NA, 4802) |
8 | Capsicum annuum | 14 727 (2138) | 6119 (NA, 6119) | 43 | Petunia hybrida | 2259 (39) | 25 (NA, 25) |
9 | Carica papaya | 25 536 (3991) | 7302 (180, 7122) | 44 | Phaseolus coccineus | 22 518 (1063) | 754 (NA, 754) |
10 | Chlamydomonas reinhardtii | 15 935 (6549) | 26 879 (1450, 25 429) | 45 | Phaseolus vulgaris | 11 954 (638) | 433 (NA, 433) |
11 | Citrus clementina | 32 287 (2243) | 3554 (NA, 3554) | 46 | Physcomitrella patens | 35 938 (3976) | 24 396 (195, 24 201) |
12 | Citrus sinensis | 26 081 (3451) | 7492 (NA, 7492) | 47 | Picea abies | 42 746 (22 360) | 43 535 (NA, 43 535) |
13 | Coffea canephora | 7511 (202) | 163 (NA, 163) | 48 | Pinus taeda | 39 798 (10 897) | 14 298 (NA, 14 298) |
14 | Cucumis sativus | 32 775 (6104) | 23 373 (1471, 21 902) | 49 | Populus trichocarpa | 41 377 (5001) | 13 107 (744, 12 363) |
15 | Ectocarpus siliculosus | 9122 (387) | 340 (NA, 340) | 50 | Prunus persica | 27 852 (4642) | 26 163 (298, 25 865) |
16 | Euphorbia esula | 10 727 (103) | 96 (NA, 96) | 51 | Quercus robur | 17 804 (2138) | 2142 (NA, 2142) |
17 | Festuca arundinacea | 10 617 (309) | 151 (NA, 151) | 52 | Raphanus sativus | 17 939 (356) | 233 (NA, 233) |
18 | Festuca pratensis | 12 248 (156) | 96 (NA, 96) | 53 | Ricinus communis | 31 221 (2570) | 3348 (495, 2853) |
19 | Fragaria vesca | 34 809 (10 622) | 117 786 (574, 117 212) | 54 | Saccharum officinarum | 42 377 (5311) | 7210 (NA, 7210) |
20 | Glycine max | 46 367 (11 352) | 78 339 (436, 77 903) | 55 | Secale cereale | 1471 (52) | 32 (NA, 32) |
21 | Gossypium hirsutum | 50 081 (31 296) | 80 835 (NA, 80 835) | 56 | Selaginella moellendorffii | 22 285 (2399) | 1558 (669, 889) |
22 | Gossypium raimondii | 9508 (667) | 426 (NA, 426) | 57 | Solanum lycopersicum | 28 167 (2039) | 1793 (NA, 1793) |
23 | Helianthus annuus | 20 130 (2460) | 2255 (NA, 2255) | 58 | Solanum melongena | 14 512 (219) | 336 (NA, 336) |
24 | Hordeum vulgare | 43 306 (6993) | 8503 (NA, 8503) | 59 | Solanum tuberosum | 31 972 (2849) | 2866 (NA, 2866) |
25 | Ipomoea nil | 11 754 (57) | 31 (NA, 31) | 60 | Sorghum bicolor | 34 496 (8231) | 145 374 (241, 145 133) |
26 | Lactuca sativa | 12 505 (347) | 263 (NA, 263) | 61 | Striga hermonthica | 9275 (178) | 128 (NA, 128) |
27 | Lactuca serriola | 8047 (215) | 140 (NA, 140) | 62 | Theobroma cacao | 14 724 (889) | 1593 (NA, 1593) |
28 | Lotus japonicus | 40 504 (7783) | 29 575 (126, 29 449) | 63 | Triphysaria eriantha | 17 442 (1491) | 1224 (NA, 1224) |
29 | Malus x domestica | 34 945 (3631) | 4356 (NA, 4356) | 64 | Triphysaria versicolor | 7165 (672) | 539 (NA, 539) |
30 | Manihot esculenta | 47 443 (14 342) | 30 308 (4454, 25 854) | 65 | Triticum aestivum | 93 508 (32 258) | 120 316 (NA, 120 316) |
31 | Medicago truncatula | 50 962 (18 083) | 164 686 (1151, 163 535) | 66 | Vigna unguiculata | 19 333 (592) | 405 (NA, 405) |
32 | Mesembryanthemum crystallinum | 3627 (207) | 156 (NA, 156) | 67 | Vitis vinifera | 26 346 (11 898) | 108 392 (685, 107 707) |
33 | Micromonas pusilla CCMP1545 | 10 547 (4717) | 11 881 (1573, 10 308) | 68 | Volvox carteri | 15 669 (7438) | 90 222 (273, 89 949) |
34 | Micromonas sp. RCC299 | 10 108 (4321) | 2338 (2189, 149) | 69 | Zea mays | 32 540 (6944) | 25 726 (1528, 24 198) |
35 | Mimulus guttatus | 27 501 (8885) | 160 109 (1032, 159 077) | | Totalᶜ | 1 746 886 (384 466) | 2 138 498 (28 398, 2 110 100) |
ᵃNumber of genes used for NAT prediction in each species. The number of genes that form at least one NAT pair with other genes is shown in parentheses.
ᵇNumber of predicted NAT (cis- and trans-NAT) pairs in each species.
ᶜTotal across all species for each category.
Small RNA analysis
sRNA sequences containing incomplete information (such as ‘N’) or with a length <18 or >28 nt were removed before further analysis. For each data set, the filtered sRNA sequences were mapped to all the gene models of the corresponding plant species. All mapping steps were performed using the Bowtie algorithm (30), allowing no mismatches. To allow comparison between data sets, the normalized abundance of each sRNA was calculated as reads per million (RPM): the read number of the sRNA was divided by the total number of reads in the data set and multiplied by 10⁶.
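The filtering and normalization steps can be illustrated with a minimal Python sketch. The function names are hypothetical, and the example assumes that the denominator of the RPM calculation is the total number of reads passed to `rpm` (here, the reads that survived filtering); the text does not state whether the total is taken before or after filtering.

```python
from collections import Counter
from typing import Dict, Iterable, List


def filter_reads(reads: Iterable[str], min_len: int = 18, max_len: int = 28) -> List[str]:
    """Drop reads containing 'N' or falling outside the 18-28 nt window."""
    return [r for r in reads
            if "N" not in r.upper() and min_len <= len(r) <= max_len]


def rpm(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize raw read counts to reads per million (RPM)."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {seq: count / total * 1_000_000 for seq, count in read_counts.items()}


# Toy data set: two usable sequences, one with an 'N', one too short, one too long.
reads = ["A" * 20, "A" * 20, "ACGT" * 5, "ACGN" + "A" * 17, "A" * 10, "A" * 30]
counts = Counter(filter_reads(reads))
abundance = rpm(counts)
```

Mapping itself (Bowtie with no mismatches) is outside the scope of this sketch.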
For each NAT, an enrichment score was calculated to evaluate whether sRNAs were enriched in the overlapping region (17,18). The enrichment score E was calculated using the following formula:

$$E = \frac{S_o / L_o}{S_a / L_a}$$

where S_o = the total normalized abundance of the sRNAs generated from the overlapping region, L_o = the total length of the paired region of the two transcripts of the NAT, S_a = the total normalized abundance of the sRNAs generated from these two transcripts and L_a = the total length of the two transcripts. Furthermore, a standard χ² test (Pearson's chi-square test) was performed to test the significance of the enrichment.
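A small worked sketch may make the score concrete. It assumes that E is the ratio of sRNA density in the overlapping region (So/Lo) to sRNA density over both transcripts (Sa/La), which is what the definitions above suggest. The layout of the χ² test is not described in the text, so the goodness-of-fit form used here (observed abundance inside versus outside the overlap, with expectations proportional to length, 1 degree of freedom) is an assumption, as are the function names.

```python
import math
from typing import Tuple


def enrichment_score(s_o: float, l_o: float, s_a: float, l_a: float) -> float:
    """Density of sRNAs in the overlap divided by density over both transcripts."""
    return (s_o / l_o) / (s_a / l_a)


def chi_square_overlap(s_o: float, l_o: float, s_a: float, l_a: float) -> Tuple[float, float]:
    """Pearson chi-square (1 d.f.) for overlap vs. non-overlap abundance.

    Assumed layout: expected abundance is proportional to region length.
    Requires 0 < l_o < l_a so that both expected values are positive.
    """
    expected_in = s_a * l_o / l_a
    expected_out = s_a - expected_in
    observed_out = s_a - s_o
    chi2 = ((s_o - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy example: 80 of 100 sRNA reads fall in a 200-nt overlap of two
# transcripts with a combined length of 1000 nt.
score = enrichment_score(80, 200, 100, 1000)
chi2, p_value = chi_square_overlap(80, 200, 100, 1000)
```

In this toy case the overlap holds 20% of the sequence but 80% of the reads, so the score is about 4 and the χ² statistic is large.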
Database implementation
All the predicted NATs and processed sRNAs are organized and stored in a MySQL database (http://www.mysql.com/). In addition, gene sequence information, annotated gene models and their functional annotations, including GO annotations, were collected and stored in the database. These genes can also be linked to external genome browsers. PlantNATsDB was implemented with JavaServer Pages (JSP) and deployed on the Apache Tomcat web server (http://tomcat.apache.org/). The integrated network browser was created with the Cytoscape Web program (http://cytoscapeweb.cytoscape.org/) (31). JavaScript and Adobe Flash Player are required to use the full functionality of PlantNATsDB. PlantNATsDB can be accessed through IE 6.0 or higher, Netscape 7.0 or higher, Safari, Opera, Chrome and Firefox on multiple platforms.
WEB INTERFACE AND DATABASE USAGE
Search modules
PlantNATsDB provides various query interfaces and graphical visualization tools to facilitate the retrieval and display of NAT data. Four major search modules are available for retrieving NATs: ‘Simple Searcher’, ‘Batched Searcher’, ‘Advanced Searcher’ and ‘BLAST Searcher’. Alternatively, users can obtain the entire NAT list by species in the ‘Browser’ module. The ‘Simple Searcher’ module allows users to enter any keyword to be searched in all fields of all data entries, including gene locus identifiers (IDs), gene aliases or any words in the annotation texts. The ‘Batched Searcher’ module supports gene set searches, allowing users to enter a list of gene locus IDs or gene aliases. The ‘Advanced Searcher’ lets users access NAT data according to multiple options such as the plant species, the type of NAT, the length of the overlapping region and the GO annotation. In addition, users can perform a BLAST sequence search to retrieve NAT data in the fourth module, ‘BLAST Searcher’. The results returned by any of these search modules can be used for further functional investigation (see below).
NAT information page
For each NAT pair, PlantNATsDB provides rich annotation according to the relationship between the two genes involved. The result page comprises four main parts: NAT summary, gene information, GO annotation and sRNA expression. All parts are displayed graphically. Figure 2 shows the example of the SRO5 and P5CDH cis-NAT pair (9). The first part is a summary of the NAT information, with the overlapping region highlighted (Figure 2A). The second part shows the detailed annotation of the two genes (Figure 2B). The third part displays the GO functional assessment of the NAT pair based on the GO annotation of the two genes (Figure 2C). Functional NAT pairs are expected to have a similar ‘Molecular Function’, to be involved in a related ‘Biological Process’ and/or to be located in the same ‘Cellular Component’. Therefore, the GO terms shared by the two genes are highlighted in the GO network graph. This information is useful for evaluating the function of a NAT. The last part shows the expression of sRNAs derived from the NAT pair (Figure 2D). Because sRNAs are an important component of the NAT regulatory pathway (9), most, if not all, of the sRNA data sets currently available were collected, processed and organized into the database (Table 1). These data should help users to investigate the function of NATs. Furthermore, a user-friendly interface allows users to add or remove data sets for analysis and to highlight different regions of the NAT.
GO functional analysis module
Gene set analysis based on GO annotation (24) and statistical testing is widely used to identify enriched GO categories and to explore the most important biological terms associated with a given gene set. A ‘Gene Set Analysis’ module (Figure 1B) has been developed for analyzing a set of genes based on GO annotation, where the set of genes can be obtained from the search modules (see above) or collected from the network formed by NAT pairs (see below). Here, we used a combination of the χ² test and Fisher's exact test to evaluate the significance of enrichment for each GO category. Detailed methods are described at the PlantNATsDB website.
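As an illustration of the kind of test involved, the following sketch computes a one-sided Fisher's exact P-value for over-representation of a single GO category, written as the upper tail of the hypergeometric distribution. It is not the module's actual implementation: the function name and arguments are hypothetical, and how PlantNATsDB combines this test with the χ² test is described only on the website, so that step is not reproduced here.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact P-value for over-representation.

    k: genes in the query set annotated with the GO category
    n: size of the query gene set
    K: genes in the background annotated with the GO category
    N: size of the background gene set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 5 of 5 query genes carry a term held by 5 of 10 background genes.
p = fisher_enrichment_p(k=5, n=5, K=5, N=10)
```

In practice, a P-value would be computed for every GO category represented in the gene set, and a multiple-testing correction is usually applied afterwards; whether and how PlantNATsDB does so is not stated here.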
Graphical interaction network visualization
One gene may form multiple NAT pairs with other antisense transcription partners, just as multiple paralogous genes may form RNA duplexes with the same antisense transcript. Different NAT pairs might therefore form complex regulatory networks in related processes (17). To display the network formed by different NAT pairs, a graphical browser based on the Cytoscape Web program (31) was developed (Figure 1B). Different types of nodes (genes) and relationships (NAT pairs) are colored distinctly. Moreover, the network graph can be edited (for example, by clicking, double-clicking or right-clicking nodes and edges, deleting nodes and edges, applying distinct layouts and exporting the graph in various formats), and all the genes contained in the network can be further subjected to gene set analysis based on GO annotation (see above). In addition, users can use the ‘My Network’ toolkit, in which genes or NAT pairs of interest are stored temporarily on the server side during the session and can later be retrieved from the ‘My Network’ page. Many pages of the website contain a button for adding selected genes or NAT pairs to ‘My Network’, which helps users to extract the specific biological network formed by related NAT pairs involved in the regulation of interrelated processes.
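The idea that NAT pairs link genes into networks can be sketched with a plain adjacency structure, where genes are nodes and NAT pairs are undirected edges. This is only a conceptual illustration and is unrelated to the Cytoscape Web implementation; the gene identifiers and function names are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(nat_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph: genes are nodes, NAT pairs are edges."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for gene_a, gene_b in nat_pairs:
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return dict(adjacency)


def subnetwork(adjacency: Dict[str, Set[str]], seed: str) -> Set[str]:
    """Collect every gene reachable from the seed gene through NAT pairs."""
    seen = {seed}
    stack = [seed]
    while stack:
        gene = stack.pop()
        for partner in adjacency.get(gene, ()):
            if partner not in seen:
                seen.add(partner)
                stack.append(partner)
    return seen


# Placeholder identifiers: geneB pairs with two partners, geneD and geneE form a separate pair.
pairs = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
network = build_network(pairs)
component = subnetwork(network, "geneA")
```

The set of genes returned by `subnetwork` corresponds to the kind of gene list that could then be passed to a GO-based gene set analysis.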
SUMMARY AND FUTURE DIRECTIONS
This work presents a comprehensive collection of plant NATs, which are organized and deposited in an online database named PlantNATsDB. The biological function of NAT pairs can be investigated using the various integrated data currently available. Moreover, graphical web interfaces were designed to facilitate the presentation of NATs. PlantNATsDB serves the plant research community by providing a reference database for investigating the functions of NATs.
In the near future, PlantNATsDB will collect and include more experimentally validated data and will distinguish between experimentally determined and predicted NATs. In addition, more useful and precise algorithms or tools will be designed to evaluate the functions of NAT pairs or to identify functional NAT pairs based on GO network graphs and the regulatory networks formed by NATs. For example, it would be helpful to place such a regulatory subnetwork graph in the context of a larger network. Moreover, some NAT pairs or subnetworks formed by NATs may be conserved between species. PlantNATsDB intends to allow users to select a specific family and to make comparisons among the family members.
As new and improved high-throughput technologies are applied to a broader set of species, cell lines, tissues and conditions, more and more data sets will be generated. PlantNATsDB will be continuously maintained and regularly updated to keep up with these developments. In addition, gene expression data, such as ESTs (expressed sequence tags), microarray and RNA-Seq data, as well as degradome-sequencing data (32,33), will be integrated into PlantNATsDB to improve our understanding of the regulatory networks formed by NATs.
FUNDING
The National Natural Sciences Foundation of China (30971743, 31050110121, 31071659); The Ministry of Science and Technology of China (2009DFA32030); Program for New Century Excellent Talents in University of China (NCET-07-0740); Huazhong Agricultural University Scientific & Technological Self-innovation Foundation (2010SC07). Funding for Open Access charge: Partial waiver by Oxford University Press.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
The authors thank the Joint Genome Institute (http://www.jgi.doe.gov/) for the availability of the draft genome assemblies and the genomic annotation of Arabidopsis lyrata, Cucumis sativus, Manihot esculenta, Mimulus guttatus and Selaginella moellendorffii. The authors thank Dr Christian Klukas for his kind discussions. The authors also thank Dr Michael Galperin and the three anonymous referees for their constructive and helpful suggestions.
REFERENCES
- 1. Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–641. doi: 10.1016/j.cell.2009.02.006.
- 2. Brosnan CA, Voinnet O. The long and the short of noncoding RNAs. Curr. Opin. Cell Biol. 2009;21:416–425. doi: 10.1016/j.ceb.2009.04.001.
- 3. Ghildiyal M, Zamore PD. Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 2009;10:94–108. doi: 10.1038/nrg2504.
- 4. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009;10:155–159. doi: 10.1038/nrg2521.
- 5. Lapidot M, Pilpel Y. Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 2006;7:1216–1222. doi: 10.1038/sj.embor.7400857.
- 6. Vanhee-Brossollet C, Vaquero C. Do natural antisense transcripts make sense in eukaryotes? Gene. 1998;211:1–9. doi: 10.1016/s0378-1119(98)00093-6.
- 7. Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G. In search of antisense. Trends Biochem. Sci. 2004;29:88–94. doi: 10.1016/j.tibs.2003.12.002.
- 8. Faghihi MA, Wahlestedt C. Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell. Biol. 2009;10:637–643. doi: 10.1038/nrm2738.
- 9. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK. Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell. 2005;123:1279–1291. doi: 10.1016/j.cell.2005.11.035.
- 10. Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature. 2008;453:539–543. doi: 10.1038/nature06908.
- 11. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature. 2008;453:534–538. doi: 10.1038/nature06904.
- 12. Okamura K, Balla S, Martin R, Liu N, Lai EC. Two distinct mechanisms generate endogenous siRNAs from bidirectional transcription in Drosophila melanogaster. Nat. Struct. Mol. Biol. 2008;15:998. doi: 10.1038/nsmb0908-998c.
- 13. Czech B, Malone CD, Zhou R, Stark A, Schlingeheyde C, Dus M, Perrimon N, Kellis M, Wohlschlegel JA, Sachidanandam R, et al. An endogenous small interfering RNA pathway in Drosophila. Nature. 2008;453:798–802. doi: 10.1038/nature07007.
- 14. Ghildiyal M, Seitz H, Horwich MD, Li C, Du T, Lee S, Xu J, Kittler EL, Zapp ML, Weng Z, et al. Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science. 2008;320:1077–1081. doi: 10.1126/science.1157396.
- 15. Ron M, Alandete Saez M, Eshed Williams L, Fletcher JC, McCormick S. Proper regulation of a sperm-specific cis-nat-siRNA is essential for double fertilization in Arabidopsis. Genes Dev. 2010;24:1010–1021. doi: 10.1101/gad.1882810.
- 16. Katiyar-Agarwal S, Morgan R, Dahlbeck D, Borsani O, Villegas A, Jr, Zhu JK, Staskawicz BJ, Jin H. A pathogen-inducible endogenous siRNA in plant immunity. Proc. Natl Acad. Sci. USA. 2006;103:18002–18007. doi: 10.1073/pnas.0608258103.
- 17. Zhou X, Sunkar R, Jin H, Zhu JK, Zhang W. Genome-wide identification and analysis of small RNAs originated from natural antisense transcripts in Oryza sativa. Genome Res. 2009;19:70–78. doi: 10.1101/gr.084806.108.
- 18. Chen D, Meng Y, Ma X, Mao C, Bai Y, Cao J, Gu H, Wu P, Chen M. Small RNAs in angiosperms: sequence characteristics, distribution and generation. Bioinformatics. 2010;26:1391–1394. doi: 10.1093/bioinformatics/btq150.
- 19. Zhang Y, Li J, Kong L, Gao G, Liu QR, Wei L. NATsDB: Natural Antisense Transcripts DataBase. Nucleic Acids Res. 2007;35:D156–D161. doi: 10.1093/nar/gkl782.
- 20. Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Kawai J, Carninci P, Ohtomo Y, Murakami K, et al. Antisense transcripts with rice full-length cDNAs. Genome Biol. 2003;5:R5. doi: 10.1186/gb-2003-5-1-r5.
- 21. Wang XJ, Gaasterland T, Chua NH. Genome-wide prediction and identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol. 2005;6:R30. doi: 10.1186/gb-2005-6-4-r30.
- 22. Wang H, Chua NH, Wang XJ. Prediction of trans-antisense transcripts in Arabidopsis thaliana. Genome Biol. 2006;7:R92. doi: 10.1186/gb-2006-7-10-r92.
- 23. Jin H, Vacic V, Girke T, Lonardi S, Zhu JK. Small RNAs and the regulation of cis-natural antisense transcripts in Arabidopsis. BMC Mol. Biol. 2008;9:6. doi: 10.1186/1471-2199-9-6.
- 24. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556.
- 25. Lee Y, Tsai J, Sunkara S, Karamycheva S, Pertea G, Sultana R, Antonescu V, Chan A, Cheung F, Quackenbush J. The TIGR gene indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res. 2005;33:D71–D74. doi: 10.1093/nar/gki064.
- 26. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–210. doi: 10.1093/nar/30.1.207.
- 27. Osato N, Suzuki Y, Ikeo K, Gojobori T. Transcriptional interferences in cis natural antisense transcripts of humans and mice. Genetics. 2007;176:1299–1306. doi: 10.1534/genetics.106.069484.
- 28. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 29. Markham NR, Zuker M. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res. 2005;33:W577–W581. doi: 10.1093/nar/gki591.
- 30. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 31. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD. Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010;26:2347–2348. doi: 10.1093/bioinformatics/btq430.
- 32. German MA, Pillay M, Jeong DH, Hetawal A, Luo S, Janardhanan P, Kannan V, Rymarquis LA, Nobuta K, German R, et al. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol. 2008;26:941–946. doi: 10.1038/nbt1417.
- 33. Addo-Quaye C, Eshoo TW, Bartel DP, Axtell MJ. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 2008;18:758–762. doi: 10.1016/j.cub.2008.04.042.