Nucleic Acids Research. 2015 Nov 2;44(Database issue):D324–D329. doi: 10.1093/nar/gkv1175

sORFs.org: a repository of small ORFs identified by ribosome profiling

Volodimir Olexiouk 1,*, Jeroen Crappé 1, Steven Verbruggen 1, Kenneth Verhegen 2,3, Lennart Martens 2,3, Gerben Menschaert 1,*
PMCID: PMC4702841  PMID: 26527729

Abstract

With the advent of ribosome profiling, a next generation sequencing technique providing a ‘snap-shot’ of translated mRNA in a cell, many short open reading frames (sORFs) with ribosomal activity were identified. Follow-up studies revealed the existence of functional peptides, so-called micropeptides, translated from these sORFs, indicating a new class of bio-active peptides. Over the last few years, several micropeptides exhibiting important cellular functions were discovered. However, ribosome occupancy does not necessarily imply an actual function of the translated peptide, leading to the development of various tools assessing the coding potential of sORFs. Here, we introduce sORFs.org (http://www.sorfs.org), a novel database for sORFs identified using ribosome profiling. Starting from ribosome profiling, sORFs.org identifies sORFs, incorporates state-of-the-art tools and metrics and stores results in a public database. Two query interfaces are provided, a default one enabling quick lookup of sORFs and a BioMart interface providing advanced query and export possibilities. At present, sORFs.org harbors 263 354 sORFs that demonstrate ribosome occupancy, originating from three different cell lines: HCT116 (human), E14_mESC (mouse) and S2 (fruit fly). sORFs.org aims to provide an extensive sORFs database accessible to researchers with limited bioinformatics knowledge, thus enabling easy integration into personal projects.

INTRODUCTION

Small open reading frames (sORFs) can be defined as open reading frames smaller than or equal to 300 nucleotides (100 amino acids). These sORFs, while inherent to all genomes, were historically ignored in gene annotation studies on the assumption that they lack any coding potential (1). Mainly because of their small size they were thought to occur by chance; however, some longer sORFs resemble protein-coding ORFs, which simplifies their annotation. The exclusion of sORFs arose during the development of different (gene prediction) tools in bioinformatics, genomics and proteomics that tried to reduce noise imposed by technological limitations. For in silico prediction, sORFs are excluded because they can easily occur by chance due to their small size. RNA-seq driven transcriptomics is ignorant of ORF delineation and thus mainly focuses on the longest available ORF in the transcript sequence. As for MS-based proteomics studies, the small protein products are often lost in sample preparation steps; furthermore, micropeptides are thought to be of low abundance and can have tissue- or time-specific expression, further impeding their identification. The search for micropeptides, defined as translation products from sORFs, was stimulated by the advent of ribosome profiling (2,3), a next generation sequencing technique. Ribosome profiling (RIBO-seq) recovers and subsequently sequences the ±30 nt RNA fragments captured within translating ribosomes. This technique differs from a regular RNA-seq setup in that it provides a ‘snap-shot’ of what is being translated in a cell, rather than what is expressed in a cell. In this context, it allows the detection of translated sORFs, possibly encoding functional peptides or small proteins. Standard RNA sequencing techniques are unable to detect translated sORFs. Mass spectrometry is routinely used to detect and measure translation products. 
Although this technique is rapidly improving in sensitivity, detection of translated sORFs remains very difficult, making RIBO-seq (4) the preferred tool for sORF discovery. Also, RIBO-seq enables translation initiation site (TIS) detection through treatment with specific antibiotics, harringtonine (HARR) or lactimidomycin (LTM). These drugs stall initiating ribosomes at the translation initiation site, as opposed to the normal procedure, in which all translating ribosomes are captured after cycloheximide (CHX) treatment. While RIBO-seq provides data on many putatively functional translated sORFs, ribosome occupancy does not automatically imply true coding and function at the peptide level. Consequently, several tools and metrics have been published to assess the coding potential (i.e. the potential to encode functional peptides) of RIBO-seq, sORF and micropeptide related data. Analytical methods measuring the coding potential can be either sequence based (multiple sequence alignment-based phylogenetic analysis, sequence variation, sequence similarity analysis) or based on RIBO-seq (ribosome protected fragment (RPF) length analysis, RPF reading frame analysis). Although proteomic identification of micropeptides is onerous, it is still the best methodology to truly (at amino acid level) identify micropeptides. Since the advent of RIBO-seq, the biological functions of several micropeptides were unraveled. Toddler, for example, is an embryonic signal that promotes cell movement (5), Pri-peptides regulate various development steps across many insect species (6), Sarcolipin regulates muscle-based thermogenesis in mammals (7) and Myoregulin regulates Ca²⁺ handling in muscle cells (8). These examples highlight the rising importance of micropeptides (9–11). 
A public repository for sORFs, holding a growing number of RIBO-seq studies and providing information resulting from various tools and metrics, therefore seems necessary to support functional research in the micropeptide field. Here, we present www.sorfs.org, a comprehensive repository of sORFs identified by RIBO-seq, currently harboring 263 354 sORFs originating from three different species (human, mouse, fruit fly).

MATERIALS AND METHODS

Database development

The current sORF identification pipeline requires RIBO-seq data after both CHX-treatment, capturing elongating ribosomes, and HARR- or LTM-treatment, resulting in initiating ribosomes (12). The RIBO-seq sequence reads are first aligned using the STAR splice site aware mapper (13), as described by the PROTEOFORMER pipeline (14). Reference genome indexes and gene annotation information are retrieved from the iGenomes repository (based on Ensembl annotation version 75, https://support.illumina.com/sequencing/sequencing_software/igenome.html) and are updated on every new release. A summary of parameters, mapping statistics as well as quality control files (FastQC (15)) can be found on the sorfs.org ‘data sets’ page. Secondly, the translation initiation sites are determined using criteria defined by Lee et al. (16). A full description of the TIS-calling implementation can be found in the PROTEOFORMER pipeline (14). Subsequently, sORFs are assembled starting from the detected TIS positions, extending the sequence to the next stop codon situated 10–100 amino acids further downstream and in-frame relative to the TIS. Here, existing gene annotation information can optionally be taken into account (either splice-aware or not). Alongside the genomic positions, a number of general sORF related characteristics are calculated. These include the mass of the resulting peptide, the mRNA and peptide sequence, and a categorization based on the Ensembl mRNA annotation (5′ UTR, exonic, intronic, 3′ UTR, ncRNA or intergenic). For intergenic sORFs the distance to the nearest up- and downstream gene is calculated, and for each 5′ UTR, exonic or intronic sORF the percentage of overlap with exonic regions is retrieved and a possible frameshift is determined relative to the overlapping Ensembl transcript. The RPF count and the RPF fragments per kilobase of coding region per million aligning reads (RPKM) are computed as described in Ingolia et al. (2). 
A unique ID is provided to all identified sORFs, constructed from the corresponding cell line and an auto-incremental number as follows: [cell line]:[auto-incremental number]. All data are generated using in-house Perl (version 5.16.3) and Python (version 2.7.10) scripts and stored in a MySQL database (version 5.5.42). Currently sORFs.org holds three RIBO-seq data sets from three different cell lines: HCT116 (human colon cancer cell line), E14_mESC (Mouse embryonic stem cells, 14 days old) and S2 (20–24 h old Drosophila melanogaster embryos). A detailed overview of the cell lines can be found at http://www.sorfs.org/dataset_information. With every iGenomes update, data will be reprocessed and updated within the next month. New data sets are actively sought and, if permitted by the owners, will be included after a manual inspection of the data (quality control), normally within the next month. The same holds for data submitted by users.
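The assembly step and the RPKM normalization described above can be illustrated with a minimal Python sketch. It is not the in-house Perl/Python code used by sORFs.org: the function names are hypothetical, the transcript is assumed to be an unspliced DNA string, the TIS is assumed to be a 0-based index, and the 10–100 amino acid range is assumed to count the start codon but not the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_sorf(transcript, tis, min_aa=10, max_aa=100):
    """Extend from a TIS to the first in-frame stop codon downstream.

    transcript: DNA string (assumption: already spliced or unspliced as needed).
    tis: 0-based index of the first base of the start codon (assumption).
    Returns the ORF sequence without the stop codon, or None when no stop
    codon is found or the length falls outside min_aa..max_aa codons.
    """
    for pos in range(tis + 3, len(transcript) - 2, 3):
        if transcript[pos:pos + 3] in STOP_CODONS:
            n_codons = (pos - tis) // 3  # start codon included, stop excluded
            if min_aa <= n_codons <= max_aa:
                return transcript[tis:pos]
            return None
    return None


def rpkm(rpf_count, orf_length_nt, total_aligned_reads):
    """Reads per kilobase of coding region per million aligned reads."""
    return rpf_count * 1e9 / (orf_length_nt * total_aligned_reads)


def sorf_id(cell_line, number):
    """Build an ID of the form [cell line]:[auto-incremental number]."""
    return f"{cell_line}:{number}"
```

For example, `assemble_sorf("GG" + "ATG" + "GCT" * 10 + "TAA" + "CC", tis=2)` returns the 33-nt ORF, whereas a two-codon ORF such as `"ATGGCTTAA"` is rejected as too short.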

The sorfs.org web interface was built using the Laravel PHP framework (version 4.2), applying the model-view-controller (MVC) architectural paradigm. The web interface was developed using HTML, PHP, CSS, SQL and JavaScript. Two different query interfaces are provided to the user. The default query interface (see Figure 1A) provides real-time lookup of sORFs with limited query possibilities, excelling in the quick lookup of specific sORFs. Secondly, a BioMart (17) (version 0.9.0) query interface (see Figure 1B) was developed, enabling advanced query and export options. A comprehensive guide for both query interfaces is provided on sORFs.org.

Figure 1.

(A) sorfs.org default query interface. (B) sorfs.org BioMart query interface.

Coding potential assessment

Based on sequence conservation

Several algorithms are implemented providing coding evidence of the identified sORFs. A PhyloCSF conservation analysis (18) uses species-specific multiple alignment files from UCSC (19) in order to obtain a score representing the phylogenetic conservation of a sORF. PhyloCSF examines evolutionary signatures characteristic to alignments of conserved coding regions in order to determine whether a multi-species nucleotide sequence alignment is likely to represent a protein-coding region.

Based on ribosome profiling data

(i) The fragment length organization similarity score (FLOSS), described by Ingolia et al. (20), measures the magnitude of disagreement between the RPF-length distribution of Ensembl annotated protein coding sequences and the RPF-length distribution of a sORF. This fragment length metric makes it possible to identify true ribosome footprints bioinformatically. Additionally, a classification is formalized by defining a threshold FLOSS value. (ii) The ORFscore, a novel metric described by Bazzini et al. (21), quantifies the preference of RPFs to accumulate in the first frame of the coding sequence, as an indication of true coding sequences. The ORFscore, specifically designed for small ORFs, is calculated by counting RPFs in each frame and subsequently comparing this distribution to an equally sized uniform distribution using a modified chi-squared statistic. Only RPFs with a length corresponding to the most abundant, in-frame RPF found in the Ensembl canonical protein coding transcripts are used. For example, if the annotated Ensembl CDS contains mostly 29-bp long footprints, only these 29-bp footprints will be used for the ORFscore analysis within this region.
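Both metrics can be written down compactly. The sketch below follows the formulas as commonly stated for the cited publications (FLOSS as half the summed absolute difference between two normalized length distributions; ORFscore as log2 of a chi-squared-like statistic plus one, made negative when frame 1 is not the dominant frame). It is an illustration under those assumptions, with hypothetical function names, and not the sORFs.org implementation; the FLOSS classification threshold is not reproduced here.

```python
import math


def floss(orf_length_counts, ref_length_counts):
    """Fragment length organization similarity score.

    Both arguments map RPF length (nt) to read count. The result lies
    between 0 (identical distributions) and 1 (no overlap).
    """
    orf_total = sum(orf_length_counts.values())
    ref_total = sum(ref_length_counts.values())
    if orf_total == 0 or ref_total == 0:
        raise ValueError("both distributions need at least one read")
    lengths = set(orf_length_counts) | set(ref_length_counts)
    return 0.5 * sum(
        abs(orf_length_counts.get(n, 0) / orf_total
            - ref_length_counts.get(n, 0) / ref_total)
        for n in lengths
    )


def orfscore(frame_counts):
    """ORFscore from RPF counts in reading frames 1, 2 and 3.

    Compares the counts with a uniform distribution of the same total
    size; the sign is negative when frame 1 is not the dominant frame.
    """
    f1, f2, f3 = frame_counts
    mean = (f1 + f2 + f3) / 3.0
    if mean == 0:
        return 0.0  # no reads: no evidence either way (assumption)
    chi = sum((f - mean) ** 2 / mean for f in (f1, f2, f3))
    score = math.log2(chi + 1)
    return -score if (f1 < f2 or f1 < f3) else score
```

A sORF whose reads all fall in frame 1 receives a positive ORFscore, while the same number of reads concentrated in frame 2 yields the same magnitude with a negative sign.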

Based on sequence variation

Sequence variation (i.e. mutations, insertions or deletions) associated with distinct phenotypes provides information on the function of that genomic/mRNA region. Associating sequence variation with sORFs provides evidence for functionally important sORFs. The Ensembl variation database (22,23) (including dbSNP, ClinVar, Cosmic …) is used as the source for sequence variation. Important to note: no filters were applied on these variation sources; caution is advised as some sources contain machine-annotated variations.

Based on sequence homology

Sequence similarity between sORFs and known proteins can reveal false-positive sORF annotations (e.g. a 5′ UTR sORF matching an unannotated protein isoform). The ‘Basic Local Alignment Search Tool protein’ (BLASTp) (24,25) was used to calculate amino acid sequence similarity between sORFs and the Non-redundant (NR) protein sequence database (NCBI) (26). An expected value (E-value) of 10 serves as the upper threshold for sequences to be considered adequately similar.
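As an illustration of the E-value threshold, the following sketch filters BLAST hits read from tab-separated output. It assumes the standard 12-column tabular layout produced by BLAST+ with `-outfmt 6`; the text does not state which output format was actually used, and the function name is hypothetical.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def hits_below_evalue(handle, max_evalue=10.0):
    """Group hits with E-value <= max_evalue by query (sORF) identifier.

    handle: any iterable of tab-separated lines, e.g. an open file.
    Returns a dict mapping query ID to a list of hit dicts.
    """
    kept = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_FIELDS, row))
        if float(hit["evalue"]) <= max_evalue:
            kept.setdefault(hit["qseqid"], []).append(hit)
    return kept
```

Passing an open results file (or an `io.StringIO` holding a few lines) returns, per sORF, only the hits at or below the chosen threshold.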

In order to provide some insight into various sORF attributes (TIS distribution, Ensembl annotation, PhyloCSF, FLOSS, variation analysis) as well as the data, overview plots were generated summarizing the outcome of these in silico analyses (Supporting Material S1).

Based on mass spectrometry fragmentation spectra identification

An automated pipeline was developed to reprocess the PRIDE (27,28) repository to identify micropeptides. The sequence searching pipeline consisted of pride-asap (29) to extract and infer the correct search parameters, SearchGUI (30) version 2.0.4 for the search engine management and finally PeptideShaker (31) version 1.0.1 for the post-processing of the algorithms' output and the filtering for validated spectra.

To minimize the chance of erroneously assigning a spectrum to a sORF instead of a known human protein, a two-stage search approach was used: (i) a filtering search identifying all spectra at a 1% FDR at the PSM level against human UniProt-KB (32,33) including isoforms, release 10_2015, and the cRAP library (34), and (ii) a follow-up search of the non-validated spectra against a sequence database containing the hypothetical sequences of sORF translation products.
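The control flow of this two-stage approach can be sketched as follows. The two search functions are hypothetical placeholders for the actual search-engine runs (they are assumed to return the spectrum identifiers validated at the chosen FDR); only the routing of spectra between the stages is shown.

```python
def two_stage_search(spectra, search_known, search_sorf):
    """Route spectra through a known-protein search, then a sORF search.

    spectra: iterable of spectrum identifiers.
    search_known, search_sorf: hypothetical callables that take a list of
        spectrum identifiers and return those validated by the search.
    Returns (known_hits, sorf_hits) as two sets of spectrum identifiers.
    """
    spectra = list(spectra)
    # Stage (i): spectra explained by known proteins or contaminants.
    known_hits = set(search_known(spectra))
    # Stage (ii): only non-validated spectra are searched against sORFs.
    remaining = [s for s in spectra if s not in known_hits]
    sorf_hits = set(search_sorf(remaining))
    return known_hits, sorf_hits
```

Because the second search only ever sees spectra that the first search did not validate, a spectrum cannot be assigned to both a known protein and a sORF product.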

The PRIDE ReSpin results are represented on the sORF detail page and can be queried from the BioMart query interface. More information can be found in Supporting Material S2.

sORFs.org access

sORFs.org is publicly available through a web interface located at http://www.sorfs.org. It has two different query interfaces. The default query interface (http://www.sorfs.org/database) allows querying on basic sORF attributes (ID, species, cell line, genomic position, length, annotation, biotype, sequence). Additionally, a BioMart query interface (http://www.sorfs.org/BioMart) allows querying on all available features and exporting the filtered data. A manual is provided for both query interfaces next to the corresponding query interface page. All sORFs can be individually inspected on a detail page (Figure 2), displaying all the sORF attributes. This detail page also contains a RIBO-seq visualization tool, permitting manual inspection of RIBO-seq data. The visualization tool enables selection of RPFs based on length or reading frame (Figure 3). Furthermore, the detail page contains a hyperlink through the ‘gene location’ attribute, where the mapped RIBO-seq data are available for inspection in the UCSC browser (35,36). Researchers can submit data and papers through the ‘submit’ (http://www.sorfs.org/submit) page and sORFs.org can be contacted through the ‘contact’ (http://www.sorfs.org/contact) page.

Figure 2.

sORF detail page.

Figure 3.

RIBO-seq visualization tool with options.

CONCLUSION AND FUTURE DIRECTION

Although the micropeptide research field has grown significantly, it still remains in its infancy. The existence of micropeptides has long been neglected, but refusing to accept their significance could impair our scientific knowledge. Since the advent of RIBO-seq, various tools and metrics have been developed to discover sORFs. sORFs.org aims to run these tools and metrics, integrate these various data sources, and furthermore use visualization tools and intuitive querying interfaces to enable wet lab researchers to query this pool of information. Consequently, the micropeptide research field will become more accessible. The sORFs.org resource can also significantly facilitate other follow-up analyses. A sORFs sequence database can be constructed for use in MS-based identification. Also, certain (disease) phenotype related variations could be explained because they reside within a sORF encoding a functional micropeptide.

As RIBO-seq becomes more widely adopted, sORFs.org is expected to expand the number of data sets and supported species. Simultaneously, new tools and metrics will be incorporated following new developments in the field. For instance, a pipeline is being developed to allow sORF identification from RIBO-seq data lacking HARR/LTM treatment. sORFs.org has the potential to become a community resource for sORFs and micropeptide research.

Acknowledgments

We would like to thank Joshua Dunn for providing us with the S2 (fruit fly) RIBO-seq data.

Footnotes

Present address: Volodimir Olexiouk, BioBiX - Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Building A, Ghent 9000, Belgium.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Postdoctoral Fellows of the Research Foundation – Flanders (FWO-Vlaanderen) [12A7813N to G.M.]. Research Foundation – Flanders (FWO-Vlaanderen) [G0D3114N to V.O.]. Funding for open access charge: Ghent University.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Frith M.C., Forrest A.R., Nourbakhsh E., Pang K.C., Kai C., Kawai J., Carninci P., Hayashizaki Y., Bailey T.L., Grimmond S.M. The abundance of short proteins in the mammalian proteome. PLoS Genet. 2006;2:515–528. doi: 10.1371/journal.pgen.0020052.
  • 2. Ingolia N.T., Ghaemmaghami S., Newman J.R.S., Weissman J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324:218–223. doi: 10.1126/science.1168978.
  • 3. Ingolia N.T. Ribosome profiling: new views of translation, from single codons to genome scale. Nat. Rev. Genet. 2014;15:205–13. doi: 10.1038/nrg3645.
  • 4. Crappé J., Van Criekinge W., Trooskens G., Hayakawa E., Luyten W., Baggerman G., Menschaert G. Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs. BMC Genomics. 2013;14:648–660. doi: 10.1186/1471-2164-14-648.
  • 5. Pauli A., Norris M.L., Valen E., Chew G.-L., Gagnon J.A., Zimmerman S., Mitchell A., Ma J., Dubrulle J., Reyon D. Toddler: an embryonic signal that promotes cell movement via Apelin receptors. Science. 2014;343:1248636–1248644. doi: 10.1126/science.1248636.
  • 6. Chanut-Delalande H., Hashimoto Y., Pelissier-Monier A., Spokony R., Dib A., Kondo T., Bohère J., Niimi K., Latapie Y., Inagaki S. Pri peptides are mediators of ecdysone for the temporal control of development. Nat. Cell Biol. 2014;16:1035–1044. doi: 10.1038/ncb3052.
  • 7. Magny E.G., Pueyo J.I., Pearl F.M.G., Cespedes M.A., Niven J.E., Bishop S.A., Couso J.P. Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames. Science. 2013;341:1116–1120. doi: 10.1126/science.1238802.
  • 8. Tonkin J., Rosenthal N. One small step for muscle: a new micropeptide regulates performance. Cell Metab. 2015;21:515–516. doi: 10.1016/j.cmet.2015.03.013.
  • 9. Crappé J., Van Criekinge W., Menschaert G. Little things make big things happen: A summary of micropeptide encoding genes. EuPA Open Proteomics. 2014;3:128–137.
  • 10. Andrews S.J., Rothnagel J.A. Emerging evidence for functional peptides encoded by short open reading frames. Nat. Rev. Genet. 2014;15:193–204. doi: 10.1038/nrg3520.
  • 11. Slavoff S.A., Mitchell A.J., Schwaid A.G., Cabili M.N., Ma J., Levin J.Z., Karger A.D., Budnik B.A., Rinn J.L., Saghatelian A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 2013;9:59–64. doi: 10.1038/nchembio.1120.
  • 12. Ingolia N.T., Brar G.A., Rouskin S., McGeachy A.M., Weissman J.S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nat. Protoc. 2012;7:1534–1550. doi: 10.1038/nprot.2012.086.
  • 13. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 14. Crappé J., Ndah E., Koch A., Steyaert S., Gawron D., De Keulenaer S., De Meester E., De Meyer T., Van Criekinge W., Van Damme P. PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res. 2015;43:e29–e39. doi: 10.1093/nar/gku1283.
  • 15. Andrews S. FastQC: A quality control tool for high throughput sequence data. Babraham Bioinformatics. 2010
  • 16. Lee S., Liu B., Lee S., Huang S.-X., Shen B., Qian S.-B. PNAS Plus: Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 2012;109:E2424–E2432. doi: 10.1073/pnas.1207846109.
  • 17. Smedley D., Haider S., Durinck S., Pandini L., Provero P., Allen J., Arnaiz O., Awedh M.H., Baldock R., Barbiera G., et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 2015;43:589–598. doi: 10.1093/nar/gkv350.
  • 18. Lin M.F., Jungreis I., Kellis M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 2011;27:275–282. doi: 10.1093/bioinformatics/btr209.
  • 19. Miller W., Rosenbloom K., Hardison R.C., Hou M., Taylor J., Raney B., Burhans R., King D.C., Baertsch R., Blankenberg D., et al. 28-Way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res. 2007;17:1797–1808. doi: 10.1101/gr.6761107.
  • 20. Ingolia N.T., Brar G.A., Stern-Ginossar N., Harris M.S., Talhouarne G.J.S., Jackson S.E., Wills M.R., Weissman J.S. Ribosome Profiling Reveals Pervasive Translation Outside of Annotated Protein-Coding Genes. Cell Rep. 2014;8:1365–1379. doi: 10.1016/j.celrep.2014.07.045.
  • 21. Bazzini A.A., Johnstone T.G., Christiano R., MacKowiak S.D., Obermayer B., Fleming E.S., Vejnar C.E., Lee M.T., Rajewsky N., Walther T.C., et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014;33:981–993. doi: 10.1002/embj.201488411.
  • 22. Chen Y., Cunningham F., Rios D., McLaren W.M., Smith J., Pritchard B., Spudich G.M., Brent S., Kulesha E., Marin-Garcia P. Ensembl variation resources. BMC Genomics. 2010;11:293–309. doi: 10.1186/1471-2164-11-293.
  • 23. Cunningham F., Amode M.R., Barrell D., Beal K., Billis K., Brent S., Carvalho-Silva D., Clapham P., Coates G., Fitzgerald S., et al. Ensembl 2015. Nucleic Acids Res. 2014;43:D662–D669. doi: 10.1093/nar/gku1010.
  • 24. McGinnis S., Madden T.L. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004;32:W20–W25. doi: 10.1093/nar/gkh435.
  • 25. Madden T. The NCBI Handbook [Internet] 2nd edition 2013. The BLAST sequence analysis tool.
  • 26. Ostell J., McEntyre J. The NCBI Handbook [Internet] 1st edition 2007. The NCBI Handbook.
  • 27. Vizcaíno J.A., Côté R.G., Csordas A., Dianes J.A., Fabregat A., Foster J.M., Griss J., Alpi E., Birim M., Contell J. The Proteomics Identifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 2013;41:D1063–D1069. doi: 10.1093/nar/gks1262.
  • 28. Martens L., Hermjakob H., Jones P., Adamsk M., Taylor C., States D., Gevaert K., Vandekerckhove J., Apweiler R. PRIDE: the proteomics identifications database. Proteomics. 2005;5:3537–3545. doi: 10.1002/pmic.200401303.
  • 29. Hulstaert N., Reisinger F., Rameseder J., Barsnes H., Vizcaíno J.A., Martens L. Pride-asap: automatic fragment ion annotation of identified PRIDE spectra. J. Proteomics. 2013;95:89–92. doi: 10.1016/j.jprot.2013.04.011.
  • 30. Vaudel M., Barsnes H., Berven F.S., Sickmann A., Martens L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics. 2011;11:996–999. doi: 10.1002/pmic.201000595.
  • 31. Vaudel M., Burkhart J.M., Zahedi R.P., Oveland E., Berven F.S., Sickmann A., Martens L., Barsnes H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015;33:22–24. doi: 10.1038/nbt.3109.
  • 32. Boutet E., Lieberherr D., Tognolli M., Schneider M., Bairoch A. UniProtKB/Swiss-Prot. Methods Mol. Biol. 2007;406:89–112. doi: 10.1007/978-1-59745-535-0_4.
  • 33. Apweiler R., Bateman A., Martin M.J., O'Donovan C., Magrane M., Alam-Faruque Y., Alpi E., Antunes R., Arganiska J., Casanova E.B., et al. Activities at the Universal Protein Resource (UniProt) Nucleic Acids Res. 2014;42:D191–D198. doi: 10.1093/nar/gkt1140.
  • 34. Mellacheruvu D., Wright Z., Couzens A.L., Lambert J.-P., St-Denis N.A., Li T., Miteva Y.V., Hauri S., Sardiu M.E., Low T.Y., et al. The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat. Methods. 2013;10:730–736. doi: 10.1038/nmeth.2557.
  • 35. Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The Human Genome Browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
  • 36. Zweig A.S., Karolchik D., Kuhn R.M., Haussler D., Kent W.J. UCSC genome browser tutorial. Genomics. 2008;92:75–84. doi: 10.1016/j.ygeno.2008.02.003.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
