Table 1.
Type of data | Tripal module | Upload file format | Controlled vocabulary | Corresponding NCBI database |
---|---|---|---|---|
RNA Sequence read | Feature (core) | Fasta | Sequence ontology | Sequence read archive |
Assembled transcript sequence | Feature (core) and analysis unigene (extension) | Fasta | Sequence ontology | Transcriptome shotgun assembly |
Gene sequence | Feature (core) | Fasta | Sequence ontology | Gene |
BLAST result | BLAST (extension) | XML | Gene ontology | NA |
KEGG result | KEGG (extension) | Tab-delimited | Gene ontology | NA |
Biomaterial | Analysis expression (extension) | XML or tab-delimited | Species-dependent (a) | BioSample |
Gene expression values | Analysis expression (extension) | Matrix or column files | NA | Gene expression omnibus |
Gene expression experiment methods | Analysis expression (extension) | Descriptive text | NA | BioProject |
Data may be sourced from or uploaded to a corresponding database in NCBI.
(a) Biomaterials may be associated with multiple controlled vocabularies, often species-specific. For example, plant samples may be described by anatomical structure and development stage with the Plant Ontology (20), by phenotype with the Plant Trait Ontology, or by stress treatment with the Plant Stress Ontology. All of these ontologies are available through Planteome (http://planteome.org/).