Author manuscript; available in PMC: 2008 Jun 3.
Published in final edited form as: Nat Biotechnol. 2008 May;26(5):541–547. doi: 10.1038/nbt1360
Report type
Investigation EU BA PL VI OR ME
• Submit to trace archives and INSDC M M M M M M
• Investigation type (i.e., report type) M M M M M M
• Project name2 M M M M M M
  • Study
  • Environment
    • Geographic location (latitude and longitude(float) (point, transect and region); depth and altitude of sample(integer)) M M M M M M
    • Time of sample collection(UCT) M M M M M M
    • Habitat(EnvO) M M M M M M
    MIMS extension: select to report a set of uniform measurements for a given habitat: M
    • Water body: (temperature, pH, salinity, pressure, chlorophyll, conductivity, light intensity, dissolved organic carbon (DOC), current, atmospheric data, density, alkalinity, dissolved oxygen, particulate organic carbon (POC), phosphate, nitrate, sulfates, sulfides, primary production)(integer, unit)
  • Nucleic acid sequence source
    • Subspecific genetic lineage (below lowest rank of NCBI taxonomy, which is subspecies) (e.g., serovar, biotype, ecotype)(CABRI) M M M M M
    • Ploidy (e.g., allopolyploid, polyploid)(PATO) M
    • Number of replicons (EU, BA: chromosomes (haploid count); VI: segments)(integer) M M M
    • Extrachromosomal elements(integer) X M
    • Estimated size (before sequencing; to apply to all draft genomes)(integer; base pairs) M X X X X
    • Reference for biomaterial (primary publication if isolated before genome publication; otherwise, primary genome report)(PMID or DOI) X M X X X X
    • Source material identifiers: (cultures of microorganisms: identifiers(alphanumeric) for two culture collections(OBI); specimens (e.g., organelles and Eukarya): voucher condition and location(CV)) M M M M M M
    • Known pathogenicity M M
    • Biotic relationship (e.g., free-living, parasite, commensal, symbiont)(OBI) X M X
    • Specific host (e.g., host taxid, unknown, environmental)(EnvO) X M M M
    • Host specificity or range(taxid) X X X M
    • Health or disease status of specific host at time of collection (e.g., alive, asymptomatic)(PATO) M M
    • Trophic level (e.g., autotroph, heterotroph)(PATO) M M
    • Propagation (phage: lytic or lysogenic; plasmid: incompatibility group)(CV) M M M
    • Encoded traits (e.g., plasmid: antibiotic resistance; phage: converting genes)(CV; see caption) X M M X
    • Relationship to oxygen (e.g., aerobic, anaerobic)(PATO) M
    • Isolation and growth conditions(PMID or DOI) M M M M M M
    • Biomaterial treatment (e.g., filtering of sea water)(OBI) M
    • Volume of sample(integer) M
    • Sampling strategy (enriched, screened, normalized)(CV) M
• Assay
  • Sequencing
    • Nucleic acid preparation (extraction method(CV); amplification(CV)) M M M M M M
    • Library construction (library size(integer), number of reads sequenced(integer), vector(CV)) M
    • Sequencing method (e.g., dideoxysequencing, pyrosequencing, polony)(OBI) M M M M M M
    • Assembly (assembly method(CV), estimated error rate(unit) and method of calculation(CV)) M M M M M M
    • Finishing strategy (status—e.g., complete or draft(CV), coverage(integer), contigs(integer)) M M X X X X
    • Relevant Standard Operating Procedures (SOPs) M M M M M M
    • Relevant electronic resources M M M M M M

All proposed descriptors in MIGS and the reports (groups) to which they apply are listed. EU, eukaryotes; BA, bacteria and archaea; PL, plasmid; VI, virus; OR, organelle; ME, metagenome. Each descriptor carries an annotation denoting its ‘type’ (e.g., integer or controlled vocabulary (CV) term). For items marked “CV,” candidate OBO ontologies (http://obofoundry.org), if available, have been selected for use. EnvO, The Environment Ontology; PATO, the Phenotype and Trait Ontology; CABRI, Common Access to Biological Resources and Information. Mixed ontologies may be useful for the “encoded traits” descriptor: the PATO term “resistant” could be used with a ChEBI term (for example, “penicillin”) to note antibiotic resistance to a given compound. Descriptors in shaded rows are common to all report types and are considered the ‘core’ of MIGS. “Source material identifier” is an exception; the GSC recommends that this be a core descriptor, but physical archives are not yet routinely created for all cases or types of biological material subjected to genome sequencing (the recommendation is deposition in at least two culture collections for viable samples (ref. 20) and vouchers for specimens). This is due to both cultural and technical issues. The need for universal and unique identifiers for metagenomic samples was recently discussed in an exploratory workshop organized by the MetaFunctions group (http://www.metafunctions.org). In fact, the application of MIGS to our complete genome collection will require the designation of permanent and unique identifiers for all genome projects, something the INSDC is working to implement (ref. 21). Geographic location is applied in principle to all report types, but we recognize that many isolates, especially eukaryotes, are highly domesticated laboratory organisms far removed from any relevant environmental context.
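The mixed-ontology idea for the “encoded traits” descriptor can be illustrated with a minimal sketch. It assumes a trait is recorded as a pair of terms, one quality and one entity, each tagged with its source ontology. The class and function names (`OntologyTerm`, `EncodedTrait`, `describe`) are hypothetical and not part of MIGS, and only term labels are used because the caption gives no term identifiers.

```python
from typing import NamedTuple


class OntologyTerm(NamedTuple):
    ontology: str  # source ontology, e.g. "PATO" or "ChEBI"
    label: str     # human-readable term label (no identifiers assumed)


class EncodedTrait(NamedTuple):
    quality: OntologyTerm  # what the trait is, e.g. PATO "resistant"
    entity: OntologyTerm   # what it applies to, e.g. ChEBI "penicillin"


def describe(trait: EncodedTrait) -> str:
    """Render a trait as 'quality (ontology) to entity (ontology)'."""
    return (
        f"{trait.quality.label} ({trait.quality.ontology}) to "
        f"{trait.entity.label} ({trait.entity.ontology})"
    )


# The example pairing given in the caption.
PENICILLIN_RESISTANCE = EncodedTrait(
    quality=OntologyTerm("PATO", "resistant"),
    entity=OntologyTerm("ChEBI", "penicillin"),
)

print(describe(PENICILLIN_RESISTANCE))
# prints: resistant (PATO) to penicillin (ChEBI)
```

Keeping the two terms separate, rather than storing a single free-text string, leaves each half checkable against its own ontology.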
All descriptors deemed to be core are marked “M” (minimum), and others that could optionally be applied to other groups with high priority are marked “X” (extra). Taxonomic groups for which a descriptor cannot be meaningfully applied are marked with a dash. This list of minimal information is recognized by the GSC as just a starting point for the description of genomes and metagenomes. PMID, PubMed identifier; DOI, digital object identifier; float, floating-point decimal; UCT, Coordinated Universal Time (YYYY-MM-DD); unit, a suitable unit of measure. The descriptor “isolation and growth conditions” takes a citation as its value because the information cannot be contained in a single value (or small set of values) like those of all other fields. The citation could be the PMID or DOI of the publication, or it could be an SOP. In principle, all aspects of the checklist could be substantiated with a reference in addition to a value, and this would be captured at the level of implementation.
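The M/X marking scheme lends itself to a simple completeness check. The following is a minimal sketch, not a GSC implementation: it encodes three rows of the table whose six marks are unambiguous in the listing above and reports which “M” descriptors a record lacks for a given report type. The names `Descriptor`, `DESCRIPTORS`, `missing_minimum` and the sample record are assumptions made for illustration.

```python
from dataclasses import dataclass

# Report types in the column order used by the table.
REPORT_TYPES = ("EU", "BA", "PL", "VI", "OR", "ME")


@dataclass(frozen=True)
class Descriptor:
    name: str
    value_type: str  # annotation from the table, e.g. "OBI", "PMID or DOI"
    marks: str       # one mark per report type, in REPORT_TYPES order

    def mark_for(self, report_type: str) -> str:
        # Raises ValueError for an unknown report type.
        return self.marks[REPORT_TYPES.index(report_type)]


# Three rows copied from the table; the rest are omitted for brevity.
DESCRIPTORS = [
    Descriptor("sequencing method", "OBI", "MMMMMM"),
    Descriptor("finishing strategy", "CV, integer", "MMXXXX"),
    Descriptor("reference for biomaterial", "PMID or DOI", "XMXXXX"),
]


def missing_minimum(report_type: str, record: dict) -> list:
    """Names of descriptors marked 'M' for report_type with no value in record."""
    return [
        d.name
        for d in DESCRIPTORS
        if d.mark_for(report_type) == "M" and not record.get(d.name)
    ]


# Hypothetical record that fills in only one descriptor.
RECORD = {"sequencing method": "pyrosequencing"}

print(missing_minimum("BA", RECORD))
# prints: ['finishing strategy', 'reference for biomaterial']
print(missing_minimum("VI", RECORD))
# prints: []
```

Descriptors marked “X” are deliberately ignored by the check, since the caption treats them as optional extras rather than minimum requirements.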