Author manuscript; available in PMC: 2012 Jun 5.
Published in final edited form as: Nat Biotechnol. 2011 May;29(5):415–420. doi: 10.1038/nbt.1823

Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

Pelin Yilmaz 1,2, Renzo Kottmann 1, Dawn Field 3, Rob Knight 4,5, James R Cole 6,7, Linda Amaral-Zettler 8, Jack A Gilbert 9,10,11, Ilene Karsch-Mizrachi 12, Anjanette Johnston 12, Guy Cochrane 13, Robert Vaughan 13, Christopher Hunter 13, Joonhong Park 14, Norman Morrison 3,15, Philippe Rocca-Serra 16, Peter Sterk 3, Manimozhiyan Arumugam 17, Mark Bailey 3, Laura Baumgartner 18, Bruce W Birren 19, Martin J Blaser 20, Vivien Bonazzi 21, Tim Booth 3, Peer Bork 17, Frederic D Bushman 22, Pier Luigi Buttigieg 1,2, Patrick S G Chain 7,23,24, Emily Charlson 22, Elizabeth K Costello 4, Heather Huot-Creasy 25, Peter Dawyndt 26, Todd DeSantis 27, Noah Fierer 28, Jed A Fuhrman 29, Rachel E Gallery 30, Dirk Gevers 19, Richard A Gibbs 31,32, Inigo San Gil 33, Antonio Gonzalez 34, Jeffrey I Gordon 35, Robert Guralnick 28,36, Wolfgang Hankeln 1,2, Sarah Highlander 31,37, Philip Hugenholtz 38, Janet Jansson 23,39, Andrew L Kau 35, Scott T Kelley 40, Jerry Kennedy 4, Dan Knights 34, Omry Koren 41, Justin Kuczynski 18, Nikos Kyrpides 23, Robert Larsen 4, Christian L Lauber 42, Teresa Legg 28, Ruth E Ley 41, Catherine A Lozupone 4, Wolfgang Ludwig 43, Donna Lyons 42, Eamonn Maguire 16, Barbara A Methé 44, Folker Meyer 10, Brian Muegge 35, Sara Nakielny 4, Karen E Nelson 44, Diana Nemergut 45, Josh D Neufeld 46, Lindsay K Newbold 3, Anna E Oliver 3, Norman R Pace 18, Giriprakash Palanisamy 47, Jörg Peplies 48, Joseph Petrosino 31,37, Lita Proctor 21, Elmar Pruesse 1,2, Christian Quast 1, Jeroen Raes 49, Sujeevan Ratnasingham 50, Jacques Ravel 25, David A Relman 51,52, Susanna Assunta-Sansone 16, Patrick D Schloss 53, Lynn Schriml 25, Rohini Sinha 22, Michelle I Smith 35, Erica Sodergren 54, Aymé Spor 41, Jesse Stombaugh 4, James M Tiedje 7, Doyle V Ward 19, George M Weinstock 54, Doug Wendel 4, Owen White 25, Andrew Whiteley 3, Andreas Wilke 10, Jennifer R Wortman 25, Tanya Yatsunenko 35, Frank Oliver Glöckner 1,2
PMCID: PMC3367316  NIHMSID: NIHMS370143  PMID: 21552244

Abstract

Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.


Without specific guidelines, most genomic, metagenomic and marker gene sequences in databases are sparsely annotated with the information required to guide data integration, comparative studies and knowledge generation. Even with complex keyword searches, it is currently impossible to reliably retrieve sequences that have originated from certain environments or particular locations on Earth—for example, all sequences from ‘soil’ or ‘freshwater lakes’ in a certain region of the world. Because public databases of the International Nucleotide Sequence Database Collaboration (INSDC; comprising DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (EBI-ENA) and GenBank (http://www.insdc.org/)) depend on author-submitted information to enrich the value of sequence data sets, we argue that the only way to change the current practice is to establish a standard of reporting that requires contextual data to be deposited at the time of sequence submission. The adoption of such a standard would elevate the quality, accessibility and utility of information that can be collected from INSDC or any other data repository.

The GSC has previously proposed standards for describing genomic sequences, the “minimum information about a genome sequence” (MIGS), and metagenomic sequences, the “minimum information about a metagenome sequence” (MIMS)1. Here we introduce an extension of these standards for capturing information about marker genes. Additionally, we introduce ‘environmental packages’ that standardize sets of measurements and observations describing particular habitats and that are applicable across all GSC checklists and beyond2. We define ‘environment’ as any location in which a sample or organism is found, e.g., soil, air, water, human-associated, plant-associated or laboratory. The original MIGS/MIMS checklists included contextual data about the location from which a sample was isolated and how the sequence data were produced. However, standard descriptions for a more comprehensive range of environmental parameters, which would help to better contextualize a sample, were not included. The environmental packages presented here are relevant to any genome sequence of known origin and are designed to be used in combination with MIGS, MIMS and MIMARKS checklists.

To create a single entry point to all minimum information checklists from the GSC and to the environmental packages, we propose an overarching framework, the MIxS standard (http://gensc.org/gc_wiki/index.php/MIxS). MIxS includes the technology-specific checklists from the previous MIGS and MIMS standards, provides a way of introducing additional checklists such as MIMARKS, and also allows annotation of sample data using environmental packages. A schematic overview of MIxS along with the MIxS environmental packages is shown in Figure 1.

Figure 1.

Schematic overview of the GSC MIxS standard (brown), including combination with specific environmental packages (blue). Shared descriptors apply to all MIxS checklists; however, each checklist has its own specific descriptors as well. Environmental packages can be applied to any of the checklists. EU, eukarya; BA, bacteria/archaea; PL, plasmid; VI, virus; ORG, organelle.

Development of MIMARKS and the environmental packages

Over the past three decades, the 16S rRNA, 18S rRNA and internal transcribed spacer gene sequences (ITS) from Bacteria, Archaea and microbial Eukaryotes have provided deep insights into the topology of the tree of life3,4 and the composition of communities of organisms that live in diverse environments, ranging from deep sea hydrothermal vents to ice sheets in the Arctic5–16. Numerous other phylogenetic marker genes have proven useful, including RNA polymerase subunits (rpoB), DNA gyrases (gyrB), DNA recombination and repair proteins (recA) and heat shock proteins (HSP70)3. Marker genes can also reveal key metabolic functions rather than phylogeny; examples include nitrogen cycling (amoA, nifH, ntcA)17,18, sulfate reduction (dsrAB)19 or phosphorus metabolism (phnA, phnI, phnJ)20,21. In this paper we define all phylogenetic and functional genes (or gene fragments) used to profile natural genetic diversity as ‘marker genes’. MIMARKS (Table 1) complements the MIGS/MIMS checklists for genomes and metagenomes by adding two new checklists, a MIMARKS survey, for uncultured diversity marker gene surveys, and a MIMARKS specimen, for marker gene sequences obtained from any material identifiable by means of specimens. The MIMARKS extension adopts and incorporates the standards being developed by the Consortium for the Barcode of Life (CBOL)22. Therefore, the checklist can be universally applied to any marker gene, from small subunit rRNA to cytochrome oxidase I (COI), to all taxa, and to studies ranging from single individuals to complex communities.

Table 1.

The core items of the MIMARKS checklists, along with the value types, descriptions and requirement status

Item [value type] | Description | Report type: MIMARKS survey | Report type: MIMARKS specimen
Investigation
    Submitted to INSDC [boolean] | Depending on the study (large-scale, e.g., done with next-generation sequencing technology, or small-scale) sequences have to be submitted to SRA (Sequence Read Archives), DRA (DDBJ Sequence Read Archive) or through the classical Webin/Sequin systems to GenBank, ENA and DDBJ | M | M
    Investigation type [mimarks-survey or mimarks-specimen] | Nucleic Acid Sequence Report is the root element of all MIMARKS compliant reports as standardized by Genomic Standards Consortium (GSC). This field is either MIMARKS survey or MIMARKS specimen | M | M
    Project name | Name of the project within which the sequencing was organized | M | M
Environment
    Geographic location (latitude and longitude [float, point, transect and region]) | The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system | M | M
    Geographic location (depth [integer, point, interval, unit]) | Please refer to the definitions of depth in the environmental packages | E | E
    Geographic location (elevation of site [integer, unit]; altitude of sample [integer, unit]) | Please refer to the definitions of either altitude or elevation in the environmental packages | E | E
    Geographic location (country and/or sea [INSDC or GAZ]; region [GAZ]) | The geographical origin of the sample as defined by the country or sea name. Country, sea or region names should be chosen from the INSDC list (http://insdc.org/country.html), or the GAZ (Gazetteer, v1.446) ontology (http://bioportal.bioontology.org/visualize/40651) | M | M
    Collection date [ISO8601] | The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated, that is, all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; except for 2008-01 and 2008, all are ISO8601 compliant | M | M
    Environment (biome [EnvO]) | The environmental biome level comprises the major classes of ecologically similar communities of plants, animals and other organisms. Biomes are defined based on factors such as plant structures, leaf types, plant spacing and other factors like climate. Examples include desert, taiga, deciduous woodland or coral reef. Environment Ontology (EnvO) (v1.53) terms listed under environmental biome can be found at http://bioportal.bioontology.org/visualize/44405/?conceptid=ENVO%3A00000428 | M | M
    Environment (feature [EnvO]) | Environmental feature level includes geographic environmental features. Examples include harbor, cliff or lake. EnvO (v1.53) terms listed under environmental feature can be found at http://bioportal.bioontology.org/visualize/44405/?conceptid=ENVO%3A00002297 | M | M
    Environment (material [EnvO]) | The environmental material level refers to the matter that was displaced by the sample, before the sampling event. Environmental matter terms are generally mass nouns. Examples include: air, soil or water. EnvO (v1.53) terms listed under environmental matter can be found at http://bioportal.bioontology.org/visualize/44405/?conceptid=ENVO%3A00010483 | M | M
MIGS/MIMS/MIMARKS extension
    Environmental package [air, host-associated, human-associated, human-skin, human-oral, human-gut, human-vaginal, microbial mat/biofilm, miscellaneous natural or artificial environment, plant-associated, sediment, soil, wastewater/sludge, water] | MIGS/MIMS/MIMARKS extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported | M | M
Nucleic acid sequence source
    Isolation and growth conditions [PMID, DOI or URL] | Publication reference in the form of PubMed ID (PMID), digital object identifier (DOI) or URL for isolation and growth condition specifications of the organism/material | – | M
Sequencing
    Target gene or locus (e.g., 16S rRNA, 18S rRNA, nif, amoA, rpo) | Targeted gene or locus name for marker gene study | M | M
    Sequencing method (e.g., dideoxysequencing, pyrosequencing, polony) | Sequencing method used, e.g., Sanger, pyrosequencing, ABI-solid | M | M

Items for the MIMARKS specification and their mandatory (M) status for both MIMARKS-survey and MIMARKS-specimen checklists. Furthermore, “–” denotes that an item is not applicable for a given checklist. E denotes that a field has environment-specific requirements. For example, whereas “depth” is mandatory for the environments water, sediment or soil, it is optional for human-associated environments. MIMARKS-survey is applicable to contextual data for marker gene sequences obtained directly from the environment, without culturing or identification of the organisms. MIMARKS-specimen, on the other hand, applies to the contextual data for marker gene sequences from cultured or voucher-identifiable specimens. Both MIMARKS-survey and MIMARKS-specimen checklists can be used for any type of marker gene sequence data, ranging from 16S, 18S, 23S, 28S rRNA to COI; hence the checklists are universal for all three domains of life. Item names are followed by a short description of the value of the item in parentheses and/or value type in brackets as a superscript. Whenever applicable, value types are chosen from a controlled vocabulary (CV) or an ontology from the Open Biological and Biomedical Ontologies (OBO) foundry (http://www.obofoundry.org/). This table only presents the very core of MIMARKS checklists, that is, only mandatory items for each checklist. Supplementary Results 2 contains all MIMARKS items, the tables for environmental packages in the MIGS/MIMS/MIMARKS extension and the GenBank structured comment names that should be used for submitting MIMARKS data to GenBank. When submitting to EBI-ENA, the full names can be used.
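As an illustration of how the mandatory (M) items in Table 1 might be checked before submission, the following Python sketch reports which core items are missing from a record. The short dictionary keys are hypothetical labels chosen for this example, not official MIxS field names, and the sketch assumes that "isolation and growth conditions" is required only for the MIMARKS-specimen checklist. Items with environment-specific (E) status are deliberately left out.

```python
# Core items marked mandatory (M) in Table 1. The keys are hypothetical
# labels invented for this sketch, not official MIxS field names.
CORE_MANDATORY = [
    "submitted_to_insdc",
    "investigation_type",
    "project_name",
    "lat_lon",
    "country",
    "collection_date",
    "biome",
    "feature",
    "material",
    "environmental_package",
    "target_gene",
    "sequencing_method",
]

# Assumption: this item applies to the specimen checklist only.
SPECIMEN_ONLY = ["isolation_growth_conditions"]


def missing_items(record):
    """Return the mandatory items that are absent or empty in `record`."""
    required = list(CORE_MANDATORY)
    if record.get("investigation_type") == "mimarks-specimen":
        required += SPECIMEN_ONLY
    missing = []
    for key in required:
        value = record.get(key)
        # Treat None and blank strings as missing; False is a valid boolean.
        if value is None or not str(value).strip():
            missing.append(key)
    return missing
```

A record that passes this check is not necessarily compliant; the sketch only tests for presence, not for valid value syntax.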

Both MIMARKS and the environmental packages were developed by collating information from several sources and evaluating it in the framework of the existing MIGS/MIMS checklists. These include four independent community-led surveys, examination of the parameters reported in published studies and examination of compliance with optional features in INSDC documents. The overall goal of these activities was to design the backbone of the MIMARKS checklist, which describes the most important aspects of marker gene contextual data.

Results of community-led surveys

Four online surveys about descriptors for marker genes have been conducted to determine researcher preferences for core descriptors. The Department of Energy Joint Genome Institute and SILVA23 surveys focused on general descriptor contextual data for a marker gene, whereas the Ribosomal Database Project (RDP)24 focused on prevalent habitats for rRNA gene surveys, and the Terragenome Consortium25 focused on soil metagenome project contextual data (Supplementary Results 1). The above recommendations were combined with an extensive set of contextual data items suggested by an International Census of Marine Microbes (ICoMM) working group that met in 2005. These collective resources provided valuable insights into community requests for contextual data items to be included in the MIMARKS checklist and the main habitats constituting the environmental packages.

Survey of published parameters

We reviewed published rRNA gene studies, retrieved from SILVA and the ICoMM database MICROBIS (The Microbial Oceanic Biogeographic Information System, http://icomm.mbl.edu/microbis/) to further supplement contextual data items that are included in the respective environmental packages. In total, 39 publications from SILVA and >40 ICoMM projects were scanned for contextual data items to constitute the core of the environmental package subtables (Supplementary Results 1).

In a final analysis step, we surveyed usage statistics of INSDC source feature key qualifier values of rRNA gene sequences contained in SILVA (Supplementary Results 1). Notably, <10% of the 1.2 million 16S rRNA gene sequences (SILVA release 100) were associated with even basic information such as latitude and longitude, collection date or PCR primers.

The MIMARKS checklist

The MIMARKS checklist provides users with an ‘electronic laboratory notebook’ containing core contextual data items required for consistent reporting of marker gene investigations. MIMARKS uses the MIGS/MIMS checklists with respect to the nucleic acid sequence source and sequencing contextual data, but extends them with further experimental contextual data such as PCR primers and conditions, or target gene name.

For clarity and ease of use, all items within the MIMARKS checklist are presented with a value syntax description, as well as a clear definition of the item. Whenever terms from a specific ontology are required as the value of an item, these terms can be readily found in the respective ontology browsers linked by URLs in the item definition. Although this version of the MIMARKS checklist does not contain unit specifications, we recommend that all units follow the International System of Units (SI) recommendations. In addition, we strongly urge the community to provide feedback regarding the best unit recommendations for given parameters. Unit standardization across data sets will be vital to facilitate comparative studies in the future. An Excel version of the MIMARKS checklist is provided on the GSC web site (http://gensc.org/gc_wiki/index.php/MIMARKS).
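The value syntax of two core items can be illustrated with a short Python sketch: the collection date, which may be right truncated, and latitude and longitude in decimal degrees. The function names are hypothetical, and the sketch covers single time points only (not intervals) and single points only (not transects or regions).

```python
import re
from datetime import datetime

# Shape of a full or right-truncated timestamp, matching the examples
# given for the collection date item (e.g. 2008-01-23T19:23:10+00:00).
_DATE_SHAPE = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}:\d{2}([+-]\d{2}:\d{2})?)?)?)?$"
)

_DATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
]


def is_valid_collection_date(text):
    """Accept a single time point, optionally right truncated."""
    if not _DATE_SHAPE.match(text):
        return False
    for fmt in _DATE_FORMATS:
        try:
            # strptime also rejects impossible dates such as 2008-02-30.
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def is_valid_lat_lon(latitude, longitude):
    """Check that a point in decimal degrees lies within valid bounds."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0
```

The regular expression enforces zero-padded fields, which `strptime` alone would not; the `strptime` pass then checks calendar validity.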

The MIxS environmental packages

Fourteen environmental packages provide a wealth of environmental and epidemiological contextual data fields for a complete description of sampling environments. The environmental packages can be combined with any of the GSC checklists (Fig. 1 and Supplementary Results 2). Researchers within The Human Microbiome Project26 contributed the host-associated and all human packages. The Terragenome Consortium contributed sediment and soil packages. Finally, ICoMM, Microbial Inventory Research Across Diverse Aquatic Long Term Ecological Research Sites and the Max Planck Institute for Marine Microbiology contributed the water package. The MIMARKS working group developed the remaining packages (air, microbial mat/biofilm, miscellaneous natural or artificial environment, plant-associated and wastewater/sludge). The package names describe high-level habitat terms in order to be exhaustive. The miscellaneous natural or artificial environment package contains a generic set of parameters, and is included for any other habitat that does not fall into the other thirteen categories. Whenever needed, multiple packages may be used for the description of the environment.
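A minimal Python sketch of how the fourteen package names could be used as a controlled vocabulary follows. It also applies the one environment-specific rule stated in the Table 1 legend, namely that depth is mandatory for water, sediment and soil. The function name is hypothetical, and treating depth as not required for every other package is an assumption made only for this example.

```python
# The fourteen environmental package names listed in Table 1.
ENVIRONMENTAL_PACKAGES = frozenset({
    "air",
    "host-associated",
    "human-associated",
    "human-skin",
    "human-oral",
    "human-gut",
    "human-vaginal",
    "microbial mat/biofilm",
    "miscellaneous natural or artificial environment",
    "plant-associated",
    "sediment",
    "soil",
    "wastewater/sludge",
    "water",
})

# Packages for which the Table 1 legend states that depth is mandatory.
DEPTH_MANDATORY = frozenset({"water", "sediment", "soil"})


def check_packages(packages):
    """Return (unknown_names, depth_required) for a list of package names.

    Several packages may describe one sample, so depth is required as
    soon as any of them demands it.
    """
    unknown = sorted(set(packages) - ENVIRONMENTAL_PACKAGES)
    depth_required = any(name in DEPTH_MANDATORY for name in packages)
    return unknown, depth_required
```

For example, `check_packages(["soil", "plant-associated"])` reports no unknown names and flags depth as required.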

Examples of MIMARKS-compliant data sets

Several MIMARKS-compliant reports are included in Supplementary Results 3. These include a 16S rRNA gene survey from samples obtained in the North Atlantic, an 18S pyrosequencing tag study of anaerobic protists in a permanently anoxic basin of the North Sea, a pmoA survey from Negev Desert soils, a dsrAB survey of Gulf of Mexico sediments and a 16S pyrosequencing tag study of bacterial diversity in the western English Channel (SRA accession no. SRP001108).

Adoption by major database and informatics resources

Support for adoption of MIMARKS and the MIxS standard has spread rapidly. Authors of this paper include representatives from genome sequencing centers, maintainers of major resources, principal investigators of large- and small-scale sequencing projects, and individual investigators who have provided compliant data sets, showing the breadth of support for the standard within the community.

In the past, the INSDC has issued a reserved ‘barcode’ keyword for the CBOL7. Following this model, the INSDC has recently recognized the GSC as an authority for the MIxS standard and issued the standard with official keywords within INSDC nucleotide sequence records27. This greatly facilitates automatic validation of the submitted contextual data and provides support for data sets compliant with previous versions by including the checklist version as a keyword.

GenBank accepts MIxS metadata in tabular format using the Sequin and tbl2asn submission tools, validates MIxS compliance and reports the fields in the structured comment block. The EBI-ENA Webin submission system provides prepared web forms for the submission of MIxS-compliant data; it presents all of the appropriate fields with descriptions, explanations and examples, and validates the data entered. One tool that can aid in submitting contextual data is MetaBar28, a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage and submission of contextual data associated with their samples in compliance with the MIxS standard. The online tool CDinFusion (http://www.megx.net/CDinFusion) was created to facilitate the combination of contextual data with sequence data, and the generation of submission-ready files.

The next-generation Sequence Read Archive (SRA) collects and displays MIxS-compliant metadata in sample and experiment objects. There are several tools that are already available or under development to assist users in SRA submissions. The myRDP SRA PrepKit allows users to prepare and edit their submissions of reads generated from ultra-high-throughput sequencing technologies. A set of suggested attributes in the data forms assists researchers in providing metadata conforming to checklists such as MIMARKS. The Quantitative Insights Into Microbial Ecology (QIIME) web application (http://www.microbio.me/qiime) allows users to generate and validate MIMARKS-compliant templates. These templates can be viewed and completed in the users’ spreadsheet editor of choice (e.g., Microsoft Excel). The QIIME web platform also offers an ontology lookup and geo-referencing tool to aid users when completing the MIMARKS templates. The Investigation/Study/Assay (ISA) is a software suite that assists in the curation, reporting and local management of experimental metadata from studies using one or a combination of technologies, including high-throughput sequencing29. Specific ISA configurations (http://isa-tools.org/tools.html) have been developed to ensure MIxS compliance by providing templates and validation capability. Another tool, ISAconverter, produces SRA.xml documents, facilitating submission to the SRA repository. MIxS checklists are also registered with the BioSharing catalog of standards (http://biosharing.org/), set to progressively link minimal information specifications to the respective exchange formats, ontologies and compliant tools.

Further detailed guidance for submission processes can be found under the respective wiki pages (http://gensc.org/gc_wiki/index.php/MIxS) of the standard.

Maintenance of the MIxS standard

To allow further developments, extensions and enhancements of MIxS, we set up a public issue tracking system to track changes and handle feature requests (http://mixs.gensc.org/). New versions will be released annually. Technically, the MIxS standard, including MIMARKS and the environmental packages, is maintained in a relational database system at the Max Planck Institute for Marine Microbiology Bremen on behalf of the GSC. This provides a secure and stable mechanism for updating the checklist suite and versioning. In the future, we plan to develop programmatic access to this database to allow automatic retrieval of the latest version of each checklist for INSDC databases and for GSC community resources. Moreover, the Genomic Contextual Data Markup Language is a reference implementation of the GSC checklists by the GSC and now implements the full range of MIxS standards. It is based on XML Schema technology and thus serves as an interoperable data exchange format for infrastructures based on web services30.

Conclusions and call for action

The GSC is an international body with a stated mission of working towards richer descriptions of the complete collection of genomes and metagenomes through the MIxS standard. The present report extends the scope of GSC guidelines to marker gene sequences and environmental packages and establishes a single portal where experimentalists can gain access to and learn how to use GSC guidelines. The GSC is an open initiative that welcomes the participation of the wider community. This includes an open call to contribute to refinements of the MIxS standards and their implementations.

The adoption of the GSC standards by major data providers and organizations, as well as the INSDC, supports efforts to contextually enrich sequence data and complements recent efforts to enrich other (meta) ‘omics data. The MIxS standard, including MIMARKS, has been developed to the point that it is ready for use in the publication of sequences. A defined procedure for requesting new features and stable release cycles will facilitate implementation of the standard across the community. Compliance among authors, adoption by journals and use by informatics resources will vastly improve our collective ability to mine and integrate invaluable sequence data collections for knowledge- and application-driven research. In particular, the ability to combine microbial community samples collected from any source, using the universal tree of life as a measure to compare even the most diverse communities, should provide new insights into the dynamic spatiotemporal distribution of microbial life on our planet and on the human body.

Supplementary Material

S1
S2
S3
S4

ACKNOWLEDGMENTS

Funding sources are listed in the Supplementary Note.

Footnotes

Note: Supplementary information is available on the Nature Biotechnology website.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

References

  • 1. Field D, et al. The minimum information about a genome sequence (MIGS) specification. Nat. Biotechnol. 2008;26:541–547. doi: 10.1038/nbt1360.
  • 2. Taylor CF, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat. Biotechnol. 2008;26:889–896. doi: 10.1038/nbt.1411.
  • 3. Ludwig W, Schleifer KH. In: Microbial Phylogeny and Evolution, Concepts and Controversies. Sapp J, editor. New York: Oxford University Press; 2005. pp. 70–98.
  • 4. Ludwig W, et al. Bacterial phylogeny based on comparative sequence analysis. Electrophoresis. 1998;19:554–568. doi: 10.1002/elps.1150190416.
  • 5. Giovannoni SJ, Britschgi TB, Moyer CL, Field KG. Genetic diversity in Sargasso Sea bacterioplankton. Nature. 1990;345:60–63. doi: 10.1038/345060a0.
  • 6. Stahl DA. Analysis of hydrothermal vent associated symbionts by ribosomal RNA sequences. Science. 1984;224:409–411. doi: 10.1126/science.224.4647.409.
  • 7. Ward DM, Weller R, Bateson MM. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature. 1990;345:63–65. doi: 10.1038/345063a0.
  • 8. DeLong EF. Archaea in coastal marine environments. Proc. Nat. Acad. Sci. USA. 1992;89:5685–5689. doi: 10.1073/pnas.89.12.5685.
  • 9. Diez B, Pedros-Alio C, Massana R. Study of genetic diversity of eukaryotic picoplankton in different oceanic regions by small-subunit rRNA gene cloning and sequencing. Appl. Environ. Microbiol. 2001;67:2932–2941. doi: 10.1128/AEM.67.7.2932-2941.2001.
  • 10. Fuhrman JA, McCallum K, Davis AA. Novel major archaebacterial group from marine plankton. Nature. 1992;356:148–149. doi: 10.1038/356148a0.
  • 11. Hewson I, Fuhrman JA. Richness and diversity of bacterioplankton species along an estuarine gradient in Moreton Bay, Australia. Appl. Environ. Microbiol. 2004;70:3425–3433. doi: 10.1128/AEM.70.6.3425-3433.2004.
  • 12. Huber JA, Butterfield DA, Baross JA. Temporal changes in archaeal diversity and chemistry in a mid-ocean ridge subseafloor habitat. Appl. Environ. Microbiol. 2002;68:1585–1594. doi: 10.1128/AEM.68.4.1585-1594.2002.
  • 13. Lopez-Garcia P, Rodriguez-Valera F, Pedros-Alio C, Moreira D. Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature. 2001;409:603–607. doi: 10.1038/35054537.
  • 14. Moon-van der Staay SY, De Wachter R, Vaulot D. Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature. 2001;409:607–610. doi: 10.1038/35054541.
  • 15. Pace NR. A molecular view of microbial diversity and the biosphere. Science. 1997;276:734–740. doi: 10.1126/science.276.5313.734.
  • 16. Rappe MS, Giovannoni SJ. The uncultured microbial majority. Annu. Rev. Microbiol. 2003;57:369–394. doi: 10.1146/annurev.micro.57.030502.090759.
  • 17. Francis CA, Beman JM, Kuypers MMM. New processes and players in the nitrogen cycle: the microbial ecology of anaerobic and archaeal ammonia oxidation. ISME J. 2007;1:19–27. doi: 10.1038/ismej.2007.8.
  • 18. Zehr JP, Mellon MT, Zani S. New nitrogen-fixing microorganisms detected in oligotrophic oceans by amplification of nitrogenase (nifH) genes. Appl. Environ. Microbiol. 1998;64:3444–3450. doi: 10.1128/aem.64.9.3444-3450.1998.
  • 19. Minz D, et al. Diversity of sulfate-reducing bacteria in oxic and anoxic regions of a microbial mat characterized by comparative analysis of dissimilatory sulfite reductase genes. Appl. Environ. Microbiol. 1999;65:4666–4671. doi: 10.1128/aem.65.10.4666-4671.1999.
  • 20. Gilbert JA, et al. The seasonal structure of microbial communities in the Western English Channel. Environ. Microbiol. 2009;11:3132–3139. doi: 10.1111/j.1462-2920.2009.02017.x.
  • 21. Martinez AW, Tyson G, DeLong EF. Widespread known and novel phosphonate utilization pathways in marine bacteria revealed by functional screening and metagenomic analyses. Environ. Microbiol. 2009;12:222–238. doi: 10.1111/j.1462-2920.2009.02062.x.
  • 22. Hanner R. Data Standards for BARCODE Records in INSDC (BRIs) (Database Working Group, Consortium for the Barcode of Life, 2009) <http://www.barcodeoflife.org/sites/default/files/legacy/pdf/DWG_data_standards-Final.pdf>.
  • 23. Pruesse E, et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007;35:7188–7196. doi: 10.1093/nar/gkm864.
  • 24. Cole JR, et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009;37:D141–D145. doi: 10.1093/nar/gkn879.
  • 25. Vogel TM, et al. TerraGenome: a consortium for the sequencing of a soil metagenome. Nat. Rev. Microbiol. 2009;7:252.
  • 26. Turnbaugh PJ, et al. The Human Microbiome Project. Nature. 2007;449:804–810. doi: 10.1038/nature06244.
  • 27. Benson DA, et al. GenBank. Nucleic Acids Res. 2008;36:D25–D30. doi: 10.1093/nar/gkm929.
  • 28. Hankeln W, et al. MetaBar—a tool for consistent contextual data acquisition and standards compliant submission. BMC Bioinformatics. 2010;11:358. doi: 10.1186/1471-2105-11-358.
  • 29. Rocca-Serra P, et al. ISA infrastructure: supporting standards-compliant experimental reporting and enabling curation at the community level. Bioinformatics. 2010;26:2354–2356. doi: 10.1093/bioinformatics/btq415.
  • 30. Kottmann R, et al. A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS. 2008;12:115–121. doi: 10.1089/omi.2008.0A10.
