2021 Jul 12;8:169. doi: 10.1038/s41597-021-00962-3

Online Only Table 1.

Additional data sources included in the AOP-DB.

Biological Category | Data Source | Description | Processing | URL
Gene | NCBI Gene | Supplies all NCBI Entrez genes to the gene info table, along with associated gene information such as name, symbol, and location. | The latest gene_info and gene_history files are pulled from the NCBI Gene FTP site. gene_history lists all deprecated Entrez IDs, which are used to remove entries from the gene_info table. No other processing takes place. | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
Gene | STRING | Supplies protein-protein interaction data for the gene-interactions table. Each record from these networks is stored with an entrez1, an entrez2, and an interaction score. | STRING files are downloaded directly (no API or FTP), and proteins are mapped to Entrez IDs via UniProt (UPKB2entrez). No other processing takes place. | http://string-db.org/cgi/download.pl
Taxonomy & Orthology | NCBI Taxonomy | All taxa available from NCBI, including nomenclature information and divisions. These data are used to fill the species_info table. | The taxonomy data are pulled from the NCBI FTP site. No other processing takes place. | ftp://ftp.ncbi.nlm.nih.gov/pub/
Taxonomy & Orthology | HomoloGene | Constructs and stores putative homology groups, and contributes ortho group number, tax ID, and Entrez ID to the homology gene table. | The HomoloGene file is pulled from the NCBI FTP site. No processing takes place. | ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene
Taxonomy & Orthology | KEGG Orthology | A database of functional orthologs that contributes ortho group IDs, tax IDs, and Entrez IDs describing each orthologous group to the homology gene table. | The latest ortholog data are pulled with the R package KEGGREST, and KEGG gene IDs are mapped to Entrez IDs. No other processing takes place. | http://www.kegg.jp/kegg/rest/keggapi.html
Taxonomy & Orthology | metaPhOrs | A database of phylogeny-based orthologs that contributes ortho group IDs, taxonomy IDs, and Entrez IDs describing orthologous groups to the homology gene table. | Mappings and ortholog data for a manually curated set of species are pulled from the PhylomeDB FTP site. Only protein mappings from SwissProt and TrEMBL are considered. Protid IDs are mapped to Entrez IDs; where protid2 is unmapped or empty, it is mapped to entrez = 0. No other processing takes place. | ftp://phylomedb.org/metaphors/
AOP | AOP-Wiki | A collaborative set of AOPs, regularly updated with new details or new Adverse Outcome Pathways. This source contributes to the central AOP_info and AOP_gene tables, supplying AOP names, key events, descriptions, and information used to map key events to genes. | The latest AOP-Wiki XML dump is downloaded from the AOP-Wiki downloads page. No other processing takes place. | https://aopwiki.org
Chemical | CTD | A manually curated database of chemical information comprising many modules. The module of interest for the AOP-DB is the chemical-gene interactions module, which contributes chemical names and IDs to the chemical info table, and chemical-gene interactions with contextual information to the chemical gene table. | The latest chemical tables and chemical-gene interactions are downloaded from the CTD website. No processing takes place. | http://ctdbase.org/downloads/
Chemical | AOP-Wiki | In addition to being the source of AOPs for the AOP-DB, this source adds chemical stressors related to the molecular initiating event (MIE) of each AOP. It contributes chemical names, as well as DTXSIDs, CASRNs, or other chemical IDs when available. | The latest AOP-Wiki XML dump is downloaded, and all AOP, stressor, and key event information is merged together. No other processing takes place. | https://aopwiki.org
Chemical | ToxCast | A collection of high-throughput screening assays for chemicals; contributes assay identification and assay context information, as well as gene target information in the form of Entrez IDs. | ToxCast assay data are loaded from a static file. No processing takes place. | ftp://newftp.epa.gov/comptox/High_Throughput_Screening_Data
Pathway | KEGG Pathways | A collection of biological molecular interaction pathways; supplies Entrez IDs and pathway names and IDs, linking gene components to the pathways in which they are involved. | Pathway data are accessed through the KEGG REST API using the R package KEGGREST (version 1.24.0). KEGG gene IDs are mapped to Entrez IDs, and entries without an Entrez ID are dropped. No other processing takes place. | http://www.kegg.jp/kegg/rest/keggapi.html
Pathway | Reactome | A curated, peer-reviewed source of molecular pathways; supplies Entrez IDs and their linked pathways to the pathway_gene table of the AOP-DB. | Reactome data are loaded, and tax_ids are mapped using species name and species ID. No other processing takes place. | http://www.reactome.org/pages/download-data
Pathway | ConsensusPathDB | Brings together pathway and interaction data from 32 public resources, and supplies Entrez IDs and pathway IDs that link genes to biological pathways for the pathway gene table. | CPDB data are loaded from a static file. Tax_ids are added by mapping Entrez IDs to species IDs. No other processing takes place. | http://consensuspathdb.org/
Disease | DisGeNET | Compiles data, both curated and inferred from models, and provides multiple downloadable tables relating genes and variants to diseases. The AOP-DB uses DisGeNET's gene-disease association table, adding all of its fields to the disease-gene table; these include disease name and ID, Entrez ID, and a score for the association based on its sources. | The latest gene-disease associations are downloaded from the DisGeNET website. No other processing takes place. | http://www.disgenet.org/downloads
Ontology | NCBI Gene | In addition to being a source of taxonomy and gene information, NCBI Gene supplies Gene Ontology information: GO terms and their related Entrez IDs for the GO gene table. | The latest gene2go (gene-to-GO mapping) file is downloaded from the NCBI FTP site. No other processing takes place. | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
Tissues | HumanBase | The HumanBase API is used to pull tissue-specific gene interaction networks. The data imported into the tissue networks table of the AOP-DB include entrez1 and entrez2 fields to construct edges, as well as a probability score indicating the strength of the modeled gene interaction. | Tissue networks are retrieved independently through the HumanBase API for all genes and all tissues. Tissue-gene combinations without edges are dropped. Genes are compared against NCBI's list of deprecated Entrez IDs: if an edge contains a deprecated gene, it is dropped; if a gene's ID has changed, the tissue_network gene is updated. No other processing takes place. | https://hb.flatironinstitute.org/api/
Haplotypes | 1000 Genomes | A collection of variant data for individuals from many populations. This source contributes SNP frequencies, for each of the five 1000 Genomes major populations, for each functional SNP in the snps table. | 1000 Genomes Phase 3 static VCF files are downloaded and stored locally. Using VCFtools, the data set is filtered to SNPs retrieved from the GWAS Catalog and GTEx. No other processing takes place. |
Haplotypes | Ensembl | The Ensembl REST API, which provides access to Ensembl's gene and variant information, is used to get genotype data for each individual sample from the 1000 Genomes project. These data are used to construct haplotypes for each AOP and to find differences in haplotype frequencies within and between populations. | Ensembl regulatory regions are loaded from a static file and filtered by AOP-DB SNPs. Genotype, minor allele, and variant ID are then downloaded for each locus using the Ensembl REST API. No other processing takes place. | https://rest.ensembl.org/
Haplotypes | GWAS Catalog | Used to filter SNPs down to SNPs of interest for variant analysis in different populations; functional SNPs are specifically targeted. Together with GTEx, it supplies RefSNP IDs for these variants as well as contextual information. | SNPs are pulled from the static file GWAS Catalog 1.0.2. No other processing takes place. | https://www.ebi.ac.uk/gwas/
Haplotypes | GTEx | Used to filter SNPs down to SNPs of interest for variant analysis in different populations; functional SNPs are specifically targeted. Together with the GWAS Catalog, it supplies RefSNP IDs for these variants as well as contextual information. | SNPs are pulled from the static file GTEx V7 single-tissue cis-eQTL. No other processing takes place. | https://www.gtexportal.org/home/
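The NCBI Gene and HumanBase rows both rely on NCBI's gene_history file to handle deprecated Entrez IDs. The following is a minimal Python sketch of that step, not the AOP-DB code. It assumes the tab-separated gene_history layout (#tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date), in which GeneID is "-" when a record was withdrawn without replacement; the function names and the in-memory edge tuples are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory edge form: (entrez1, entrez2, score)
Edge = Tuple[str, str, float]


def load_gene_history(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each discontinued Entrez ID to its replacement, or None if withdrawn.

    Assumes NCBI's tab-separated gene_history columns:
    #tax_id, GeneID, Discontinued_GeneID, Discontinued_Symbol, Discontinue_Date.
    """
    history: Dict[str, Optional[str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        fields = line.rstrip("\n").split("\t")
        current, discontinued = fields[1], fields[2]
        # GeneID is "-" when the old record has no replacement.
        history[discontinued] = None if current == "-" else current
    return history


def drop_deprecated(gene_ids: Iterable[str], history: Dict[str, Optional[str]]) -> List[str]:
    """Remove any gene_info entry whose Entrez ID is listed as discontinued."""
    return [g for g in gene_ids if g not in history]


def update_edges(edges: Iterable[Edge], history: Dict[str, Optional[str]]) -> List[Edge]:
    """Drop edges touching a withdrawn gene; rewrite IDs that were replaced."""
    kept: List[Edge] = []
    for a, b, score in edges:
        new_a, new_b = history.get(a, a), history.get(b, b)
        if new_a is None or new_b is None:
            continue  # an endpoint was discontinued with no replacement
        kept.append((new_a, new_b, score))
    return kept
```

Replacement chains (an ID replaced by one that was itself later replaced) are not followed in this sketch; a production pipeline may need to resolve them iteratively.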
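Several rows (STRING, KEGG Orthology, metaPhOrs, KEGG Pathways) map source identifiers to Entrez IDs, and they differ in how unmapped entries are treated: KEGG Pathways drops entries without an Entrez ID, while metaPhOrs sets an unmapped or empty protid2 to entrez = 0. The Python sketch below shows one way to express both policies over a chain of lookups (for STRING, protein to UniProt to Entrez). The lookup dictionaries, the function names, and the rule that rows with an unmapped first ID are always dropped are assumptions for illustration, not details given in the table.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple


def resolve(identifier: Any, chain: Sequence[Dict[Any, Any]]) -> Optional[Any]:
    """Follow a chain of lookups (e.g. STRING protein -> UniProt -> Entrez).

    Returns None as soon as an ID is empty or a step has no mapping.
    """
    value = identifier
    for lookup in chain:
        if value is None or value == "":
            return None
        value = lookup.get(value)
    return value


def map_pairs(
    rows: Iterable[Tuple[str, str, float]],
    chain: Sequence[Dict[Any, Any]],
    fill_second: Optional[int] = None,
) -> List[Tuple[Any, Any, float]]:
    """Map both IDs of each (id1, id2, score) row to Entrez IDs.

    fill_second=None drops rows whose second ID is unmapped; fill_second=0
    mimics the metaPhOrs rule that an unmapped or empty protid2 becomes 0.
    Rows whose first ID is unmapped are always dropped (an assumption).
    """
    mapped = []
    for id1, id2, score in rows:
        e1, e2 = resolve(id1, chain), resolve(id2, chain)
        if e1 is None:
            continue
        if e2 is None:
            if fill_second is None:
                continue
            e2 = fill_second
        mapped.append((e1, e2, score))
    return mapped
```

Keeping the fallback as a parameter makes the per-source difference explicit rather than hard-coding one policy for every source.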
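The 1000 Genomes row filters the Phase 3 VCF files with VCFtools down to SNPs taken from the GWAS Catalog and GTEx. As a rough illustration of what that filter does (not a replacement for VCFtools), the Python sketch below keeps the header lines and any record whose ID column matches a set of rsIDs. Building that set as the union of GWAS Catalog and GTEx IDs is an assumption, and the function name is hypothetical.

```python
from typing import Iterable, Iterator, Set


def filter_vcf(lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield VCF header lines plus records whose ID column matches keep_ids.

    The ID column (third field) may hold several IDs separated by ';'.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information lines and the #CHROM header
            continue
        fields = line.split("\t", 3)
        if len(fields) > 2 and any(i in keep_ids for i in fields[2].split(";")):
            yield line
```

For example, `keep_ids` could be built as `gwas_ids | gtex_ids` from the two SNP lists, and for the compressed Phase 3 files the input lines could come from `gzip.open(path, "rt")`.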
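The Ensembl row notes that per-sample genotypes are used to construct haplotypes for each AOP and to compare haplotype frequencies within and between populations. Assuming phased, biallelic, diploid genotypes written VCF-style (for example "0|1") over a fixed set of SNPs for one AOP, a minimal Python sketch of the frequency step counts the two haplotypes each sample carries, grouped by population. The input layout and function name are hypothetical, and how the AOP-DB selects the SNP set for an AOP is not specified here.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def haplotype_frequencies(
    genotypes: Dict[str, List[str]], populations: Dict[str, str]
) -> Dict[str, Dict[str, float]]:
    """Estimate haplotype frequencies per population from phased genotypes.

    genotypes maps sample -> phased calls such as "0|1", one per SNP, in the
    same SNP order for every sample; populations maps sample -> population label.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, calls in genotypes.items():
        alleles = [call.split("|") for call in calls]
        for copy in (0, 1):  # each diploid sample carries two haplotypes
            haplotype = "".join(pair[copy] for pair in alleles)
            counts[populations[sample]][haplotype] += 1
    # Convert counts to within-population relative frequencies.
    return {
        pop: {hap: n / sum(counter.values()) for hap, n in counter.items()}
        for pop, counter in counts.items()
    }
```

Unphased ("0/1") or haploid calls are not handled by this sketch and would need separate treatment before comparing frequencies between populations.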