Preparing the input data entails identifying the candidate proteins and candidate ligands (outlined in red) and generating the information to be used as scoring restraints (outlined in blue). The initial step was sequence analysis of the TRAP solute-binding proteins, from which target proteins were identified for follow-up screening by DSF and genome neighborhood analysis (UniProt ID
P71336 and UniProt ID A7JQX0). Analysis of the genome neighborhood network led to the selection of possible pathway proteins. In this case, the candidate proteins were hypothesized to be involved in sugar catabolism, so metabolic endpoints were selected from intermediates in central metabolism mapped in the KEGG database. The functions of close homologs (>70% sequence identity) were identified, which in this case included D-mannonate dehydratase. Chemical transformation patterns were inferred from the Pfam annotations of each candidate protein. Structural models of the candidate proteins were built by comparative modeling, and a large metabolite screening library was docked against each model. The chemical structures of the metabolites with the top docking scores were compared chemoinformatically to produce SEA scores. Using the chemical transformations and docking scores, the metabolite library was filtered down to a smaller set of candidate ligands. For every candidate ligand, each chemical transformation was applied in silico, and the products were compared chemoinformatically with every other candidate ligand to produce chemical transformation scores. This workflow yielded the following sources of information for scoring: DSF hits, metabolic endpoints, functions of close homologs, docking scores, SEA scores, and chemical transformation scores.
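
The pairwise comparison that produces chemical transformation scores can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each molecule is represented as a set of fingerprint features, that the in silico products have already been generated, and that similarity is measured with the Tanimoto coefficient, none of which the caption specifies. The function names, ligand labels, and toy feature sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def transformation_scores(ligand_fps, product_fps):
    """Compare each in silico product with every other candidate ligand.

    ligand_fps:  {ligand name: feature set}
    product_fps: {(ligand name, transformation name): feature set of the product}
    Returns {(substrate, transformation, other ligand): similarity}.
    """
    scores = {}
    for (substrate, transformation), product_fp in product_fps.items():
        for other, other_fp in ligand_fps.items():
            if other == substrate:
                continue  # a ligand is not compared with its own product
            scores[(substrate, transformation, other)] = tanimoto(product_fp, other_fp)
    return scores


# Toy data (hypothetical): integers stand in for fingerprint features.
ligand_fps = {
    "ligand_A": {1, 2, 3, 4},
    "ligand_B": {1, 2, 3, 5},
    "ligand_C": {7, 8},
}
# Assumed precomputed product of applying a "dehydration" pattern to ligand_A.
product_fps = {("ligand_A", "dehydration"): {1, 2, 3, 5}}

scores = transformation_scores(ligand_fps, product_fps)
for key, value in sorted(scores.items()):
    print(key, round(value, 2))
```

In this toy case the product of ligand_A matches ligand_B exactly, so that pair receives the highest score, which would suggest ligand_A and ligand_B as a plausible substrate and product pair for the transformation.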