2022 Mar 6;1(1):e9. doi: 10.1002/imt2.9

Table 1.

Approaches that could utilize metagenome data properties for better protein structure prediction

Approach                             Metagenome source   Number of biomes   Strategy   Source
HMM + Rosetta^a                      IMG database        Multiple biomes    Combined   [63]
HMM + C‐QUARK^b                      Ocean microbiome    Single biome       Single     [64]
AlphaFold^c                          Metagenome          Multiple biomes    Combined   [57]
DeepMSA + C‐I‐TASSER^d               MGnify              Multiple biomes    Combined   [70]
MetaSource + DeepMSA + C‐I‐TASSER^e  MGnify              Multiple biomes    Targeted   [70]

Note: Single strategy: using a single large biome as the protein source. Combined strategy: using a set of large biomes as protein sources. Targeted strategy: customized methods that select different biomes for different proteins.
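The three strategies in the note differ only in which biomes supply homologous sequences for a query protein. The Python sketch below illustrates that difference under stated assumptions: the per-biome homolog sets in `example_index` are made-up placeholders standing in for the output of a homology search, and the function names are hypothetical. How a targeted method decides which biomes to use for each protein is not modeled here; it is simply passed in as a per-protein mapping.

```python
from typing import Dict, List, Set

# Hypothetical index: biome -> query protein -> IDs of homologous sequences
# found in that biome (placeholder data, not real search results).
HomologIndex = Dict[str, Dict[str, Set[str]]]

example_index: HomologIndex = {
    "ocean": {"P1": {"a", "b"}, "P2": {"c"}},
    "soil": {"P1": {"b", "d"}, "P2": set()},
    "gut": {"P1": set(), "P2": {"e", "f"}},
}


def single_strategy(index: HomologIndex, protein: str, biome: str) -> Set[str]:
    """Single strategy: draw homologs from one large biome only."""
    return set(index.get(biome, {}).get(protein, set()))


def combined_strategy(index: HomologIndex, protein: str, biomes: List[str]) -> Set[str]:
    """Combined strategy: pool homologs from the same fixed set of biomes for every protein."""
    pooled: Set[str] = set()
    for biome in biomes:
        pooled |= index.get(biome, {}).get(protein, set())
    return pooled


def targeted_strategy(index: HomologIndex, protein: str,
                      biomes_for_protein: Dict[str, List[str]]) -> Set[str]:
    """Targeted strategy: each protein uses its own biome set, chosen upstream
    (the choice itself is an assumed input here)."""
    return combined_strategy(index, protein, biomes_for_protein.get(protein, []))


if __name__ == "__main__":
    print(sorted(single_strategy(example_index, "P1", "ocean")))  # ['a', 'b']
    print(sorted(combined_strategy(example_index, "P1", ["ocean", "soil", "gut"])))  # ['a', 'b', 'd']
    print(sorted(targeted_strategy(example_index, "P2", {"P2": ["gut"]})))  # ['e', 'f']
```

In a real pipeline the resulting homolog set would feed into multiple sequence alignment construction before structure modeling; the sketch only shows how the source biomes are chosen.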

a. Using the IMG database [70, 82], models were generated for 614 protein families with unknown structures.

b. Using Tara Oceans data [80], structures were modeled for 27 Pfam families whose structures had not been solved.

c. A deep learning algorithm that leverages multiple sequence alignments was used to model protein structures.

d. Built on 4.25 billion microbiome sequences, this approach made 1044 Pfam families foldable by C‐I‐TASSER [70].

e. In the targeted approach, the MetaSource model was used to identify, for specific Pfam families, a set of biomes from which to draw supplementary homologous sequences [70].
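The footnote does not say how MetaSource picks biomes internally, so the sketch below swaps in a plainly different, simple heuristic to show what per-family biome selection can look like: greedily add the biome that contributes the most not-yet-collected homologs until a hypothetical target number of sequences (`target_depth`) is reached. The function name, input format, and stopping rule are all assumptions for illustration, not the MetaSource model.

```python
from typing import Dict, List, Set


def select_biomes_greedy(homologs_by_biome: Dict[str, Set[str]],
                         target_depth: int) -> List[str]:
    """Greedy stand-in for per-family biome selection (NOT the MetaSource model).

    homologs_by_biome: hypothetical mapping biome -> homolog IDs for one family.
    target_depth: hypothetical number of distinct homologs to aim for.
    """
    chosen: List[str] = []
    collected: Set[str] = set()
    remaining = dict(homologs_by_biome)
    while len(collected) < target_depth and remaining:
        # Biome adding the most new homologs; iterating in sorted order makes
        # max() break ties by biome name, so results are deterministic.
        best = max(sorted(remaining), key=lambda b: len(remaining[b] - collected))
        gain = remaining[best] - collected
        if not gain:
            break  # no remaining biome contributes anything new
        chosen.append(best)
        collected |= gain
        del remaining[best]
    return chosen


if __name__ == "__main__":
    family_homologs = {
        "ocean": {"a", "b", "c"},
        "soil": {"c", "d"},
        "gut": {"e"},
        "lake": {"a"},
    }
    print(select_biomes_greedy(family_homologs, target_depth=5))  # ['ocean', 'gut', 'soil']
```

Because the selection depends on each family's own homolog distribution, different families can end up with different biome sets, which is the defining property of the targeted strategy in the table note.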