Table 1.
Approach | Metagenome source | Number of biomes | Strategy | Source |
---|---|---|---|---|
HMM + Rosettaᵃ | IMG database | Multiple biomes | Combined | [63] |
HMM + C‐QUARKᵇ | Ocean microbiome | Single biome | Single | [64] |
AlphaFoldᶜ | Metagenome | Multiple biomes | Combined | [57] |
DeepMSA + C‐I‐TASSERᵈ | MGnify | Multiple biomes | Combined | [70] |
MetaSource + DeepMSA + C‐I‐TASSERᵉ | MGnify | Multiple biomes | Targeted | [70] |
Note: Single strategy: using a single large biome as the protein source. Combined strategy: using a set of large biomes as protein sources. Targeted strategy: customized methods that select different biomes for different proteins.
ᵃ Using the IMG database [70, 82], models were generated for 614 protein families of unknown structure.
ᵇ Using Tara Oceans data [80], 27 Pfam families without solved structures were modeled.
ᶜ A deep learning algorithm leveraging multiple sequence alignments was used to model protein structures.
ᵈ Built on 4.25 billion microbiome sequences; 1044 Pfam families were foldable by C‐I‐TASSER [70].
ᵉ As a targeted approach, the MetaSource model was used to identify a set of biomes that supplement homologous sequences for specific Pfam families [70].
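The three strategies in the table note differ mainly in how the sequence pool for homolog search is chosen. The following is a minimal Python sketch of that distinction under stated assumptions. It uses a toy in-memory database that maps biome names to sequences. The function names, the per-family score dictionary (a hypothetical stand-in for a biome predictor such as MetaSource), and the top-k selection rule are illustrative only. They do not reproduce the published methods.

```python
from typing import Dict, List

# Hypothetical toy database: biome name -> sequences available in that biome.
BiomeDB = Dict[str, List[str]]


def single_strategy(db: BiomeDB, biome: str) -> List[str]:
    """Single: draw homologs from one large biome only."""
    return list(db[biome])


def combined_strategy(db: BiomeDB, biomes: List[str]) -> List[str]:
    """Combined: pool sequences from a fixed set of biomes, same set for every protein."""
    pooled: List[str] = []
    for b in biomes:
        pooled.extend(db[b])
    return pooled


def targeted_strategy(db: BiomeDB, family_scores: Dict[str, float], k: int) -> List[str]:
    """Targeted: pick the top-k biomes for one protein family, then pool them.

    family_scores is a hypothetical per-family relevance score for each biome
    (an assumed stand-in for a learned predictor); higher means more promising.
    """
    ranked = sorted(family_scores, key=lambda b: family_scores[b], reverse=True)
    return combined_strategy(db, ranked[:k])


if __name__ == "__main__":
    db = {"ocean": ["s1", "s2"], "soil": ["s3"], "gut": ["s4", "s5", "s6"]}
    print(single_strategy(db, "ocean"))                       # ['s1', 's2']
    print(combined_strategy(db, ["ocean", "soil", "gut"]))    # all six sequences
    # Scores differ per family; here soil and gut rank highest for this family.
    print(targeted_strategy(db, {"ocean": 0.1, "soil": 0.9, "gut": 0.5}, 2))
```

The design point the sketch tries to capture is that the combined strategy applies one fixed biome set to every protein, whereas the targeted strategy recomputes the biome set for each family. That recomputation is why the targeted row requires an extra selection model in front of DeepMSA.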