| Model | Year | Pre-training data | Architecture | Parameters | Code | Ref. |
|---|---|---|---|---|---|---|
| Chemical VAE | 2018 | 250 000 drug-like molecules from ZINC; 108 000 molecules (up to nine heavy atoms) from the QM9 data set | Variational autoencoder | 4.2 M | https://github.com/aspuru-guzik-group/chemical_vae | [80] |
| SMILES-BERT | 2019 | 18.7 million compounds sampled from ZINC | Encoder-only transformer | 13 M | https://github.com/uta-smile/SMILES-BERT | [88] |
| ChemBERTa/ChemBERTa-v2 | 2020 | 250 000 drug-like molecules from ZINC | Encoder-only transformer | 5–77 M | https://github.com/seyonechithrananda/bert-loves-chemistry | [99] |
| MolBERT | 2020 | 1.27 million molecules from the GuacaMol benchmark data set | Encoder-only transformer | 85 M | https://github.com/BenevolentAI/MolBERT | [100] |
| MegaMolBART | 2021 | 1.45 billion ‘reactive’ molecules from ZINC (≤ 500 Da, logP ≤ 5) | Encoder–decoder transformer | 45–230 M | https://github.com/NVIDIA/MegaMolBART | [101] |
| Molformer | 2022 | 1.1 billion molecules from ZINC and PubChem | Encoder-only transformer | 110 M | https://github.com/IBM/molformer | [102] |
| Chemformer/MolBART | 2022 | 100 million molecules randomly sampled from ZINC (≤ 500 Da, logP ≤ 5) | Encoder–decoder transformer | 45–230 M | https://github.com/MolecularAI/Chemformer | [90] |
| X-MOL | 2022 | 1.1 billion molecules from the ZINC database | Encoder–decoder transformer | 110 M | https://github.com/bm2-lab/X-MOL | [103] |
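Several of the encoder-only models listed above (for example, the ChemBERTa family) distribute pretrained checkpoints that can be loaded through the Hugging Face `transformers` API. The sketch below shows one common usage pattern, embedding SMILES strings into fixed-size vectors for downstream property prediction; the checkpoint identifier is an illustrative assumption, not a recommendation, so substitute whichever released model you intend to use.

```python
# Minimal sketch: embedding SMILES strings with an encoder-only chemical
# language model via the Hugging Face `transformers` API.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed example checkpoint; replace with the pretrained model of your choice.
checkpoint = "seyonec/ChemBERTa-zinc-base-v1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
model.eval()

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin
batch = tokenizer(smiles, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the final hidden states over non-padding tokens to obtain one
# fixed-size embedding per molecule.
mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([3, hidden_size])
```

The same pattern applies to the other encoder-only models in the table; encoder–decoder models such as Chemformer are instead typically used through their own repositories for sequence-to-sequence tasks (e.g., reaction prediction or molecule generation).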