Author manuscript; available in PMC: 2023 Mar 28.
Published in final edited form as: J Chem Inf Model. 2022 Mar 10;62(6):1376–1387. doi: 10.1021/acs.jcim.1c01467

Table 1: Dataset Splits Used for the Experiments

| Dataset | Train | Valid | Test | Total | Task |
|---|---|---|---|---|---|
| USPTO_TPL [56] (a) | 360,545 | 40,059 | 44,511 | 445,115 | Reaction type classification |
| USPTO_MIT [12] | 409,035 | 30,000 | 40,000 | 479,035 | Forward prediction |
| USPTO_50k [29] (a) | 40,029 | 5,004 | 5,004 | 50,037 | Retrosynthesis |
| C-N Coupling [44] (a, b), random splits | 2,767 | - | 1,188 | 3,955 | Reaction yield prediction |
| C-N Coupling [44] (a, b), out-of-sample test1 | 3,057 | - | 898 | 3,955 | Reaction yield prediction |
| C-N Coupling [44] (a, b), out-of-sample test2 | 3,055 | - | 900 | 3,955 | Reaction yield prediction |
| C-N Coupling [44] (a, b), out-of-sample test3 | 3,058 | - | 897 | 3,955 | Reaction yield prediction |
| USPTO_500_MT (a) | 116,360 | 12,937 | 14,238 | 143,535 | Multi-task prediction |
(a) Contains stereochemical information

(b) With reactants/reagents separation
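
As a quick consistency check on Table 1, the sketch below recomputes each row's total from its splits and prints the split fractions. The numbers are copied from the table. The dictionary name `SPLITS` and the helper `split_fractions` are hypothetical names introduced here for illustration only. The C-N coupling rows list three numbers rather than four, so the sketch assumes they are train, test, and total with no validation set, which is the only reading under which the splits sum to 3,955.

```python
# Split sizes copied from Table 1 as (train, valid, test, total).
# ASSUMPTION: the C-N coupling rows give train, test, total only, so the
# validation entry is None for them.
SPLITS = {
    "USPTO_TPL": (360_545, 40_059, 44_511, 445_115),
    "USPTO_MIT": (409_035, 30_000, 40_000, 479_035),
    "USPTO_50k": (40_029, 5_004, 5_004, 50_037),
    "C-N Coupling (random splits)": (2_767, None, 1_188, 3_955),
    "C-N Coupling (out-of-sample test1)": (3_057, None, 898, 3_955),
    "C-N Coupling (out-of-sample test2)": (3_055, None, 900, 3_955),
    "C-N Coupling (out-of-sample test3)": (3_058, None, 897, 3_955),
    "USPTO_500_MT": (116_360, 12_937, 14_238, 143_535),
}


def split_fractions(train, valid, test, total):
    """Return (train, valid, test) as fractions of total.

    A missing validation split (None) is counted as 0. Raises ValueError
    if the splits do not add up to the stated total.
    """
    parts = (train, valid or 0, test)
    if sum(parts) != total:
        raise ValueError(f"splits sum to {sum(parts)}, expected {total}")
    return tuple(p / total for p in parts)


if __name__ == "__main__":
    for name, row in SPLITS.items():
        tr, va, te = split_fractions(*row)
        print(f"{name}: train {tr:.1%}, valid {va:.1%}, test {te:.1%}")
```

Every row passes the sum check under the assumption stated above, so the reported totals agree with the listed splits.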