Author manuscript; available in PMC: 2023 Mar 28.

Published in final edited form as: J Chem Inf Model. 2022 Mar 10;62(6):1376–1387. doi: 10.1021/acs.jcim.1c01467

Table 1: Dataset Splits Used for the Experiments

Dataset	Train	Validation	Test	Total	Task
USPTO_TPL [56]^a	360,545	40,059	44,511	445,115	Reaction type classification
USPTO_MIT [12]	409,035	30,000	40,000	479,035	Forward prediction
USPTO_50k [29]^a	40,029	5,004	5,004	50,037	Retrosynthesis
C-N Coupling [44]^a,b (random split)	2,767	–	1,188	3,955	Reaction yield prediction
C-N Coupling [44]^a,b (out-of-sample test 1)	3,057	–	898	3,955	Reaction yield prediction
C-N Coupling [44]^a,b (out-of-sample tests 2 and 4)	3,055	–	900	3,955	Reaction yield prediction
C-N Coupling [44]^a,b (out-of-sample test 3)	3,058	–	897	3,955	Reaction yield prediction
USPTO_500_MT^a	116,360	12,937	14,238	143,535	Multi-task prediction

^a Contains stereochemical information.

^b With reactants and reagents separated.
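As a quick consistency check on Table 1, the short Python sketch below verifies that Train + Validation + Test equals Total for every row and reports each split as a fraction of the dataset. The dictionary `SPLITS` and the helper `split_fractions` are hypothetical names introduced only for this illustration; the numbers are transcribed from the table, with a missing validation set ("–") encoded as `None` and counted as zero.

```python
# Split sizes transcribed from Table 1 as (train, validation, test, total).
# None marks a missing validation set ("–" in the table).
SPLITS = {
    "USPTO_TPL": (360_545, 40_059, 44_511, 445_115),
    "USPTO_MIT": (409_035, 30_000, 40_000, 479_035),
    "USPTO_50k": (40_029, 5_004, 5_004, 50_037),
    "C-N Coupling (random)": (2_767, None, 1_188, 3_955),
    "C-N Coupling (test 1)": (3_057, None, 898, 3_955),
    "C-N Coupling (tests 2, 4)": (3_055, None, 900, 3_955),
    "C-N Coupling (test 3)": (3_058, None, 897, 3_955),
    "USPTO_500_MT": (116_360, 12_937, 14_238, 143_535),
}


def split_fractions(train, valid, test, total):
    """Hypothetical helper: check that the parts sum to the total and
    return each part's share of the total, rounded to 3 decimals."""
    valid = valid or 0  # treat a missing validation set as size 0
    if train + valid + test != total:
        raise ValueError(f"split sizes {train}+{valid}+{test} != {total}")
    return tuple(round(n / total, 3) for n in (train, valid, test))


if __name__ == "__main__":
    for name, sizes in SPLITS.items():
        train_f, valid_f, test_f = split_fractions(*sizes)
        print(f"{name:<26} train={train_f:.3f} valid={valid_f:.3f} test={test_f:.3f}")
```

Running it confirms that every row is internally consistent. It also shows that USPTO_TPL is split roughly 81/9/10, USPTO_50k roughly 80/10/10, and the random C-N coupling split roughly 70/30 with no validation set.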