Author manuscript; available in PMC: 2023 Apr 28.
Published in final edited form as: Harv Data Sci Rev. 2022 Apr 28;4(2):10.1162/99608f92.1e23fb3f. doi: 10.1162/99608f92.1e23fb3f

Figure A1. Matching takes place in two stages.

Stage 1 produces high-confidence matches for Survey of Doctorate Recipients (SDR) respondents, and Stage 2 expands the matching process by using WoS Author Clusters (DAIS NG). The gold standard data are manually curated publication histories developed for two sets of SDR respondents (800 REFID and 785 REFID). As indicated by the purple gear icons, these data were used in two places during the process: (1) as a training set and (2) as a test set, used to conduct postprocessing targeting 95% precision during Stage 1 and to produce a final assessment of precision and recall after Stage 2. LNFI = last name plus first initial; REFID = the SDR person identifier; SED = Survey of Earned Doctorates.
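
The evaluation step in the caption can be illustrated with a minimal sketch. It assumes that each candidate match is a (REFID, publication ID) pair with a numeric match score, and that the gold standard is the set of correct pairs. The function names, the scoring scheme, and the sample data are hypothetical and are not taken from the authors' pipeline. The sketch computes precision and recall against the gold standard, then finds the lowest score cutoff whose retained matches reach a target precision such as 95%.

```python
from typing import Hashable, Iterable, Optional, Set, Tuple

Pair = Tuple[Hashable, Hashable]  # hypothetical (REFID, publication ID) pair


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted matches against gold-standard matches."""
    true_positives = len(predicted & gold)
    # Convention assumed here: an empty set scores 1.0 rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


def threshold_for_precision(
    scored: Iterable[Tuple[Pair, float]], gold: Set[Pair], target: float = 0.95
) -> Optional[float]:
    """Lowest score cutoff whose retained matches reach the target precision.

    Returns None if no cutoff reaches the target.
    """
    scored = list(scored)
    for cutoff in sorted({score for _, score in scored}):
        kept = {pair for pair, score in scored if score >= cutoff}
        precision, _ = precision_recall(kept, gold)
        if precision >= target:
            return cutoff
    return None


# Illustrative data only: four scored candidate matches and three gold matches.
scored_matches = [
    (("R1", "P1"), 0.9),
    (("R1", "P2"), 0.4),
    (("R2", "P3"), 0.8),
    (("R2", "P4"), 0.3),
]
gold_matches = {("R1", "P1"), ("R2", "P3"), ("R2", "P5")}

cutoff = threshold_for_precision(scored_matches, gold_matches, target=0.95)
kept_matches = {pair for pair, score in scored_matches if score >= cutoff}
precision, recall = precision_recall(kept_matches, gold_matches)
```

Choosing the lowest qualifying cutoff keeps as many matches as possible while meeting the precision target, so recall is sacrificed only as far as the target requires. In this toy example, one gold match is never proposed as a candidate, so recall stays below 1 at any cutoff.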