Table 1.
Dataset | Abbr. | Number of species | Number of strains identified at species level | Number of strains | Number of sequences |
---|---|---|---|---|---|
CBS type ITS | T1 | 1 436 | 2 067 | 2 067 | 6 108 |
CBS ITS | C1 | 1 595 | 6 768 | 8 176 | 14 601 |
Manually validated CBS ITS | M1 | 1 387 | 5 182 | 5 182 | 10 454 |
GB ITS | N1 | 856 | 5 564 | 6 958 | 6 985 |
CBS+GB ITS | CN1 | 1 782 | 11 768 | 17 994 | 21 134 |
CBS type LSU | T2 | 1 463 | 2 119 | 2 119 | 8 795 |
CBS LSU | C2 | 1 617 | 7 269 | 8 708 | 19 498 |
Manually validated CBS LSU | M2 | 1 380 | 5 011 | 5 011 | 13 211 |
GB LSU | N2 | 1 042 | 9 304 | 9 393 | 13 938 |
CBS+GB LSU | CN2 | 1 804 | 15 679 | 21 678 | 32 546 |
Manually validated dataset having both CBS ITS and CBS LSU sequences | M3 | 1 375 | 4 995 | 4 995 | 10 225 (ITS); 13 188 (LSU) |
The “CBS type datasets”, abbreviated as T1 for ITS and T2 for LSU, contained all strains designated as ex-type strains of a currently accepted species or of a species name now treated as a synonym. The “CBS datasets”, abbreviated as C1 for ITS and C2 for LSU, contained all strains from the CBS collection, including the ex-type strains of T1 and T2. The “Manually validated datasets”, abbreviated as M1 for ITS, M2 for LSU and M3 for both ITS and LSU, contained all CBS strains from the C1 and C2 datasets whose species assignments had been manually checked by the curators using ITS and/or LSU sequences. The “GB datasets”, abbreviated as N1 for ITS and N2 for LSU, contained all yeast sequences available in the GenBank (GB) database up to June 2015. The “CBS+GB datasets”, abbreviated as CN1 for ITS and CN2 for LSU, combined all data from datasets C and N, with each strain and each sequence counted only once.
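As a rough illustration of how a combined dataset such as CN1 or CN2 can be assembled so that each strain and sequence is counted once, the following Python sketch merges record lists and computes the four counts used as columns of Table 1. The `Record` schema, the identifiers, the toy data, and the choice to deduplicate on a sequence identifier are all assumptions made for illustration; this is not the procedure actually used to build the datasets.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One sequence record (hypothetical schema, for illustration only)."""
    strain_id: str          # strain identifier, e.g. a CBS number
    sequence_id: str        # assumed to identify a sequence uniquely across sources
    species: Optional[str]  # None if the strain is not identified at species level


def merge_datasets(*datasets: Iterable[Record]) -> List[Record]:
    """Union of several datasets; a sequence present in more than one is kept once."""
    merged = {}
    for dataset in datasets:
        for rec in dataset:
            # Keep only the first occurrence of each sequence identifier.
            merged.setdefault(rec.sequence_id, rec)
    return list(merged.values())


def summarize(records: Iterable[Record]) -> dict:
    """Compute the four counts used as Table 1 columns."""
    records = list(records)
    species = {r.species for r in records if r.species is not None}
    identified = {r.strain_id for r in records if r.species is not None}
    strains = {r.strain_id for r in records}  # a set counts each strain once
    return {
        "species": len(species),
        "strains_identified": len(identified),
        "strains": len(strains),
        "sequences": len(records),
    }


# Toy data: sequence "S2" occurs in both sources and must be counted once.
cbs = [
    Record("CBS 1", "S1", "Species A"),
    Record("CBS 1", "S2", "Species A"),
    Record("CBS 2", "S3", None),
]
genbank = [
    Record("CBS 1", "S2", "Species A"),
    Record("GB 9", "S4", "Species B"),
]
print(summarize(merge_datasets(cbs, genbank)))
# {'species': 2, 'strains_identified': 2, 'strains': 3, 'sequences': 4}
```

Deduplicating on a single sequence identifier keeps the sketch simple; with real CBS and GenBank records, recognising the same strain or sequence across sources may instead require reconciling different identifiers or comparing the sequences themselves.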