2022 Nov 30;13:7374. doi: 10.1038/s41467-022-35032-8

Fig. 1. The Genetic Engineering Attribution Challenge.

a The creation of any synthetic nucleic-acid sequence involves numerous design decisions, each of which leaves a mark in the resulting sequence. Genetic engineering attribution (GEA) aims to use these marks to identify the designer. b Misclassification rate (1 − top-N accuracy) of past ML approaches to GEA on the Addgene plasmid database, compared to BLAST (left) and the results of the Genetic Engineering Attribution Challenge (GEAC, right). Lower misclassification rates indicate higher accuracy. Our BLAST method achieves higher accuracy than previous implementations; see Methods for details. c In the GEAC, teams were provided with engineered plasmid sequences from Addgene, alongside basic metadata for each plasmid. Lab-of-origin labels were provided for the training dataset, but withheld from the leaderboard and holdout test datasets. In the Prediction Track, teams competed to identify these withheld labs-of-origin with the greatest top-10 accuracy. In the Innovation Track, high-scoring teams from the Prediction Track were then invited to submit reports describing their approaches to a panel of expert judges for assessment.
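
The metric in panel b and the Prediction Track score can be illustrated with a minimal sketch. It assumes each plasmid comes with a list of candidate labs ranked from most to least likely; the function names, lab identifiers, and toy data are hypothetical and are not taken from the challenge code.

```python
from typing import Sequence


def top_n_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    n: int = 10,
) -> float:
    """Fraction of sequences whose true lab is among the first n ranked guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one labelled sequence is required")
    hits = sum(
        truth in ranked[:n]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


def misclassification_rate(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    n: int = 10,
) -> float:
    """Misclassification rate as defined in the caption: 1 - top-N accuracy."""
    return 1.0 - top_n_accuracy(ranked_predictions, true_labels, n)


# Toy data (hypothetical lab identifiers): four plasmids, three candidate labs.
ranked = [
    ["lab_a", "lab_b", "lab_c"],
    ["lab_b", "lab_c", "lab_a"],
    ["lab_c", "lab_a", "lab_b"],
    ["lab_a", "lab_c", "lab_b"],
]
truth = ["lab_a", "lab_c", "lab_b", "lab_b"]

top1 = top_n_accuracy(ranked, truth, n=1)            # 0.25: only the first plasmid is a hit
top2 = top_n_accuracy(ranked, truth, n=2)            # 0.5: the second plasmid is also a hit
miss2 = misclassification_rate(ranked, truth, n=2)   # 0.5
```

With n=10, `top_n_accuracy` corresponds to the top-10 accuracy on which the Prediction Track was ranked, provided each prediction list holds at least ten candidate labs.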