Figure 1.
Overview of Athena’s workflow. First, we train the language model on the entire set of uncorrected reads for the specific dataset. Second, we perform error correction on a subsample of the uncorrected reads using an EC tool (e.g., Lighter or Blue) over a range of k-values. Third, we compute the perplexity of each corrected sample and select the best k-value for the next iteration, i.e., the one yielding the lowest perplexity, since EC quality is negatively correlated with perplexity. This process repeats until the termination criteria are met. Finally, the complete set of reads is corrected with the best k-value found and then used for evaluation.
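A minimal sketch of the perplexity-guided k-selection loop described in the caption is shown below. The helpers `correct_sample` and `score_perplexity` are assumptions supplied by the caller: the former would wrap the chosen EC tool (e.g., Lighter or Blue), the latter would evaluate the pre-trained language model on corrected reads. The iteration-budget and no-improvement stopping rules are likewise assumed stand-ins for the paper's termination criteria.

```python
import random

def select_best_k(reads, k_values, correct_sample, score_perplexity,
                  sample_size=10_000, max_iters=5):
    """Return the k whose corrected subsample has the lowest perplexity.

    `correct_sample(sample, k)` and `score_perplexity(corrected)` are
    caller-supplied callables (assumed here): the former wraps the EC tool,
    the latter scores reads under the language model trained on the
    uncorrected dataset.
    """
    best_k, best_ppl = None, float("inf")
    for _ in range(max_iters):                     # assumed termination criterion: iteration budget
        sample = random.sample(reads, min(sample_size, len(reads)))
        # Correct the subsample at each candidate k and score it with the LM.
        scores = {k: score_perplexity(correct_sample(sample, k)) for k in k_values}
        k_star = min(scores, key=scores.get)       # lower perplexity ~ better correction
        if scores[k_star] >= best_ppl:             # assumed stop: perplexity no longer improves
            break
        best_k, best_ppl = k_star, scores[k_star]
    return best_k
```

Once `select_best_k` returns, the full read set would be corrected with the returned k-value and passed on to evaluation, as in the final step of the workflow.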