Skip to main content
. 2019 Oct 4;20:722. doi: 10.1186/s12864-019-6058-6

Fig. 1.

Fig. 1

Bioinformatics pipeline for improved Nicotiana benthamiana proteome annotation. The predicted proteins of Nicotiana species generated by the NCBI Eukaryotic Genome Annotation Pipeline were retrieved from Genbank and clustered at 95% identity threshold to reduce redundancy (Step 1), and used to annotate the Niben1.0.1 genome assembly (Step 2). Only those proteins with an alignment coverage ≥60% to the Nicotiana predicted proteins as determined by BLASTP were retained (Step 3) to produce the NbD core dataset. Similarly, the other draft genome assemblies were annotated (Step 4), and only those proteins with an alignment coverage ≥90% to the Nicotiana predicted proteins as determined by BLASTP were retained (Step 5). CD-HIT-2D was used at 100% sequence identity to retain proteins missing in NbD dataset (Step 6), resulting in supplemental dataset NbE. NbD and NbE can be combined (NbDE) to maximise the spectra annotation for proteomics experiments