Figure 1.
Data analysis pipeline used for genome assembly and annotation. Left. DNA level: the genome sequence of D39V was determined by SMRT sequencing, supported by previously published Illumina data (10,25). Automated annotation by the RAST (13) and PGAP (4) pipelines was followed by curation based on information from the literature and a variety of databases and bioinformatic tools. Right. RNA level: Cappable-seq (7) was used to identify transcription start sites. Simultaneously, putative transcript ends were identified by combining reverse reads from paired-end, stranded sequencing of the control sample (i.e. not 5′-enriched). Terminators were annotated where such putative transcript ends overlapped with stem loops predicted by TransTermHP (22). Finally, local fragment size enrichment in the paired-end sequencing data was used to identify putative small RNA features. α: D39V derivative (bgaA::PssbB-luc; GEO accessions GSE54199 and GSE69729). β: The first 1 kb of the genome file was duplicated at the end to allow mapping across FASTA boundaries. γ: Analysis was performed using only read pairs that map uniquely to the genome.
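The padding step described in footnote β can be illustrated with a minimal sketch. It is not the authors' code: the function names `pad_circular` and `fold_position`, the toy sequence, and the use of 0-based coordinates are assumptions made for illustration only. The idea is that a read spanning the origin of a circular genome cannot align to the linear FASTA record, but it can align once the first bases are appended to the end; positions beyond the original length are then folded back onto the original coordinates.

```python
def pad_circular(seq: str, pad: int = 1000) -> str:
    """Append the first `pad` bases to the end of a circular sequence.

    Hypothetical helper; the default of 1000 mirrors the 1 kb in the caption.
    """
    return seq + seq[:pad]


def fold_position(pos: int, genome_length: int) -> int:
    """Map a 0-based position on the padded sequence back to the original genome."""
    return pos % genome_length


# Toy example with a 10-bp "genome" and a 3-bp pad (assumed values).
genome = "ACGTACGGTT"
padded = pad_circular(genome, pad=3)

# A read that spans the origin: last two bases followed by the first two.
read = "TTAC"
hit_linear = genome.find(read)   # -1: no match on the linear record
hit_padded = padded.find(read)   # match found on the padded record

# Fold the end coordinate of the hit back onto the original genome.
end_on_genome = fold_position(hit_padded + len(read) - 1, len(genome))
```

In a real workflow the aligner would report the padded coordinates, and features falling inside the duplicated region would need to be folded back in the same way before annotation.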