Abstract
Bacterial DNA methylation is involved in diverse cellular functions, including modulation of gene expression, DNA repair, and restriction–modification systems for defense against viruses and other foreign DNA. Restriction systems hinder efforts to engineer organisms to produce fuels and chemicals from waste and renewable feedstocks by degrading DNA during transformation. Methylome analysis allows identification of motifs within a bacterial chromosome that may be targeted by native restriction enzymes. Subsequent expression of the corresponding methyltransferases in Escherichia coli allows plasmid DNA to be protected from restriction in the target organism, thereby drastically enhancing transformation efficiency. Nanopore sequencing can detect methylated bases, but software is needed to transform modified base coordinates into methylated motifs. Here, we describe MIJAMP (MIJAMP Is Just A MethylBED Parser), a software package developed to discover methylated motifs from the output of ONT’s Modkit or other data in the methylBED format. MIJAMP employs a human-driven refinement strategy that empirically validates all motifs against genome-wide methylation data, thus eliminating incorrect motifs. MIJAMP also reports methylation data on specific, user-defined motifs. Using MIJAMP, we determined the methylated motifs both in a control strain (wild-type E. coli) and in Synechococcus sp. strain PCC7002, laying the foundation for improved transformation in this organism. MIJAMP is available at https://code.ornl.gov/alexander-public/mijamp/.
One Sentence Summary:
Here we describe software written to discover DNA methylation motifs from nanopore sequencing data.
Keywords: methylome, ONT sequencing, restriction systems
Graphical Abstract

Introduction
Non-model bacteria are critical components of the burgeoning circular bioeconomy because they often possess phenotypes that would be difficult or impossible to engineer into domesticated microbes (e.g. thermotolerance or acidophilicity). Thus, these bacteria represent attractive potential chassis strains for developing new bioproduction systems. Despite possessing native traits that make them well suited to industrialization, these strains will most likely require genetic modification to optimize bioconversion before they can produce a target product at the titer, rate, and yield needed for economical bioproduction. Unfortunately, these organisms are also frequently difficult or impossible to manipulate genetically, which limits their usefulness as production strains.
In nature, bacteria are subjected to cellular invasions by foreign genetic elements such as viruses and plasmids, and so they have evolved mechanisms to resist these invaders. As genetic manipulation involves the insertion of DNA in some form into the cell, these defenses also act to prevent genetic manipulation. One very common class of defense systems is the restriction–modification (R–M) systems, which encode restriction endonucleases (REs) that hydrolyze foreign DNA at specific DNA sequences, or motifs (Bickle & Krüger, 1993). These endonucleases are paired with DNA methyltransferases, which modify DNA within the motifs that are recognized by REs by adding a methyl (–CH3) group to specific locations on the nucleotide base. This modification prevents degradation of chromosomal DNA by blocking RE action at associated motifs, allowing the bacterial cell to differentiate self (methylated DNA) from non-self (unmethylated DNA). Three types of DNA methylation have been reported in bacteria: 5-methylcytosine (5mC), 4-methylcytosine (4mC), and 6-methyladenine (6mA).
Because of their capability to degrade foreign DNA, R–M systems are a major barrier to the genetic manipulation of new chassis strains of bacteria. To complicate the matter, R–M systems are the most horizontally transferred genes in bacteria (Bernheim & Sorek, 2020) and can be hyper-variable between closely related strains of the same species. However, a straightforward approach exists to circumvent R–M system activity: by heterologously expressing DNA methyltransferases from the target organism in a cloning strain of Escherichia coli, an investigator can recreate the host methylation patterns in that cloning strain. Any plasmid isolated from that strain will be methylated at the appropriate motifs and protected from degradation in the target organism during transformation (Riley et al., 2019; Yasui et al., 2009). To be most effective, these approaches require knowledge of the methylated motifs in the target organism, meaning that DNA methylome analysis is required to understand the unique R–M systems of each strain being domesticated.
Modern high-throughput sequencing technologies are capable of methylome analysis, with Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) systems dominating the field. PacBio, long considered the gold standard for long-read sequencing and methylome analysis, can detect 6mA and 4mC residues by analyzing changes in enzyme kinetics when sequencing a methylated base (Feng et al., 2013). 5mC modifications, however, are only weakly detected by PacBio, thus requiring a prohibitive sequencing depth or chemical modification of the 5mC base in conjunction with an Illumina sequencing run for detection (Li & Tollefsbol, 2011). In contrast, ONT basecalling systems can detect all three modifications with equivalent depths of sequencing (Rand et al., 2017). Coupled with the relatively low cost of both instrumentation and consumables compared to PacBio, ONT-based methylome discovery is highly attractive to investigators who require routine, rapid, and accurate methylomes in novel bacterial species or strains.
While most ONT methylome analysis software to date has focused on the characterization of CpG islands in eukaryotes (Ahsan et al., 2024; Liu et al., 2021; Snajder et al., 2023), a few packages have been released that detect motifs in bacterial genomes (Crits-Christoph et al., 2023; Tourancheau et al., 2021). These specialist packages differ in how modified bases are identified from genomic data. For example, Nanodisco (Tourancheau et al., 2021) compares the raw electrical signal of reads from methylated and unmethylated versions of the same genome to determine which bases are most likely modified, extracts the genomic regions around these putative modified bases, and inputs them into MEME (Bailey et al., 2015), which finds patterns of enriched DNA motifs that surround these modified bases. MicrobeMod (Crits-Christoph et al., 2023), in comparison, forgoes the raw current comparison method of modified base detection in favor of using the output from the ONT-provided DNA basecalling model. After the data are processed by Modkit, genomic regions around the modified bases are extracted and patterns detected using STREME (a speed-optimized version of MEME).
Unfortunately, issues exist for most, if not all, of the available bacterial methylation software packages. Many of these packages function as a “black box” that takes an input and produces an output with no human intervention needed or possible. This means investigators are taking an automated analysis of potentially complex data at face value with no ability to directly evaluate or tune the analysis. Nanodisco solved this by innovating a “motif-refining” technique to empirically test each motif against the global dataset, allowing a user to correct mistakes made during motif detection and to clarify ambiguities arising from complex methylomes. Although an effective (if sometimes cumbersome) software package, Nanodisco has been abandoned since 2021. In those intervening years, ONT discontinued their R9.4.1 version of flowcells, and without being updated to handle the electrical signals from the new R10.4.1 flowcell, Nanodisco has been rendered non-functional.
As we were in need of a robust software solution to discover methylomes to improve our ability to transform diverse non-model bacteria, we developed MIJAMP (MIJAMP Is Just A MethylBED Parser) to rapidly, interactively, and accurately call methylation motifs in bacterial nanopore genomic DNA sequencing datasets. Like other software packages, MIJAMP uses a single Modkit-parsed dataset per genome, but like Nanodisco it requires a user to manually refine and empirically validate every detected motif. Here, we describe the MIJAMP workflow and demonstrate its capabilities on both simple and complex methylomes, as well as discuss limitations and potential compatibility with other sequencing technologies. This software package represents a key enabling tool to expand our ability to industrialize a broader range of bacterial industrial chassis strains for the emerging bioeconomy (Fig. 1).
Fig. 1.
The role MIJAMP plays in strain domestication for novel bioproduction capabilities. Bacteria are isolated from a variety of sources (e.g. animal gastrointestinal tracts, extant industrial processes, or contaminated soil), cultured, then genetically manipulated using artificial DNA constructs. Plasmid DNA isolated from typical cloning strains of Escherichia coli produces few or no colonies when transformed into these novel hosts. However, if the plasmid DNA is isolated from a cloning strain expressing methyltransferases from the target host, these plasmids produce multiple orders of magnitude more colonies. With an effective transformation method available, genetic modification of this novel host becomes much more likely.
Methods
Escherichia coli strain MG1655 was cultured in LB Miller medium overnight at 37 °C for genomic DNA extraction, which was done using the Zymo Research Quick-DNA Miniprep Plus Kit (Zymo Research, Irvine, CA). Synechococcus sp. strain PCC7002 was cultured overnight in 20-mL glass bubble tubes in A+ medium (utex.org/products/a-plus-medium) under full-spectrum LED light at 50 µmol photons m−2 s−1, then genomic DNA was extracted using the Zymo Research Quick-DNA Fungal/Bacterial Miniprep Kit. Each gDNA sample was prepared with the Ligation Sequencing Kit SQK-LSK114.24 (ONT, Oxford, UK) as instructed by ONT, except that the bead cleanup between the end repair and the barcode ligation steps was skipped. Sequencing was conducted using a MinION Mk. IIB run by an HP G5 Z8 workstation. Live basecalling was performed in MinKNOW v24.06.10 via Dorado v0.7.1 run on two NVIDIA RTX A6000 GPUs. Reference genomes were assembled from the obtained reads using the Trycycler v0.5.4 meta-assembler with default settings and suggested assemblers and accessories (Filtlong v0.2.1, Flye v2.9.4, Raven v1.5.3, miniasm/minipolish v0.3, and Medaka v1.5) (Kolmogorov et al., 2019; Li, 2016; Vaser & Šikić, 2021; Wick et al., 2021). Modified basecalling of the 100x coverage datasets by Dorado was performed on a separate HP G5 Z8 workstation equipped with a single NVIDIA RTX A5000 GPU using the v5.0.0 all-context 6mA and 4mC + 5mC modified base models.
Results
The MIJAMP Workflow
MIJAMP is a collection of executable Python scripts that discover methylated motifs within prokaryotic genomes. These scripts rely on several Python packages and require Minimap2 (Li, 2018), Modkit (github.com/nanoporetech/modkit), samtools (Danecek et al., 2021), and the MEME software (Bailey et al., 2015) to be installed. Dorado (https://github.com/nanoporetech/dorado) is also recommended but is not strictly required for MIJAMP.
The initial input files for MIJAMP are a BAM file containing modified base calls from Dorado and a reference genome in FASTA format. The preprocess command maps the reads in the BAM file to the reference genome, processes and sorts the mapped result, and passes the final mapped, sorted BAM file to Modkit as input. The output of Modkit is saved into a working directory, which also includes the mapped, sorted BAM file, the reference genome, and indexing files. After preprocessing, the motif command is run to discover the methylated DNA sequence motifs within that genome. First, the Modkit data are split into three subsets according to modification type (6mA, 4mC, 5mC) and filtered to remove low-quality base modification calls. The top 250 putative modified sites are extracted from the reference genome along with the 15 bases flanking each side of the site; these 31-mers are written to a file and used as input for MEME. MEME outputs an XML file, which is read and parsed to produce a list of unrefined motifs potentially present within the genome. For each motif in this list, a function calculates the location of the modification within the motif, and the motif and its modified base index are sent to a refining loop.
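The splitting and window-extraction steps can be illustrated with a minimal Python sketch. It is not MIJAMP's code: the function names, the coverage and fraction thresholds, and the ranking of the "top" sites by fraction modified are all assumptions made for illustration, and the rows are assumed to have already been parsed from the Modkit output into tuples. The modification codes (`a`, `m`, `21839`) are those we understand Modkit to use for 6mA, 5mC, and 4mC.

```python
from collections import defaultdict

# Assumed Modkit modification codes: a = 6mA, m = 5mC, 21839 = 4mC
MOD_CODES = {"a": "6mA", "m": "5mC", "21839": "4mC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_by_mod(rows, min_cov=10, min_frac=0.7):
    """Group (contig, pos, code, strand, coverage, fraction) rows by modification
    type, dropping low-quality calls. Thresholds are hypothetical."""
    subsets = defaultdict(list)
    for contig, pos, code, strand, cov, frac in rows:
        if code in MOD_CODES and cov >= min_cov and frac >= min_frac:
            subsets[MOD_CODES[code]].append((contig, pos, strand, frac))
    return subsets


def top_windows(sites, genome, n_top=250, flank=15):
    """Return strand-corrected (2 * flank + 1)-mers centred on the n_top sites
    with the highest fraction modified (ranking criterion is an assumption)."""
    windows = []
    for contig, pos, strand, _frac in sorted(sites, key=lambda s: -s[3])[:n_top]:
        seq = genome[contig]
        if pos - flank < 0 or pos + flank + 1 > len(seq):
            continue  # too close to a contig end; circular wrap-around is ignored
        window = seq[pos - flank : pos + flank + 1]
        if strand == "-":
            # report the window as read on the strand carrying the modification
            window = window.translate(COMPLEMENT)[::-1]
        windows.append(window)
    return windows
```

With `flank=15`, each window is 31 bases long and the putatively modified base sits at index 15, which is the layout a pattern finder such as MEME would then be given.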
The refining loop allows the user to empirically evaluate each putative motif against the genome-wide methylation (GWM) dataset, where GWM is the proportion of a motif's occurrences across the genome that are called methylated. MEME is a pattern finder and should be seen as the beginning of methylome discovery rather than the conclusion. By systematically producing all single-nucleotide changes in a motif (both substitutions and indels), the user can experimentally validate the length and composition of a putative motif, potentially substituting a better sequence in place of the one produced by MEME. Thus, a user can explore the data via a simple text interface to ensure that the motifs detected are empirically evaluated. This refining loop can be executed repeatedly until the user is satisfied with the motif they have refined, and the sequence is saved to be reported later. After all motifs from MEME have been refined and either accepted or rejected, the data corresponding to the methylated base within the saved motifs are removed from the master dataset and a new loop is then initiated. This “refine-then-remove” strategy is inspired by and analogous to the refinement method used by Nanodisco and serves to better detect less-common motifs. Each dataset corresponding to a modification type is evaluated by these loops until no viable motifs are left to be found. At the end of this refinement process, the saved motifs of all types are written to an output tab-separated value file (Fig. 2).
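To make the idea of "all single-nucleotide changes" concrete, the following Python sketch enumerates every motif that is one substitution, insertion, or deletion away from a starting motif while keeping track of the modified base. The function name and the choice to leave the modified base itself untouched are assumptions for illustration; MIJAMP's own implementation may differ.

```python
IUPAC = "ACGTRYSWKMBDHVN"


def single_edit_variants(motif, mod_index, alphabet=IUPAC):
    """Return sorted (variant, mod_index) pairs that differ from `motif` by one
    substitution, insertion, or deletion. The modified base is never edited."""
    variants = set()
    for i in range(len(motif)):
        if i == mod_index:
            continue  # keep the modified base fixed
        # substitutions at position i
        for base in alphabet:
            if base != motif[i]:
                variants.add((motif[:i] + base + motif[i + 1:], mod_index))
        # deletion of position i shifts the index when it lies upstream
        shifted = mod_index - 1 if i < mod_index else mod_index
        variants.add((motif[:i] + motif[i + 1:], shifted))
    for i in range(len(motif) + 1):
        # insertion before position i shifts the index when at or upstream of it
        shifted = mod_index + 1 if i <= mod_index else mod_index
        for base in alphabet:
            variants.add((motif[:i] + base + motif[i:], shifted))
    return sorted(variants)
```

Each variant would then be scored against the genome-wide methylation data, so that a candidate such as GATCC or GATY can be compared directly with the starting motif GATC before one of them is accepted.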
Fig. 2.
Flowchart illustrating the function of the MIJAMP motif script. Shaded parallelograms indicate data saved locally to the user’s system, while unshaded represent data held within memory. Rectangles represent functions called during the process, while diamonds indicate user input. The dashed lines indicate repetitive loops: The items enclosed by the thinner dashed blue line are performed on each of the three possible modification datasets after they are Split Into Mod Class, while those enclosed by the thicker red dashed line are performed on each member of the motifList dataset held in memory.
MIJAMP also includes two commands for exploring datasets: query and diagnostic. The query command accepts a user-defined motif and returns the number of methylated motifs, total number of motifs, and percent of motifs that are called methylated, while the diagnostic command calculates the percentage of low-, mid-, and high-quality methylated base calls that are explained by the methylome derived from the motif command. In essence, query lets an investigator directly interrogate the methylome, while diagnostic provides a sanity check for an investigator nervous about potentially missing a motif.
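The two calculations can be sketched in a few lines of Python. This is an illustrative approximation rather than MIJAMP's code: the function names mirror the command names only for readability, methylated calls are assumed to be available as a set of (position, strand) pairs on a single linear contig, and wrap-around at the ends of a circular chromosome is ignored.

```python
import re

IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_sites(motif, mod_index, seq):
    """Return the (position, strand) of the modified base for every motif
    occurrence on either strand, in forward-strand coordinates."""
    # A lookahead is used so that overlapping occurrences are all reported
    pattern = re.compile("(?=(" + "".join(IUPAC_REGEX[b] for b in motif) + "))")
    sites = [(m.start() + mod_index, "+") for m in pattern.finditer(seq)]
    revcomp = seq.translate(COMPLEMENT)[::-1]
    for m in pattern.finditer(revcomp):
        sites.append((len(seq) - 1 - (m.start() + mod_index), "-"))
    return sites


def query(motif, mod_index, seq, methylated):
    """Methylated occurrences, total occurrences, and percent methylated."""
    sites = motif_sites(motif, mod_index, seq)
    n_meth = sum(site in methylated for site in sites)
    pct = 100.0 * n_meth / len(sites) if sites else 0.0
    return n_meth, len(sites), pct


def diagnostic(motifs, seq, methylated):
    """Percent of methylated calls that fall inside any motif of the methylome."""
    explained = set()
    for motif, mod_index in motifs:
        explained.update(motif_sites(motif, mod_index, seq))
    if not methylated:
        return 0.0
    return 100.0 * len(methylated & explained) / len(methylated)
```

The two numbers answer different questions: `query` asks how much of a motif is methylated, while `diagnostic` asks how much of the methylation is accounted for by the motifs, so a low `diagnostic` value hints that a motif is still missing.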
Benchmarking
MIJAMP and MicrobeMod were each run on the same datasets from E. coli K12 MG1655 and Synechococcus sp. PCC7002, which are included in the MIJAMP distribution, and the discovered methylation motifs were compared (Table 1). Both MIJAMP and MicrobeMod correctly called the three known methylation motifs in E. coli: G(6mA)TC, C(5mC)WGG, and A(6mA)CNNNNNNGTGC/GC(6mA)CNNNNNNGTT. However, the Synechococcus methylomes differed between software. MIJAMP discovered seven 6mA motifs, two 5mC motifs, and one 4mC motif, while MicrobeMod discovered three 6mA motifs, two 5mC motifs, and zero 4mC motifs. Additionally, MicrobeMod and MIJAMP disagreed on the precise sequence of the 5mC motifs: MicrobeMod called GG(5mC)GATCGCC and (5mC)NCGNG, while MIJAMP called G(5mC)GATCGC and (5mC)YCGRG.
Table 1.
Benchmarking MIJAMP and MicrobeMod
| Motif | MIJAMP methylated sites | MIJAMP total sites | MIJAMP % | MicrobeMod methylated sites | MicrobeMod total sites | MicrobeMod % | Disagreements |
|---|---|---|---|---|---|---|---|
| Escherichia coli | | | | | | | |
| G(6mA)TC | 38 097 | 38 248 | 99.61 | 38 226 | 38 248 | 99.9 | None |
| GC(6mA)CNNNNNNGTT | 590 | 595 | 99.16 | 595 | 595 | 100 | None |
| A(6mA)CNNNNNNGTGC | 591 | 595 | 99.33 | 595 | 595 | 100 | None |
| C(5mC)WGG | 24 030 | 24 102 | 99.7 | 24 102 | 24 102 | 100 | None |
| % high-quality sites explained | 6mA: 94.8%, 5mC: 99.7% | | | 6mA: 94.8%, 5mC: 99.7% | | | |
| Picosynechococcus | | | | | | | |
| G(6mA)TC | 44 886 | 45 782 | 98.04 | 42 668 | 45 782 | 93.2 | None |
| TGG(6mA)GG | 1784 | 1831 | 97.43 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| GCCGN(6mA)C | 1218 | 1230 | 99.02 | 1202 | 1230 | 97.7 | None |
| GRGGA(6mA)G | 1259 | 1271 | 99.06 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| GAGG(6mA)G | 1398 | 1453 | 96.21 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| GA(6mA)GNNNNNTCC | 334 | 340 | 98.24 | 337 | 340 | 99.1 | None |
| GG(6mA)NNNNNCTTC | 337 | 340 | 99.12 | 337 | 340 | 99.1 | None |
| CRA(6mA)NNNNNNNNTGAC | 259 | 260 | 99.62 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| GTC(6mA)NNNNNNNNTTYG | 244 | 260 | 93.85 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| G(5mC)GATCGC | 11 040 | 11 170 | 98.84 | 7880 | 7924 | 99.4 | GG(5mC)GATCGCC |
| (5mC)YCGRG | 1217 | 1448 | 84.05 | 1440 | 10 230 | 14.1 | (5mC)NCGNG |
| CCC(4mC)GC | 1528 | 1800 | 84.89 | n.d. | n.d. | n.d. | n.d. by MicrobeMod |
| % high-quality sites explained | 6mA: 99.9%, 5mC: 81.9%, 4mC: 80.3% | | | 6mA: 94.3%, 5mC: 65.8%, 4mC: 0% | | | |
Modified bases are denoted in parentheses, n.d. = not detected.
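Several rows of Table 1 are the two strands of a single bipartite motif, which is why the number of rows exceeds the number of motifs reported in the text. The short Python sketch below, with helper names chosen here for illustration, shows one way to check this by reverse-complementing motifs written in IUPAC codes.

```python
# Complement of each IUPAC nucleotide code (e.g. R = A/G pairs with Y = C/T)
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a motif written with IUPAC codes."""
    return motif.translate(IUPAC_COMPLEMENT)[::-1]


def count_distinct(motifs):
    """Count motifs after merging each one with its reverse complement."""
    return len({min(m, reverse_complement(m)) for m in motifs})


# 6mA rows listed for MIJAMP in the second half of Table 1, parentheses removed
rows_6ma = [
    "GATC", "TGGAGG", "GCCGNAC", "GRGGAAG", "GAGGAG",
    "GAAGNNNNNTCC", "GGANNNNNCTTC",
    "CRAANNNNNNNNTGAC", "GTCANNNNNNNNTTYG",
]
print(reverse_complement("GAAGNNNNNTCC"))  # GGANNNNNCTTC
print(count_distinct(rows_6ma))            # 7
```

The nine 6mA rows therefore collapse to seven distinct motifs, and the same check pairs A(6mA)CNNNNNNGTGC with GC(6mA)CNNNNNNGTT in E. coli.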
To determine which software’s methylome is more accurate, we used the diagnostic script to evaluate how many methylated bases were explained by each methylome. Our rationale was that a complete methylome would explain significantly more modified bases than an incomplete one. However, MicrobeMod does not produce these data. Because diagnostic uses the output.tsv file produced by motif as the input data containing relevant motifs to search for in the genome, we manually replaced the motifs from the MIJAMP methylome with those from MicrobeMod and then ran diagnostic with this spoofed file. The MIJAMP methylome explained 99.9% of the 6mA modifications, 81.9% of the 5mC modifications, and 80.3% of the 4mC modifications, while the MicrobeMod methylome explained 94.3%, 65.8%, and 0% of the 6mA, 5mC, and 4mC modifications, respectively. The lower values obtained by the MicrobeMod methylome suggest that motifs are missing or miscalled in the MicrobeMod methylome. Additionally, the Synechococcus methylome produced by a beta version of MIJAMP was used recently to produce E. coli strains that mimic Synechococcus methylation, which improved transformation yields of Synechococcus between 2- and 20-fold (Hren et al., 2024). As the methylating strains target motifs not predicted by MicrobeMod but clearly improve transformation with heterologous DNA, we can conclude that these motifs are, in fact, real. Taken together with the output of diagnostic, we conclude that MIJAMP produces more accurate methylomes than MicrobeMod.
While benchmarking used a dataset consisting of 470 Mbp worth of reads (or 100x) for each genome, we questioned what minimum sequence depth is required by MIJAMP. To explore these minimums, we made datasets containing 30x, 50x, 75x, and 200x coverage worth of reads from the same master dataset as the E. coli dataset included in MIJAMP. The 30x dataset was insufficient to call proper GWM values, while the only difference among the remaining datasets was that the 50x dataset reported less-stringent motifs (i.e. RMACNNNNNNGYK instead of GCACNNNNNNGTT) than the deeper datasets. Thus, we recommend a target coverage of 100–200x, with 75x being an absolute minimum.
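The text does not state how the reduced-coverage datasets were drawn. One simple approach, sketched below in Python under that caveat, is to pick reads at random until their combined length reaches the target coverage multiplied by the genome size. The function name, the seed, and the use of read lengths as the only input are illustrative assumptions.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, coverage, seed=0):
    """Randomly choose reads until their total length reaches
    coverage * genome_size bases. Returns (chosen indices, total bases)."""
    target = genome_size * coverage
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the subset reproducible
    chosen, total = [], 0
    for i in order:
        if total >= target:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen, total


# Toy example: 100 reads of 1 kb each against a 1-kb "genome"
indices, bases = subsample_to_coverage([1000] * 100, genome_size=1000, coverage=30)
print(len(indices), bases)  # 30 30000
```

The chosen indices could then be used to write the matching reads to a new file before preprocessing; if the master dataset is smaller than the target, the function simply returns every read.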
Discussion
Methylome discovery is a crucial early step in genetic tool development for the engineering of non-model bacteria. To that end, we developed MIJAMP to discover methylated motifs containing any of the three possible DNA modifications in bacteria from ONT sequencing. Additionally, the user-driven refinement method featured in MIJAMP helps to decrease the chance of miscalling the true sequence of a motif. MIJAMP provides utilities for both directly querying the dataset with a motif and confirming the extent of the data explained by a given methylome.
We showed that MIJAMP produces more accurate methylomes than MicrobeMod. The utility of MIJAMP was demonstrated on a complex methylome, where removing explained data from consideration by MEME helps prevent spurious motif calls, which in turn provides better starting motifs for refinement. Additionally, MIJAMP methylome calling explains a much higher proportion of the methylome data than MicrobeMod, which did not call multiple well-supported motifs. We recommend using MIJAMP with a dataset representing at least 75x coverage of the target genome, though datasets significantly larger than 100x are unlikely to improve the quality of the methylome produced.
As MIJAMP is a methylBED file parser, it can work with any dataset in this format. Since the methylBED file format (https://www.encodeproject.org/data-standards/wgbs/) is both well defined and human readable, data from other sequencing platforms, such as Illumina bisulfite-Seq and PacBio SMRT-Seq, could likely be converted to the methylBED format for parsing by MIJAMP. We plan that future versions will provide options to directly query a methylBED file (and associated genome) for parsing.
MIJAMP is designed to find motifs in modern ONT sequencing data; as such, certain caveats should be kept in mind. MIJAMP will not detect non-motif-based methylation, e.g. eukaryotic CpG islands. MIJAMP can also only process data produced by ONT flowcells of version R10.4.1 and likely beyond; therefore, data from earlier versions such as the discontinued R10.4 flowcells are not usable. MIJAMP does not detect motifs longer than 20 bp, though the likelihood of encountering a real R–M motif of this length is low. MIJAMP relies on modified base models published by ONT, and the quantity and quality of modified base detection are completely dependent on the ONT models. During MIJAMP development, model v5.0.0 was released for modified base detection, which appears to produce more reliable modification calls than the previous v4.3.0 models, leading to much improved methylome discovery. In our experience, the ONT Modkit software can crash when handling very large datasets (∼1000x coverage); however, MIJAMP showed no improvement in motif-finding with datasets over the recommended 100x coverage, so we strongly recommend reducing large datasets to avoid issues running preprocess. Finally, motif refinement intentionally requires human intervention, leveraging human pattern-recognition capabilities to discern signal from noise. Users who accept all motifs passed to them when running motif will very likely produce imprecise methylomes, which limits prospects for automation.
In conclusion, MIJAMP is a major advancement to enable more accurate elucidation of methylation patterns in diverse bacteria. This tool is an important software package to accelerate genetic modification for diverse applications, including the engineering of bacterial chassis toward economical production of fuels, chemicals, and materials at scale.
Acknowledgments
We thank Andrew Hren at CU Boulder for the sample of gDNA from Synechococcus sp. PCC7002. We also thank Dr. Andrea Garza Elizondo for her work on the graphical abstract and Fig. 1. This manuscript has been authored by Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Contributor Information
Alyssa K Tidwell, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee—Knoxville, Knoxville, TN, USA.
Evelyn Faust, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Carrie A Eckert, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Adam M Guss, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
William G Alexander, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Author Contributions
Conceptualization: W.G.A., A.M.G., and C.A.E.; code development: W.G.A., A.K.T., and E.F.; sequence data generation: W.G.A.; writing: all authors; project administration: W.G.A., A.M.G., and C.A.E.
Funding
Funding was provided in part by the U.S. DOE Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office (BETO) to the Agile BioFoundry. Funding was also provided in part by the DOE Office of Science Office of Biological and Environmental Research award DE-SC0023085.
Conflict of Interest
The authors declare no conflict of interest.
Data Availability
The data underlying this article are available in the MIJAMP GitLab repository (https://code.ornl.gov/alexander-public/mijamp/-/tree/main/testData).
References
- Ahsan M. U., Gouru A., Chan J., Zhou W., Wang K. (2024). A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nature Communications, 15(1), 1448. https://doi.org/10.1038/s41467-024-45778-y
- Bailey T. L., Johnson J., Grant C. E., Noble W. S. (2015). The MEME Suite. Nucleic Acids Research, 43(W1), W39–W49. https://doi.org/10.1093/nar/gkv416
- Bernheim A., Sorek R. (2020). The pan-immune system of bacteria: Antiviral defence as a community resource. Nature Reviews Microbiology, 18(2), 113–119. https://doi.org/10.1038/s41579-019-0278-2
- Bickle T. A., Krüger D. H. (1993). Biology of DNA restriction. Microbiological Reviews, 57(2), 434–450. https://doi.org/10.1128/mr.57.2.434-450.1993
- Crits-Christoph A., Kang S. C., Lee H. H., Ostrov N. (2023). MicrobeMod: A computational toolkit for identifying prokaryotic methylation and restriction-modification with nanopore sequencing. bioRxiv. https://doi.org/10.1101/2023.11.13.566931
- Danecek P., Bonfield J. K., Liddle J., Marshall J., Ohan V., Pollard M. O., Whitwham A., Keane T., McCarthy S. A., Davies R. M., Li H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. https://doi.org/10.1093/gigascience/giab008
- Feng Z., Fang G., Korlach J., Clark T., Luong K., Zhang X., Wong W., Schadt E. (2013). Detecting DNA modifications from SMRT sequencing data by modeling sequence context dependence of polymerase kinetic. PLoS Computational Biology, 9(3), e1002935. https://doi.org/10.1371/journal.pcbi.1002935
- Hren A. P., Abraham J. P., Tumen-Velasquez M. P., Vergara M. M., Guss A. M., Alexander W. G., Pfleger B. F., Fox J. M., Eckert C. A. (2024). High-efficiency transformation and gene expression in Picosynechococcus sp. PCC 7002. bioRxiv. https://doi.org/10.1101/2024.09.17.613521
- Kolmogorov M., Yuan J., Lin Y., Pevzner P. A. (2019). Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology, 37(5), 540–546. https://doi.org/10.1038/s41587-019-0072-8
- Li H. (2016). Minimap and miniasm: Fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32(14), 2103–2110. https://doi.org/10.1093/bioinformatics/btw152
- Li H. (2018). Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
- Li Y., Tollefsbol T. O. (2011). DNA methylation detection: Bisulfite genomic sequencing analysis. Methods in Molecular Biology (Clifton, N.J.), 791, 11–21. https://doi.org/10.1007/978-1-61779-316-5_2
- Liu Y., Rosikiewicz W., Pan Z., Jillette N., Wang P., Taghbalout A., Foox J., Mason C., Carroll M., Cheng A., Li S. (2021). DNA methylation-calling tools for Oxford Nanopore sequencing: A survey and human epigenome-wide evaluation. Genome Biology, 22(1), 295. https://doi.org/10.1186/s13059-021-02510-z
- Rand A. C., Jain M., Eizenga J. M., Musselman-Brown A., Olsen H. E., Akeson M., Paten B. (2017). Mapping DNA methylation with high-throughput nanopore sequencing. Nature Methods, 14(4), 411–413. https://doi.org/10.1038/nmeth.4189
- Riley L. A., Ji L., Schmitz R. J., Westpheling J., Guss A. M. (2019). Rational development of transformation in Clostridium thermocellum ATCC 27405 via complete methylome analysis and evasion of native restriction-modification systems. Journal of Industrial Microbiology & Biotechnology, 46(9–10), 1435–1443. https://doi.org/10.1007/s10295-019-02218-x
- Snajder R., Leger A., Stegle O., Bonder M. J. (2023). pycoMeth: A toolbox for differential methylation testing from Nanopore methylation calls. Genome Biology, 24(1), 83. https://doi.org/10.1186/s13059-023-02917-w
- Tourancheau A., Mead E. A., Zhang X.-S., Fang G. (2021). Discovering multiple types of DNA methylation from bacteria and microbiome using nanopore sequencing. Nature Methods, 18(5), 491–498. https://doi.org/10.1038/s41592-021-01109-3
- Vaser R., Šikić M. (2021). Time- and memory-efficient genome assembly with Raven. Nature Computational Science, 1(5), 332–336. https://doi.org/10.1038/s43588-021-00073-4
- Wick R. R., Judd L. M., Cerdeira L. T., Hawkey J., Méric G., Vezina B., Wyres K. L., Holt K. E. (2021). Trycycler: Consensus long-read assemblies for bacterial genomes. Genome Biology, 22(1), 266. https://doi.org/10.1186/s13059-021-02483-z
- Yasui K., Kano Y., Tanaka K., Watanabe K., Shimizu-Kadota M., Yoshikawa H., Suzuki T. (2009). Improvement of bacterial transformation efficiency using plasmid artificial modification. Nucleic Acids Research, 37(1), e3. https://doi.org/10.1093/nar/gkn884


