Abstract
To rapidly identify and systematically analyse the vegetative replication origins (oriVs) of bacterial plasmids, we present OriV-Finder, a comprehensive web server for bacterial plasmid replication origin analysis. To this end, we collected 470 replication initiation proteins (RIPs) reported in the literature, identified 35 conserved domains associated with RIPs, and summarized the conserved features of oriVs for various replication initiation mechanisms. With these resources, OriV-Finder identifies the homologous genes of RIPs and then assesses the likelihood of each intergenic sequence being an oriV based on the RIP information and the conserved features. The potential oriVs are then designated using a priority-based scoring system. As a user-friendly web server, OriV-Finder integrates visualization modules for oriVs, RIPs, and genomes, which facilitate the analysis and validation of oriVs. OriV-Finder is freely available to all users without any login requirement at https://tubic.org/OriV-Finder/.
Graphical Abstract
Introduction
Plasmids are autonomous, self-replicating genetic elements that coexist with chromosomes within cells [1], predominantly in bacteria. The vegetative replication origin (oriV) is an indispensable component that is essential for the survival and propagation of plasmids [2]. Bacterial plasmids exploit the host cellular machinery while maintaining autonomy through plasmid-specific replication initiation mechanisms [3]. Plasmid replication falls into three main modes: theta replication, rolling-circle replication, and strand-displacement replication [4]. The oriV typically resides within the intergenic sequences (IGSs) adjacent to the genes encoding replication initiation proteins (RIPs) and is sometimes embedded within these genes [5, 6]. Parallel principles have also been observed in the chromosomal replication origin (oriC) systems of bacteria and archaea [7, 8], and they have served as an important basis for designing oriC prediction algorithms [9–11]. In addition, in certain plasmids the replication and partitioning systems are arranged in an apparent operon to ensure faithful transmission of plasmids to the daughter cells of the host [12]. These features provide important references for the analysis of plasmid replication origins.
With the advancement of sequencing technology, the number of available plasmid sequences has grown exponentially, which has promoted the development of analysis tools and databases specialized for plasmids. However, there is currently no online resource specifically dedicated to analysing oriVs. PlasmidFinder [13] is an invaluable tool for plasmid typing in bacterial genomics and molecular epidemiology, but it primarily identifies coding sequences (CDSs) of RIPs or incompatibility (Inc) group marker sequences rather than precise oriV regions, and its scope is currently restricted to Enterobacteriaceae and Gram-positive bacteria. Curated databases such as PLSDB [14], IMG/PR [15], and PIPdb [16] offer comprehensive annotation and visualization for plasmid sequences but lack specialized functions for oriV analysis. The latest prokaryotic replication origin database, DoriC 12.0 [17], contains only ∼1000 oriV entries, which were collected from the literature and supplemented with some highly confident theta-A type oriVs predicted using Ori-Finder 2022 [11]. Therefore, there is an urgent need for online services dedicated to comprehensive oriV analysis across bacterial plasmids.
In this study, we present OriV-Finder, a user-friendly web server for the comprehensive analysis of bacterial plasmid replication origins, which integrates RIP homolog distribution information and sequence features. Based on a comprehensively collected RIP dataset, OriV-Finder conducts rapid homology searches across the whole genome of a bacterial plasmid. Subsequently, the likelihood of each IGS being an oriV is evaluated using RIP-related information and additional sequence features, and the potential oriVs are finally designated using a priority-based scoring system. For the convenience of users, OriV-Finder also provides visualization modules for oriVs, RIPs, and genomes. OriV-Finder is expected to become a popular tool for plasmid replication origin analysis and contribute to the development of this field.
Methods and implementation
Data collection and processing
RIP sequences and oriV regions were manually collected and curated from the published literature. To eliminate redundancy while maintaining the sequence diversity of RIPs, MMseqs2 [18] was employed for clustering with stringent parameters (--min-seq-id 0.9 -c 0.9 --cov-mode 0), and the experimentally validated RIPs (Supplementary Table S1) were given priority as the representative sequence of each cluster. Then, HMMER3 [19] was used to align the RIPs against the Pfam-A [20] and NCBIfam [21] databases. HMM profiles for 35 domains unique to bacterial plasmid RIPs (Supplementary Table S2) were retained after verification against the published literature. Additionally, a covariance model was constructed for RNAII homology search.
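The representative-selection step can be illustrated with a minimal Python sketch. It assumes the clustering result is available as (representative, member) pairs, which is the two-column layout of an MMseqs2 cluster TSV file; the function name `pick_representatives` and the sequence identifiers are hypothetical and are not taken from OriV-Finder.

```python
from collections import defaultdict


def pick_representatives(cluster_rows, validated_ids):
    """Re-pick one representative per cluster, preferring validated RIPs.

    cluster_rows: iterable of (representative, member) pairs, as found in
                  a two-column cluster TSV file.
    validated_ids: set of identifiers of experimentally validated RIPs.
    Returns a dict mapping the original representative to the chosen one.
    """
    clusters = defaultdict(list)
    for representative, member in cluster_rows:
        clusters[representative].append(member)

    chosen = {}
    for representative, members in clusters.items():
        validated = [m for m in members if m in validated_ids]
        # Fall back to the clustering tool's own representative when the
        # cluster contains no experimentally validated member.
        chosen[representative] = validated[0] if validated else representative
    return chosen


# Hypothetical identifiers, for illustration only.
rows = [("ripA", "ripA"), ("ripA", "ripB"), ("ripC", "ripC")]
representatives = pick_representatives(rows, validated_ids={"ripB"})
print(representatives)
```

Here the cluster originally represented by `ripA` is re-assigned to the validated member `ripB`, while the singleton cluster `ripC` keeps its own representative.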
General workflow
To systematically identify the oriVs in plasmids, a bioinformatics analysis pipeline was developed, which is shown in Fig. 1. Initially, the query genome was annotated, and then the CDSs and IGSs were obtained for subsequent processing.
Figure 1.
The analysis pipeline used by OriV-Finder to identify the putative oriVs of a plasmid.
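As an illustration of how IGSs can be derived from annotated CDSs, the following Python sketch computes the gaps between CDS intervals on a circular plasmid. It is a simplified sketch under stated assumptions, not the OriV-Finder implementation: coordinates are 0-based and half-open, no CDS crosses the sequence origin, and the function names are hypothetical. The IGS that wraps around the origin is reported with an end coordinate larger than the genome length.

```python
def intergenic_regions(cds_intervals, genome_length):
    """Return IGSs as (start, end) intervals on a circular genome.

    cds_intervals: list of (start, end) CDS coordinates, 0-based half-open,
                   assumed not to cross the sequence origin.
    The wrap-around IGS is returned with end > genome_length.
    """
    if not cds_intervals:
        return [(0, genome_length)]

    # Merge overlapping or directly adjacent CDSs first.
    merged = []
    for start, end in sorted(cds_intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Gaps between consecutive merged CDS blocks.
    gaps = [(prev[1], nxt[0]) for prev, nxt in zip(merged, merged[1:])]

    # Gap that runs from the last CDS through the origin to the first CDS.
    last_end, first_start = merged[-1][1], merged[0][0]
    if first_start + genome_length > last_end:
        gaps.append((last_end, first_start + genome_length))
    return gaps


def slice_circular(sequence, start, end):
    """Extract sequence[start:end] with modular indexing for wrap-around."""
    return "".join(sequence[i % len(sequence)] for i in range(start, end))


igs = intergenic_regions([(10, 30), (25, 40), (60, 90)], genome_length=100)
print(igs)  # [(40, 60), (90, 110)]
```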
Detection of homologous genes of RIPs in CDSs
The homologs of RIPs are detected in the translated CDSs using MMseqs2 and HMMER3. For MMseqs2-based homology detection, stringent filtering criteria are applied, while the HMMER3 analysis employs hmmscan with the noise cutoff (NC) thresholds specified in the profile hidden Markov models (Supplementary Table S2) to determine significant hits.
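For readers who wish to reproduce the HMMER3 step locally, the sketch below assembles an hmmscan command line that applies the noise cutoffs stored in each profile (`--cut_nc`) and writes a per-domain table (`--domtblout`). The exact command used by OriV-Finder is not given in the text, so this is an assumption; the helper name and file names are hypothetical, and the profile database is assumed to have been prepared with hmmpress.

```python
import shlex


def build_hmmscan_command(hmm_database, protein_fasta, domtbl_path):
    """Assemble an hmmscan call that uses per-profile noise cutoffs."""
    return [
        "hmmscan",
        "--cut_nc",                 # use the NC thresholds stored in each profile
        "--domtblout", domtbl_path, # per-domain tabular output
        hmm_database,               # pressed profile HMM database
        protein_fasta,              # translated CDSs of the query plasmid
    ]


# Hypothetical file names, for illustration only.
command = build_hmmscan_command("rip_domains.hmm", "plasmid_cds.faa", "hits.domtbl")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where HMMER3 is installed.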
Detection of oriV features in IGSs
For IGS regions, the following oriV features have been extracted and scored.
OriV similar sequences: BLAST is employed to identify similar sequences to the curated oriV regions collected from literature.
ColE1 like regions: ColE1 like regions are searched for using the Infernal [22] program.
Iteron sequences: a K-mer sliding window approach combined with Shannon entropy scoring is applied to detect potential iteron sequences (Supplementary Methods).
AT-rich regions: a modified Z-curve-based algorithm is utilized to identify potential AT-rich regions [23, 24].
Conserved motifs: common conserved motifs associated with oriVs are identified, including but not limited to DnaA boxes, nick sites [25], and IncQ conserved motif [26] (Supplementary Table S3).
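To make the iteron feature more concrete, the following Python sketch combines a sliding k-mer window with a Shannon entropy filter. The actual procedure is described in the Supplementary Methods and is not reproduced here, so the function names, the window size `k`, and the thresholds `min_copies` and `min_entropy` are illustrative assumptions only. The idea is that a k-mer recurring several times without overlap is a repeat candidate, and the entropy filter discards low-complexity k-mers such as homopolymer runs.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(kmer):
    """Shannon entropy (bits) of the base composition of a k-mer."""
    counts = Counter(kmer)
    total = len(kmer)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_iterons(sequence, k=8, min_copies=3, min_entropy=1.5):
    """Return {k-mer: [start positions]} for repeated, non-trivial k-mers."""
    sequence = sequence.upper()
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Record only occurrences that do not overlap the previous one.
        if not positions[kmer] or i - positions[kmer][-1] >= k:
            positions[kmer].append(i)
    return {
        kmer: starts
        for kmer, starts in positions.items()
        if len(starts) >= min_copies and shannon_entropy(kmer) >= min_entropy
    }


# Toy IGS: three copies of an 8-bp unit separated by 2-bp spacers.
toy_igs = "TGACAGGC" + "AT" + "TGACAGGC" + "CG" + "TGACAGGC"
print(candidate_iterons(toy_igs))  # {'TGACAGGC': [0, 10, 20]}
```

A homopolymer run such as a stretch of 30 adenines also contains repeated 8-mers, but their entropy is zero and they are filtered out.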
Priority-based scoring system for potential oriVs
Based on the detection results, OriV-Finder employs a priority-based scoring system to identify potential oriVs. Each IGS is assigned a type by this scoring system, and the IGSs with the highest priority are output as potential oriVs [27].
Type 1 (highest priority)
Blast hit presence: if an oriV similar sequence is detected within an IGS, this IGS is designated as Type 1 (evidence: ‘Blast hit’).
ColE1 like region presence: if a ColE1 like region is detected within an IGS, this IGS is designated as Type 1 (evidence: ‘ColE1 like’).
RIP flanking IGSs: for each identified homologous gene of RIP, the CDS and its four flanking IGSs (two upstream and two downstream) are evaluated, and the segment with the highest score among these five is designated as Type 1 (evidence: ‘RIP’ or ‘nearby RIP’).
Type 2 (secondary priority)
No RIP homologs: in the absence of RIP homologs, for each partition protein (such as ParA or ParB), the six IGSs flanking the CDS of partition protein (three upstream and three downstream) were evaluated. The IGS with the highest comprehensive score among these six candidates is designated as Type 2.
Type 3 (lowest priority)
No RIP or partition protein: if neither RIP homolog nor partition-related protein is present, the IGS with the highest score is designated as Type 3.
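The three priority levels can be summarized in a short Python sketch. It is a simplified reading of the rules above rather than the OriV-Finder code: the per-segment scores are assumed to have been computed beforehand, the function and argument names are hypothetical, and the evidence labels for Type 2 and Type 3 are placeholders that do not appear in the text.

```python
def designate_orivs(scores, igs_ids, blast_hits=(), cole1_like=(),
                    rip_neighbourhoods=(), partition_neighbourhoods=()):
    """Assign a (type, evidence) pair to candidate segments.

    scores: dict mapping segment id (IGS or RIP CDS) to a precomputed score.
    igs_ids: ids of all IGSs of the plasmid.
    rip_neighbourhoods: (rip_cds_id, [four flanking IGS ids]) per RIP homolog.
    partition_neighbourhoods: [six flanking IGS ids] per partition protein.
    Returns only the segments with the highest priority (lowest type number).
    """
    designated = {}
    for segment in blast_hits:
        designated[segment] = (1, "Blast hit")
    for segment in cole1_like:
        designated.setdefault(segment, (1, "ColE1 like"))
    for rip_cds, flanking in rip_neighbourhoods:
        best = max([rip_cds, *flanking], key=scores.__getitem__)
        evidence = "RIP" if best == rip_cds else "nearby RIP"
        designated.setdefault(best, (1, evidence))

    if not rip_neighbourhoods:
        if partition_neighbourhoods:
            for flanking in partition_neighbourhoods:
                best = max(flanking, key=scores.__getitem__)
                # Placeholder evidence label (assumption).
                designated.setdefault(best, (2, "nearby partition protein"))
        elif igs_ids:
            best = max(igs_ids, key=scores.__getitem__)
            # Placeholder evidence label (assumption).
            designated.setdefault(best, (3, "highest score"))

    if not designated:
        return {}
    top_priority = min(kind for kind, _ in designated.values())
    return {seg: val for seg, val in designated.items() if val[0] == top_priority}


# Hypothetical scores, for illustration only.
scores = {"igs1": 2.0, "igs2": 5.0, "igs3": 1.0, "repA": 0.5}
igs_ids = ["igs1", "igs2", "igs3"]
with_rip = designate_orivs(scores, igs_ids,
                           rip_neighbourhoods=[("repA", ["igs1", "igs2"])])
without_rip = designate_orivs(scores, igs_ids)
print(with_rip)     # {'igs2': (1, 'nearby RIP')}
print(without_rip)  # {'igs2': (3, 'highest score')}
```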
Web platform implementation
OriV-Finder is deployed on a Linux-Apache-Redis-Gunicorn-Django architecture, with Python as the primary development language. Several bioinformatics tools have been incorporated into the OriV-Finder framework: MMseqs2 and HMMER3 for detecting homologous genes of RIPs, Bakta [28] for genome annotation, and Infernal for RNAII homology search. The plasmid sequence display panel was built using the JavaScript library SeqViz (https://github.com/Lattice-Automation/seqviz). Visualization modules based on PyEcharts and Plotly have been developed to display the features of oriVs, the conserved domains of RIPs, and the whole genomes of plasmids. The web interface of OriV-Finder consists of input and output pages built with HTML, CSS, and JavaScript. OriV-Finder is fully functional on the latest versions of major web browsers, including Firefox, Chrome, Safari, and Microsoft Edge.
Input
The OriV-Finder web server is applicable to a wide range of bacterial plasmids. It allows users to upload a plasmid genome sequence in FASTA format as input. Although OriV-Finder can also detect oriVs in contigs, it was initially designed to identify and analyse oriVs in complete plasmid genomes. Therefore, a complete plasmid genome sequence is recommended as input for optimal results.
Output
In addition to detailed information on oriVs, the OriV-Finder web server provides interactive visualizations to facilitate the analysis and characterization of plasmid replication origins from different perspectives. The visualization interface consists primarily of the following parts, each offering unique insights into replication mechanisms, as demonstrated using the plasmid pQEL231 as an example (Fig. 2). For more detailed information, please refer to https://tubic.org/OriV-Finder/Tutorial.
Figure 2.
An overview of OriV-Finder output results using the plasmid pQEL231 as an example. (A) The visualization of the potential oriV. (B) The visualization of potential RIP. (C) The genome sequence visualized by the GC disparity curve integrated with the K-mer cumulative score. (D) The genome sequence visualized by integrated SeqViz.
Replication origin visualization
The oriV visualization displays the location, type, and supporting evidence of oriVs through both GC profile and smoothed GC profile, with AT-rich and GC-rich regions marked in pink and green, respectively. It also includes a sequence diagram illustrating the conserved features of oriVs, providing a comprehensive overview of the replication initiation landscape (Fig. 2A).
RIP visualization
The RIP visualization presents the MMseqs2 alignment results between the RIP homolog and its best-hit RIP, alongside the conserved domain features identified through HMMER3 alignment against the Pfam-A database. Within the interactive HTML table, users can expand the full protein sequence by clicking on the sequence and use the copy function to extract the sequence for further analyses. The domain visualization section offers an intuitive interface in which hovering over a specific domain reveals comprehensive domain information (Fig. 2B).
Genome visualization
The genome is visualized in two ways. First, the genome sequence is visualized by the GC disparity curve integrated with the K-mer cumulative score [29], with the locations of RIPs and oriVs marked by red and green rectangles, respectively (Fig. 2C). Second, the genome is visualized by the integrated SeqViz, with RIPs marked in red, oriVs in green, and ncRNAs and regulatory RNAs in yellow, among other features. Through these two views, users can easily inspect the distribution characteristics of RIPs, oriVs, and the RNAs that regulate replication initiation. Additionally, the detailed annotations of potential iterons, DnaA boxes, AT-rich regions, oriV similar sequences, and conserved sites can be displayed within oriVs by SeqViz (Fig. 2D).
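For orientation, the GC disparity curve used in Z-curve-based analyses is the cumulative excess of G over C along the sequence, which equals (x_n − y_n)/2 in terms of the Z-curve components x_n and y_n. The Python sketch below computes only this curve; the K-mer cumulative score and the plotting performed by OriV-Finder are not reproduced, and the function name is hypothetical.

```python
def gc_disparity(sequence):
    """Cumulative (G - C) count at each position of the sequence."""
    curve = []
    value = 0
    for base in sequence.upper():
        if base == "G":
            value += 1
        elif base == "C":
            value -= 1
        # A, T and ambiguous bases leave the curve unchanged.
        curve.append(value)
    return curve


print(gc_disparity("GGCAT"))  # [1, 2, 1, 1, 1]
```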
The web server also offers convenient download options, including the Bakta annotation results supplemented with the identified RIPs and oriVs, and a CSV file with the detailed OriV-Finder scores for every IGS.
Discussion and conclusion
To demonstrate the advantages of OriV-Finder, we conducted the following comparisons as a rough reference, since there is no specialized tool with the same function as OriV-Finder. Firstly, we compared OriV-Finder with default parameters (excluding the BLAST function) against Ori-Finder 2022 with plasmids uploaded as secondary chromosomes, using a dataset of 327 plasmids with 380 annotated oriVs collected from the literature and curated manually. OriV-Finder identified 366 of the 380 oriVs (96.3%) as Type 1 oriVs, whereas Ori-Finder 2022 detected only 81 (21.3%). Secondly, we compared OriV-Finder with PlasmidFinder on all bacterial circular plasmids in the PLSDB 2025 database using default parameters. PlasmidFinder identified oriV-related sequences (RIPs, ColE1 like regions, and Inc group markers) in 55.6% of plasmids, whereas OriV-Finder detected oriV-related sequences (RIPs and ColE1 like regions) in 85.5% of plasmids. It should be noted that, unlike OriV-Finder, PlasmidFinder cannot provide accurate oriV locations. The comparison results together with the datasets are available at https://tubic.org/OriV-Finder/Tutorial.
For plasmid genomes without RIPs or ColE1 like regions, OriV-Finder outputs candidate oriVs of Type 2 or Type 3. Although the proteins encoded by the genes adjacent to these candidate oriVs have no obvious sequence similarity to the collected RIPs, we found that some of them have strong structural similarity to replication initiation-related proteins when their structures were predicted with AlphaFold [30] and compared against AFDB50 using Foldseek [31], which suggests that they may be potential RIPs. However, owing to the lack of experimental validation, these putative RIPs were not incorporated into our RIP dataset but are available in Supplementary Table S4 for reference.
Although OriV-Finder performs well in these comparisons, several limitations should be noted. First, its performance largely depends on the accuracy of gene prediction, as our approach relies on the spatial relationship between the oriVs and the RIP-encoding genes. This dependency may reduce its effectiveness on highly fragmented assemblies or incomplete sequences. Second, the oriVs output by OriV-Finder may be suboptimal for plasmids harbouring novel or atypical replication mechanisms that deviate significantly from the characterized systems in our database. Third, owing to limited experimental results, the scoring system was primarily optimized for circular plasmids rather than linear plasmids, particularly those with terminal proteins or hairpin structures.
In summary, we have developed OriV-Finder, a user-friendly web server specialized in analysing and visualizing oriVs in bacterial plasmids. The web server uses the distribution of RIPs in the genome and a priority-based scoring system to evaluate IGSs as potential oriVs. In the future, OriV-Finder will be continuously updated to integrate the latest scientific discoveries in plasmid DNA replication, expand its repository of catalogued RIPs, and incorporate other relevant features to enhance the reliability of RIP and oriV identification.
Acknowledgements
The authors thank Dr Mei-Jing Dong and Yu-Hao Zeng for their invaluable assistance.
Author contributions: Yujie Li (Conceptualization [equal], Formal analysis [equal], Methodology [equal], Validation [equal], Visualization [equal], Writing—original draft [equal]) and Feng Gao (Conceptualization [lead], Formal analysis [equal], Writing—review & editing [equal])
Contributor Information
Yujie Li, Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
Feng Gao, Department of Physics, School of Science, Tianjin University, Tianjin 300072, China; State Key Laboratory of Synthetic Biology, Tianjin University, Tianjin 300072, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
This work was supported by the National Natural Science Foundation of China [grant numbers 32270692, 31571358]. Funding to pay the Open Access publication charges for this article was provided by National Natural Science Foundation of China.
Data availability
All data are incorporated into the article and its online supplementary material. OriV-Finder, including its Docker image, is freely available at https://tubic.org/OriV-Finder/.
References
1. Gao NL, Chen J, Wang T et al. Prokaryotic genome expansion is facilitated by phages and plasmids but impaired by CRISPR. Front Microbiol. 2019; 10:2254. 10.3389/fmicb.2019.02254.
2. Yue H, Ling C, Yang T et al. A seawater-based open and continuous process for polyhydroxyalkanoates production by recombinant Halomonas campaniensis LS21 grown in mixed substrates. Biotechnol Biofuels. 2014; 7:108. 10.1186/1754-6834-7-108.
3. Wang G, Wang Q, Qi Q et al. Dynamic plasmid copy number control for synthetic biology. Trends Biotechnol. 2024; 42:147–50. 10.1016/j.tibtech.2023.08.004.
4. Lilly J, Camps M. Mechanisms of theta plasmid replication. Microbiol Spectr. 2015; 3:PLAS-0029-2014. 10.1128/microbiolspec.PLAS-0029-2014.
5. Wang P, Zhu Y, Shang H et al. A minireplicon of plasmid pBMB26 represents a new typical replicon in the megaplasmids of Bacillus cereus group. J Basic Microbiol. 2018; 58:263–72. 10.1002/jobm.201700525.
6. Ruiz-Masó JA, Machó NC, Bordanaba-Ruiseco L et al. Plasmid rolling-circle replication. Microbiol Spectr. 2015; 3:PLAS-0035-2014. 10.1128/microbiolspec.PLAS-0035-2014.
7. Leonard AC, Méchali M. DNA replication origins. Cold Spring Harb Perspect Biol. 2013; 5:a010116. 10.1101/cshperspect.a010116.
8. Jha J, Ramachandran R, Chattoraj D. Opening the strands of replication origins—still an open question. Front Mol Biosci. 2016; 3:62. 10.3389/fmolb.2016.00062.
9. Gao F, Zhang C-T. Ori-Finder: a web-based system for finding oriCs in unannotated bacterial genomes. BMC Bioinformatics. 2008; 9:79. 10.1186/1471-2105-9-79.
10. Luo H, Zhang C-T, Gao F. Ori-Finder 2, an integrated tool to predict replication origins in the archaeal genomes. Front Microbiol. 2014; 5:482. 10.3389/fmicb.2014.00482.
11. Dong M-J, Luo H, Gao F. Ori-Finder 2022: a comprehensive web server for prediction and analysis of bacterial replication origins. Genomics Proteomics Bioinformatics. 2022; 20:1207–13.
12. Pinto UM, Pappas KM, Winans SC. The ABCs of plasmid replication and segregation. Nat Rev Microbiol. 2012; 10:755–65. 10.1038/nrmicro2882.
13. Carattoli A, Zankari E, García-Fernández A et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother. 2014; 58:3895–903. 10.1128/AAC.02412-14.
14. Molano L-AG, Hirsch P, Hannig M et al. The PLSDB 2025 update: enhanced annotations and improved functionality for comprehensive plasmid research. Nucleic Acids Res. 2025; 53:D189–96. 10.1093/nar/gkae1095.
15. Camargo AP, Call L, Roux S et al. IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata. Nucleic Acids Res. 2024; 52:D164–73. 10.1093/nar/gkad964.
16. Zhu Q, Chen Q, Gao S et al. PIPdb: a comprehensive plasmid sequence resource for tracking the horizontal transfer of pathogenic factors and antimicrobial resistance genes. Nucleic Acids Res. 2025; 53:D169–78. 10.1093/nar/gkae952.
17. Dong M-J, Luo H, Gao F. DoriC 12.0: an updated database of replication origins in both complete and draft prokaryotic genomes. Nucleic Acids Res. 2023; 51:D117–20. 10.1093/nar/gkac964.
18. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017; 35:1026–8. 10.1038/nbt.3988.
19. Potter SC, Luciani A, Eddy SR et al. HMMER web server: 2018 update. Nucleic Acids Res. 2018; 46:W200–4. 10.1093/nar/gky448.
20. Paysan-Lafosse T, Andreeva A, Blum M et al. The Pfam protein families database: embracing AI/ML. Nucleic Acids Res. 2025; 53:D523–34. 10.1093/nar/gkae997.
21. Goldfarb T, Kodali VK, Pujar S et al. NCBI RefSeq: reference sequence standards through 25 years of curation and annotation. Nucleic Acids Res. 2025; 53:D243–57. 10.1093/nar/gkae1038.
22. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013; 29:2933–5. 10.1093/bioinformatics/btt509.
23. Lai F-L, Gao F. GC-Profile 2.0: an extended web server for the prediction and visualization of CpG islands. Bioinformatics. 2022; 38:1738–40. 10.1093/bioinformatics/btab864.
24. Rajewska M, Wegrzyn K, Konieczny I. AT-rich region and repeated sequences—the essential elements of replication origins of bacterial replicons. FEMS Microbiol Rev. 2012; 36:408–34. 10.1111/j.1574-6976.2011.00300.x.
25. Khan SA. Plasmid rolling-circle replication: highlights of two decades of research. Plasmid. 2005; 53:126–36. 10.1016/j.plasmid.2004.12.008.
26. Rawlings DE, Tietze E. Comparative biology of IncQ and IncQ-like plasmids. Microbiol Mol Biol Rev. 2001; 65:481–96. 10.1128/MMBR.65.4.481-496.2001.
27. Gao F, Zhang C-T. DoriC: a database of oriC regions in bacterial genomes. Bioinformatics. 2007; 23:1866–7. 10.1093/bioinformatics/btm255.
28. Schwengers O, Jelonek L, Dieckmann MA et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom. 2021; 7:000685. 10.1099/mgen.0.000685.
29. Mori H, Evans-Yamamoto D, Ishiguro S et al. Fast and global detection of periodic sequence repeats in large genomic resources. Nucleic Acids Res. 2019; 47:e8. 10.1093/nar/gky890.
30. Abramson J, Adler J, Dunger J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024; 630:493–500. 10.1038/s41586-024-07487-w.
31. van Kempen M, Kim SS, Tumescheit C et al. Fast and accurate protein structure search with Foldseek. Nat Biotechnol. 2024; 42:243–6. 10.1038/s41587-023-01773-0.