Scientific Data. 2025 Jul 1;12:1075. doi: 10.1038/s41597-025-05434-6

Dataset for Single Character Detection in Dongba Manuscripts

Yuqi Ma 1, Yongbo Li 1, Guang Long 1, Ruiyuan Li 1, Fengyuan Hu 1, Chenjun Xu 1, Yueran Wang 1, Maoling Peng 2, Xiaoliang Li 3, Shanxiong Chen 1,4
PMCID: PMC12217708  PMID: 40593916

Abstract

The Dongba script, a unique pictographic writing system created by the ancestors of the Naxi people of China, holds great significance for interpreting the ancient Naxi language, history, and culture. Accurate detection of the Dongba script is crucial for in-depth research into Dongba manuscripts. Through automated Dongba script detection technology, experts can efficiently extract precise text data from large numbers of manuscripts, providing essential support for the subsequent translation of the Dongba script and the construction of a corpus. In response to this need, we have developed the ‘Dongba1800’ dataset, designed explicitly for Dongba script detection. This dataset comprises 1,800 annotated images of Dongba manuscripts, with resolutions ranging from 1200 × 416 to 1201 × 530 pixels, totaling 111,702 Dongba characters. The Dongba character images are characterized by (1) the complexity of Dongba characters, with varying sizes and nonlinear arrangements; (2) significant differences in writing style among Dongba scribes; and (3) severe aging and noise caused by long-term use and preservation. The Dongba1800 dataset provides a powerful tool for archaeologists, greatly simplifying and optimizing the organization and study of Dongba manuscripts. Additionally, we have conducted technical validation with various text detection models on the Dongba1800 dataset to ensure its effectiveness and reliability.

Subject terms: History, Computer science

Background & Summary

The current population of the Naxi ethnic group is approximately 300,000, with the majority distributed across several counties in Yunnan, Sichuan, and Tibet, including Lijiang, Yongsheng, Ninglang, Zhongdian, Weixi, Jianchuan, and Heqing in Yunnan; Muli and Yanyuan in Sichuan; and Yajiang and Mangkang in Tibet [1,2]. Lijiang serves as the primary settlement and the political, economic, and cultural center of the Naxi people. The Naxi ethnic group originates from the ancient Qiang people and shares profound historical connections with the Tibetan, Yi, Bai, and other neighboring ethnic groups. The Naxi ancestors created the brilliant Dongba culture and the Dongba script. The Naxi Dongba script is an ancient and unique writing system, still in use among practitioners of the Naxi Dongba religion today, and is often referred to as a living fossil of primitive writing. Approximately 30,000 volumes of Dongba scriptures, written in the Dongba script, are held in collections in China and abroad, comprising over 1,000 different scriptures [3]. Additionally, there is a significant amount of practical literature covering a wide range of topics, including language, writing, history, geography, religion, philosophy, ethnicity, folklore, literature, art, astronomy, calendrics, agriculture, livestock, and medicine in ancient Naxi society [4]. This body of work serves as an encyclopedia of ancient Naxi culture. In August 2003, the ancient Naxi Dongba manuscripts were inscribed in the UNESCO Memory of the World Register [5].

The Dongba script, primarily used for recording religious classics, employs a highly distinctive combination of materials and tools. Specifically, Dongba priests typically use bamboo pens dipped in specially formulated ink—a mixture of pine resin, glue, gall, and alcohol—to inscribe their texts on durable, thick cotton paper made from tree bark [6] (Fig. 1). The layout of Dongba scriptures is distinctive, with each page typically divided into three or four horizontal sections. Text is written from left to right, starting at the top, and a vertical line is drawn after each sentence before the next begins. Once a row is completed, the following rows are written in the same manner before the page is turned to continue on the back. There are over 25,000 surviving Dongba manuscripts, with approximately 10,000 housed in libraries in the United States, United Kingdom, France, and Germany. Another 15,000 are preserved in libraries and museums in Beijing, Kunming, Lijiang, and Taiwan, with additional manuscripts held privately.

Fig. 1 (a) Dongba paper; (b) writing Dongba script on Dongba paper using a bamboo stylus; (c) ancient Dongba scriptures.

However, the preservation and study of ancient Dongba manuscripts face numerous challenges. Because Dongba paper is fragile and prone to folding, damage, and contamination, the ink marks tend to fade over time. These issues complicate the task of organizing Dongba manuscripts. Therefore, digitizing the Dongba script is paramount for protecting, inheriting, and disseminating this cultural heritage. Following digitization, further script analysis is conducted using computer vision and artificial intelligence techniques. Automated Dongba script detection technology enables experts to efficiently extract precise text data from large numbers of manuscripts, providing high-quality support for subsequent translation, corpus construction, and cultural research. Nevertheless, the Dongba script varies greatly in material form and appearance across different writing environments and historical periods, presenting unique challenges:

  1. Each character’s shape and structure are unique, complicating the detection process.
  2. The characters of the Dongba script are diverse in form and size, and the images contain non-textual elements such as borders and decorative patterns, further complicating text detection.
  3. Dongba ancient manuscripts have been handed down as copies written by numerous Dongba scribes, each exhibiting significant differences in writing style, which makes the standardization and normalization of the texts more difficult.
  4. Due to long-term and repeated use, many Dongba script images have been affected by stains and watermarks, with some characters becoming blurred, thus increasing the complexity of text processing.

The ‘Dongba1800’ dataset introduced in this paper aims to enhance the quality and efficiency of Dongba script detection research, easing the workload of paleographers in their text detection efforts.

Methods

This section outlines the process used to collect and annotate the Dongba script detection dataset. The annotation of Dongba script was manually performed, a process which is described in detail below. Additionally, we developed specific annotation and review methods tailored for the Dongba script detection dataset. The workflow for Dongba script detection includes image downloading, image selection, text annotation, dataset partitioning, and the use of text detection models for verification, followed by visualization (Fig. 2).

Fig. 2 The Workflow of Dongba Script Detection.

Researchers

A total of 19 individuals participated in the data annotation and review process, including 16 professionally trained master’s students as annotators, two experienced reviewers, and one supervisor responsible for the final approval. These annotators were from the Institute of Chinese Language and Literature and the School of Computer Information Science at Southwest University. We also invited experts in Dongba script for specialized training sessions to ensure high standards and accuracy in the annotation process. Together, these annotators completed the annotation of 1,800 images, covering 111,702 Dongba characters. The review team engaged in extensive research and discussions on Dongba script at the beginning of the project to ensure a thorough understanding. To enhance the consistency and accuracy of annotations, the researchers developed a set of stringent guidelines for dataset annotation and review. These guidelines included detailed standards for annotation, particularly for defining complex characters in the Dongba script, such as determining which characters could be divided into multiple components and how to handle such divisions.

Data collection

The images of Dongba manuscripts used in this study were sourced from the Harvard-Yenching Library [7]. They include collections from Joseph Rock (510 items) and Quentin Roosevelt (88 items), acquired by the Harvard-Yenching Institute in 1945. The manuscripts exhibit distinctive features such as uneven text backgrounds, variability in font size, a range of handwriting styles, inconsistent lighting conditions, and text distortions. Specifically, the image quality is generally poor due to the aging of the manuscripts, which has resulted in yellowed paper and degraded texture; various factors during the photography process have further exacerbated image blurriness. Due to the complex structure of Dongba characters, the upper and lower parts of text lines often misalign, and minimal line spacing causes characters to stick together. Text contrast also varies significantly across the manuscripts, with uneven ink density further complicating text processing. Although all manuscripts derive from the same historical period, they were created by different writers, each with a unique handwriting style. This diversity, coupled with inconsistencies in font size and variations between handwritten versions, makes the standardization of the ancient texts more challenging. Many handwritten Dongba manuscripts feature mixed graphic and textual layouts, with varied writing orientations and spatial distributions, adding to the complexity of automatic detection and analysis.

Data selection

To ensure the accuracy and representativeness of our study, we specifically selected manuscripts written in various font sizes of the Dongba script as our primary research subjects. During the selection process, we intentionally excluded covers and other non-text elements that did not contain Dongba script, focusing instead on texts rich in information. Through a meticulous selection process, we ultimately chose 1,800 images of Dongba manuscripts that are representative and of high quality, forming the foundation of our dataset. These images span a resolution range from 1200 × 416 to 1201 × 530 pixels and include a total of 111,702 handwritten Dongba characters, showcasing diversity in font size, writing style, and ink density. This diversity, a characteristic feature of handwritten Dongba manuscripts, provides a rich data resource for studying ancient texts but also presents certain challenges. To effectively manage these image data, we employed advanced image processing and management techniques to categorize and annotate all images, ensuring each one could be correctly identified and utilized.
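To make the file-management step concrete, the following minimal sketch verifies that each curated image falls within the stated resolution range and has a matching annotation file. The directory layout and the Pillow dependency are our assumptions; only the naming scheme (described under Data Records) and the resolution range come from this paper.

```python
# Minimal integrity check for the curated images. The directory layout is
# a hypothetical assumption; the naming scheme and resolution range follow
# the descriptions in this paper.
from pathlib import Path
from PIL import Image

IMAGE_DIR = Path("Dongba1800/images")        # assumed layout
LABEL_DIR = Path("Dongba1800/annotations")   # assumed layout

for img_path in sorted(IMAGE_DIR.glob("image_*.jpg")):
    with Image.open(img_path) as img:
        w, h = img.size
    # Stated resolution range: 1200 x 416 to 1201 x 530 pixels.
    if not (1200 <= w <= 1201 and 416 <= h <= 530):
        print(f"{img_path.name}: unexpected size {w}x{h}")
    gt_path = LABEL_DIR / f"gt_{img_path.stem}.txt"
    if not gt_path.exists():
        print(f"{img_path.name}: missing annotation {gt_path.name}")
```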

Data annotation

The annotation process for the Dongba manuscript dataset began on July 12, 2024, and concluded on August 13, 2024. During this period, team members carefully selected images suitable for academic research and processed them following academic standards. To accurately annotate text areas within the Dongba manuscript dataset, we utilized PPOCRLabel, a tool that supports multi-point annotation to flexibly handle various complex text forms (Fig. 3). Common text annotation methods, such as those used in the ICDAR2013 [8] and ICDAR2015 [9] datasets, involve rectangular four-point annotations. These datasets, published by the International Conference on Document Analysis and Recognition (ICDAR), are primarily used to advance text detection and recognition technologies in natural scenes. However, these rectangular methods can lead to overlapping boxes in the case of Dongba manuscripts, which can adversely affect model training. In contrast, the CTW1500 [10] dataset focuses on curved text detection, using polygonal annotations, typically with 14 points, which better cover curved or distorted texts. Given the complex shapes of Dongba characters, we adopted an unrestricted multi-point annotation approach. This method allows for more precise adjustments according to the specific character shapes in Dongba manuscripts, significantly enhancing data accuracy and the effectiveness of model training. For texts that are neatly shaped and regular, we employed four-point rectangular annotations; for irregular or complex texts, we used multi-point annotations. Examples of these methods are demonstrated in Fig. 4, where green solid lines clearly delineate the different annotation styles. All annotation data were saved in text format (TXT) with the label “text” to facilitate subsequent model training and evaluation. Additionally, the dataset was divided into 80% training and 20% testing sets, with the training set containing 1,440 images and 89,133 handwritten Dongba characters, and the testing set comprising 360 images with 22,569 characters.
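The paper does not publish the exact split assignment, so the sketch below merely illustrates an 80/20 partition of the 1,800 images into 1,440 training and 360 testing images. The random seed and file layout are assumptions, not the authors' published split.

```python
# Illustrative 80/20 train/test split over the 1,800 images
# (1,440 train / 360 test). Seed and layout are assumptions.
import random
from pathlib import Path

images = sorted(Path("Dongba1800/images").glob("image_*.jpg"))
random.Random(42).shuffle(images)      # fixed seed for reproducibility

n_train = int(0.8 * len(images))       # 1440 of 1800
train_set, test_set = images[:n_train], images[n_train:]
print(len(train_set), len(test_set))   # expected: 1440 360
```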

Fig. 3 PPOCRLabel Annotation Tool.

Fig. 4 Examples of Different Annotation Methods. (a) Rectangle; (b) Polygons; (c,d) Rectangle and Polygons. Green solid lines represent annotation details.

Data Records

The Dongba1800 dataset [11] is freely available to researchers. It includes 1,800 curated JPEG image files and 1,800 text annotation files in TXT format. All files are named in a consistent format to ensure easy indexing and association between images and their corresponding annotations: JPEG images are named ‘image_<number>.jpg’ (e.g., ‘image_1.jpg’), and TXT files are named ‘gt_image_<number>.txt’ (e.g., ‘gt_image_1.txt’). The TXT files contain annotations for a verified total of 111,702 Dongba characters, ensuring the accuracy and reliability of the data. Each character’s spatial position is identified by a series of coordinate pairs that define the polygonal boundary of its text box. For example, the coordinate sequence “161, 59, 202, 57, 256, 85, 239, 154, 182, 147, 163, 107” represents the vertices of a polygon, with each pair such as “161, 59” indicating the x and y coordinates of one vertex. Coordinates are typically listed in clockwise order to outline the full contour of the polygon. To differentiate between records, the annotation files use “###” as a delimiter to signify the end of a record. Additionally, to enhance the usability and applicability of the dataset, all data are stored and transmitted in standard formats, enabling researchers to readily use them for training and testing machine learning models. For further details, refer to Table 1, “Metadata about the Dongba1800 dataset.” By providing these detailed data records and formatting specifications, the Dongba1800 dataset not only supports the preservation and study of the Dongba script and related cultural heritage but also offers valuable resources for technological development in related fields.
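Based on the format described above (comma-separated integer coordinates, a "text" label, and "###" marking the end of a record), a reader for one annotation file might look like the following sketch. The exact per-line layout is our inference from this description, not a published specification.

```python
# Sketch: parse one gt_image_<number>.txt file into polygons. The per-line
# layout (coords, then label/delimiter) is an assumption inferred from the
# format description in the text.
def parse_gt_file(path):
    polygons = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            coords = []
            for tok in line.split(","):
                tok = tok.strip()
                if tok.lstrip("-").isdigit():
                    coords.append(int(tok))
                else:
                    break  # reached the "text" label or "###" delimiter
            if len(coords) >= 6 and len(coords) % 2 == 0:
                # group the flat list into (x, y) vertex pairs
                polygons.append(list(zip(coords[0::2], coords[1::2])))
    return polygons

polys = parse_gt_file("Dongba1800/annotations/gt_image_1.txt")
print(len(polys), "text polygons; first:", polys[0] if polys else None)
```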

Table 1.

Metadata about the Dongba1800 dataset.

Metadata Item | Description
Dataset Name | Dongba1800: Single-Character Detection in Dongba Manuscripts Dataset
Dataset Description | The dataset contains 1,800 images comprising 111,702 characters, written by various Dongba scribes.
Data Resource | Harvard-Yenching Library
Collection Time | From July 12, 2024, to August 13, 2024
Data Format | The data is stored in image format with resolutions ranging from 1200 × 416 to 1201 × 530 pixels.
Metadata Recording | Each image file is accompanied by a TXT metadata file containing the characters’ position coordinates.
Cultural Sensitivity | The dataset carries significant cultural and linguistic value. When using the data, respect for the uniqueness of Naxi culture and language is required; avoid misunderstanding and misrepresentation of Naxi culture and language.
Data Sharing and Collaboration | Researchers are encouraged to collaborate with the Naxi community and data providers to ensure the accuracy and social impact of research results.

Technical Validation

In this section, we conducted experiments with various mainstream text detection models on the Dongba1800 dataset to demonstrate its potential applications across different technical frameworks. Our objective was to identify the models best suited to handling complex Dongba characters and to assess their performance in practical applications. We employed the deep learning framework PyTorch 1.11.0, which is widely recognized for its efficiency in both academic and industrial research. To comprehensively evaluate the selected models, we used multiple metrics, including precision, recall, and F1 score. These metrics are crucial for measuring a model’s effectiveness in text detection tasks and accurately reflect its capability to detect Dongba characters. By comparing the performance of different models on these metrics, we identified which models are more effective at processing images of Dongba manuscripts. Existing studies have already used this Dongba dataset [12].

Evaluation metrics

In this study, we employed three widely used metrics to comprehensively evaluate the performance of text detection models on Dongba manuscript images: Precision (P), Recall (R), and the F1 Score (F1). These metrics are crucial for assessing the accuracy of models in identifying Dongba characters, their boundaries, and labels, with correct detections of these elements considered as successful predictions. Precision (P) measures the proportion of instances correctly identified as Dongba among all predictions labeled as such, reflecting the model’s accuracy in predicting Dongba script. Recall (R) measures the proportion of actual Dongba instances that were correctly predicted, indicating the model’s ability to cover all real Dongba instances. The F1 Score is the harmonic mean of Precision and Recall, providing a balanced measure of a model’s performance by considering both accuracy and coverage. The calculation formulas are as follows:

precision = TP / (TP + FP)    (1)

recall = TP / (TP + FN)    (2)

F-measure = (2 × recall × precision) / (recall + precision)    (3)

where TP (True Positive) refers to the number of instances correctly predicted as Dongba, FP (False Positive) refers to the number of instances incorrectly predicted as Dongba, and FN (False Negative) refers to the number of Dongba instances that were missed. Through these metrics, we can thoroughly understand the model’s accuracy in recognizing Dongba manuscripts in practical scenarios, guiding the optimization of the model to enhance its effectiveness in real-world applications.
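As a concrete illustration of these formulas, the short sketch below computes the three metrics from raw detection counts. The example numbers and the IoU-based matching rule mentioned in the comment are illustrative assumptions, not figures from the paper.

```python
# Compute precision, recall, and F1 from detection counts, following the
# formulas above. A detection counts as TP when it matches a ground-truth
# box (e.g., IoU >= 0.5); that matching rule is an assumption here.
def detection_metrics(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 89 correct detections, 11 false alarms, 9 missed characters.
p, r, f1 = detection_metrics(tp=89, fp=11, fn=9)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```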

Hyper-parameters setting

During the experiments, we used an NVIDIA GeForce RTX 3080 Ti graphics card (12 GB of VRAM) on a Linux system with CUDA 11.3 and Python 3.8. Training employed the Adam optimizer with a momentum parameter of 0.9. The initial learning rate was set to 0.001 with a poly learning rate decay strategy to optimize training outcomes, and the weight decay coefficient was fixed at 0.0001. The batch size was set to 8, and each model was trained for 40 epochs. This setup ensures that the experiments are conducted under consistent and controlled conditions, facilitating the evaluation of model performance across different settings and configurations.
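For readers reproducing this setup, the following PyTorch sketch shows one way to wire up the stated hyperparameters. Mapping "momentum 0.9" to Adam's beta1 and using the common (1 − step/max_steps)^0.9 poly schedule are our assumptions, since the paper does not specify either detail.

```python
# One possible realization of the stated training configuration in
# PyTorch 1.11. "Momentum 0.9" is mapped to Adam's beta1, and the poly
# schedule uses the common (1 - step/max_steps)**0.9 form; both mappings
# are assumptions.
import torch

model = torch.nn.Linear(10, 2)  # placeholder for a text detection model

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,                 # initial learning rate 0.001
    betas=(0.9, 0.999),      # beta1 = 0.9 ("momentum" 0.9)
    weight_decay=1e-4,       # weight decay coefficient 0.0001
)

epochs, steps_per_epoch = 40, 180   # 1,440 training images / batch size 8
max_steps = epochs * steps_per_epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: (1 - step / max_steps) ** 0.9,  # poly decay
)
# In the training loop, scheduler.step() would be called once per
# iteration so the learning rate decays to zero by the end of training.
```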

Comparison models

To thoroughly assess the quality of the developed dataset and its broad applicability, this study employed a range of representative baseline models for evaluation, categorized into traditional models and deep learning-based models. Traditional models such as MSER [13], Sobel [14], Canny [15], and NMS [16] (non-maximum suppression) rely on handcrafted features extracted from text images, such as texture and edges, for learning and prediction. While these models may not be as flexible as deep learning models in handling highly nonlinear data, they still perform well in simpler or more structured text detection tasks. Deep learning-based models, including FCENet [17], PAN [18], PSENet [19], Mask RCNN [20], DBNet [21] and its advanced version DBNet++ [22], and TextBPN [23] and its enhanced version TextBPN++ [24], employ sophisticated techniques to precisely segment text areas. These models are adept at handling complex text arrangements, such as curved, slanted, or densely packed texts, and demonstrate superior performance on datasets with complex backgrounds and diverse text formats. This comprehensive evaluation, with experimental results reported in Table 2, helps ensure the dataset meets the diverse requirements of real-world applications.

Table 2.

Comparison of Traditional and Deep Learning-Based Models on the Dongba1800 Dataset.

Models | Precision (%) | Recall (%) | F-measure (%) | FPS
Traditional models
MSER | 2.84 | 79.91 | 5.47 | 6.74
Sobel | 20.01 | 10.99 | 13.69 | 83.72
Canny | 31.46 | 68.03 | 42.28 | 65.98
MSER + NMS | 38.76 | 56.75 | 44.60 | 11.01
Deep learning-based models
FCENet | 76.13 | 65.60 | 70.48 | 0.72
DBNet | 85.03 | 89.13 | 87.03 | 1.67
PAN | 88.90 | 86.08 | 87.47 | 1.27
PSENet | 89.70 | 85.97 | 87.79 | 1.25
TextBPN++ | 86.52 | 89.89 | 87.93 | 6.74
TextBPN | 89.40 | 88.50 | 88.66 | 6.33
DBNet++ | 89.12 | 89.60 | 89.36 | 1.36
Mask RCNN | 88.23 | 90.81 | 89.50 | 0.95

We selected several models for a comprehensive evaluation of the Dongba1800 dataset to assess its practicality and effectiveness. The outputs of each model were carefully analyzed and compared to ensure the dataset meets broad application requirements. MSER (maximally stable extremal regions) identifies areas that remain stable across multiple threshold levels, effectively highlighting consistent regions within the text. Sobel edge detection calculates the brightness gradient to delineate boundaries in Dongba manuscripts, while Canny edge detection employs a dual-threshold strategy to reduce false positives and preserve true text edges. In the MSER + NMS combination, MSER first identifies potential text areas, and NMS is then applied to retain only the strongest candidate regions, reducing noise. FCENet uses Fourier transforms to capture the periodic and shape characteristics of Dongba text areas, allowing precise text segmentation. DBNet employs a differentiable binarization method, dynamically adjusting prediction thresholds to finely segment text areas, while DBNet++ introduces more refined feature extraction and optimization algorithms. PAN enhances text feature perception using a pyramid structure and attention mechanisms, while PSENet employs a progressive scale expansion strategy to grow text areas from fine to large scales. TextBPN establishes a text boundary proposal network, optimizing text detection through enhanced feature extraction and refinement, and TextBPN++ further enhances the model’s detail-processing capabilities. Finally, Mask RCNN combines a region proposal network with a fully convolutional network to perform precise segmentation of Dongba text instances.
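As an example of how the traditional baselines can be reproduced, the sketch below runs OpenCV's MSER detector and prunes overlapping candidates with NMS. The area-based scoring heuristic and the NMS threshold are illustrative assumptions, since the paper does not list these details.

```python
# Sketch of an MSER + NMS baseline with OpenCV. Scoring heuristic and
# NMS threshold are illustrative assumptions.
import cv2
import numpy as np

img = cv2.imread("Dongba1800/images/image_1.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
_, boxes = mser.detectRegions(gray)     # candidate boxes as (x, y, w, h)

# Rank candidates by area and suppress heavily overlapping boxes.
scores = [float(w * h) for (x, y, w, h) in boxes]
keep = cv2.dnn.NMSBoxes(
    bboxes=[list(map(int, b)) for b in boxes],
    scores=scores,
    score_threshold=0.0,
    nms_threshold=0.4,
)
for i in np.array(keep).flatten():
    x, y, w, h = boxes[i]
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("mser_nms_vis.jpg", img)
```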

Visualization

In this section, we display and analyze the performance of various text detection models on Dongba manuscript images, as shown in Fig. 5. We experimented with several models, including MSER, Sobel, Canny, NMS, FCENet, DBNet, PAN, PSENet, TextBPN, TextBPN++, DBNet++, and Mask RCNN, comparing their detection results with the actual labels; in this comparison, each model demonstrated its strengths and limitations in detecting Dongba script. MSER exhibited significant overlap among detection boxes; the Sobel algorithm, as an edge detection tool, could only identify regular text lines, not individual Dongba characters; and the Canny algorithm also produced overlaps and misidentified background as text, performing poorly on irregularly shaped Dongba script. NMS tended to split individual Dongba characters into multiple parts and occasionally detected background elements. In contrast, FCENet, while detecting some background and occasionally missing Dongba characters, performed better overall than the traditional methods. DBNet detected Dongba characters accurately, although it also had issues segmenting individual characters. PAN was reasonably effective but could only partially detect text in irregular or densely packed Dongba script. PSENet exhibited some overlap and was only partially successful in detecting irregular or large Dongba characters. TextBPN and TextBPN++ demonstrated high adaptability to curved and irregular Dongba script, with TextBPN++ showing superior detail handling. DBNet++ performed well on regular Dongba script without detecting background, while Mask RCNN, despite its high adaptability to curved and irregular script, had limitations in capturing details in dense and complex Dongba texts. These findings provide valuable references for selecting the appropriate model for specific Dongba text detection tasks, enabling researchers to make more suitable choices based on practical needs.
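To reproduce an overlay in the style of Fig. 5, an annotated polygon can be drawn directly onto its page image with OpenCV. The vertex list below is the example record quoted under Data Records; the image path is an assumption following the dataset's naming scheme.

```python
# Draw one annotated polygon onto its manuscript image, in the overlay
# style of Fig. 5. The coordinates are the example record quoted in the
# Data Records section; the image path is assumed.
import cv2
import numpy as np

img = cv2.imread("Dongba1800/images/image_1.jpg")

coords = [161, 59, 202, 57, 256, 85, 239, 154, 182, 147, 163, 107]
pts = np.array(coords, dtype=np.int32).reshape(-1, 1, 2)  # (x, y) vertices

cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
cv2.imwrite("annotation_vis.jpg", img)
```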

Fig. 5 Visualization of Dongba Script Detection by Different Models.

Usage Notes

Our open data is licensed under CC BY 4.0 (the Creative Commons Attribution 4.0 International License), and any use of or reference to this dataset must cite this article. This license permits users to distribute, remix, adapt, and build upon the data for any purpose, provided appropriate credit is given. Researchers utilizing this dataset must link to the license and indicate if modifications were made to the original data. We encourage further dissemination of the Dongba1800 dataset and invite more authors to publish their enhanced code and models, fostering advances in ancient text detection. This will contribute significantly to research developments in the field of cultural heritage preservation.

Acknowledgements

We acknowledge the contributions of Yilong Liu, Xuan Li, and Jiuzhou Dai from the College of Artificial Intelligence, Chongqing University of Technology, as well as Jiangyuan, Haowei Duan, Xiaolong Wang, Zonglin Wu, and Cheng Huang from the College of Computer Information Science and College of Software, Southwest University for their assistance in labeling the dataset. This work was supported in part by the major research project of the Ministry of Education of the People’s Republic of China on the humanities and social sciences, titled “The Collation, Arrangement, and Research of the Inscriptions on Oracle Bones of Shang Dynasty Based on Database Technology” under Grant No. 22JZD036, and by the Natural Science Foundation of Chongqing under Grant No. CSTC2020jcyj-msxmX0876.

Author contributions

S.C. conceptualized, designed, and coordinated the study. Y.M. drafted the manuscript. Y.M. and Y.L. collected the Dongba script data and supervised the text detection experiments. G.L., R.L., F.H., C.X., Y.W., and M.P. were responsible for data labeling. S.C. and X.L. supervised the labeling process.

Code availability

The Dongba text detection models are freely available online at GitHub (https://github.com/MTVLab/Dongba).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Guoyu, F. & Chaomin, L. The Original Meaning of “Gu” as “Ku” — A Comparative Study of Oracle Bone Script, Bronzeware Script, Seal Script, and Naxi Pictographic Script from Ancient China: A Case Study. Journal of Beijing Normal University (Social Sciences) 7(5), CNKI:SUN:BJSF.0.1982-05-002 (1982).
2. Zhuyi, J. An Overview of the Naxi Language. Minority Languages of China 15(3), CNKI:SUN:MZYW.0.1980-03-012 (1980).
3. Yongsheng, C. Approaching the ‘Living Pictographic Characters’. Chinese Publishing Journal 23 (2024).
4. Pinzheng, H. On Modern Dongba Calligraphy and Painting. Ethnic Art Studies 6(4), 10.14003/j.cnki.mzysyj.1991.04.015 (1991).
5. Anonymous. Dongba ancient scriptures: the world’s memory, humanity’s heritage. China Nationalities Newspaper, 2003-09-23 (008).
6. Yunnan Provincial Local Chronicles Compilation Committee Office. The Encyclopedia of Ancient Naxi Society: Dongba Ancient Manuscripts, a Memory of the World Heritage. 2023-11-21. https://baijiahao.baidu.com/s?id=1783159936394009066&wfr=spider&for=pc (Accessed 2025-02-28).
7. Harvard University. Dongba Text Digital Edition. https://hollisarchives.lib.harvard.edu/repositories/25/resources/4415 (Accessed 2024-07-01).
8. Karatzas, D. et al. ICDAR 2013 Robust Reading Competition. In Proc. 12th Int. Conf. on Document Analysis and Recognition (ICDAR) (2013).
9. Karatzas, D. et al. ICDAR 2015 Competition on Robust Reading. In Proc. 13th Int. Conf. on Document Analysis and Recognition (ICDAR) (2015).
10. Yuliang, L. et al. Detecting Curve Text in the Wild: New Dataset and New Solution. arXiv e-prints, https://arxiv.org/abs/1712.02170 (2017).
11. Ma, Y., Li, Y. & Chen, S. Dongba1800. V1. Science Data Bank. 10.57760/sciencedb.13064 (2024).
12. Ma, Y. et al. STEF: a Swin Transformer-Based Enhanced Feature Pyramid Fusion Model for Dongba character detection. Heritage Science 12(1), 206, 10.1186/s40494-024-01321-2 (2024).
13. Matas, J., Chum, O., Urban, M. & Pajdla, T. Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004).
14. Kanopoulos, N., Vasanthavada, N. & Baker, R. L. Design of an image edge detection filter using the Sobel operator. IEEE J. Solid-State Circuits 23, 358–367, 10.1109/4.996 (1988).
15. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986).
16. Neubeck, A. & Van Gool, L. Efficient non-maximum suppression. In Proc. 18th Int. Conf. on Pattern Recognition (ICPR) 850–855 (2006).
17. Zhu, Y. et al. Fourier contour embedding for arbitrary-shaped text detection. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recogn. 3123–3131 (2021).
18. Wang, W. et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In Proc. IEEE/CVF Int. Conf. Comput. Vis. 8440–8449 (2019).
19. Wang, W. et al. Shape robust text detection with progressive scale expansion network. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recogn. 9336–9345 (2019).
20. He, K. et al. Mask R-CNN. In Proc. IEEE Int. Conf. Comput. Vis. 2961–2969, 10.1109/TPAMI.2018.2844175 (2017).
21. Liao, M. et al. Real-time scene text detection with differentiable binarization. In Proc. AAAI Conf. Artif. Intell. 34(07), 11474–11481, arXiv:1911.08947 (2020).
22. Liao, M. et al. Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 919–931, 10.1109/TPAMI.2022.3155612 (2022).
23. Zhang, S. X. et al. Adaptive boundary proposal network for arbitrary shape text detection. In Proc. IEEE/CVF Int. Conf. Comput. Vis. 1305–1314 (2021).
24. Zhang, S. X. et al. Arbitrary shape text detection via boundary transformer. IEEE Trans. Multimedia 26, 1747–1760, 10.1109/TMM.2023.3286657 (2023).


Data Availability Statement

The Dongba text detection models are freely available online at GitHub (https://github.com/MTVLab/Dongba).

