Abstract
Timber knot detection is essential for automated grading and quality control in the wood processing industry. Knots, which arise at the intersection of branches and the tree trunk, are among the most influential defects affecting both structural integrity and aesthetics. This paper introduces VNWoodKnot, a publicly available image dataset comprising 1,515 high-resolution wood surface images, collected in a Vietnamese industrial facility. The dataset includes three categories: live knots (519 images), dead knots (496 images), and knot-free surfaces (500 images). Live knots are structurally integrated and color-consistent, while dead knots are darker, cracked, and loosely attached. VNWoodKnot enables both classification and object detection tasks and addresses a critical gap in publicly accessible datasets for AI-driven wood defect inspection. It serves as a crucial benchmark for the development of real-time, scalable, and reliable deep learning models for industrial-grade wood defect inspection.
Keywords: VNWoodKnot dataset, Wood knot detection, Wood knot classification, Computer vision in wood processing
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | Deep Learning, Computer Vision, Image Processing, Image Classification |
| Type of data | Raw Images |
| Data collection | The iPhone 7 Plus was used to photograph Acacia wood surfaces on different days, under various lighting conditions, and across a range of surface textures. |
| Data source location | Bien Hoa, Dong Nai, Vietnam |
| Data accessibility | Repository: VNWoodKnot: A Benchmark Image Dataset for Wood Knot Detection and Classification Data identification number: 10.17632/vnst548g5n.1 Direct URL to data: https://data.mendeley.com/datasets/vnst548g5n/1 Instructions for accessing these data: The data is divided into three sets: train, test, and validation. Each set contains three folders with pre-processed images in JPEG format. |
1. Value of the Data
- VNWoodKnot comprises more than 1,500 labelled images that reflect realistic wood surface conditions, supporting AI-based visual inspection systems in forestry and wood processing.
- The dataset supports diverse machine learning tasks, including image classification and object detection with modern architectures such as convolutional neural networks (CNNs) and YOLO detectors.
- Images of live and dead knots include bounding-box annotations for object detection tasks.
- The dataset is suitable for benchmarking and comparative studies in industrial inspection, sustainability-driven manufacturing, and automation.
- It provides practical value for researchers, engineers, and industry practitioners aiming to improve quality control, reduce manual labour, and minimize waste.
2. Background
The wood processing industry is vital to many economies, supplying furniture and construction materials and generating substantial export revenue. In Vietnam, wood and furniture exports reached approximately US$13.5 billion in 2023, highlighting timber's strategic economic importance. With abundant forest resources and expanding plantations, Vietnam has significant potential for sustainable timber production to meet global demand. However, timber quality and value are affected by defects such as knots, which are categorized as live knots (firmly embedded and similar in color to the surrounding grain) or dead knots (darker, loosely attached, and prone to dislodging). These defects affect aesthetics, structural integrity, and grade classification, posing challenges for quality assurance and pricing. Currently, knot detection relies on manual visual inspection, which is labor-intensive, subjective, and inconsistent, and does not scale to industrial throughput. Artificial intelligence (AI), particularly computer vision, offers a promising alternative: it enables rapid, objective, and scalable evaluation of timber surfaces, with the potential to improve inspection efficiency, defect detection accuracy, and resource use.
Recent studies highlight the effectiveness of convolutional neural networks (CNNs) and enhanced deep learning models, such as YOLOv5 variants with C3Ghost modules and attention mechanisms, in detecting wood surface defects, including complex knot patterns [[1], [2], [3]]. Improved ResNet-50 and branched networks have also proven effective at extracting features across diverse wood grain textures [[4], [5]]. However, the lack of large-scale, publicly accessible, and richly annotated datasets for wood knot detection limits model development and deployment. To address this gap, we introduce VNWoodKnot, a dataset of 1515 high-resolution images from a Vietnamese wood processing facility, comprising 519 live knot images, 496 dead knot images, and 500 knot-free images. All images are labelled for classification, and the knot images additionally carry bounding-box annotations for object detection. Unlike prior studies that rely on proprietary datasets covering general defects [[1], [2], [3], [4], [5]], VNWoodKnot is a publicly available resource specialized for knot detection, intended to advance AI-driven timber assessment, productivity, and quality control.
3. Data Description
The VNWoodKnot dataset was collected in February 2025 at a wood processing facility in Bien Hoa City, Dong Nai Province, Vietnam, a region known for its strong timber and furniture manufacturing industry. It includes 1515 high-resolution images of wood surfaces, captured using an iPhone 7 Plus in a controlled environment to ensure clarity and consistency. Each physical wood sample was photographed only once from a fixed angle, ensuring that each knot in the images is distinct, reflecting the natural diversity of surface defects. All images were taken without background elements, focusing exclusively on the wood surface to minimize visual noise during model training. To prevent data leakage, each image represents a unique wood sample and was assigned to only one subset (training, validation, or test), ensuring no sample overlaps across subsets.
The dataset consists of three categories: knot-free wood (500 images), live knots (519 images), and dead knots (496 images). Within each subset (train, val, test), the images are organized into three folders named live_knot, dead_knot, and knot_free, which simplifies data handling for classification tasks. The classes are defined by the visual and structural characteristics commonly used in timber grading standards (see Table 1). Live knots originate from living branches, are firmly integrated into the wood, and typically exhibit colour and texture similar to the surrounding grain. Dead knots result from dead or broken branches, often appear darker, and are loosely attached or detached, making them more likely to compromise the mechanical integrity of the wood. Knot-free wood surfaces display no visible defects and represent the highest quality grade used in premium applications. For machine learning tasks, each folder corresponds to a numerically encoded class: class 0 (live knots) in live_knot, class 1 (dead knots) in dead_knot, and class 2 (knot-free wood) in knot_free. This structure supports efficient loading and label assignment during model training and evaluation.
Table 1.
Wood surface categories in the VNWoodKnot dataset.
| No | Class name | Description |
|---|---|---|
| 1 | Live Knots (Class 0) | A knot that is formed from a living branch and remains firmly attached to the surrounding wood. It usually has a similar color and texture to the rest of the wood, appearing smooth and integrated into the grain. These knots are structurally sound and less likely to fall out. |
| 2 | Dead Knots (Class 1) | A knot formed from a dead or broken branch. It is often darker in color, has cracks or irregular edges, and is loosely attached or completely detached from the wood. Dead knots are considered defects as they weaken the wood and can fall out over time. |
| 3 | Knot-free Wood (Class 2) | Wood surface with no visible knots or defects. It represents the highest quality grade with a smooth, uniform appearance, often used for premium products where aesthetics and strength are critical. |
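The folder layout described above maps directly onto class labels. The sketch below collects (image path, class index) pairs for one subset, assuming the layout `<root>/<split>/<class_folder>/<images>`, that images use a `.jpg` or `.jpeg` extension, and that annotation `.txt` files may sit alongside them; `list_samples` and `IMAGE_SUFFIXES` are hypothetical names, not part of the released dataset.

```python
from pathlib import Path

# Folder-name-to-label mapping as described in the text (class 0, 1, 2).
CLASS_TO_INDEX = {"live_knot": 0, "dead_knot": 1, "knot_free": 2}

# Assumed image extensions; adjust if the downloaded files differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def list_samples(root, split):
    """Return sorted (image_path, class_index) pairs for one subset.

    Assumes the layout <root>/<split>/<class_folder>/<image files>.
    Non-image files (e.g. YOLO .txt annotations) are skipped.
    """
    samples = []
    split_dir = Path(root) / split
    for folder, label in CLASS_TO_INDEX.items():
        class_dir = split_dir / folder
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

For example, `list_samples("VNWoodKnot", "train")` (with a hypothetical local root folder) would return every training image together with its class index, ready to feed into a classification data loader.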
Manual annotations for live and dead knots were carried out by trained technicians using labelImg, a graphical annotation tool for drawing precise bounding boxes that mark the location and extent of each knot. Every annotated image has a corresponding .txt file containing the class label and bounding box coordinates in a format compatible with object detection frameworks such as YOLO. In contrast, images in the knot_free folder, labelled as class 2 (knot-free wood), are used exclusively for classification and have no bounding boxes, as no defects are present. This annotation strategy supports both image classification and object detection, facilitating the training, evaluation, and benchmarking of deep learning models for automated visual inspection in the wood processing industry.

With a total of 1515 images, VNWoodKnot is sufficiently diverse for developing and evaluating wood knot detection and classification models, but its scale may be limiting for training very deep neural networks from scratch. It is therefore particularly well suited to transfer learning, in which models pre-trained on large-scale datasets are fine-tuned for wood knot detection and classification.
4. Experimental Design, Materials and Methods
The VNWoodKnot dataset was developed through a systematic data acquisition and annotation process carried out in February 2025 at a wood processing facility in Bien Hoa City, Dong Nai Province, Vietnam. The primary objective was to capture high-resolution images of wood surfaces containing various types of knots under conditions that closely resemble real-world industrial settings. Fig. 1 provides an overview of the data preparation pipeline, covering image acquisition, preprocessing, dataset splitting, and preparation for model training and evaluation.
Fig. 1.
Illustration of the data preparation workflow in the VNWoodKnot dataset.
4.1. Image acquisition and sample preparation
Images were captured with an iPhone 7 Plus held at a consistent distance of approximately 1 m from the wood surface to ensure uniform scale and perspective. All images were captured under the factory's artificial lighting and were subsequently refined so that the wood knots are easily recognizable within each frame; this step also increased the number of input images available for the dataset.
Wood samples were selected from processed Acacia timber panels representing three distinct surface conditions: live knots, dead knots, and knot-free wood. Efforts were made to include panels with a range of surface appearances typical of timber used in the furniture and construction industries, thereby increasing the dataset's variability and applicability. The final dataset comprises 500 images of knot-free wood, 519 images of live knots, and 496 images of dead knots, giving a nearly balanced distribution across the three classes.
4.2. Image preprocessing
To prepare the dataset for deep learning applications, all images were first converted to JPEG format for storage efficiency and compatibility. The source images had a native resolution of 3024×3024 pixels, captured with the iPhone 7 Plus in a square (1:1) format. To standardize input dimensions, each image was uniformly resized to 1500×1500 pixels without cropping or padding, preserving the square aspect ratio. This resizing kept the geometry of the knots and wood textures undistorted while reducing computational requirements.
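Because both the source and target images are square, the resize applies a single scale factor, 1500/3024 ≈ 0.496, to both axes. The sketch below illustrates this and shows why normalized (YOLO-style) coordinates are unaffected by the resize; `scale_point` and `normalize` are hypothetical helpers written for illustration only.

```python
NATIVE_SIZE = 3024  # native square resolution reported in the text
TARGET_SIZE = 1500  # standardized resolution reported in the text


def scale_point(x, y, src=NATIVE_SIZE, dst=TARGET_SIZE):
    """Map a pixel coordinate from the native image to the resized image.

    A square-to-square resize applies the same factor dst/src to both axes,
    so the aspect ratio (and hence knot geometry) is preserved.
    """
    return x * dst / src, y * dst / src


def normalize(x, y, width, height):
    """Express a pixel coordinate as a fraction of the image size (YOLO-style)."""
    return x / width, y / height


# A point at (1512, 756) in the 3024x3024 source image...
resized = scale_point(1512, 756)
print(resized)  # (750.0, 375.0)
# ...has the same normalized position before and after resizing.
print(normalize(1512, 756, NATIVE_SIZE, NATIVE_SIZE))  # (0.5, 0.25)
print(normalize(*resized, TARGET_SIZE, TARGET_SIZE))  # (0.5, 0.25)
```

Since the label values are fractions of the image size, the same annotation file remains valid if a user resizes the images again (for example, to a detector's input size), provided the resize is uniform across both axes.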
Annotations for live and dead knots were generated using labelImg, a graphical annotation tool, by drawing precise bounding boxes around each knot to capture its location and extent. For knot-free wood images, no bounding boxes were created; instead, these samples were labelled as Class 2 and used exclusively for classification tasks.
For object detection, the bounding boxes of live and dead knots in VNWoodKnot are stored in accompanying .txt files in the YOLO format. Most .txt files contain a single line corresponding to one knot annotation; however, 6 images in the dataset contain two knots, so their .txt files contain two lines. Each line describes one bounding box with five values. The first value is an integer denoting the object class (0 for live knots, 1 for dead knots). The remaining four values describe the box geometry, all normalized to the range [0, 1] relative to the image dimensions: the x- and y-coordinates of the box center, followed by its width and height. This lightweight, widely adopted format is compatible with modern object detection frameworks and supports efficient model training and evaluation. An illustration of these bounding box parameters is shown in Fig. 2.
Fig. 2.
Example of an image with bounding box.
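The five-value label format described above can be read with a few lines of standard-library Python. The sketch below parses one .txt file (one or more lines), validates each line, and converts a normalized box back to pixel corners on a 1500×1500 image. `KnotBox`, `parse_yolo_label`, and `to_pixel_corners` are hypothetical names, and rejecting class IDs other than 0 and 1 is an assumption based on the class scheme stated in the text.

```python
from dataclasses import dataclass

# Class indices used in the knot label files, per the text.
CLASS_NAMES = {0: "live_knot", 1: "dead_knot"}


@dataclass
class KnotBox:
    class_id: int
    x_center: float  # all geometry values are normalized to [0, 1]
    y_center: float
    width: float
    height: float


def parse_yolo_label(text):
    """Parse the contents of one YOLO-format .txt file into KnotBox objects."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 values, got {len(parts)}")
        class_id = int(parts[0])
        if class_id not in CLASS_NAMES:
            raise ValueError(f"line {line_no}: unknown class id {class_id}")
        coords = [float(v) for v in parts[1:]]
        if not all(0.0 <= v <= 1.0 for v in coords):
            raise ValueError(f"line {line_no}: coordinates must lie in [0, 1]")
        boxes.append(KnotBox(class_id, *coords))
    return boxes


def to_pixel_corners(box, img_w=1500, img_h=1500):
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) pixels."""
    half_w, half_h = box.width / 2, box.height / 2
    return (
        (box.x_center - half_w) * img_w,
        (box.y_center - half_h) * img_h,
        (box.x_center + half_w) * img_w,
        (box.y_center + half_h) * img_h,
    )


# Hypothetical two-knot label file, like the 6 two-knot images in the dataset.
for box in parse_yolo_label("0 0.5 0.5 0.2 0.1\n1 0.25 0.75 0.1 0.1\n"):
    print(CLASS_NAMES[box.class_id], to_pixel_corners(box))
```

Validating while parsing catches malformed lines early, before they can silently corrupt a training run.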
All images were normalized and manually reviewed to ensure annotation consistency and quality. Background elements and external visual noise were deliberately excluded so that each image focuses exclusively on the wood surface, which aids feature extraction and model performance. Each annotation file has the same name as its corresponding image file, with only the extension changed from .jpg to .txt; for example, the image img_3966.jpg has the annotation file img_3966.txt. This consistent naming convention makes it straightforward to match images with their annotations automatically during model training.
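The naming convention above allows images and annotations to be matched by file stem alone. The sketch below assumes each .txt file sits in the same folder as its image (the text does not state where annotation files are stored), and treats a missing .txt file as "no boxes", which is the expected case for knot-free images. `label_path_for` and `pair_images_with_labels` are hypothetical helpers.

```python
from pathlib import Path


def label_path_for(image_path):
    """Expected annotation path: same folder and stem, with a .txt extension."""
    return Path(image_path).with_suffix(".txt")


def pair_images_with_labels(image_paths):
    """Return (image_path, label_path_or_None) for each image.

    Knot-free images have no annotation file, so their label path is None.
    """
    pairs = []
    for image in map(Path, image_paths):
        label = label_path_for(image)
        pairs.append((image, label if label.is_file() else None))
    return pairs
```

For instance, `label_path_for("train/live_knot/img_3966.jpg")` yields `train/live_knot/img_3966.txt`. If annotations turn out to be stored in a separate directory, only `label_path_for` would need to change.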
4.3. Dataset splitting
The VNWoodKnot dataset comprises three primary classes: live knots (519 images), dead knots (496 images), and knot-free wood (500 images), numerically labelled 0, 1, and 2, respectively. The images are distributed across training, validation, and test subsets in an approximately 80/10/10 ratio (1204, 158, and 153 images, respectively), with broadly similar class proportions in each subset to support robust model training and evaluation:

| Class | Train | Validation | Test | Total |
|---|---|---|---|---|
| Live knots (0) | 404 | 52 | 63 | 519 |
| Dead knots (1) | 396 | 56 | 44 | 496 |
| Knot-free wood (2) | 404 | 50 | 46 | 500 |
| Total | 1204 | 158 | 153 | 1515 |

As shown in Fig. 3, this distribution ensures that each class is adequately represented across all phases of model development.
Fig. 3.
Data distribution across the training, validation, and test sets.
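As a quick consistency check on the split counts reported in this section, the sketch below recomputes per-class and per-subset totals and the resulting split ratios. The dictionary keys (`train`, `val`, `test`) mirror the subset names used earlier; the helper names are hypothetical.

```python
# Per-class split counts as reported in Section 4.3.
SPLIT_COUNTS = {
    "live_knot": {"train": 404, "val": 52, "test": 63},
    "dead_knot": {"train": 396, "val": 56, "test": 44},
    "knot_free": {"train": 404, "val": 50, "test": 46},
}


def class_totals(counts):
    """Total images per class across all subsets."""
    return {cls: sum(splits.values()) for cls, splits in counts.items()}


def split_totals(counts):
    """Total images per subset across all classes."""
    totals = {}
    for splits in counts.values():
        for split, n in splits.items():
            totals[split] = totals.get(split, 0) + n
    return totals


grand_total = sum(class_totals(SPLIT_COUNTS).values())
print(class_totals(SPLIT_COUNTS))  # {'live_knot': 519, 'dead_knot': 496, 'knot_free': 500}
for split, n in split_totals(SPLIT_COUNTS).items():
    print(f"{split}: {n} images ({n / grand_total:.1%})")
# train: 1204 images (79.5%)
# val: 158 images (10.4%)
# test: 153 images (10.1%)
```

The per-class totals match the class sizes stated in the text, and the subsets sum to 1515 images.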
4.4. Baseline performance for wood surface classification
In this study, we employed DenseNet201 as the baseline model for wood surface classification. Fig. 4 presents the training and validation curves of the DenseNet201 model on this task. Both training and validation accuracy improve steadily over 50 epochs, with validation accuracy closely following the training trend, which indicates good generalization and minimal overfitting. The loss curves likewise decrease consistently, confirming stable convergence.
Fig. 4.
Training and validation loss and accuracy curves of DenseNet201 model.
Upon completion of training, the DenseNet201 model achieved an accuracy of 94.76 %, an F1 score of 94.96 %, precision of 94.76 %, and recall of 94.83 %. These results establish DenseNet201 as a strong baseline model for wood surface classification using the VNWoodKnot dataset.
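The text does not state which averaging was used for precision, recall, and F1 across the three classes. The sketch below shows one common convention, macro averaging (the unweighted mean of per-class scores, as scikit-learn reports with `average="macro"`), computed from a confusion matrix. The matrix is a toy example for illustration only, not the DenseNet201 results, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1 from a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])  # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy 3-class confusion matrix (rows: true live/dead/knot-free); NOT the paper's results.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
for name, value in classification_metrics(toy_cm).items():
    print(f"{name}: {value:.2%}")
```

Reporting the averaging scheme (macro, weighted, or micro) alongside the scores makes results on VNWoodKnot easier to compare across studies, since the three can differ when per-class performance is uneven.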
Limitations
While the VNWoodKnot dataset offers a valuable foundation for training and evaluating wood knot detection models, several limitations should be acknowledged:
- Single Wood Species: The dataset was collected exclusively from Acacia wood (locally known as "tràm"), which is widely used in Vietnamese wood processing. As a result, it may not fully capture the color and texture variability of knots in other timber species, which may limit how well trained models generalize to knots in different wood types.
- Single Capture Device: All images were captured with an iPhone 7 Plus camera. Although the image quality is adequate for the intended tasks, results may differ for images obtained from industrial-grade vision systems because of differences in optics, sensors, and imaging conditions.
- Controlled Imaging Conditions: All images were captured under controlled lighting in background-free settings. Although this ensures consistency and reduces noise during training, it may limit model robustness and generalization in real-world production settings, where lighting and backgrounds vary significantly.
- Fixed Image Resolution: Images were standardized to 1500 × 1500 pixels. This uniform resizing may discard fine-grained detail, particularly for small or subtle knot features, potentially reducing detection precision.
- Annotation Granularity: Current annotations are limited to bounding boxes. The absence of finer labels, such as pixel-level segmentation masks or knot severity grades, may constrain the dataset's applicability to tasks requiring detailed structural analysis or defect grading.
- Future Work: To address these limitations, we plan to expand VNWoodKnot with detailed segmentation masks, severity annotations, and additional wood species. These enhancements will support advanced computer vision tasks such as fine-grained defect analysis, quality grading, and more robust model development for industrial applications.
Ethics Statement
All research procedures adhere to ethical principles. The study does not involve humans, animals, or data from social media. All data used in the research is publicly available, and we strictly follow proper citation guidelines.
CRediT authorship contribution statement
Vinh Tran: Data curation. Duy Lam: Writing – original draft. Tuong Le: Supervision, Writing – review & editing.
Acknowledgements
We express our gratitude to the staff at the wood processing facility in Bien Hoa City, Dong Nai Province, Vietnam, for their cooperation during data collection.
Declaration of Competing Interest
We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
References
- 1. Ehtisham R., Qayyum W., Camp C.V., Plevris V., Mir J., Khan Q.U.Z., Ahmad A. Classification of defects in wooden structures using pre-trained models of convolutional neural network. Case Stud. Construct. Mater. 2023;19.
- 2. Xu J., Yang H., Wan Z., Mu H., Qi D., Han S. Wood surface defects detection based on the improved YOLOv5-C3Ghost with SimAm module. IEEE Access. 2023;11:105281–105287.
- 3. Han S., Jiang X., Wu Z. An improved YOLOv5 algorithm for wood defect detection based on attention. IEEE Access. 2023;11:71800–71810.
- 4. Zou X., Wu C., Liu H., Yu Z. Improved ResNet-50 model for identifying defects on wood surfaces. Signal Image Video Proc. 2023;17(6):3119–3126.
- 5. Wang X. Detection of natural wood defects with large color differences based on branched network. Multimed. Tool. Appl. 2023;82(29):44719–44739.