Scientific Reports. 2025 Mar 19;15:9453. doi: 10.1038/s41598-025-92704-3

Learning features for offline handwritten signature verification using spatial transformer network

Wanghui Xiao 1,2, Hao Wu 3
PMCID: PMC11923077  PMID: 40108350

Abstract

Offline handwritten signatures are among the most common and widely accepted symbols in biometrics and document forensics, and they are frequently used in daily attendance checks, credit card payments, and business contracts to verify an individual’s identity. However, offline signature verification remains a challenging task because the minute yet significant differences between genuine and skillfully forged signatures are difficult to discriminate. To tackle this issue, this paper proposes a two-stage Siamese network model for offline handwritten signature verification using a spatial transformer network. It is built on two key ideas: (a) an efficient spatial transformation network module is introduced to reconstruct the spatial position of handwriting and automatically guide the model to focus on important features while ignoring redundant information, and (b) the Focal loss function is adopted to overcome the extreme imbalance between positive and negative signature samples. Experimental results on four challenging handwritten signature datasets in different languages demonstrate that our proposed model outperforms state-of-the-art models in terms of verification accuracy.

Subject terms: Information technology, Computer science

Introduction

As one of the oldest recognized symbols, the handwritten signature has become one of the most common and widely used methods of personal authentication, utilized in various activities such as time clocks, contract signing, credit card payment, and financial transactions. Therefore, handwritten signature verification is a vital task and is considered an important biometric technology in today’s world1,2. In real life, verifying handwritten signatures often relies on specialized organizations and manual expertise, which introduces subjectivity and uncertainty3,4. With the rapid development of pattern recognition and artificial intelligence, automatic signature verification has attracted the attention of many researchers.

Generally, handwritten signatures fall into two types according to the input mode: offline and online. Offline signatures are typically captured using scanners or other optical scanning devices, resulting in static signature images. Online signatures, in contrast, provide additional information such as stroke order, writing speed, and writing pressure, but the required acquisition hardware increases system cost and limits practical application scenarios. Handwritten signature verification systems can further be divided into two categories: writer-dependent (WD) and writer-independent (WI)5,6. A writer-dependent approach trains a model on each user’s specific signatures, whereas a writer-independent approach builds a single unified model capable of classifying signatures from all users. Writer-independent verification therefore offers the more realistic solution, and writer-independent offline signature verification covers the most practical application scenarios while posing the most crucial and challenging task.

Over the past several decades, there has been a marked increase in research interest in automatic signature verification, driven by advancements in pattern recognition and image processing technologies7. Numerous models have been proposed for this purpose. Among these, the Siamese neural network has emerged as a prominent and effective technique that significantly advances handwritten signature verification8. A Siamese neural network consists of two interconnected artificial neural networks that process tasks by comparing the similarities between two samples, an architecture frequently used for classification and recognition problems. The fundamental principle is to feed two samples into two identical neural networks that share the same structure and weights; each network transforms its input sample into a representation in a higher-dimensional space. Nonetheless, many existing methodologies treat offline handwritten signature verification as a mere image recognition task9, which inadequately captures the distinctive writing style of the signer. Furthermore, acquiring a sufficient quantity of both positive and negative signature samples in real-world scenarios proves challenging, resulting in imbalanced distributions10. Consequently, the performance of these models is often suboptimal in such contexts.

To overcome these issues, it is worth exploring whether the Siamese neural network model can be enhanced by addressing the imbalance between positive and negative signature distributions. To this end, this study proposes a two-stage Siamese network model for offline handwritten signature verification that incorporates a spatial transformer network. An efficient spatial transformation network module is introduced on top of the Siamese neural network to automatically emphasize signature features that reflect handwriting style, allowing the network to prioritize stroke information. To address the significant imbalance between positive and negative samples, the Focal loss function is integrated with the Siamese network and the spatial transformation module to sharpen the focus on stylistic handwriting features. Accordingly, this paper makes the following main contributions:

  1. Proposing a two-stage Siamese network model for offline handwritten signature verification. An effective spatial transformation network module is introduced to reflect the handwriting style of the signature by reconstructing the spatial position of the handwriting and auto-focusing the signature features.

  2. Utilizing the Focal Loss to deal with the extreme imbalance between positive and negative offline signatures.

  3. Analyzing the visualization of the feature representation process.

Experimental results on handwritten signature datasets spanning four different languages demonstrate the effectiveness of the proposed method.

The remaining sections of this paper are organized as follows: Section two presents a literature review on signature-related research. Section three provides the details of the proposed approach. Section four evaluates the proposed method through experiments, presents the results, and discusses them. Finally, section five summarizes the paper and suggests avenues for future research.

Related work

Offline handwritten signature verification

Offline handwritten signature verification is a specialized technology used to authenticate the identity of a writer by analyzing their writing skills, habits, and handwriting characteristics11. The main objective is to ascertain whether the handwriting on documents or physical evidence belongs to the same individual through handwriting identification experiments. Handwritten signatures can be categorized as genuine or forged. A forged signature is created by a forger and can be further classified as a random forgery or a skilled forgery, depending on the forger’s knowledge of the user’s information. A random forgery is written without knowing the user’s information, while a skilled forgery is produced by a forger who has mastered the user’s signature content and writing characteristics in detail. A skilled forger makes a forged signature highly similar to a genuine one by repeatedly practicing the writing pattern of the genuine signature. The resulting similarity between skilled forgeries and authentic signatures is compounded by the fact that authentic signatures of the same author already exhibit significant internal variability. In contrast, the content and writing characteristics of random forgeries differ from genuine signatures, making them easy to distinguish. The difference between skilled forgeries and genuine signatures, however, is small and difficult to discern, which poses a major challenge to offline signature verification algorithms.

Research in the field of offline signature verification can be traced back to at least the 1970s12. In offline signature verification research, researchers are typically provided with a set of genuine and forged signatures, and the goal is to develop a method or model that can effectively distinguish between them. Forged signatures are not created by the signer and are usually divided into different types; the most common in the literature is the skilled forgery, where the forger attempts to mimic the signature of the signer. Skilled forgery detection is an open pattern recognition problem and the focus of this study. The problem is challenging for several reasons. First, there is a high degree of similarity between a genuine signature and a skilled forgery, as the forger usually practices the signature beforehand. Second, in practical applications, we cannot expect skilled forgeries to be available for every user in the system; for wide applicability, the classifier should therefore be trained using only genuine signatures. Additionally, the number of genuine samples per user is usually small. Especially for new users, we may have only 3-5 signatures, which is particularly challenging: because many users exhibit large intra-class variability, a small number of signatures cannot capture the full range of variation. Due to data privacy concerns, research on signature verification has also been hampered by the lack of large-scale databases5.

Feature extraction plays a crucial role in pattern recognition and has a direct impact on overall algorithm performance. Over the past decades, scholars have proposed a variety of feature extraction methods for offline signatures. Early research on signature verification commonly used static features such as geometric features13, directional features14, and mathematical transformations15. Simple geometric features depict the general morphological traits of the signature, including its height and width16, the signature region17, local binary patterns of local patches18, and the scale-invariant feature transform19; verification is then performed by comparing the distance between the feature representations of two signatures. Although signature verification research has been limited by the lack of large-scale databases due to data privacy concerns, several studies have successfully demonstrated the feasibility of deep learning models for offline handwritten signature verification. In addition to global descriptors, some researchers compute local geometric features, such as pixel density in a grid, through mesh partitioning20. Directional features describe the direction of each stroke in the signature; for example, the gradient direction probability density is extracted from the gradient of the signature contour21. Researchers have also applied a variety of mathematical transformations to extract features. Kumar et al.22 designed a surroundedness feature that captures signature shape and texture information and achieved good results. In the literature23, SIFT is used to extract local feature descriptors from the questioned signature and the reference signature for signature authentication. Malik et al.24 used SURF to extract local feature descriptors, which were used to identify stable feature regions and improve the reliability of the algorithm.

In recent years, algorithms based on deep learning have made significant progress in image processing. Deep learning and convolutional neural networks have become core technologies of pattern recognition and dominate the vast majority of biometric recognition systems25. Convolutional neural networks can automatically learn features from a large number of image samples, which is the main reason deep learning algorithms excel in image-related fields; such feature learning is more efficient than manual feature design, and its performance is generally superior. Therefore, in recent years, several methods have applied convolutional neural networks to feature extraction in offline signature authentication26. For example, Khalajzadeh et al.27 trained convolutional neural networks on Persian signature datasets to extract signature features. Hafemann et al.28 first trained a convolutional neural network on a writer classification task to optimize the network parameters, and then used the optimized network to extract signature features for verification; they also introduced spatial pyramid pooling to accommodate input images of different sizes and extract uniformly sized features. Xing et al.8 used a deep convolutional Siamese network to integrate feature extraction and similarity measurement in a single trainable model, enhancing feature extraction by combining deep and shallow features, including an adversarial network for writer-independent handwritten signature verification and a region-based deep convolutional Siamese network. Wei et al.29 introduced an inverse discriminative network to extract relevant information from signatures. Danilo et al.30 proposed a two-channel CNN feature extractor, with each channel processing the reference signature and the query signature separately.
In addition, a multi-task architecture based on R-SigNet31 was introduced for handwritten signature verification, reducing feature extraction errors through relaxed loss learning in the feature space. A Generative Adversarial Network (GAN) model was proposed as a high-quality data synthesis method to address the problem of unreadable data in signature verification32. A two-channel, two-stream framework (2C2S) based on the transformer structure utilizes the attention mechanism to capture stroke information efficiently33. These methods have been validated on multiple publicly available handwritten signature datasets.

Focal loss

Focal Loss is an improved cross-entropy loss function proposed by Facebook AI Research to solve the class imbalance problem in object detection34,35. Its key idea is to make the model pay more attention to hard-to-classify samples by down-weighting the loss of easily classified ones. Specifically, Focal Loss introduces a modulating factor into the standard cross-entropy loss to reduce the loss contribution of easily classified samples. In image classification tasks, class imbalance is common: the number of samples in some categories far exceeds that in others, biasing model training toward the majority categories while neglecting the minority ones. By reducing the loss of easily classified samples, Focal Loss lets the model focus on hard samples and effectively addresses class imbalance, which can significantly enhance recognition performance for under-represented categories. Focal Loss is effective not only for image classification but also for other tasks involving class imbalance, yielding significant results.

Siamese neural network

The Siamese network paradigm, initially pioneered by Bromley et al.36 for signature verification applications, has emerged as a fundamental architecture in metric learning frameworks. Subsequent research has demonstrated its versatility across diverse domains including few-shot learning, document analysis, and biometric authentication. As illustrated in Fig. 1, this architecture comprises dual isomorphic subnetworks with parameter-sharing constraints, implementing symmetrical processing pathways for comparative feature analysis. The network employs twin convolutional neural networks (CNNs) with identical configurations that process distinct input pairs. Through hierarchical nonlinear transformations, these subnetworks project input patterns into a shared embedding space where semantic similarity metrics become linearly separable. This architecture effectively addresses the curse of dimensionality by disentangling the complex manifolds present in raw signature representations through successive convolution-pooling operations. Key architectural innovations include weight-tying mechanisms that prevent model degeneration and ensure permutation invariance in input processing. The shared parameter space between the twin networks not only reduces computational complexity but also enforces feature consistency across compared samples, a critical requirement for robust verification systems.
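The weight-sharing principle can be sketched in a few lines of NumPy. This is an illustration only: the fixed random projection `W` and the `embed` function stand in for the paper's shared CNN branches.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16))  # one shared projection used by BOTH branches

def embed(x):
    """Stand-in for the shared CNN branch: linear projection plus ReLU."""
    return np.maximum(x @ W, 0.0)

def siamese_distance(x1, x2):
    """Euclidean distance between the two branch embeddings."""
    return float(np.linalg.norm(embed(x1) - embed(x2)))

a = rng.standard_normal(64)
b = rng.standard_normal(64)
# identical inputs yield identical embeddings, so their distance is exactly zero
assert siamese_distance(a, a) == 0.0
```

Because both branches apply the same `embed`, identical inputs always map to distance zero; training then shapes the shared weights so genuine pairs stay close while forgeries move apart.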

Fig. 1. Architecture of the Siamese neural network.

Proposed method

The architectural framework of the proposed methodology is systematically illustrated in Fig. 2, comprising four principal components: data preprocessing, spatial transformation network, hierarchical feature extraction, and an adaptive loss formulation. The pipeline initiates with standardized denoising procedures applied to all input handwritten signature images to suppress acquisition artifacts and enhance signal fidelity. These preprocessed images are subsequently processed through parallel computational pathways: a convolutional neural network for baseline feature extraction and a differentiable spatial transformer network for geometric normalization. The CNN backbone generates discriminative feature embedding vectors from both the original signatures and their spatially transformed counterparts through successive convolution-pooling operations. A similarity metric is computed between these dual feature representations, with the fusion loss function integrating multi-scale discriminative information to optimize verification decisions. To mitigate the class imbalance inherent in signature verification tasks, we implement a focal loss formulation that adaptively recalibrates sample weights through a tunable γ parameter. Notably, our framework preserves verification label consistency (y ∈ {0, 1}) across spatial transformations through style-invariant feature learning. The STN-generated augmented samples maintain their original decision labels due to the geometric transformation equivariance of the learned representations. All loss components are computed over this label-preserving augmented dataset, ensuring robust decision boundary learning. Subsequent sections provide rigorous formalization of each module, accompanied by ablation studies quantifying component-wise contributions to verification performance.

Fig. 2. The architecture of the proposed method.

Preprocessing

Preprocessing plays a crucial role in most pattern recognition problems and greatly impacts the final results. In static handwritten signature datasets, signature images often exhibit variations in pen thickness, background, rotation, and scale, and in real-world application scenarios some collected signature images may have an unclean background. Hence, this study applies Otsu’s method37 to binarize the signature images and extract foreground and background masks. The background pixels are then set to 255 according to the background mask, effectively removing the background. For the foreground, normalization is employed to mitigate the impact of lighting and varying image quality: a grayscale template is first estimated from the background, and this template is then used to normalize the background’s grey levels. Furthermore, the resolution and size of the acquired handwriting images may vary, resulting in differences in image characteristics. To consistently align the positions of feature strokes across all handwriting images, the moment normalization method38 is utilized to unify the position and size of the features.
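The binarization step can be illustrated with a from-scratch NumPy sketch of Otsu's method; the 255 background fill mirrors the procedure described above. The paper presumably uses a standard library implementation, so this is an assumption-laden sketch rather than the authors' code.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the grey level maximising between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))  # class-0 cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0   # undefined where a class is empty
    return int(np.argmax(sigma_b))

def remove_background(gray):
    """Set background pixels (above the Otsu threshold) to white, as in the text."""
    t = otsu_threshold(gray)
    out = gray.copy()
    out[gray > t] = 255  # ink is dark, so pixels above the threshold are background
    return out
```

On a synthetic image with dark strokes on a light background, `remove_background` leaves the strokes untouched and whitens everything else.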

In particular, for an initial image f(x, y) and a standardized image f(x′, y′), the scale mapping can be defined as:

$$x' = \frac{W_{norm}}{W}\,(x - x_c) + x'_c, \qquad y' = \frac{H_{norm}}{H}\,(y - y_c) + y'_c \tag{1}$$

where (x_c, y_c) and (x′_c, y′_c) are the center coordinates of the original image and of the normalized image, respectively. By analyzing the central moments of the image, the width W and height H of the handwritten region are obtained as:

$$W = K\sqrt{\mu_{20}/\mu_{00}}, \qquad H = K\sqrt{\mu_{02}/\mu_{00}} \tag{2}$$

where H_norm and W_norm describe the height and width of the standardized image. We set H_norm and W_norm to 115 and 220 in the experiments, i.e., signature images are normalized to 115 × 220 for the subsequent feature extraction. K is a scale factor that determines the standardized size of the writing; we set K to 4.7. The central moment μ_pq of the image is

$$\mu_{pq} = \sum_{x}\sum_{y} (x - x_c)^p\,(y - y_c)^q\,f(x, y) \tag{3}$$

In addition, foreground grey normalization similar to that in11 is adopted, with the mean and standard deviation set to 30 and 10, respectively. The comparison of signature handwriting before and after preprocessing is shown in Fig. 3.
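A NumPy sketch of the moment computations, assuming the standard centroid-based central moments of Eq. (3); `handwriting_extent` is one common moment-based width/height estimate in the spirit of Eq. (2), scaled by K = 4.7 as in the text.

```python
import numpy as np

def central_moment(f, p, q):
    """mu_pq = sum_x sum_y (x - x_c)^p (y - y_c)^q f(x, y), (x_c, y_c) = centroid."""
    ys, xs = np.indices(f.shape)  # row index plays the role of y, column of x
    m00 = f.sum()
    xc = (xs * f).sum() / m00
    yc = (ys * f).sum() / m00
    return ((xs - xc) ** p * (ys - yc) ** q * f).sum()

def handwriting_extent(f, k=4.7):
    """Moment-based width/height estimate of the stroke region, scaled by k."""
    m00 = f.sum()
    w = k * np.sqrt(central_moment(f, 2, 0) / m00)
    h = k * np.sqrt(central_moment(f, 0, 2) / m00)
    return w, h
```

For a uniform square image the first-order central moments vanish and the width and height estimates coincide, as expected by symmetry.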

Fig. 3. Comparison of signature handwriting before and after preprocessing.

Spatial transformer network

A spatial transformer network can perform rotation, translation, scaling, alignment, and other transformation operations on input image information without changing its size39. The structure of the spatial transformation network is shown in Fig. 4. This module can effectively improve the rotation and scale invariance of the feature information. The basic structure of a spatial transformation network includes (1) a Localization Network; (2) a Grid Generator; and (3) a Sampler.

Fig. 4. The structure of the spatial transformation network.

The spatial transformation network module guides the deep learning model to focus on key regions of the signature image that are critical for identifying signature features. Each signature usually has unique stroke, texture, and intensity patterns, and the module effectively directs the model to attend to these key features. It can also suppress noise and interference, since the background and irrelevant regions may disturb feature recognition; the module mitigates these negative effects by focusing on important areas and discarding redundant information. In addition, the module enables distortion-free image transformation, ensuring that the overall structure and content of the signature remain unchanged, which is essential for distinguishing different signatures with similar structures. With these capabilities, the spatial transformation network module enables deep neural network models to identify signature features more accurately and reliably.

In the field of image processing, the transformation is usually affine. The affine transformation parameters may be expressed by a matrix A, and the mathematical model of the transformation is as follows:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \tag{4}$$

where x and y are the coordinates of input pixels, x′ and y′ are the coordinates of output pixels, and a-f are the six affine parameters, which encode image scaling, rotation, translation, shear, and so on. In38, the transformations corresponding to different parameter settings are discussed in detail.

After learning the parameters of the affine transformation matrix, the key problem is how to obtain the sampled output differentiably through the matrix. This requires designing a sampler that effectively utilizes the matrix. The output coordinates in Eq. (4) may be fractional, so the value at integer pixel coordinates is obtained by bilinear interpolation (using the values of the four nearest pixels):

$$v_{y_1} = \frac{x_2 - x}{x_2 - x_1}\,v_{11} + \frac{x - x_1}{x_2 - x_1}\,v_{21} \tag{5}$$

$$v_{y_2} = \frac{x_2 - x}{x_2 - x_1}\,v_{12} + \frac{x - x_1}{x_2 - x_1}\,v_{22} \tag{6}$$

$$v = \frac{y_2 - y}{y_2 - y_1}\,v_{y_1} + \frac{y - y_1}{y_2 - y_1}\,v_{y_2} \tag{7}$$

where v is the interpolated pixel value at (x, y); v11, v21, v12, and v22 are the pixel values at the four neighbouring pixels (x1, y1), (x2, y1), (x1, y2), and (x2, y2), respectively.

By using bilinear interpolation, the sampling operation can be expressed as generating the output image area:

$$v_i = \sum_{n=1}^{H}\sum_{m=1}^{W} U_{nm}\,\max(0,\, 1 - |x_i - m|)\,\max(0,\, 1 - |y_i - n|) \tag{8}$$

where v_i is the output pixel value at the sampling location (x_i, y_i), and U_nm is the pixel value of the complete input signature image at position (n, m); n and m index the height and width of the input feature map, respectively. Eq. (8) is differentiable, so the whole network can be trained with the backpropagation algorithm, and the optimal parameters can be obtained jointly with the differentiable localization network. The spatial transformation network composed of the above three parts can be inserted independently into a neural network structure, where its parameters are refined through training to complete the spatial transformation of the feature information.
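A minimal NumPy sketch of the affine sampler, combining the affine mapping of Eq. (4) with the bilinear kernel of Eq. (8). It adopts the common STN convention of mapping each output coordinate back into the input image; the learned localization network that produces A is omitted, so A is supplied by hand here.

```python
import numpy as np

def affine_sample(U, A):
    """Warp image U with affine matrix A = [[a, b, c], [d, e, f]],
    using the bilinear kernel of Eq. (8) so the operation stays differentiable."""
    H, W = U.shape
    out = np.zeros_like(U, dtype=float)
    for yo in range(H):
        for xo in range(W):
            # map the output coordinate back to a (possibly fractional) input coordinate
            x = A[0, 0] * xo + A[0, 1] * yo + A[0, 2]
            y = A[1, 0] * xo + A[1, 1] * yo + A[1, 2]
            fy, fx = int(np.floor(y)), int(np.floor(x))
            for n in range(max(0, fy), min(H, fy + 2)):
                for m in range(max(0, fx), min(W, fx + 2)):
                    # bilinear weight: max(0, 1-|x-m|) * max(0, 1-|y-n|)
                    out[yo, xo] += U[n, m] * max(0.0, 1 - abs(x - m)) * max(0.0, 1 - abs(y - n))
    return out
```

With A set to the identity the image is reproduced exactly, and a unit horizontal translation shifts every column by one pixel, which makes the sampler easy to sanity-check.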

The feature extractor

The architecture of the feature extractor module is depicted in Fig. 5. It takes an original signature image as input and produces the signature image’s feature as output. This study utilizes a group of deep convolutional networks comprising two branches of convolutional neural networks sharing the same parameters to learn the feature representation of the handwritten signature. The preprocessed handwriting information is fed into an artificial neural network model, where a set of convolutional neural networks at different levels are employed for handwriting classification. The convolutional network incorporates convolution layers, nonlinear activation layers, max pooling layers, and batch normalization layers for image feature extraction. Additionally, a four-level joint convolution operation based on the Visual Geometry Group (VGG) network is employed, together with rectified linear unit (ReLU) activation, batch normalization, and pooling layers. Dropout layers are not used, as in our training practice they hindered learning at each level.

Fig. 5. The architecture of the feature extractor module.

The four-stage convolution operation better captures the global and local characteristics of handwriting, as presented in Table 1. Each level has learnable features, and the optimal features are obtained through training. After each learnable layer, batch normalization and ReLU nonlinear activation are performed. The four cascaded convolutional stages consist of 32, 64, 96, and 128 channels, respectively. The batch normalization layer normalizes the inputs of each level, preventing issues such as gradient explosion or vanishing; this layer is only used during the training phase. The activation layer captures various influential factors and characteristics in a higher-dimensional nonlinear region. All input handwritten signature images share the same set of training parameters. Through rigorous training and continuous learning, this module identifies and extracts the unique stylistic features of the author’s signature, thereby streamlining computation and enhancing the overall performance of the model.

Table 1.

Summary of the CNN layers.

Layers Size Other parameters
Input 1 × 115 × 220
Conv2d × 2 (C1) 32 × 115 × 220 kernel_size = (3, 3), stride = 1, padding = 1
Pooling 32 × 57 × 110 kernel_size = 2, stride = 2
Conv2d × 2 (C2) 64 × 57 × 110 kernel_size = (3, 3), stride = 1, padding = 1
Pooling 64 × 28 × 55 kernel_size = 2, stride = 2
Conv2d × 2 (C3) 96 × 28 × 55 kernel_size = (3, 3), stride = 1, padding = 1
Pooling 96 × 14 × 27 kernel_size = 2, stride = 2
Conv2d × 2 (C4) 128 × 14 × 27 kernel_size = (3, 3), stride = 1, padding = 1
Pooling 128 × 7 × 13 kernel_size = 2, stride = 2
Full connection 128 × 7 × 13
Full connection 2048
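The shape bookkeeping in Table 1 can be checked with a few lines of Python: a 3 × 3 convolution with stride 1 and padding 1 preserves spatial size, while 2 × 2 pooling with stride 2 floor-divides it. This is arithmetic only, not the network itself.

```python
def conv_pool_shapes(c_in=1, h=115, w=220, channels=(32, 64, 96, 128)):
    """Trace the (C, H, W) shapes of Table 1: each stage is two 3x3 convs
    (stride 1, padding 1, size-preserving) followed by 2x2 max pooling (stride 2)."""
    shapes = [(c_in, h, w)]
    for c in channels:
        # 3x3 conv, stride 1, padding 1: spatial size unchanged, channels become c
        shapes.append((c, h, w))
        # 2x2 pooling with stride 2 floor-divides the spatial size
        h, w = h // 2, w // 2
        shapes.append((c, h, w))
    return shapes

shapes = conv_pool_shapes()
```

Tracing the stages reproduces every row of Table 1, ending at 128 × 7 × 13 before the fully connected layers.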

Loss function

Focal loss35 was originally proposed to address object detection scenarios with an extreme imbalance between foreground and background classes. The goal is to solve the class imbalance problem by redesigning the standard cross-entropy loss so that a one-stage object detector can train high-precision models even in the presence of a large number of easily classified background samples. The focal loss focuses training on a sparse set of hard-to-classify samples and prevents the large number of easily classified negatives from overwhelming the detector during training. A modulating factor and a weighting factor are added to the traditional cross-entropy loss to control each sample’s influence on the total loss and thus address the extreme imbalance between classes. By changing the values of the weighting factor α and the modulating factor γ, the contribution of positive/negative samples to the total loss can be controlled. Compared with the classical cross-entropy loss, focal loss pays more attention to difficult and misclassified samples.

$$FL(\hat{y}, y) = \begin{cases} -\alpha\,(1-\hat{y})^{\gamma}\log \hat{y}, & y = 1 \\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y = 0 \end{cases} \tag{9}$$

where ŷ denotes the predicted probability; α is a weighting factor in [0, 1] and γ is an adjustable focusing parameter that prevents easy samples from contributing too much. In common practice, the focal loss works best with α = 0.25 and γ = 2, and these parameter values are used in our experiments.
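A NumPy sketch of the binary focal loss with these defaults (α = 0.25, γ = 2); the small constant `eps` is an implementation detail added here only for numerical safety.

```python
import numpy as np

def focal_loss(y_hat, y, alpha=0.25, gamma=2.0):
    """Binary focal loss (Eq. 9): cross-entropy rescaled so that well-classified
    samples (true-class probability near 1) contribute almost nothing."""
    eps = 1e-12
    p_t = np.where(y == 1, y_hat, 1.0 - y_hat)      # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class weighting factor
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))
```

An easy positive (ŷ = 0.99) incurs orders of magnitude less loss than a hard one (ŷ = 0.1), and setting α = 1, γ = 0 recovers the plain cross-entropy.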

Evaluating the model through the loss function is essential, especially during the training stage. Suppose {(xi, si, yi) | i = 1, …, N} is the set of training samples, where xi is the i-th questioned sample and si is the i-th genuine reference sample. The label yi ∈ {0, 1} is assigned to the pair (xi, si): y = 0 indicates that xi is a forgery with respect to si, and y = 1 indicates that it is genuine. Based on the training samples, the two sub-modules of the network are jointly optimized: the method considers not only the comparison of the original data but also the comparison of the new data obtained after spatial transformation. For an original signature pair (xi, si, yi), F(xi) and F(si) are the outputs of the feature extraction module.

Let P(F(xi), F(si)) denote the predicted matching probability for the pair. Because of the imbalance between positive and negative signature pairs, a focal-loss-based formulation is used to describe the loss on the original signatures. For this binary classification problem, the loss L_org can be calculated as follows:

$$\mathcal{L}_{org} = \frac{1}{N}\sum_{i=1}^{N} FL\big(P(F(x_i), F(s_i)),\, y_i\big) \tag{10}$$

Based on the input signatures xi and si, the new features after spatial transformation are denoted x̃i and s̃i. In handwriting verification, the prediction for (x̃i, s̃i) is required to be consistent with that for the original signature pair (xi, si); meanwhile, the transformed handwriting retains the stroke style of the original. Accordingly, the loss function L_stn for the spatial transformer branch is defined as:

$$\mathcal{L}_{stn} = \frac{1}{N}\sum_{i=1}^{N} FL\big(P(F(\tilde{x}_i), F(\tilde{s}_i)),\, y_i\big) \tag{11}$$

We jointly minimize the L_org loss and the L_stn loss during training. The total loss of the two parts is calculated as follows:

$$\mathcal{L} = \mathcal{L}_{org} + \lambda\,\mathcal{L}_{stn} \tag{12}$$

The hyperparameter λ balances the two loss components and is set empirically. In the experimental part of this study, grid search is used: λ ranges from 0.2 to 5 in increments of 0.2, the performance of each candidate is evaluated through cross-validation, and the value performing best on the validation set, λ = 3, is finally selected.
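The λ sweep described above can be sketched as follows; `validate` is a hypothetical callback that returns a validation score for a given λ, standing in for the cross-validation run.

```python
def grid_search_lambda(validate, lo=0.2, hi=5.0, step=0.2):
    """Sweep lambda over [lo, hi] in increments of `step` and keep the value whose
    validation score (from the caller-supplied `validate`) is highest."""
    n = int(round((hi - lo) / step)) + 1
    candidates = [round(lo + i * step, 10) for i in range(n)]  # round to tame float drift
    scores = {lam: validate(lam) for lam in candidates}
    return max(scores, key=scores.get), candidates
```

With the 0.2 step, the sweep visits 25 candidate values from 0.2 to 5.0 and returns the best-scoring one.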

Algorithm design

Here, we describe the learning process of the proposed method, which is summarized in Algorithm 1.

Empirical studies

General settings

To evaluate the proposed approach, we used the four offline handwritten signature datasets summarized in Table 2, namely CEDAR40, BHSig-Hindi41, BHSig-Bengali41, and our proposed Chinese signature dataset. These datasets cover four languages (English, Hindi, Bengali, and Chinese); sample signatures are shown in Table 3. Each dataset contains a variety of genuine and forged signatures from many different writers. Because negative pairs greatly outnumber positive pairs in realistic settings, the datasets are summarized as follows:

Table 2.

Dataset details.

Datasets             CEDAR     BHSig-B   BHSig-H    Chinese
Signatures           2640      5400      8640       2000
People               55        100       160        500
Total sample pairs   46,860    99,600    159,360    1500
Languages            English   Bengali   Hindi      Chinese
Positive:Negative    276:576   276:720   276:720    840:660

Table 3.

Samples from the signature datasets. Each row contains four signatures: columns (1) and (2) are genuine signatures and columns (3) and (4) are forged signatures.

Rows (top to bottom): CEDAR, BHSig-H, BHSig-B, Chinese dataset (signature images omitted in this text version).

CEDAR40: This dataset contains signatures from 55 English writers, each contributing 24 genuine and 24 forged signatures. This gives C(24, 2) = 276 genuine-genuine pairs per writer. Additionally, by pairing each writer's genuine signatures with that writer's forgeries, we obtain 24 × 24 = 576 negative pairs per writer.

BHSig26041: This dataset is divided into two subsets, the BHSig-Bengali and BHSig-Hindi signature sets, which are trained and tested separately. The BHSig-Bengali subset includes signature images from 100 Bengali writers. Each writer contributed 30 forged and 24 genuine signatures, yielding C(24, 2) = 276 genuine-genuine pairs per writer. Pairing each writer's genuine signatures with that writer's forgeries gives 24 × 30 = 720 negative pairs per writer.

BHSig-Hindi41: The other subset of BHSig260 comprises signature images from 160 Hindi writers. As in the Bengali subset, each writer contributed 30 forged and 24 genuine signatures, yielding C(24, 2) = 276 genuine-genuine pairs and 24 × 30 = 720 negative pairs per writer.
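The positive and negative pair counts quoted for these datasets follow directly from combinatorics, as this small sketch confirms:

```python
from math import comb

def pair_counts(n_genuine, n_forged):
    """Per-writer pair counts for a signature verification dataset.

    Positive pairs: every unordered pair of the writer's genuine
    signatures, i.e. C(n_genuine, 2).
    Negative pairs: every genuine signature matched against every
    skilled forgery of the same writer.
    """
    positives = comb(n_genuine, 2)
    negatives = n_genuine * n_forged
    return positives, negatives

print(pair_counts(24, 24))  # CEDAR: 24 genuine, 24 forged -> (276, 576)
print(pair_counts(24, 30))  # BHSig-B / BHSig-H: 24 genuine, 30 forged -> (276, 720)
```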

Chinese signature dataset: To address the absence of a public Chinese signature dataset, we collected a large-scale Chinese handwritten signature dataset in a real-life setting. It is derived from actual handwriting appraisal cases conducted by the judicial appraisal center of Southwest University of Political Science and Law from 2009 to 2020. In practical scenarios, handwriting identification typically involves highly similar signatures. Because the number of handwriting samples obtained varies across authentication cases, we ensured consistency within the dataset: each set consists of one questioned signature and three authenticated genuine signatures written by the same individual, and the questioned signature is used to verify whether the remaining signatures in the set were written by the same person. Different sets correspond to different sampling times. Ultimately, the dataset consists of 500 names and 2000 scanned signature images, comprising 220 negative and 280 positive signature sets.

To maintain consistency with other state-of-the-art methods, we randomly selected writers for the partitioning of the training and testing datasets in each dataset. All genuine-genuine pairs in the training dataset were used for modeling purposes.

Evaluation metrics

In this study, a positive sample consists of two genuine signatures written by the same writer, corresponding to the decision label y = 1. The following metrics are used to evaluate the algorithm:

False Rejection Rate (FRR): the proportion of genuine samples wrongly rejected by the system; it measures how many genuine positive samples the system misses.

False Acceptance Rate (FAR, reported for random and skilled forgeries): the proportion of forged or unauthorized samples mistakenly accepted by the system.

Accuracy (ACC): the percentage of samples the system identifies correctly.

Depending on how decisions are fused, these indicators are computed differently. A lower FRR means the system accepts genuine samples more reliably, though possibly at the cost of accepting more forgeries; a lower FAR means the system rejects forgeries more reliably, though possibly at the cost of rejecting more genuine samples. In general, the smaller the FRR and FAR and the higher the ACC, the better the performance. The metrics are calculated as follows:

$$FRR = \frac{FN}{TP + FN} \tag{13}$$
$$FAR = \frac{FP}{FP + TN} \tag{14}$$
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
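A minimal sketch of these metrics in terms of the confusion counts (TP and FN on genuine pairs, FP and TN on forged pairs):

```python
def verification_metrics(tp, fn, fp, tn):
    """FRR, FAR and ACC from confusion counts.

    tp, fn : correctly accepted / wrongly rejected genuine pairs
    fp, tn : wrongly accepted / correctly rejected forged pairs
    """
    frr = fn / (tp + fn)                    # genuine pairs the system misses
    far = fp / (fp + tn)                    # forgeries the system accepts
    acc = (tp + tn) / (tp + fn + fp + tn)   # overall correct decisions
    return frr, far, acc

# Example: 90/100 genuine pairs accepted, 95/100 forged pairs rejected.
print(verification_metrics(90, 10, 5, 95))  # -> (0.1, 0.05, 0.925)
```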

Baselines

As shown in Table 4, this study compares the proposed model with several writer-independent methods (SigNet31, MMDANet30, Ensemble Learning35, CBAM-SigNet42, ISNN43, and DeepHSV44) and three writer-dependent models (Fusion of HTF6, ISNN-Focal36, and Texture Feature45). These methods have all achieved notable results in handwriting verification.

Table 4.

Descriptions of all involved models.

Model Description
SigNet Introduced the convolutional Siamese network for signature verification in 2017; it is widely used for feature learning
MMDANet An improved Siamese network model based on a multi-scale and mixed-domain attention mechanism, proposed in 2023
CBAM-SigNet An improvement on SigNet that combines the Siamese network with the Convolutional Block Attention Module to extract handwritten signature features, proposed in 2022
Ensemble learning A writer-independent offline signature recognition method based on ensemble learning over deep features, proposed in 2019
Texture feature A signature verification algorithm based on texture features, proposed in 2019; it also performs well on Indic-script signatures
Fusion of HTF A signature authentication method introduced in 2020 that fuses hybrid texture features, using discrete wavelet and local quantized pattern characteristics
DeepHSV A user-independent offline signature verification method using a two-channel CNN, proposed in 2020
ISNN An improved Siamese neural network model proposed in 2022

The implementation is built on PyTorch 1.3.1 and runs on an NVIDIA 2080Ti graphics processing unit. We use a stochastic gradient-based optimizer with a batch size of 32 and a learning rate of 1e−5. For λ in Eq. (12), candidate values from 0.5 to 5 in increments of 0.5 were initially explored, and the experimental results were compared with the benchmark models. Furthermore, the images are denoised while preserving background information, which enhances the feature extraction capability of the model.

Comparison with state-of-the-art models

In the results tables, our study is classified as writer-independent (WI), meaning the model is not tied to any particular writer8; a number of comparison methods are marked writer-dependent (WD), such as31. Writer-independent methods learn a single model for all writers, whereas writer-dependent methods learn a model for each writer. Although writer-dependent methods can achieve high recognition rates, they place strong requirements on individual characteristics, need a set of model parameters trained for each person, and are therefore not well suited to real-world environments. In this paper, we adopt a writer-independent approach: the model is trained on a common dataset, and the learned parameters are continually optimized and updated to form the final parameter set. The performance of the proposed model is compared with existing models in Tables 5, 6, and 7 below. The results show that recognition is best when λ = 3. Given that negative samples far outnumber positive samples in most datasets, the focal loss function is used in this study to better balance the two classes. The experimental results demonstrate that our method improves accuracy and achieves favorable FRR and FAR statistics, which represents a significant advance in handwriting verification.

Table 5.

Comparison on CEDAR dataset (%).

Method Type FRR FAR ACC
Fusion of HTF WI 12.39 11.23 88.19
MMDANet WI 8.33 8.33 91.67
CBAM-SigNet WD 9.36 7.84 92.16
Ensemble learning WI 8.48 7.88 92.00
ISNN WI 6.87 4.20 95.66
Our method WI 6.81 1.15 97.02

Significant values are in bold.

Table 6.

Comparison of BHSig-Bengali and BHSig-Hindi dataset (%).

BHSig-Bengali BHSig-Hindi
Method Type FRR FAR ACC FRR FAR ACC
MMDANet WD 8.21 8.43 91.57 9.18 9.53 90.47
CBAM-SigNet WD 5.94 8.37 92.84 7.88 19.52 86.32
SigNet WI 13.89 13.89 86.11 15.36 15.36 84.64
ISNN WI 14.25 6.41 90.64 12.29 9.6 88.98
DeepHSV WI 11.92 11.92 88.08 13.34 13.34 86.66
Our method WI 11.97 4.37 94.42 11.14 6.71 91.76

Significant values are in bold.

Table 7.

Comparison on Chinese Dataset (%).

Method Type FRR FAR Acc
MMDANet WI 32.4 32.4 68.88
CBAM-SigNet WI 34.32 30.77 64.79
SigNet WI 42.36 42.36 57.64
DeepHSV WI 41.87 41.87 58.13
ISNN WI 32.18 30.59 70.31
Our method WI 26.52 22.73 75.44

Significant values are in bold.

Contrastive analysis

Compared with several existing methods, the proposed model achieves superior accuracy. On the CEDAR dataset it attains an FRR of 6.81%, an FAR of 1.15%, and an accuracy of 97.02%, surpassing the existing models on every evaluation index. Our method also achieves superior accuracy on the BHSig-Bengali and BHSig-Hindi signature databases, at 94.42% and 91.76% respectively, significantly outperforming the other methods. Moreover, the writer-independent approach produces better results than the writer-dependent approaches.

This work proposes a signature verification algorithm based on a spatial transformer network that reconstructs the spatial position of the handwriting. The spatial transformation module reflects the signature's handwriting style by reconstructing spatial positions and automatically focusing on signature features. Importantly, this research is the first to use a Chinese signature dataset drawn from real-life cases. The findings hold significant guiding implications for improving and refining the theory and practice of forensic handwriting examination. In practical scenarios, handwriting samples are often highly similar and difficult to distinguish. Furthermore, because Chinese handwriting differs markedly from Latin script, incidental features of the handwriting exert a strong influence on its characteristics: Chinese character signatures contain starting and ending strokes, intersecting strokes, and continuous strokes, each corresponding to specific, fixed stroke patterns. This work therefore not only advances Chinese handwriting verification technology but also contributes to the authenticity testing of Chinese handwritten signatures and to the development of related theory and technology.

Process visualization

The identification errors found in the experiments often come from signatures that are highly imitative or contain excess handwriting noise. Since the Chinese dataset used in this paper comes from real judicial authentication cases, where each sample is a highly similar forgery or an unusual genuine signature, accuracy on the Chinese dataset is lower than on the other datasets. Figure 6 illustrates the feature extraction process used to analyze the characteristics of the signature image. As shown in the figure, in the early stages of training the network mainly learns texture features of the handwritten images, as depicted in Fig. 6(2–4). As training continues, the learned features become more abstract and begin to represent the unique characteristics of the handwriting style. The spatial transformation module helps focus on important areas of the signature image, reducing noise and eliminating unnecessary information. It performs distortion-free transformations on the initial image, preserving the overall structure and content of the signature, which allows more accurate and reliable extraction of subtle handwriting features.
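As a toy illustration of the sampling step inside a spatial transformer (a simplified pure-Python sketch, not the authors' network code), an affine grid generator plus bilinear sampler can be written as follows; identity parameters leave the image unchanged, matching the distortion-free behaviour described above:

```python
def affine_sample(img, theta):
    """Warp a 2-D image with an affine map using bilinear interpolation.

    theta = [[a, b, tx], [c, d, ty]] maps each output pixel (x, y) to a
    source coordinate, as in a spatial transformer's grid generator and
    sampler.  Out-of-range samples read as 0.
    """
    h, w = len(img), len(img[0])

    def pixel(y, x):
        if 0 <= y < h and 0 <= x < w:
            return img[y][x]
        return 0.0

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            sy = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            x0, y0 = int(sx // 1), int(sy // 1)   # floor to grid cell
            dx, dy = sx - x0, sy - y0             # fractional offsets
            out[y][x] = ((1 - dx) * (1 - dy) * pixel(y0, x0)
                         + dx * (1 - dy) * pixel(y0, x0 + 1)
                         + (1 - dx) * dy * pixel(y0 + 1, x0)
                         + dx * dy * pixel(y0 + 1, x0 + 1))
    return out

# Identity transform: output equals input.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```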

Fig. 6. (1) Original signature images, (2–4) feature extraction in process, (5) signatures after STN.

Conclusions

Given the subtle differences between genuine signatures and skilled forgeries, and the strict conditions under which signatures are written, signature verification algorithms should attend to both literal stroke features and abstract features during recognition. Building on this, a two-stage Siamese network method for handwriting verification is proposed, together with a handwritten image reconstruction method based on spatial transformation and the adoption of focal loss. The proposed method exhibits strong recognition capability on samples from four languages and performs notably well on Chinese samples. In comparison to existing methods, this approach significantly improves prediction accuracy. Moreover, further in-depth analysis of Chinese character fonts and handwritten stroke features can be conducted to enhance the accuracy of offline Chinese handwriting verification.

Author contributions

W. Xiao and H. Wu: conceptualization, methodology. W. Xiao: reviewing and editing. H. Wu: data curation, software. W. Xiao: writing, original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported in part by the Youth Project of Science and Technology Project of Chongqing Education Commission (KJQN202100304), Chongqing, China.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Wanghui Xiao, Email: xiaowanghui007@126.com.

Hao Wu, Email: haowuf@gmail.com.

References

  • 1. Vincent, C., David, B., Florian, H., Andreas, M. & Elli, A. Writer identification using GMM supervectors and exemplar SVMs. Patt. Recognit. 63, 258–267 (2017).
  • 2. Chen, M., He, C. & Luo, X. MNL: A highly-efficient model for large-scale dynamic weighted directed network representation. IEEE Trans. Big Data. 10.1109/TBDATA.2022.3218064 (2022).
  • 3. Khan, F., Tahir, M. & Khelifi, F. Robust offline text-independent writer identification using bagged discrete cosine transform features. Expert Syst. Appl. 71, 404–415 (2017).
  • 4. Wei, L., Jin, L. & Luo, X. A robust coevolutionary neural-based optimization algorithm for constrained nonconvex optimization. IEEE Trans. Neural Netw. Learn. Syst. 10.1109/TNNLS.2022.3220806 (2022).
  • 5. Li, H., Wei, P. & Hu, P. AVN: An adversarial variation network model for handwritten signature verification. IEEE Trans. Multim. 24, 594–608. 10.1109/TMM.2021.3056217 (2021).
  • 6. Lu, X., Huang, L. & Yin, F. Cut and compare: End-to-end offline signature verification network. In 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 3589–3596. 10.1109/ICPR48806.2021.9412377 (2020).
  • 7. Jain, A., Singh, S. K. & Singh, K. P. Handwritten signature verification using shallow convolutional neural network. Multimedia Tools Appl. 79, 19993–20018 (2020).
  • 8. Zhang, J. et al. Water body detection in high-resolution SAR images with cascaded fully-convolutional network and variable focal loss. IEEE Trans. Geosci. Remote Sens. 59, 316–332 (2021).
  • 9. Luo, X., Wu, H. & Li, Z. NeuLFT: A novel approach to nonlinear canonical polyadic decomposition on high-dimensional incomplete tensors. IEEE Trans. Knowl. Data Eng. 10.1109/TKDE.2022.3176466 (2022).
  • 10. Zois, E. N., Alewijnse, L. & Economou, G. Offline signature verification and quality characterization using poset-oriented grid features. Patt. Recognit. 54, 162–177 (2016).
  • 11. Bharathi, R. K. & Shekar, B. H. Offline signature verification based on chain code histogram and support vector machine. In Proceedings of the International Conference on Advances in Computing, Communications, and Informatics (ICACCI), Mysore, India, 2063–2068 (2013).
  • 12. Maergner, P., Pondenkandath, V., Alberti, M., Liwicki, M. & Riesen, K. Combining graph edit distance and triplet networks for offline signature verification. Patt. Recognit. Lett. 125, 527–533 (2019).
  • 13. Wu, D., He, Q., Luo, X. & Zhou, M. A latent factor analysis-based approach to online sparse streaming feature selection. IEEE Trans. Syst. Man Cybern. Syst. 10.1109/TSMC.2021.3096065 (2021).
  • 14. Alaei, A., Pal, S., Pal, U. & Blumenstein, M. An efficient signature verification method based on an interval symbolic representation and a fuzzy similarity measure. IEEE Trans. Inform. Forens. Security 10(12), 2360–2372 (2017).
  • 15. Luo, X., Zhou, Y., Liu, Z. & Zhou, M. Generalized Nesterov's acceleration-incorporated, non-negative and adaptive latent factor analysis. IEEE Trans. Serv. Comput. 15(5), 2809–2823 (2022).
  • 16. Jain, A., Singh, S. K. & Singh, K. P. Multi-task learning using GNet features and SVM classifier for signature identification. IET Biom. 2, 117–126 (2021).
  • 17. Liu, Z., Yuan, G. & Luo, X. Symmetry and nonnegativity-constrained matrix factorization for community detection. IEEE/CAA J. Automat. Sinica 9(9), 1691–1693 (2022).
  • 18. Sharif, M., Khan, M., Faisal, M., Yasmin, M. & Fernandes, S. L. A framework for offline signature verification system: Best features selection approach. Patt. Recognit. 139, 50–59 (2020).
  • 19. Khan, F., Tahir, M. & Khelifi, F. Novel geometric features for offline writer identification. Patt. Anal. Appl. 19, 699–708 (2016).
  • 20. Bhunia, A. K., Alaei, A. & Roy, P. P. Signature verification approach using fusion of hybrid texture features. Neural Comput. Appl. 31, 8737–8748 (2019).
  • 21. Luo, X., Liu, Z., Shang, M., Lou, J. & Zhou, M. Highly-accurate community detection via pointwise mutual information-incorporated symmetric non-negative matrix factorization. IEEE Trans. Netw. Sci. Eng. 8(1), 463–476 (2021).
  • 22. Kumar, R., Sharma, J. D. & Chanda, B. Writer-independent offline signature verification using surroundedness feature. Patt. Recognit. Lett. 33, 301–308 (2012).
  • 23. Ruiz, V., Linares, I., Sanchez, A. & Velez, J. F. Offline handwritten signature verification using compositional synthetic generation of signatures and Siamese neural networks. Neurocomputing 374, 30–41 (2020).
  • 24. Malik, N. U. et al. Performance comparison between SURF and SIFT for content-based image retrieval. In 2019 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 214–218. 10.1109/ICSIPA45851.2019.8977732 (2019).
  • 25. Hu, J. & Chen, Y. Offline signature verification using real AdaBoost classifier combination of pseudo-dynamic features. In Proceedings of the International Conference on Document Analysis & Recognition, Washington, DC, USA, 1345–1349 (2013).
  • 26. Jain, A., Singh, S. K. & Singh, K. P. Signature verification using geometrical features and artificial neural network classifier. Neural Comput. Appl. 12, 6999–7010 (2020).
  • 27. Khalajzadeh, H., Mansouri, M. & Teshnehlab, M. Persian signature verification using convolutional neural networks. In International Journal of Engineering Research and Technology, Vol. 1. (ESRSA Publications, 2012).
  • 28. Hafemann, L., Sabourin, R. & Oliveira, L. Learning features for offline handwritten signature verification using deep convolutional neural networks. Patt. Recognit. 70, 163–176 (2017).
  • 29. Wei, P., Li, H. & Hu, P. Inverse discriminative networks for handwritten signature verification. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 5757–5765 (2019).
  • 30. Danilo, A., Manoochehr, J., Luigi, C., Alessio, F. & Marco, R. R-SigNet: Reduced space writer-independent feature learning for offline writer-dependent signature verification. Patt. Recognit. Lett. 150, 189–196 (2021).
  • 31. Dutta, A., Pal, U. & Lladós, J. Compact correlated features for writer-independent signature verification. In Proceedings of the International Conference on Pattern Recognition, Cancún, Mexico, 3422–3427 (2016).
  • 32. Li, L., Huang, L., Yin, F. & Chen, Y. Offline signature verification using a region-based deep metric learning network. Patt. Recognit. 118, 108009 (2021).
  • 33. Dey, S., Dutta, A., Toledo, J., Ghosh, S. K. & Pal, U. SigNet: Convolutional Siamese network for writer independent offline signature verification. arXiv preprint arXiv:1707.02131 (2017).
  • 34. Kalera, M. K., Srihari, S. & Xu, A. Offline signature verification and identification using distance statistics. Int. J. Patt. Recognit. Artif. Intell. 18(07), 1339–1360 (2004).
  • 35. Pal, S., Alaei, A., Pal, U. & Blumenstein, M. Performance of an off-line signature verification method based on texture features on a large Indic-script signature dataset. In 12th IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece, 72–77. 10.1109/DAS.2016.48 (2016).
  • 36. Ferrer, M. A., Diaz-Cabrera, M. & Morales, A. Static signature synthesis: A neuro-motor inspired approach for biometrics. IEEE Trans. Patt. Anal. Mach. Intell. 37(3), 667–680 (2015).
  • 37. Wu, H. et al. A PID-incorporated latent factorization of tensors approach to dynamically weighted directed network analysis. IEEE/CAA J. Automat. Sinica 9(3), 533–546 (2022).
  • 38. Ho, S. L., Yang, S., Yao, Y. & Fu, W. Robust optimization using a methodology based on cross-entropy methods. IEEE Trans. Magn. 47, 1286–1289 (2011).
  • 39. Xiao, W. & Wu, D. An improved Siamese network model for handwritten signature verification. In Proceedings of the 2021 IEEE International Conference on Networking, Sensing, and Control (ICNSC), Xiamen, China, 1, 1–6 (2021).
  • 40. Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. IEEE Trans. Patt. Anal. Mach. Intell. 42(2), 318–327 (2020).
  • 41. Das, S. D., Ladia, H., Kumar, V. & Mishra, S. Writer independent offline signature recognition using ensemble learning. In ICDSMLA 2019: Proceedings of the 1st International Conference on Data Science, Machine Learning and Applications (Lecture Notes in Electrical Engineering, 601), 1st edn. (Springer, Berlin/Heidelberg, Germany, 2020).
  • 42. Kumar, R., Kundu, L., Chanda, B. & Sharma, J. D. A writer-independent off-line signature verification system based on signature morphology. In Proceedings of the 1st International Conference on Intelligent Interactive Technologies and Multimedia, Allahabad, India (2010).
  • 43. Xiao, W. & Ding, Y. A two-stage Siamese network model for offline handwritten signature verification. Symmetry 14, 1216. 10.3390/sym14061216 (2022).
  • 44. Majidpour, J., Ozyurt, F., Abdalla, M. H., Chu, Y. M. & Alotaibi, N. D. Unreadable offline handwriting signature verification based on generative adversarial network using lightweight deep learning architectures. Fractals 31(6), 2340101. 10.1142/s0218348x23401011 (2023).
  • 45. Ren, J., Xiong, Y., Zhan, H. & Huang, B. 2C2S: A two-channel and two-stream transformer based framework for offline signature verification. Eng. Appl. Artif. Intell. 118, 105639. 10.1016/j.engappai.2022.105639 (2023).
