Abstract
Mammogram image analysis has benefited from advancements in artificial intelligence (AI), particularly through the use of Siamese networks, which, similar to radiologists, compare current and prior mammogram images to enhance diagnostic accuracy. One of the main challenges in employing Siamese networks for this purpose is selecting an effective distance function. Given the complexity of mammogram images and the high correlation between current and prior images, traditional distance functions in Siamese networks often fall short in capturing the subtle, non-linear differences between these correlated features. This study explores the impact of incorporating non-linear and correlation-sensitive distance functions within a Siamese network framework for analyzing paired mammogram images. We benchmarked different distance functions, including Euclidean, Manhattan, Mahalanobis, Radial Basis Function (RBF), and cosine, and introduced a novel combination of RBF with Matern Covariance. Our evaluation revealed that the RBF with Matern Covariance consistently outperformed other functions, emphasizing the importance of addressing non-linearity and correlation in this context. For instance, the ResNet50 model, when paired with this distance function, achieved an accuracy of 0.938, sensitivity of 0.921, precision of 0.955, specificity of 0.958, F1 score of 0.930, and AUC of 0.940. We observed similarly strong performance across other models as well. Furthermore, the robustness of our approach was confirmed through evaluation on a dataset of 30 cross-validation samples, demonstrating its generalizability. These findings underscore the effectiveness of non-linear and correlation-based distance functions in Siamese networks for improving the performance and generalization of mammogram image analysis. All code used in this paper is available at https://github.com/NabaviLab/Benchmarking_Distance_Functions_in_Siamese_Networks
Index Terms—siamese networks, distance functions, non-linearity, correlation, radial basis function
I. Introduction
Breast cancer, affecting both women and men, originates in the ducts or lobules of the breast [1], [2]. Symptoms include lumps, shape changes, or nipple discharge [3]. Early detection through regular mammograms greatly improves treatment success [4]. Mammography, along with clinical and self-exams, remains the most effective method for detecting early-stage tumors [5], [6]. Computer-aided detection (CAD) systems and recent advances in AI further enhance mammogram analysis, improving accuracy and aiding radiologists in diagnosis [7]–[9]. Siamese networks [10] have shown promise in mammogram analysis by comparing current and prior images, similar to how radiologists detect subtle changes over time. This can enhance the accuracy of breast cancer diagnosis. However, challenges remain, such as selecting effective distance functions—traditional measures like Euclidean or Manhattan often fail to capture complex non-linear relationships in mammograms. Additionally, the need for large annotated datasets, variability in imaging protocols, and potential biases hinder the broader adoption of AI-based mammogram analysis [11].
Recent AI models improve the analysis of mammogram images by integrating patient history and examining changes in images over time. In this context, Lee et al. [12] introduced a novel technique called PRIME+ to enhance the accuracy of breast cancer risk prediction. The method incorporates prior images and utilizes a transformer decoder. Their findings exhibited superior performance compared to the state-of-the-art risk prediction technique, which relies solely on mammograms taken at a single moment in time. Moreover, Park et al. [13] designed a deep learning algorithm that leverages prior screening examinations to enhance diagnostic accuracy. Their model is constructed so that it can utilize prior examinations to aid in making a diagnosis. More precisely, their models require two screening exams as input, with each exam consisting of four images. The model then generates predictions for the presence of benign or malignant findings in the more recent exam for each corresponding image pair. Furthermore, Bai et al. [14] designed a Siamese-based network [10] for predicting cancer probabilities with high-resolution mammograms. Their model, using one-shot learning, concatenated two distance measures, Euclidean and point-wise. Their model outperformed various models, such as the original Siamese network [15].
While the Siamese network has been studied in mammogram image analysis, the impact of non-linearity and correlation between feature maps on the network’s performance has not been fully explored. A thorough examination of these factors is essential for enhancing the accuracy and robustness of mammogram image analysis systems. In this paper, we address these gaps by analyzing and evaluating commonly used distance functions, and also introducing a new one within the Siamese network [10] framework. Building on the context outlined above, our main contributions are:
We propose and implement non-linear distance functions, specifically the Radial Basis Function (RBF), within the Siamese network to better capture complex relationships in breast cancer image data.
We introduce a novel correlation-based distance function that combines RBF with Matern covariance to account for both statistical dependencies and nonlinear relationships between feature maps.
We benchmark a mixture of commonly used non-linear and linear distance functions for the Siamese network to provide the most effective similarity measurement for complex and correlated medical images.
We evaluate the generalizability of the distance functions by applying cross-validation on a completely different dataset that was not used during training.
II. Methods and Datasets
Our method introduces a Siamese network [10] classification framework with four primary steps: i) preprocessing to prepare and enhance the mammogram images; ii) a backbone model consisting of two identical (twin) networks to extract relevant features from the current and prior images; iii) a distance function to compare the features extracted by the backbone model and measure the similarity or dissimilarity between them; and iv) a classification layer that learns the differences between feature maps (inter-image features) for final classification. In this section, we discuss the theoretical foundations of these steps and explore the impact of non-linearities and correlations on the distance function. Figure 1 and Figure 2 show a flowchart of the proposed method.
Fig. 1: Flowchart illustrating the steps of the proposed method, where $f_c$ and $f_p$ are the feature vectors extracted from the current and prior images, respectively.
Fig. 2: Flowchart showing the steps of applying different distance functions under two scenarios: (a) with and (b) without fully connected (FC) layers, where $\Sigma$ denotes the summation of the neuron outputs and $\sigma$ is the sigmoid function.
A. Preprocessing
1). Cropping:
Our preprocessing, as indicated in Figure 1 (step 1), begins with cropping the extra black background and zero-padding the result to obtain square images, preserving essential information and avoiding deformation. Let $I \in \mathbb{R}^{H \times W}$ represent the image matrix. We use binary vectors $r \in \{0,1\}^{H}$ and $c \in \{0,1\}^{W}$ to indicate the presence of the object in each row and column, respectively: $r_i = \mathbb{1}\!\left[\sum_{j} I_{i,j} > 0\right]$ for all $i$ and $c_j = \mathbb{1}\!\left[\sum_{i} I_{i,j} > 0\right]$ for all $j$. To determine the bounding-box coordinates, we define $y_{\min}$, $y_{\max}$, $x_{\min}$, and $x_{\max}$ as:

$$y_{\min} = \min\{i : r_i = 1\}, \quad y_{\max} = \max\{i : r_i = 1\}, \quad x_{\min} = \min\{j : c_j = 1\}, \quad x_{\max} = \max\{j : c_j = 1\} \tag{1}$$

Then, we consider the cropped image as $I_c = I[\,y_{\min}\!:\!y_{\max},\; x_{\min}\!:\!x_{\max}\,]$. Let $w$ and $h$ be the width and height of the cropped image, respectively. To ensure the image is square, padding is added to either side based on the position of the object's center in the x-dimension, $x_{\text{center}}$, which is calculated as $(x_{\min} + x_{\max})/2$:

$$(p_{\text{left}},\; p_{\text{right}}) =
\begin{cases}
(d,\, 0) & \text{if } x_{\text{center}} > W/2,\\
(0,\, d) & \text{otherwise},
\end{cases} \tag{2}$$

here, $d$ is defined as $d = |h - w|$. This padding ensures that the resulting image, denoted $I_s$, remains square, which is important for maintaining consistency during subsequent classification and analysis.
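A minimal NumPy sketch of this cropping-and-padding step is given below; the function name `crop_and_square`, the background threshold, and the assumption that the cropped breast region is taller than it is wide are illustrative choices, not taken from the released code.

```python
import numpy as np

def crop_and_square(image: np.ndarray, bg_thresh: int = 10) -> np.ndarray:
    """Crop the black background and zero-pad to a square, following Eqs. (1)-(2)."""
    mask = image > bg_thresh                         # foreground (breast) pixels
    rows, cols = mask.any(axis=1), mask.any(axis=0)  # r and c indicator vectors
    y_min, y_max = np.where(rows)[0][[0, -1]]        # Eq. (1): bounding-box rows
    x_min, x_max = np.where(cols)[0][[0, -1]]        # Eq. (1): bounding-box columns
    cropped = image[y_min:y_max + 1, x_min:x_max + 1]

    h, w = cropped.shape
    d = abs(h - w)                                   # padding needed to reach a square
    x_center = (x_min + x_max) / 2
    # Eq. (2): pad on the side away from the breast, based on the object's center.
    pads = ((0, 0), (d, 0)) if x_center > image.shape[1] / 2 else ((0, 0), (0, d))
    return np.pad(cropped, pads, mode="constant", constant_values=0)
```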
2). Removing Artifacts:
To remove artifacts, such as extraneous texts and notations that are often present in medical images, from the padded image $I_s$, we apply a binary threshold to create a binary image $B$, and then invert it as $\bar{B} = 1 - B$. After that, we define $v_c$ and $v_r$ as vectors of non-zero column and row counts, i.e., $v_{c,j} = \sum_{i} \bar{B}_{i,j}$ and $v_{r,i} = \sum_{j} \bar{B}_{i,j}$. We identify the first and last non-zero indices in $v_c$ and $v_r$ as:

$$x_1 = \min\{j : v_{c,j} > 0\}, \quad x_2 = \max\{j : v_{c,j} > 0\}, \quad y_1 = \min\{i : v_{r,i} > 0\}, \quad y_2 = \max\{i : v_{r,i} > 0\} \tag{3}$$

Next, we crop the image to the bounding box as follows:

$$I_b = I_s[\,y_1\!:\!y_2,\; x_1\!:\!x_2\,] \tag{4}$$

We then invert $I_b$ and find contours on the binarized image. We identify the largest contour $C_{\max}$ by creating a mask $M$ of zeros and drawing $C_{\max}$ on $M$. We then apply the mask to the cropped image to obtain the artifact-free image $I_a$ as $I_a = I_b \odot M$. These preprocessing steps ensure a consistent and artifact-free dataset suitable for our model training.
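A hedged OpenCV/NumPy sketch of this artifact-removal step, assuming an 8-bit grayscale input; the threshold value and the function name `remove_artifacts` are illustrative assumptions, not taken from the released code.

```python
import cv2
import numpy as np

def remove_artifacts(image: np.ndarray, thresh: int = 10) -> np.ndarray:
    """Keep only the largest bright region (the breast), masking out text and labels."""
    # Binary threshold: breast tissue and annotations become white, background black.
    _, binary = cv2.threshold(image, thresh, 255, cv2.THRESH_BINARY)

    # Crop to the bounding box of all non-zero pixels (Eqs. (3)-(4)).
    ys, xs = np.nonzero(binary)
    y1, y2, x1, x2 = ys.min(), ys.max(), xs.min(), xs.max()
    cropped = image[y1:y2 + 1, x1:x2 + 1]
    binary = binary[y1:y2 + 1, x1:x2 + 1]

    # Find contours and keep only the largest one (the breast region).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)
    mask = np.zeros_like(binary)
    cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)

    # Apply the mask: pixels outside the breast contour are set to zero.
    return cv2.bitwise_and(cropped, cropped, mask=mask)
```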
B. Backbone Network
For our classification model, we use the ResNet [16] and VGG [17] architectures, shown in Figure 1 (Step 2), as the backbone network. As described in [14], to address the challenges of training deep neural networks on limited medical image datasets, we utilize transfer learning. We start with ResNet [16] and VGG [17] models pre-trained on the ImageNet dataset [18], then fine-tune the backbone network using publicly available mammogram datasets that include only current images. The model receives high-resolution mammogram images (1024 × 1024 pixels) as input. To apply ResNet50 to these 1024 × 1024 pixel images, we modify its architecture by removing the global average pooling layer and the fully connected (dense) layer that maps to the 1000 ImageNet classes. These are replaced with two new fully connected (FC) layers: the first with 512 neurons and the second with 256 neurons, both using ReLU activation functions [19]. The output layer consists of a single neuron with sigmoid activation for binary classification, predicting the probability of malignancy in the input mammogram. This architecture is used as the backbone for the twin networks in the Siamese model, as shown in Figure 1 (Step 2). These identical networks process the current and prior mammogram images independently, extracting feature vectors $f_c$ from the current image and $f_p$ from the prior image, which are then compared using a distance function. Table I shows the training configuration for the backbone network.
TABLE I:
Backbone Network Hyperparameter’s Configuration
| Hyperparameter | Value |
|---|---|
| Input Image Size | 1024 × 1024 pixels |
| Batch Size | 16 |
| Learning Rate | Adjusted using a cosine learning rate scheduler |
| Dropout Rate | Within the range of 0.2 to 0.6 in the FC layers |
| Optimizer | ADAM optimizer with standard parameters |
| Regularization | L1 and L2 regularization |
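As a concrete illustration of the backbone modification described above, the following PyTorch/torchvision sketch replaces the ResNet50 ImageNet head with the new 512- and 256-neuron FC layers and a sigmoid output; the specific dropout value and the use of `nn.LazyLinear` to infer the flattened feature size are our assumptions, since these details are not fully specified above.

```python
import torch.nn as nn
from torchvision import models

def build_backbone() -> nn.Module:
    """Sketch of the modified ResNet50 backbone: the global average pooling and
    1000-class FC layer are removed and replaced with 512/256-neuron FC layers
    and a single sigmoid neuron predicting the probability of malignancy."""
    resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Keep the convolutional feature extractor (everything before avgpool/fc).
    features = nn.Sequential(*list(resnet.children())[:-2])
    head = nn.Sequential(
        nn.Flatten(),
        nn.LazyLinear(512), nn.ReLU(), nn.Dropout(0.4),   # dropout within the 0.2-0.6 range of Table I
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Sigmoid(),                  # binary malignancy probability
    )
    return nn.Sequential(features, head)
```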
C. Distance Function
The original Siamese network architecture, as proposed by [10], typically employs Euclidean distance to compare the feature representations extracted from pairs of images. However, in this study, we explore a broader range of distance functions to determine their effectiveness in the context of mammogram image analysis. Specifically, we benchmark several distance functions, including Euclidean, Manhattan, Mahalanobis [20], RBF [21], cosine [22], and RBF combined with Matern Covariance [23]. Additionally, inspired by the approach in [14], we enhance these distance metrics by incorporating point-wise differences between $f_c$ and $f_p$, as shown in Figure 2 (Step 3). This combination of point-wise distances with the selected distance metrics allows us to capture localized variations between the current-year and prior-year images, providing a more comprehensive distance feature for classification.
1). Mahalanobis Distance Function:
The Mahalanobis distance [20] between two feature vectors $f_c$ and $f_p$ is given by:

$$d_M(f_c, f_p) = \sqrt{(f_c - f_p)^{\top}\, \Sigma^{-1}\, (f_c - f_p)} \tag{5}$$

where $\Sigma$ is the covariance matrix of the feature vectors. To ensure numerical stability, we regularize the covariance matrix by adding a small $\epsilon$ to its diagonal as $\Sigma \leftarrow \Sigma + \epsilon I$. To efficiently compute the Mahalanobis distance, we can use Cholesky decomposition [24]. Cholesky decomposition is a factorization of a positive-definite matrix into the product of a lower triangular matrix and its transpose as $\Sigma = L L^{\top}$, where $L$ is a lower triangular matrix. Using the Cholesky decomposition, we can reformulate the distance computation as follows:

$$d_M(f_c, f_p) = \sqrt{(f_c - f_p)^{\top}\, (L L^{\top})^{-1}\, (f_c - f_p)} = \left\lVert L^{-1}(f_c - f_p) \right\rVert_2 \tag{6}$$

In this case, instead of directly computing the inverse of the covariance matrix, we solve the linear system $L z = f_c - f_p$. This step leverages forward substitution, which is more numerically stable and efficient than matrix inversion. Specifically, for the lower triangular matrix $L$, we solve for the vector $z$:

$$L z = f_c - f_p \tag{7}$$

The forward substitution process involves solving for each element iteratively:

$$z_i = \frac{1}{L_{ii}}\left( (f_c - f_p)_i - \sum_{k=1}^{i-1} L_{ik}\, z_k \right) \tag{8}$$

In the end, the Mahalanobis distance is obtained by computing the Euclidean norm of the transformed vector $z$:

$$d_M(f_c, f_p) = \lVert z \rVert_2 \tag{9}$$
This approach converts Mahalanobis distance calculations into matrix factorizations and linear system solutions, improving computational efficiency and numerical stability. By considering feature correlations and adjusting for variance differences, the Mahalanobis distance provides more precise measurements, making it a reliable and meaningful metric for tasks involving significant feature interactions.
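A minimal SciPy sketch of Eqs. (5)-(9), computing the Mahalanobis distance through Cholesky factorization and triangular solves rather than an explicit matrix inverse; the function name and the regularization constant are illustrative.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def mahalanobis_distance(f_c: np.ndarray, f_p: np.ndarray,
                         cov: np.ndarray, eps: float = 1e-6) -> float:
    """Mahalanobis distance via Cholesky factorization and forward substitution."""
    sigma = cov + eps * np.eye(cov.shape[0])        # regularize the covariance (Eq. (5))
    L = cholesky(sigma, lower=True)                 # Sigma = L L^T (Eq. (6))
    diff = f_c - f_p
    z = solve_triangular(L, diff, lower=True)       # forward substitution: L z = diff (Eqs. (7)-(8))
    return float(np.linalg.norm(z))                 # d_M = ||z||_2 (Eq. (9))
```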
2). Cosine Distance:
Cosine similarity [22] is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Formally, given two feature vectors $f_c$ and $f_p$, the cosine similarity is defined as:

$$\text{CosSim}(f_c, f_p) = \frac{f_c \cdot f_p}{\lVert f_c \rVert \, \lVert f_p \rVert} \tag{10}$$

where $f_c \cdot f_p$ denotes the dot product of $f_c$ and $f_p$, and $\lVert f_c \rVert$ and $\lVert f_p \rVert$ are the Euclidean norms of the vectors $f_c$ and $f_p$. The dot product of the feature vectors is computed as follows:

$$f_c \cdot f_p = \sum_{i=1}^{n} f_{c,i}\, f_{p,i} \tag{11}$$

where $i$ is the index of the components of the feature vectors, and $n$ is the dimensionality of the feature vectors. The magnitudes (norms) are determined by:

$$\lVert f_c \rVert = \sqrt{\sum_{i=1}^{n} f_{c,i}^{2}}, \qquad \lVert f_p \rVert = \sqrt{\sum_{i=1}^{n} f_{p,i}^{2}} \tag{12}$$

Using the dot product and magnitudes, the cosine similarity is given by:

$$\text{CosSim}(f_c, f_p) = \frac{\sum_{i=1}^{n} f_{c,i}\, f_{p,i}}{\sqrt{\sum_{i=1}^{n} f_{c,i}^{2}}\,\sqrt{\sum_{i=1}^{n} f_{p,i}^{2}}} \tag{13}$$

Finally, the cosine distance [22] is derived as $1 - \text{CosSim}(f_c, f_p)$. The cosine distance, as one of the distance functions in our Siamese model, enhances the model's ability to measure similarity by focusing on the angle between the feature vectors extracted from the current and prior images. This approach makes the model less sensitive to variations in the overall magnitude of the features and more sensitive to the relative orientation and alignment of patterns within the images.
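A short NumPy sketch of Eqs. (10)-(13); the small epsilon added to the denominator is our own guard against zero-norm vectors.

```python
import numpy as np

def cosine_distance(f_c: np.ndarray, f_p: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine distance: 1 minus the cosine of the angle between the feature vectors."""
    dot = np.dot(f_c, f_p)                               # Eq. (11)
    norms = np.linalg.norm(f_c) * np.linalg.norm(f_p)    # Eq. (12)
    cos_sim = dot / (norms + eps)                        # Eq. (13)
    return float(1.0 - cos_sim)
```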
3). RBF Kernel:
The RBF kernel [21] is a commonly used method to measure similarity in high-dimensional spaces, such as in image recognition, function approximation, and support vector machines for classification tasks. The RBF kernel is defined as:

$$K_{\text{RBF}}(f_c, f_p) = \exp\!\left(-\gamma\, \lVert f_c - f_p \rVert^{2}\right) \tag{14}$$

where $\gamma$ is a parameter that controls the spread of the kernel and is set to 0.5 in our experiments. The use of the RBF kernel in a Siamese network has been shown to be effective in measuring distances in the semantic space between images, as demonstrated by Schlagenhauf et al. [25], who combined Siamese networks with RBF networks to perform classification without the need for pretraining. However, while the RBF kernel is effective in high-dimensional space, it assumes a uniform level of smoothness across the entire space. This limitation can lead to less accurate distance measurements in scenarios where the smoothness of the data varies, such as in complex medical images.
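A one-function NumPy sketch of Eq. (14) with the spread parameter $\gamma = 0.5$ used here; the function name is illustrative.

```python
import numpy as np

def rbf_kernel(f_c: np.ndarray, f_p: np.ndarray, gamma: float = 0.5) -> float:
    """RBF kernel similarity (Eq. (14)) between two feature vectors."""
    sq_dist = np.sum((f_c - f_p) ** 2)    # squared Euclidean distance
    return float(np.exp(-gamma * sq_dist))
```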
4). RBF with Matern Covariance Function:
To address the limitations of the RBF kernel, we introduce a novel distance function that combines the RBF kernel with the Matern covariance function [23]. The Matern covariance function is a generalization of the RBF kernel that introduces an additional parameter to control the smoothness of the function, making it more adaptable to variations in data smoothness. The Matern kernel is defined as:

$$K_{\text{Matern}}(f_c, f_p) = \frac{2^{\,1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\, \lVert f_c - f_p \rVert}{\ell} \right)^{\!\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}\, \lVert f_c - f_p \rVert}{\ell} \right) \tag{15}$$

where $\nu$ is the smoothness parameter (set to 1.5), $\ell$ is the length-scale parameter (set to 1), $K_{\nu}$ is the modified Bessel function of the second kind [26], and $\Gamma$ is the Gamma function [26]. While the RBF kernel enhances the importance of nearby similarities by exponentially amplifying differences, the Matern covariance function further improves its flexibility by incorporating a parameter for smoothness. This combination allows the distance function to adjust to different levels of data smoothness and provides a more nuanced measure of similarity, particularly in complex, real-world data such as mammogram images. To integrate $K_{\text{RBF}}$ and $K_{\text{Matern}}$ into a single distance metric, we define a hybrid distance function as follows:

$$d_{\text{hybrid}}(f_c, f_p) = \alpha\, K_{\text{RBF}}(f_c, f_p) + (1 - \alpha)\, K_{\text{Matern}}(f_c, f_p) \tag{16}$$

where $\alpha$ is a weighting parameter that balances the contributions of the RBF and Matern covariance functions. This hybrid approach leverages the strengths of the RBF kernel in high-dimensional spaces while overcoming its limitation of assuming uniform smoothness, making the distance metric more adaptable to the intricate patterns found in medical images.
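A hedged SciPy sketch of Eqs. (15)-(16); the weight $\alpha = 0.5$ is an assumed value (the weight actually used is not reported above), and the function names are illustrative.

```python
import numpy as np
from scipy.special import kv, gamma as gamma_fn

def matern_kernel(f_c: np.ndarray, f_p: np.ndarray,
                  nu: float = 1.5, length_scale: float = 1.0) -> float:
    """Matern covariance (Eq. (15)) using the modified Bessel function of the
    second kind K_nu and the Gamma function."""
    r = np.linalg.norm(f_c - f_p)
    if r == 0.0:
        return 1.0                                  # Matern kernel value at zero distance
    scaled = np.sqrt(2.0 * nu) * r / length_scale
    return float((2.0 ** (1.0 - nu) / gamma_fn(nu)) * (scaled ** nu) * kv(nu, scaled))

def hybrid_distance(f_c: np.ndarray, f_p: np.ndarray,
                    alpha: float = 0.5, gamma: float = 0.5) -> float:
    """Weighted RBF + Matern combination (Eq. (16))."""
    rbf = np.exp(-gamma * np.sum((f_c - f_p) ** 2))  # Eq. (14)
    return float(alpha * rbf + (1.0 - alpha) * matern_kernel(f_c, f_p))
```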
5). Point-Wise Distance:
In addition to calculating a scalar distance value $d$ using the metrics explained above, we also compute a point-wise distance vector $d_{pw}$, which captures localized variations between the current and prior images. Point-wise subtraction is particularly effective because it allows the model to detect subtle, pixel-level differences that may not be captured by more global distance metrics. This approach enhances the model's ability to identify small but clinically significant changes, such as the growth of a tumor or the appearance of new microcalcifications, which are critical for accurate diagnosis in mammogram image analysis. To calculate $d_{pw}$, we perform an element-wise subtraction of the feature vector $f_p$ from $f_c$ as

$$d_{pw} = f_c - f_p \tag{17}$$

The output $d_{pw}$ is a vector of the same size as the input feature vectors $f_c$ and $f_p$. Upon the calculation of $d$ and $d_{pw}$, the final distance feature is formed by concatenating the scalar distance with the point-wise distance vector, defining the final distance vector as:

$$D_f = \left[\, d \;\Vert\; d_{pw} \,\right] \tag{18}$$

where $\Vert$ denotes the concatenation of $d$ and $d_{pw}$.
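A minimal NumPy sketch of Eqs. (17)-(18), assuming the scalar distance has already been computed by one of the metrics above; the function name `distance_feature` is illustrative, not from the released code.

```python
import numpy as np

def distance_feature(d_scalar: float, f_c: np.ndarray, f_p: np.ndarray) -> np.ndarray:
    """Build the final distance vector D_f = [d || d_pw]."""
    d_pw = f_c - f_p                              # Eq. (17): element-wise subtraction
    return np.concatenate(([d_scalar], d_pw))     # Eq. (18): concatenation
```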
6). Normalization and Final Considerations:
To ensure consistency between the various distance functions used in our study, we apply a normalization step to each distance metric. Specifically, after calculating each distance value (Euclidean, Manhattan, Mahalanobis, RBF, cosine, or RBF with Matern covariance), we normalize it using min-max scaling. This approach maps each distance to a range of [0, 1], ensuring that all distance metrics contribute equally to the final analysis, regardless of their original scale or distribution. This normalization step is crucial for maintaining consistency across different distance functions, allowing for a fair and unbiased comparison in our Siamese network-based analysis. By standardizing the distances in this manner, we ensure that the combined distance feature $D_f$ accurately reflects the similarities and differences between feature vectors. In the end, we set the similarity decision threshold to 0.5, as values closer to 0 indicate similarity (normal) and values closer to 1 indicate dissimilarity (abnormal).
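For completeness, a minimal sketch of the min-max scaling applied to a batch of scalar distances; performing the scaling batch-wise is our assumption, since the text only states that each distance is mapped to [0, 1].

```python
import numpy as np

def min_max_normalize(distances: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Min-max scaling of scalar distances to [0, 1]; values near 0 indicate
    similarity (normal) and values near 1 indicate dissimilarity (abnormal)."""
    d_min, d_max = distances.min(), distances.max()
    return (distances - d_min) / (d_max - d_min + eps)
```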
D. Evaluation of Siamese Network with Distance Vector
As shown in Figure 2 (Step 4), in our benchmarking, we use the distance vector $D_f$ as the input under two scenarios: i) feeding $D_f$ into an FC classification layer (Figure 2, Step 4(a)); and ii) feeding $D_f$ directly into a single neuron with sigmoid activation (Figure 2, Step 4(b)) [27].
1). FC Classification Layer:
The concatenated feature vector $D_f$ is fed into an FC layer. The FC layer, as described in [14], learns to map the combined features to a higher-level representation. This is followed by a sigmoid activation function [27] to predict the probability of abnormality:

$$\hat{y} = \sigma\!\left(W D_f + b\right) \tag{19}$$

where $W$ and $b$ are the weights and bias of the FC layer, respectively.
2). Classification Layer with Sigmoid Activation:
Alternatively, the concatenated feature vector $D_f$ is directly transformed into a probability score through a single neuron with a sigmoid activation function. This method simplifies the model by bypassing the need for an additional FC layer while still providing a probabilistic output. The prediction is given by $\hat{y} = \sigma(w^{\top} D_f + b)$, where $w$ and $b$ are the weights and bias of the single neuron. In this case, a higher output value (closer to 1) indicates abnormal, while a lower output value (closer to 0) indicates normal. Both methods allow us to evaluate the effectiveness of the combined feature vector in predicting abnormality, providing insights into the performance of more complex versus simpler model architectures. In the end, for our benchmarking, we used the same loss function as described in [14].
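A hedged PyTorch sketch of the two classification heads compared here; the hidden width of the FC head is an assumed value, since it is not reported above.

```python
import torch
import torch.nn as nn

class FCHead(nn.Module):
    """Scenario (a): D_f passes through an FC layer before the sigmoid output."""
    def __init__(self, dim: int, hidden: int = 128):   # hidden width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, d_f: torch.Tensor) -> torch.Tensor:
        return self.net(d_f)

class SingleNeuronHead(nn.Module):
    """Scenario (b): D_f feeds a single sigmoid neuron, y = sigmoid(w^T D_f + b)."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, d_f: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(d_f))
```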
E. Dataset
We used a combination of public and private datasets for training the backbone and Siamese network.
1). Backbone Dataset:
We pre-trained our backbone models (ResNet and VGG) using the following public datasets, which include only current patient images:
DDSM [28]: Comprises mammograms from 2,620 cases, including cancerous, benign, and normal findings, for studying computer-aided detection and diagnosis.
CMMD [29]: A collection of mammograms from Chinese women, supporting research in breast cancer detection, classification, and diagnosis.
BCS-DBT [30]: Contains 2D reconstructed images from 3D mammographic images for improving breast tissue visualization and early cancer detection.
2). Siamese Network Dataset:
We trained, tested, and validated the Siamese network using the following datasets, which include patient history images:
UCHC: A private dataset with 1,600 pairs of 2D full-field digital mammograms from the Radiology Department of UCHC.
EMBED [31]: The dataset consists of 3.4 million images from 110,000 patients, where 20% of the data are publicly available. This dataset ensures racial diversity. For our study, we used a subset of 150 image pairs of current and prior mammograms.
Breast Masses Dataset [32], [33]: Consists of 40 pairs of sequential mammograms from routine screenings, totaling 160 images. We used 30 pairs for cross-validation and evaluating the generalization of our models.
To maintain consistency across the datasets, we only considered the normal and abnormal classes for training, including 4,640 normal and 4,533 abnormal samples for the backbone model, along with 1,620 pairs of images for the Siamese network. The dataset was split into 70% for training, 15% for validation, and 15% for testing for both datasets used in the Siamese network. We applied preprocessing, as explained in Section II-A, to all datasets, resulting in a final resolution of 1024 × 1024 pixels. Hyperparameter tuning was conducted using the validation set, with optimal values including a learning rate of 0.001 with a step decay scheduler, a batch size of 32, and 50 epochs. These hyperparameters were used to optimize model performance.
III. Results and Discussion
In our benchmarking, we evaluate distance functions through three primary steps: i) We assess our distance functions across four different backbones; ii) We evaluate different distance functions by considering only and excluding FC layers; iii) Finally, we evaluate in the two scenarios: with and without FC layers. In this section, we discuss our results under various conditions, focusing on key metrics such as accuracy, sensitivity, precision, specificity, F1 score, AUC, and computation time in seconds (s). All experiments were conducted on Google Colab Pro+ [34] with an A100 GPU and high RAM.
1). Backbone Evaluations with Different Distance Functions:
The role of the backbone in the Siamese network is crucial, as the extracted feature vectors are fed into the distance function for further processing. In our study, we consider four different backbones: ResNet50, ResNet101, VGG16, and VGG19, evaluated with the Euclidean distance function without FC layers. Our experiments demonstrate that ResNet50 outperforms the other backbones. We observed the same behavior using the other distance metrics (the results are not shown here). For the dataset used in this study, ResNet50 is particularly suitable due to its balanced complexity: ResNet101 is computationally intensive, while VGG16 has a relatively low model capacity. This balance makes ResNet50 an optimal choice, a finding that is also supported by the work of [14], where the advantages of ResNet50 in similar contexts were highlighted. As shown in Table II, among the four backbones, the Euclidean distance function performed the best for ResNet50, achieving the highest accuracy of 0.880, along with superior results in the other evaluation metrics. In contrast, VGG16 received the lowest scores across all evaluation metrics. Although VGG16 trains faster (8,800 seconds), ResNet50's training time of 11,230 seconds is lower than that of ResNet101 and only slightly higher than that of VGG19, confirming its balance between computational cost and classification performance.
TABLE II:
Comparison of backbone test results for ResNet50, ResNet101, VGG16, and VGG19 using the Euclidean distance function.
| Backbone models / Evaluation Metrics | Accuracy | Sensitivity | Precision | Specificity | F1 | AUC | Time (s) |
|---|---|---|---|---|---|---|---|
| ResNet50 | 0.880 | 0.860 | 0.875 | 0.880 | 0.865 | 0.890 | 11,230 |
| ResNet101 | 0.876 | 0.855 | 0.865 | 0.870 | 0.860 | 0.880 | 14,050 |
| VGG16 | 0.860 | 0.840 | 0.850 | 0.855 | 0.835 | 0.860 | 8,800 |
| VGG19 | 0.870 | 0.850 | 0.860 | 0.865 | 0.850 | 0.875 | 9,100 |
2). Distance Functions Evaluation:
We benchmarked various distance functions using different backbone models. As ResNet50 consistently yielded the best results across all evaluation metrics, we present the results specifically for the ResNet50 backbone when evaluating different distance functions. This evaluation was conducted without the use of FC layers to carefully assess the performance of these distance functions within the Siamese network. Table III shows that the RBF with Matern covariance distance function is the top performer with the ResNet50 backbone, achieving the highest accuracy (0.893), sensitivity (0.878), precision (0.895), specificity (0.900), F1 score (0.880), and AUC (0.910). Despite its longer computation time (13,010s), its superior accuracy and robustness justify the cost. Mahalanobis also performed well, matching RBF with Matern in accuracy and AUC but with slightly lower sensitivity and precision. It has a shorter computation time (12,200s), making it a viable alternative when efficiency is crucial. The Euclidean function offered good accuracy (0.880) with a faster time (11,230s) but lagged in AUC and F1 score. Cosine and Manhattan had lower performance overall, with Cosine being quicker (11,550s) but less accurate (0.871). In summary, RBF with Matern covariance is the best choice, balancing accuracy and robustness, albeit with higher computational demands.
TABLE III:
Performance comparison of distance functions using the ResNet50 backbone. The ± indicates the margin of error.
| Distance Function | Accuracy | Sensitivity | Precision | Specificity | F1 | AUC | Time (s) |
|---|---|---|---|---|---|---|---|
| Euclidean | 0.880 ± 0.02 | 0.860 ± 0.02 | 0.875 ± 0.02 | 0.880 ± 0.02 | 0.865 ± 0.02 | 0.890 ± 0.02 | 11,230 |
| Manhattan | 0.873 ± 0.03 | 0.850 ± 0.03 | 0.865 ± 0.03 | 0.870 ± 0.03 | 0.850 ± 0.03 | 0.885 ± 0.03 | 11,890 |
| Mahalanobis | 0.893 ± 0.01 | 0.875 ± 0.02 | 0.890 ± 0.02 | 0.895 ± 0.02 | 0.870 ± 0.02 | 0.910 ± 0.01 | 12,200 |
| RBF | 0.889 ± 0.02 | 0.870 ± 0.02 | 0.885 ± 0.02 | 0.890 ± 0.02 | 0.870 ± 0.02 | 0.900 ± 0.02 | 11,780 |
| Cosine | 0.871 ± 0.02 | 0.850 ± 0.02 | 0.865 ± 0.02 | 0.870 ± 0.02 | 0.850 ± 0.02 | 0.880 ± 0.02 | 11,550 |
| RBF + Matern Covariance | 0.893 ± 0.01 | 0.878 ± 0.01 | 0.895 ± 0.01 | 0.900 ± 0.01 | 0.880 ± 0.01 | 0.910 ± 0.01 | 13,010 |
3). FC Evaluation:
The final step in our benchmarking process involves evaluating the impact of using FC layers after the distance function. Specifically, we used the concatenated distance vector $D_f$, as explained in Section II-C. As described above, we tested two scenarios: in the first scenario, $D_f$ is fed into the FC layers, while in the second scenario, $D_f$ is directly fed into a single neuron for final classification. As shown in Table IV, the RBF with Matern covariance without FC layers outperformed all other configurations, achieving the highest accuracy (0.938), precision (0.955), specificity (0.958), F1 score (0.930), and sensitivity (0.921). This setup not only delivered superior feature extraction and evaluation but also proved more efficient, with a shorter computation time (13,450s) compared to the FC version (14,550s). The addition of FC layers did not improve results because they can introduce unnecessary complexity, potentially leading to overfitting and diminishing the model's ability to generalize from the feature representations extracted by the backbone. While the Mahalanobis and Euclidean functions also performed well, they did not surpass the metrics achieved by RBF with Matern covariance without FC layers. Building on our previous success with models without FC layers and using ResNet50 as the backbone, we conducted cross-validation with different distance functions, excluding FC layers. This cross-validation aimed to assess the accuracy of our models when applied to new data. The evaluation, illustrated in Figure 3, showed that the RBF with Matern Covariance function consistently yielded the highest performance metrics, achieving 0.92 in accuracy and 0.95 in AUC. These results confirm that the combination of RBF and Matern Covariance offers superior performance in capturing the complex relationships in mammogram images, making it the most effective distance function for this application.
TABLE IV:
Comparison of distance function test results using different distance functions, where ✓ indicates using FC layers and ✗ indicates using a single neuron. The ± indicates the margin of error.
| Distance Function | FC | Accuracy | Sensitivity | Precision | Specificity | F1 | AUC | Time (s) |
|---|---|---|---|---|---|---|---|---|
| Euclidean | ✓ | 0.916 ± 0.02 | 0.883 ± 0.02 | 0.917 ± 0.02 | 0.920 ± 0.02 | 0.902 ± 0.02 | 0.932 ± 0.02 | 12,744 |
| ✗ | 0.916 ± 0.02 | 0.897 ± 0.02 | 0.920 ± 0.02 | 0.931 ± 0.02 | 0.913 ± 0.02 | 0.949 ± 0.02 | 11,330 | |
| Manhattan | ✓ | 0.876 ± 0.03 | 0.855 ± 0.03 | 0.864 ± 0.03 | 0.873 ± 0.03 | 0.851 ± 0.03 | 0.880 ± 0.03 | 13,550 |
| ✗ | 0.883 ± 0.03 | 0.860 ± 0.03 | 0.871 ± 0.03 | 0.889 ± 0.03 | 0.864 ± 0.03 | 0.891 ± 0.03 | 11,991 | |
| Mahalanobis | ✓ | 0.924 ± 0.01 | 0.903 ± 0.02 | 0.935 ± 0.02 | 0.930 ± 0.02 | 0.911 ± 0.02 | 0.952 ± 0.01 | 14,022 |
| ✗ | 0.925 ± 0.01 | 0.917 ± 0.02 | 0.941 ± 0.02 | 0.940 ± 0.02 | 0.923 ± 0.02 | 0.969 ± 0.01 | 12,310 | |
| RBF | ✓ | 0.923 ± 0.02 | 0.894 ± 0.02 | 0.925 ± 0.02 | 0.937 ± 0.02 | 0.903 ± 0.02 | 0.941 ± 0.02 | 13,335 |
| ✗ | 0.921 ± 0.02 | 0.903 ± 0.02 | 0.937 ± 0.02 | 0.944 ± 0.02 | 0.913 ± 0.02 | 0.955 ± 0.02 | 11,883 | |
| Cosine | ✓ | 0.901 ± 0.02 | 0.871 ± 0.02 | 0.891 ± 0.02 | 0.902 ± 0.02 | 0.880 ± 0.02 | 0.912 ± 0.02 | 12,900 |
| ✗ | 0.900 ± 0.02 | 0.884 ± 0.02 | 0.905 ± 0.02 | 0.923 ± 0.02 | 0.894 ± 0.02 | 0.923 ± 0.02 | 11,660 | |
| RBF + Matern Covariance | ✓ | 0.934 ± 0.01 | 0.918 ± 0.01 | 0.944 ± 0.01 | 0.943 ± 0.01 | 0.923 ± 0.01 | 0.944 ± 0.01 | 14,550 |
| ✗ | 0.938 ± 0.01 | 0.921 ± 0.01 | 0.955 ± 0.01 | 0.958 ± 0.01 | 0.930 ± 0.01 | 0.940 ± 0.01 | 13,450 |
Fig. 3: Comparison of distance function test results on the cross-validation dataset.
IV. Conclusion
This study benchmarks various distance functions within a Siamese network to understand the effects of non-linearity and correlation-based functions on breast cancer detection, and it also introduces a new distance function that combines the RBF kernel with the Matern covariance function. This novel approach aims to improve the accuracy and robustness of mammogram image analysis by capturing complex, non-linear relationships between current and prior images. The RBF with Matern covariance function emerged as the most effective distance function, consistently delivering superior performance across different architectures (ResNet and VGG). Specifically, it achieved the highest accuracy, sensitivity, precision, specificity, F1 score, and AUC in our evaluations. Additionally, evaluation on a dataset of 30 cross-validation samples confirmed its robust generalization capabilities. These findings demonstrate the importance of considering complex feature relationships in improving the classification accuracy and generalization capabilities of mammogram analysis. Future work will explore further enhancements and real-world applications of these advanced distance functions in clinical settings, specifically in the context of explainability, and will include more cancer classes such as benign.
Acknowledgment
This work was supported in part by the University of Connecticut (UConn) Clinical Research and Innovation Seed Program (CRISP) Award (PI: Sheida Nabavi) and the National Institutes of Health (NIH) under grant No. R01CA297855 (PIs: Sheida Nabavi, Clifford Yang).
References
- [1] Wellings SR, "Development of human breast cancer," Advances in Cancer Research, vol. 31, pp. 287–314, 1980.
- [2] Hulka BS and Stark AT, "Breast cancer: cause and prevention," The Lancet, vol. 346, no. 8979, pp. 883–887, 1995.
- [3] Koo MM, von Wagner C, Abel GA, McPhail S, Rubin GP, and Lyratzopoulos G, "Typical and atypical presenting symptoms of breast cancer and their associations with diagnostic intervals: Evidence from a national audit of cancer diagnosis," Cancer Epidemiology, vol. 48, pp. 140–146, 2017.
- [4] Lashof JC, Henderson IC, and Nass SJ, "Mammography and beyond: developing technologies for the early detection of breast cancer," 2001.
- [5] Leung J, McKenzie S, Martin J, Dobson A, and McLaughlin D, "Longitudinal patterns of breast cancer screening: mammography, clinical, and breast self-examinations in a rural and urban setting," Women's Health Issues, vol. 24, no. 1, pp. e139–e146, 2014.
- [6] Pisano ED and Yaffe MJ, "Digital mammography," Radiology, vol. 234, no. 2, pp. 353–362, 2005.
- [7] Castellino RA, "Computer aided detection (CAD): an overview," Cancer Imaging, vol. 5, no. 1, p. 17, 2005.
- [8] Shah SM, Khan RA, Arif S, and Sajid U, "Artificial intelligence for breast cancer analysis: Trends & directions," Computers in Biology and Medicine, vol. 142, p. 105221, 2022.
- [9] Lococo F, Ghaly G, Chiappetta M, Flamini S, Evangelista J, Bria E, Stefani A, Vita E, Martino A, Boldrini L et al., "Implementation of artificial intelligence in personalized prognostic assessment of lung cancer: A narrative review," Cancers, vol. 16, no. 10, p. 1832, 2024.
- [10] Bromley J, Guyon I, LeCun Y, Säckinger E, and Shah R, "Signature verification using a 'siamese' time delay neural network," Advances in Neural Information Processing Systems, vol. 6, 1993.
- [11] Panayides AS, Amini A, Filipovic ND, Sharma A, Tsaftaris SA, Young A, Foran D, Do N, Golemati S, Kurc T et al., "AI in medical imaging informatics: current challenges and future directions," IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 7, pp. 1837–1857, 2020.
- [12] Lee H, Kim J, Park E, Kim M, Kim T, and Kooi T, "Enhancing breast cancer risk prediction by incorporating prior images," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023, pp. 389–398.
- [13] Park J, Phang J, Shen Y, Wu N, Kim S, Moy L, Cho K, and Geras KJ, "Screening mammogram classification with prior exams," arXiv preprint arXiv:1907.13057, 2019.
- [14] Bai J, Jin A, Wang T, Yang C, and Nabavi S, "Feature fusion siamese network for breast cancer detection comparing current and prior mammograms," Medical Physics, vol. 49, no. 6, pp. 3654–3669, 2022.
- [15] Koch G, Zemel R, Salakhutdinov R et al., "Siamese neural networks for one-shot image recognition," in ICML Deep Learning Workshop, vol. 2, no. 1. Lille, 2015, pp. 1–30.
- [16] He K, Zhang X, Ren S, and Sun J, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- [17] Simonyan K and Zisserman A, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
- [18] Deng J, Dong W, Socher R, Li L-J, Li K, and Fei-Fei L, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.
- [19] Nair V and Hinton GE, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
- [20] Mahalanobis PC, "On the generalized distance in statistics," Sankhyā: The Indian Journal of Statistics, Series A (2008–), vol. 80, pp. S1–S7, 2018.
- [21] Lowe D and Broomhead D, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321–355, 1988.
- [22] Salton G, Wong A, and Yang C-S, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613–620, 1975.
- [23] Matérn B, Spatial Variation. Springer Science & Business Media, 2013, vol. 36.
- [24] Harville DA, Matrix Algebra from a Statistician's Perspective, 1998.
- [25] Schlagenhauf T, Yildirim F, and Brückner B, "Siamese basis function networks for data-efficient defect classification in technical domains," in International Conference on Software Engineering and Formal Methods. Springer, 2022, pp. 71–92.
- [26] Abramowitz M and Stegun IA, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. US Government Printing Office, 1948, vol. 55.
- [27] Rumelhart DE, Hinton GE, and Williams RJ, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
- [28] Lee RS, Gimenez F, Hoogi A, Miyake KK, Gorovoy M, and Rubin DL, "A curated mammography data set for use in computer-aided detection and diagnosis research," Scientific Data, vol. 4, no. 1, pp. 1–9, 2017.
- [29] Cui C, Li L, Cai H, Fan Z, Zhang L, Dan T, Li J, and Wang J, "The Chinese Mammography Database (CMMD): An online mammography database with biopsy confirmed types for machine diagnosis of breast," The Cancer Imaging Archive, vol. 1, 2021.
- [30] Buda M, Saha A, Walsh R, Ghate S, Li N, Świecicki A, Lo JY, and Mazurowski MA, "A data set and deep learning algorithm for the detection of masses and architectural distortions in digital breast tomosynthesis images," JAMA Network Open, vol. 4, no. 8, p. e2119100, 2021.
- [31] Jeong JJ, Vey BL, Bhimireddy A, Kim T, Santos T, Correa R, Dutt R, Mosunjac M, Oprea-Ilies G, Smith G et al., "The Emory Breast Imaging Dataset (EMBED): A racially diverse, granular dataset of 3.4 million screening and diagnostic mammographic images," Radiology: Artificial Intelligence, vol. 5, no. 1, p. e220047, 2023.
- [32] Loizidou K, Skouroumouni G, Nikolaou C, and Pitris C, "Breast mass detection and classification algorithm based on temporal subtraction of sequential mammograms," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). IEEE, 2021, pp. 1117–1121.
- [33] Loizidou K, Skouroumouni G, Savvidou G, Constantinidou A, Vlachou EO, Yiallourou A, Pitris C, and Nikolaou C, "Breast masses dataset with precisely annotated sequential mammograms," Zenodo, 2024, DOI: 10.5281/zenodo.7179855. [Online]. Available: https://zenodo.org/records/11446259
- [34] Bisong E, "Google Colaboratory," in Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, pp. 59–64, 2019.
