PLOS One. 2025 Aug 25;20(8):e0329273. doi: 10.1371/journal.pone.0329273

PiCCL: A lightweight multiview contrastive learning framework for image classification

Yiming Kuang 1,#, Jianwu Guan 2,#, Hongyun Liu 1,3, Fei Chen 1, Zihua Wang 4, Weidong Wang 1,3,*
Editor: Hung Thanh Bui
PMCID: PMC12377561  PMID: 40853997

Abstract

We introduce PiCCL (Primary Component Contrastive Learning), a self-supervised contrastive learning framework that uses a multiplex Siamese network consisting of many identical branches, rather than two, to maximize learning efficiency. PiCCL is simple and lightweight: it does not use asymmetric networks, intricate pretext tasks, hard-to-compute loss functions, or multimodal data, all of which are common in multiview contrastive learning frameworks and can hinder performance, simplicity, generalizability, and explainability. PiCCL obtains multiple positive samples by applying the same image augmentation paradigm to the same image numerous times; the network loss is calculated using a custom-designed loss function named PiCLoss (Primary Component Loss), which takes advantage of PiCCL’s unique structure while remaining computationally lightweight. To demonstrate its strength, we benchmarked PiCCL against various state-of-the-art self-supervised algorithms on multiple datasets including CIFAR-10, CIFAR-100, and STL-10. PiCCL achieved top performance in most of our tests, with top-1 accuracies of 94%, 72%, and 97% on the three datasets respectively. Where PiCCL excels, however, is in small-batch learning scenarios: when tested on STL-10 with a batch size of 8, PiCCL still achieved 93% accuracy, outperforming the competition by about 3 percentage points.

1 Introduction

In recent years, self-supervised learning (SSL) has gained significant popularity for its remarkable performance while using only unlabeled data [1–6]. Because no labels are involved, SSL is a subcategory of unsupervised learning. However, researchers today commonly treat SSL as its own category distinct from UL, the distinction being that SSL trains the network in a supervised manner using pseudo-labels, i.e., labels generated autonomously from pre-defined objectives [7,8]. Networks trained by SSL commonly require a transfer learning or fine-tuning step before applications like classification. In such cases, the role of the SSL stage is often referred to as pre-training or a pretext task.

Contrastive learning (CL), a type of pretext task and a powerful tool for SSL, has been used in various fields including computer vision (CV) [1,2], natural language processing (NLP) [9,10], and graphs [5,11]. Most contrastive learning methods work by minimizing an objective (loss) function that pulls the representations of positive sample pairs together and, often but not always, pushes the representations of negative sample pairs apart. The latter’s purpose is to prevent trivial solutions, but as SimSiam [12] demonstrated, it is not always required. In computer vision, positive and negative sample pairs refer to images with the same label and images with different labels, respectively. For self-supervised approaches, due to the lack of real labels, the generation of sample pairs usually relies on image augmentation: views originating from the same sample image are positive pairs, and views originating from different sample images are negative pairs, even though the source images might belong to the same category. The augmentations used might include color alterations, affine transformations, cropping and resizing, blurring, and masking, and are often applied randomly. So far, the majority of CL algorithms generate two views from each image per iteration; these two views form a positive pair, while all other views are considered negative samples. There are also algorithms that generate more than two views per image; in this paper we categorize them as multiview contrastive learning (MCL) algorithms. Since the total number of views scales linearly with the number of views per image, the pairwise correlation computation scales quadratically. A common problem for MCL methods is thus that their objective functions are either complicated or asymmetric, making them hard to implement and compute. Moreover, as most CL algorithms use other samples from the same batch as negative samples, most of them require a large batch size to work well, making training hardware intensive.

The search for better SSL algorithms is an ongoing task. Our research objective has been to develop methods that perform well, do not require a large batch size, and can run on limited hardware. In this paper, we introduce Primary Component Contrastive Learning (PiCCL), a new self-supervised contrastive learning framework for visual representation extraction. PiCCL is an MCL algorithm: it employs a symmetric multiplex Siamese network, an extension of the usual 2-fold structure to higher orders, and generates positive sample sets containing multiple views rather than positive pairs of just two views. To fully exploit this multiplex Siamese network without adding too much computational complexity, a brand-new loss function, Primary Component Loss (PiCLoss), is constructed. PiCLoss’s idea is similar to other contrastive learning loss functions, in that it too promotes similarity within positive embedding sets (pairs) while discouraging similarity between negative embedding sets (pairs). The number of Siamese network branches, which is also the number of views per image, is P, which can be set to any integer greater than 1. To avoid the aforementioned scalability issue of MCL, PiCLoss first finds the average (the primary component) of the embedding vectors of each set of positive samples, i.e., the set of views originating from the same image, and then calculates the instance discrimination network loss from the primary components, thereby retaining O(P) complexity. We tested PiCCL with P = 4 and P = 8 against popular algorithms, including previous state-of-the-art methods. PiCCL achieved the highest accuracy in most of the tests, performing especially well in the small-batch learning case.

To provide some intuition for PiCCL, consider the following. Suppose the learning has converged and embeddings have formed clusters in the embedding space. At the center of each cluster should be the defining features of that category; call this cluster center the target embedding vector Ṽ. Now suppose the learning is in progress, and the embedding vector of sample I is f_θt(I) = A, where f_θt(·) represents the neural network and θt is the network weights at step t. In retrospect we know we should update θ with the goal of moving A toward Ṽ, but at this point Ṽ is unknown, so the best we can do is update A toward the other positive embedding vectors, or, equivalently, toward their average V̄. This is a stochastic process: the expected distance between the target embedding vector Ṽ and the average embedding vector V̄ decreases as P increases. That is, as P increases, the average embedding is more likely to lie closer to the true center, so in each iteration the updates to the model weights are more effective.

Other than the above argument, PiCCL’s multiplex Siamese structure also provides both more positive and negative samples per batch, making it more suited to small batch learning than more traditional SSL algorithms (Fig 1).

Fig 1. Distribution of the average embeddings.


There are 100 black dots in each plot; each dot is the average position of P randomly generated points following a Gaussian probability density (σ = 1, centered at (0,0)). In panel A, P = 2; in panel B, P = 8. Comparing the two plots, we can see that as P increases, the chance of the average representation being closer to the true center is higher, and thus the network updates are more efficient.
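The experiment behind Fig 1 is easy to reproduce numerically. The sketch below (the helper name is ours) estimates the mean distance of the P-point average from the true center; the distance shrinks roughly as σ/√P.

```python
import numpy as np

rng = np.random.default_rng(42)

def mean_dist_to_center(P, dots=100, sigma=1.0):
    # For each of `dots` trials, draw P points from a 2-D Gaussian centered at
    # the origin, and measure how far their average lands from the center.
    points = rng.normal(scale=sigma, size=(dots, P, 2))
    return np.linalg.norm(points.mean(axis=1), axis=1).mean()

# As P grows, the average point concentrates around the true center.
d2, d8 = mean_dist_to_center(2), mean_dist_to_center(8)
```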

2 Related works

2.1 Contrastive learning

Most CV contrastive learning methods use Siamese networks or pseudo-Siamese networks. The former consists of twin networks of identical structure and weight, while the latter might have some subtle differences between the networks.

One of the most influential contrastive learning algorithms of recent years is SimCLR [13]; it uses a twin Siamese network and achieved massive improvements over the state-of-the-art algorithms of the time. Siamese networks are a class of neural networks that take multiple (often 2) inputs and process them with the same neural network (or, equivalently, with networks of identical structure and shared weights). For a batch of N images, SimCLR generates 2 views of each image via random image augmentation, forming an extended batch of size 2N, which is then fed into the Siamese network to obtain 2N embeddings. The network loss is calculated from those embeddings using a loss function named NT-Xent (sometimes referred to as InfoNCE).

$$\mathrm{Loss}=\sum_{i=1}^{N}L_i=\sum_{i=1}^{N}\left(-\log\frac{e^{(A_i^1\cdot A_i^2)/\tau}}{\sum_{k=1}^{N}\left(e^{(A_i^1\cdot A_k^2)/\tau}+e^{(A_i^1\cdot A_k^1)/\tau}\right)-e^{(A_i^1\cdot A_i^1)/\tau}}-\log\frac{e^{(A_i^2\cdot A_i^1)/\tau}}{\sum_{k=1}^{N}\left(e^{(A_i^2\cdot A_k^1)/\tau}+e^{(A_i^2\cdot A_k^2)/\tau}\right)-e^{(A_i^2\cdot A_i^2)/\tau}}\right) \quad (1)$$

Eq (1) is the NT-Xent loss function for a batch of images; its form has been reformulated to keep the notation consistent within this paper. A_i^1 is the L2-normalized embedding vector of the first view of the i-th image; similarly, A_k^2 is the L2-normalized embedding vector of the second view of the k-th image; N is the number of images in a batch, i.e., the batch size. This notation will be used throughout this paper. τ is the “temperature” parameter. The numerator of NT-Xent is the “attractive” term that brings positive sample pairs together, while the denominator terms are the “repulsive” terms that push negative pairs apart.
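As a concrete reference, Eq (1) can be computed directly. Below is a minimal NumPy sketch (not SimCLR’s official implementation; the function and variable names are ours):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    # z1, z2: (N, d) raw embeddings of the two views of each image.
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2 normalize -> the A vectors
    e = np.exp(z @ z.T / tau)                          # e^{(A_i . A_k)/tau} for all pairs
    np.fill_diagonal(e, 0.0)                           # subtract the self-similarity term
    n = len(z1)
    pos = np.concatenate([np.arange(n) + n, np.arange(n)])  # view i pairs with view i+n
    # sum of -log(positive / all non-self terms) over all 2N views, as in Eq (1)
    return -np.sum(np.log(e[np.arange(2 * n), pos] / e.sum(axis=1)))
```

Feeding two identical sets of views yields a lower loss than mismatched views, since the positive similarities are then maximal.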

Other algorithms like Barlow-Twins [14], and VICReg [15], also use the same kind of symmetric Siamese network but differentiate themselves through their unique loss functions.

$$\mathrm{Loss}=\sum_{i=1}^{N}\left(1-A_i^1\cdot A_i^2\right)^2+\alpha\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\left(A_i^1\cdot A_j^2\right)^2 \quad (2)$$

Eq (2) is Barlow-Twins’ loss function, named LBT [14]. Its first term is the “attractive term” and its second the “repulsive term”; α is a constant parameter controlling the strength of the repulsive term.
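A NumPy sketch of Eq (2) as reformulated here (note the original Barlow-Twins loss operates on a feature-dimension cross-correlation matrix; this follows the paper’s embedding-level restatement, with the 0.5 weight used later in Section 4):

```python
import numpy as np

def lbt(A1, A2, alpha=0.5):
    # A1, A2: (N, d) L2-normalized embeddings of the two views (A_i^1, A_i^2 in Eq 2).
    C = A1 @ A2.T                                   # all cross-view dot products
    attract = np.sum((1.0 - np.diag(C)) ** 2)       # pull positive pairs toward similarity 1
    off_diag = C[~np.eye(len(C), dtype=bool)]
    repel = np.sum(off_diag ** 2)                   # push negative pairs toward similarity 0
    return attract + alpha * repel
```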

The aforementioned methods require positive pairs for feature learning, as well as negative pairs to prevent collapse. By contrast, methods like SimSiam [12], SwAV [16], and BYOL [17] use only positive pairs and are thus capable of online learning. To prevent collapse, all of the above feature some asymmetry between the two branches: SimSiam uses a symmetric Siamese network just like SimCLR, but adds a predictor head on one branch while applying a stop-gradient operation on the other; SwAV and BYOL feature twin networks with the same structure but different weights, sometimes referred to as pseudo-Siamese networks.

$$\mathrm{Loss}=-\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\,D(A_i^1)\cdot P(A_i^2)+\frac{1}{2}\,D(A_i^2)\cdot P(A_i^1)\right) \quad (3)$$

Eq (3) is the loss function of SimSiam. D(·) is the stop-gradient (or detach) operation, which removes its argument from the backpropagation computation graph; P(·) is the predictor, which in SimSiam’s original implementation is a 2-layer MLP.

More efficient pretext tasks lead to better training outcomes. Unrepresentative positive sample pairs, such as pairs that do not contain the same object due to random cropping, lead to inefficient training. SemanticCrop [18] and ScoreCL [19] are extension methods that can be used alongside popular methods like SimCLR and Barlow-Twins. SemanticCrop introduces a weighted crop method that replaces the common random-crop augmentation; it uses center-suppressed probabilistic sampling to favor crops that are dissimilar yet still land in the target area. ScoreCL addresses the problem by adding a term to the loss function that weights the network loss by how dissimilar the positive pairs are. PiCCL addresses the problem by increasing the number of views: the more views a positive sample set contains, the less likely it is to be unrepresentative.

2.2 Multi-view contrastive learning

There have been works that expand the 2-fold Siamese networks used by traditional contrastive learning algorithms to higher orders; examples include CMC [20], K-shot [21], LOOC [22], and E-SSL [23]. In this paper, we refer to this family of algorithms as multiview contrastive learning (MCL) algorithms, and PiCCL is one of them. K-shot’s framework is closest to PiCCL, featuring multiple identical branches just like ours. However, its loss function is fairly complicated and computationally heavy, requiring an eigenvalue decomposition for every view; this step alone is 𝒪(P³), which significantly adds to the training cost. LOOC’s pretext tasks are asymmetric: each branch uses one predetermined type of image augmentation. Its loss function is a generalization of InfoNCE that computes the InfoNCE loss on every possible pair of branches and averages them; due to the P-choose-2 combinations, this loss function is 𝒪(P²). E-SSL is an extension method that can be applied on top of other self-supervised learning methods: it creates 4 additional branches on top of the SSL Siamese framework, each using a predetermined angle of rotation as its pretext task; for the added branches, the rotation angle is predicted and the prediction error is used for backpropagation. Contrastive multiview coding (CMC) is one of the earlier multiview contrastive learning methods; it requires multimodal data. Each branch extracts features from one mode, and the network aims to learn features common across the modes. In the original work, multimodal views of ordinary RGB pictures are obtained by converting them to Lab format and treating the L channel and the ab channels as different modes. The paper provides two loss functions for networks with P > 2: one has a “core” view against which all other views are compared; the other has no “core” view and comparisons are made for every possible pair. The former is 𝒪(P) and the latter 𝒪(P²).

The above methods all suffer more or less from at least one of the following problems: (1) complicated network structure, this includes the use of asymmetric pretext tasks or network weights; (2) computationally intensive, especially when P is large; (3) requires multimodal data, which could hurt generalizability. Motivated by the above, we designed PiCCL to be a simple, scalable, and generalizable algorithm (Table 1).

Table 1. Summary of related works.

Name | Branches (P) | Symmetric | Multimodal data | Complexity(P)
SimCLR [13] | 2 | yes | no | -
Barlow-Twins [14] | 2 | yes | no | -
VICReg [15] | 2 | yes | no | -
SimSiam [12] | 2 | no | no | -
SwAV [16] | 2 | no | no | -
BYOL [17] | 2 | no | no | -
CMC [20] | any | no | yes | 𝒪(P) / 𝒪(P²)
K-shot [21] | any | yes | no | 𝒪(P³)
LOOC [22] | any | no | no | 𝒪(P²)
E-SSL [23] | 6 | no | no | -
PiCCL (Ours) | any | yes | no | 𝒪(P)

Symmetric: whether the network or loss function contains branch-specific terms that cause the error to propagate differently between branches; consider, for example, the stop-gradient operator in SimSiam’s loss function (3). Complexity(P): the loss function’s time complexity with respect to the number of network branches; this is not applicable to algorithms with a fixed number of branches.

2.3 Small batch learning

Online learning is a sub-field of machine learning in which data arrives sequentially and the neural network updates its parameters with the objective of making a better prediction on the next sample. This contrasts with the traditional machine learning approach, in which the entire dataset is available from the start. Small-batch learning, on the other hand, refers to mini-batch learning with a small batch size. Contrary to the name, mini-batches sometimes aren’t mini at all; for example, some tests in [14] use a batch size of 4096. Both online learning and small-batch learning have a substantial advantage in computational cost, most significantly memory requirements, as loading a small batch of samples requires very little memory. A small batch size also makes data acquisition feasible on a single device, as gathering a large batch of data, especially one containing a wide range of classes, is both hard and time consuming. These advantages make online learning and small-batch learning ideal for offline mobile-device learning, low-power training, and highly dynamic environments.

Generally speaking, aside from methods like SimSiam that do not utilize negative pairs, most SSL methods require a large batch size to work well, as it provides more negative samples. Most SSL algorithms use batch sizes larger than 128, which can be difficult for offline on-device incremental learning tasks.

3 Method

The forward-pass flow of PiCCL is illustrated in Fig 2. PiCCL contains P identical branches, each containing the same encoding and decoding network. The image augmentation module takes a batch of N images and outputs an extended batch composed of P augmented batches, each containing N views. The views are first mapped into the latent space by the encoder and then into the embedding space by the decoder. Finally, the network loss (error) is calculated from the L2-normalized embedding vectors using a custom loss function: PiCLoss. This concludes the forward pass, and backpropagation can be initiated.

Fig 2. Network Structure of Primary Component Contrastive learning (PiCCL).


PiCCL is a multiview contrastive learning algorithm, P views are generated from each sample image and fed into a multiplex Siamese network. The latent space embedding vectors are obtained and network loss is calculated with a custom function: PiCLoss.
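The flow above can be sketched end to end. In this minimal NumPy sketch, `augment` and `encode` are hypothetical stand-ins for the augmentation paradigm (Table 2) and the shared encoder/decoder (ResNet-18 plus an MLP in the paper); only the multiview plumbing and the final L2 normalization follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images):
    # hypothetical stand-in for the random augmentation paradigm of Table 2
    return images + 0.1 * rng.standard_normal(images.shape)

def encode(images, W):
    # hypothetical stand-in for the shared encoder + decoder
    return np.tanh(images @ W)

def piccl_forward(images, W, P=4):
    # P branches share the same weights W; each branch sees a different
    # random augmentation of the same batch of N images.
    views = np.stack([encode(augment(images), W) for _ in range(P)], axis=1)
    # L2-normalize the (N, P, d) embeddings before computing PiCLoss
    return views / np.linalg.norm(views, axis=2, keepdims=True)
```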

3.1 Image augmentation

The choice of image augmentation methods is crucial for the quality of representation learning [24]. A good augmentation should retain as much relevant information as possible while altering the rest. Here, we implement a slightly modified version of the image augmentation process used by Barlow-Twins, which consists of random applications of (1) cropping followed by resizing to the original size, (2) horizontal flip, (3) color jitter, (4) transform to grayscale, and (5) Gaussian blur. The same image augmentation paradigm is applied to all Siamese network branches. The augmentations are listed in Table 2.

Table 2. Image augmentation methods.

Method | Source | Params | Description
random crop and resize | torchvision | default | crop the image, then resize to the original size
random horizontal flip | torchvision | p = 0.5 | flip the image horizontally with probability p
random apply of color jitter | Barlow-Twins | p = 0.8; brightness = 0.6, contrast = 0.6, saturation = 0.6, hue = 0.2 | change brightness, contrast, saturation, and hue with probability p
random grayscale | torchvision | p = 0.5 | turn the image grayscale with probability p
random Gaussian blur | Barlow-Twins | p = 1 | apply Gaussian blur with probability p

The list of image augmentation methods used by PiCCL. For source, “torchvision” means the function is from torchvision 0.20; “Barlow-Twins” means the function is from Barlow-Twins’ published source code.
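The pipeline of Table 2 can be written as a torchvision transform composition. This is an illustrative sketch, not the paper’s released code: the crop size (96, for STL-10) and the blur kernel size are our assumptions, and torchvision’s built-in `GaussianBlur` stands in for the custom blur from Barlow-Twins’ source.

```python
from torchvision import transforms

# Illustrative sketch of Table 2 (STL-10 image size assumed); Barlow-Twins'
# published code implements its own Gaussian blur, approximated here.
piccl_augment = transforms.Compose([
    transforms.RandomResizedCrop(96),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.6, contrast=0.6,
                                saturation=0.6, hue=0.2)],
        p=0.8),
    transforms.RandomGrayscale(p=0.5),
    transforms.GaussianBlur(kernel_size=9),  # applied with p = 1
    transforms.ToTensor(),
])
```

Applying `piccl_augment` P times to the same PIL image yields the P views fed to the P branches.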

3.2 Loss function

PiCLoss, the loss function designed for PiCCL, is expressed in Eq (4).

$$V_n=\frac{\frac{1}{P}\sum_{p}A_n^p}{\left\lVert\frac{1}{P}\sum_{p}A_n^p\right\rVert_2} \quad (4a)$$
$$\mathrm{Loss}=\left\langle 1-A_n^p\cdot A_n^q\right\rangle+\alpha\left\langle e^{\lvert V_n\cdot V_m\rvert}\right\rangle \quad (4b)$$

As before, A_n^p denotes the L2-normalized embedding vector of the p-th view of the n-th image. The superscript p is no longer confined to 1 or 2, but can be any natural number from 1 to P. α is the regularization parameter; the idea is that PiCLoss can be tweaked to fit different scenarios, although during testing we found that setting α = 2 accommodates most cases. ⟨·⟩ is the average-value operation. The V_n are the “primary components” in PiCCL’s name: each is the average of the embeddings of all views of the same image, followed by L2 normalization.

The first term in PiCLoss is the attractive term. S_n = A_n^p · A_n^q is a P-by-P symmetric matrix containing all pairwise cosine similarities between embedding vectors of views originating from image n. Cosine similarities are calculated as dot products of the L2-normalized embedding vectors; values are bounded by −1 and 1, and the diagonal elements are always 1.

The second term in PiCLoss is the repulsive term, which penalizes similarity between primary components of different images, quantified by |V_n·V_m|. By calculating the primary components first, the V_n·V_m matrix contains only N-by-N elements. An absolute-value operation is applied because negative correlations should also be penalized, and an exponentiation is applied to help with the uniformity of V_n·V_m, as it assigns a much more substantial cost to extreme values (Fig 3).

Fig 3. Visualization of PiCLoss.


Two sets of operations are applied to the L2 normalized embedding vectors (middle boxes). The left side is the attractive term promoting cluster formation. The right side is the repulsive term decorrelating negative pair embeddings.
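The two terms can be sketched in a few lines of NumPy. This is our illustrative reading of the ⟨·⟩ averages (all view pairs (p, q) for the attractive term; all image pairs n ≠ m for the repulsive term), not the authors’ released code:

```python
import numpy as np

def picloss(A, alpha=2.0):
    # A: (N, P, d) L2-normalized embeddings, A[n, p] = A_n^p.
    N = A.shape[0]
    # attractive term: average (1 - cosine similarity) over all view pairs per
    # image; diagonal p = q entries contribute zero and merely rescale the mean
    S = np.einsum('npd,nqd->npq', A, A)       # per-image P x P similarity matrices
    attract = np.mean(1.0 - S)
    # primary components (Eq 4a): L2-normalized per-image mean embeddings
    V = A.mean(axis=1)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    # repulsive term (Eq 4b): exponentiated |V_n . V_m| between different images
    G = np.abs(V @ V.T)
    repel = np.mean(np.exp(G[~np.eye(N, dtype=bool)]))
    return attract + alpha * repel
```

Fully collapsed embeddings give a zero attractive term but a maximal repulsive term, so the trivial solution is penalized.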

4 Results

To test PiCCL’s performance, we benchmarked it against popular algorithms including SimCLR, Barlow-Twins, SimSiam, and E-SSL. We used two flavors of PiCCL: PiCCL(4) with 4 network branches and PiCCL(8) with 8. We attempted to make the comparisons fair and representative, using the same networks, augmentations, and hyperparameters whenever possible. The encoder of choice is ResNet-18 [25], the smallest variant of the ResNet family and a very popular choice in the computer vision community. To accommodate the picture sizes of CIFAR-10, CIFAR-100, and STL-10, we made the following changes to the ResNet-18 encoder. For CIFAR-10 and CIFAR-100 images, whose size is 32 by 32, the first 7×7 convolution layer is replaced with a 3×3, stride-1 layer, and the first max-pooling layer as well as the final fully connected layer is removed; these changes are the same as SimCLR’s CIFAR-10 implementation. For STL-10 images, whose size is 96 by 96, the first 7×7 convolution layer is replaced with a 5×5, stride-2 layer, and the final fully connected layer is removed.

For training of PiCCL, SimCLR, and Barlow-Twins, a 2-layer MLP projector is added to the encoder (first layer: 512 to 2048, second layer 2048 to 128). For training of SimSiam, we directly followed its literature, using the same 3-layer MLP as the projector and the 2-layer MLP as the predictor.

For all tests, the neural networks are trained for 1000 epochs; the learning rate is set to 0.6 × batchsize/64 unless specified, and decreases following a cosine annealing schedule. For the loss functions, we set the hyperparameters according to the original literature. For NT-Xent (Eq 1), the temperature parameter is set to 0.5; for LBT, α is set to 0.5. For CIFAR-10 and CIFAR-100, we also conducted tests of E-SSL; the model is ResNet-18 with the same changes applied, and other parameters are kept the same as in its original literature.

Accuracies are evaluated using the linear-classifier method. The projector (and, for SimSiam, the predictor) is discarded, and a linear layer is attached after the encoder CNN for classification. For each test, the classifier layer is trained for 100 epochs on the training set. After each epoch, the accuracy is evaluated on the testing set, and the highest accuracy across the 100 epochs is reported. The batch size is set to 256 and the learning rate to 0.6; no learning-rate scheduling is used.

4.1 STL-10

STL-10 contains 3 sets of 96-by-96 images: an “unlabeled” set of 100,000 unlabeled images of various categories; a “train” set of 5,000 labeled images (10 classes, 500 images each); and a “test” set of 8,000 labeled images (800 per class). For our tests, we used the “unlabeled” set to train the network, after which the projector (and predictor) is discarded; we then used the “train” set to train the linear classifier and, finally, the “test” set to evaluate accuracy.

Table 3 displays the linear-classifier accuracies of SimCLR, Barlow-Twins, SimSiam, PiCCL(P = 4), and PiCCL(P = 8). The batch size is set to 256, a commonly accepted size for SSL algorithms of this kind. At all epochs, both PiCCL(4) and PiCCL(8) outperformed the other methods. Between the two variants of PiCCL, we did not observe any overall trend as to which is superior. The highest overall accuracy was achieved by PiCCL(4), with a score of 97.55%. Model accuracy had mostly stabilized by 500 training epochs, so we selected this duration as the training length for subsequent experiments.

Table 3. Results on STL-10 with batch size = 256.

Methods 100epoch 200epoch 500epoch 1000epoch Best
SimCLR 95.17 96.10 96.75 96.79 96.79
Barlow-Twins 93.14 94.33 95.86 95.97 95.97
SimSiam 84.81 88.95 87.07 86.05 88.95
PiCCL (4) 95.25 96.50 96.83 97.55 97.55
PiCCL (8) 95.79 96.33 97.05 97.18 97.18

The single-trial accuracies of various methods on the STL-10 dataset with the batch size held at 256. The highest accuracy in each column is highlighted in bold; values are percentages.

Table 4 displays the accuracies of the algorithms at 500 epochs, with batch sizes of 4, 8, 16, 64, and 256, to showcase the effect of batch size on classification accuracy. In the extreme case of N = 4, Barlow-Twins took first place in accuracy, with SimSiam second, followed by PiCCL(8). For batch sizes of 8 and above, both PiCCL(4) and PiCCL(8) outperform all other methods.

Table 4. Results on STL-10 at 500 epoch.

Methods N = 4 N = 8 N = 16 N = 64 N = 256
SimCLR 84.44 88.98 95.15 96.71 96.75
Barlow-Twins 89.48 90.35 91.20 91.90 95.86
SimSiam@200[1] 88.95 88.95 88.95 88.95 88.95
PiCCL (4) 88.00 92.91 95.39 96.81 96.83
PiCCL (8) 88.66 93.54 95.67 97.02 97.05

The single-trial accuracies of various methods on the STL-10 dataset with varying batch sizes; training lasted 500 epochs. The highest accuracy in each column is highlighted in bold; values are percentages.

[1] We did not test SimSiam with varying batch sizes, as SimSiam’s loss function is unaffected by batch size. The reported value is the highest value obtained in the previous test.

To strengthen our findings, we repeated the N = 8 and N = 256 tests 5 times and analyzed significance using independent-samples t-tests. The results are displayed in Fig 4.

Fig 4. Results on STL-10 at 500 epochs with batch size = 8 and 256.


The left panel compares the methods at batch size = 8; the right panel compares them at batch size = 256. Each test was repeated 5 times; the mean (std) is written on the columns, and the p-values are drawn at the top of the figure.

For N = 8, both PiCCL(4) and PiCCL(8) showed a statistically highly significant (p < 0.001) advantage over SimCLR, Barlow-Twins, and SimSiam. There is also a significant difference (p < 0.01) between PiCCL(4) and PiCCL(8). PiCCL(8) outperformed SimCLR by 4.56 percentage points (95% CI: [4.28, 4.83]), Barlow-Twins by 3.62 (95% CI: [2.96, 4.28]), and SimSiam by 4.97 (95% CI: [3.89, 6.03]).

For N = 256, PiCCL(4) and PiCCL(8) significantly outperformed SimCLR at p < 0.05 and p < 0.01, respectively. Their advantage over Barlow-Twins and SimSiam is statistically highly significant (p < 0.001). Between the two variants, PiCCL(8) slightly outperforms PiCCL(4) (p < 0.05).

Shrinking the batch size from 256 to 8 degraded the performance of all methods except SimSiam: PiCCL(8) degraded by 3.43 percentage points (95% CI: [3.16, 3.70]) and PiCCL(4) by 3.69 (95% CI: [3.48, 3.90]), while SimCLR degraded by 7.46 (95% CI: [7.16, 7.76]) and Barlow-Twins by 4.17 (95% CI: [2.95, 5.39]).

4.2 CIFAR-10 & CIFAR-100

The CIFAR-10 and CIFAR-100 datasets are very similar: each consists of a training set of 50,000 and a testing set of 10,000 labeled RGB images of size 32 by 32. The difference is that CIFAR-10 contains 10 classes while CIFAR-100 contains 100. We use the “train” set both for unsupervised training of the network and for supervised fine-tuning of the linear classifier, and the “test” set for accuracy evaluation.

In the tests on CIFAR-10 and CIFAR-100 (Table 5), both PiCCL(4) and PiCCL(8) consistently outperformed the other algorithms. Under the same settings, SimSiam suffered from severe overfitting, likely because the CIFAR datasets have both fewer samples and fewer pixels per sample. To make a fair comparison, we decreased SimSiam’s learning rate; its accuracy at 500 epochs is still lower than at 200 epochs, indicating that, albeit to a lesser degree, the overfitting problem is still present.

Table 5. Results on CIFAR-10 & CIFAR-100.

Methods CIFAR-10 200 Epoch CIFAR-10 500 Epoch CIFAR-100 200 Epoch CIFAR-100 500 Epoch
SimCLR 92.25 93.26 69.71 71.71
Barlow-Twins 90.74 92.52 66.66 68.22
SimSiam[1] 82.92 79.85 54.20 50.93
E-SSL 90.37 93.00 64.23 69.09
PiCCL (4) 93.15 93.61 72.10 72.75
PiCCL (8) 93.73 94.04 72.38 72.44

The highest accuracy of each column is highlighted in bold, the values are percentage points.

[1] The numbers presented here for SimSiam were obtained with the learning rate decreased by a factor of 4.

5 Discussion

5.1 Performance of PiCCL

PiCCL performs well in the general setting. When tested on STL-10 with a batch size of 256 (Table 3), PiCCL achieved 97.55% accuracy, outperforming all other methods tested, including the previous state of the art. We think the primary contributor to the performance gain is the combination of the multiplex Siamese network structure and our unique loss function, PiCLoss. Each positive sample set contains P positive samples, rather than 2 as in other methods, which provides a stronger and more confident anchor for the positive views. We also think the exponential regularization played a crucial role, since in our own testing we found that changing it to L1 or L2 regularization degraded performance.

PiCCL’s performance is robust to small batch sizes (Table 4). PiCCL’s performance is suboptimal at N = 4, but at N = 8, PiCCL(4) and PiCCL(8) already reached 92.91% and 93.54%, while SimCLR obtained only 88.98%, roughly 4 percentage points lower than PiCCL. We think having more views when calculating the network loss is a key factor in this performance edge. First, more positive samples provide a more accurate and representative target embedding vector (the primary component) toward which views are updated. Second, more negative samples mean the views have more instances to discriminate against, which also improves training quality. This theory is in line with our observation that PiCCL(8) outperforms PiCCL(4) in small-batch settings.

The number of network branches P can be any integer greater than or equal to 2. In our tests, PiCCL(2) used the least system resources, but its performance lagged behind PiCCL(4), further suggesting that PiCCL’s performance increases with P. PiCCL(16), on the other hand, used more system resources than PiCCL(8) while producing on-par performance, suggesting a limit to this scaling. For the datasets and neural network we used, PiCCL(4) and PiCCL(8) represent the “sweet spot” balancing performance and computational complexity; on other datasets and networks, this observation may vary.

5.2 Efficiency of PiCCL

To validate our claim that PiCCL is relatively lightweight and scalable, we recorded the training time of the first 5 epochs on STL-10, deduced the average time per epoch per network branch, and report the results in Table 6. PiCCL(2P) takes roughly twice as much time as PiCCL(P). PiCCL(2)’s training speed is on par with SimCLR and Barlow-Twins, while PiCCL(4) is on par with SimSiam. The training time per network branch remains fairly constant across all PiCCL variants, validating our claim that PiCCL’s computation scales linearly with P. Table 6 also displays GPU memory usage. PiCCL(2) uses the least memory of all methods; PiCCL(4) uses roughly 60% more than PiCCL(2), which is still less than SimSiam. The memory per branch decreases slightly as P increases, suggesting that PiCCL’s memory usage scales at most linearly with P.

Table 6. Speed and Memory Metrics.

Method	Time per epoch	Time per epoch per branch	Memory	Memory per branch
SimCLR 43.2 21.6 1463 731.5
B-Twins 43.2 21.6 1467 733.5
SimSiam 80.2 40.1 2719 1359.5
PiCCL(2) 43.2 21.6 1317 658.5
PiCCL(4) 83.4 20.85 2091 522.75
PiCCL(8) 162.4 20.3 3783 472.88
PiCCL(16) 325.8 20.36 6855 428.44

Times are reported in seconds and memory in mebibytes (MiB). The dataset tested is STL-10 and the batch size is 64.
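As a quick sanity check on the linear-scaling claim, the per-branch columns of Table 6 can be recomputed from the raw per-epoch figures. The numbers below are taken directly from the table; treating the two-view baselines (SimCLR, Barlow-Twins, SimSiam) as P = 2 is our assumption.

```python
# Raw Table 6 measurements: (branches, seconds per epoch, memory in MiB).
raw = {
    "SimCLR":    (2, 43.2, 1463),
    "B-Twins":   (2, 43.2, 1467),
    "SimSiam":   (2, 80.2, 2719),
    "PiCCL(2)":  (2, 43.2, 1317),
    "PiCCL(4)":  (4, 83.4, 2091),
    "PiCCL(8)":  (8, 162.4, 3783),
    "PiCCL(16)": (16, 325.8, 6855),
}

# Per-branch time and memory, rounded to two decimals as in the table.
per_branch = {name: (round(t / p, 2), round(m / p, 2))
              for name, (p, t, m) in raw.items()}

# Per-branch training time stays nearly constant across the PiCCL variants
# (roughly 20-22 s), which is what linear time scaling in P predicts,
# while per-branch memory drifts downward as P grows.
piccl_times = [per_branch[k][0]
               for k in ("PiCCL(2)", "PiCCL(4)", "PiCCL(8)", "PiCCL(16)")]
```

For instance, `per_branch["PiCCL(4)"]` reproduces the table's 20.85 s and 522.75 MiB per branch.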

5.3 Weaknesses of our study

Due to both hardware and time constraints, we were unable to test PiCCL on larger and more comprehensive datasets such as ImageNet, or with larger batch sizes or larger P. We also had to use ResNet-18 as our encoder backbone rather than the more common ResNet-50. However, we believe the tests above are sufficient to showcase the efficacy of our algorithm.

5.4 PiCCL’s use case and future work

PiCCL’s strong performance under ordinary circumstances (Table 3) makes it a viable solution for self-supervised computer vision tasks. Our near-term objectives are to further optimize PiCCL by integrating more powerful and more recent encoders such as vision transformers [2], to experiment with different loss-function regularizations, and to extend PiCCL to other data types such as audio. We will also attempt to solve real-world problems with PiCCL, as deep learning has proven effective in countless cases [26–32]. In the long run, we want to apply PiCCL in scenarios that exploit its small-batch learning capability. Referring to Table 4, when P = 8, PiCCL outscored the competition by more than 3 percentage points, making it especially suited to online-style small-batch learning tasks: for example, learning on low-power devices where memory is constrained, or offline learning where gathering a large batch of distinct samples is hard.

6 Conclusion

In this paper, we proposed a simple and lightweight multiview contrastive learning algorithm called PiCCL. PiCCL uses a multiplex neural network structure in which the pretext task and network weights are shared between branches, making it a true Siamese network. PiCCL’s unique loss function, PiCLoss, simplifies computation by using primary components as an intermediate step and thus retains 𝒪(P) complexity. We benchmarked PiCCL on STL-10, CIFAR-10, and CIFAR-100, where it outperformed SimCLR, Barlow-Twins, and SimSiam most of the time, especially at small batch sizes. Our future focus is to apply PiCCL to image classification tasks where batch size is constrained, such as on-device or online learning scenarios, and to combine PiCCL with larger state-of-the-art models.

Data Availability

The source code will be made publicly available after the acceptance of this manuscript. The code will be hosted in a public GitHub repository: https://github.com/YimingKuang/PiCCL.

Funding Statement

This work was funded by the Scientific and Technological Innovation 2030 - “New Generation Artificial Intelligence” Major Project (2020AAA0105800). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.He K, Chen X, Xie S, Li Y, Dollar P, Girshick R. Masked autoencoders are scalable vision learners. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022). IEEE Conference on Computer Vision and Pattern Recognition. IEEE; CVF; IEEE Comp Soc; 2022. p. 15979–88.
  • 2.Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T. An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint 2020. arXiv:2010.11929 [Google Scholar]
  • 3.Caron M, Touvron H, Misra I, Jegou H, Mairal J, Bojanowski P, et al. Emerging properties in self-supervised vision transformers. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021. 10.1109/iccv48922.2021.00951 [DOI] [Google Scholar]
  • 4.Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, et al. EnlightenGAN: deep light enhancement without paired supervision. IEEE Trans Image Process. 2021;30:2340–9. doi: 10.1109/TIP.2021.3051462 [DOI] [PubMed] [Google Scholar]
  • 5.Wu J, Wang X, Feng F, He X, Chen L, Lian J, et al. Self-supervised graph learning for recommendation. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. p. 726–35. 10.1145/3404835.3462862 [DOI] [Google Scholar]
  • 6.Alwassel H, Mahajan D, Korbar B, Torresani L, Ghanem B, Tran D. Self-supervised learning by cross-modal audio-video clustering. In: Advances in Neural Information Processing Systems. 2020.
  • 7.Balestriero R, Ibrahim M, Sobal V, Morcos AS, Shekhar S, Goldstein T. A cookbook of self-supervised learning. arXiv preprint 2023. arXiv:2304.12210 [Google Scholar]
  • 8.Gui J, Chen T, Zhang J, Cao Q, Sun Z, Luo H, et al. A survey on self-supervised learning: algorithms, applications, and future trends. IEEE Trans Pattern Anal Mach Intell. 2024;46(12):9052–71. doi: 10.1109/TPAMI.2024.3415112 [DOI] [PubMed] [Google Scholar]
  • 9.Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: North American Chapter of the Association for Computational Linguistics; 2019. https://api.semanticscholar.org/CorpusID:52967399 [Google Scholar]
  • 10.Vaswani A, Shazeer NM, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Neural Information Processing Systems; 2017. https://api.semanticscholar.org/CorpusID:13756489 [Google Scholar]
  • 11.Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L. Graph contrastive learning with adaptive augmentation. In: Proceedings of the Web Conference 2021. 2021. p. 2069–80. 10.1145/3442381.3449802 [DOI] [Google Scholar]
  • 12.Chen X, He K. Exploring simple siamese representation learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021. p. 15745–53.
  • 13.Chen T, Kornblith S, Norouzi M, Hinton GE. A simple framework for contrastive learning of visual representations. arXiv preprint 2020. https://arxiv.org/abs/2002.05709 [Google Scholar]
  • 14.Zbontar J, Jing L, Misra I, LeCun Y, Deny S. Barlow twins: self-supervised learning via redundancy reduction. arXiv preprint 2021. arXiv:2103.03230 [Google Scholar]
  • 15.Bardes A, Ponce J, LeCun Y. VICReg: variance-invariance-covariance regularization for self-supervised learning. arXiv preprint 2021. https://arxiv.org/abs/2105.04906 [Google Scholar]
  • 16.Caron M, Misra I, Mairal J, Goyal P, Bojanowski P, Joulin A. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint 2020. arXiv:2006.09882 [Google Scholar]
  • 17.Grill JB, Strub F, Altch’e F, Tallec C, Richemond PH, Buchatskaya E. Bootstrap your own latent: a new approach to self-supervised learning. arXiv preprint 2020. https://arxiv.org/abs/2006.07733 [Google Scholar]
  • 18.Fang Y, Chen Z, Tang W, Wang YG. SemanticCrop: boosting contrastive learning via semantic-cropped views. In: Liu Q, Wang H, Ma Z, Zheng W, Zha H, Chen X, editors. Singapore: Springer; 2024. p. 335–46. [Google Scholar]
  • 19.Kim J-Y, Kwon S, Go H, Lee Y, Choi S, Kim H-G. ScoreCL: augmentation-adaptive contrastive learning via score-matching function. Mach Learn. 2025;114(1). doi: 10.1007/s10994-024-06707-8 [DOI] [Google Scholar]
  • 20.Tian Y, Krishnan D, Isola P. Contrastive multiview coding. In: European Conference on Computer Vision; 2019. https://api.semanticscholar.org/CorpusID:189762205 [Google Scholar]
  • 21.Xu H, Xiong H, Qi G-J. K-Shot contrastive learning of visual features with multiple instance augmentations. IEEE Trans Pattern Anal Mach Intell. 2022;44(11):8694–700. doi: 10.1109/TPAMI.2021.3082567 [DOI] [PubMed] [Google Scholar]
  • 22.Xiao T, Wang X, Efros AA, Darrell T. What should not be contrastive in contrastive learning. arXiv preprint 2020. arXiv:2008.05659 [Google Scholar]
  • 23.Dangovski R, Jing L, Loh C, Han SJ, Srivastava A, Cheung B. Equivariant contrastive learning. arXiv preprint 2021. arXiv:2111.00899 [Google Scholar]
  • 24.Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F. A survey on contrastive self-supervised learning. arXiv preprint 2020. arXiv:2011.00362 [Google Scholar]
  • 25.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8. 10.1109/cvpr.2016.90 [DOI] [Google Scholar]
  • 26.Abdelsattar M, AbdelMoety A, Emad-Eldeen A. Applying image processing and computer vision for damage detection in photovoltaic panels. Mansoura Engineering Journal. 2025;50(2):2. doi: 10.58491/2735-4202.3263 [DOI] [Google Scholar]
  • 27.Abdelsattar M, Abdelmoety A, Ismeil MA, Emad-Eldeen A. Automated defect detection in solar cell images using deep learning algorithms. IEEE Access. 2025;13:4136–57. doi: 10.1109/access.2024.3525183 [DOI] [Google Scholar]
  • 28.Abdelsattar M, AbdelMoety A, Emad-Eldeen A. A review on detection of solar PV panels failures using image processing techniques. In: 2023 24th International Middle East Power System Conference (MEPCON); 2023. p. 1–6.
  • 29.Biswas S, Mostafiz R, Uddin MS, Paul BK. XAI-FusionNet: diabetic foot ulcer detection based on multi-scale feature fusion with explainable artificial intelligence. Heliyon. 2024;10(10):e31228. doi: 10.1016/j.heliyon.2024.e31228 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Biswas S, Mostafiz R, Paul BK, Mohi Uddin KM, Rahman MM, Shariful FNU. DFU_MultiNet: a deep neural network approach for detecting diabetic foot ulcers through multi-scale feature fusion using the DFU dataset. Intelligence-Based Medicine. 2023;8:100128. doi: 10.1016/j.ibmed.2023.100128 [DOI] [Google Scholar]
  • 31.Biswas S, Mostafiz R, Paul BK, Uddin KMM, Hadi MdA, Khanom F. DFU_XAI: a deep learning-based approach to diabetic foot ulcer detection using feature explainability. Biomedical Materials & Devices. 2024;2(2):1225–45. doi: 10.1007/s44174-024-00165-5 [DOI] [Google Scholar]
  • 32.Chakravarty N, Dua M. Feature extraction using GTCC spectrogram and ResNet50 based classification for audio spoof detection. Int J Speech Technol. 2024;27(1):225–37. doi: 10.1007/s10772-024-10093-w [DOI] [Google Scholar]

Decision Letter 0

Tianlin Zhang

3 Dec 2024

PONE-D-24-42971
PiCCL: a lightweight multiview contrastive learning framework for image classification
PLOS ONE

Dear Dr. Kuang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 17 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Tianlin Zhang

Academic Editor

PLOS ONE

Journal requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following financial disclosure: [This work was founded by the Scientific and Technological Innovation 2030 - ”New Generation Artificial Intelligence” Major Project (2020AAA0105800).]. Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

5. Please upload a copy of S1 Appendix, S2 Appendix and S3 Appendix, to which you refer in your text on page 22. Please amend the file type to 'Supporting Information'. If the Supplementary file is no longer to be included as part of the submission please remove all reference to it within the text.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: • The introduction begins well but concludes abruptly. To improve consider summarizing the main contributions of the work at the end of the introduction. This will provide a good transition into the subsequent sections.

• Some technical terms such as "pseudo-labels" and "pretext tasks" may not be familiar to all readers. Providing brief definitions or explanations for these terms will make the paper accessible to a broader audience.

• The explanations for Figure 1 and Figure 2 currently in the introduction can be relocated to Section 3 (Methodology) for better alignment with the context. This will also enhance the logical flow of the manuscript.

• Ensure that all figures are explicitly referred to within the text for clarity and better reader navigation.

• In S2 Appendix, the manuscript briefly describes hyperparameters used for training. Explain how these values were chosen (e.g., prior literature, or trial and error).

• Integrate the supporting information (e.g., S1 Appendix) into the main manuscript where appropriate, particularly in the Methodology section. This will enrich the overall understanding and readability of the paper.

• Details from S1 Appendix describing the network structure should be incorporated into Section 3 (Methodology) for better contextual relevance.

• Discuss how PiCCL performs as the number of views (P) scales to higher values or as the size of the dataset increases. Address any potential bottlenecks or limitations.

• Include the testing details currently in S3 Appendix into Section 4 (Results). This will make the results section more comprehensive and self-contained.

• The manuscript mentions using ResNet-18 as the encoder backbone without explaining why it was chosen. Justify this choice, particularly in the context of its advantages or suitability for the datasets used.

• Currently, only accuracy and training time are reported. Including additional performance metrics will provide a more holistic evaluation of the proposed method.

• The future work section could be elaborated further. Consider including specific directions or potential applications to provide a clearer roadmap for follow-up research.

Reviewer #2: This research paper presents “PiCCL (Primary Component Contrastive Learning), a self-supervised contrastive learning framework that utilizes a multiplex Siamese network structure consisting of many identical branches rather than 2 to maximize learning efficiency. You benchmarked PiCCL against various state-of-the-art self-supervised algorithms on multiple datasets including CIFAR-10, CIFAR-100, and STL-10. PiCCL achieved top performance in most of our tests, but where PiCCL excels is in the small batch learning scenarios. When using a batch size of 8, PiCCL outperforms the competition by more than 3 percentage points.”

Good work, keep it up.

Besides that, I have a few minor comments which could further improve the quality of the manuscript:

1. Provide quantitative remarks on the impact of the proposed method in the abstract.

2. You need to rewrite clearly the contribution, motivation, challenges, and paper design.

3. It is better to summarize the literature review in a table.

4. It is better to include a flow chart/pseudocode for your work.

5. The superior performance of the proposed method could be achieved at what cost?

6. A detailed analysis of the limitations and potential failure scenarios of the proposed model is missing.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Aug 25;20(8):e0329273. doi: 10.1371/journal.pone.0329273.r002

Author response to Decision Letter 1


21 Jan 2025

Respond to reviewer 1:

Thank you so much for reviewing our work. The writers of this manuscript have very limited experience in writing academic papers, and your comments are truly invaluable. Here are the changes we made in response to each of your comments:

1. Regarding the flow of the manuscript. I have tried to make adjustments to improve the overall flow of the text.

2. Regarding providing explanations when technical terms show up. I have provided brief explanations of "pseudo-labels", "pretext task", as well as some other technical terms at their first appearance in this manuscript.

3. Regarding the placement of the plots. I have moved the flow diagram (originally Fig-1) down to the method section. However, I did not change the placement of Fig-2 (now Fig-1) because it corresponds to a paragraph providing the intuition and motivation behind this paper. If you think moving that paragraph to a different section will improve the overall structure, please point that out and I will comply.

4. Regarding figure reference. We have merged the 2 panels of Fig 2 into a single figure.

5. Regarding the choice of hyperparameters. The hyperparameters are either inherited from Barlow-Twins or picked through trial and error. However, when reevaluating the manuscript, we felt that Appendix S2 was too bulky and technical to be integrated into the main text. The information it contains can be easily accessed from the source code, and for readers not familiar with the torchvision package, the hyperparameters carry little meaning. Therefore we decided to remove Appendix S2.

6. Regarding integrating the appendices into the main text. All appendices are either integrated into the main text or removed. The revised manuscript contains no appendices.

7. Regarding placement of S1 Appendix. S1 Appendix has been integrated into the "Results" section.

8. Regarding how PiCCL's performance scales with the number of views (P). We added a new paragraph to the "Performance of PiCCL" subsection of the "Discussion" section where we answer this question. In summary: in our tests, PiCCL(2) performs worse than PiCCL(4), and PiCCL(16) did not offer an increase in accuracy beyond PiCCL(8). Thus we report the results for PiCCL(4) and PiCCL(8), which we felt are the most relevant.

9. Regarding placement of S3 Appendix. S3 Appendix has been integrated into the "Results" section.

10. Regarding why we chose ResNet-18 as our encoder. We chose it because SimCLR and Barlow-Twins used ResNet-50 in their original publications. We do not have enough computing resources to use ResNet-50, so we used the simpler version. We explain our choice in the first paragraph of the "Results" section.

11. Regarding other performance metrics. We added memory occupation to the "Results" section.

12. Regarding future work. We expanded the last subsection of the "Discussion" section and explained both our short- and long-term objectives.

Respond to reviewer 2:

Thank you so much for reviewing our work. The first authors are relatively new to the field, and your encouragement means a lot to us. Here are the changes we made in response to each of your comments:

1. Regarding quantitative remarks in the abstract. We have added accuracy scores to the abstract.

2. Regarding the contribution, motivation, challenges, and design. I have made minor additions across the text trying to explain every choice we made.

3. Regarding using a table to summarize the literature review. After consideration, we are worried that the literature mentioned in the "Related Works" section would not present well in a table, since the major differentiating factors are either in the network design or in the loss function, both of which are difficult to showcase in a table.

4. Regarding adding a flow chart or pseudocode. The flow diagram has been moved to the "Method" section and has new accompanying text, aiming to improve the clarity of the algorithm.

5. Regarding the computational performance. A few sentences were added to the "Efficiency of PiCCL" subsection of the "Discussion" section to discuss the computational cost. We also added memory usage as a new metric.

6. Regarding the limitations. A newly added paragraph in the "Performance of PiCCL" subsection of "Discussion" describes how the number of network branches affects performance. It points out that too small a P results in sub-optimal performance, and too large a P results in excessive computational complexity.

Decision Letter 1

Hung Bui

16 Mar 2025

PONE-D-24-42971R1
PiCCL: a lightweight multiview contrastive learning framework for image classification
PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 30 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.


We look forward to receiving your revised manuscript.

Kind regards,

Hung Thanh Bui, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments:

The revision is better.

In my previous comment, they said that “this is an innovative methodology that, to the best of our knowledge, has not yet been published in any peer-reviewed journal. Our research focuses on comparing transformer-based language models trained in Portuguese and specialized for a specific problem domain”. In this language and domain, maybe there is not any research, but in English there are many research focusing on using transformer models, it’s better they should apply their method and compare with another research on another dataset.

What is their improvement on transformer models, they should discuss in detail.

They should show some cases where their model got the best and worst result and analyze in detail.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: (No Response)

Reviewer #5: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: (No Response)

Reviewer #4: (No Response)

Reviewer #5: Thank you to the authors for their effort in revising the paper. The improvements made in this version have strengthened the manuscript significantly. The paper can now be considered for acceptance, provided that the following minor comments are addressed.

Comment #1 - Abstract:

The abstract claims that PiCCL achieves top performance in most tests, but it lacks a thorough statistical significance analysis to validate these claims. Providing confidence intervals or statistical significance tests (e.g., p-values) would strengthen the credibility of the reported performance improvements.

Comment #2 - Introduction:

The introduction does not clearly state the specific gaps in existing literature that PiCCL aims to address. While it discusses the advantages of PiCCL, it does not explicitly compare the limitations of prior methods, making it difficult to see the novelty of the proposed approach.

Comment #3 - Related Works:

To strengthen the discussion on the capabilities of deep learning in image classification, I recommend citing relevant works that highlight the effectiveness of deep learning in real-world applications. Specifically, the authors may consider including the following references:

Applying Image Processing and Computer Vision for Damage Detection in Photovoltaic Panels (DOI: 10.58491/2735-4202.3263)

Automated Defect Detection in Solar Cell Images Using Deep Learning Algorithms (DOI: 10.1109/ACCESS.2024.3525183)

A Review on Detection of Solar PV Panels Failures Using Image Processing Techniques (DOI: 10.1109/MEPCON58725.2023.10462371)

These papers demonstrate the application of deep learning and contrastive learning techniques in image-based fault detection and classification, further reinforcing the significance of self-supervised learning in practical scenarios.

Comment #4 - Conclusion:

The conclusion overstates PiCCL’s potential without acknowledging its limitations. While the paper suggests that PiCCL can be used in online learning, no real-world applications or deployment experiments are conducted to support this claim.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

Reviewer #5: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Reviewer Attachment.docx

pone.0329273.s001.docx (14.7KB, docx)
PLoS One. 2025 Aug 25;20(8):e0329273. doi: 10.1371/journal.pone.0329273.r004

Author response to Decision Letter 2


7 May 2025

Response to Reviewer 3

Thank you for taking the time to review our work. Here is our response to each of your comments.

1. Thank you for your valuable comment. In response, we selected the two tests we consider most demonstrative (batch size = 8 and 256 on STL-10), reran each of them 4 more times (5 runs in total), and performed a Student's t-test. We also provide confidence intervals for the difference in accuracy between methods. We added a new figure (Fig 4) to display the results, along with a few paragraphs in section 4.1 explaining them.
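The repeated-runs comparison described above can be sketched as follows. The accuracy values and the competing method's name are hypothetical placeholders, not the paper's measurements; a pooled two-sample Student's t-test is one plausible reading of "student T test":

```python
from statistics import mean, stdev

# Hypothetical top-1 accuracies (%) over 5 independent runs;
# the paper's actual numbers are reported in its Fig 4.
piccl  = [93.1, 92.8, 93.4, 92.9, 93.2]
simclr = [90.2, 89.8, 90.5, 90.0, 90.1]

n1, n2 = len(piccl), len(simclr)
diff = mean(piccl) - mean(simclr)          # difference in mean accuracy

# Pooled two-sample Student's t-test (equal variances assumed).
sp2 = ((n1 - 1) * stdev(piccl) ** 2 + (n2 - 1) * stdev(simclr) ** 2) / (n1 + n2 - 2)
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5      # standard error of the difference
t_stat = diff / se

# 95% CI on the mean difference; 2.306 is the two-sided t critical
# value for df = n1 + n2 - 2 = 8.
t_crit = 2.306
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"t = {t_stat:.2f}, 95% CI for difference = ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the confidence interval on the difference excludes zero, the accuracy gap between the two methods is significant at the 5% level.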

2. We added a few sentences throughout the manuscript to provide some background for why we created PiCCL. For example: beginning of introduction paragraph 3, last sentence of last paragraph of section 2.1.

3. Deep learning is indeed very useful in real-world applications, and this is also one of our future objectives. We have cited the provided papers in section 5.4.

4. Thank you for pointing this out; we have reduced the claims in the conclusion section.

Response to Reviewer 5

1. We added a sentence in the beginning of introduction paragraph 3 to express our research objective. Last sentence of last paragraph of section 2.1 also provides some detail.

2. Thank you for pointing this out; we have added a new paragraph in section 2.1 dedicated to 2 newer algorithms which we think provide some more background for our method.

3. We have accepted your proposition and added a table in the related works section.

4. The encoder network, ResNet-18, is a popular network for image classification, so we did not provide its structural details. The modifications we made to the encoder network, as well as the detailed structure of the projection head, are described in the first paragraph of the results section.

5. We have added a few sentences to fill in some explanations we previously missed. The networks are explained in the first paragraph of section 4, the loss function in section 3.2, and the image augmentation in section 3.1.

6. A table containing the augmentation methods and their parameters has been added to section 3.1.

7. The datasets used in this study (STL-10, CIFAR-10, and CIFAR-100) are all well-known datasets. Sections 4.1 and 4.2 provide information on these datasets, including the number of samples, sample types, and subset segmentation. We also added a few sentences explaining how each subset is used in our study.

8. Sorry for the confusion, but most unsupervised contrastive learning methods (including PiCCL) would not work without data augmentation, because the generation of positive sample pairs relies on it. In supervised learning, a positive pair can be obtained by selecting 2 samples with the same label. Unsupervised contrastive learning methods instead obtain positive pairs by randomly augmenting a sample 2 times: the network does not know what the sample contains, but the 2 augmented views of the same sample should contain the same subject. Without augmentation, the 2 views would be identical.
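This positive-pair generation can be sketched with a toy NumPy pipeline. The specific transforms here (random crop, horizontal flip, brightness jitter) are illustrative stand-ins, not the augmentation parameters actually used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    """One random view of an image: crop, optional flip, brightness jitter."""
    h, w = img.shape[:2]
    ch, cw = h - 8, w - 8                      # fixed crop size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    view = img[top:top + ch, left:left + cw].copy()
    if rng.random() < 0.5:                     # random horizontal flip
        view = view[:, ::-1]
    view = view * rng.uniform(0.8, 1.2)        # brightness jitter
    return np.clip(view, 0.0, 1.0)

image = rng.random((32, 32, 3))                # one unlabeled sample

# Two random augmentations of the SAME image form a positive pair:
# different pixels, but the same underlying subject.
view_a, view_b = augment(image, rng), augment(image, rng)
print(view_a.shape, np.allclose(view_a, view_b))
```

Because each call draws fresh random parameters, the two views differ pixel-wise while sharing content, which is exactly what gives the contrastive objective something non-trivial to align.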

9. Thank you for pointing this out; we have added more captions to make the figures and tables more informative.

10. We have added some analysis of time complexity in section 5.2; Table 1 also lists the time complexity of related methods.

11. Thank you for pointing out the lack of statistical analysis and for providing the paper. We have decided to employ a Student's t-test for the 2 most representative tests, and we also provide the confidence interval for the difference between accuracies. We think the provided paper strengthens the claim that machine learning is very successful in real-world applications, and we have decided to cite it in section 5.4.

12. A table is added in the related works section to compare the relevant existing works.

13. We think the provided papers strengthen the claim that machine learning is very successful in real-world applications, and we have cited them in section 5.4.

Decision Letter 2

Hung Bui

14 May 2025

PONE-D-24-42971R2

PiCCL: a lightweight multiview contrastive learning framework for image classification

PLOS ONE

Dear Dr. Wang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Hung Thanh Bui, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments:

I overlooked some points in my initial review, which is why I have more comments this time. There are some points the authors should take care of as follows:

- The authors should cite all related works in Table 1.

- In this research, the authors chose multiview contrastive learning algorithms; they should explain in detail why they did that.

- They also used Siamese networks; they should give a reason for doing so.

- They said that “image augmentation methods is crucial for the quality of representation learning. A good augmentation should retain as much relevant information as possible while altering the rest”, so they should explain in detail why they chose the image augmentation process used by Barlow-Twins.

- The main contribution is a loss function designed for PiCCL, they should explain it in detail.

- They chose L2-normalized embedding vectors in their loss function; why did they do that?

- In the comparison, they only compared with SimCLR ([12]: 2021), Barlow-Twins ([14]: 2021), and SimSiam ([13]: 2020); how did they get the results of the other works?

- For what reason did their proposed model PiCCL (8) get the best result? Could they explain in detail?

- Looking at the results in Table 6, PiCCL (8) is not good in either time per epoch or memory; what is the reason?

- They should do more experiments, analyze the results in detail, and compare with advanced methods from recent years.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

Reviewer #4: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: After reviewing the amendments made by the authors that dealt with most of the reviewers' comments, the research appears better than the previous one. Accordingly, there is no objection to accepting the research, for the possibility of publishing it in the journal without additional modifications.

Reviewer #4: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********


PLoS One. 2025 Aug 25;20(8):e0329273. doi: 10.1371/journal.pone.0329273.r006

Author response to Decision Letter 3


27 Jun 2025

No reviewer commented on this revision; therefore, no response can be provided.

Attachment

Submitted filename: response_to_reviewers.docx

pone.0329273.s002.docx (19.7KB, docx)

Decision Letter 3

Hung Bui

15 Jul 2025

PiCCL: a lightweight multiview contrastive learning framework for image classification

PONE-D-24-42971R3

Dear Dr. Wang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Hung Thanh Bui, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I accept the revision.

Please check all Tables, Figures, Formulas, language and format of the paper.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: (No Response)

Reviewer #4: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Reviewer #4: No

**********

Acceptance letter

Hung Bui

PONE-D-24-42971R3

PLOS ONE

Dear Dr. Wang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Hung Thanh Bui

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Reviewer Attachment.docx

    pone.0329273.s001.docx (14.7KB, docx)
    Attachment

    Submitted filename: response_to_reviewers.docx

    pone.0329273.s002.docx (19.7KB, docx)

    Data Availability Statement

    The source code will be made publicly available after the acceptance of this manuscript. The code will be hosted in a public GitHub repository: https://github.com/YimingKuang/PiCCL.

