Abstract
Objectives
Patient-based real-time quality control (PBRTQC) is essential for clinical laboratory management but struggles with detecting small systematic errors. This study presents the patient-based pre-classified real-time quality control with neural network (PCRTQC-NN) model, utilizing neural networks to improve error detection by extracting analytical features from testing instruments.
Methods
Using PCRTQC's clustering analysis, we pre-classified and processed Na, CHOL, ALT, and CR data from 611,031 patient samples. A neural network autoencoder, trained in TensorFlow with mean squared error (MSE) as the loss function, extracted the testing instrument's analytical features under error-free conditions. Systematic errors were identified from the reconstruction residuals, that is, the differences between the test data and their reconstructions. Model performance was evaluated by the average number of patient samples until error detection (ANPed).
Results
PCRTQC-NN outperformed traditional algorithms in error detection. Compared with PCRTQC, it reduced the ANPed for ALT by 37 % (constant error, CE) and 22 % (proportional error, PE) at 1 total allowable error (TEa), with comparable results for the other analytes. For 0.5 TEa errors, the ANPed decreased for CHOL by 23 % (CE) and 22 % (PE), for ALT by 14 % (CE) and 6 % (PE), and for CR by 4 % (CE) and 9 % (PE), improving error detection for analytes with high inter-individual variability and increasing sensitivity to smaller errors.
Conclusions
PCRTQC-NN significantly enhances systematic error detection compared with PCRTQC. By treating test result sequences as discrete signals and using autoencoders to extract the instrument's analytical features, it improves the signal-to-noise ratio (SNR) for high-variability analytes. Robust feature models may also improve laboratory efficiency and inter-laboratory standardization. Future multi-center studies are needed to validate its applicability across diverse settings.
Keywords: Laboratory management, Neural network, PBRTQC, Quality control, Simulation
Highlights
• We extracted analytical features from the instrument via a neural network autoencoder, without additional patient clinical information.
• The autoencoder can accurately predict and identify manually introduced systematic errors based on the analytical features in the standard state.
• Error detection performance has improved, with increased sensitivity to small errors.
1. Introduction
Quality control in clinical laboratories is critical for ensuring the accuracy and reliability of diagnostic tests, which directly influence patient outcomes and laboratory efficiency. Traditional internal quality control (IQC) methods rely on batch testing, which is only effective for detecting analytical errors and requires the use of additional control materials. This can introduce matrix effects, compromise the effectiveness of quality control, and increase costs [[1], [2], [3], [4]]. In contrast, patient-based real-time quality control (PBRTQC) leverages real-time analysis of patient test results to continuously monitor the stability of the testing instrument. It enhances sensitivity to both analytical and pre-analytical errors without requiring supplementary control materials, making it an ideal supplement to traditional IQC methods for laboratories focused on reducing costs and improving efficiency [5,6].
However, PBRTQC faces several challenges. First, its ability to detect small systematic errors is reduced for analytes with high bias or variability, where variability refers to biological variation (BV), the natural fluctuation of analyte concentrations driven by physiological factors. BV encompasses intra-individual variation (fluctuations within the same patient over time, e.g., due to circadian rhythms, with CV ∼5–15 % for many analytes) and inter-individual variation (differences between patients, e.g., due to age, sex, or disease state, with CV often >20 %), as estimated in global databases such as the EFLM Biological Variation Database. High BV reduces the signal-to-noise ratio (SNR), complicating error detection because the dispersion of patient data masks subtle shifts [7]. Second, autocorrelation can arise, whereby long-term systematic errors accumulate as new data continuously enter the system [[8], [9], [10], [11], [12]]. Additionally, determining appropriate inclusion and exclusion criteria for patient data and optimizing parameter settings remain complex tasks that limit the broader adoption of PBRTQC [13].
As a result, an increasing number of researchers are exploring the integration of programming algorithms and artificial intelligence to enhance PBRTQC applications. Duan et al. developed methods to optimize the PBRTQC model by refining traditional algorithms [14] and proposed the use of regression adjustment of patient clinical data for more stable model performance [15]. Emerging studies employ machine learning and neural networks to preprocess complex patient data, improving PBRTQC performance [[16], [17], [18], [19]]. However, many approaches demand vast datasets and may truncate outliers, potentially masking the instrument's analytical features—these encompass precision information (e.g., repeatability in repeated measurements) and inter-individual sample variability (extending beyond BV to include pathological and pre-analytical factors). Inspired by zero-shot voice conversion models like AutoVC, where autoencoders disentangle linguistic content from speaker-specific timbre (analogous to 'voice identity'), we conceptualize analytical features as the instrument's intrinsic performance representations. These features capture the instrument's response patterns to varying sample concentrations—similar to human vocalization of different phonemes—and its inherent stability (precision and accuracy), akin to consistent voice timbre across repetitions. Neural networks, with their nonlinear modeling and self-learning capabilities, excel at extracting such features from large datasets to monitor systematic errors [[20], [21], [22]].
In this study, we extend our prior work by viewing analyte test result sequences as discrete signals that encode the laboratory instrument's unique characteristics alongside population and analyte variations. This reduces data dispersion and boosts SNR for precise error detection. Building on our patient-based pre-classified real-time quality control (PCRTQC) method [23], we introduce the PCRTQC with neural network (PCRTQC-NN) model. By leveraging neural networks on historical data, PCRTQC-NN extracts personalized analytical features per analyte, enabling accurate result prediction and superior error detection—evidenced by reduced average number of patient samples until error detection (ANPed) without data truncation. This approach enhances laboratory efficiency, cuts costs, and supports broader operational improvements.
2. Materials and methods
2.1. Data collection and preprocessing
To facilitate comparative analysis, the dataset employed in the present study is identical to that utilized in our previous research [23]. The dataset was sourced from patient test results obtained using a Roche Cobas 8000 analyzer (Roche Diagnostics) at The First Affiliated Hospital of China Medical University in 2021. Initially, 650,000 test samples were processed; subsequently, samples lacking the target analytes (Na, CHOL, ALT, and CR) were excluded, resulting in a final cohort of 611,031 samples containing at least one target analyte. All data were anonymized by removing patient identifiers and clinical information, thereby ensuring compliance with ethical standards.
The entire dataset was randomly partitioned into three equal, balanced subsets, each used for a distinct component of the study framework (Fig. 1). The first subset (203,677 samples) was used to train the Support Vector Machine (SVM) classifier. The second subset (203,677 samples) was used to train the neural network: 80 % of these samples formed the training set for the neural network autoencoder and the remaining 20 % the validation set. The third subset (203,677 samples) served as the performance testing set for the PCRTQC-NN method. This partitioning allowed the classifier and the autoencoder to be trained on data independent of those used to assess the overall performance of PCRTQC-NN in the testing phase.
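The partitioning can be illustrated with a short NumPy sketch. It is not the study's code: the function name `split_dataset`, the fixed random seed, re-sorting each subset so that samples keep their original run order, and taking the 80/20 training/validation split contiguously are all assumptions made for illustration.

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Randomly partition sample indices into three equal subsets and split the
    autoencoder subset 80/20 into training and validation indices.

    Assumptions (not stated in the study): indices inside each subset are
    re-sorted to keep the original run order, and the 80/20 split is contiguous.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    clf_idx, ae_idx, test_idx = (np.sort(part) for part in np.array_split(shuffled, 3))
    n_train = int(round(0.8 * len(ae_idx)))
    return clf_idx, ae_idx[:n_train], ae_idx[n_train:], test_idx

clf_idx, ae_train_idx, ae_val_idx, test_idx = split_dataset(611_031)
print(len(clf_idx), len(ae_train_idx), len(ae_val_idx), len(test_idx))
```

For 611,031 samples this yields three subsets of 203,677 indices, with 162,942 training and 40,735 validation indices in the autoencoder subset.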
Fig. 1.
Construction process of PCRTQC-NN. The construction process is divided into three sections: the left section shows the classifier training process, the middle section the autoencoder training process, and the right section the performance testing process.
2.2. Classifier training
The classifier was trained using the classifier training set, with the selection of target analytes and companion analytes, data preprocessing, classifier training, data statistics, and classifying steps all consistent with our previous study [23]. The trained classifier was used to pre-classify the data from the neural network training set. The model utilizes only three analyte results for 3D clustering, avoiding additional features like age or sex to mitigate issues with data incompleteness or interference, while the autoencoder extracts non-interpretable latent features for error detection.
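The pre-classification step (clustering on three analyte results, then training a classifier to assign new samples to the clusters) can be approximated by the NumPy sketch below. It swaps the study's SVM for a simpler nearest-centroid rule and uses k-means as a generic clustering method; the number of clusters, the farthest-point initialisation and the function names are assumptions, not parameters from the study. In practice the three analyte columns would first be placed on a common scale (for example z-scores), since the analytes are reported in different units.

```python
import numpy as np

def _distances(X, centroids):
    # Euclidean distance from every sample (row of X) to every centroid
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation (illustrative only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed: the sample farthest from all seeds chosen so far
        d = _distances(X, np.array(centroids)).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _distances(X, centroids).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters where they are
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def classify(X, centroids):
    """Nearest-centroid assignment: a simplified stand-in for the SVM classifier."""
    return _distances(np.asarray(X, dtype=float), centroids).argmin(axis=1)
```

Here `kmeans` would be fitted on the classifier training subset, and `classify` would then assign each sample of the autoencoder and test subsets to a group.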
2.3. Neural network autoencoder training
2.3.1. Data feature extraction with autoencoder
The patient test result sequences contain various information that can influence the results, such as individual patient variability and system precision. The former is specific and changes with sample data, while the latter is relatively stable, showing some variation only when batch intervals increase. To effectively extract these features, we designed a data feature extraction model based on an autoencoder architecture [[24], [25], [26], [27]].
A conventional autoencoder consists of an encoder and a decoder. The encoder compresses input data into a lower-dimensional representation, removing redundant information and retaining important features; the decoder reconstructs the original data from this representation. Features are thus extracted through the compression and decompression performed by the encoder and decoder [24,25]. In this study, we designed two encoders, A and B, that feed a single decoder and extract system precision information and individual variability, respectively (Fig. 2). By setting different bottleneck widths (La and Lb) [20,25], encoders A and B filter out redundant information from the original data sequence and retain the essential data features.
Fig. 2.
Autoencoder structure and training flow. The autoencoder consists of Encoder A, Encoder B, and Decoder C. Sa and Sb are the input data for the encoders; FA and FB are the data features extracted after compression to a lower dimension by Encoders A and B, respectively. Decoder C reconstructs these features into the reconstructed data sequence (Rc). The bottleneck widths of Encoders A and B are denoted La and Lb, respectively. The loss function used during training is the mean squared error (MSE).
Specifically, encoder A receives the normalized patient test result sequence Sa as input. The bottleneck width La limits the dimension of the feature sequence output by encoder A, so that it extracts a feature sequence FA containing system precision information. Similarly, encoder B receives the normalized patient test result sequence Sb as input and, through the bottleneck width Lb, extracts a feature sequence FB containing individual variability information. Decoder C receives FA and FB and reconstructs them into a data sequence Rc of the same length as the original sequences Sa and Sb. Through this process (Fig. 2), the original data sequence is transformed into a reconstructed data sequence and the specific features are extracted [21,25].
Therefore, to selectively suppress undesired data features and accurately extract relevant ones, we continuously train and validate the autoencoder to determine the optimal bottleneck width. This approach enables the autoencoder to effectively isolate and extract the necessary data features.
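To make the data flow of Fig. 2 concrete, the sketch below implements a toy dual-encoder forward pass in NumPy. The study trained its autoencoder with TensorFlow and does not report layer types or sizes here, so the single dense layer with tanh per encoder, the linear decoder, the sequence length of 32 and the widths La = 4 and Lb = 2 are illustrative assumptions.

```python
import numpy as np

def init_dual_autoencoder(seq_len, la, lb, scale=0.1, seed=0):
    """Random weights for a toy dual-encoder autoencoder (one dense layer each)."""
    rng = np.random.default_rng(seed)
    return {
        "Wa": rng.normal(0.0, scale, size=(la, seq_len)),       # encoder A: Sa -> FA
        "Wb": rng.normal(0.0, scale, size=(lb, seq_len)),       # encoder B: Sb -> FB
        "Wc": rng.normal(0.0, scale, size=(seq_len, la + lb)),  # decoder C: [FA, FB] -> Rc
    }

def forward(params, Sa, Sb):
    """Encode Sa and Sb through their bottlenecks and decode the joint features."""
    FA = np.tanh(Sa @ params["Wa"].T)                        # (batch, La)
    FB = np.tanh(Sb @ params["Wb"].T)                        # (batch, Lb)
    Rc = np.concatenate([FA, FB], axis=1) @ params["Wc"].T   # (batch, seq_len)
    return FA, FB, Rc

params = init_dual_autoencoder(seq_len=32, la=4, lb=2)
rng = np.random.default_rng(1)
FA, FB, Rc = forward(params, rng.normal(size=(5, 32)), rng.normal(size=(5, 32)))
print(FA.shape, FB.shape, Rc.shape)   # (5, 4) (5, 2) (5, 32)
```

The only hard constraint the sketch encodes is the one stated in the text: the two bottlenecks are narrower than the input, and Rc has the same length as Sa and Sb.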
2.3.2. Autoencoder training
The training process of the autoencoder involves continuously adjusting the parameters of the encoders and decoders to minimize the difference between the input data sequence and the reconstructed data sequence, known as the reconstruction error, allowing the autoencoder to learn the features of the input data sequence.
The classifier is used to group all data in the autoencoder training dataset. Each group of data is processed sequentially with mean filtering, Box-Cox transformation, and data normalization. Mean filtering, adapted from classic denoising techniques in image processing and extended to one-dimensional time series data like patient results in our study, involves applying a sliding window to calculate the average of data points within the window and replacing the central point with this average [28]. This process effectively reduces random noise or fluctuations, yielding a smoother dataset while preserving essential trends. The window size (denoted as n) can be adjusted based on data characteristics: a larger n enhances noise suppression for heavily noisy data but may blur finer details, whereas a smaller n retains more details at the cost of less thorough denoising. This flexibility allows optimization for our quality control objectives, balancing noise reduction with data fidelity. The normalized data from each group is then merged and reordered according to the original sequence—a process we term normalized data merging—and used for training the autoencoder.
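The per-group preprocessing and normalized data merging can be sketched as follows. The window size n, the fixed Box-Cox parameter (lambda), the edge padding of the moving average and the z-score normalization are assumptions chosen for illustration; the study does not state these settings in this section, and lambda would normally be estimated from the data (for example by maximum likelihood).

```python
import numpy as np

def mean_filter(x, n=5):
    """Centred moving average with odd window n; ends are padded by repetition."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    padded = np.pad(np.asarray(x, dtype=float), n // 2, mode="edge")
    return np.convolve(padded, np.ones(n) / n, mode="valid")

def box_cox(x, lam):
    """Box-Cox transform of strictly positive values with a fixed lambda."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def z_normalize(x):
    return (x - x.mean()) / x.std()

def normalized_data_merging(values, groups, n=5, lam=0.0):
    """Filter, transform and normalize each pre-classified group separately,
    then write the results back to the samples' original positions."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    merged = np.empty_like(values)
    for g in np.unique(groups):
        pos = np.flatnonzero(groups == g)   # original positions, in run order
        merged[pos] = z_normalize(box_cox(mean_filter(values[pos], n), lam))
    return merged
```

Each group is processed on its own, and its results are written back to the positions its samples occupied, so the merged output keeps the original sequence order.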
During the training process, we focused on constructing an autoencoder system comprising two encoders (A and B) and one decoder (C). Encoder A focuses on extracting system precision information, while encoder B focuses on extracting sample variability information. The feature vectors FA and FB produced by these two encoders are input into decoder C, which reconstructs a data sequence Rc that closely approximates the input data sequence.
To achieve this, the input sequences Sa and Sb used during training must share the same system precision information (Sa and Sb lie close together in the patient sample sequence) but carry different sample variability information (Sa ≠ Sb). We iteratively adjusted parameters such as the input sequence length and the bottleneck widths of encoders A and B (La and Lb), comparing the output sequence Rc with Sb, until the model converged [20]. The model is considered well trained when Rc closely approximates Sb in both the training and validation sets, as indicated by the loss values of both sets converging normally to similar levels [24,25].
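One way to build training pairs that satisfy these conditions is to take two non-overlapping windows that lie close together in the merged sequence, as in the NumPy sketch below. The window length, the gap between Sa and Sb, the step between successive pairs and the function name are hypothetical; the study does not specify how the pairs were drawn.

```python
import numpy as np

def make_training_pairs(x, seq_len, gap=0, step=1):
    """Pair each window Sa = x[s : s+seq_len] with a nearby, non-overlapping
    window Sb that starts `gap` samples after Sa ends. Sb is the target
    that the decoder output Rc should reproduce."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - 2 * seq_len - gap + 1, step)
    Sa = np.stack([x[s : s + seq_len] for s in starts])
    Sb = np.stack([x[s + seq_len + gap : s + 2 * seq_len + gap] for s in starts])
    return Sa, Sb

Sa, Sb = make_training_pairs(np.arange(10.0), seq_len=3, gap=1, step=2)
print(Sa.tolist(), Sb.tolist())  # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]] [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]
```

A small gap keeps Sa and Sb within the same stretch of the run (similar precision), while the non-overlap guarantees that they contain different patients' results.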
Potential issues during training include: a) if La is too wide, FA carries individual variability information from Sa that is inconsistent with Sb, causing Rc to deviate from Sb and the model to fail to converge; b) if La is too narrow, FA does not provide complete system precision information. If Lb is appropriate or too narrow, FB cannot compensate for this and training fails to converge; if Lb is too wide, FB compensates for the information missing from FA, so training may appear normal, but performance drops on the validation set; c) only when both La and Lb are appropriate does the model perform well on both the training and validation sets.
The mean squared error (MSE) between Rc and Sb, i.e., the reconstruction error of the model, was used as the loss function during training [24,29,30]. The MSE on the training data (MSE1) is computed as the training data are fed through the autoencoder, and the MSE on the validation data (MSE2) is computed by passing the validation data through the autoencoder with the same hyperparameters. When MSE1 and MSE2 both converge normally to similar values, the hyperparameters are considered optimal. Adjusting the hyperparameters, especially the bottleneck widths La and Lb, is crucial during training, because they determine the model's ability to extract specific features [25].
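The MSE1/MSE2 check can be illustrated with a deliberately small model: a linear dual-encoder autoencoder trained by full-batch gradient descent in NumPy. This is a sketch under stated assumptions, not the TensorFlow model used in the study; the linear layers, learning rate, number of epochs and the relative tolerance in `converged_and_close` are hypothetical choices.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def reconstruct(W, A, B):
    """Linear dual-encoder autoencoder: Rc = Wc [Wa A; Wb B]."""
    Wa, Wb, Wc = W
    return np.concatenate([A @ Wa.T, B @ Wb.T], axis=1) @ Wc.T

def train_linear_dual_ae(Sa, Sb, Va, Vb, la, lb, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on MSE(Rc, Sb).
    Returns the weights and a per-epoch history of (MSE1, MSE2)."""
    rng = np.random.default_rng(seed)
    L = Sa.shape[1]
    Wa = rng.normal(0.0, 0.1, (la, L))
    Wb = rng.normal(0.0, 0.1, (lb, L))
    Wc = rng.normal(0.0, 0.1, (L, la + lb))
    history = []
    for _ in range(epochs):
        F = np.concatenate([Sa @ Wa.T, Sb @ Wb.T], axis=1)  # (N, La+Lb)
        dR = 2.0 * (F @ Wc.T - Sb) / Sb.size                # dMSE/dRc
        dF = dR @ Wc                                        # back-propagate to features
        Wc -= lr * (dR.T @ F)
        Wa -= lr * (dF[:, :la].T @ Sa)
        Wb -= lr * (dF[:, la:].T @ Sb)
        W = (Wa, Wb, Wc)
        history.append((mse(reconstruct(W, Sa, Sb), Sb),    # MSE1: training pairs
                        mse(reconstruct(W, Va, Vb), Vb)))   # MSE2: validation pairs
    return (Wa, Wb, Wc), history

def converged_and_close(history, rel_tol=0.1):
    """Hypothetical acceptance rule: final MSE1 and MSE2 within rel_tol of each other."""
    mse1, mse2 = history[-1]
    return abs(mse1 - mse2) <= rel_tol * max(mse1, mse2)
```

In a bottleneck search, training would be repeated over candidate (La, Lb) pairs, and the pair whose final MSE1 and MSE2 are both low and close would be kept.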
2.3.3. Applying autoencoder for out-of-control detection
To determine the quality control status of a point P, we input the sequence X1 to encoders A and B. Decoder C then outputs the reconstruction R1 of X1. When no changes other than precision and individual variability are introduced into the system (Fig. 3a), R1 should be close to X1. However, if a systematic error occurs (Fig. 3b), the reconstruction accuracy will be significantly affected, as the model has not learned the features of systematic errors during training.
Fig. 3.
Autoencoder-based out-of-control detection. X1 is the input sequence to Encoders A and B for evaluation. FA and FB are the features extracted by the encoders, representing the instrument's precision information and the inter-individual variability of patient samples, respectively. R1 is the reconstructed sequence generated by the decoder. The circles, triangles, and diamonds represent, respectively, the instrument's precision information, the inter-individual variability of patient samples, and the systematic error information contained in the sequence being evaluated. Fig. 3A shows the scenario in which no changes beyond precision and variability are introduced. Fig. 3B shows the scenario in which a systematic error is present, so that the sequence contains systematic error, precision, and variability information.
2.4. PCRTQC-NN model testing
2.4.1. Artificial errors
Because actual out-of-control records are scarce, artificial errors were used to simulate out-of-control situations in the PCRTQC-NN study. Artificial errors were added based on the total allowable error (TEa) of each analyte: Na, ±4 %; CHOL, ±9 %; ALT, ±16 %; and CR, ±12 %. Because routine patient test results already contain random errors from the testing instrument, no additional random errors were introduced. Only constant errors (CE) and proportional errors (PE) were added, according to Eqs. (1) and (2):
| (1) |
| (2) |
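Equations (1) and (2) are not reproduced in this text. As an illustration only, the sketch below uses one common way of simulating the two error types: a proportional error scales every result by (1 + k·TEa), and a constant error shifts every result by k·TEa times a reference concentration, with k = ±0.5 or ±1. The `reference` argument (for example, the median of the error-free data) is an assumption needed to express a percentage TEa in concentration units and may differ from the study's definition.

```python
import numpy as np

# TEa values from the text, expressed as fractions
TEA = {"Na": 0.04, "CHOL": 0.09, "ALT": 0.16, "CR": 0.12}

def add_proportional_error(x, tea, k):
    """PE: every result is scaled by (1 + k*TEa)."""
    return np.asarray(x, dtype=float) * (1.0 + k * tea)

def add_constant_error(x, tea, k, reference):
    """CE: every result is shifted by the same absolute amount k*TEa*reference.
    `reference` is a hypothetical concentration (e.g. the median of the
    error-free data) that converts the percentage TEa into concentration units."""
    return np.asarray(x, dtype=float) + k * tea * reference
```

The key difference the sketch shows is that PE grows with concentration, whereas CE adds the same amount to low and high results alike.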
2.4.2. Data sampling
From the PCRTQC-NN test dataset, 1500 samples were randomly selected as unbiased test data. Another 1500 samples were randomly selected, and an artificial systematic error of ±0.5 TEa or ±1 TEa (CE or PE) was added to them according to Eqs. (1) and (2). The two sets were combined into a test data sequence of 3000 samples, with the error-free samples preceding the error-containing samples (Fig. 5).
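The assembly of the 3000-sample test sequence can be sketched as follows. Drawing the two sets of 1500 samples without replacement (so that they do not overlap), placing the error-free block first, passing the error as a function, and the name `build_test_sequence` are assumptions of this illustration.

```python
import numpy as np

def build_test_sequence(pool, error_fn, n=1500, seed=0):
    """Draw 2n distinct samples from `pool`; keep the first n unchanged and
    apply `error_fn` to the second n. Returns the sequence and an error mask."""
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool, dtype=float)
    idx = rng.choice(len(pool), size=2 * n, replace=False)
    clean = pool[idx[:n]]
    biased = np.asarray(error_fn(pool[idx[n:]]), dtype=float)
    is_error = np.r_[np.zeros(n, dtype=bool), np.ones(n, dtype=bool)]
    return np.concatenate([clean, biased]), is_error
```

For example, `build_test_sequence(results, lambda v: v * (1 + 0.16))` would simulate a +1 TEa proportional error for ALT; the returned boolean mask marks which positions carry the error.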
2.4.3. Out-of-control detection using PCRTQC-NN
PCRTQC-NN consists of the classifier and the autoencoder. First, the classifier is used to group the test data sequence, and the grouped test data sequence undergoes normalized data merging. To determine the quality control status (out-of-control or not) of a point P, the sequence X1 (starting from P and consisting of a certain length of patient sample results) is input into both encoders A and B. Decoder C outputs the reconstruction sequence R1 of X1. The starting point P in X1 corresponds to the starting point Pr in R1, which is the reconstructed point of P. The reconstruction residual of P is calculated using formula (3):
$$\mathrm{Residual}_P = P - P_r \qquad (3)$$
The reconstruction residual is calculated for each point in the normalized merged test data sequence, yielding a sequence of reconstruction residuals, which is plotted with residual values on the y-axis and sequence order on the x-axis. The acceptable range is defined as the 0.05th–99.95th percentile range of the residuals from data without added systematic errors, corresponding to a false rejection probability (Pfr) of approximately 0.1 %. A point is considered out of control if its residual falls outside this range.
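The residual calculation and percentile-based control limits can be expressed in a few lines of NumPy. Taking the residual as the input value minus its reconstruction is an assumption consistent with the sign behaviour described for Fig. 5 (positive errors pushing residuals above the upper limit); the function names are hypothetical.

```python
import numpy as np

def control_limits(residuals_in_control, lower_pct=0.05, upper_pct=99.95):
    """Percentile limits from error-free residuals (about 0.1 % false rejections).
    np.percentile expects percentages in [0, 100]."""
    lo, hi = np.percentile(residuals_in_control, [lower_pct, upper_pct])
    return float(lo), float(hi)

def out_of_control(x, x_rec, limits):
    """Residual = input minus reconstruction; flag points outside the limits."""
    residuals = np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float)
    lo, hi = limits
    return (residuals < lo) | (residuals > hi)
```

The limits are fixed once from error-free residuals and then applied unchanged to every subsequent point.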
2.5. Performance evaluation
For each target analyte, artificial errors were added and the entire testing phase was repeated 100 times. The false alarm rate (FAR) and the average number of patient samples until error detection (ANPed) were measured for performance evaluation.
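FAR and ANPed over the repeated runs could be computed as in the sketch below. The exact counting conventions are not given in this section, so two are assumed here: ANPed counts error-containing samples up to and including the first flagged one, and a run in which the error is never flagged contributes the full length of the error segment. The function names are hypothetical.

```python
import numpy as np

def samples_until_detection(flags, is_error):
    """1-based count of error-containing samples up to the first flagged one;
    the error-segment length if the error is never flagged (assumption)."""
    hits = np.flatnonzero(flags[is_error])
    return int(hits[0]) + 1 if hits.size else int(is_error.sum())

def false_alarm_rate(flags, is_error):
    """Fraction of error-free samples that were (falsely) flagged."""
    return float(flags[~is_error].mean())

def summarize_runs(runs):
    """runs: list of (flags, is_error) boolean arrays, one pair per repetition."""
    anped = float(np.mean([samples_until_detection(f, e) for f, e in runs]))
    far = float(np.mean([false_alarm_rate(f, e) for f, e in runs]))
    return anped, far
```

With 100 repetitions per analyte and error condition, `runs` would hold 100 pairs of flag and error-mask arrays.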
2.6. Developing and testing environment
All data statistics, analysis, and processing were performed using Python 3.9 on a desktop computer with an NVIDIA RTX 4060 graphics card.
3. Results
3.1. Illustration of the distribution of patient data processed by PCRTQC and PCRTQC-NN
After processing by PCRTQC and PCRTQC-NN, the original data sequence exhibited distinct distribution features (Fig. 4). For PCRTQC, the pre-classified data were grouped by classification without altering their order, Z-scores were calculated for each group, and the results were plotted (green line). For PCRTQC-NN, the plots show the reconstruction residuals, i.e., the differences between the original results and the model predictions (cyan line). The red dashed lines represent the control limits predicted by the model. A simple way to judge whether a shift is significant is to check whether it exceeds these limits; the shift visible in the figure does not, and can be interpreted as a small, non-significant, temporary deviation. The original data consist of test results from reported patient samples, collected while traditional quality control and management procedures were strictly followed in accordance with quality management regulations. Because of the limitations of traditional quality control, however, some errors may not have been detected in a timely manner; this is a key reason why patient-based quality control is needed. At the same time, these data accurately reflect the baseline state of a laboratory introducing patient-based quality control, so we believe that using them for modeling and testing closely mirrors real-world application. The data processed by PCRTQC-NN exhibited markedly more concentrated and flatter fluctuations, which are more conducive to effective monitoring of systematic errors.
Fig. 4.
Illustration of the distribution of patient data processed by PCRTQC and PCRTQC-NN. The X-axis represents the sequence order of patient data samples, labeled with specific time points (e.g., 0822-134 to 0826-449). The Y-axis of the top panel shows the test values of the original result sequence (blue bars, scaled from 0 to 400), reflecting the raw patient data. The middle panel displays the Z-scores from PCRTQC processing (green line, ranging from −2 to 2), indicating the standardized deviation of the data within control limits marked by red dashed lines. The bottom panel presents the reconstruction residuals from PCRTQC-NN (cyan line, ranging from −5 to 4), highlighting the difference between original and reconstructed data, with the same red dashed lines denoting the control limits. This figure illustrates how PCRTQC-NN provides a more refined error detection capability through concentrated residual fluctuations compared to the broader Z-score variations of PCRTQC.
3.2. Illustration of PCRTQC-NN model testing
As shown in Fig. 5, the reconstruction residuals from 1500 patient test data points without introduced systematic errors were processed through the model for the 4 analytes studied. The 0.05 %–99.95 % range of these residuals was set as the in-control range, with the out-of-control thresholds established at 0.05 % and 99.95 %. Fig. 5 primarily explains the theoretical basis of the algorithm. It shows that when no systematic error is introduced (first half), the algorithm exhibits strong reconstruction capability, with small reconstruction residuals. However, after introducing a systematic error (second half), the algorithm's reconstruction capability is significantly affected, leading to a notable increase in reconstruction residuals. We leverage this difference to enhance error detection capabilities. After introducing a positive systematic error, the reconstruction residuals of patient test data mostly exceeded the 99.95 % out-of-control threshold, effectively detecting the error. Similarly, with a negative systematic error, the residuals primarily surpassed the 0.05 % threshold, also confirming the out-of-control status.
Fig. 5.
Reconstruction residual plot. The vertical axis represents residuals, and the horizontal axis represents the sequence order. The red curve shows the normalized merged data sequence, the blue curve the reconstructed sequence, and the black curve the difference between the two, i.e., the reconstruction residual. The yellow band marks the 0.05th–99.95th percentile range of residuals from samples without introduced systematic errors, defining the in-control range with out-of-control thresholds at 0.05 % and 99.95 %. The figure illustrates the theoretical basis of the PCRTQC-NN algorithm: reconstruction is accurate, with small residuals, when no systematic error (constant error [CE] or proportional error [PE]) is introduced (first half), and residuals increase markedly once a systematic error is introduced (second half). A, B: ALT; C, D: CHOL; E, F: CR; G, H: Na.
3.3. PCRTQC-NN performance comparison
We compared the error detection capabilities of PCRTQC-NN, PCRTQC [23], and other algorithms [15] across the four target analytes (Table 1). When a systematic error of ±1 TEa was introduced, FAR did not differ significantly between the methods for either CE or PE. The ANPeds of the two conventional algorithms, moving average (MA) and exponentially weighted moving average (EWMA), were higher than those of PCRTQC-NN in every available comparison and higher than those of PCRTQC in all but one case (ALT CE at 1 TEa, where EWMA and MA required 126 and 149.84 samples, respectively, versus 167.6 for PCRTQC). Our analysis therefore focuses on comparing PCRTQC and PCRTQC-NN. For ALT, PCRTQC-NN clearly outperformed PCRTQC, reducing the ANPed from 167.6 to 106.5 for CE (a 37 % reduction) and from 138.8 to 108.76 for PE (a 22 % reduction). The other analytes showed no marked changes at this error level.
Table 1.
Performance comparison between MA, EWMA, PCRTQC and PCRTQC-NN based on TEa.
| Target analyte | Error type | FAR (±1 TEa), PCRTQC | FAR (±1 TEa), PCRTQC-NN | FAR (±1 TEa), EWMA | FAR (±1 TEa), MA | ANPed (±1 TEa), PCRTQC | ANPed (±1 TEa), PCRTQC-NN | ANPed (±1 TEa), EWMA | ANPed (±1 TEa), MA | ANPed (±0.5 TEa), PCRTQC | ANPed (±0.5 TEa), PCRTQC-NN | ANPed (±0.5 TEa), EWMA | ANPed (±0.5 TEa), MA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Na | CE | 0.11 % | 0.11 % | 0.051 % | 0.061 % | 5 | 6.06 | 7.1 | 8.37 | 6.55 | 15.12 | 83.39 | 112.7 |
| Na | PE | 0.11 % | 0.11 % | 0.051 % | 0.057 % | 5 | 5.96 | 7.0 | 8.91 | 6.23 | 15.09 | 81.78 | 100.59 |
| CHOL | CE | 0.11 % | 0.128 % | / | / | 71.8 | 83.54 | / | / | 271 | 207.75 | / | / |
| CHOL | PE | 0.08 % | 0.128 % | / | / | 75.5 | 84.84 | / | / | 246.1 | 192.09 | / | / |
| ALT | CE | 0.11 % | 0.130 % | 0.028 % | 0.061 % | 167.6 | 106.5 | 126 | 149.84 | 337.5 | 289.55 | 866.2 | 927.9 |
| ALT | PE | 0.13 % | 0.125 % | 0.03 % | 0.061 % | 138.8 | 108.76 | 907.6 | 963.74 | 334.5 | 315.39 | 999.9 | 999.5 |
| CR | CE | 0.11 % | 0.116 % | 0.09 % | 0.08 % | 54.6 | 55.13 | 199.8 | 220.6 | 182.6 | 175.35 | 668.3 | 766.8 |
| CR | PE | 0.08 % | 0.116 % | 0.09 % | 0.08 % | 46.5 | 54.41 | 354.9 | 388.3 | 187.2 | 170.41 | 782.3 | 832.1 |
FAR, false alarm rate; ANPed, the average number of patient samples until error detection; Na, sodium; CHOL, cholesterol; ALT, alanine aminotransferase; CR, creatinine; CE, constant error; PE, proportional error; TEa for Na, CHOL, ALT and CR are 4 %, 9 %, 16 %, and 12 %, respectively.
When the systematic error was ±0.5 TEa, PCRTQC-NN markedly improved detection for CHOL, reducing the ANPed from 271 to 207.75 (a 23 % decrease) for CE and from 246.1 to 192.09 (a 22 % decrease) for PE. ALT also improved, with ANPed reductions of 14 % for CE and 6 % for PE, and CR showed modest improvements, with ANPed decreases of 4 % for CE and 9 % for PE.
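The relative reductions quoted in this section can be recomputed from the ANPed values in Table 1 as 100 × (PCRTQC − PCRTQC-NN) / PCRTQC, as in the short script below. Most round to the reported integers; for ALT CE at 1 TEa the calculation gives about 36.5 %, which the text reports as 37 %.

```python
# ANPed values (PCRTQC, PCRTQC-NN) copied from Table 1
anped = {
    ("ALT", "CE", "1 TEa"): (167.6, 106.5),
    ("ALT", "PE", "1 TEa"): (138.8, 108.76),
    ("CHOL", "CE", "0.5 TEa"): (271.0, 207.75),
    ("CHOL", "PE", "0.5 TEa"): (246.1, 192.09),
    ("ALT", "CE", "0.5 TEa"): (337.5, 289.55),
    ("ALT", "PE", "0.5 TEa"): (334.5, 315.39),
    ("CR", "CE", "0.5 TEa"): (182.6, 175.35),
    ("CR", "PE", "0.5 TEa"): (187.2, 170.41),
}

def relative_reduction(before, after):
    """Percentage decrease in ANPed from PCRTQC to PCRTQC-NN."""
    return 100.0 * (before - after) / before

for key, (pcrtqc, nn) in anped.items():
    print(*key, f"{relative_reduction(pcrtqc, nn):.1f} %")
```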
4. Discussion
Traditional PBRTQC methods, such as moving averages (MA) and exponentially weighted moving averages (EWMA), have shown limited effectiveness for chemistry analytes in previous studies, often requiring additional clinical information or advanced preprocessing to improve error detection [15,18,19]. In contrast, the patient pre-classification method of our previous study performed comparably to, and in some cases better than, other optimization strategies, without truncating patient data or requiring extra clinical information [23]. However, that approach requires normalization and standardization of data from multiple groups, which complicates computation. We therefore explored more advanced data processing methods for further optimization.
In this study, we introduced neural network technology to complement traditional data processing techniques such as wavelet transforms and mean filtering. Using an autoencoder, we extracted key instrument analytical features from patient data, which allowed test results to be predicted and their accuracy evaluated. To our knowledge, this is the first application of signal theory to optimizing PBRTQC, and it resulted in a new model, PCRTQC-NN, with improved error detection performance.
The success of PCRTQC-NN can be attributed to the nonlinear data processing capability of neural networks and their ability to learn and generalize autonomously. By fitting and reconstructing data patterns, the model became more sensitive to small errors while better capturing individual variability in analytes with high biological variation [[20], [21], [22],31]. Traditional filtering methods, in contrast, handle nonlinear data less effectively, often misclassifying small errors as noise or extracting systematic error information inaccurately because of an unsuitable choice of wavelet basis functions or parameters [32,33]. By treating patient test data as discrete signals emitted by the testing instrument, the neural network extracts key information such as individual variability and system precision. Having learned these features under stable conditions, the model detects deviations through the reconstruction residuals, allowing systematic errors to be identified precisely.
We again studied four representative chemistry analytes, ranging from Na (tightly clustered results, low variability) to ALT (broad range of results, high variability). PCRTQC-NN achieved error detection performance comparable to, and in some cases better than, PCRTQC. Notably, for a challenging analyte such as ALT, PCRTQC-NN reduced the ANPed by 37 % for CE and 22 % for PE at 1 TEa. For errors of 0.5 TEa, PCRTQC-NN also improved error detection for the analytes with high biological variability (ALT, CR, and CHOL), but not for Na [30]. These results indicate that PCRTQC-NN improves error detection for analytes that previously performed poorly and increases sensitivity to small errors. Nevertheless, no algorithm is yet perfect; even as a supplement to IQC, a combination of PBRTQC strategies should be used to achieve optimal monitoring.
The selection of training data was a crucial factor in the success of this study. We carefully curated a dataset that encompassed a wide range of variability, including seasonal changes, differences in specimen collection, changes in control material lots, and environmental factors. Additionally, the dataset included outlier results and samples from special population groups. This comprehensive approach ensured that the model could effectively capture and address potential systematic errors, providing a robust foundation for accurate error detection [29,34].
Sample variability is similar to inter-individual biological variation (BV) but broader in scope. Inter-individual BV mainly describes fluctuations within healthy populations, and normal reference ranges are a concrete manifestation of it [30,31]. Sample variability, in contrast, encompasses inter-individual BV but also includes pathological test results, changes caused by sample pre-processing, and other factors such as seasonal or demographic influences. These sources of variation are difficult to eliminate completely, because they ultimately appear in the results of multiple test items. However, 3D clustering on multiple test items groups results from patients in similar states for joint analysis, which reduces sample variability and improves SNR without truncation or additional data.
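The effect of clustering on SNR can be sketched as follows. The paper uses PCRTQC's own clustering analysis; here plain k-means with a deterministic farthest-point start is a swapped-in stand-in, and the two patient states, the three-analyte vectors, and the 2-unit systematic error are hypothetical. SNR is taken as the error size divided by the SD of the monitored analyte, either across all patients or pooled within clusters.

```python
import random
import statistics

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centres[i]))].append(p)
        centres = [tuple(statistics.fmean(dim) for dim in zip(*g)) if g else centres[i]
                   for i, g in enumerate(groups)]
    return [g for g in groups if g]

def pooled_sd(groups, dim=0):
    """Pooled within-cluster SD of one analyte (column `dim`)."""
    ss, n = 0.0, 0
    for g in groups:
        m = statistics.fmean(p[dim] for p in g)
        ss += sum((p[dim] - m) ** 2 for p in g)
        n += len(g)
    return (ss / (n - len(groups))) ** 0.5

rng = random.Random(3)
states = [(0.0, 0.0, 0.0), (10.0, 8.0, 6.0)]  # two hypothetical patient states, 3 analytes
points = [tuple(rng.gauss(m, 1.0) for m in states[i % 2]) for i in range(1000)]

overall_sd = statistics.stdev([p[0] for p in points])
within_sd = pooled_sd(kmeans(points, k=2))
shift = 2.0  # hypothetical systematic error on the first analyte
print(f"SNR unclustered={shift / overall_sd:.2f} clustered={shift / within_sd:.2f}")
```

No result is discarded here: the same error simply stands out more against the smaller within-cluster spread, which is the sense in which clustering raises SNR without truncation.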
Beyond these considerations, PBRTQC methods, including our model, face additional limitations such as small runs and a high proportion of abnormal patient results. Small runs, common in low-volume laboratories, reduce statistical power by increasing variance and widening confidence intervals, which limits reliable error detection in sparse datasets [11,34]. Similarly, a high proportion of abnormal patients, prevalent in hospital settings, skews data distributions with outliers, amplifies biological variation, and can mask subtle systematic errors; this often requires truncation limits that further reduce the usable data [12,35]. PCRTQC-NN is designed to mitigate these problems in two ways: its pre-classification clustering groups similar samples so that small or skewed datasets are handled more robustly, and its neural network feature extraction separates analytical signals from patient abnormalities, improving generalizability without data truncation. However, because both the training and testing data came from a single laboratory using a Roche instrument, the model's applicability to other laboratories or settings requires further validation. The PCRTQC-NN process is designed to be generalizable, because its neural network-based feature extraction separates instrument-specific analytical features from patient variability. Nevertheless, the patient data stream may carry site-specific biases, such as differences in population demographics, instrument calibration, or environmental factors, which could affect performance across diverse settings [36]. Studies of PBRTQC algorithms have highlighted similar challenges, noting that consistent patient data distributions are essential for model stability and inter-laboratory comparability, yet customization is often required to maintain high performance in varied contexts [37,38].
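Both limitations can be quantified with a short sketch. The first function uses the standard error of a mean, SE = SD/√n, to show how the probability that a mean of n standardized results exceeds ±3 SE limits under a 0.5 SD shift depends on run size. The second measures how much of a skewed result stream survives truncation limits. The shift size, the 3-SE rule, the truncation limits, and the 60/40 mixture of normal and abnormal results are hypothetical illustrations, not values from the study.

```python
import random
from statistics import NormalDist

def detection_probability(shift_sd, n, z_limit=3.0):
    """P(mean of n standardized results falls outside +/- z_limit * SE) under a shift."""
    se = 1.0 / n ** 0.5                       # SE of the mean shrinks as run size grows
    dist = NormalDist(mu=shift_sd, sigma=se)  # sampling distribution of the shifted mean
    return (1.0 - dist.cdf(z_limit * se)) + dist.cdf(-z_limit * se)

def retained_fraction(values, low, high):
    """Share of results kept by truncation limits [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)

for n in (5, 20, 100):
    print(n, round(detection_probability(0.5, n), 3))

rng = random.Random(0)
# Hypothetical skewed stream: about 60 % near-normal results, 40 % abnormal (log-normal tail).
stream = [rng.gauss(30.0, 8.0) if rng.random() < 0.6 else rng.lognormvariate(4.6, 0.8)
          for _ in range(5000)]
print("retained after truncation:", round(retained_fraction(stream, 0.0, 60.0), 2))
```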
Recent advances suggest that PBRTQC models can achieve cross-site generalizability through techniques such as data normalization and transfer learning, or through metrics such as the Wasserstein distance for assessing distributional similarity between sites [39,40]. Future multi-center validation studies will be essential to confirm the robustness of PCRTQC-NN and to refine it for broader implementation.
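As a minimal sketch of the distributional-similarity idea, the code below computes the one-dimensional Wasserstein-1 distance between two equal-size empirical samples, which reduces to the mean absolute gap between their sorted values, and shows how site-level z-score normalization changes it. The two "sites" are hypothetical: site B is generated as a rescaled and offset copy of site A to mimic a calibration difference.

```python
import random
import statistics

def wasserstein_1d(x, y):
    """W1 distance between two equal-size empirical samples: mean gap of sorted values."""
    if len(x) != len(y):
        raise ValueError("this simple form assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def zscore(values):
    """Site-level normalization: subtract the site mean, divide by the site SD."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

rng = random.Random(1)
site_a = [rng.gauss(100.0, 10.0) for _ in range(1000)]
site_b = [1.2 * v + 3.0 for v in site_a]  # hypothetical site with different scale and offset
print("raw W1:", round(wasserstein_1d(site_a, site_b), 2))
print("normalized W1:", round(wasserstein_1d(zscore(site_a), zscore(site_b)), 6))
```

One design caveat: normalization also removes any genuine calibration bias between sites, so a normalized distance compares the shape of patient distributions rather than analytical accuracy.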
Although the modeling process may seem complex, like other optimization algorithms it can be completed on a single graphics card with standard computing power. Once built, the model can be deployed locally on a regular computer system (Fig. 4) and, combined with the software we designed, is user-friendly. This lays a solid foundation for widespread adoption of the model. In the future, PCRTQC-NN could extract the performance features of standard instruments under normal control conditions to achieve unified monitoring across similar instruments, enabling cross-system and inter-laboratory comparisons and thereby providing a robust quality assurance method for inter-laboratory result consistency. Nevertheless, the computational requirements could pose barriers for resource-limited laboratories. Other limitations include the single-site data source, which limits generalizability to diverse settings, and the assumption of error-free training data, which cannot be fully guaranteed in practice, although IQC and shuffling mitigate this. Future work could address these limitations by developing streamlined versions, such as pre-trained models obtained via transfer learning, cloud-based deployment, and automated hyperparameter optimization, while pursuing multi-center validation to improve cross-site robustness.
5. Conclusions
PCRTQC-NN introduces a novel approach to patient-based real-time quality control, with error detection capabilities comparable or superior to those of traditional algorithms. In addition, using neural network technology to extract features from testing instruments enables standardized analytical feature models to be established, offering new technical pathways for instrument comparison and inter-laboratory result concordance. Despite limitations such as reliance on single-site data and computational complexity, with further validation across diverse settings this model holds promise for enhancing laboratory efficiency and patient safety worldwide. Future refinements, including multi-center studies, transfer learning for streamlined deployment, and integration of distributional metrics such as the Wasserstein distance, will be crucial to realizing its full potential in clinical practice.
CRediT authorship contribution statement
Bo Zhou: Writing – original draft, Visualization, Methodology, Investigation, Conceptualization. Xiaoying Li: Validation, Resources, Methodology, Conceptualization. Shitong Cheng: Investigation, Formal analysis, Conceptualization. Zhiwei Zhou: Writing – review & editing, Software, Methodology, Investigation, Formal analysis, Conceptualization. Hui Kang: Writing – review & editing, Supervision, Methodology, Conceptualization.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work the authors used ChatGPT-4o and Grok 3 to improve language and readability. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Funding
This work was supported by National Key Technologies Research and Development Program of China (Project Grant # 2022YFC3602300, Sub-project Grant # 2022YFC3602302).
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
1. Mackay M., Hegedus G., Badrick T. Assay stability, the missing component of the error budget. Clin. Biochem. 2017;50(18):1136–1144. doi: 10.1016/j.clinbiochem.2017.07.004.
2. Mackay M.A., Badrick T.C. Steady state errors and risk of a QC strategy. Clin. Biochem. 2019;64:37–43. doi: 10.1016/j.clinbiochem.2018.12.005.
3. Liu J., et al. Moving standard deviation and moving sum of outliers as quality tools for monitoring analytical precision. Clin. Biochem. 2018;52:112–116. doi: 10.1016/j.clinbiochem.2017.10.009.
4. Braga F., Panteghini M. Commutability of reference and control materials: an essential factor for assuring the quality of measurements in laboratory medicine. Clin. Chem. Lab. Med. 2019;57(7):967–973. doi: 10.1515/cclm-2019-0154.
5. Meng Z., et al. Economic implications of Chinese diagnosis-related group–based payment systems for critically ill patients in ICUs. Crit. Care Med. 2020;48(7):e565–e573. doi: 10.1097/CCM.0000000000004355.
6. Lai Y., et al. Hospital response to a case-based payment scheme under regional global budget: the case of Guangzhou in China. Soc. Sci. Med. 2022;292. doi: 10.1016/j.socscimed.2021.114601.
7. Badrick T. Biological variation: Understanding why it is so important? Pract. Lab. Med. 2021;23. doi: 10.1016/j.plabm.2020.e00199.
8. Novis D.A. Detecting and preventing the occurrence of errors in the practices of laboratory medicine and anatomic pathology: 15 years' experience with the College of American Pathologists' Q-PROBES and Q-TRACKS programs. Clin. Lab. Med. 2004;24(4):965–978. doi: 10.1016/j.cll.2004.09.001.
9. Fleming J.K., Katayev A. Changing the paradigm of laboratory quality control through implementation of real-time test results monitoring: for patients by patients. Clin. Biochem. 2015;48(7–8):508–513. doi: 10.1016/j.clinbiochem.2014.12.016.
10. Badrick T., et al. Patient-based real-time quality control: review and recommendations. Clin. Chem. 2019;65(8):962–971. doi: 10.1373/clinchem.2019.305482.
11. Bietenbeck A., et al. Understanding patient-based real-time quality control using simulation modeling. Clin. Chem. 2020;66(8):1072–1083. doi: 10.1093/clinchem/hvaa094.
12. Duan X., et al. Assessment of patient-based real-time quality control algorithm performance on different types of analytical error. Clin. Chim. Acta. 2020;511:329–335. doi: 10.1016/j.cca.2020.10.006.
13. Sewpersad S., Chale-Matsau B., Pillay T.S. Real world feasibility of patient-based real time quality control (PBRTQC) using five analytes in a South African laboratory. Clin. Chim. Acta. 2025;565. doi: 10.1016/j.cca.2024.120006.
14. Duan X., et al. Exploring optimization algorithms for establishing patient-based real-time quality control models. Clin. Chim. Acta. 2024;554. doi: 10.1016/j.cca.2024.117774.
15. Duan X., et al. Regression-adjusted real-time quality control. Clin. Chem. 2021;67(10):1342–1350. doi: 10.1093/clinchem/hvab115.
16. Liang Y., et al. A study on quality control using delta data with machine learning technique. Heliyon. 2022;8(8). doi: 10.1016/j.heliyon.2022.e09935.
17. Yang X., et al. Application of patient-based real-time quality control based on artificial intelligence monitoring platform in continuously quality risk monitoring of Down syndrome serum screening. J. Clin. Lab. Anal. 2024;38(5). doi: 10.1002/jcla.25019.
18. Zhou R., et al. Traceable machine learning real-time quality control based on patient data. Clin. Chem. Lab. Med. 2022;60(12):1998–2004. doi: 10.1515/cclm-2022-0548.
19. Xia Y., et al. Patient-based real-time quality control integrating neural networks and joint probability analysis. Clin. Chim. Acta. 2025;567.
20. Melkman A.A., et al. On the compressive power of Boolean threshold autoencoders. IEEE Trans. Neural Netw. Learn. Syst. 2021;34(2):921–931. doi: 10.1109/TNNLS.2021.3104646.
21. Sze V., et al. Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE. 2017;105(12):2295–2329.
22. Alzubaidi L., et al. A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications. J. Big Data. 2023;10(1).
23. Man D., et al. Patient-based pre-classified real-time quality control (PCRTQC). Clin. Chim. Acta. 2023;549. doi: 10.1016/j.cca.2023.117562.
24. Chen J., Du L., Liao L. Discriminative mixture variational autoencoder for semisupervised classification. IEEE Trans. Cybern. 2022;52(5):3032–3046. doi: 10.1109/TCYB.2020.3023019.
25. Qian K., et al. AutoVC: zero-shot voice style transfer with only autoencoder loss. In: International Conference on Machine Learning. 2019. arXiv:1905.
26. Vincent P., et al. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland: Association for Computing Machinery; 2008. pp. 1096–1103.
27. Hinton G.E., Salakhutdinov R.R. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–507. doi: 10.1126/science.1127647.
28. Mafi M., et al. A comprehensive survey on impulse and Gaussian denoising filters for digital images. Signal Process. 2019;157:236–260.
29. Chandola V., Banerjee A., Kumar V. Anomaly detection: a survey. ACM Comput. Surv. 2009;41(3):1–58.
30. Badrick T. Biological variation: Understanding why it is so important? Pract. Lab. Med. 2021;23. doi: 10.1016/j.plabm.2020.e00199.
31. Sarkar C., et al. Artificial intelligence and machine learning technology driven modern drug discovery and development. Int. J. Mol. Sci. 2023;24(3). doi: 10.3390/ijms24032026.
32. Chandran K.S.S., et al. Comparison of Matching Pursuit algorithm with other signal processing techniques for computation of the time-frequency power spectrum of brain signals. J. Neurosci. 2016;36(12):3399–3408. doi: 10.1523/JNEUROSCI.3633-15.2016.
33. Kralj L., Lenasi H. Wavelet analysis of laser Doppler microcirculatory signals: Current applications and limitations. Front. Physiol. 2022;13. doi: 10.3389/fphys.2022.1076445.
34. van Rossum H.H. Moving average quality control: Principles, practical application and future perspectives. Clin. Chem. Lab. Med. 2019;57(6):773–782. doi: 10.1515/cclm-2018-0795.
35. Loh T.P., et al. Recommendation for performance verification of patient-based real-time quality control. Clin. Chem. Lab. Med. 2020;58(8):1205–1213. doi: 10.1515/cclm-2019-1024.
36. Cervinski M.A., et al. Chapter Six - Advances in clinical chemistry patient-based real-time quality control (PBRTQC). In: Makowski G.S., editor. Advances in Clinical Chemistry. Elsevier; 2023. pp. 223–261.
37. Lu Y., et al. Assessment of patient based real-time quality control on comparative assays for common clinical analytes. J. Clin. Lab. Anal. 2022;36(9). doi: 10.1002/jcla.24651.
38. Rossum H., et al. Benefits, limitations, and controversies on patient-based real-time quality control (PBRTQC) and the evidence behind the practice. Clin. Chem. Lab. Med. 2021;59. doi: 10.1515/cclm-2021-0072.
39. Duan X., et al. Next-generation patient-based real-time quality control models. Ann. Lab. Med. 2024;44(5):385–391. doi: 10.3343/alm.2024.0053.
40. Sewpersad S., Chale-Matsau B., Pillay T.S. Real world feasibility of patient-based real time quality control (PBRTQC) using five analytes in a South African laboratory. Clin. Chim. Acta. 2025;565. doi: 10.1016/j.cca.2024.120006.