Abstract
Variations in laboratory test names across healthcare systems—stemming from inconsistent terminologies, abbreviations, misspellings, and assay vendors—pose significant challenges to the integration and analysis of clinical data. These discrepancies hinder interoperability and complicate efforts to extract meaningful insights for both clinical research and patient care. In this study, we propose a machine learning-driven solution, enhanced by natural language processing techniques, to standardize lab test names. By employing feature extraction methods that analyze both string similarity and the distributional properties of test results, we improve the harmonization of test names, resulting in a more robust dataset. Our model achieves a 99% accuracy rate in matching lab names, showcasing the potential of AI-driven approaches in resolving long-standing standardization challenges. Importantly, this method enhances the reliability and consistency of clinical data, which is crucial for ensuring accurate results in large-scale clinical studies and improving the overall efficiency of informatics-based research and diagnostics.
1. Introduction
The digitization of medical records has revolutionized healthcare but has also brought challenges, particularly in the standardization of laboratory test names across and within institutions. Medical data, including the names of standard laboratory tests, often vary due to factors such as misspellings, synonyms, abbreviations, differing terminologies, and the use of different assay vendors. For example, many different terms are used to describe a variant allele’s impact on enzyme function and the corresponding inferred phenotypic interpretation of a clinical pharmacogenetic test result. A genetic testing laboratory report might interpret a TPMT *3A allele as leading to “low function,” “low activity,” “null allele,” “no activity,” or “undetectable activity.” Similarly, a laboratory may assign different phenotype designations to an individual with two nonfunctional alleles, such as describing the TPMT gene as “homozygous deficient” or “low activity,” depending on the laboratory. These same inconsistencies can appear when describing other gene variants, like DPYD, further complicating data standardization across laboratories and healthcare systems1. These variations result in discrepancies in naming laboratory test results that assess the same clinical or biological variables. Such inconsistencies complicate the effective analysis and integration of laboratory data, which are essential for drawing meaningful insights and making informed healthcare decisions2,3.
Standards such as the Logical Observation Identifier Names and Codes (LOINC) have been created to standardize medical terminology4, but inconsistent implementation continues to undermine data quality and usability. Moreover, a common laboratory test like a Complete Blood Count (CBC) might be listed under various names depending on the facility: Quest Diagnostics might label it as ‘CBC (includes diff/plt)’, while LabCorp could refer to it as ‘Complete CBC with Auto Diff’, and a local clinical pathology lab might simply call it ‘Blood Count’. Such inconsistencies can lead to confusion and errors in data interpretation. In specific fields, such as radiotherapy, efforts have been made to standardize test names5. Natural language processing (NLP) methods have been employed to address variability in clinical mentions6. The harmonization of data can significantly affect the clinical conclusions derived from analysis. For example, after adjusting for inconsistencies, the prevalence of dyslipidemia rose from 39.63% to 46.2%, while the prevalence of chronic kidney disease dropped from 20.57% to 8.26%. These changes highlight how harmonized data not only improve interoperability but also lead to more accurate disease classification, enabling better-informed clinical decisions7.
This paper introduces a novel method using machine learning and NLP for standardizing laboratory test names, improving data harmonization and healthcare data integration by analyzing test names’ semantic meanings and distributional properties.
2. Methods
2.1 Data and Process
The methodology outlined here was developed and validated as part of an ongoing study using anonymized data from asthma patients treated at Scripps Health, with approval from the institutional review board. The dataset consists of demographic, clinical, and medication information from 31,795 patients, collected between 2018 and 2023. The dataset, comprising 6,860,497 entries, was organized in a tabular format and included anonymized patient data such as study identifiers, shifted visit dates, laboratory test names, numeric results, and measurement units. An initial review identified 5,957 unique laboratory test names. After excluding entries without valid numeric results, the number of unique test names was reduced to 2,884. Of these, 715 tests had more than 200 results each. As a result, the final dataset comprised 715 unique laboratory test names, of which only 315 contained valid unit information. These 715 distinct names yield 255,255 unique pairings (715 × 714 / 2), as shown in our workflow diagram (Figure 1). Of these pairings, only 234 represented matched names, underscoring a significant class imbalance. To address this imbalance and improve computational efficiency, we employed blocking and balancing techniques.
Figure 1: Workflow Diagram of Data Processing and Model Training: This diagram illustrates the step-by-step process from initial data acquisition to final model evaluation.
The filtering process significantly reduced the total number of pairings to 57,966. To further address the class imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE)8,9. This approach allowed us to generate synthetic samples of the minority class, resulting in a dataset of 111,698 pairs. By improving class balance, SMOTE enhanced the model’s performance and reliability. Following the data balancing, the dataset was split into training and testing subsets using a 75/25 ratio. This split ensures a comprehensive evaluation of our model’s ability to generalize to unseen data, providing a reliable assessment of its predictive accuracy and robustness. For the classification task, we employed the XGBoost classifier10, known for its efficiency and strong performance on imbalanced datasets.
2.2 Feature Extraction
Feature extraction plays a pivotal role in our lab-name-matching algorithm, as depicted in the third block of Figure 1, labeled “Feature Extraction.” This step is crucial for transforming raw data into a format that is interpretable for machine learning models. In this section, we describe the eight distinct features extracted from the laboratory test data, which are crucial for the subsequent machine learning process.
The following features are extracted to form the feature set that feeds into our machine learning model:
Grouping Feature: Spatial clustering of laboratory test names, based on their co-occurrence when doctors order tests to investigate diseases, reveals patterns and relationships.
Histogram-Based Distribution Similarity Assessment: Compares lab tests by analyzing the similarity of their statistical distributions, highlighting underlying patterns and differences and thereby enabling comparisons across numeric distributions.
Distribution Variable Ratios: Calculates ratios of mean to standard deviation, aiding in the assessment of the proximity between mean values and standard deviations.
Similarity Measures for Lab Name Matching: Utilizes Dice and Jaro similarity metrics to quantitatively assess the string-level similarities between different laboratory names, effectively capturing their lexical resemblance.
Word Embedding Techniques and Cosine Similarity: Uses advanced NLP to capture semantic and contextual relationships within lab test names, enhancing interpretative depth, and compares their similarity using the cosine function.
2.2.1 Grouping Feature
In medical diagnostics, the set of laboratory tests requested by doctors often correlates with specific diseases or medical conditions11. For instance, tests related to asthma are typically ordered together, creating a pattern of co-occurrence that can be analytically exploited. We leverage this data-driven insight through a feature extraction method that employs spatial clustering of laboratory test names based on their co-occurrence in diagnostic scenarios. Specifically, our approach organizes lab tests into groups based on their diagnostic context; tests frequently ordered for asthma patients, for example, would cluster spatially closer in our model due to their recurrent co-occurrence, regardless of name variations. These lab groups (G) are initially formed for each patient’s set of ordered tests and are then aggregated to compile a comprehensive list reflecting these clinical associations. This methodology not only captures the natural groupings of lab tests in medical practice but also enhances our ability to identify and link tests with similar diagnostic purposes, even when their names differ.
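As a simple illustration of the co-occurrence idea behind the grouping feature, the sketch below counts how often pairs of lab names appear in the same patient's orders; the test names and orders are invented, not from the study dataset:

```python
# Sketch of building pairwise co-occurrence counts from per-patient orders.
from collections import defaultdict

def build_cooccurrence(patient_orders):
    """Count how often each pair of lab names is ordered for the same patient."""
    counts = defaultdict(int)
    for tests in patient_orders:
        unique = sorted(set(tests))
        for i in range(len(unique)):
            for j in range(i + 1, len(unique)):
                counts[(unique[i], unique[j])] += 1
    return counts

orders = [
    ["FEV1", "FVC", "PEF"],   # spirometry panel for one asthma patient
    ["FEV1", "FVC", "CBC"],
    ["CBC", "Glucose"],
]
cooc = build_cooccurrence(orders)
# ("FEV1", "FVC") co-occurs for two patients, so it gets a higher count
```

Pairs with high counts, such as the spirometry sub-tests, are the ones pulled together during the spatial clustering step.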
The feature set was developed, consolidated, and normalized through a three-step process.
2.2.1.1 Random Coordinate Generation:
To generate random 3D coordinates for each unique lab test name, we assume there are n unique lab tests. For each lab test ki ∈ K, we generate spherical coordinates (θi, φi) and then convert them into Cartesian coordinates (xi, yi, zi) using:
| θi = 2π · U(0,1),  φi = arccos(2 · U(0,1) − 1) | (1) |
| xi = r sin φi cos θi,  yi = r sin φi sin θi,  zi = r cos φi | (2) |
Here r is the radius, which can be set to 1 (for a unit sphere) unless otherwise specified, and U(0,1) denotes a uniform random variable between 0 and 1.
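Under our reading of Eqs. (1)–(2), the coordinate generation can be sketched as follows; the exact sampling scheme is an assumption, and this version draws points uniformly on the unit sphere:

```python
# Sketch of random sphere-point generation for lab test names (Eqs. 1-2).
import math
import random

def random_sphere_point(r=1.0):
    theta = 2 * math.pi * random.random()      # azimuthal angle, Eq. (1)
    phi = math.acos(2 * random.random() - 1)   # polar angle, Eq. (1)
    x = r * math.sin(phi) * math.cos(theta)    # Cartesian conversion, Eq. (2)
    y = r * math.sin(phi) * math.sin(theta)
    z = r * math.cos(phi)
    return (x, y, z)

# One random starting coordinate per (illustrative) lab test name
coords = {name: random_sphere_point() for name in ["FEV1", "FVC", "CBC"]}
```

Each generated point lies on the unit sphere, giving every test name a neutral starting position before the iterative adjustment.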
In Figure 2, the 715 laboratory test names are plotted within a 3D coordinate system, defined by the x, y, and z axes. The red points highlight spirometry tests, specifically sub-tests such as Peak Expiratory Flow (PEF), Total Lung Capacity (TLC), Functional Residual Capacity (FRC), Residual Volume (RV), Forced Vital Capacity (FVC), and Forced Expiratory Volume (FEV). These are key reference tests commonly used in diagnosing asthma. The blue points represent the remaining laboratory tests in the dataset.
Figure 2:
Initial 3D Scatter Plot of Lab Names: Initial distribution of 715 unique laboratory names plotted within a three-dimensional coordinate system. The visualization distinguishes spirometer tests (shown in red). Non-spirometer tests are depicted in blue, highlighting the initial stage of our spatial clustering process.
2.2.1.2 Iterative Adjustment:
To reflect the clinical relationships between lab tests, we iteratively adjust the coordinates. Define β1 and β2 as adjustment parameters. For each iteration t (from 1 to T), and for each lab test ki ∈ K, compute the mean coordinates of its group Gi and move the point toward them:
| μGi = (1/|Gi|) Σkj∈Gi (xj, yj, zj),   (xi, yi, zi) ← (xi, yi, zi) + β1 (μGi − (xi, yi, zi)) | (3) |
For points not in the group Gi, slightly adjust them to increase the separation:
| (xj, yj, zj) ← (xj, yj, zj) + β2 ((xj, yj, zj) − μGi),  for kj ∉ Gi | (4) |
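A minimal sketch of this adjustment under our reading of Eqs. (3)–(4), with a single group, assumed update rules, and illustrative parameter values:

```python
# Sketch of the iterative adjustment: group members are pulled toward
# their centroid (Eq. 3) and non-members are pushed away (Eq. 4).

def adjust(coords, group, beta1=0.1, beta2=0.01, iterations=10):
    coords = {k: list(v) for k, v in coords.items()}
    for _ in range(iterations):
        members = [coords[k] for k in group]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(3)]
        for k, p in coords.items():
            if k in group:
                for d in range(3):  # pull toward the group centroid
                    p[d] += beta1 * (centroid[d] - p[d])
            else:
                for d in range(3):  # push away from the group centroid
                    p[d] += beta2 * (p[d] - centroid[d])
    return coords

pts = {"FEV1": [1.0, 0.0, 0.0], "FVC": [0.0, 1.0, 0.0], "CBC": [0.0, 0.0, 1.0]}
out = adjust(pts, group={"FEV1", "FVC"})
# After the iterations the two spirometry tests sit closer together
```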
2.2.1.3 Visualization and Normalization:
After the iterative adjustment, the coordinates are visualized in a 3D scatter plot. Additionally, a normalized distance matrix is calculated to validate clustering:
| dij = √[(xi − xj)² + (yi − yj)² + (zi − zj)²] | (5) |
| Dij = dij / maxk,l dkl | (6) |
where Dij is the normalized distance between points i and j, and dij is the Euclidean distance between their coordinates. This methodology effectively captures the clinical relationships between laboratory tests, providing a robust feature extraction technique that enhances the performance of the predictive model developed for laboratory test name harmonization. As shown in Figure 3, the spirometer data points form a clear cluster.
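The normalized distance matrix of Eqs. (5)–(6) can be sketched as follows; the coordinates are illustrative:

```python
# Pairwise Euclidean distances normalized by the maximum distance,
# so every entry lies in [0, 1] (Eqs. 5-6).
import math

def normalized_distances(coords):
    names = list(coords)
    d = {(a, b): math.dist(coords[a], coords[b]) for a in names for b in names}
    dmax = max(d.values())
    return {pair: v / dmax for pair, v in d.items()}

coords = {"FEV1": (1, 0, 0), "FVC": (0.9, 0.1, 0), "CBC": (0, 0, 1)}
D = normalized_distances(coords)
# Clustered tests (FEV1, FVC) have a much smaller normalized distance
```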
Figure 3:
End of Iterative Adjustments of 3D Scatter Plot: 3D scatter plot showing the final positions of laboratory names after iterative adjustments. The color coding remains consistent with the initial stage, where red points represent spirometer tests and blue points represent other types of tests.
2.2.2 Histogram-Based Distribution Similarity Assessment
To effectively analyze the laboratory data, we employed the Normalized Logarithmic Histogram Similarity Evaluation technique. This method plays a vital role in assessing the similarity across distributions of laboratory test results, which is critical to our study of 715 unique laboratory names. Each lab name has at least 200 valid numeric samples, with a range spanning from -89 to 1,497,430. This minimum threshold was set because a dataset of 200 or more values is necessary to accurately depict a probabilistic distribution, essential for robust statistical analysis.
The first step in our distribution analysis involves normalizing the data values. We apply an absolute value function and add a small constant, ϵ = 0.0001, to each value to prevent issues when applying logarithmic transformations of zeros. This method transforms any negative values into positive ones, which is crucial since logarithms of non-positive numbers are undefined. Although the proportion of negative values in our dataset is quite low, suggesting that their removal could be a feasible alternative without significantly impacting the bin width, we opt to retain them. This choice is made to preserve the full integrity of the dataset’s size and its distributional characteristics. After these adjustments, we compute the natural logarithm of the values to stabilize variance across the data range and minimize skewness.
To optimize the bin width for histogram representation, we evaluated several methods, including Scott’s Rule12, Freedman-Diaconis13, and Sturges’ formula14. The evaluation showed minimal differences in bin counts between Freedman-Diaconis (658 bins) and Scott’s Rule. Given the similar results, we selected Scott’s Rule for its robustness in handling diverse data distributions within our dataset. According to Scott’s Rule, the bin width w is defined as:
| w = 3.49 σ n^(−1/3) | (7) |
where σ is the standard deviation of the dataset and n is the number of observations. Using this rule, we calculated the bin width and found the number of bins to be 443. Next, we calculate common histogram edges based on the global range of the data to ensure that histograms are comparable across different lab tests. Once the histograms are generated, we assess the similarity between the distributions by calculating the Mean Absolute Error (MAE) between each pair of lab test histograms. This MAE serves as a feature that captures the distributional similarities among various lab tests, offering a robust metric for use in our analytical models.
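The pipeline above (absolute value plus ε, log transform, Scott's Rule bin width, shared edges, and histogram MAE) can be sketched as follows; the data values, bin-edge handling, and helper names are illustrative, not the study's implementation:

```python
# Sketch of the histogram-based distribution similarity feature.
import math
import statistics

EPS = 1e-4  # small constant added before the log, as in the text

def log_transform(values):
    return [math.log(abs(v) + EPS) for v in values]

def scott_bin_width(values):
    # Scott's Rule: w = 3.49 * sigma * n^(-1/3)
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)

def histogram(values, edges):
    # Normalized histogram over uniform edges; top-edge values go to the last bin.
    counts = [0] * (len(edges) - 1)
    span = edges[-1] - edges[0]
    for v in values:
        i = min(int((v - edges[0]) / span * len(counts)), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]

def hist_mae(a, b, edges):
    ha, hb = histogram(a, edges), histogram(b, edges)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / len(ha)

a = log_transform([5.1, 4.9, 5.0, 5.2, 4.8])  # invented lab values
b = log_transform([5.0, 5.1, 4.9, 5.3, 4.7])
lo, hi = min(a + b), max(a + b)
w = scott_bin_width(a + b)
nbins = max(1, math.ceil((hi - lo) / w))
edges = [lo + i * (hi - lo) / nbins for i in range(nbins + 1)]
mae = hist_mae(a, b, edges)
```

The shared `edges` over the global data range are what make the MAE comparable across all lab-test pairs.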
2.2.3 Distribution Variable Ratios
The ratio of mean to standard deviation in numerical distributions is a critical feature15,16. The importance of these ratios is influenced by the density of the groups they represent. As a result, the total count of observations, along with their mean values and standard deviation rates, are included as features to offer a more comprehensive understanding of the data. These features contribute to a deeper analysis of the variability and central tendencies within the lab test results, enhancing the overall model’s ability to capture meaningful patterns.
2.2.4 Similarity Measures for Lab Name Matching
The process of lab name harmonization leverages various similarity measures to identify correspondences between differing terminologies. In this study, we focus on two key measures: the Dice coefficient and the Jaro similarity index. The Dice coefficient17, a statistical tool for measuring the similarity between two sets, is calculated by taking twice the size of the intersection divided by the sum of the sizes of both sets. Given two sets of lab names A and B, the Dice coefficient D is defined as:
| D = 2|A ∩ B| / (|A| + |B|) | (8) |
where |A| and |B| represent the sizes of sets A and B, respectively. This measure is particularly effective in identifying syntactic similarities between lab names, which can indicate a potential relationship or redundancy between the tests. By capturing the degree of overlap in the characters or terms used in lab test names, the Dice coefficient helps reveal cases where different names may refer to the same or similar tests, thereby facilitating the harmonization process. The Jaro similarity index18 quantifies the similarity between two sequences by considering the proportion of matching characters and the extent of their transpositions. The index is particularly advantageous for comparing short strings, such as personal names or medical terms, where minor errors can alter spellings. It is defined as:
| J = (1/3) · (m/|A| + m/|B| + (m − t)/m) | (9) |
where m is the count of matching characters between two strings, t is half the number of transpositions, and |A| and |B| are the lengths of strings A and B, respectively. Matching characters are considered if they are not farther apart than floor(max(|A|,|B|)/2) − 1.
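Both string measures can be sketched in a few lines. The character-bigram tokenization for the Dice coefficient is our assumption, since the paper does not state whether single characters or bigrams form the sets:

```python
# Sketch of the two string-similarity features (Eqs. 8-9).

def dice(a, b):
    # Dice coefficient over character-bigram sets (bigrams are an assumption)
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 1.0

def jaro(a, b):
    # Jaro similarity: matches within a window, then half-transpositions
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    window = max(la, lb) // 2 - 1
    match_a, match_b = [False] * la, [False] * lb
    m = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not match_b[j] and b[j] == ca:
                match_a[i] = match_b[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    t, k = 0, 0
    for i in range(la):
        if match_a[i]:
            while not match_b[k]:
                k += 1
            if a[i] != b[k]:
                t += 1
            k += 1
    t //= 2  # t is half the number of transpositions
    return (m / la + m / lb + (m - t) / m) / 3
```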
2.2.5 Word Embedding Techniques and Cosine Similarity
Word embedding techniques offer a nuanced approach to analyzing similarities between laboratory test names by capturing their semantic meanings. In this context, a token refers to an individual word or meaningful unit extracted from the laboratory test names. While we initially considered several pre-trained word embedding models, including Wikipedia19, PubMed201820, and GoogleNews21, for creating vector representations of lab names, our final choice was guided by the effectiveness of these models in token representation. Token coverage played a key role in our model selection. PubMed2018 achieved the highest coverage, successfully converting 1,790 out of 1,917 tokens (93.4%). In comparison, GoogleNews and Wikipedia achieved lower conversion rates of 78.2% and 92.4%, respectively. This made PubMed2018 the optimal choice for capturing the semantic relationships between lab test names. Word embedding models convert words or phrases into high-dimensional vector spaces, where semantically similar words are positioned closer together. For a given lab name, the selected word embedding model generates a vector representation v, enabling advanced semantic analysis by comparing the proximity of these vectors in the space. The superior token coverage provided by PubMed2018 ensures that a wider range of lab names is accurately represented, thereby increasing the reliability of our semantic similarity assessments between different lab tests. Cosine similarity is used to measure the similarity between two vectors by calculating the cosine of the angle between them. For two vectors vA and vB representing lab names A and B, the cosine similarity S is given by:
| S = (vA · vB) / (‖vA‖ ‖vB‖) | (10) |
where vA · vB is the dot product of the vectors, and ‖vA‖ and ‖vB‖ are their magnitudes (norms).
By using cosine similarity, we can quantify the semantic similarity between lab names. A higher cosine similarity indicates that the lab names are more semantically related, which helps in identifying related tests and understanding the medical context behind them. These text-based analysis techniques, which combine basic similarity measures (such as the Dice coefficient) with advanced word embedding models, provide a comprehensive approach to analyzing lab names based on both syntactic and semantic similarities. This enriched feature extraction process enhances both the accuracy and interpretability of predictive models used in healthcare applications.
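Eq. (10) reduces to a few lines of code; the vectors below are toy stand-ins for PubMed2018 embedding vectors:

```python
# Cosine similarity between two embedding vectors (Eq. 10).
import math

def cosine_similarity(va, vb):
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (math.hypot(*va) * math.hypot(*vb))

# Toy embedding vectors for two hypothetical lab-name tokens
sim = cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.65, 0.05])
```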
2.3 Pairing and Filtering
To efficiently manage the vast number of lab test pairings, we introduced a blocking phase designed to categorize lab tests into blocks based on their unique and distinct measurement units, as well as their non-overlapping distribution ranges as depicted in the fifth box of Figure 1. This categorization was achieved by optimizing the MAE between the histogram distributions of lab tests. By focusing on distinct measurement units and non-overlapping distributions, we ensured that only the most relevant comparisons were made between lab tests that are truly comparable. This strategic blocking significantly reduced the total number of potential lab test pairings from 255,255 to 57,966, thereby enhancing computational efficiency. The reduction in pairings allowed our analysis to concentrate on the most pertinent comparisons, streamlining the data processing and improving the overall speed and performance of our matching algorithm.
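The blocking rule as we read it can be sketched as follows: pair only tests that share a measurement unit and have overlapping value ranges. The units, ranges, and field names are invented for illustration:

```python
# Sketch of the blocking step that prunes implausible lab-test pairings.

def can_pair(test_a, test_b):
    same_unit = test_a["unit"] == test_b["unit"]
    # Ranges overlap when each minimum is below the other's maximum
    overlap = test_a["min"] <= test_b["max"] and test_b["min"] <= test_a["max"]
    return same_unit and overlap

tests = {
    "Glucose":       {"unit": "mg/dL",  "min": 40,  "max": 500},
    "Blood Glucose": {"unit": "mg/dL",  "min": 50,  "max": 450},
    "Sodium":        {"unit": "mmol/L", "min": 110, "max": 160},
}
pairs = [(a, b) for a in tests for b in tests
         if a < b and can_pair(tests[a], tests[b])]
# Only the two glucose variants survive the blocking step
```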
2.4 Upsampling
A significant class imbalance was observed: only 234 of the 57,966 pairings remaining after the blocking phase were confirmed matches. To address this, we employed SMOTE, which was pivotal in balancing the dataset by generating synthetic samples for the underrepresented class of matched pairs. SMOTE was utilized to augment the number of matching pairs, adding synthetic examples in equal proportion to the non-matched pairs and effectively doubling our training set to 111,698 pairs. This adjustment ensured an equal count between matched and non-matched classes, significantly enhancing the robustness and fairness of our model training process. We applied a 1:1 resampling strategy using a k-neighbors setting of 5, which generated realistic and varied synthetic samples closely resembling the characteristics of actual data pairs. Moreover, SMOTE was applied exclusively to the training data to maintain the integrity of our validation process. This strategic application of SMOTE not only achieved a balanced class distribution but also improved the overall efficacy of the training phase.
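The study used the standard SMOTE implementation; the stdlib-only sketch below conveys the core idea of interpolating between a minority sample and one of its k nearest minority neighbors (data and parameter values are illustrative):

```python
# Toy SMOTE-style oversampling: each synthetic point lies on the segment
# between a minority sample and one of its k nearest minority neighbors.
import math
import random

def smote(minority, n_new, k=5, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbors = sorted((q for q in minority if q != p),
                           key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbors)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(pi + lam * (qi - pi) for pi, qi in zip(p, q)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
```

Because every synthetic point is an interpolation, it stays inside the region spanned by the real minority samples.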
2.5 Classification
The dataset was split into training and testing subsets with a 75/25 ratio. We utilized an XGBoost classifier, known for its strong performance in machine learning tasks, to predict matched lab test names accurately. The classifier was extensively optimized through hyperparameter tuning, including a grid search for maximum depth, learning rate, and number of estimators, combined with a 5-fold cross-validation strategy. The optimal settings determined were a maximum depth of 7, a learning rate of 0.2, and 300 estimators, resulting in a cross-validation score of 0.9982. This classification step is crucial for ensuring the accuracy of our predictions.
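A sketch of the tuning setup; scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and the reduced grid and synthetic data are illustrative (the paper's grid covered max_depth, learning_rate, and n_estimators with 5-fold cross-validation):

```python
# Grid search with 5-fold CV over the same hyperparameter names the paper tuned.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy stand-in data; the study used balanced lab-name pair features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

grid = {"max_depth": [3, 7], "learning_rate": [0.1, 0.2], "n_estimators": [50]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), grid, cv=5)
search.fit(X_tr, y_tr)
accuracy = search.score(X_te, y_te)
```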
3. Results
We used an XGBoost classifier which achieved 99.8% accuracy. The classifier was trained on a dataset balanced using SMOTE to address class imbalances, ensuring robust model training. Upon testing, the model demonstrated high accuracy in identifying matched laboratory test names. A detailed comparison of the model’s performance pre- and post-SMOTE is presented in Table 1, illustrating significant improvements in accuracy, sensitivity, precision, and F1 score. The confusion matrix (Figure 4) further showcases the model’s effectiveness on the balanced dataset, highlighting its precision and reliability in classifying lab test names.
Table 1:
Comparison of Accuracy, Sensitivity, Precision, and F1 Score between the XGBoost model trained with and without SMOTE-augmented dataset.
| Metric | Accuracy | Sensitivity | Precision | F1 Score |
|---|---|---|---|---|
| Post-SMOTE | 99.78% | 99.57% | 99.98% | 99.78% |
| Pre-SMOTE | 97.64% | 96.14% | 99.10% | 97.61% |
Figure 4:
Confusion Matrix for XGBoost Classifier: Confusion matrix displaying the performance of the XGBoost classifier on the balanced dataset post-SMOTE application.
The model effectively identified and matched critical laboratory test name pairs, demonstrating its applicability in real-world scenarios. For example, it accurately recognized that “LDH Total” and “Lactate Dehydrogenase” refer to the same test, despite variations in abbreviation. Similarly, the model detected that “DHEA-Sulfate” and “DHEA-SO4” are equivalent, highlighting its ability to harmonize lab names with minor differences. On the other hand, although “Absolute CD3” and “Absolute CD8” had relatively high string similarity scores (Jaro and Dice) and the difference between their distributions was moderate, the model correctly identified them as not matching, demonstrating its robustness in distinguishing non-equivalent tests even when certain metrics are comparable. These examples underscore the model’s strength in both matching similar test names and differentiating between non-matching ones with similar features.
Table 2 presents a comparison of different laboratory test name pairs. Laboratory Test Name 1 and Laboratory Test Name 2 represent the two test names being compared, while Matching Condition indicates whether the tests were determined to be equivalent. Count 1 and Count 2 show the number of records for each test respectively. MAE of Distribution refers to the Mean Absolute Error of the histogram distributions between the two tests. The Grouping Feature quantifies how often the tests are ordered together, reflecting their clinical relevance. Jaro and Dice scores measure string similarity, assessing how closely the test names are matched in terms of spelling. Word Embedding Similarity is a semantic similarity score based on the PubMed2018 word embedding model, evaluating the relatedness between the two names.
Table 2:
Comparison of Laboratory Test Name Pairs
| Laboratory Test Name 1 | Laboratory Test Name 2 | Matching Condition | Count 1 | Count 2 | MAE of Distribution | Grouping Feature | Jaro | Dice | Word Embedding Similarity |
|---|---|---|---|---|---|---|---|---|---|
| DHEA-Sulfate | DHEA-SO4 | Yes | 886 | 506 | 0.011 | 0.339 | 0.75 | 0.600 | 0.800 |
| LDH Total | Lactate Dehydrogenase | Yes | 658 | 1035 | 0.012 | 0.140 | 0.539 | 0.500 | 0.560 |
| Absolute CD3 | Absolute CD8 | No | 270 | 270 | 0.035 | 0.299 | 0.944 | 0.916 | 0.846 |
The pairing of ‘LDH Total’ and ‘Lactate Dehydrogenase’ as matches may initially appear counterintuitive due to their low string similarity metrics (Jaro, Dice, and word embedding similarity). However, the decision by the algorithm to classify these as matches is predominantly influenced by their close proximity in terms of distribution and grouping features. Specifically, the MAE of their distribution is notably low at 0.012, indicating that the numerical values associated with these tests in clinical settings are very similar. Furthermore, the grouping feature, which reflects their clinical relevance and common usage in diagnostic contexts, also supports their classification as matches.
4. Conclusion
This study demonstrates the effectiveness of our innovative approach to laboratory test name harmonization, which integrates advanced machine learning and NLP techniques. With a 99.78% F1 score in matching predictions, our approach not only boosts the reliability of clinical research but also supports efficient healthcare operations by standardizing data interpretation. This study has some limitations that may be addressed in future work. First, the model was validated using data from a single institution, which may not represent the full variability encountered in other healthcare environments. To improve generalizability, future research should test the model across multiple institutions and diverse datasets. Additionally, the approach could be refined to better handle sparse data, particularly for less frequently ordered laboratory tests. Another limitation is the restriction to test names with at least 200 valid results, which may have constrained the size of our analysis dataset. Addressing these limitations will help enhance the model’s robustness and extend its applicability to broader healthcare datasets, as well as its potential use in clinical decision-support systems.
4.1 Disclosure Statement
Funding for this study was provided by GSK [NCT06389058, Study ID 219224]. GSK was provided the opportunity to review a preliminary version of this publication for factual accuracy, but the authors are solely responsible for the final content and interpretation.
References
- 1. Caudle KE, Dunnenberger HM, Freimuth RR, Peterson JF, Burlison JD, Whirl-Carrillo M, et al. Standardizing terms for clinical pharmacogenetic test results: consensus terms from the Clinical Pharmacogenetics Implementation Consortium (CPIC). Genetics in Medicine. 2017;19(2):215–23. doi: 10.1038/gim.2016.87.
- 2. Carter AB, Berger AL, Schreiber R. Laboratory Test Names Matter: A Survey on What Works and What Doesn’t Work for Orders and Results. Archives of Pathology & Laboratory Medicine. 2024;148(2):155–67. doi: 10.5858/arpa.2021-0314-OA.
- 3. Abhyankar S, Demner-Fushman D, McDonald CJ. Standardizing clinical laboratory data for secondary use. Journal of Biomedical Informatics. 2012;45(4):642–50. doi: 10.1016/j.jbi.2012.04.012.
- 4. Forrey AW, McDonald CJ, DeMoor G, Huff SM, Leavelle D, Leland D, et al. Logical observation identifier names and codes (LOINC) database: a public use set of codes and names for electronic reporting of clinical laboratory test results. Clinical Chemistry. 1996;42(1):81–90.
- 5. Syed K, Sleeman W IV, Ivey K, Hagan M, Palta J, Kapoor R, et al. Integrated natural language processing and machine learning models for standardizing radiotherapy structure names. Healthcare. 2020;8(2):120. doi: 10.3390/healthcare8020120.
- 6. Chen L, Fu W, Gu Y, Sun Z, Li H, Li E, et al. Clinical concept normalization with a hybrid natural language processing system combining multilevel matching and machine learning ranking. Journal of the American Medical Informatics Association. 2020. doi: 10.1093/jamia/ocaa155.
- 7. Lee SG, Chung HJ, Park JB, Park H, Lee EH. Harmonization of laboratory results by data adjustment in multicenter clinical trials. The Korean Journal of Internal Medicine. 2018;33(6):1119. doi: 10.3904/kjim.2017.034.
- 8. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 2002;16:321–57.
- 9. Ebraheem M, Thirumuruganathan S, Joty S, Ouzzani M, Tang N. DeepER: Deep Entity Resolution. arXiv preprint arXiv:1710.00597. 2017.
- 10. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. Available from: https://api.semanticscholar.org/CorpusID:4650265.
- 11. Hickner J, Thompson PJ, Wilkinson T, Epner P, Shaheen M, Pollock AM, et al. Primary care physicians’ challenges in ordering clinical laboratory tests and interpreting results. The Journal of the American Board of Family Medicine. 2014;27(2):268–74. doi: 10.3122/jabfm.2014.02.130104.
- 12. Scott DW. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons; 1992.
- 13. Freedman D, Diaconis P. On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 1981;57(4):453–76.
- 14. Sturges HA. The choice of a class interval. Journal of the American Statistical Association. 1926;21(153):65–6.
- 15. Collins R, Liu Y. Online selection of discriminative tracking features. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;27(10):1631–43. doi: 10.1109/TPAMI.2005.205.
- 16. Müller A, Scarsini M, Tsetlin I, Winkler RL. Ranking distributions when only means and variances are known. Operations Research. 2022;70(5):2851–9.
- 17. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26(3):297–302.
- 18. Jaro MA. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association. 1989;84(406):414–20.
- 19. Yamada I, Asai A, Sakuma J, Shindo H, Takeda H, Takefuji Y, et al. Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics; 2020. p. 23–30.
- 20. McDonald R, Brokos GI, Androutsopoulos I. Deep relevance ranking using enhanced document-query interactions. arXiv preprint arXiv:1809.01682. 2018.
- 21. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.




