Internal Medicine. 2024 Jan 2;63(16):2259–2268. doi: 10.2169/internalmedicine.2825-23

An Evaluation of the Efficacy of Machine Learning in Predicting Thyrotoxicosis and Hypothyroidism: A Comparative Assessment of Biochemical Test Parameters Used in Different Health Checkups

Jaeduk Yoshimura Noh 1, Ai Yoshihara 1, Shigenori Hiruma 1, Masahiro Ichikawa 1, Rei Hirose 1, Masakazu Koshibu 1, Hideyuki Imai 1, Akiko Sankoda 1, Nami Suzuki 1, Miho Fukushita 1, Masako Matsumoto 1, Natsuko Watanabe 1, Kiminori Sugino 2, Koichi Ito 2
PMCID: PMC11414366  PMID: 38171877

Abstract

Objective

This study assessed the efficacy of machine learning in predicting thyrotoxicosis and hypothyroidism [thyroid-stimulating hormone >10.0 mIU/L] by leveraging age and sex as variables and integrating biochemical test parameters used by the Japan Society of Health Evaluation and Promotion (JHEP) and the Japan Society of Ningen Dock (JND).

Methods

Our study included 20,653 untreated patients with Graves' disease, 3,435 untreated patients with painless thyroiditis, 4,266 healthy individuals, and 18,937 untreated patients with Hashimoto's thyroiditis. Machine learning was conducted using Prediction One on three distinct datasets: the Ito dataset (age, sex, and 30 blood count and biochemical test parameters), the JHEP dataset [age, sex, and total protein, total bilirubin, aspartate aminotransferase (AST), alanine aminotransferase (ALT), gamma-glutamyl transpeptidase (γGTP), alkaline phosphatase, creatinine (CRE), uric acid (UA), and total cholesterol (T-Cho) test data], and the JND dataset (age, sex, and AST, ALT, γGTP, CRE, and UA test data).

Results

The results for distinguishing thyrotoxicosis patients from the healthy control group showed that the JHEP dataset yielded substantial discriminative capacity with an area under the curve (AUC) of 0.966, sensitivity of 92.2%, specificity of 89.1%, and accuracy of 91.7%. The JND dataset displayed similar robustness, with an AUC of 0.948, sensitivity of 92.0%, specificity of 81.3%, and accuracy of 90.4%. Differentiating hypothyroid patients from the healthy control group yielded similarly robust performances, with the JHEP dataset yielding AUC, sensitivity, specificity, and accuracy values of 0.864, 84.2%, 72.1%, and 77.4%, respectively, and the JND dataset yielding values of 0.840, 83.2%, 67.2%, and 74.3%, respectively.

Conclusion

Machine learning is a potent screening tool for thyrotoxicosis and hypothyroidism.

Keywords: machine learning, thyrotoxicosis, hypothyroidism, biochemical test parameters, health checkups

Introduction

During health checkups in Japan, overt hypothyroidism was reportedly detected in 0.7% of the subjects, subclinical hypothyroidism in 5.8%, overt hyperthyroidism in 0.7%, and subclinical hyperthyroidism in 2.1% (1). The symptoms of hyperthyroidism include palpitations, hand tremors, fatigue, sweating, and weight loss. However, not all patients experience these symptoms, and older adults experience them less frequently than younger individuals (2,3). Furthermore, other diseases sometimes present with these symptoms, making the diagnosis of hyperthyroidism based on these symptoms alone challenging. Initiating treatment of hypothyroidism at thyroid-stimulating hormone (TSH) levels >10 mIU/L is recommended (4,5), but it is difficult to diagnose subclinical hypothyroidism, in which the symptoms alone are typically insufficient to make the diagnosis.

Abnormal blood biochemical test data, including abnormal total cholesterol (T-Cho), alkaline phosphatase (ALP), creatine phosphokinase (CPK), aspartate aminotransferase (AST), alanine aminotransferase (ALT), and creatinine (CRE) levels, are well known to occur in cases of thyroid dysfunction, and they often serve as the initial indication of thyroid abnormalities, but it is impossible to make a diagnosis based on individual test data alone. We previously reported the successful use of machine learning to accurately diagnose Graves' disease (GD), and a combination of GD and painless thyroiditis (PT) using a complete blood count and biochemistry profile (6). This model may make it possible to diagnose thyrotoxicosis during routine health screening or based on diagnostic tests for other diseases. However, the tests used in this model differ from those commonly used in health checkups performed by the Japan Society of Health Evaluation and Promotion (JHEP) and the Japan Society of Ningen Dock (JND).

In this study, we investigated the ability of machine learning to diagnose thyrotoxicosis and hypothyroidism [with TSH levels >10 mIU/L, at which treatment is recommended (4,5)] based on the biochemical test parameters used by the JHEP and JND.

Materials and Methods

Study samples

We included all patients newly diagnosed with GD and PT who had made their initial visit to our hospital between January 1, 2005, and June 30, 2020. We also identified patients with untreated Hashimoto's disease who visited our clinic between January 1, 2005, and June 30, 2020. The exclusion criteria were having received treatment for a thyroid disorder before the first consultation at our hospital and having received medication that might affect the thyroid function. We compiled a control sample of euthyroid subjects who were free of thyroid disorders and had made their first visit to our hospital between January 1, 2005, and June 30, 2020.

This study was approved by the Ethics Committee of Ito Hospital (approval number: 313). The patients were given the opportunity to refuse to participate in the study via opt-out.

Predictors

The predictors in our model were selected from the complete blood count and standard biochemical profile data obtained during the initial hospital visit. These predictors included patient age, sex, red blood cell count (RBC), hemoglobin (Hb), hematocrit (Ht), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), platelet count (Plt), white blood cell count (WBC), neutrophils (Neu), lymphocytes (Lym), monocytes (Mo), eosinophils (Eo), basophils (Ba), total protein (TP), total bilirubin (T-Bil), AST, ALT, lactate dehydrogenase (LDH), gamma-glutamyl transpeptidase (γGTP), ALP, cholinesterase (ChE), CPK, CRE, uric acid (UA), sodium (Na), potassium (K), chloride (Cl), calcium (Ca), phosphorus (P), and T-Cho values. These components are collectively referred to as the Ito dataset. The JHEP uses the following parameters: TP, albumin (ALB), AST, ALT, γGTP, CRE, UA, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), T-Cho, ALP, and T-Bil. The JND uses the following parameters: ALB, AST, ALT, γGTP, CRE, UA, HDL-C, LDL-C, and TG. However, because ALB, LDL-C, HDL-C, and TG are not used at our hospital, they were excluded from our model. Thus, the JHEP dataset included the following predictors: age, sex, TP, T-Bil, AST, ALT, γGTP, ALP, CRE, UA, and T-Cho; and the JND dataset included the following predictors: age, sex, AST, ALT, γGTP, CRE, and UA.
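
The relationship between the three predictor sets can be made explicit in code. The following Python sketch simply transcribes the predictor lists from the text above; the variable names and the `select_predictors` helper are illustrative assumptions and are not part of Prediction One or of the authors' workflow.

```python
# Predictor sets transcribed from the Predictors section.
# All names below are illustrative, not taken from the study's software.
ITO_PREDICTORS = [
    "age", "sex",
    "RBC", "Hb", "Ht", "MCV", "MCH", "MCHC", "Plt", "WBC",
    "Neu", "Lym", "Mo", "Eo", "Ba",
    "TP", "T-Bil", "AST", "ALT", "LDH", "γGTP", "ALP", "ChE", "CPK",
    "CRE", "UA", "Na", "K", "Cl", "Ca", "P", "T-Cho",
]

JHEP_PREDICTORS = [
    "age", "sex", "TP", "T-Bil", "AST", "ALT", "γGTP", "ALP",
    "CRE", "UA", "T-Cho",
]

JND_PREDICTORS = ["age", "sex", "AST", "ALT", "γGTP", "CRE", "UA"]


def select_predictors(record, predictors):
    """Return only the named predictors from one patient record (a dict)."""
    return {name: record[name] for name in predictors}
```

Written this way, it is easy to check that the JND set (7 predictors) is contained in the JHEP set (11 predictors), which in turn is contained in the Ito set (age, sex, and 30 test parameters).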

Outcomes

We considered free triiodothyronine (FT3) and free thyroxine (FT4) levels above the reference range together with a suppressed TSH level as evidence supporting a diagnosis of thyrotoxicosis. If the TSH receptor antibody (TRAb) test was positive and ophthalmopathy was observed, we made a diagnosis of Graves' disease. Furthermore, if a patient tested positive for TRAb and reported symptoms of hyperthyroidism, such as palpitations and weight loss persisting for more than three months, we strongly suspected Graves' disease. Conversely, if the TRAb test was either negative or weakly positive and there was no evidence of ophthalmopathy, especially when the onset of the symptoms had been recent, we performed thyroid scintigraphy to make a differential diagnosis between painless thyroiditis and Graves' disease.

Hashimoto's disease was diagnosed based on a positive anti-thyroglobulin antibody (TgAb) or anti-thyroid peroxidase antibody (TPOAb) test coupled with a negative TRAb test and a TSH level either within or above the reference range.

FT3 and FT4 levels were measured by performing electrochemiluminescence immunoassays (ECLIAs) (Elecsys FT3 and Elecsys FT4; Roche Diagnostics, Basel, Switzerland; reference ranges at our hospital: 2.2-4.3 pg/mL and 0.8-1.6 ng/dL, respectively). TSH was measured using an ECLIA (Elecsys TSH; Roche Diagnostics, Basel, Switzerland; reference range at our hospital: 0.2-4.5 mIU/L).

TgAb and TPOAb levels were determined using ECLIA (Roche Diagnostics, TgAb cutoff ≤40 IU/mL, and TPOAb cutoff ≤28 IU/mL). Between July 2003 and September 2008, TRAb was assayed with a TSH receptor autoantibody coated-tube kit [TRAb-CT] (RSR, Cardiff, United Kingdom; cutoff: ≤10%), and subsequently with an electrochemiluminescence immunoassay kit (Elecsys TRAb; Roche Diagnostics, Basel, Switzerland; cutoff: <2.0 IU/L).

We selected control subjects (free from thyroid disease) based on several criteria: a normal thyroid function based on FT3, FT4, and TSH levels within their reference ranges; negative tests for all thyroid autoantibodies; a homogeneous echo pattern on thyroid imaging indicating the absence of thyroid nodules; and the absence of goiter. Thyroid volume was measured using ultrasound and volumes within our institution's standard range of 6.4-17.8 mL were considered normal.

Statistical analyses

Differences between the thyrotoxicosis group (comprising both GD and PT patients) and the control group were analyzed using the Mann-Whitney U test. Because treatment of hypothyroidism is recommended when TSH levels exceed 10 mIU/L (4,5), we also used the Mann-Whitney U test to compare patients with Hashimoto's disease whose TSH levels were above this threshold with the control group.

We partitioned the data obtained from each group, namely the GD and PT groups, the Hashimoto's disease group with TSH levels exceeding 10 mIU/L, and the control group, into training and testing sets. Each group was randomly divided such that 70% of its members constituted the training set and the remaining 30% comprised the testing set. In the training set, we constructed prediction models using Prediction One (Sony Network Communications, Tokyo, Japan), an ensemble learning model of neural networks and gradient-boosted decision trees. Prediction One automatically adjusts and optimizes the variables, thereby creating an optimal prediction model through an artificial neural network with internal cross-validation. In the testing set, we assessed the prediction performance of the resulting models. The performance was evaluated based on the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).
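
As a rough illustration of the evaluation procedure described above, the Python sketch below performs a random 70/30 split of one group and derives the threshold-based performance measures from confusion-matrix counts. It is not the Prediction One implementation; the function names, the fixed seed, and any counts passed to it are assumptions chosen for illustration.

```python
import random


def split_group(ids, train_fraction=0.7, seed=0):
    """Randomly split one group's record IDs into (training, testing) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def screening_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    tp/fn: patients classified as positive/negative.
    tn/fp: controls classified as negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Splitting each group separately, as in this sketch, keeps the ratio of patients to controls approximately the same in the training and testing sets.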

We also developed a prediction model using logistic regression instead of Prediction One and compared the prediction performance of these two models (i.e., logistic regression-based model vs. machine learning-based model). The same predictors were used for both Prediction One and logistic regression in each dataset.

Statistical analyses were performed using the JMP Pro v17.10 software program (SAS Institute, Cary, USA), and p values <0.05 were considered significant.

Results

Patient population

Between January 1, 2005, and June 30, 2020, we identified 20,653 newly diagnosed patients with GD, 3,435 with PT, and 4,266 individuals with no thyroid disease. The mean ages [±standard deviation (SD)] of the GD, PT, and normal group were 39.9 (±14.1) years old, 39.5 (±14.2) years old, and 34.1 (±13.4) years old, respectively.

The training dataset comprised data from 16,891 patients with thyrotoxicosis and 2,987 healthy subjects, and the test dataset comprised data from 7,197 patients with thyrotoxicosis and 1,279 healthy subjects. The characteristics of the patient and control groups are listed in detail in Table 1.

Table 1.

Characteristics of the Thyrotoxicosis (GD+PT) Group and Euthyroid Control Group.

Thyrotoxicosis group Euthyroid control group p value
Number 24,088 4,266
Male:Female 4,252:19,836 858:3,408 p<0.001
Age (yrs) 39.9 (14.1) 34.1 (13.4) p<0.001
FT3 (pg/mL) 16.5 (8.9) 2.98 (0.42) p<0.001
FT4 (ng/dL) 4.8 (2.2) 1.23 (0.15) p<0.001
TSH (mIU/L) 0.01 (0.01) 1.7 (0.88) p<0.001
RBC (×10⁴/μL) 463.2 (46.0) 447.2 (42.9) p<0.001
Hb (g/dL) 13.2 (1.3) 13.6 (1.3) p<0.001
Ht (%) 39.6 (3.7) 40.3 (3.8) p<0.001
MCV (fL) 85.6 (5.2) 90.2 (5.0) p<0.001
MCH (pg) 28.6 (2.1) 30.4 (2.1) p<0.001
MCHC (%) 33.4 (0.9) 33.7 (0.8) p<0.001
Plt (×10⁴/μL) 24.7 (6.0) 24.9 (5.5) p<0.001
WBC (/μL) 5,815.6 (1,671.3) 6,133.1 (1,641.0) p<0.001
Neu (%) 51.6 (10.5) 57.0 (9.3) p<0.001
Lym (%) 35.6 (9.1) 32.0 (8.1) p<0.001
Mo (%) 9.70 (3.2) 7.3 (2.1) p<0.001
Eo (%) 2.6 (2.4) 2.8 (2.5) p<0.001
Ba (%) 0.6 (0.6) 1.0 (0.6) p<0.001
TP (g/dL) 6.9 (0.5) 7.3 (0.4) p<0.001
T-Bil (mg/dL) 0.7 (0.3) 0.8 (0.3) p<0.001
AST (U/L) 28.5 (15.7) 20.6 (9.7) p<0.001
ALT (U/L) 38.2 (30.5) 19.4 (15.8) p<0.001
LDH (U/L) 165.4 (30.0) 168.8 (34.0) p<0.001
γGTP (U/L) 37.4 (35.8) 24.9 (47.6) p<0.001
ALP (U/L) 334.8 (173.7) 212.9 (133.8) p<0.001
ChE (U/L) 382.8 (82.5) 303.3 (72.7) p<0.001
CPK (U/L) 64.4 (124.1) 100.3 (173.0) p<0.001
CRE (mg/dL) 0.5 (0.2) 0.7 (0.1) p<0.001
UA (mg/dL) 5.0 (1.2) 4.6 (1.2) p<0.001
Na (mmol/L) 140.0 (2.0) 139.4 (1.9) p<0.001
K (mmol/L) 4.3 (0.3) 4.3 (0.3) p<0.001
Cl (mmol/L) 105.4 (2.3) 104.0 (2.3) p<0.001
Ca (mg/dL) 9.6 (0.4) 9.5 (0.4) p<0.001
P (mg/dL) 3.9 (0.7) 3.6 (0.6) p<0.001
T-Cho (mg/dL) 150.8 (32.3) 190.5 (35.6) p<0.001

GD: Graves' disease, PT: painless thyroiditis

In addition, between January 1, 2005, and June 30, 2020, we documented 18,937 patients with an initial diagnosis of untreated Hashimoto's disease at our clinic, and 3,342 [17.6%; mean age 47.6 (±15.4) years old; 435 men and 2,907 women] of them had a TSH level above 10 mIU/L at their initial visit. We randomly selected 70% of these patients and of the control subjects for the training dataset, which included 2,339 patients with Hashimoto's disease and TSH levels above 10 mIU/L and 2,987 control subjects. The test dataset consisted of the remaining 30% of the samples and contained 1,003 patients with Hashimoto's disease with a TSH level >10 mIU/L and 1,279 control subjects. Table 2 lists the detailed characteristics of the patients and control subjects.

Table 2.

Characteristics of the Hashimoto's Disease (TSH >10.0 mIU/L) Group and Euthyroid Control Group.

Hashimoto's disease (TSH >10.0 mIU/L) group Euthyroid control group p value
Number 3,342 4,266  
Male:Female 435:2,907 858:3,408 p<0.001
Age (yrs) 47.6 (15.4) 34.1 (13.4) p<0.001
FT3 (pg/mL) 2.35 (0.79) 2.98 (0.42) p<0.001
FT4 (ng/dL) 0.68 (0.31) 1.23 (0.15) p<0.001
TSH (mIU/L) 56.9 (74.9) 1.7 (0.88) p<0.001
RBC (×10⁴/μL) 430.0 (43.0) 447.2 (42.9) p<0.001
Hb (g/dL) 13.1 (1.3) 13.6 (1.3) p<0.001
Ht (%) 39.1 (3.7) 40.3 (3.8) p<0.001
MCV (fL) 91.0 (5.9) 90.2 (5.0) p<0.001
MCH (pg) 30.6 (2.3) 30.4 (2.1) p<0.001
MCHC (%) 33.6 (1.0) 33.7 (0.8) p<0.001
Plt (×10⁴/μL) 24.6 (6.0) 24.9 (5.5) p<0.001
WBC (/μL) 6,036.3 (3,441.5) 6,133.1 (1,641.0) p<0.001
Neu (%) 56.1 (9.2) 57.0 (9.3) p<0.001
Lym (%) 33.5 (8.4) 32.0 (8.1) p<0.001
Mo (%) 6.8 (2.0) 7.3 (2.1) p<0.001
Eo (%) 2.9 (2.3) 2.8 (2.5) p<0.001
Ba (%) 0.9 (0.5) 1.0 (0.6) p<0.001
TP (g/dL) 7.6 (0.5) 7.3 (0.4) p<0.001
T-Bil (mg/dL) 0.7 (0.3) 0.8 (0.3) p<0.001
AST (U/L) 27.1 (17.8) 20.6 (9.7) p<0.001
ALT (U/L) 23.7 (20.4) 19.4 (15.8) p<0.001
LDH (U/L) 197.6 (77.9) 168.8 (34.0) p<0.001
γGTP (U/L) 28.5 (35.4) 24.9 (47.6) p<0.001
ALP (U/L) 218.6 (93.1) 212.9 (133.8) p<0.001
ChE (U/L) 300.7 (81.0) 303.3 (72.7) p<0.001
CPK (U/L) 206.7 (443.6) 100.3 (173.0) p<0.001
CRE (mg/dL) 0.7 (0.2) 0.7 (0.1) p<0.001
UA (mg/dL) 4.8 (1.3) 4.6 (1.2) p<0.001
Na (mmol/L) 139.8 (2.1) 139.4 (1.9) p<0.001
K (mmol/L) 4.3 (0.3) 4.3 (0.3) p<0.001
Cl (mmol/L) 104.4 (2.3) 104.0 (2.3) p<0.001
Ca (mg/dL) 9.5 (0.4) 9.5 (0.4) p<0.001
P (mg/dL) 3.6 (0.5) 3.6 (0.6) p<0.001
T-Cho (mg/dL) 229.7 (57.9) 190.5 (35.6) p<0.001

Predicting thyrotoxicosis

Our machine learning-based predictive model exhibited substantial discriminatory capacity in predicting thyrotoxicosis when applied to the Ito dataset, with an AUC of 0.977, sensitivity of 94.3%, specificity of 89.0%, and an overall accuracy of 93.5%. The five most significant factors in the Ito dataset for the diagnosis of thyrotoxicosis were CRE, T-Cho, ChE, CPK, and MCHC (Table 3). When the model was applied to the JHEP dataset, which incorporated 11 predictors from the Ito dataset but excluded blood cell and electrolyte data, it yielded a consistent performance, with an AUC of 0.966, sensitivity of 92.2%, specificity of 89.1%, and accuracy of 91.7%. The five most significant predictors in the JHEP dataset for diagnosing thyrotoxicosis were CRE, T-Cho, age, ALT, and ALP. Even when the model was applied to the JND dataset, which incorporated 7 predictors, it yielded robust results: an AUC of 0.948, sensitivity of 92.0%, specificity of 81.3%, and accuracy of 90.4%. The five most significant factors in the JND dataset for diagnosing thyrotoxicosis were CRE, ALT, age, UA, and sex.

Table 3.

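Prediction of Thyrotoxicosis and Hypothyroidism with a TSH Level >10.0 mIU/L.

| Subjects | Dataset | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Top five contributing predictors |
| --- | --- | --- | --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | Ito | 0.977 | 94.3 | 89.0 | 93.5 | CRE, T-Cho, ChE, CPK, MCHC |
| Thyrotoxicosis group vs. control group | JHEP | 0.966 | 92.2 | 89.1 | 91.7 | CRE, T-Cho, age, ALT, ALP |
| Thyrotoxicosis group vs. control group | JND | 0.948 | 92.0 | 81.3 | 90.4 | CRE, ALT, age, UA, sex |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Ito | 0.877 | 85.2 | 73.1 | 78.4 | TP, age, CRE, T-Cho, AST |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JHEP | 0.864 | 84.2 | 72.1 | 77.4 | Age, TP, CRE, AST, T-Cho |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JND | 0.828 | 84.0 | 65.3 | 73.5 | Age, AST, CRE, sex, ALT |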

Predicting hypothyroidism with a TSH level >10.0 mIU/L

Our predictive model demonstrated moderate discriminatory potential when applied to the Ito dataset, which included patients with a TSH level >10.0 mIU/L and a control group. The model yielded an AUC of 0.877, a sensitivity of 85.2%, a specificity of 73.1%, and an accuracy of 78.4%. The top 5 contributory factors for predicting a TSH level >10.0 mIU/L were the TP, age, CRE, T-Cho, and AST (Table 3). In the JHEP dataset, which consists of 11 predictors that do not include blood cell and electrolyte data, the model displayed moderate performance, with an AUC of 0.864, a sensitivity of 84.2%, a specificity of 72.1%, and an accuracy of 77.4%. The top 5 contributory factors for predicting a TSH >10.0 mIU/L were the age, TP, CRE, AST, and T-Cho.

Finally, the model yielded satisfactory performance for the JND dataset containing seven predictor factors, with an AUC of 0.828, sensitivity of 84.0%, specificity of 65.3%, and accuracy of 73.5%. The top five contributory factors for predicting a TSH >10.0 mIU/L were age, AST level, CRE, sex, and ALT level.

A sub-analysis for predicting thyrotoxicosis and hypothyroidism with a TSH level >10.0 mIU/L

(1) A gender-specific sub-analysis for predicting thyrotoxicosis and hypothyroidism with a TSH level >10.0 mIU/L.

The ability of the datasets to predict thyrotoxicosis was analyzed in 3,408 women in the control group, 19,836 women with thyrotoxicosis, 858 men in the control group, and 4,252 men with thyrotoxicosis. The analysis of the ability of the datasets to predict hypothyroidism with a TSH >10.0 mIU/L was conducted on 3,408 women in the control group and 2,907 women with hypothyroidism and on 858 men in the control group and 435 men with hypothyroidism. The AUC values derived from the sub-analysis predicting thyrotoxicosis in women were nearly identical to those derived from the analysis of the entire group (Table 4-1). In men, the top five contributing predictors for hypothyroidism differed between the JHEP and JND datasets; however, apart from the AUCs, the performance measures of these two models were identical.

Table 4-1.

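Gender-specific Sub-analysis of the Prediction of Thyrotoxicosis and Hypothyroidism with a TSH Level >10.0 mIU/L.

| Subjects | Sex | Dataset | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Top five contributing predictors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | Male | Ito | 0.974 | 94.0 | 88.8 | 93.2 | T-Cho, age, CRE, TP, ALP |
| Thyrotoxicosis group vs. control group | Male | JHEP | 0.966 | 92.5 | 88.4 | 91.8 | T-Cho, age, CRE, TP, ALP |
| Thyrotoxicosis group vs. control group | Male | JND | 0.935 | 89.9 | 82.0 | 88.6 | CRE, age, ALT, AST, UA |
| Thyrotoxicosis group vs. control group | Female | Ito | 0.977 | 94.5 | 87.8 | 93.5 | CRE, T-Cho, ChE, CPK, age |
| Thyrotoxicosis group vs. control group | Female | JHEP | 0.965 | 93.3 | 84.6 | 92.0 | CRE, T-Cho, age, ALT, ALP |
| Thyrotoxicosis group vs. control group | Female | JND | 0.947 | 91.8 | 80.3 | 90.1 | CRE, ALT, age, UA, γGTP |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Male | Ito | 0.893 | 73.8 | 90.0 | 84.6 | CRE, T-Cho, TP, AST, age |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Male | JHEP | 0.897 | 77.8 | 87.6 | 84.3 | CRE, age, T-Cho, AST, TP |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Male | JND | 0.883 | 77.8 | 87.6 | 84.3 | CRE, AST, age, γGTP, ALT |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Female | Ito | 0.873 | 78.7 | 81.5 | 80.3 | TP, age, CRE, T-Cho, AST |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Female | JHEP | 0.852 | 75.4 | 79.5 | 77.6 | Age, TP, CRE, AST, T-Cho |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Female | JND | 0.806 | 77.7 | 68.7 | 72.8 | Age, CRE, AST, ALT, γGTP |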

(2) A sub-analysis for predicting thyrotoxicosis and hypothyroidism with a TSH level >10.0 mIU/L in the group ≤50 years old vs. the group >50 years old.

A sub-analysis of the ability of the datasets to predict thyrotoxicosis according to age group was conducted on 3,729 control cases and 18,593 thyrotoxicosis cases in the group ≤50 years old and 537 control cases and 5,495 thyrotoxicosis cases in the group >50 years old. A sub-analysis of the ability of the datasets to predict hypothyroidism according to age group was conducted on 3,729 control cases and 1,878 hypothyroidism cases in the group ≤50 years old and 537 control cases and 1,464 hypothyroidism cases in the group >50 years old. The results of the sub-analysis of the ability to predict thyrotoxicosis in the group ≤50 years old showed that the AUC was almost the same as that obtained from the analysis of the group as a whole (Table 4-2). The AUC was slightly lower in the group >50 years old than in the group ≤50.

Table 4-2.

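Sub-analysis of Prediction of Thyrotoxicosis and Hypothyroidism with a TSH Level >10.0 mIU/L in Subjects Aged 50 Years Old and under Versus Subjects Aged 51 Years Old and Over.

| Subjects | Age group | Dataset | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Top five contributing predictors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | 50 years old and under | Ito | 0.977 | 93.7 | 90.2 | 93.1 | CRE, T-Cho, ChE, CPK, age |
| Thyrotoxicosis group vs. control group | 50 years old and under | JHEP | 0.968 | 92.2 | 87.5 | 91.4 | CRE, T-Cho, ALT, age, ALP |
| Thyrotoxicosis group vs. control group | 50 years old and under | JND | 0.952 | 90.9 | 84.5 | 89.8 | CRE, ALT, UA, age, AST |
| Thyrotoxicosis group vs. control group | 51 years old and over | Ito | 0.960 | 92.3 | 85.1 | 91.6 | T-Cho, CPK, ChE, CRE, MCH |
| Thyrotoxicosis group vs. control group | 51 years old and over | JHEP | 0.938 | 90.8 | 79.4 | 89.7 | T-Cho, CRE, ALP, ALT, TP |
| Thyrotoxicosis group vs. control group | 51 years old and over | JND | 0.903 | 87.0 | 80.0 | 86.3 | CRE, ALT, UA, age, AST |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 50 years old and under | Ito | 0.855 | 79.8 | 75.5 | 76.9 | TP, T-Cho, CRE, age, AST |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 50 years old and under | JHEP | 0.837 | 66.8 | 83.0 | 77.4 | Age, TP, CRE, AST, T-Cho |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 50 years old and under | JND | 0.792 | 78.1 | 66.8 | 70.7 | Age, AST, CRE, sex, ALT |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 51 years old and over | Ito | 0.786 | 83.9 | 50.9 | 74.2 | TP, CRE, Eo, T-Cho, RBC |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 51 years old and over | JHEP | 0.789 | 85.3 | 49.1 | 74.7 | CRE, TP, T-Cho, sex, ALT |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | 51 years old and over | JND | 0.737 | 83.4 | 43.4 | 71.7 | CRE, AST, ALT, sex, age |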

The sub-analysis of the ability to predict hypothyroidism in the group ≤50 years old revealed a slightly lower AUC than in the group as a whole. In the group >50 years old, the AUC was lower still (e.g., 0.786 vs. 0.877 for the Ito dataset).

(3) Prediction of hypothyroidism with a TSH level >10.0 mIU/L and FT4 level within the reference range or a TSH level >10.0 mIU/L and low FT4 level

We analyzed 1,723 patients with a TSH level >10 mIU/L and FT4 values within the reference range and 2,713 patients with a TSH level >10 mIU/L and FT4 values below the lower limit of the reference range. In the former group, the AUC values obtained with both the Ito and JHEP datasets were approximately 0.8, indicating good discrimination (Table 4-3). In the latter group, the AUC values obtained using the Ito, JHEP, and JND datasets were 0.985, 0.884, and 0.836, respectively, all of which were very good.

Table 4-3.

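Prediction of Hypothyroidism with a TSH Level >10.0 mIU/L and a FT4 Level within the Reference Range or a TSH Level >10.0 mIU/L and Low FT4 Level.

| Subjects | Dataset | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Top five contributing predictors |
| --- | --- | --- | --- | --- | --- | --- |
| Hypothyroidism (TSH >10 mIU/L and FT4 within reference range) group vs. control group | Ito | 0.797 | 75.6 | 71.3 | 72.9 | TP, T-Cho, age, CPK, ChE |
| Hypothyroidism (TSH >10 mIU/L and FT4 within reference range) group vs. control group | JHEP | 0.806 | 69.8 | 78.5 | 75.3 | Age, TP, AST, T-Bil, T-Cho |
| Hypothyroidism (TSH >10 mIU/L and FT4 within reference range) group vs. control group | JND | 0.766 | 66.9 | 76.0 | 72.7 | Age, AST, CRE, sex, ALT |
| Hypothyroidism (TSH >10 mIU/L and low FT4) group vs. control group | Ito | 0.879 | 83.5 | 75.9 | 79.3 | TP, CPK, age, T-Cho, ChE |
| Hypothyroidism (TSH >10 mIU/L and low FT4) group vs. control group | JHEP | 0.856 | 78.4 | 77.2 | 77.7 | Age, TP, T-Cho, AST, CRE |
| Hypothyroidism (TSH >10 mIU/L and low FT4) group vs. control group | JND | 0.815 | 73.6 | 75.4 | 74.5 | Age, AST, CRE, sex, γGTP |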

(4) Prediction of thyrotoxicosis and hypothyroidism in cases with a TSH level >10.0 mIU/L after excluding T-Cho and UA as indicators.

We did not exclude patients who were receiving medication for dyslipidemia or hyperuricemia from the study population. Because the inclusion of a large number of such patients may have affected the results, we repeated the analysis after excluding T-Cho and UA from the predictors. The results were almost the same as when T-Cho and UA were included (Table 4-4).

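Table 4-4.

Prediction of Thyrotoxicosis and Hypothyroidism with a TSH Level >10.0 mIU/L after Excluding T-Cho and UA as Indicators.

| Subjects | Dataset | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Top five contributing predictors |
| --- | --- | --- | --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | Ito | 0.972 | 93.9 | 87.0 | 92.8 | CRE, ChE, CPK, MCH, Ba |
| Thyrotoxicosis group vs. control group | JHEP | 0.957 | 91.9 | 83.7 | 90.7 | CRE, ALT, ALP, age, TP |
| Thyrotoxicosis group vs. control group | JND | 0.944 | 91.6 | 80.5 | 89.9 | CRE, ALT, age, sex, γGTP |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Ito | 0.873 | 80.9 | 79.0 | 79.8 | TP, age, CRE, AST, Mo |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JHEP | 0.852 | 83.4 | 70.1 | 76.0 | Age, TP, CRE, AST, sex |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JND | 0.824 | 80.9 | 66.1 | 72.6 | Age, AST, CRE, sex, ALT |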

PPV and NPV based on real-world prevalence rates.

We calculated the PPVs and NPVs in this study using the sensitivity and specificity data obtained along with the prevalence data obtained from a previous study (1). The calculations were performed using both the prevalence rate of overt diseases and the sum of the prevalence rates for both latent and overt diseases (Table 5).

Table 5.

Estimating PPV and NPV in Actual Clinical Practice.

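| Subjects | Dataset | Sensitivity (%) | Specificity (%) | Prevalence category | Prevalence (%) | PPV (%) | NPV (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | Ito | 94.3 | 89.0 | Overt | 0.7 | 5.7 | 100.0 |
| Thyrotoxicosis group vs. control group | Ito | 94.3 | 89.0 | Overt+sub | 2.8 | 19.8 | 99.8 |
| Thyrotoxicosis group vs. control group | JHEP | 92.2 | 89.1 | Overt | 0.7 | 5.6 | 99.9 |
| Thyrotoxicosis group vs. control group | JHEP | 92.2 | 89.1 | Overt+sub | 2.8 | 19.6 | 99.8 |
| Thyrotoxicosis group vs. control group | JND | 92.0 | 81.3 | Overt | 0.7 | 2.4 | 99.9 |
| Thyrotoxicosis group vs. control group | JND | 92.0 | 81.3 | Overt+sub | 2.8 | 12.4 | 99.7 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Ito | 85.2 | 73.1 | Overt | 0.7 | 2.2 | 99.9 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Ito | 85.2 | 73.1 | Overt+sub | 6.5 | 18.1 | 98.6 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JHEP | 84.2 | 72.1 | Overt | 0.7 | 2.1 | 99.9 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JHEP | 84.2 | 72.1 | Overt+sub | 6.5 | 17.3 | 98.5 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JND | 84.0 | 65.3 | Overt | 0.7 | 1.7 | 99.8 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JND | 84.0 | 65.3 | Overt+sub | 6.5 | 14.4 | 98.3 |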

The prevalence is cited from Reference 1. Sub: subclinical

In the “thyrotoxicosis vs. control” comparison, the PPVs obtained for the Ito dataset and JHEP dataset were 5.7% and 5.6%, respectively, for the overt cases, and 19.8% and 19.6%, respectively, for the overt cases and subclinical cases combined. However, the values obtained for the JND dataset were much lower at 2.4% and 12.4%, respectively. A similar trend was observed in the “hypothyroidism (TSH>10 mIU/L) vs. control” comparison. The NPV for all datasets consistently ranged from 98% to 100%.
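
The dependence of the PPV and NPV on prevalence follows from Bayes' theorem. The Python sketch below shows the standard calculation, using the Ito dataset's sensitivity and specificity for thyrotoxicosis together with the 0.7% overt prevalence cited in Table 5; the function name is an illustrative assumption, and this is not necessarily the exact procedure the authors used.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Ito dataset, thyrotoxicosis vs. control, overt prevalence 0.7% (Table 5)
ppv, npv = predictive_values(0.943, 0.890, 0.007)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 5.7%, NPV = 100.0%
```

Because true positives scale with prevalence while false positives scale with the much larger disease-free fraction, even a specificity near 90% yields a low PPV when the condition is rare, whereas the NPV stays close to 100%.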

A comparison between predictive models: Prediction One vs. logistic regression

We performed a comparative analysis of the ROC curves from the test datasets to evaluate the predictive capabilities of the Prediction One and logistic regression models. The results demonstrated the superior performance of the Prediction One model, which consistently yielded a higher AUC than the logistic regression model (Table 6).

Table 6.

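AUCs for Thyrotoxicosis and Hypothyroidism (TSH >10.0 mIU/L): Prediction One vs. Logistic Regression.

| Subjects | Dataset | AUC (Prediction One) | AUC (logistic regression) |
| --- | --- | --- | --- |
| Thyrotoxicosis group vs. control group | Ito | 0.977 | 0.965 |
| Thyrotoxicosis group vs. control group | JHEP | 0.966 | 0.946 |
| Thyrotoxicosis group vs. control group | JND | 0.948 | 0.927 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | Ito | 0.877 | 0.867 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JHEP | 0.864 | 0.837 |
| Hypothyroidism (TSH >10 mIU/L) group vs. control group | JND | 0.828 | 0.799 |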

Discussion

From 2005 to 2020, we diagnosed 20,653 patients with Graves' disease and 3,435 patients with painless thyroiditis. In addition, we identified 4,266 individuals with a normal thyroid function. Participants were divided into training and test datasets. Within the same timeframe, we identified 18,937 patients with untreated Hashimoto's disease, 17.6% of whom had TSH levels >10 mIU/L. These patients were similarly divided into training and test datasets. Our predictive model distinguished between thyrotoxicosis and a normal thyroid function. The key indicators included CRE, T-Cho, ChE, CPK, and MCHC. Furthermore, the model effectively identified hypothyroid cases in which TSH exceeded 10.0 mIU/L, emphasizing the significance of parameters such as TP, age, CRE, T-Cho, and AST. The model exhibited consistent accuracy for both the JHEP and JND datasets. Notably, its predictive efficiency surpassed that of logistic regression, underscoring the superior capabilities of Prediction One.

In models concerning thyrotoxicosis patients and controls, CRE emerged as a top predictor across all datasets: Ito, JHEP, and JND. The Ito dataset highlighted ChE and CPK as among the top five predictors; however, these predictors were not included in the JHEP or JND datasets. The JHEP dataset featured ALT and ALP, whereas the JND dataset prioritized ALT and UA levels. In the Ito dataset, ALT was ranked 8th, ALP 9th, and UA 18th; the latter was not deemed to be a significant contributor. Variations in AUCs among the datasets might be attributed to the inclusion of distinct parameters, such as ALT, ALP, and UA. Nevertheless, the AUC discrepancies between the Ito and JHEP datasets (0.011) and between the Ito and JND datasets (0.029) remained minimal. Sensitivity and accuracy showed negligible variations across the datasets.

T-Cho and CRE were pivotal predictors in our model when distinguishing thyrotoxicosis patients from the euthyroid control group. The primary cause of the observed reduction in CRE levels in patients with thyrotoxicosis is increased renal filtration [glomerular filtration rate (GFR)]. This phenomenon is due to vasodilation induced by thyroid hormones and a decrease in muscle breakdown, which results in diminished creatinine production (7). Elevated thyroid hormone levels in thyrotoxicosis amplify the expression and activity of HMG-CoA reductase, thereby enhancing cholesterol synthesis. However, blood T-Cho levels drop owing to an increase in the uptake of LDL in the liver and increased LDL excretion into the bile, initiated by the induction of 7α-hydroxylase (8).

In the Ito dataset, the ChE was ranked as the third-most influential factor. While there is an ongoing debate that ChE synthesis might escalate in thyrotoxicosis cases, the exact nature of this relationship remains ambiguous. Although CPK levels are known to increase in cases of hypothyroidism, we observed that in thyrotoxicosis cases, the number of instances showing a similar increase in CPK levels was small and not statistically significant (9). Our findings indicate a notable drop in CPK levels in thyrotoxicosis cases, suggesting CPK's potential utility as a diagnostic marker for thyrotoxicosis.

Both age and CRE played a crucial role in distinguishing hypothyroid patients with TSH levels >10 mIU/L from control subjects across all datasets. The AUC values recorded for the Ito and JHEP datasets were 0.877 and 0.864, respectively.

To delve deeper into the implications of our results, we examined specific data points more closely. The TP data were instrumental in our analysis, with distinct differences noted across the 3 groups: 6.9 (0.5) g/dL in the thyrotoxicosis group, 7.3 (0.4) g/dL in the normal control group, and 7.6 (0.5) g/dL in the hypothyroidism group. TP primarily comprises ALB and globulin. Both hyperthyroidism (10) and hypothyroidism (11) tend to decrease TP levels, although the extent of this reduction can vary. The elevated TP values in our hypothyroidism group might be attributed to the high autoantibody levels often observed in Hashimoto's disease.

In addition to TP, other biochemical markers played a significant role in our analysis. For example, CRE was identified as a vital predictor of hypothyroidism. The reduced renal blood flow and GFR in hypothyroidism typically result in elevated serum creatinine levels (12,13).

In addition to biochemical markers, demographic factors also demonstrated predictive importance. Age was one of the top five predictors of both thyrotoxicosis and hypothyroidism. Notably, in the JND dataset, sex was also recognized as a significant factor for thyrotoxicosis. Generally, patients diagnosed with thyrotoxicosis or hypothyroidism were older and included a higher proportion of women than control subjects. When age and sex were excluded from the predictive model, the AUCs for thyrotoxicosis vs. control in the Ito, JHEP, and JND datasets were 0.972, 0.958, and 0.935, respectively. The AUCs for hypothyroidism (TSH level >10 mIU/L) in these datasets were 0.858, 0.823, and 0.744, respectively. These AUCs were only marginally lower than those obtained with age and sex included, with the notable exception of the hypothyroid group in the JND dataset.
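The size of these decreases can be tabulated directly from the reported values. The following is a minimal sketch that uses only AUCs stated in this article (those with age and sex included are taken from the Abstract and the Discussion; the corresponding Ito value for thyrotoxicosis is not restated here and is therefore omitted). The function and variable names are illustrative and are not part of the original analysis.

```python
# AUCs with age and sex included, as reported in the Abstract and Discussion.
# The Ito value for thyrotoxicosis is not restated in this section, so it is omitted.
auc_with = {
    ("thyrotoxicosis", "JHEP"): 0.966,
    ("thyrotoxicosis", "JND"): 0.948,
    ("hypothyroidism", "Ito"): 0.877,
    ("hypothyroidism", "JHEP"): 0.864,
    ("hypothyroidism", "JND"): 0.840,
}

# AUCs after excluding age and sex, as reported in the paragraph above.
auc_without = {
    ("thyrotoxicosis", "Ito"): 0.972,
    ("thyrotoxicosis", "JHEP"): 0.958,
    ("thyrotoxicosis", "JND"): 0.935,
    ("hypothyroidism", "Ito"): 0.858,
    ("hypothyroidism", "JHEP"): 0.823,
    ("hypothyroidism", "JND"): 0.744,
}


def auc_drop(with_demo, without_demo):
    """Return the AUC decrease for every (task, dataset) pair present in both dicts."""
    return {
        key: round(with_demo[key] - without_demo[key], 3)
        for key in with_demo
        if key in without_demo
    }


drops = auc_drop(auc_with, auc_without)
# Print from the smallest to the largest decrease.
for (task, dataset), drop in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{task:15s} {dataset:5s} AUC decrease = {drop:.3f}")
```

The largest decrease in this tabulation is that of the hypothyroidism model in the JND dataset, in line with the exception noted above.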

The Ito dataset in our study did not include ALB, LDL-C, HDL-C, and TG levels, even though these variables were present in both the original JHEP and JND datasets. Future models that integrate these variables into the Ito dataset may yield better outcomes than those of the JHEP and JND datasets.

In the sub-analysis by sex, the diagnostic performance measures remained almost unchanged from those of the overall analysis. The results for the hypothyroidism group vs. control group in men showed an AUC of 0.897 for the JHEP dataset and 0.883 for the JND dataset, but the sensitivity, specificity, and accuracy were the same. This suggests that the two models assigned different predicted probabilities while yielding the same classifications at the decision threshold.
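How two models can share the same sensitivity and specificity while differing in AUC can be illustrated with a small example. The sketch below uses entirely hypothetical predicted probabilities (not data from this study) and assumes a decision threshold of 0.5; the AUC is computed as the proportion of positive-negative pairs in which the positive case receives the higher score.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    sensitivity = sum(p >= threshold for p in pos_scores) / len(pos_scores)
    specificity = sum(n < threshold for n in neg_scores) / len(neg_scores)
    return sensitivity, specificity


# Hypothetical predicted probabilities for two models (illustration only).
model_a_pos = [0.9, 0.8, 0.7, 0.4]   # one patient missed, but only narrowly
model_a_neg = [0.1, 0.2, 0.3, 0.6]   # one control flagged, but only narrowly
model_b_pos = [0.9, 0.8, 0.7, 0.1]   # one patient missed with a very low score
model_b_neg = [0.2, 0.3, 0.4, 0.95]  # one control flagged with a very high score

for name, pos, neg in [("A", model_a_pos, model_a_neg),
                       ("B", model_b_pos, model_b_neg)]:
    sens, spec = sens_spec(pos, neg)
    print(f"model {name}: sensitivity={sens:.2f} specificity={spec:.2f} "
          f"AUC={auc(pos, neg):.4f}")
```

Both hypothetical models misclassify one patient and one control at the threshold, so their sensitivity and specificity coincide, whereas the ranking of the misclassified cases differs and so does the AUC.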

In the sub-analysis comparing the ≤50-year-old and >50-year-old groups, sensitivity was unsatisfactory for the hypothyroidism group versus the control group. One possible reason for these results, especially in the hypothyroidism vs. control comparison, is the limited number of cases.

The PPV and NPV data were derived from the sensitivity and specificity data determined in this study, together with prevalence data from the reference literature. As the reference literature includes cases with TSH levels ranging from above the upper limit of the reference range up to 10 mIU/L, the PPV and NPV values estimated for real-world clinical practice in our study may be slightly inflated. Nonetheless, with an NPV exceeding 98% and a PPV below 20% for both overt and subclinical cases, these figures can be considered acceptable for application in actual clinical settings.
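The derivation of PPV and NPV from sensitivity, specificity, and prevalence follows Bayes' theorem and can be sketched as below. The sensitivity and specificity are those reported in the Abstract for the JHEP dataset (thyrotoxicosis vs. control); the prevalence of 1% is a hypothetical placeholder chosen for illustration and is not the value taken from the reference literature.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the Abstract (JHEP dataset, thyrotoxicosis).
# ASSUMPTION: prevalence of 1% is hypothetical, used only to show the calculation.
ppv, npv = ppv_npv(sensitivity=0.922, specificity=0.891, prevalence=0.01)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under this assumed prevalence, the PPV is below 20% and the NPV exceeds 98%, which illustrates why a screening test with high sensitivity and specificity still produces a low PPV when the condition is uncommon.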

Our research primarily relied on data obtained during initial examinations, including age and sex. Given that health checkups usually encompass body weight measurements and are conducted at intervals ranging from six months to a year, incorporating body weight fluctuations and changes from previous test values might enhance the diagnostic accuracy.

Several limitations associated with the present study warrant mention. First, due to our hospital's specialization in thyroid diseases, even patients categorized as normal occasionally exhibited symptoms typical of thyroid diseases, such as cervical enlargement. As a result, our euthyroid control subjects may not fully represent the general healthy population. Second, our data were cross-sectional in nature, and the majority of our cohort was ≤50 years old and female. Third, while patients treated for thyroid disorders prior to their initial hospital visit and those on thyroid medication were excluded, those on medication for other conditions, such as hypertension or hyperlipidemia, were not.

In conclusion, while machine learning, often equated with artificial intelligence, is predominantly utilized in diagnostic imaging within medicine, our study underscores its efficacy in screening for thyrotoxicosis and hypothyroidism (with TSH >10.0 mIU/L) using variables such as age, sex, and common biochemical test data obtained during health checkups.

The authors state that they have no Conflict of Interest (COI).

Acknowledgement

We would like to extend our deep gratitude to the Ito Hospital staff for their unwavering dedication to patient care and diligent data collection. We also wish to thank the patients for their cooperation and consistent engagement throughout this study.

References

  • 1. Kasagi K, Takahashi N, Inoue G, Honda T, Kawachi Y, Izumi Y. Thyroid function in Japanese adults as assessed by a general health checkup system in relation with thyroid-related antibodies and other clinical parameters. Thyroid 19: 937-944, 2009.
  • 2. Goichot B, Caron P, Landron F, Bouee S. Clinical presentation of hyperthyroidism in a large representative sample of outpatients in France: relationships with age, aetiology and hormonal parameters. Clin Endocrinol (Oxf) 84: 445-451, 2016.
  • 3. Boelaert K, Torlinska B, Holder RL, Franklyn JA. Older subjects with hyperthyroidism present with a paucity of symptoms and signs: a large cross-sectional study. J Clin Endocrinol Metab 95: 2715-2726, 2010.
  • 4. Hashimoto K. Update on subclinical thyroid dysfunction. Endocr J 69: 725-738, 2022.
  • 5. Garber JR, Cobin RH, Gharib H, et al.; the American Association of Clinical Endocrinologists and American Thyroid Association Taskforce on Hypothyroidism in Adults. Clinical practice guidelines for hypothyroidism in adults: cosponsored by the American Association of Clinical Endocrinologists and the American Thyroid Association. Thyroid 22: 1200-1235, 2012.
  • 6. Yoshihara A, Yoshimura Noh J, Inoue K, et al. Prediction model of Graves' disease in general clinical practice based on complete blood count and biochemistry profile. Endocr J 69: 1091-1100, 2022.
  • 7. Sonmez E, Bulur O, Ertugrul DT, Sahin K, Beyan E, Dal K. Hyperthyroidism influences renal function. Endocrine 65: 144-148, 2019.
  • 8. Hashimoto K. Thyroid hormones and dyslipidemia. Nihon Koujousen Gakkai Zasshi (J Jpn Thyroid Assoc) 8: 135-150, 2017.
  • 9. McGrowder DA, Fraser YP, Gordon L, Crawford TV, Rawlins JM. Serum creatine kinase and lactate dehydrogenase activities in patients with thyroid disorders. Niger J Clin Pract 14: 454-459, 2011.
  • 10. Iglesias P, Devora O, Garcia J, Tajada P, Garcia-Arevalo C, Diez JJ. Severe hyperthyroidism: aetiology, clinical features and treatment outcome. Clin Endocrinol (Oxf) 72: 551-557, 2010.
  • 11. Wright DJ, Biddulph L, Rinsler MG. Serum albumin and the specificity of free tri-iodothyronine as a test for hypothyroidism. Ann Clin Biochem 26: 233-237, 1989.
  • 12. Piras C, Pibiri M, Leoni VP, Balsamo A, Tronci L, Arisci N. Analysis of metabolomics profile in hypothyroid patients before and after thyroid hormone replacement. J Endocrinol Invest 44: 1309-1319, 2021.
  • 13. Mariani LH, Berns JS. The renal manifestations of thyroid disease. J Am Soc Nephrol 23: 22-26, 2012.

Articles from Internal Medicine are provided here courtesy of Japanese Society of Internal Medicine
