Abstract
Diagnosis of polycystic ovary syndrome (PCOS) remains a challenge. In this study, we constructed a diagnostic model of PCOS that combines anti-Müllerian hormone (AMH) with steroid hormones and oestrogens, with the aim of providing an additional basis and auxiliary means for diagnosing this disease. Eighty-four samples from patients diagnosed with PCOS at the First Affiliated Hospital of Zhejiang Chinese Medical University between May 2023 and November 2023 were collected as the case group, and 75 samples from healthy individuals attending the Health Screening Centre of the same hospital during the same period were collected as the control group. General information (including age, BMI, family history and medication history) and sex hormone data (including luteinising hormone, follicle-stimulating hormone, prolactin, estradiol and testosterone) were collected from all study subjects, and AMH and steroid hormone tests were performed on their serum. Data from 10 cases and 10 controls were randomly selected as the validation set, and the remaining data were used for model construction. Variables were screened, classification models based on machine learning algorithms were constructed, and the diagnostic efficacy of the models was evaluated and validated. Eight variables were ultimately retained for model construction, namely LH, LH/FSH, E2, PRL, T, AMH, AD and COR, with AMH having the highest diagnostic potential among them. Of the five machine learning models constructed, the logistic classification model had the best overall performance and the support vector machine the weakest. On the validation set, the logistic classification model achieved an AUC of 0.86.
In this study, five classification models based on machine learning algorithms were successfully constructed. Considering the evaluation metrics of each model, we concluded that the logistic classification model performed best in our study. However, because this is a single-center study with a small sample size, some metabolic features of PCOS may have been overlooked, and because the validation data come from the same center as the modelling data, the validation results have limitations. It therefore remains necessary to expand the sample size and collect multicenter data to establish an external validation dataset.
Keywords: Polycystic ovary syndrome, Steroid hormones, Anti-Müllerian hormone, Machine learning
Subject terms: Diseases, Endocrinology, Medical research
Introduction
Polycystic ovary syndrome (PCOS) is the most common endocrine-metabolic disorder in women of childbearing age, affecting 10–15% of women worldwide1, and is one of the leading causes of infertility in women2. The clinical manifestations of PCOS are varied and include oligomenorrhea or amenorrhea, insulin resistance, hirsutism and acne. PCOS also increases the risk of cardiovascular disease and potential cardiovascular death, as well as the risk of other endocrine abnormalities such as impaired fasting glucose and impaired glucose tolerance3, and some patients develop mental health problems of varying severity, including anxiety disorders and depression4. Additionally, patients with PCOS have a much greater risk of developing malignant tumours of the female reproductive system than healthy women of the same age5. However, the diagnosis of PCOS still faces many difficulties, such as the lack of highly specific markers, reliance on a single diagnostic approach, and a complicated diagnostic process; providing more methods that can assist clinical diagnosis, minimise diagnostic delay and allow early intervention is therefore important for protecting women’s reproductive health globally. Currently, the diagnosis of PCOS is still based on the Rotterdam Diagnostic Criteria proposed in 20036. The 2023 International Evidence-based Guideline for the Assessment and Management of Polycystic Ovary Syndrome, developed under more stringent rules, further standardises the diagnostic reasoning and process for PCOS; it addresses past inconsistencies in criteria, the lack of an evidence-based process, and inconsistencies in assessment and management, while providing more evidence-based recommendations and clinical consensus7. At the same time, the guideline notes that specific markers for the diagnosis of PCOS are lacking and that reliance on a single diagnostic indicator can easily lead to misdiagnosis and underdiagnosis, so exploring a multi-indicator combined diagnostic strategy is highly important for early diagnosis and intervention.
Anti-Müllerian hormone (AMH) is a homodimeric glycoprotein belonging to the transforming growth factor-β (TGFβ) superfamily. It is secreted by the ovary but is not regulated by the hypothalamic‒pituitary‒ovarian axis, and its level is relatively stable during the menstrual cycle8. The 2023 International Evidence-based Guideline for the Assessment and Management of Polycystic Ovary Syndrome recommends that AMH can be used as a diagnostic marker for PCOS but also points out that it cannot be used as a stand-alone diagnostic marker; in other words, the clinical application of AMH should be further expanded by exploring its combination with other indices to improve diagnostic efficiency7. The levels of sex hormones, especially testosterone, are important in assessing a patient’s condition and formulating treatment plans. However, although hyperandrogenaemia is an important metabolic feature of PCOS and plays an important role in its diagnosis, it lacks specificity, so additional tests are still needed to exclude other diseases that may cause hyperandrogenaemia, which undoubtedly delays the diagnosis of PCOS. Steroid hormones regulate the growth and development of organisms through a variety of signalling pathways, and some studies have shown that steroid hormone levels differ between patients with PCOS and the normal population; however, no study has confirmed the clinical significance of these steroid hormones in the diagnosis of PCOS.
Overall, the current methods for the diagnosis of PCOS are still limited by the use of single methods, the lack of specific markers, and the ease of missed and delayed diagnosis. Therefore, this study is the first to use AMH, steroid hormones and estrogen to jointly construct a diagnostic model of PCOS, aiming to provide more bases for the diagnosis of PCOS, provide more auxiliary diagnostic strategies for the clinic, increase the value of the application of these markers in the clinic, and provide a new direction for the definitive diagnosis of the disease.
Methods and materials
Participants
The study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University (Ethics Approval No. 2023–KLS–244–01). The participants included 84 adult PCOS patients and 75 healthy controls. Data from 10 PCOS patients and 10 healthy controls were randomly selected as the model validation set, and data from the remaining 74 PCOS patients and 65 healthy controls were used for modelling. Patients with PCOS were recruited from those diagnosed at our gynecology and endocrinology clinics between May 2023 and November 2023. All healthy controls had normal menstrual cycles (25–35 days), no clinical or biochemical manifestations of hyperandrogenaemia, and no history of drug use in the last three months. All patients with PCOS were required to fulfil the following inclusion and exclusion criteria:
Inclusion criteria
(1) PCOS in adults was diagnosed according to the 2003 Rotterdam criteria (with T > 1.97 nmol/L considered hyperandrogenemia); (2) there was no history of hormonal drug use in the 3 months prior to enrolment; (3) all participants were between 20 and 35 years of age; and (4) there was no history of occupational exposure to environmental hazards such as ionising radiation, toxic and hazardous gases, and organic chemicals.
Exclusion criteria
(1) Endocrine metabolic diseases affecting female ovulation such as hyperprolactinaemia, Cushing’s syndrome, and thyroid dysfunction; (2) adrenal tumours, adrenocortical hyperplasia, and other diseases causing hyperandrogenism; (3) severe hepatic, renal, pulmonary and haematopoietic insufficiency and cardiac diseases, autoimmune diseases, etc.; and (4) pregnancy or breastfeeding.
Sample and data collection
All serum was obtained by centrifugation of blood collected in vacuum tubes containing separating gel. Blood was drawn between 8:00 and 10:00 AM on the day subjects attended the hospital or health check-up center, or 3–5 days after the onset of menstruation in subjects with a normal menstrual cycle. All samples were dispensed into enzyme-free EP tubes, with no less than 0.5 mL of serum per sample, and stored in cryopreservation tubes at −80 °C. The samples were kept at room temperature for no more than 2 h before cryopreservation. Basic clinical data and estrogen indicators of all the subjects, including age, height, weight, follicle-stimulating hormone (FSH), luteinising hormone (LH), prolactin (PRL), estradiol (E2), progesterone (P), and testosterone (T), were collected. The collected samples were tested for AMH and steroid hormones, which included androstenedione (AD), dehydroepiandrosterone (DHEA), dihydrotestosterone (DHT), dehydroepiandrosterone sulfate (DHEAS), 17–hydroxyprogesterone (17–OHP), estrone (E1), pregnenolone (P5), cortisol (Cort), corticosterone (COR), 11–dehydrocorticosterone (11–DHC) and deoxycorticosterone (DOC).
Data analysis
AMH was detected by electrochemiluminescence (Roche automated electrochemiluminescence immunoassay system, Cobas 6000, Roche Diagnostics GmbH). Steroid hormones were detected via liquid chromatography–tandem mass spectrometry (Agilent 1290-AB Sciex 5500 ultra-high-performance liquid chromatography–tandem mass spectrometer, Agilent-AB Sciex).
All the clinical data were analysed and processed via SPSS version 25.0 software. All the data were tested for normality. The independent samples t-test was used for the basic clinical data, estrogen data and steroid hormone data that conformed to a normal distribution, and the results were expressed as mean ± standard deviation. The nonparametric rank sum test was used for the data that did not conform to a normal distribution, and the results were expressed as M (P25, P75), where M is the median and P25 and P75 denote the 25th and 75th percentiles of the data. p < 0.05 was considered statistically significant.
Machine learning models were built on the Deepwise & Beckman Coulter DxAI platform. The platform can autonomously select the machine learning models to be analysed, split the data into training and test sets, and automatically report model performance. In this study, five machine learning algorithms were chosen for model construction, namely XGBoost, random forest, support vector machine, logistic classification and Gaussian naive Bayes, because these algorithms are relatively mature and well suited to binary outcomes. During model construction, we found that all five models were stable, so fluctuations in the results due to algorithmic instability could be avoided. The XGBoost model was constructed in Python with xgboost 2.0.1, and the other models with scikit-learn 1.1.3. SHAP plots were drawn with shap 0.43.0. The ratio of the training set to the test set was 7:3, and tenfold cross-validation was used to improve the accuracy of the algorithm. Missing values were imputed using the random forest algorithm.
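The workflow described above (a 7:3 split, tenfold cross-validation on the training portion, and comparison of classifiers by AUC) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the DxAI platform's implementation: the data are synthetic, the hyperparameters are library defaults, the random seeds and the helper `positive_scores` are arbitrary choices made for the example, and XGBoost is omitted so that the sketch depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 139 modelling samples with 8 predictors (not study data).
X, y = make_classification(n_samples=139, n_features=8, n_informative=5,
                           random_state=0)

# 7:3 split into training and test sets, stratified by outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Default hyperparameters; scaling is added for the SVM and logistic models.
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "logistic classification": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
    "Gaussian naive Bayes": GaussianNB(),
}


def positive_scores(model, features):
    """Return a continuous score for the positive class (hypothetical helper)."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(features)[:, 1]
    return model.decision_function(features)


cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Mean AUC over tenfold cross-validation on the training portion.
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv,
                             scoring="roc_auc").mean()
    # Refit on the full training portion and score the held-out test set.
    model.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, positive_scores(model, X_test))
    results[name] = {"cv_auc": cv_auc, "test_auc": test_auc}
    print(f"{name}: CV AUC = {cv_auc:.3f}, test AUC = {test_auc:.3f}")
```

Stratifying both the split and the folds keeps the case-to-control ratio similar in every subset, which matters when each fold holds only about ten samples.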
Results
Serum samples were collected and analysed from 139 enrolled subjects used for modelling, including 74 in the adult PCOS patient group and 65 in the healthy control group. There was no statistically significant difference in age or BMI between the PCOS patient group and the healthy control group (p > 0.05). In terms of the estrogen data, the FSH and P levels were not significantly different (p > 0.05); the LH, LH/FSH (median greater than 2.0), E2, PRL and T levels were significantly different (p < 0.05), with the LH, LH/FSH and T levels being significantly greater in the PCOS group than in the control group and the E2 and PRL levels being significantly lower than those in the control group (Table 1).
Table 1.
General information and estrogen data of the participants.
| Variables | Control group (n = 65), mean ± SD or M (P25, P75) | PCOS group (n = 74), mean ± SD or M (P25, P75) | P-value |
|---|---|---|---|
| Age (years) | 30.05 ± 3.83 | 29.00 ± 3.75 | 0.082 |
| BMI (kg/m²) | 22.32 (20.10,25.20) | 22.95 (20.5,26.2) | 0.277 |
| FSH (IU/L) | 5.21 (3.76,6.46) | 5.64 (4.81,6.69) | 0.099 |
| LH (IU/L) | 4.84 (3.23,7.55) | 10.45 (7.22,14.94) | < 0.01 |
| LH/FSH | 1.05 (0.66,1.63) | 2.02 (1.35,2.81) | < 0.01 |
| E2 (pmol/L) | 268.87 (177.36,406.69) | 196.16 (162.63,239.83) | < 0.01 |
| PRL (mIU/L) | 311.73 (229.00,422.94) | 264.09 (223.82,343.91) | 0.018 |
| P (nmol/L) | 0.54 (0.36,1.17) | 0.46 (0.37,0.68) | 0.181 |
| T (nmol/L) | 1.24 (1.05,1.51) | 2.44 (1.94,2.86) | < 0.01 |
Comparison of general information and estrogen data of the participants between the PCOS (n = 74) and control groups (n = 65), where p < 0.05 was considered significant.
The results of the comparative analysis of the serum levels of AMH and steroid hormones in the control and PCOS groups suggested that there was a statistically significant difference in AMH, AD, DHEAS, Cort, COR, 11-DHC and DOC between the two groups (p < 0.05), whereas there was no statistically significant difference between the two groups (p > 0.05) in DHEA, DHT, 17-OHP, E1 and P5. Among the significantly different hormones, the levels of AMH, AD and DHEAS were significantly greater in the case group than in the control group, whereas the levels of Cort, COR, 11-DHC and DOC were significantly lower in the case group than in the control group (Table 2).
Table 2.
Comparison of AMH and steroid hormone data among the participants.
| Variables | Control group (n = 65), mean ± SD or M (P25, P75) | PCOS group (n = 74), mean ± SD or M (P25, P75) | P-value |
|---|---|---|---|
| AMH (ng/mL) | 3.26 (1.79, 4.94) | 9.93 (6.79, 12.96) | < 0.01 |
| AD (ng/mL) | 1.01 (0.75, 1.40) | 1.37 (1.12, 1.84) | < 0.01 |
| DHEA (ng/mL) | 3.59 (2.56, 5.38) | 4.01 (2.78, 5.00) | 0.796 |
| DHT (ng/mL) | 87.40 ± 32.10 | 82.32 ± 19.24 | 0.222 |
| DHEAS (ng/mL) | 1262.84 ± 608.37 | 1557.60 ± 693.44 | < 0.01 |
| 17-OHP (ng/mL) | 0.41 (0.28, 0.67) | 0.42 (0.33, 0.58) | 0.661 |
| E1 (pg/mL) | 58.07 (43.50, 80.46) | 58.00 (48.87, 73.12) | 0.934 |
| P5 (ng/mL) | 0.69 ± 0.34 | 0.59 ± 0.37 | 0.079 |
| Cort (µg/dL) | 7.26 (5.35, 9.79) | 6.00 (4.48, 7.76) | 0.013 |
| COR (ng/mL) | 1.65 (1.03, 3.07) | 1.16 (0.66, 1.95) | < 0.01 |
| 11-DHC (ng/mL) | 0.25 (0.18, 0.37) | 0.20 (0.14, 0.31) | 0.017 |
| DOC (pg/mL) | 16.44 (11.78, 25.46) | 12.30 (8.17, 19.59) | < 0.01 |
Comparison of AMH and steroid hormone data of the participants between the PCOS (n = 74) and control groups (n = 65), where p < 0.05 was considered significant.
To preliminarily investigate the correlation between AMH and estrogen levels in the enrolled population, we performed Spearman correlation analysis of the AMH and estrogen data, with p < 0.05 considered to indicate a correlation between two variables (Table 3). To visualise the results more clearly, we also drew a correlation heatmap in which the shade of the color represents the strength of the correlation (Fig. 1). AMH was correlated with FSH, LH, E2 and T (p < 0.05): the correlation was positive with FSH, LH and T and negative with E2, and the absolute values of the correlation coefficients of AMH with LH and T were the largest, indicating a stronger correlation.
Table 3.
Spearman’s analysis of AMH and estrogen.
| Variables | FSH | LH | PRL | E2 | P | T |
|---|---|---|---|---|---|---|
| P– value | 0.042 | < 0.01 | 0.077 | 0.027 | 0.235 | < 0.01 |
| Spearman’s coefficient | 0.162 | 0.536 | −0.14 | −0.175 | −0.095 | 0.529 |
The absolute value of the Spearman correlation coefficient indicates the strength of the correlation between two variables: the closer the absolute value is to 1, the stronger the correlation. The sign of the coefficient indicates whether the correlation is positive or negative.
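For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on the ranks of the two variables. The sketch below implements it with the standard library only; the function names and the small data vectors are hypothetical and are not taken from the study data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustrative values (not study data).
amh = [2.1, 3.4, 5.0, 7.8, 9.9, 12.5]
lh = [3.0, 5.5, 4.8, 9.1, 8.7, 15.2]
e2 = [400.0, 310.0, 250.0, 260.0, 190.0, 170.0]

rho_lh = spearman(amh, lh)   # positive: LH tends to rise with AMH
rho_e2 = spearman(amh, e2)   # negative: E2 tends to fall as AMH rises
print(f"AMH vs LH: {rho_lh:.3f}, AMH vs E2: {rho_e2:.3f}")
```

Because only ranks enter the calculation, the coefficient is unaffected by skewed distributions, which suits hormone data that are not normally distributed.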
Fig. 1.

Heatmap of the coefficient of correlation between AMH and estrogen. Note: Both the horizontal and left vertical coordinates indicate the individual variables included in the model, and the absolute value of the number in the box represents the magnitude of the correlation, with a larger absolute value indicating a stronger correlation. Also, the color of the boxes is used to indicate the correlation (right vertical coordinates). The Beckman Coulter DxAI platform (https://www.xsmartanalysis.com/beckman/login/) was used for generating the heatmap.
Model variable screening
Combining the results of the baseline data analyses of the control and PCOS patient groups, indicators with between-group differences were selected and subjected to univariate logistic regression analyses to screen for variables associated with a positive outcome (having PCOS). 11-DHC was not significantly associated with a positive outcome (p > 0.05) and was excluded, whereas LH, LH/FSH, E2, PRL, T, AMH, AD, DHEAS, Cort, COR and DOC were significantly associated with a positive outcome (p < 0.05). LH, LH/FSH, T, AMH, AD and DHEAS were positively associated with the probability of a positive outcome and were independent risk factors for the development of disease; E2, PRL, Cort, COR and DOC were negatively associated with the probability of a positive outcome (Table 4).
Table 4.
Results of single-factor logistic regression analysis of risk factors associated with PCOS.
| Variables | B | SE | Wald | OR | 95% CI | P- value |
|---|---|---|---|---|---|---|
| LH | 0.110 | 0.032 | 12.092 | 1.116 | 1.049–1.188 | < 0.01 |
| LH/FSH | 0.857 | 0.196 | 13.821 | 2.355 | 1.604–3.460 | < 0.01 |
| E2 | −0.003 | 0.001 | 8.245 | 0.997 | 0.996–0.999 | < 0.01 |
| PRL | −0.003 | 0.001 | 5.695 | 0.997 | 0.994–0.999 | 0.02 |
| T | 3.253 | 0.517 | 39.663 | 25.872 | 9.400–71.206 | < 0.01 |
| AMH | 0.654 | 0.102 | 41.142 | 1.923 | 1.575–2.349 | < 0.01 |
| AD | 1.596 | 0.375 | 18.133 | 4.931 | 2.366–10.278 | < 0.01 |
| DHEAS | 0.001 | 0.026 | 7.326 | 1.001 | 1.000–1.001 | < 0.01 |
| Cort | −0.124 | 0.048 | 6.750 | 0.883 | 0.804–0.970 | < 0.01 |
| COR | −0.303 | 0.108 | 7.943 | 0.738 | 0.598–0.912 | < 0.01 |
| 11-DHC | −1.955 | 0.999 | 3.833 | 0.142 | 0.020–1.002 | 0.052 |
| DOC | −0.046 | 0.017 | 7.370 | 0.955 | 0.924–0.987 | < 0.01 |
SE, Standard Error; B, regression coefficient. B and OR were used to distinguish whether the relationship between the variable and the outcome (having PCOS) was positive or negative. If B < 0, and OR < 1, it was considered a negative correlation, and vice versa for a positive correlation.
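The columns of Table 4 are linked by simple formulas, which can be checked with a short calculation. The sketch below assumes the conventional Wald formulas (OR = exp(B), 95% CI = exp(B ± 1.96·SE), Wald = (B/SE)²); the source does not state them explicitly, and the function name is illustrative. Because the table reports B and SE to three decimals, recomputed values agree with the table only up to rounding.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Return (OR, CI lower, CI upper, Wald chi-square) for one coefficient.

    Assumes a Wald-type interval: exp(b - z*se) to exp(b + z*se).
    """
    odds_ratio = math.exp(b)
    ci_lower = math.exp(b - z * se)
    ci_upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return odds_ratio, ci_lower, ci_upper, wald


# Rounded B and SE for AMH as reported in Table 4.
or_amh, lower_amh, upper_amh, wald_amh = odds_ratio_summary(0.654, 0.102)
print(f"OR = {or_amh:.3f}, 95% CI {lower_amh:.3f}-{upper_amh:.3f}, "
      f"Wald = {wald_amh:.2f}")
```

With these inputs the odds ratio and interval reproduce the AMH row of Table 4 to three decimals, and the Wald statistic differs only slightly from the reported 41.142 because B and SE are rounded.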
Because this is a small-sample study with many candidate variables, the variables are prone to multicollinearity. Therefore, on the basis of the univariate logistic regression results above, LASSO regression analysis with tenfold cross-validation was performed to reduce multicollinearity, and the coefficient plot and cross-validation plot of the LASSO regression were drawn (Fig. 2A, B). Finally, eight variables were selected for the subsequent model construction, namely LH, LH/FSH, E2, PRL, T, AMH, AD and COR. The diagnostic performance of each of these eight variables was then analysed (Fig. 3 and Table 5). AMH had the best combined diagnostic performance (AUC = 0.925) and the highest sensitivity (0.905); PRL had the highest specificity (0.917), but its sensitivity was extremely low (0.293).
Fig. 2.
Coefficient plots (A) and cross-validation plots (B) for LASSO regression analyses of variables. Note: The coefficient plot (A) shows the relationship between the coefficient values of the variables and the penalty coefficient: the horizontal coordinate is Log Lambda, with larger values corresponding to a larger penalty coefficient, and the vertical coordinate is the coefficient value of the variable. The cross-validation plot (B) is used to select the Lambda value that minimises the cross-validation error; the dotted line on the left side of the plot corresponds to the value 8, indicating that 8 variables with non-zero coefficients were eventually retained.
Fig. 3.

ROC curve analysis for all variables in the model. Note: We plotted the ROC curves for all the variables included in the model and labeled the colors corresponding to the different variables on the right side, as well as the AUC for each variable, with AUC closer to 1 representing better performance of the variable.
Table 5.
Diagnostic performance analysis of all variables in the model.
| Variables | AUC | Sensitivity | Specificity | Youden index | Cut-off |
|---|---|---|---|---|---|
| LH | 0.766 | 0.798 | 0.72 | 0.518 | 7.03 |
| LH/FSH | 0.758 | 0.655 | 0.813 | 0.468 | 1.758 |
| E2 | 0.637 | 0.533 | 0.798 | 0.331 | 260.32 |
| PRL | 0.609 | 0.293 | 0.917 | 0.21 | 416.62 |
| T | 0.867 | 0.667 | 0.86 | 0.667 | 1.94 |
| AD | 0.72 | 0.774 | 0.627 | 0.4 | 1.117 |
| COR | 0.65 | 0.653 | 0.607 | 0.26 | 1.371 |
| AMH | 0.925 | 0.905 | 0.787 | 0.691 | 5.12 |
Eight variables were finally screened for inclusion in the model construction. AUC was used to assess the diagnostic ability of the variables, and the Youden index was used to screen the cut-off of the variables on the ROC, with the cut-off indicating the threshold for diagnosing PCOS.
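The Youden index used for the cut-offs in Table 5 is J = sensitivity + specificity − 1, and the cut-off is the threshold at which J is largest. A minimal sketch of this selection follows; the function names and the toy marker values are hypothetical, and the sketch assumes that higher values indicate disease (for markers that are lower in PCOS, such as E2 or PRL, the comparison would be reversed).

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(values, labels):
    """Scan observed values as thresholds and return the one maximising J.

    A sample is called positive when its value is >= the cut-off.
    Returns (cutoff, sensitivity, specificity, J).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        sens, spec = tp / positives, tn / negatives
        j = youden_index(sens, spec)
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Check against the LH row of Table 5 (sensitivity 0.798, specificity 0.72).
j_lh = youden_index(0.798, 0.72)

# Hypothetical marker values and outcomes (1 = case, 0 = control).
marker = [1.2, 2.5, 3.1, 4.0, 5.3, 6.8, 7.4, 9.9]
outcome = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, sens, spec, j = best_cutoff(marker, outcome)
print(f"LH Youden index = {j_lh:.3f}; toy cut-off = {cutoff}, J = {j:.2f}")
```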
Establishing diagnostic models
Eight variables, LH, LH/FSH, E2, PRL, T, AMH, AD and COR, were included together in the construction of the machine learning models. Five commonly used classifiers were selected, namely XGBoost, random forest, support vector machine, logistic classification and Gaussian naive Bayes, because these algorithms are well suited to dichotomous outcomes (whether or not a subject is diagnosed with PCOS) and because, during model construction, their results remained stable even when the training and test sets were changed. Tenfold cross-validation was used, the data were randomly split into a training set and a test set at a ratio of 7:3, and the performance of the models on the test set was compared (Table 6, Fig. 4). The logistic classification model had the largest AUC (0.934), the highest specificity (0.947) and the highest F1 score (0.864), whereas the XGBoost model had the highest accuracy (0.905) and sensitivity (0.870). To explore the importance of each variable in the different models, we conducted a variable weighting analysis, calculated the contribution of every included variable to the prediction of PCOS, and ranked the variables by importance. AMH and T were the most important variables in all four models other than the support vector machine, in which the most important variables were PRL and E2 (Figs. 5 and 6).
Table 6.
Comparison of five machine learning test set model performance evaluation metrics.
| Models | AUC | 95% CI | Cut-off | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.866 | 0.736–0.997 | 0.757 | 0.905 | 0.870 | 0.907 | 0.829 |
| Random forest | 0.867 | 0.785–0.984 | 0.700 | 0.738 | 0.826 | 0.789 | 0.813 |
| Support vector machine | 0.740 | 0.577–0.874 | 0.863 | 0.667 | 0.609 | 0.842 | 0.642 |
| Logistic classification | 0.934 | 0.852–0.999 | 0.576 | 0.873 | 0.826 | 0.947 | 0.864 |
| Gaussian naive Bayes | 0.922 | 0.839–0.996 | 0.863 | 0.857 | 0.739 | 0.912 | 0.814 |
Accuracy is the proportion of all subjects that the model classified correctly. Sensitivity is the proportion of diseased persons that the model successfully diagnosed (the true-positive rate). Specificity is the true-negative rate of the model. The F1 score is the harmonic mean of precision and recall (sensitivity).
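The metrics reported in Table 6 all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and chosen only for easy arithmetic, not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # all correct / all subjects
    sensitivity = tp / (tp + fn)                 # true-positive rate (recall)
    specificity = tn / (tn + fp)                 # true-negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 50 cases (40 detected) and 50 controls (45 cleared).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy and sensitivity coincide only when there are no controls in the sample, so the two should be read as separate quantities.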
Fig. 4.
ROC curve analysis of machine learning models. Note: The ROC curves of five machine learning models for diagnosing PCOS were plotted, using AUC to indicate the performance of the models. (A) XGBoost (AUC = 0.866); (B) random forest (AUC = 0.867); (C) support vector machine (AUC = 0.740); (D) logistic classification (AUC = 0.934); (E) Gaussian naive Bayes (AUC = 0.922).
Fig. 5.
Importance ranking of variables in all machine learning models. Note: The vertical coordinates represent the different variables and the horizontal coordinates represent the average impact on model output magnitude; the longer the bar corresponding to a variable, the greater its effect on the diagnostic power of the model and the greater its importance in the model. (A) XGBoost; (B) random forest; (C) support vector machine; (D) logistic classification; (E) Gaussian naive Bayes.
Fig. 6.
Contribution of variables in all machine learning models. Note: The vertical coordinate is the variable, the horizontal coordinate represents the SHAP value, each point is a sample, and the colour represents whether the feature value is high or low: red indicates high values and blue indicates low values. (A) XGBoost; (B) random forest; (C) support vector machine; (D) logistic classification; (E) Gaussian naive Bayes.
Logistic classification model performance evaluation and validation
Performance evaluation of the logistic classification model
Combining the above findings, we tentatively concluded that the logistic classification model performed best among the five machine learning models constructed using the data from this study. To analyse the performance of the model in more depth, we evaluated it from the following perspectives. First, even though AMH had the best combined diagnostic power and the highest sensitivity among the individual variables, its AUC (0.925) was still lower than that of the logistic classification model (0.934). Although the specificity of PRL was high, its sensitivity was too low, suggesting that its ability to diagnose PCOS on its own is limited. We then analysed the learning curve of the logistic classification model (Fig. 7), which shows how the model’s performance changes as the amount of training data increases and thus whether the variance or bias of the model is too high, i.e., whether the model fits well and remains stable when the training set changes. The results show that the training and validation sets consistently fit well as the composition of the training set was varied, suggesting that the model is well fitted and sufficiently stable. Finally, we plotted the K‒S curve of the logistic classification model to clarify its ability to discriminate between negative and positive events, that is, between the non-PCOS and PCOS populations (Fig. 8). In the figure, the blue curve represents the probability of correctly judged positive events (TPR), the green curve represents the probability of incorrectly judged positive events (FPR), and the dotted line marks the position where the gap between the two solid lines is largest. The difference between the TPR and FPR at this position is the K‒S value.
The magnitude of the K‒S value reflects the ability of the model to distinguish between negative and positive events: the larger the K‒S value, the stronger this ability. The figure shows that the K‒S value of the model constructed in this study is 0.773, which suggests that the model has excellent discriminative ability.
Fig. 7.

Learning curve for the Logistic model. Note: The horizontal coordinate represents the number of samples in the training set, the vertical coordinate represents the size of the AUC, the red dashed line represents the training set, and the blue dashed line represents the internal validation set.
Fig. 8.

K‒S curve results for the Logistic model. Note: The horizontal coordinate represents the threshold, the vertical coordinate is the value of the TPR or FPR, and the dashed line marks the threshold at which TPR − FPR is at its maximum. The blue solid line (Class 0) represents the TPR, and the green solid line (Class 1) represents the FPR.
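The K‒S value described above is the largest vertical gap between the TPR and FPR curves across thresholds. A minimal sketch of that calculation follows; the function names and the predicted probabilities are hypothetical, and a sample is assumed to be called positive when its predicted probability is at or above the threshold.

```python
def ks_curve(probabilities, labels):
    """Return (threshold, TPR, FPR) for every distinct predicted probability."""
    positives = sum(labels)
    negatives = len(labels) - positives
    curve = []
    for t in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probabilities, labels) if p >= t and y == 0)
        curve.append((t, tp / positives, fp / negatives))
    return curve


def ks_value(curve):
    """Return (K-S value, threshold) at the largest gap between TPR and FPR."""
    threshold, tpr, fpr = max(curve, key=lambda point: point[1] - point[2])
    return tpr - fpr, threshold


# Hypothetical predicted probabilities and true outcomes (1 = PCOS, 0 = control).
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
truth = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

ks, ks_threshold = ks_value(ks_curve(probs, truth))
print(f"K-S = {ks:.2f} at threshold {ks_threshold}")
```

Since TPR − FPR equals sensitivity + specificity − 1, the K‒S value is numerically the maximum Youden index of the model's predicted probabilities.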
Performance Validation of the logistic classification model
Owing to the lack of an external validation dataset, we assessed the applicability of the constructed model using 20 cases held out from the modelling dataset as a validation set, comprising 10 cases from the PCOS group and 10 from the healthy control group. None of these 20 cases was used in constructing the model; they were used only to validate its performance (Fig. 9). On the validation set, the model achieved an AUC of 0.86, a sensitivity of 0.80 and a specificity of 0.90, suggesting that the logistic classification model constructed in this study performs well on held-out data and has good applicability.
Fig. 9.

Validation set data in the logistic classification model.
Discussion
This study found that LH and T showed a strong correlation with AMH. We believe this may explain why patients with PCOS primarily exhibit symptoms such as hyperandrogenemia (acne, hirsutism, etc.) and menstrual irregularities. We constructed five machine learning models, all of which demonstrated good diagnostic capabilities; however, the support vector machine exhibited the weakest performance among them. Upon comprehensive evaluation, we determined that the Logistic classification model had the best overall performance. Additionally, we validated the Logistic classification model using data from a validation set and found that its performance remained robust. This indicates that not only did the Logistic classification model demonstrate superior overall efficacy in our study, but it also possesses good applicability. However, this does not imply that it is necessarily the optimal model for all populations.
Metabolic characteristics of estrogen and its pathological mechanisms in patients with PCOS
Existing studies have demonstrated the abnormal expression of various types of oestrogens at circulating levels in patients with PCOS, but analysing these metabolic profiles in isolation lacks specificity; therefore, analysing the expression profiles of these oestrogens and using these metabolically abnormal oestrogens to formulate strategies that can be used to aid in the diagnosis of PCOS is expected to improve the efficiency of the diagnosis of PCOS.
Analysis of the intergroup differences in basic clinical data and sex hormone levels showed that, compared with healthy controls of similar age and BMI, the PCOS case group had significantly higher levels of LH, LH/FSH, and T and significantly lower levels of E2 and PRL (p < 0.05). Dysregulation of the ratio of LH to FSH is a common endocrine disorder in patients with PCOS and is mainly characterised by an increase in the LH/FSH ratio9. Studies suggest that dysfunction of the hypothalamic‒pituitary‒ovarian axis in patients with PCOS may contribute to this presentation10. Luteinising hormone receptor (LHR) and follicle‒stimulating hormone receptor (FSHR) are important mediators of the physiological roles of LH and FSH, and LHR and FSHR expression are increased in patients with PCOS11. At the same time, the interaction between high levels of LHR and LH promotes phosphorylation of the PI3K signalling pathway and induces the proliferation of follicular membrane cells, which increases androgen levels and ultimately leads to a significant increase in T levels12. In addition to promoting lactation and breast development, PRL has a number of metabolic regulatory effects that play important roles in maintaining endocrine stability in the body. Some studies have shown that low premenopausal levels of PRL are associated with the development of insulin resistance, which may cause a number of metabolic syndromes and lead to a series of adverse consequences13. A retrospective study analysing thousands of PCOS patients and control individuals revealed that PCOS patients had significantly lower serum levels of PRL, similar to our findings, and that low serum levels of PRL were highly correlated with infertility in PCOS patients14.
The regulation of female physiology by E2 is not limited to the maintenance of secondary sex characteristics; equally importantly, E2 is involved in the physiological and pathological processes of the cardiovascular, skeletal, endocrine and other systems, and it is therefore also associated with many endocrine metabolic diseases. The expression of estrogen receptors in the endometria of PCOS patients is inhibited, which not only affects the physiological role of E2 but also lowers the level of E2 in the circulation15. As mentioned above, E2 also plays a crucial role in regulating the cardiovascular system, and the high risk of cardiovascular events in patients with PCOS may be due not only to disturbances in lipid metabolism but also to abnormal levels of E2.
Machine learning models in this study
The logistic classification model had the best performance among the five machine learning models in this study. Further analysis revealed that, even though AMH had the greatest diagnostic potential among all the variables, the diagnostic potential of the logistic classification model was still greater than that of any single variable included in the model. This suggests not only the importance of AMH in assisting in the diagnosis of PCOS, but also that a model combining AMH with steroid hormones and sex hormones can effectively improve the diagnostic efficiency of PCOS, which provides a reference for the joint use of AMH in diagnosis. AMH and T made the most important contributions in all models other than the support vector machine, which is consistent with ovulatory disorders and hyperandrogenaemia being the two most important pathological features of PCOS. The weakest diagnostic efficacy of the support vector machine may be because the support vector machine, a high-precision algorithm with one of the longest histories of development, is highly sensitive to its parameters, so that the covariance between variables has a greater influence when the sample size is small16. In addition, to make the classification results of a support vector machine more desirable and reliable, the parameters of the model need to be continuously adjusted, and the lack of selection of regularisation parameters in this study may also be a reason for its poor classification performance. In our study, the Logistic Classification model demonstrated superior performance compared to models such as XGBoost and Random Forests.
This finding may come as a surprise, given that XGBoost and Random Forests are often regarded as more powerful machine learning algorithms. However, this perception is not absolute. The outcome suggests that the variables examined in this study relate approximately linearly to the log-odds of PCOS, and with a limited number of features available, Logistic Classification appears to have captured these relationships effectively. When significant non-linear patterns are absent from a dataset, the advantages offered by more complex models may not be realized.
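To make the notion of linearity concrete, a logistic classification model built on the eight selected variables can be written as a linear function on the log-odds scale. The expression below is a generic sketch of that form; the coefficients \(\beta_0, \dots, \beta_8\) are notation introduced here for illustration and are not values reported in this study.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0
  + \beta_1\,\mathrm{LH}
  + \beta_2\,\frac{\mathrm{LH}}{\mathrm{FSH}}
  + \beta_3\,\mathrm{E2}
  + \beta_4\,\mathrm{PRL}
  + \beta_5\,\mathrm{T}
  + \beta_6\,\mathrm{AMH}
  + \beta_7\,\mathrm{AD}
  + \beta_8\,\mathrm{COR},
\qquad
p = P(\mathrm{PCOS} \mid \text{variables})
\]
```

Each variable shifts the log-odds by a fixed amount per unit change, with no interaction or threshold terms. Tree-based models such as XGBoost and Random Forests can represent such terms, but they gain little when the data do not contain them.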
In PCOS, there is a close relationship between AMH levels and estrogen expression
The main physiological role of AMH is to regulate follicle production in the ovary, which affects the number and volume of follicles. AMH is an important marker for evaluating follicular growth and development; it is therefore regarded as a marker of ovarian reserve function17 and is also used to assess the efficacy of ovulation-promoting treatments18. Serum levels of AMH are virtually undetectable in women at birth, reach a maximum after puberty, then gradually decline with age and become undetectable after menopause19. AMH levels have been found to be higher in patients with PCOS than in the normal population20, and the same phenomenon has been observed in daughters of patients with PCOS21. AMH levels have even been found to be approximately 18-fold higher in anovulatory women with PCOS than in ovulatory women with PCOS22. This feature has also been confirmed in rodent models of PCOS: the high serum LH and AMH levels exhibited by these animals directly affect follicular proliferation and development as well as the ovulatory process, and the phenomenon is also present in their female offspring23, which seems to explain the genetic susceptibility to PCOS. The inhibitory effect of AMH is exerted mainly at two different stages of follicular development. First, AMH inhibits the development of follicles from primary follicles to mature follicles, thereby regulating the number of follicles remaining in the initial follicular pool24; second, once follicular development becomes progressively dependent on FSH, AMH reduces the sensitivity of follicles to FSH, thus playing a role in follicle selection25.
This study comprehensively analysed the correlation between AMH and estrogen in the enrolled population and revealed a positive correlation between AMH and both LH and T; the correlation between AMH and estrogen was stronger than that between AMH and other hormonal indicators, which is related to some pathological mechanisms of PCOS. T can promote follicular growth and development by inducing FSH receptors, thereby directly or indirectly stimulating AMH secretion26. In vivo and in vitro experiments have confirmed that hyperandrogenaemia induces the overexpression of AMH in ovarian granulosa cells (GCs) of PCOS patients27. Likewise, several studies have reported a positive correlation between androgens and AMH; more specifically, serum AMH concentrations in PCOS patients with hyperandrogenaemia are significantly higher than in PCOS patients with normal androgen levels28. A significant correlation has also been found between LH and AMH. One study reported a positive correlation between LH and AMH in approximately 60% of women with PCOS, possibly because high levels of LH induce high levels of AMH expression in GCs29. LH has also been found to have no significant effect on AMH levels in ovulating normal women, but in patients with PCOS, LH can upregulate the expression of mRNAs and related proteins of AMH, thus maintaining high AMH levels in patients with PCOS30. Moreover, the present study found only a weak correlation between AMH and E2 levels in PCOS patients (coefficient of -0.175). There is currently insufficient research evidence to confirm what kind of correlation exists between AMH and E2 levels in PCOS patients, and AMH mRNA concentrations in the GCs of PCOS patients have been reported not to be affected by E2 levels31.
Interestingly, however, the expression of the estradiol receptor (ESR) in the GCs of patients with PCOS seems to influence the concentration of AMH. Pierre et al.31 reported a positive correlation between the ESR1/ESR2 ratio and the concentration of AMH in the GCs of PCOS patients, and they speculated that an increase in the ESR1/ESR2 ratio may weaken the ability of estrogen to inhibit the action of AMH, thus resulting in high AMH expression. Overall, the results of the present study tentatively suggest that there is a strong positive correlation between AMH levels and the levels of LH and T in patients with PCOS, which may be due to the interaction between circulating levels of AMH and the levels of various estrogens in patients with PCOS.
Steroids are valuable in assisting in the diagnosis of PCOS
Steroid hormones, which include mainly adrenal hormones and sex hormones, constitute a class of tetracyclic aliphatic hydrocarbon compounds that are secreted by mammalian adrenal glands and, in females, also by the ovaries. These compounds play important roles in immune homeostasis, the regulation of sexual function, and the promotion of growth and development. The fact that patients with PCOS often present with clinical or biochemical manifestations of hyperandrogenaemia has led to increasing attention in recent years to the expression of steroid hormones in the circulation of PCOS patients. A case‒control study analysed the levels of steroid hormones in the circulation of normal women of childbearing age and women with PCOS and reported that the levels of steroid hormones in patients with PCOS were significantly greater, and that this difference was even more pronounced in patients with PCOS who had combined hyperandrogenaemia, with 17-OHP, AD, DHT and E1 being the main hormones at high circulating levels32. There is still no direct evidence to prove the specific reason for the abnormal expression of steroid hormones in PCOS patients, but some indirect findings can explain this phenomenon. Some scholars have reported that the metabolites of steroid hormones in the urine of PCOS patients are significantly greater than those in the normal population, but there is no significant change in the levels of adrenocorticotropic hormone; thus, it has been hypothesised that PCOS patients are abnormally sensitive to adrenocorticotropic hormone33. In addition, dysregulation of steroid hormone synthesis and metabolism due to abnormalities in the glucose-insulin axis in insulin-resistant PCOS patients may likewise contribute to this endocrine metabolic abnormality34.
Since the levels of steroid hormones are strongly influenced by age, the ages of the PCOS case group and the healthy control group were kept similar at the time of inclusion of the subjects (30.05 ± 3.83 years in the control group and 29.00 ± 3.75 years in the case group). In the present study, the levels of the steroid hormones AD, DHEAS, Cort, COR, 11-DHC and DOC differed significantly in patients with PCOS, with AD and DHEAS being two highly expressed steroid hormones, which are associated mainly with the clinical features of hyperandrogenaemia occurring in PCOS. Hyperandrogenaemia is one of the most important bases for the diagnosis of PCOS, but the means of biochemical assessment of hyperandrogenaemia are currently limited in clinical practice, not only because most scholars believe that T is limited in its ability to accurately reflect a patient’s hyperandrogenemic state, but also because it is still controversial which androgen indicator or indicators should be used to evaluate the level of androgens35. AD and DHEAS have been found to be helpful in improving the diagnostic efficiency of PCOS, especially in the assessment of hyperandrogenaemia36, and AD has also been found to be highly valuable in predicting miscarriage in patients with PCOS37. AD is a naturally occurring steroid hormone secreted by the adrenal glands and gonads; it is one of the earliest steroid hormones identified as a direct precursor of T, E2 and DHT; and it can also produce various types of derived steroid hormones such as E1, COR and Cort38. A study seeking better markers for the assessment of hyperandrogenaemia simultaneously measured AD, T and sex hormone-binding globulin levels in follicular fluid, using sex hormone-binding globulin to calculate the free testosterone index, and found that AD was more useful in the diagnosis of hyperandrogenaemia than either T or the free testosterone index39.
DHEAS is in fact the sulfated form of DHEA, which has lost its biological activity and is more stable and less susceptible to environmental factors, such as circadian rhythms and patient stress, than DHEA is. Therefore, DHEAS is more commonly used and more accurate than DHEA in assessing adrenal androgen levels, and is also used to assist in the diagnosis of PCOS40. This seems to explain the high expression of DHEAS, i.e., the ovaries of PCOS patients synthesise excessive amounts of DHEA, which is sulfated to increase DHEAS levels significantly. Although a number of studies have confirmed that DHEA expression is significantly higher in PCOS patients than in normal women, this result was not reflected in the present study, and we speculate that this may be because circulating levels of excess DHEA in PCOS patients are sulfated, resulting in a decrease in DHEA levels.
Limitations
Due to the limited sample size of our study, there may be potential biases and issues related to overfitting. To address these concerns, we employed Lasso regression analysis for variable selection, which helps mitigate model complexity and reduces the risk of bias and overfitting by incorporating an L1 regularization penalty. Additionally, we integrated the predictive outcomes from multiple models to further minimize bias and reduce the risk of overfitting based on existing data. Nevertheless, multi-center validation remains essential for ensuring the reliability and generalizability of clinical prediction models. Data derived from a single center may introduce selection bias and exhibit localized characteristics that can restrict the broader applicability of these models. In contrast, multicenter validation facilitates the evaluation of model performance across diverse populations, devices, and operational procedures, thereby confirming their stability and accuracy in various clinical environments. Furthermore, multicenter data more accurately represent the diversity found in real-world scenarios, which enhances both the credibility of the model and its potential for widespread clinical application.
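The way an L1 penalty performs variable selection can be illustrated with the soft-thresholding operator, which is the update used inside coordinate-descent Lasso solvers. In the textbook special case of a design with orthonormal columns and the objective ½‖y − Xβ‖² + λ‖β‖₁, the Lasso coefficient is simply the soft-thresholded least-squares coefficient. The sketch below assumes that special case; the variable names, coefficients and penalty are hypothetical and are not taken from this study.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within [-penalty, penalty] become 0.0."""
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalised coefficients for four standardised variables
# (illustrative numbers only, NOT estimates from this study).
unpenalised = {"x1": 0.90, "x2": -0.45, "x3": 0.10, "x4": -0.05}
penalty = 0.30  # hypothetical L1 penalty (lambda)

shrunk = {name: soft_threshold(beta, penalty) for name, beta in unpenalised.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]

print(shrunk)
print("variables kept:", selected)
```

Coefficients whose magnitude is below the penalty are set exactly to zero, so the corresponding variables drop out of the model, while the remaining coefficients are shrunk toward zero. A larger penalty removes more variables.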
As a single-center study, this study lacked an external validation dataset to further validate the reliability of the model, and the validation set data were derived from the same centre as the modelling data, which limits interpretation of the applicability of our model. Obesity in PCOS patients was not significantly characterized in our study, for which we suggest two possible reasons: 1. In the Chinese population, women commonly control their body weight by eating less, which may have resulted in a lower than normal basal BMI before developing PCOS; 2. The sample in this study was small and might not have covered this group. We were also missing some information related to family history, waist-to-hip ratio, etc., which is valuable for the diagnosis of PCOS. This is because, to exclude bias in the results due to inpatient treatment, we selected participants from patients who were seen on an out-patient basis and were not on medication, and outpatient records are usually not detailed enough.
We believe that insulin resistance and lipid profiles are crucial for the diagnosis of PCOS. However, these factors were not included in our study due to the absence of suitable indicators for assessing insulin resistance, as outlined in The 2023 International Evidence-based Guideline for the Assessment and Management of Polycystic Ovary Syndrome. Consequently, we consider that metrics such as insulin levels or HOMA-IR index may not be applicable. Furthermore, given the limited sample size of our research, including too many variables could lead to model overfitting. Additionally, we aim to emphasize the significant role of steroid hormones in PCOS diagnosis; thus, introducing excessive complexity might obscure some important variables.
Conclusion
Numerous studies have reported various metabolic features of PCOS and metabolic differences under different phenotypes, and these studies provide important references for exploring the pathological mechanisms of PCOS, understanding patients’ conditions, and assessing patients’ prognosis. In the present study, we found that there was a correlation between AMH levels and FSH, LH, E2, and T in PCOS patients, which may explain some pathological features of PCOS. Machine learning models for the diagnosis of PCOS were constructed using AMH, steroid hormones and estrogen, and the results revealed that the five machine learning classification models constructed on the basis of LH, LH/FSH, E2, PRL, T, AMH, AD, and COR had good diagnostic efficacy, with the Logistic classification model having the best overall performance in this study. Multi-center validation represents a crucial phase in confirming the clinical utility of predictive models. While the model we developed demonstrates commendable diagnostic performance, the limitations of our dataset necessitate further validation through multi-center studies with larger sample sizes. In addition, it is essential to expand the scope of detection and incorporate as many factors as possible that contribute to the diagnosis of PCOS in order to construct a model with greater clinical applicability.
Acknowledgements
We express our deepest gratitude to the participants for their cooperation and all the reviewers who participated in the review.
Abbreviations
- PCOS
Polycystic ovary syndrome
- PCOM
Polycystic ovary morphology
- AMH
Anti-Müllerian hormone
- TGFβ
Transforming growth factor-β
- FSH
Follicle-stimulating hormone
- LH
Luteinizing hormone
- PRL
Prolactin
- E2
Estradiol
- P
Progesterone
- T
Testosterone
- AD
Androstenedione
- DHEA
Dehydroepiandrosterone
- DHT
Dihydrotestosterone
- DHEAS
Dehydroepiandrosterone sulphate
- 17-OHP
17-Hydroxy progesterone
- E1
Estrone
- P5
Pregnenolone
- Cort
Cortisol
- COR
Corticosterone
- 11-DHC
11-Dehydrocorticosterone
- DOC
Deoxycorticosterone
- LHR
Luteinizing hormone receptor
- FSHR
Follicle-stimulating hormone receptor
- GCs
Ovarian granulosa cells
- ESR
Estradiol receptor
- TPR
True positive rate
- FPR
False positive rate
Author contributions
CT and YW came up with the idea and designed the study. CT wrote the initial draft (including substantive translation) and made the tables and the figures. YW contributed to the preparation of the published work. ZZ contributed to data analysis and language revision. YY ensured that the descriptions are accurate and agreed by all authors. All authors contributed to the article and approved the submitted version.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interest
The authors declare no competing interests.
Ethical approval
The study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University (Ethics Approval No. 2023–KLS–244–01). The biological samples used in this study were obtained from the secondary use of samples previously kept by the participants; thus, this study applied to the Ethics Committee for a waiver of written informed consent from the patients. All methods were in accordance with the Declaration of Helsinki.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cheng Tong and Yue Wu have contributed equally to this work and share first authorship.
Contributor Information
Zhenchao Zhuang, Email: zhuangzzc2015@163.com.
Ying Yu, Email: yuying721030@163.com.
References
- 1. Neven, A. C. H., Laven, J., Teede, H. J. & Boyle, J. A. A summary on polycystic ovary syndrome: diagnostic criteria, prevalence, clinical manifestations, and management according to the latest international guidelines. Semin. Reprod. Med. 36(1), 5–12 (2018).
- 2. Mayrhofer, D. et al. The Prevalence and impact of polycystic ovary syndrome in recurrent miscarriage: A retrospective cohort study and meta-analysis. J. Clin. Med. 9(9), 2700 (2020).
- 3. Osibogun, O., Ogunmoroti, O. & Michos, E. D. Polycystic ovary syndrome and cardiometabolic risk: Opportunities for cardiovascular disease prevention. Trends Cardiovasc. Med. 30(7), 399–404 (2020).
- 4. Damone, A. L. et al. Depression, anxiety and perceived stress in women with and without PCOS: A community-based study. Psychol. Med. 49(9), 1510–1520 (2019).
- 5. Tian, W. et al. High level of visfatin and the activation of Akt and ERK1/2 signaling pathways are associated with endometrium malignant transformation in polycystic ovary syndrome. Gynecol. Endocrinol. 36(2), 156–161 (2020).
- 6. Revised 2003 consensus on diagnostic criteria and long-term health risks related to polycystic ovary syndrome. Fertil. Steril. 81(1), 19–25 (2004).
- 7. Teede, H. J. et al. Recommendations from the 2023 international evidence-based guideline for the assessment and management of polycystic ovary syndrome. Eur. J. Endocrinol. 189(2), 43–64 (2023).
- 8. Cedars, M. I. Evaluation of female fertility-AMH and ovarian reserve testing. J. Clin. Endocrinol. Metab. 107(6), 1510–1519 (2022).
- 9. Li, Y., Wei, L. N., Xiong, Y. L. & Liang, X. Y. Effect of luteinizing hormone vs follicular stimulating hormone ratio on anti-Müllerian hormone secretion and folliculogenesis in patients with polycystic ovarian syndrome. Zhonghua Fu Chan Ke Za Zhi. 45(8), 567–570 (2010).
- 10. Li, Y. et al. Multi-system reproductive metabolic disorder: significance for the pathogenesis and therapy of polycystic ovary syndrome (PCOS). Life Sci. 228, 167–175 (2019).
- 11. Valkenburg, O. et al. Genetic polymorphisms of GnRH and gonadotrophic hormone receptors affect the phenotype of polycystic ovary syndrome. Hum. Reprod. 24(8), 2014–2022 (2009).
- 12. Arroyo, A., Kim, B. & Yeh, J. Luteinizing hormone action in human oocyte maturation and quality: Signaling pathways, regulation, and clinical impact. Reprod. Sci. 27(6), 1223–1252 (2020).
- 13. Jiao, P. et al. PRL/microRNA-183/IRS1 pathway regulates milk fat metabolism in cow mammary epithelial cells. Genes (Basel). 11(2), 196 (2020).
- 14. Yang, H. et al. The association between prolactin and metabolic parameters in PCOS women: A retrospective analysis. Front. Endocrinol. (Lausanne). 11, 263 (2020).
- 15. Tang, Z. R., Zhang, R., Lian, Z. X., Deng, S. L. & Yu, K. Estrogen-receptor expression and function in female reproductive disease. Cells. 8(10), 1123 (2019).
- 16. Maltarollo, V. G., Kronenberger, T., Espinoza, G. Z., Oliveira, P. R. & Honorio, K. M. Advances with support vector machines for novel drug discovery. Expert Opin. Drug Discov. 14(1), 23–33 (2019).
- 17. di Clemente, N., Racine, C., Pierre, A. & Taieb, J. Anti-Müllerian hormone in female reproduction. Endocr. Rev. 42(6), 753–782 (2021).
- 18. Dewailly, D. et al. The physiology and clinical utility of anti-Mullerian hormone in women. Hum. Reprod. Update. 20(3), 370–385 (2014).
- 19. Bedenk, J., Vrtačnik-Bokal, E. & Virant-Klun, I. The role of anti-Müllerian hormone (AMH) in ovarian disease and infertility. J. Assist. Reprod. Genet. 37(1), 89–100 (2020).
- 20. Balen, A. H., Laven, J. S., Tan, S. L. & Dewailly, D. Ultrasound assessment of the polycystic ovary: international consensus definitions. Hum. Reprod. Update. 9(6), 505–514 (2003).
- 21. Cesta, C. E. et al. Maternal polycystic ovary syndrome and risk of neuropsychiatric disorders in offspring: prenatal androgen exposure or genetic confounding?. Psychol. Med. 50(4), 616–624 (2020).
- 22. Tal, R. et al. Characterization of women with elevated antimüllerian hormone levels (AMH): correlation of AMH with polycystic ovarian syndrome phenotypes and assisted reproductive technology outcomes. Am. J. Obstet. Gynecol. 211(1), 59.e51–58 (2014).
- 23. Bourgneuf, C. et al. The Goto-Kakizaki rat is a spontaneous prototypical rodent model of polycystic ovary syndrome. Nat. Commun. 12(1), 1064 (2021).
- 24. Dayal, M., Sagar, S., Chaurasia, A. & Singh, U. Anti-mullerian hormone: A new marker of ovarian function. J. Obstet. Gynaecol. India. 64(2), 130–133 (2014).
- 25. Anderson, R. A. et al. Anti-Müllerian hormone as a marker of ovarian reserve and premature ovarian insufficiency in children and women with cancer: a systematic review. Hum. Reprod. Update. 28(3), 417–434 (2022).
- 26. Islam, R. M., Bell, R. J., Skiba, M. A. & Davis, S. R. Testosterone and androstenedione are positively associated with anti-Müllerian hormone in premenopausal women. Clin. Endocrinol. (Oxf). 95(5), 752–759 (2021).
- 27. Peigné, M. et al. The numbers of 2–5 and 6–9 mm ovarian follicles are inversely correlated in both normal women and in polycystic ovary syndrome patients: what is the missing link?. Hum. Reprod. 33(4), 706–714 (2018).
- 28. Dilaver, N. et al. The regulation and signalling of anti-Müllerian hormone in human granulosa cells: Relevance to polycystic ovary syndrome. Hum. Reprod. 34(12), 2467–2479 (2019).
- 29. Sova, H. et al. Hormone profiling, including anti-Müllerian hormone (AMH), for the diagnosis of polycystic ovary syndrome (PCOS) and characterization of PCOS phenotypes. Gynecol. Endocrinol. 35(7), 595–600 (2019).
- 30. Pierre, A. et al. Loss of LH-induced down-regulation of anti-Müllerian hormone receptor expression may contribute to anovulation in women with polycystic ovary syndrome. Hum. Reprod. 28(3), 762–769 (2013).
- 31. Pierre, A. et al. Dysregulation of the Anti-Müllerian hormone system by steroids in women with polycystic ovary syndrome. J. Clin. Endocrinol. Metab. 102(11), 3970–3978 (2017).
- 32. Ge, J. et al. Steroid hormone profiling in hyperandrogenism and non-hyperandrogenism women with polycystic ovary syndrome. Reprod. Sci. 29(12), 3449–3458 (2022).
- 33. Vassiliadi, D. A. et al. Increased 5 alpha-reductase activity and adrenocortical drive in women with polycystic ovary syndrome. J. Clin. Endocrinol. Metab. 94(9), 3558–3566 (2009).
- 34. Goodarzi, M. O., Carmina, E. & Azziz, R. DHEA, DHEAS and PCOS. J. Steroid Biochem. Mol. Biol. 145, 213–225 (2015).
- 35. Grassi, G. et al. Hyperandrogenism by liquid chromatography tandem mass spectrometry in PCOS: Focus on testosterone and androstenedione. J. Clin. Med. 10(1), 119 (2020).
- 36. Yi, W. et al. A model combining testosterone, androstenedione and free testosterone index improved the diagnostic efficiency of polycystic ovary syndrome. Endocr. Pract. 29(8), 629–636 (2023).
- 37. Yang, W. et al. Body mass index and basal androstenedione are independent risk factors for miscarriage in polycystic ovary syndrome. Reprod. Biol. Endocrinol. 16(1), 119 (2018).
- 38. Malaviya, A. & Gomes, J. Androstenedione production by biotransformation of phytosterols. Bioresour. Technol. 99(15), 6725–6737 (2008).
- 39. Barth, J. H., Field, H. P., Yasmin, E. & Balen, A. H. Defining hyperandrogenism in polycystic ovary syndrome: Measurement of testosterone and androstenedione by liquid chromatography-tandem mass spectrometry and analysis by receiver operator characteristic plots. Eur. J. Endocrinol. 162(3), 611–615 (2010).
- 40. Sørensen, A. E., Udesen, P. B., Wissing, M. L., Englund, A. L. M. & Dalgaard, L. T. MicroRNAs related to androgen metabolism and polycystic ovary syndrome. Chem. Biol. Interact. 259(Pt A), 8–16 (2016).