Embryologist experience affects concordance with an artificial intelligence embryo ranking algorithm: benefit of artificial intelligence assistance

Jørgen Berntsen (corresponding author); Philip Marsh; Brendan Burkart; Mark G Larman; Martin N Johansen; Mitchell Rosen

2026 Apr 8;7(3):188–194. doi: 10.1016/j.xfre.2026.04.002

PMCID: PMC13282521 PMID: 42325725

Abstract

Objective

To study the concordance between artificial intelligence (AI) algorithm selection and manual embryo selection, and how the concordance depends on the experience levels of embryologists.

Design

Retrospective observational study.

Subjects

A total of 1,321 cycles with preimplantation genetic testing for aneuploidy (PGT-A) from 1,165 patients, each with at least one transferred euploid embryo.

Exposure

Embryo ranking was performed both manually and using an AI algorithm.

Main Outcome Measures

The concordance rate between embryologists and an AI algorithm.

Results

The overall concordance rate was 58.6% (95% confidence interval [CI]: 55.9%–61.2%) and increased with embryologist experience: Junior (51.3%, 95% CI: 44.3%–58.2%), Senior (57.1%, 95% CI: 53.3%–61.0%), and Senior+ (63.4%, 95% CI: 59.2%–67.7%). The fetal heartbeat (FHB) rates were 68.3% (95% CI: 65.1%–71.6%) in the concordant group and 66.5% (95% CI: 62.6%–70.5%) in the discordant group. The unadjusted odds ratio between the two groups was not statistically significant; however, after adjustment for the number of usable embryos, age, and oocyte source, the odds ratio was significant.

Conclusion

The concordance levels between embryologists and an AI algorithm depended significantly on the level of experience. The study also indicated that adjunctive use of AI could improve clinical outcomes.

Key Words: Embryo selection, deep-learning, time-lapse, concordance

Accurate embryo ranking is crucial in in vitro fertilization (IVF) as it directly impacts treatment success and the time to live birth. The standard practice involves the embryologist visually assessing development and morphology for each embryo within the cohort. The resulting ranking provides a putative order in which the embryos should be transferred to the patient. However, this subjective assessment is susceptible to variability both within and between embryologists (1, 2), affecting decisions regarding embryo transfer, cryopreservation, or discard (3, 4, 5, 6).

Embryologist experience is a key factor contributing to this variability. In their junior years, embryologists often undergo additional education, training, and certification programs, such as the European Society of Human Reproduction and Embryology (ESHRE) certification for clinical embryologists (7). The impact of such training has been studied by Cimadomo et al. (4) in a large multicenter study. Their findings revealed greater intracenter agreement between junior and senior embryologists compared with intercenter agreement. Importantly, they also observed a significant increase in intercenter agreement for both experience levels after online training.

The inherent subjectivity and variability of current embryo evaluation methods highlight the need for more objective and consistent approaches. Artificial intelligence (AI) offers a potential solution, and numerous algorithms have been proposed to assist with or fully automate the ranking process. One way to estimate the benefit of an AI algorithm is to evaluate the level of agreement between manual evaluation and the algorithm. Agreement is often expressed as the concordance rate, where a cycle is concordant when the embryo selected for the first transfer also received the highest AI score among the available embryos. In a large study, Zaninovic et al. (8) investigated the concordance in blastocyst selection between five embryologists and eight AI algorithms in cycles with eight available blastocysts. Overall, the embryologists had an average concordance rate of 60%, whereas the average concordance rate among the six best AI algorithms was only 40%. The concordance rate between the different AI algorithms and each embryologist varied between 9% and 54%.

Most AI algorithms for embryo selection are promoted for adjunctive use or as decision support tools. However, nearly all evaluations of AI performance are stand-alone evaluations in which the output of the algorithm is related directly to the clinical endpoint, typically following the TRIPOD+AI guidelines (9, 10). Such stand-alone evaluations probably underestimate the actual clinical benefit when the embryologist and the AI collaborate to rank the embryos (11). A few studies have investigated the performance of AI in adjunctive use. Palmer et al. (12) found that in 37%–41% of cases, the embryologist changed their original selection of the top-ranked embryo after receiving an adjunctive AI evaluation. Similarly, Kim et al. (13) found improved intraobserver consistency with AI guidance and that, after AI guidance, interobserver consistency did not differ between junior and senior embryologists.

The association between concordance and clinical outcome has been examined in only a few studies. In a randomized controlled trial (RCT) (14), the Intelligent Data Analysis for embryo evaluation (iDAScore) algorithm (15, 16) had a concordance rate of 65.8% with manual selection. The fetal heartbeat (FHB) rate was 48.3% in the concordance group and 44.7% in the discordance group, suggesting a positive association between the FHB rate and agreement between the embryologist and the AI algorithm.

The primary aim of this study was to investigate the relationship between the embryologist's experience level and the concordance between AI-based and manual embryo ranking. A secondary objective was to explore the association between concordance and clinical outcomes.

Materials and methods

Ethics approval

The study was approved by the University of California, San Francisco (UCSF) Institutional Review Board and was performed as part of a quality improvement project.

Study population

This retrospective analysis encompassed 2,830 treatment cycles cultured at the University of California, San Francisco, between February 2020 and October 2023. The inclusion criteria were that treatment cycles had been incubated in the EmbryoScope+ incubator (Vitrolife A/S, Denmark) for at least 5 days and that preimplantation genetic testing for aneuploidy (PGT-A) had been performed. Once placed in the EmbryoScope, embryos were cultured at 37 °C with 6.5% CO₂ and 5.0% O₂ for up to 6 days without media exchange. Images were acquired automatically every 10 minutes across 11 focal planes. The EmbryoViewer software (Vitrolife A/S, Denmark) was used to assess embryo development. Fertilization was checked 20 hours after insemination. Embryologists assessed blastocysts at 116 hours (Day 5) and 140 hours (Day 6) after insemination. Blastocysts were graded according to the Gardner scoring scheme, with an additional grade “D” for the inner cell mass (ICM) and trophectoderm (TE) to denote poor or unusable quality.

The blastocyst for the first transfer was selected according to morphology grade and PGT-A result, following clinical protocols. Only euploid blastocysts were included in this study. All embryologists who performed the selection were blinded to the AI score during the selection process. The identity of the embryologist performing embryo selection was recorded in the IDEAS EMR (Infertility Database, Endocrinology and Andrology medical data System) (Mellowood Medical, Toronto, Canada). The embryologist performing the grading was not necessarily the same as the one who performed the biopsies or the vitrifications, although they could be the same person. Each embryologist's experience level at the start of the study period was classified as Junior (0–3 years), Senior (3–5 years), or Senior+ (5 or more years).

For each treatment cycle, embryo selection was considered concordant if the blastocyst selected manually for the first transfer was also given the highest iDAScore value among the available euploid embryos. To allow evaluation of concordance between manual and iDAScore ranking, we excluded cycles without a transfer, cycles with fewer than two usable blastocysts, cycles with a double embryo transfer, and cycles in which the iDAScore could not be calculated.
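The concordance rule and exclusion criteria above can be expressed as a short Python sketch. The `Cycle` record, its field names, and the treatment of ties (a selected embryo that shares the top score counts as concordant) are illustrative assumptions; the source does not describe how the data were stored or how ties were handled. The candidate set is assumed to be the usable euploid blastocysts of the cycle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cycle:
    # Hypothetical record: iDAScore per usable euploid blastocyst (None if it
    # could not be calculated) and the embryo ID(s) used in the first transfer.
    scores: Dict[str, Optional[float]]
    transferred: List[str]


def is_eligible(cycle: Cycle) -> bool:
    """Apply the exclusion criteria listed in Materials and methods."""
    if len(cycle.transferred) != 1:  # no transfer, or a double embryo transfer
        return False
    if len(cycle.scores) < 2:  # fewer than two usable blastocysts
        return False
    # Exclude cycles in which any iDAScore is missing.
    return all(score is not None for score in cycle.scores.values())


def is_concordant(cycle: Cycle) -> bool:
    """True if the manually selected blastocyst has the highest iDAScore in its cycle."""
    selected = cycle.transferred[0]
    best = max(cycle.scores.values())
    # Assumption: a selected embryo that ties for the top score counts as concordant.
    return cycle.scores[selected] == best


def concordance_rate(cycles: List[Cycle]) -> float:
    eligible = [c for c in cycles if is_eligible(c)]
    return sum(is_concordant(c) for c in eligible) / len(eligible)


cycles = [
    Cycle({"e1": 8.1, "e2": 6.4}, ["e1"]),             # concordant
    Cycle({"e1": 5.0, "e2": 7.3, "e3": 4.2}, ["e1"]),  # discordant
    Cycle({"e1": 9.0}, ["e1"]),                        # excluded: one blastocyst
]
print(concordance_rate(cycles))  # prints 0.5
```

Filtering eligibility before scoring keeps the denominator equal to the analyzed cycles, mirroring how the 1,321 treatments form the denominator of the reported rates.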

Clinical protocols

Ovarian stimulation was performed with a gonadotropin-releasing hormone (GnRH) agonist- or antagonist-based protocol. The initial dose of gonadotropins (Follistim, Merck; Gonal-F, EMD-Serono; and/or Menopur, Ferring) was determined by the patient’s age, body mass index (BMI), and ovarian reserve, as estimated by antral follicle count (AFC). The GnRH antagonist (0.25 mg Ganirelix acetate, Organon; or 0.25 mg Cetrotide, EMD-Serono) was administered daily to prevent premature ovulation once the lead follicle measured >13 mm in mean diameter. Final oocyte maturation was induced with human chorionic gonadotropin (hCG) (5,000 or 10,000 IU subcutaneously) or a GnRH agonist (4 mg leuprolide acetate subcutaneously) when the largest follicle reached a mean diameter of 18 mm with a general cohort of follicles >13 mm. Oocyte retrieval was performed according to clinic standards, 36 hours after the ovulatory trigger. A semen sample was obtained by masturbation within an hour of oocyte retrieval. The choice between conventional insemination (CI) and intracytoplasmic sperm injection (ICSI) was made by the patient’s primary physician. Euploid embryos were subsequently thawed and transferred. The euploid embryo chosen for transfer was selected on the basis of static morphology; if sex selection was desired, it was selected on the basis of sex, followed by morphology. The approach to uterine preparation in frozen embryo transfer cycles was determined by the primary physician; protocols included a natural cycle augmented with vaginal progesterone for luteal phase support, or controlled cycle approaches with estrogen administered via patch, oral, or injectable routes, followed by progesterone in oil.

Determination of ploidy

Breaching of the zona was performed at the cleavage stage. Embryo biopsy for PGT was performed at the blastocyst stage in all embryos that reached the full blastocyst stage. On the day of biopsy, 5–10 TE cells were gently aspirated. Biopsied cells were washed and cryopreserved before being sent for testing. Biopsied TE cells were analyzed for all 24 chromosomes by the testing laboratory (CooperSurgical, Trumbull, CT) using a next-generation sequencing (NGS)-based assay.

AI algorithm

The iDAScore v2 is a proprietary deep learning algorithm (15, 16) that is based on a 3D convolutional neural network and was trained on 181,429 time-lapse sequences from 22 clinics, including 33,688 transferred embryos, of which 8,465 resulted in an FHB. Using 128 time-lapse images covering the period of 20 to 148 hours postinsemination (hpi) as input, the algorithm generates a relative score (1.0–9.9) correlating with the likelihood of an FHB for each embryo. The iDAScore algorithm is an add-on for the EmbryoScope+ system and can be installed on the local server, provided it complies with local regulatory approvals.

All embryos were retrospectively scored with the iDAScore v2 algorithm using time-lapse images from 20 hpi until the embryo was removed from the incubator.

Outcome measures

The primary clinical endpoint of the study was the presence of an FHB at the 6th week of gestation. FHB rates are expressed as the number of positive FHBs divided by the total number of transferred blastocysts.

Statistical methods

All statistical analyses were performed using R software (17) version 4.4.1. Logistic regression models were fitted with the rms package version 7.0 (18). Odds ratios (OR) and adjusted odds ratios (aOR) were estimated using the “contrast” function in the rms package. All adjusted logistic regression models of the binary outcomes controlled for the number of usable embryos (in categories), patient age, and oocyte source (autologous vs. donor). These variables were included to reduce confounding bias, so that the estimated ORs reflect the association between the primary exposure (e.g., experience level) and the outcomes of interest, including concordance and FHB, after accounting for these factors. Receiver operating characteristic curves were analyzed using the pROC package (19).
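The models were fitted with rms in R. Purely as an illustration of the underlying computation (not the authors' code, and not the rms API), the Python sketch below fits a logistic regression by Newton-Raphson and converts the coefficient of a binary exposure into an odds ratio with a Wald 95% CI and two-sided P value. Adjustment is done by adding covariate columns to each row; the column layout and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _score_and_information(X: Sequence[Sequence[float]], y: Sequence[int],
                           beta: List[float]) -> Tuple[List[float], Matrix]:
    """Gradient of the log-likelihood and Fisher information X^T W X."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, max_iter: int = 50, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X must start with 1.0 (intercept); remaining columns are the
    exposure and any adjustment covariates.
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(X, y, beta)
        inv = _invert(info)
        step = [sum(inv[j][k] * grad[k] for k in range(len(grad))) for j in range(len(grad))]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(X, y, beta)  # information at the final estimate
    return beta, _invert(info)  # coefficients and their covariance matrix


def odds_ratio(beta, cov, j: int, z: float = 1.96):
    """OR for coefficient j with a Wald 95% CI and two-sided P value."""
    se = math.sqrt(cov[j][j])
    p_value = math.erfc(abs(beta[j] / se) / math.sqrt(2.0))
    return math.exp(beta[j]), math.exp(beta[j] - z * se), math.exp(beta[j] + z * se), p_value
```

For an adjusted OR, each row would carry extra columns, for example hypothetical indicators for the usable-embryo categories, age in years, and a donor-oocyte indicator; `odds_ratio(beta, cov, 1)` then gives the OR for the exposure in column 1 conditional on those covariates. With a single binary exposure and no covariates, the fitted OR equals the cross-product ratio of the 2×2 table.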

Confidence intervals for concordance rates and FHB rates were estimated using the Wald method.
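The Wald interval is p̂ ± 1.96·√(p̂(1 − p̂)/n). The minimal Python sketch below (the function name is illustrative) applies it to the overall concordance count reported in the Results, 774 of 1,321 cycles, and reproduces the published interval.

```python
import math
from typing import Tuple


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wald normal approximation."""
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


# Overall concordance: 774 concordant cycles out of 1,321 (Results).
rate, lower, upper = wald_ci(774, 1321)
print(f"{100 * rate:.1f}% (95% CI: {100 * lower:.1f}%–{100 * upper:.1f}%)")
# prints "58.6% (95% CI: 55.9%–61.2%)"
```

The Wald interval is simple and adequate at these sample sizes; for small groups or rates near 0% or 100% it can extend beyond the [0, 1] range, which is why score-based intervals are sometimes preferred.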

Results

We conducted a retrospective cohort study with data from February 2020 to October 2023. The study data flow is shown in Supplemental Fig. 1 (available online). The final analysis dataset comprised 1,321 treatments from 1,165 patients with a total of 19,962 embryos. Twenty embryologists were included in the study. Overall, 197 treatments were evaluated by Junior embryologists (n = 5), 637 by Senior embryologists (n = 6), and 487 by Senior+ embryologists (n = 9).

Patient and treatment characteristics are shown in Table 1, both overall and for the concordance and discordance groups. In general, cycles in the discordance group involved younger women and had more oocytes retrieved and more usable embryos.

Table 1.

Treatment characteristics for all treatment cycles and the concordance and discordance groups. Continuous values are presented as medians and 25th and 75th percentiles. Categorical values are presented as numbers and percentage distributions.

Variable	Total (n = 1321)	Concordance (n = 774)	Discordance (n = 547)
Maternal age	38.2 [35.6; 40.5]	38.7 [36.2; 41.0]	37.5 [34.9; 39.8]
Partner age	39.0 [36.0; 44.0]	39.0 [36.0; 44.0]	39.0 [35.0; 44.0]
Oocytes retrieved (OPU)	14 [9; 19]	12 [8; 18]	16 [11; 22]
No. of usable embryos
2–3	355 (26.9%)	279 (36.0%)	76 (13.9%)
4–5	334 (25.3%)	229 (29.6%)	105 (19.2%)
6–8	338 (25.6%)	162 (20.9%)	176 (32.2%)
9+	294 (22.3%)	104 (13.4%)	190 (34.7%)
Insemination method
ICSI	1,097 (83.0%)	648 (83.7%)	449 (82.1%)
IVF	224 (17.0%)	126 (16.3%)	98 (17.9%)
Oocyte source
Own	1,212 (91.7%)	722 (93.3%)	490 (89.6%)
Donor	108 (8.2%)	52 (6.7%)	56 (10.2%)
Sperm source
Partner	1,213 (91.8%)	717 (92.6%)	496 (90.7%)
Donor	107 (8.1%)	57 (7.4%)	50 (9.1%)


Note: ICSI = intracytoplasmic sperm injection; IVF = in vitro fertilization; OPU = oocyte pick-up.

iDAScore predictive performance

For discriminating between positive and negative FHB after single euploid blastocyst transfer (n = 1,321), iDAScore had an area under the receiver operating characteristic curve (AUC) of 0.669 (95% confidence interval [CI] 0.637–0.700) (Supplemental Fig. 2). In the calibration plot, with scores grouped into intervals of width 1, FHB rates increased with iDAScore (Supplemental Fig. 3).
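The empirical AUC reported by pROC equals the Mann-Whitney probability that a randomly chosen FHB-positive transfer has a higher score than a randomly chosen FHB-negative one, with ties counted as one half. The Python sketch below computes that quantity directly and also groups scores into unit-width intervals for a calibration table. The function names and the bin edges [k, k + 1) are assumptions; the source does not state how interval boundaries were handled.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mann-Whitney estimate of the ROC AUC; a positive-negative tie counts 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_by_unit_bins(scores: Sequence[float],
                             outcomes: Sequence[int]) -> Dict[int, float]:
    """Observed FHB rate per score interval [k, k + 1), e.g. 6.0 <= score < 7.0."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [FHB-positive, transfers]
    for s, y in zip(scores, outcomes):
        b = math.floor(s)
        counts[b][0] += y
        counts[b][1] += 1
    return {b: pos / n for b, (pos, n) in sorted(counts.items())}
```

The pairwise loop is O(n_pos × n_neg), which is acceptable for about 1,300 transfers; a rank-based formula gives the same value faster for larger datasets.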

Concordance

In 774 treatment cycles, the manually selected blastocyst was also the blastocyst with the highest iDAScore. The overall concordance rate was 58.6% (95% CI: 55.9%–61.2%). The concordance rate increased with the experience level of the embryologists (Fig. 1A): 51.3% (95% CI: 44.3%–58.2%) for Junior, 57.1% (95% CI: 53.3%–61.0%) for Senior, and 63.4% (95% CI: 59.2%–67.7%) for Senior+ embryologists. Table 2 shows the unadjusted and adjusted ORs for the difference in concordance between experience levels. In both the unadjusted and the adjusted logistic regressions, concordance differed significantly between Senior+ and Junior embryologists and between Senior+ and Senior embryologists. Consistent with this trend, the OR between Senior and Junior embryologists was above 1 but did not reach significance.

Table 2.

Unadjusted odds ratios (OR) and adjusted odds ratios (aOR) comparing concordance rates and FHB rates between experience levels. Odds ratios are adjusted for the number of usable embryos, age, and oocyte source.

Experience level	OR	95% CI	P value	aOR	95% CI	P value
Concordance
• Senior vs. Junior	1.27	[0.92; 1.75]	.147	1.19	[0.84; 1.67]	.327
• Senior+ vs. Junior	1.65	[1.18; 2.31]	.003	1.57	[1.10; 2.24]	.014
• Senior+ vs. Senior	1.30	[1.02; 1.66]	.033	1.32	[1.02; 1.71]	.036
Fetal heartbeat
• Senior vs. Junior	1.33	[0.96; 1.86]	.089	1.41	[1.01; 1.98]	.046
• Senior+ vs. Junior	1.53	[1.08; 2.16]	.016	1.64	[1.15; 2.34]	.006
• Senior+ vs. Senior	1.15	[0.89; 1.48]	.294	1.16	[0.89; 1.51]	.261


Note: aOR = adjusted odds ratio; CI = confidence interval; OR = odds ratio.

Concordance rates also depended on the number of usable embryos, decreasing as the number of usable embryos increased (Supplemental Table 1, available online). Concordance rates differed significantly between all groups of usable embryos. This decreasing trend should be viewed in the context of the concordance expected under random selection, which decreased from 38.8% in the group with two to three usable embryos to 8.2% in the group with nine or more usable embryos.
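Under random selection, a cycle with n usable embryos is concordant with probability 1/n (assuming no tie for the top score), so the expected group-level rate is the mean of 1/n across cycles. A group of only two-embryo cycles would give 50% and a group of only three-embryo cycles 33.3%, so the reported 38.8% is consistent with a mix of both. The per-cycle embryo counts are not reported, so this Python sketch, with a hypothetical function name, takes them as input.

```python
from typing import Sequence


def expected_random_concordance(usable_counts: Sequence[int]) -> float:
    """Mean of 1/n over cycles: the chance that a uniformly random pick
    coincides with the single top-scored embryo (ties assumed absent)."""
    return sum(1.0 / n for n in usable_counts) / len(usable_counts)


# One two-embryo cycle and one three-embryo cycle: (1/2 + 1/3) / 2 = 5/12.
print(expected_random_concordance([2, 3]))
```

Comparing an observed concordance rate against this baseline separates genuine agreement from the agreement expected by chance, which falls as cohorts grow.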

Clinical outcome

The overall FHB rate in the study was 67.6% (95% CI: 65.1%–70.1%). The FHB rate was 68.3% (95% CI: 65.1%–71.6%) in the concordance group and 66.5% (95% CI: 62.6%–70.5%) in the discordance group. This corresponds to a difference of 1.8 percentage points, which was not significant (OR = 1.09; P=.491).
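As a consistency check, the crude comparison can be reproduced from a 2×2 table. For a logistic model with a single binary predictor, the maximum-likelihood OR equals the cross-product ratio and its Wald standard error equals Woolf's, so this Python sketch should agree with the unadjusted estimate. The group-level FHB counts are not reported; the values used are an assumption, back-calculated from the published percentages (529/774 rounds to 68.3% and 364/547 to 66.5%).

```python
import math


def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio (a/b)/(c/d) with a Woolf (log-scale Wald) CI and two-sided P value.

    a, b: transfers with / without FHB in the concordance group
    c, d: transfers with / without FHB in the discordance group
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2.0))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se), p_value


# Assumed counts, back-calculated from the reported rates (not stated in the text).
or_, lower, upper, p = crude_odds_ratio(529, 774 - 529, 364, 547 - 364)
print(f"OR = {or_:.2f}, P = {p:.3f}")  # prints "OR = 1.09, P = 0.491"
```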

The FHB rates increased with the experience level of the embryologists (Fig. 1B): Junior embryologists had an FHB rate of 60.9% (95% CI: 54.1%–67.7%), Senior embryologists had an FHB rate of 67.5% (95% CI: 63.9%–71.1%), and Senior+ embryologists had an FHB rate of 70.4% (95% CI: 66.4%–74.5%). Table 2 shows unadjusted and adjusted OR for the difference in FHB between the experience groups. The OR between the Senior+ and the Junior embryologists was statistically significant in both the adjusted and unadjusted analyses, whereas the OR between the Senior and the Junior embryologists was only significant in the adjusted analysis. There were no significant differences between Senior+ and Senior embryologists, although the ORs were higher than 1 as expected.

The FHB rates increased with the number of usable embryos (Supplemental Table 1). The FHB rates were significantly different between the different groups of usable embryos, except between 2–3 and 4–5 usable embryos (P=.074) and between 4–5 and 6–8 usable embryos (P=.132).

Association between concordance and FHB rate

The unadjusted and adjusted ORs for the association between concordance and FHB rates are shown in Fig. 2. Overall, the FHB rate was significantly higher in the concordance group than in the discordance group in the adjusted analysis (P=.001), whereas the difference was not statistically significant in the unadjusted analysis (P=.491). When stratified by the number of usable embryos, the ORs were significantly above 1 for the 4–5 and 6–8 groups. The ORs for the 2–3 and 9+ groups were lower and not significant. Absolute FHB rates for the concordance and discordance groups are shown both overall and stratified by the number of usable embryos in Supplemental Table 2.

Discussion

Our study demonstrated that Senior+ embryologists had higher concordance with the AI algorithm than Junior embryologists. This difference between experience levels underscores the impact of experience on embryo evaluation and selection. The higher concordance between Senior+ embryologists and the AI algorithm suggests that extensive experience and training may enhance the ability to evaluate embryos consistently and objectively. It also suggests that Junior embryologists could benefit from the assistance of AI tools during training and/or standard clinical practice. Such integration of AI into the embryo selection process has the potential to reduce subjectivity, improve consistency, and ultimately improve clinical outcomes.

The overall concordance rate of 58.6% in the current study was lower than the 65.8% observed in the RCT by Illingworth et al. (14). This discrepancy may be partly explained by the higher experience levels of embryologists typically involved in RCTs, as our findings suggest that experience significantly affects concordance rates.

In the study by Zaninovic et al. (8), which included iDAScore v1, the best AI algorithm had a concordance rate of 54%, comparable with the current study. Similarly, Bori et al. (20) found that iDAScore v2 had concordance rates of 61.0% and 71.4% in an oocyte donation program and in patients using their own oocytes, respectively. This difference may be attributable to the different numbers of oocytes in the two groups of their study (11.3 vs. 7.7), which mirrors the lower concordance with more oocytes retrieved at oocyte pick-up (OPU) observed in our study.

We also observed a nonsignificant increase in the FHB rate from 66.5% in the discordance group to 68.3% in the concordance group (a difference of 1.8 percentage points). However, a direct comparison of these two groups is problematic because patients in the concordance group were older, had fewer usable embryos, and less often used donor oocytes. After adjustment for these confounders, the difference in FHB rates was significant. Fitz et al. (11) also observed an overall significant improvement with adjunctive use in simulated embryo pairs. Similarly, in a small study, Harir et al. (21) found a nonsignificant increase in pregnancy rates from 44.8% to 45.2% when the embryologist and iDAScore were concordant. Palmer et al. (12) did not find any significant effect of adjunctive use of ERICA (Embryo Ranking Intelligent Classification Algorithm), although there was a trend for senior embryologists to be less likely to change their original decision for the worse after adjunctive use.

A significant hurdle when investigating the impact of concordance between an embryologist and an automated embryo selection algorithm is the number of available embryos. As anticipated, our study revealed a clear negative correlation between concordance rates and the number of available embryos. Conversely, we found a positive correlation between FHB rates and the number of available embryos. Notably, stratifying the data by the number of usable embryos indicated the greatest benefit of concordance in the groups with 4 to 8 usable embryos. The reduced impact of concordance in the group with two or three usable embryos may be because manual selection among very few embryos is relatively easy. Similarly, with nine or more available embryos, the lower impact might be due to the already high FHB rate (above 75%), suggesting that deviating from the AI’s selection among numerous high-quality embryos has a limited effect on the FHB rate.

It is important to acknowledge several limitations of our study. First, all cycles in this study included PGT-A, which limits the generalizability of the results; the specific rates will not apply to untested treatments, although differences between embryologists with different experience levels would still be expected. Second, because our results are based on data from a single site, they may not generalize to other settings and patient populations. Third, our analysis does not include the potential impact of other tasks the embryologist may perform (e.g., vitrification, warming, or embryo transfer) that might affect the clinical outcome. Fourth, sex-based selection affects the observed concordance and discordance rates; however, it occurred in only a small proportion of cycles, so we believe that the overall conclusions are unaffected. Fifth, the study does not account for the embryologist’s workload, stress, fatigue, or similar human factors that can affect both the clinical outcome and the embryo selection process. Sixth, there is no generally accepted definition of experience levels; we therefore defined experience using three predefined categories relevant to our clinic. This definition inevitably affects our results. In addition, the analysis does not account for the increase in experience during the study period. Finally, a major limitation is the need to adjust all logistic regressions for external factors, especially the number of available embryos, which has a strong influence in opposite directions: concordance rates decreased with a higher number of available embryos, whereas clinical outcomes improved, because a larger number of embryos is an indicator of a good-prognosis patient.
We therefore suggest that more prospective studies be conducted to investigate the interaction between embryologists’ experience levels and the use of adjunctive AI algorithms on clinical outcomes.

Conclusion

The concordance levels between embryologists and an AI algorithm depended significantly on the level of experience. This suggests that less experienced embryologists might benefit the most from the assistance of an AI algorithm, which could reduce subjectivity and increase consistency. The study also indicated that adjunctive use might improve clinical outcomes.

CRediT Authorship Contribution Statement

Jørgen Berntsen: Writing – review & editing, Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Philip Marsh: Writing – review & editing, Data curation, Conceptualization. Brendan Burkart: Writing – review & editing, Formal analysis, Data curation. Mark G. Larman: Writing – review & editing, Project administration, Methodology, Conceptualization. Martin N. Johansen: Writing – review & editing, Methodology, Investigation, Formal analysis, Conceptualization. Mitchell Rosen: Writing – review & editing, Project administration, Methodology, Investigation, Conceptualization.

Declaration of Interests

J.B. is an employee of Vitrolife Group; travel support from Vitrolife; Vitrolife owns patents concerning the iDAScore algorithm. P.M. reports funding from Vitrolife for the submitted work; consulting fees from Vitrolife. B.B. has nothing to disclose. M.G.L. is an employee of Vitrolife Group; travel support from Vitrolife; Vitrolife owns patents concerning the iDAScore algorithm. M.N.J. is an employee of Vitrolife Group; travel support from Vitrolife; Vitrolife owns patents concerning the iDAScore algorithm. M.G.L reports funding from Vitrolife for the submitted work; consulting fees from Vitrolife.

Footnotes

Anonymized data underlying this article will be shared upon reasonable request to the corresponding author.

Supplementary Data

Supplementary Material

mmc1.docx (191.1 KB, docx)

References

1.Adolfsson E., Andershed A.N. Morphology vs. morphokinetics: a retrospective comparison of interobserver and intra-observer agreement between embryologists on blastocysts with known implantation outcome. JBRA Assist Reprod. 2018;22:228–237. doi: 10.5935/1518-0557.20180042. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Sundvall L., Ingerslev H.J., Breth Knudsen U., Kirkegaard K. Inter- and intra-observer variability of time-lapse annotations. Hum Reprod. 2013;28:3215–3221. doi: 10.1093/humrep/det366. [DOI] [PubMed] [Google Scholar]
3.Storr A., Venetis C.A., Cooke S., Kilani S., Ledger W. Inter-observer and intra-observer agreement between embryologists during selection of a single day 5 embryo for transfer: a multicenter study. Hum Reprod. 2017;32:307–314. doi: 10.1093/humrep/dew330. [DOI] [PubMed] [Google Scholar]
4.Cimadomo D., Sosa Fernandez L., Soscia D., Fabozzi G., Benini F., Cesana A., et al. Inter-centre reliability in embryo grading across several IVF clinics is limited: implications for embryo selection. Reprod Biomed Online. 2022;44:39–48. doi: 10.1016/j.rbmo.2021.09.022. [DOI] [PubMed] [Google Scholar]
5.Martínez-Granados L., Serrano M., González-Utor A., Ortiz N., Badajoz V., López-Regalado M.L., et al. Reliability and agreement on embryo assessment: 5 years of an external quality control programme. Reprod Biomed Online. 2018;36:259–268. doi: 10.1016/j.rbmo.2017.12.008. [DOI] [PubMed] [Google Scholar]
6.Chiappetta V., Innocenti F., Coticchio G., Ahlström A., Albricci L., Badajoz V., et al. Discard or not discard, that is the question: an international survey across 117 embryologists on the clinical management of borderline quality blastocysts. Hum Reprod. 2023;38:1901–1909. doi: 10.1093/humrep/dead174. [DOI] [PubMed] [Google Scholar]
7.Balaban B., Brison D., Calderón G., Catt J., Conaghan J., Cowan L., et al. The Istanbul consensus workshop on embryo assessment: proceedings of an expert meeting. Hum Reprod. 2011;26:1270–1283. doi: 10.1093/humrep/der037. [DOI] [PubMed] [Google Scholar]
8.Zaninovic N., Sierra J.T., Malmsten J.E., Rosenwaks Z. Embryo ranking agreement between embryologists and artificial intelligence algorithms. F S Sci. 2024;5:50–57. doi: 10.1016/j.xfss.2023.10.002. [DOI] [PubMed] [Google Scholar]
9.Collins G.S., Moons K.G.M., Dhiman P., Riley R.D., Beam A.L., Van Calster B., et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br Med J. 2024;385 doi: 10.1136/bmj-2023-078378. [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Kragh M.F., Karstoft H. Embryo selection with artificial intelligence: how to evaluate and compare methods? J Assist Reprod Genet. 2021;38:1675–1689. doi: 10.1007/s10815-021-02254-6.
11. Fitz V.W., Kanakasabapathy M.K., Thirumalaraju P., Kandula H., Ramirez L.B., Boehnlein L., et al. Should there be an “AI” in TEAM? Embryologists selection of high implantation potential embryos improves with the aid of an artificial intelligence algorithm. J Assist Reprod Genet. 2021;38:2663–2670. doi: 10.1007/s10815-021-02318-7.
12. Palmer G.A., Chavez-Badiola A., Valencia-Murillo R., Harvey S.C., Mendizabal-Ruiz G., Farías A.F.-S., et al. Can artificial intelligence guided feedback improve embryologists’ selection of euploid embryos based on morphology alone? Reprod Biomed Online. 2025;51. doi: 10.1016/j.rbmo.2025.104990.
13. Kim H.M., Kang H., Lee C., Park J.H., Chung M.K., Kim M., et al. Evaluation of the clinical efficacy and trust in AI-assisted embryo ranking: survey-based prospective study. J Med Internet Res. 2024;26. doi: 10.2196/52637.
14. Illingworth P.J., Venetis C., Gardner D.K., Nelson S.M., Berntsen J., Larman M.G., et al. Deep learning versus manual morphology-based embryo selection in IVF: a randomized, double-blind noninferiority trial. Nat Med. 2024;30:3114–3120. doi: 10.1038/s41591-024-03166-5.
15. Berntsen J., Rimestad J., Lassen J.T., Tran D., Kragh M.F. Robust and generalizable embryo selection based on artificial intelligence and time-lapse image sequences. PLoS One. 2022;17. doi: 10.1371/journal.pone.0262661.
16. Theilgaard Lassen J., Fly Kragh M., Rimestad J., Nygård Johansen M., Berntsen J. Development and validation of deep learning based embryo selection across multiple days of transfer. Sci Rep. 2023;13:4235. doi: 10.1038/s41598-023-31136-3.
17. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2025. Available at: https://www.R-project.org/
18. Harrell F.E., Jr. rms: Regression Modeling Strategies. 2025. Available at: https://CRAN.R-project.org/package=rms
19. Robin X., Turck N., Hainard A., Tiberti N., Lisacek F., Sanchez J.C., et al. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.
20. Bori L., Toschi M., Esteve R., Delgado A., Pellicer A., Meseguer M. External validation of a fully automated evaluation tool: a retrospective analysis of 68,471 scored embryos. Fertil Steril. 2025;123:634–643. doi: 10.1016/j.fertnstert.2024.10.006.
21. Harir Y., Halevy Amiran R., Or Y. Embryologist versus AI in embryo selection: agreement and impact on pregnancy rates. In Vitro Cell Dev Biol Anim. 2025;61:1107–1109.

Supplementary Materials

Supplementary Material: mmc1.docx (191.1 KB, DOCX)

Authors: Jørgen Berntsen, M.Sc.; Philip Marsh, M.Sc.; Brendan Burkart, B.A.; Mark G Larman, Ph.D.; Martin N Johansen, Ph.D.; Mitchell Rosen, M.D., HCLD