Ophthalmic Research. 2023 May 11;66(1):928–939. doi: 10.1159/000530954

Ophthalmology Operation Note Encoding with Open-Source Machine Learning and Natural Language Processing

Yong Min Lee a,b, Stephen Bacchi a,b, Carmelo Macri a,b, Yiran Tan a,b, Robert J Casson a,b, Weng Onn Chan a,b
PMCID: PMC10308528  PMID: 37231984

Abstract

Introduction

Accurate assignment of procedural codes serves important medico-legal, academic, and economic purposes for healthcare providers. Procedural coding requires accurate documentation and exhaustive manual labour to interpret complex operation notes. Ophthalmology operation notes are highly specialised, making the process time-consuming and challenging to implement. This study aimed to develop natural language processing (NLP) models, trained by medical professionals, to assign procedural codes based on the surgical report. The automation and accuracy of these models can reduce the burden on healthcare providers and generate reimbursements that reflect the operation performed.

Methods

A retrospective analysis of ophthalmological operation notes from two metropolitan hospitals over a 12-month period was conducted. Procedural codes according to the Medicare Benefits Schedule (MBS) were applied. XGBoost, random forest, Bidirectional Encoder Representations from Transformers (BERT), and logistic regression models were developed for classification experiments. Experiments involved both multi-label and binary classification, and the best performing model was used on the holdout test dataset.

Results

There were 1,000 operation notes included in the study. Following manual review, the five most common procedures were cataract surgery (374 cases), vitrectomy (298 cases), laser therapy (149 cases), trabeculectomy (56 cases), and intravitreal injections (49 cases). Across the entire dataset, current coding was correct in 53.9% of cases. The XGBoost model had the highest classification accuracy (88.0%) in the multi-label classification of these five procedures. The total reimbursement achieved by the machine learning algorithm was $184,689.45 ($923.45 per case) compared with the gold standard of $214,527.50 ($1,072.64 per case).

Conclusion

Our study demonstrates accurate classification of ophthalmic operation notes into MBS coding categories with NLP technology. A combined human and machine-led approach would use NLP to screen operation notes and code procedures, with human review for further scrutiny. This technology can allow MBS codes to be assigned with greater accuracy. Further research and application in this area can facilitate accurate logging of unit activity, leading to appropriate reimbursements for healthcare providers. Increased accuracy of procedural coding can also play an important role in training and education, the study of disease epidemiology, and research aimed at optimising patient outcomes.

Keywords: Machine learning, Natural language processing, Operation note, Procedural coding, Electronic medical records

Introduction

Operation notes are medico-legal documents that serve as permanent clinical and administrative records. In Australia, each procedure can be encoded using a Medicare Benefits Schedule (MBS) item number, which guides reimbursement to the relevant department [1]. In the USA, Current Procedural Terminology (CPT®) codes are commonly used as procedural codes. Procedural codes are often manually entered by medical officers or administrators when booking patients for surgery. Current coding methods are susceptible to error, which can result in a discrepancy between the preoperative codes and the surgical activity that was performed. This is particularly evident in complex cases, which often involve multiple item numbers. Uncoded activity is a major concern in the public health sector as it inaccurately portrays departmental activity, leading to under- or overestimation of financial funding and resource distribution for the clinical unit.

Postoperatively, medical officers can accurately identify procedural codes by conducting a retrospective review of the operation notes. However, this process can be time-consuming and resource-intensive. We hypothesized that artificial intelligence could serve as a tool to detect mismatches between the preoperative code and completed operation notes.

The introduction of a variety of language datasets and the development of deep learning techniques have led to significant progress within the field of natural language processing [2]. Word representation methods have played a key role in this progress, enabling larger scale and more accurate analysis of human language. Early approaches, such as one-hot encoding of dictionary words, had major limitations in establishing relationships or similarities between vectors representing similar words. Vectors were unnecessarily large and high-dimensional, making it difficult to identify words and phrases with similar semantics [2]. Mikolov et al. [3] developed word embedding techniques that allowed phrases and words with similar meanings to be related through low-dimensional word representations. These methods were still challenged by larger scale text representation, leading to the development of the Bag-of-Words (BOW) method, which represents documents and groups of words as a high-dimensional feature vector. Although useful for spam filtering and document classification, BOW methods did not capture the meaning of a group of words effectively, leading to inaccuracies when analysing text [4].
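To make the BOW limitation concrete, the following minimal sketch (illustrative only, and not taken from this study; the example notes are hypothetical) shows that count-vectorised notes describing the same procedure in different words are scored as nearly unrelated:

```python
# Minimal sketch: bag-of-words vectors carry no notion of semantic
# similarity between distinct tokens. Example notes are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

notes = [
    "phacoemulsification of cataract and insertion of intraocular lens",
    "phaco and IOL insertion",  # same procedure, different tokens
    "pars plana vitrectomy with endolaser",
]
bow = CountVectorizer().fit_transform(notes)

# The first two notes share almost no tokens, so BOW treats them as
# nearly orthogonal despite describing the same operation.
print(cosine_similarity(bow[0], bow[1])[0, 0])  # low similarity
print(cosine_similarity(bow[0], bow[2])[0, 0])  # also low
```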

Deep learning is an important branch of artificial intelligence that can process large amounts of data and has diverse applications. Integration of deep learning into computer vision has enabled effective prediction of missing regions in incomplete images [5] and accurate reconstruction of low-resolution images [6]. In NLP, its progress has been marked by sentence-level text representation. Unsupervised approaches involve training artificial neural networks (ANNs) to identify the probability distribution of words over a large, unlabelled database, allowing for the statistical estimation of corresponding outputs. Supervised approaches were mostly reliant on recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to process labelled word vector inputs through pre-trained multilayered networks [7]. The application of this technology in medicine demonstrated effective automated classification of medical data but still has limitations in the magnitude of processable text [2].

Pre-trained transformer models represent the latest development in natural language processing, with the capability of processing large amounts of text. These models are able to detect relationships between input and output variables, providing greater flexibility and efficiency than neural network models [8]. Bidirectional Encoder Representations from Transformers (BERT) is one form of pre-trained unsupervised transformer model that incorporates Masked Language Modelling (MLM), allowing it to predict missing words in a sentence, and Next Sentence Prediction (NSP), which assesses the sequential relationship between sentences [9]. Despite being trainable significantly faster than RNNs or CNNs, it still has limitations when processing long text sequences. Other state-of-the-art models for discriminative NLP tasks include logistic regression and the support vector machine (SVM), which label the output by assessing the conditional probability distribution of the input training data. These methods contrast with generative models, such as Naïve Bayes classifiers or hidden Markov models (HMMs), which aim to learn how the data were labelled and generated before producing an output label. The performance of these models varies depending on the task at hand. For example, Dhola and Saradva [10] achieved the best results in sentiment analysis with 85.4% accuracy using the BERT model, compared with 76.9% for the Naïve Bayes classifier. Hybrid models combining RoBERTa (an altered form of BERT) with recurrent neural networks demonstrated outstanding results in sentiment analysis of the IMDb, Twitter US Airline Sentiment, and Sentiment140 datasets, with F1 scores up to 93% [11].
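As a brief illustration of the MLM behaviour described above, the sketch below uses the publicly available bert-base-uncased checkpoint via the Hugging Face transformers library; this is an assumption made for illustration and is not the model configuration used in this study:

```python
# Fill-mask sketch: BERT's masked language modelling predicts a hidden
# token from bidirectional context. Requires: pip install transformers torch
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("The patient underwent [MASK] surgery on the left eye."):
    # Each prediction carries the proposed token and its probability.
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```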

While NLP technology has been used to process electronic medical records (EMR), it has yet to be explored for analysing operation notes using an open-source approach. A machine-led approach allows algorithms to label surgeries with procedural codes which can be reviewed by the surgeon. Other potential ways to facilitate a combined machine learning and human approach to coding procedures may include manual coding during admission and subsequently utilising machine learning to highlight potential errors, further increasing the accuracy of coding. The aims of this study were to develop and evaluate machine learning NLP models to aid with the coding of procedures based upon ophthalmology operation notes.

Materials and Methods

Participant Recruitment

Individuals included in this study were patients who underwent ophthalmic surgery in an operating theatre within the Central Adelaide Local Health Network, comprising the Royal Adelaide Hospital and The Queen Elizabeth Hospital, from March 1, 2020, until March 1, 2021. Completed operation notes written by the surgeons present during the procedure were identified from existing departmental registries and extracted. Operation notes that were incorrectly classified under ophthalmology were not included. After the identification of ophthalmology cases with MBS coding available, incomplete operation notes were excluded, and the first 1,000 cases were selected for analysis (see Fig. 1).

Fig. 1. Flowchart describing case selection.

Procedure Encoding and Reimbursement Allocation

Following case identification, procedures were encoded (see Fig. 2). All procedures had previously been encoded as per standard hospital procedures (generally performed by the booking medical officer or administrative staff). Manual review of these labels was undertaken by multiple investigators (Y.M.L., C.M.) employing the criteria in the MBS [1]. Discrepant codes were reviewed and resolved by W.O.C., and the resulting labels served as the ground truth for classification experiments. The allocated reimbursement for each of the procedures was also determined through review of the MBS.

Fig. 2. Flowchart describing analytical approach.

Machine Learning Analysis

Cases with missing or blank operation notes were excluded from the analysis during the case identification stage. For the BERT experiments, pre-processing was undertaken with the BERT library prior to analysis. Otherwise, pre-processing involved negation detection, stopword removal, word stemming, and punctuation removal. Negation detection was performed with the negation detection utility from the sentiment analysis module of the Natural Language Toolkit (NLTK) library [12]. This tool appends a negating suffix (“_NEG”) to any negated terms, which were then included in subsequent analyses in the same way as non-negated terms. The text then underwent count vectorisation (including n-grams 1–3 stems in length) and was transformed into a term frequency-inverse document frequency (TF-IDF) array. Data were then randomly split into a training dataset (80%) and a test dataset (20%).
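A minimal sketch of this pre-processing pipeline is shown below, assuming NLTK and scikit-learn; the exact tokenisation and filtering choices are assumptions for illustration rather than the authors' code, and operation_notes and code_labels are hypothetical variables:

```python
# Sketch of the described pipeline: negation tagging ("_NEG"), stopword
# and punctuation removal, stemming, 1-3-gram TF-IDF, and an 80/20 split.
import nltk
from nltk.corpus import stopwords
from nltk.sentiment.util import mark_negation
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

nltk.download("punkt")
nltk.download("stopwords")
STOP = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(note: str) -> str:
    # mark_negation suffixes "_NEG" onto tokens following a negation cue.
    tokens = mark_negation(nltk.word_tokenize(note.lower()))
    # Drop punctuation and stopwords, keep negated tokens, then stem.
    kept = [stemmer.stem(t) for t in tokens
            if (t.isalnum() or t.endswith("_NEG")) and t not in STOP]
    return " ".join(kept)

texts = [preprocess(n) for n in operation_notes]  # operation_notes: list[str]
vectoriser = TfidfVectorizer(ngram_range=(1, 3))  # n-grams of 1-3 stems
X = vectoriser.fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, code_labels, test_size=0.2, random_state=0)  # 80/20 split
```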

Models were developed on the training dataset using 5-fold cross-validation for the classification of the five most commonly coded procedures. The models developed included logistic regression, random forest, XGBoost, and BERT algorithms. Hyperparameters were tuned on the training dataset, and the models were then tested on the holdout test dataset (primary outcome). The best performing model on the first task was the XGBoost model, which comprised 200 estimators, a maximum tree depth of 6, a minimum child weight of 1, a uniform sampling method, and a learning rate of 0.3. This model was then applied to the task of classifying all procedures that had five or more cases (secondary outcome).
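A hedged sketch of this training step follows, using the reported hyperparameters; wrapping one binary XGBoost classifier per MBS code in MultiOutputClassifier is an assumption about the multi-label setup, not a detail confirmed by the paper:

```python
# Multi-label training sketch with the reported XGBoost settings.
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from xgboost import XGBClassifier

# Turn per-case code lists, e.g. [["42702"], ["42725", "42740"]], into a
# binary indicator matrix with one column per MBS code.
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)

base = XGBClassifier(
    n_estimators=200,            # 200 estimators
    max_depth=6,                 # maximum tree depth of 6
    min_child_weight=1,          # minimum child weight of 1
    learning_rate=0.3,           # learning rate of 0.3
    sampling_method="uniform",   # uniform sampling method
)
model = MultiOutputClassifier(base)

# 5-fold cross-validation on the training set; "accuracy" on a
# multi-label indicator matrix is the exact-match (subset) accuracy.
scores = cross_val_score(model, X_train, Y_train, cv=5, scoring="accuracy")
print(scores.mean())
model.fit(X_train, Y_train)
```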

The primary outcome was the classification accuracy in the categorisation of the five most commonly coded procedures in the holdout test dataset (as a multi-label classification task). As a secondary outcome, the best performing model for this task was then applied to the classification of all procedures for which there were five or more cases (multi-label classification task).
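The sketch below illustrates how these outcomes could be computed on the holdout set, reusing model and mlb from the previous sketch; the variable names are assumptions, and the per-code counts mirror the layout later reported in Table 3:

```python
# Holdout evaluation sketch: exact-match (subset) accuracy for the
# multi-label task, plus per-code TN/FP/FN/TP counts as in Table 3.
from sklearn.metrics import accuracy_score, multilabel_confusion_matrix

Y_test = mlb.transform(y_test)
Y_pred = model.predict(X_test)
print("subset accuracy:", accuracy_score(Y_test, Y_pred))

for code, cm in zip(mlb.classes_, multilabel_confusion_matrix(Y_test, Y_pred)):
    tn, fp, fn, tp = cm.ravel()  # each 2x2 matrix is [[tn, fp], [fn, tp]]
    print(f"{code}: TN={tn} FP={fp} FN={fn} TP={tp}")
```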

Exploratory Financial Analysis

In the test dataset, the total reimbursement was calculated for the manually labelled codes, the coding that was claimed (as determined from the hospital theatre system, ORMIS), and the correctly labelled machine learning classifications. These values enabled the calculation of differences in reimbursement between the three approaches.
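A minimal sketch of this comparison follows; MBS_FEE is a hypothetical fee lookup with illustrative values only (real scheduled fees come from the MBS), and the three per-case code lists are assumed variables:

```python
# Exploratory financial comparison: total and per-case reimbursement
# under each coding approach. Fee values below are illustrative only.
MBS_FEE = {"42702": 774.45, "42725": 1175.05, "42809": 395.10}

def total_reimbursement(cases: list[set[str]]) -> float:
    """Sum scheduled fees over per-case sets of MBS codes."""
    return sum(MBS_FEE.get(code, 0.0) for codes in cases for code in codes)

for name, cases in [("manual labels", manual_codes),
                    ("claimed (ORMIS)", claimed_codes),
                    ("ML predictions", predicted_codes)]:
    total = total_reimbursement(cases)
    print(f"{name}: ${total:,.2f} (${total / len(cases):,.2f} per case)")
```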

Results

Patient Characteristics

The mean age of the cohort was 65.5 years (SD 16.0), and 482 patients were female (48.2%). The five most common procedures were “lens extraction and insertion of intraocular lens” (code 42702) (374 cases), “vitrectomy via pars plana sclerotomy” (code 42725) (298 cases), “retina, photocoagulation of” (code 42809) (149 cases), “glaucoma, filtering operation for” (code 42746) (56 cases), and “intravitreal injection” (code 42740) (49 cases). The list of all procedures that occurred in five or more instances is detailed in Table 1.

Table 1.

Procedures included in dataset, which occurred in 5 or more cases

Procedure code Procedure description Total number of occurrences in dataset
42702 Cataract surgery 374
42725 Vitrectomy 298
42809 Laser photocoagulation 149
42746 Trabeculectomy 56
42740 Intravitreal injection 49
45623 Ptosis repair 29
42653 Corneal transplantation 25
42776 Scleral buckling 24
31356 Malignant skin excision 22
42815 Silicone oil or liquid removal 21
42641 Auto-conjunctival transplant 17
42698 Lens extraction 16
42686 Removal of pterygium 15
45617 Blepharoplasty 14
72855 Frozen section examination (1 section) 13
42801 Radioactive plaque insertion 13
42503 Ophthalmological examination under anaesthesia 12
42623 Dacryocystorhinostomy 12
42738 Paracentesis of anterior chamber or vitreous cavity 11
42719 Removal of vitreous or capsular or lens material 10
42802 Radioactive plaque removal 10
42833 Squint operation 9
42551 Repair of penetrating wound or rupture of eye 8
42731 Lensectomy combined with vitrectomy 8
42701 Intraocular lens insertion 7
42704 Intraocular lens removal or repositioning 7
45451 Full thickness skin grafts 7
45626 Correction of ectropion or entropion 7
42509 Enucleation of eye 7
42710 Removal of intraocular lens and replacement with posterior chamber lens or scleral fixation 6
42533 Exploration of orbit with drainage or biopsy 6
42818 Cryotherapy 6
72856 Frozen section examination (2–4 sections) 5
42584 Repair of rupture extraocular muscle or medial palpebral ligament 5

Machine Learning Performance for Operation Note Coding

The XGBoost model achieved the highest classification accuracy (88.0%) in the multi-label classification of the five most common procedures (primary outcome) in the test dataset (of 200 individuals). The random forest and logistic regression models returned accuracies of 85.5% and 77.0%, respectively. With respect to the secondary outcome, when XGBoost was applied as a multi-label classification task across all procedure types with five or more cases, it achieved an accuracy of 75.5%. Examples of misclassifications are outlined in Table 2, while procedure-by-procedure classification performance is detailed in Table 3.

Table 2.

Examples of misclassifications

Procedure to be classified as present or absent Type of misclassification Operation note Gold-standard coding
42702 False positive Right IOL rotation to 1 degree
Prep drape
Mendez ring used to assess the current IOL axis position
Main incision from previous surgery at 35 degrees
Main incision reopened using keratome
Viscoelastic used to open the capsular bag, no issues
IOL rotated to a new position at 1 degree
IOL axis confirmed with Mendez ring
Aspiration irrigation of viscoelastic
Incision hydrated
Cefazolin 1 mg/0.1 mL into the AC.
Incision stable
Chlorsig and maxidex to the right eye
Pad and shield
42704
42725 False positive LE Vitritis? PCNSL – LE Phaco/IOL/25GPPV/Vitreous Biopsy
LE Routine Phaco + IOL (small pupil)
3 × 25 G ports in vitreous biopsy
PVD present, checked with triamcinolone
Int search, ports out SC Cef and Dex
Pad
42702
42738
42740
42809 False positive Right Cataract and Diabetic VH-RE Phaco/IOL/PPV/Laser/Avastin
Eye cleaned and draped
Routine phaco + IOL
3 × 25 G ports in
PPV. pvd present
Internal search
Fill in PRP
Partial fax
IVI Avastin
Ports out. SC Cef and Dex
Pad
Intraop findings: VH, no areas of traction or obvious NVD/NVE
42702
42725
42740
42809
42740 False positive R phaco + IOL with Iris hooks without complications
Cefazolin IC
0.05 kenacort intravitreal
42702
42740
42746 False positive Nil N/A
42702 False negative LE cataract + SiO filled eye post RD + CMO – LEPhaco/IOL/Removal of SiO/IVTA
Eye prepped with Betadine and draped
2 corneal incisions made, synechiolysis with cannula and viscoelastic
Anterior capsule stained with brilliant blue, 5x iris hooks
Capsulorrhexis, hydrodissection
Phacoemulsification of nucleus, removal of cortex with IA probe
IOL into bag
3 × 25g ports in, removal of SiO, multiple FAX, IVTA 0.1 mL
Removal of iris hooks, removal of Viscoelastic, sclerotomy closed with Vicyl 7/0, IC Cef
Hydration of self-sealing Corneal wounds, SC Cef and Dex
Eye cleaned, pad and shield
42702
42725
42725 False negative Eye prep with Betadine and draped
360 conjunctival peritomy, recti muscles isolated and slinged
Scleral buckle applied and sutured to sclera with Nylon 5/0
Corneal sections, 4 x iris hooks, phacoemulsification, no lens injected
3 × 25 G 6 mm ports inserted
Funnel retinal detachment, with giant retinal tear temporally
Vitrectomy with base dissection with the assistance of triamcinolone stain, membranes stained with membrane blue and peeled
360 retinectomy done close to the arcade vessels after 360 endodiathermy
Dislocation of macula, tried repositioning with tano brush, PFCL to flatten the retina, FAX, Endolaser 360, Reformation of AC with BSS, iris hooks removed
Silicone oil 5500 cst injected, sclerotomy closed with Vicryl 7/0
Scleral buckle removed
Conj closed with Vicryl 7/0
Subtenon ropivacaine 0.5% 5 mLs
RE examined with BIO with indentation, 2 areas of retinal tear, lasered from before. no new breaks
42776
42698
42725
42740
42809
42809 False negative Left PDR with VH and TRD-LE PPV/Laser/Segmentation/Avastin/SiO 1,300 cst
Eye prep with Betadine and draped
3 × 27 G ports inserted, PVD not present
Vitrectomy, segmentation of tractional membranes
Intraop findings: solitary tractional membrane nasal to disc, broad tractional membrane superiorly with massive exudations subretinally extending to 1/2 disc diameter from fovea. No breaks
PRP laser
Fluid-air exchange, intravitreal avastin, Silicone Oil 1300 cst injected
Ports out, no leak from sclerotomy
Subconj Cef and Dex
Eye cleaned, pad and shield applied
42725
42809
42740 False negative R eye 25g PPV + laser +air for VH secondary to PDR
3 ports in PPV + PVD checked with triamcinolon
Tag on superior arcade and nasally to a fibrovascular old pannus not bleeding
360 search laser top up inferiorly and temp
FAX
AVASTIN ic
3 ports out/sealed
Cef and Dex subconj
42725
42740
42746 False negative RIGHT trabeculectomy with mitomycin C 0.02%
Betadine prep, drape, and speculum (Ong)
Superior peritomy, conjunctiva and tenon's capsule undermined
MMC soaked sponge applied for 2 min
Saline irrigation to area
Cautery to scleral vessels
4 × 3 mm half thickness square scleral flap
Paracentesis
Trabeculectomy with Kelly punch.
Peripheral iridectomy with de Wecker scissors
10-0 nylon to flap
BSS to reform AC – flap filtering.
10-0 nylon to close conjunctiva – wound secure
Subconj dexamethasone
Atropine
Pad and shield
42746

Table 3.

Procedure-wise classification performance when the XGBoost model was applied to the classification of the test set for all procedures that occurred on five or more occasions

Procedure True negative False positive False negative True positive
42702 125 1 1 73
42725 141 6 2 51
42809 169 4 6 21
42746 183 0 3 14
42740 189 3 3 5
45623 196 0 1 3
42653 193 0 1 6
42776 196 0 1 3
31356 197 0 0 3
42815 199 0 0 1
42641 196 0 0 4
42698 195 0 4 1
42686 196 0 0 4
45617 198 0 2 0
72855 197 0 1 2
42801 196 0 2 2
42503 200 0 0 0
42623 194 0 1 5
42738 196 0 4 0
42719 200 0 0 0
42802 197 0 2 1
42833 197 0 1 2
42551 198 0 1 1
42731 199 0 1 0
42701 197 1 2 0
42704 199 0 1 0
45451 198 0 2 0
45626 199 0 0 1
42509 198 0 1 1
42710 198 0 1 1
42533 199 0 1 0
42818 199 0 1 0
72856 200 0 0 0
42584 198 1 0 1

The MBS coding that was entered into the hospital theatre system (ORMIS), reflecting the claims made by clinicians and administrative teams, was examined for the entire dataset. Among 1,000 operation notes, only 539 (53.9%) cases were coded completely correctly.

Financial Analysis of Machine Learning Application to Operation Note Coding

In the test dataset (of 200 individuals), the total MBS reimbursement for the procedures, calculated using the manual labels, was $214,527.50 ($1,072.64 per case). In contrast, the MBS reimbursement based on the previously entered coding was $199,498.30 ($997.49 per case) which includes incorrect coding that could falsely increase the reimbursement. The total MBS reimbursement for procedures accurately labelled by machine learning was $184,689.45 ($923.45 per case).

Discussion

The documentation of operation notes as electronic records opens new possibilities for research with the use of artificial intelligence. Application of NLP and machine learning can assist clinicians and coders in categorising operations into their dedicated coding category. The current coding system has inherent bias and error due to its reliance on human input and interpretation. The XGBoost model was the most accurate model, with 88.0% classification accuracy for the five most common ophthalmology procedures: cataract surgery, vitrectomy, laser coagulation, trabeculectomy, and intravitreal injections. A cost analysis of the generated coding demonstrated that the algorithm was able to generate $923.45 per case, compared with $1,072.64 per case when the process was undertaken manually. Our findings suggest that incorporating NLP and machine learning technology into clinical coding has the potential to improve accuracy, save time, and generate appropriate reimbursements, creating a high-quality database in healthcare.

NLP technology has been applied in diverse ways within the field of ophthalmology, including the extraction of microbial keratitis measurements [13], the identification of open globe injury [14], and early detection of multiple sclerosis [15]. It has also been used to develop predictive models of cataract surgery complications by associating risk factors that were described in the EMR [16], as well as to triage ophthalmology outpatient referrals [17]. NLP has also been applied to aspects of clinical coding [18, 19], although this has thus far been limited to proprietary software or for detection of pathology through diagnostic codes. There has been limited research regarding open-source mechanisms for performing this task.

Recent studies have demonstrated the utility of NLP in extracting specific information from surgical notes. Wyles et al. [20] developed algorithms to identify three common data elements in total hip arthroplasty: operative approach, fixation method, and bearing surface. A separate algorithm was devised for each variable. The training datasets comprised 250, 467, and 300 notes, while the test datasets included 250, 291, and 284 notes for the operative approach, fixation method, and bearing surface, respectively. The algorithms achieved accuracies of 99.2%, 90.7%, and 95.8%, respectively. The system underwent external validation with 422 operative notes from other hospital systems, of which 242 were used for refinement of the existing algorithms. The final performance was measured on the remaining 180 notes, achieving accuracies of 94.4%, 95.6%, and 98% for identifying the operative approach, fixation method, and bearing surface. Liu et al. [21] used NLP to ascertain key variables, such as intracameral antibiotic injections and posterior capsular rupture (PCR), in cataract surgeries. The NLP tool achieved positive and negative predictive values exceeding 99% for operation notes involving intracameral antibiotic injections and greater than 94% for notes involving PCR, demonstrating the feasibility of NLP in detecting key features within operation notes.

Our study demonstrated that only 53.9% of MBS codes were entered correctly, most of which were uncomplicated single-coded procedures such as isolated cataract surgery or trabeculectomy. Coding of more complex procedures, such as vitrectomy, was challenging due to the difficulty of multi-labelling and interpreting complex ophthalmology operation notes. These factors contribute to inaccuracies and misclassifications in coding, which can lead to decreased funding allocation and underestimation of unit activity. Our data revealed that the average reimbursement was $997.49 per case, compared with the gold standard of $1,072.64 per case. This discrepancy is largely due to under-coded activity, and the figure already accounts for incorrectly applied codes that inflate funding. Accurate unit activity is essential for government funding policies, distribution of workforce and funding, and understanding disease epidemiology, as well as for research and training purposes.

A systematic review by Nouraei et al. [22] of 30,127 patients across multiple surgical disciplines found that 51% of patients required at least one change to their original coding after audit review. In 12% of cases, the recorded procedures required a change, and 17% of cases had their reimbursement category changed, amounting to a financial difference of £3,974,544. The highest proportion of changes in this study was seen in ophthalmology, in which 22% of pre-audit coding required a change post-audit. Clinical coding of oculoplastic procedures has been shown to have an accuracy of 30.7%, with errors attributed to clinician factors including a lack of awareness of coding issues, lack of training and minimal exposure to clinical coding, diagnostic uncertainty, and illegible handwriting. Coder factors that contribute to inaccurate coding include dependence on accurate documentation and difficulty in interpreting individual abbreviations and speciality-specific terminology [23].

Despite more hospitals emphasising the importance of clinical coding and delivering educational sessions, clinical coding remains an issue in tertiary hospitals. Although there is evidence to suggest that medical team input can generate more accurate clinical coding, the reality is that medical teams are overburdened, with long wait times for patients [24], and additional coding duties negatively impact physician productivity. In addition, there are greater incentives, both professional and moral, to prioritise delivering patient care, and clinicians may be unwilling to assume data entry roles unless there are immediate returns. Individual factors, such as limited technological skills and a lack of interest in academia or data quality, are barriers to clinicians producing accurate coding. Medical documentation also plays a significant role in clinician-to-clinician handover of patients, leading to specialised notes and complex clinical descriptions that are difficult for coders to categorise.

Previous research investigating the use of machine learning in medicine has shown its diverse potential to automate and improve healthcare. Research by Mahendra et al. [25] investigated the use of random forest and neural network models to predict inpatient mortality in the intensive care unit (ICU) based on medical documentation. This research highlights the issue of creating machine learning algorithms from limited institutional data, as this impacts the external generalisability of the model and its performance in analysing notes from other ICU departments [25]. After selection of the training dataset, standardisation and labelling of data can vary between individuals and require specialised medical knowledge, which can limit the development of accurate algorithms. Lu et al. [26] also highlighted the effectiveness of using language models to extract key information from medical documentation but criticised the inadequate semantic interpretation of the extracted data. Although its applications vary, the cross-sectional nature of training data can limit its use in predicting the entire medical journey of each individual. Following the development of algorithms, the cost of implementation and integration into hospital documentation systems is a major barrier to utilising artificial intelligence to optimise healthcare. The universal application of successful algorithms is essential for positive outcomes in healthcare, even when experimental results are promising [27].

Integration of artificial intelligence to generate clinical coding is an area of interest among current researchers. Various studies have been conducted, ranging across different architectures within RNNs, CNNs, and machine learning on various datasets, with the main objective of improving coding accuracy while reducing the required time. Zhou et al. [28] experimented with regular expression (regexp) techniques on discharge summaries to generate codes in accordance with the International Classification of Disease 10th revision (ICD-10). Although their recall rates were between 23.67% and 27.90% and overall accuracy was approximately 41.19%, the time taken to complete the task was 2.1–2.4 s versus 213.3–272.2 s of manual coding for every ten discharge summaries. Other studies, summarised by Teng et al. [29], used CNN and autoencoder techniques to code according to the Swiss Operation Classification System (CHOP), another complex coding system consisting of 18 categories and over 14,000 codes; a CNN with embedding techniques recorded the highest F1 score of 60.86%. Deep learning models trained on publicly available datasets, such as MIMIC-III, recorded variable levels of success in applying ICD-10 codes [30–33], although the results did not validate AI use for persistent performance. BERT models have been successful in predicting clinical codes, with variant models trained on medical content (BlueBERT) achieving the greatest AUCs of 89.4–92.0 [34].

There are multiple components that complicate the automation of clinical coding and need to be addressed for future research to succeed. The initial selection of data to train algorithms is often inefficient, with limited availability of gold-standard coded hospital data [35]. Algorithms trained on publicly accessible datasets (such as MIMIC-III) are prone to error, which limits the success of the trained algorithm. Furthermore, clinical documentation that is selected for training purposes varies in structure between clinicians, can be incomplete, and includes personalised notations that create semantic ambiguities challenging the architecture that is employed [28]. Complex and dynamic classification systems, such as the ICD-10, often cause difficulties for algorithms, and even for professionally trained medical coders. Approximately 70,000 codes for ICD-10 and 1.6 million diagnostic codes for ICD-11 can be applied in multiple ways to describe a clinical situation [36]. There are also limitations to the architecture employed: BERT can only process up to 512 tokens, while medical discharge summaries average approximately 1,500 tokens, which limits its effectiveness in clinical coding [35]. Implementation in practice is another major limitation, as it involves significant deployment costs, novel interactions between coders and AI-based systems, and the risk of negligence due to overreliance on computational coding, which may lead to errors and omissions.
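The 512-token constraint can be seen directly with a BERT tokeniser; the sketch below assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint (not the exact setup of any study discussed here), with long_note as a hypothetical discharge summary string:

```python
# Sketch: BERT-style tokenisers hard-cap sequences at 512 tokens, so
# anything beyond that is silently discarded when truncation is enabled.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tok(long_note, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # <= 512, regardless of the note's length
```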

Our research investigated the use of NLP and machine learning algorithms in the interpretation of operation notes to assign procedural MBS codes. Operation notes are often abbreviated and challenging to decipher. Clinicians can analyse these notes, and their brevity compared with longer discharge summaries makes them a suitable dataset for the successful application of NLP and machine learning. The success of our NLP model training was directly correlated with the frequency of each procedure in the dataset. When the model was applied to all procedures that occurred five or more times, the accuracy decreased to 75.5%. This discrepancy could be accounted for by the complexity of the NLP task, in which the model struggled to differentiate between less frequently occurring procedures.

With many clinicians unaware of the significance of uncoded unit activity, the reality is that current clinical coding has major room for improvement. This is where we suggest the utility of machine learning and NLP to strengthen the accuracy of clinical coding. Although the complete removal of error from clinical coding will be difficult due to diagnostic uncertainty and variation in the interpretation of clinical context, implementation of a combined human and machine learning approach could further improve the portrayal of unit activity. The reduction of under-coded activity allows for more accurate resource and finance distribution to the relevant department. This additional funding can allow departments to equip themselves with higher quality technology, staffing, and education to ultimately improve the delivery of patient care.

Conclusion

NLP has been successful in classifying ophthalmic operation notes into MBS coding categories. The use of open-source mechanisms for performing this task has received limited research attention. Our XGBoost model demonstrated an accuracy of 88.0% for the top five procedures: cataract surgery, vitrectomy, laser coagulation, trabeculectomy, and intravitreal injections. Our results demonstrate the potential for surgeons and coders to rely on NLP technology to assign the correct billing codes. We propose applying this algorithm to completed operation notes to prompt the generation of MBS codes for surgeons and the coding team to subsequently review. This algorithm could also be applied to operations processed by coding teams to highlight potentially missed codes caused by a lack of understanding of speciality-specific terminology and procedures. Combining human and machine-led approaches can expedite procedural coding with greater accuracy, leading to more accurate logging of unit activity for funding and research purposes. Surgeons will be able to prioritise patient care and invest time in more detailed operation notes, while further funding for more advanced technology and staffing will also optimise patient care.

A limitation of the study that impacts the accuracy of the algorithm is the inclusion of operation notes from only two sites. Variation in individual surgeons’ operation notes, as well as institutional report structures, can affect the structure of the EMR documentation, potentially leading to discrepancies in the algorithm’s output. The use of variable abbreviations and misspellings may also impact the algorithm’s accuracy, as the success of NLP is heavily dependent on accurate operation notes. Surgeon preferences may also lead to discrepancies in coding, as in cases where “limbal or pars plana lensectomy combined with vitrectomy” (code 42731) may be coded as two separate operations: “lens extraction and insertion of intraocular lens” (code 42702) and “vitrectomy via pars plana sclerotomy” (code 42725). Our study did not consider the specific surgeon who completed each operation note. There are also limitations to the deployment of this algorithm into clinical practice, including costs and feasibility, such as the training of staff members and ensuring the necessary infrastructure to support its use. The NLP model produced is specific to the English language within the MBS coding system.

Further advancement of this technology would involve studies that focus on training and coding of rarer procedures and expansion of the dataset to include multiple ophthalmology departments across Australia. This would increase the generalisability of the model, possibly at the expense of accuracy. Ongoing algorithm refinement and external validation will be essential to improve the current model and create an automated, streamlined process. Future studies could also consider input data from individual surgeons to create a user-specific NLP algorithm that adapts to the language and reporting style of the surgeon. International institutions will require local expertise to implement NLP appropriate to the coding services provided in their preferred language. Expanding the application of this technology to other medical specialities could lead to a more automated and streamlined coding process for healthcare providers.

Statement of Ethics

The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee(s) and with the Helsinki Declaration (as revised in 2013). This study protocol was reviewed and approved by the committee of Central Adelaide Local Health Network (CALHN) Research Services and Royal Adelaide Hospital, approval number 14372. Consent is not required for this study in accordance with local or national guidelines.

Conflict of Interest Statement

The authors have no conflicts of interest to declare.

Funding Sources

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author Contributions

For this paper, the main contributions are as follows: (1) Yong Min Lee was involved in data collection, analysis, and manuscript production; (2) Stephen Bacchi was involved in data analysis and processing and manuscript production; (3) Carmelo Macri was involved in data collection and manuscript production; (4) Yiran Tan and Robert J. Casson were involved in project supervision and manuscript production; (5) Weng Onn Chan was involved in data collection, project supervision, and manuscript production.


Data Availability Statement

Data for this project are secured on a hospital network that requires authorised access to maintain confidentiality of all patients involved in the study. Data are not publicly available due to ethical reasons. Further enquiries can be directed to the corresponding author.

References

  • 1. MBS Online: Medicare Benefits Schedule. Australian Government Department of Health. Available from: http://www9.health.gov.au/mbs/search.cfm.
  • 2. Lauriola I, Lavelli A, Aiolli F. An introduction to deep learning in Natural Language processing: models, techniques, and tools. Neurocomputing. 2022;470:443–56. 10.1016/j.neucom.2021.05.103. [DOI] [Google Scholar]
  • 3. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their Compositionality. 2013 Oct 1. [arXiv:1310.4546 p.]. Available from: https://ui.adsabs.harvard.edu/abs/2013arXiv1310.4546M. [Google Scholar]
  • 4. Cormack G, Gomez Hidalgo J, Sanz E. Spam filtering for short messages. 2007. p. 313–20.
  • 5. Chen Y, Xia R, Zou K, Yang K. FFTI: image inpainting algorithm via features fusion and two-steps inpainting. J Vis Commun Image Representation. 2023;91:103776. 10.1016/j.jvcir.2023.103776. [DOI] [Google Scholar]
  • 6. Chen Y, Liu L, Phonevilay V, Gu K, Xia R, Xie J, et al. Image super-resolution reconstruction based on feature map attention mechanism. Appl Intell. 2021;51(7):4367–80. 10.1007/s10489-020-02116-1. [DOI] [Google Scholar]
  • 7. Banerjee I, Ling Y, Chen MC, Hasan SA, Langlotz CP, Moradzadeh N, et al. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification. Artif Intell Med. 2019;97:79–88. 10.1016/j.artmed.2018.11.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Tanaka H, Shinnou H, Cao R, Bai J, Ma W. Document classification by word embeddings of BERT. Comput Lings. Singapore: Springer; 2020. [Google Scholar]
  • 9. Khurana D, Koli A, Khatter K, Singh S. Natural language processing: state of the art, current trends and challenges. Multimed Tools Appl. 2023;82(3):3713–44. 10.1007/s11042-022-13428-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Dhola K, Saradva M, editors. A comparative evaluation of traditional machine learning and deep learning classification techniques for sentiment analysis. 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence); 2021 Jan 28–29. [Google Scholar]
  • 11. Tan KL, Lee CP, Anbananthen KSM, Lim KM. RoBERTa-LSTM: a hybrid model for sentiment analysis with transformer and recurrent neural network. IEEE Access. 2022;10:21517–25. 10.1109/access.2022.3152828. [DOI] [Google Scholar]
  • 12. Bird S, Klein E, Loper E. Natural Language Processing with Python. O’Reilly Media Inc.; 2009. [Google Scholar]
  • 13. Maganti N, Tan H, Niziol LM, Amin S, Hou A, Singh K, et al. Natural Language processing to quantify microbial keratitis measurements. Ophthalmology. 2019;126(12):1722–4. 10.1016/j.ophtha.2019.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Apostolova E, White HA, Morris PA, Eliason DA, Velez T. Open globe injury patient identification in warfare clinical notes. AMIA Annu Symp Proc. 2017;2017:403–10. [PMC free article] [PubMed] [Google Scholar]
  • 15. Chase HS, Mitrani LR, Lu GG, Fulgieri DJ. Early recognition of multiple sclerosis using natural language processing of the electronic health record. BMC Med Inform Decis Mak. 2017;17(1):24. 10.1186/s12911-017-0418-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Gaskin GL, Pershing S, Cole TS, Shah NH. Predictive modeling of risk factors and complications of cataract surgery. Eur J Ophthalmol. 2016;26(4):328–37. 10.5301/ejo.5000706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Tan Y, Bacchi S, Casson RJ, Selva D, Chan W. Triaging ophthalmology outpatient referrals with machine learning: a pilot study. Clin Exp Ophthalmol. 2020;48(2):169–73. 10.1111/ceo.13666. [DOI] [PubMed] [Google Scholar]
  • 18. Wadia R, Akgun K, Brandt C, Fenton BT, Levin W, Marple AH, et al. Comparison of Natural Language processing and manual coding for the identification of cross-sectional imaging reports suspicious for lung cancer. JCO Clin Cancer Inform. 2018;2:1–7. 10.1200/CCI.17.00069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Banerji A, Lai KH, Li Y, Saff RR, Camargo CA Jr, Blumenthal KG, et al. Natural Language processing combined with ICD-9-CM codes as a novel method to study the epidemiology of allergic drug reactions. J Allergy Clin Immunol Pract. 2020;8(3):1032–8.e1. 10.1016/j.jaip.2019.12.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Wyles CC, Tibbo ME, Fu S, Wang Y, Sohn S, Kremers WK, et al. Use of Natural Language processing algorithms to identify common data elements in operative notes for total hip arthroplasty. J Bone Joint Surg Am. 2019;101(21):1931–8. 10.2106/JBJS.19.00071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Liu L, Shorstein NH, Amsden LB, Herrinton LJ. Natural language processing to ascertain two key variables from operative reports in ophthalmology. Pharmacoepidemiol Drug Saf. 2017;26(4):378–85. 10.1002/pds.4149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Nouraei SA, Hudovsky A, Frampton AE, Mufti U, White NB, Wathen CG, et al. A study of clinical coding accuracy in surgery: implications for the use of administrative big data for outcomes management. Ann Surg. 2015;261(6):1096–107. 10.1097/SLA.0000000000000851. [DOI] [PubMed] [Google Scholar]
  • 23. Juniat V, Athwal S, Khandwala M. Clinical coding and data quality in oculoplastic procedures. Eye. 2019;33(11):1733–40. 10.1038/s41433-019-0475-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Mahbubani K, Georgiades F, Goh EL, Chidambaram S, Sivakumaran P, Rawson T, et al. Clinician-directed improvement in the accuracy of hospital clinical coding. Future Healthc J. 2018;5(1):47–51. 10.7861/futurehosp.5-1-47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Mahendra M, Luo Y, Mills H, Schenk G, Butte AJ, Dudley RA. Impact of different approaches to preparing notes for analysis with Natural Language processing on the performance of prediction models in intensive care. Crit Care Explor. 2021;3(6):e0450. 10.1097/CCE.0000000000000450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Lu Z, Sim JA, Wang JX, Forrest CB, Krull KR, Srivastava D, et al. Natural Language processing and machine learning methods to characterize unstructured patient-reported outcomes: validation study. J Med Internet Res. 2021;23(11):e26777. 10.2196/26777. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng. 2022;6(12):1330–45. 10.1038/s41551-022-00898-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Zhou L, Cheng C, Ou D, Huang H. Construction of a semi-automatic ICD-10 coding system. BMC Med Inform Decis Mak. 2020;20(1):67. 10.1186/s12911-020-1085-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Teng F, Liu Y, Li T, Zhang Y, Li S, Zhao Y. A review on deep neural networks for ICD coding. IEEE Trans Knowl Data Eng. 2022:1. 10.1109/tkde.2022.3148267. [DOI] [Google Scholar]
  • 30. Xie X, Xiong Y, Yu PS, Zhu Y. EHR coding with multi-scale feature attention and structured knowledge graph propagation. Proceedings of the 28th ACM International Conference on information and knowledge management. Beijing, China: Association for Computing Machinery; 2019. p. 649–58. [Google Scholar]
  • 31. Huang J, Osorio C, Wicent Sy L. An empirical evaluation of deep learning for ICD-9 code assignment using MIMIC-III clinical Notes. 2018 Feb 1. [arXiv:1802.02311 p.]. Available from: https://ui.adsabs.harvard.edu/abs/2018arXiv180202311H. [DOI] [PubMed] [Google Scholar]
  • 32. Li F, Yu H. ICD coding from clinical text using multi-filter residual convolutional neural Network. 2019 Nov 1. [arXiv:1912.00862 p.]. Available from: https://ui.adsabs.harvard.edu/abs/2019arXiv191200862L. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Yogarajan V, Montiel J, Smith T, Pfahringer B. Seeing the whole patient: using multi-label medical text classification techniques to enhance predictions of medical Codes. 2020 Mar 1. [arXiv:2004.00430 p.]. Available from: https://ui.adsabs.harvard.edu/abs/2020arXiv200400430Y. [Google Scholar]
  • 34. Ji S, Holtta M, Marttinen P. Does the magic of BERT apply to medical code assignment? A quantitative study. Comput Biol Med. 2021;139:104998. 10.1016/j.compbiomed.2021.104998. [DOI] [PubMed] [Google Scholar]
  • 35. Dong H, Falis M, Whiteley W, Alex B, Matterson J, Ji S, et al. Automated clinical coding: what, why, and where we are? NPJ Digit Med. 2022;5(1):159. 10.1038/s41746-022-00705-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. ICD-11: International classification of disease 11th revision. World Health Organisation; 2023. Available from: https://icd.who.int/en. [Google Scholar]


