Gross motor function prediction using natural language processing in cerebral palsy

Kelly Greve; Yizhao Ni; Amy F Bailes; Jilda Vargus-Adams; Aimee E Miley; Bruce Aronow; Mary M McMahon; Brad G Kurowski; Alexis Mitelpunkt

doi:10.1111/dmcn.15301

Author manuscript; available in PMC: 2024 Jan 1.

Published in final edited form as: Dev Med Child Neurol. 2022 Jun 5;65(1):100–106. doi: 10.1111/dmcn.15301


PMCID: PMC9720038 NIHMSID: NIHMS1811744 PMID: 35665923

Abstract

Aim:

To predict ambulatory status and Gross Motor Function Classification System (GMFCS) levels in patients with cerebral palsy (CP) by applying natural language processing (NLP) to electronic health record (EHR) clinical notes.

Method:

Individuals aged 8 to 26 years with a diagnosis of CP in the EHR between January 2009 and November 2020 (~12 years of data) were included in a cross-sectional retrospective cohort of 2483 patients. The cohort was divided into train-test and validation groups. Positive predictive value, sensitivity, specificity, and area under the receiver operating curve (AUC) were calculated for prediction of ambulatory status and GMFCS levels.

Results:

The median age was 15 years (interquartile range 10–20 years) for the total cohort, with 56% being male and 75% White. The validation group resulted in 70% sensitivity, 88% specificity, 81% positive predictive value, and 0.89 AUC for predicting ambulatory status. NLP applied to the EHR differentiated between GMFCS levels I–II and III (15% sensitivity, 96% specificity, 46% positive predictive value, and 0.71 AUC); and IV and V (81% sensitivity, 51% specificity, 70% positive predictive value, and 0.75 AUC).

Interpretation:

NLP applied to the EHR demonstrated excellent differentiation between ambulatory and non-ambulatory status, and good differentiation between GMFCS levels I–II and III; and IV and V. Clinical use of NLP may help to individualize functional characterization and management.


Cerebral palsy (CP) is the most common non-progressive movement and posture disorder in children.^1,2 A critical part of children’s care is to characterize their function over time. Currently there is variability in how function is tracked over time, dependent on clinician specialty and expertise in CP. A unique way to standardize tracking of function is application of natural language processing (NLP) to the electronic health record (EHR).^3,4 NLP is an area of artificial intelligence focused on training computer programs to understand written human language in a way that a computer comprehends (i.e. converting components of human language into code) but maintains the original intent and meaning of the human language. NLP provides a standardized way of analyzing unstructured and text-heavy data, such as EHR notes. Subsequently, efficient analyses of these data can allow providers to consult a broader set of data in their clinical decision making. Previously, NLP has been applied to the EHR to predict, identify, and classify a range of conditions across the lifespan.^5–13 The literature suggests that NLP could reduce burdens on providers and accelerate the pace of healthcare system processes.

Although physicians and therapists can use the Gross Motor Function Classification System (GMFCS), a validated and commonly accepted way to classify functional mobility severity for individuals with CP throughout childhood, it is not used consistently in clinical practice. A clear understanding of function, such as GMFCS, informs care delivery to maximize outcomes.^14–18 Inconsistent assessment of function across healthcare systems is a barrier to optimal care. Therefore, we sought to overcome the inconsistent use of GMFCS in clinical practice by applying NLP to descriptive data from the EHR to predict function at the level of the individual for children with CP. If successful, this novel approach would provide an automated and standardized way to define functional mobility and thereby individualize prognosis and clinical management. The objective of this paper was to describe the application of NLP to clinical notes in the EHR to predict ambulatory status (ambulatory and non-ambulatory) and GMFCS level in children with CP. Using approximately 12 years of EHR data, we hypothesized that application of NLP to clinical notes could be used to develop a prediction model to classify function, specifically ambulatory status and GMFCS levels for children with CP.

METHOD

Study design

A cross-sectional retrospective cohort design was used for this study. Inclusion criteria for the cohort were a diagnosis of CP, according to codes in the International Classification of Diseases (9th and 10th revisions), seen at a single tertiary medical center in southwest Ohio, USA, between January 2009 and November 2020. Inpatient and outpatient clinical notes (e.g. face-to-face visits) from the cohort were used for the NLP. An encounter represents multiple clinical notes related to an inpatient admission or outpatient interaction (e.g. in-person visit, telephone encounter, telemedicine encounter). This study was approved by the Institutional Review Board at Cincinnati Children’s Hospital Medical Center, with consent waived because of the nature of the registry database.

Annotations

Using structured documentation in the EHR and extraction methods based on text search templates, we extracted the GMFCS levels for patients in the cohort. Data from physical therapy and physical medicine and rehabilitation provider clinical notes were excluded to eliminate using direct GMFCS documentation during the model creation. The GMFCS levels are defined as follows: I (walks without limitations), II (walks with limitations), III (walks using a hand-held mobility device), IV (self-mobility with limitations, which may include power mobility), and V (transported in a manual wheelchair).¹¹ The levels were dichotomized according to the accepted standard, with categories of ambulatory (levels I–III) and non-ambulatory (levels IV and V).¹⁷ In cases when a patient had different GMFCS levels documented, the most frequently used level was selected. Using the levels within the cohort, we applied NLP to develop a model to predict ambulatory status and GMFCS levels at encounter and patient levels. The encounter level referred to one interaction of the patient and/or family with the clinical team (e.g. this could be an in-clinic patient visit or telephone interaction). Patient level referred to all encounters for the patient for the period in which the data were extracted. Encounter-level modeling referred to methods where we considered each visit (i.e. encounter) separately, and the patient-level modeling referred to the methods where we used data from all encounters linked to an individual patient (i.e. encounters for an individual were modeled as a group rather than independently).

Feature extraction and predictive modeling

We implemented an NLP pipeline in our earlier studies to extract information from clinical notes.^19–22 Using the pipeline, the clinical notes were first tokenized and lemmatized, and punctuation, pronouns, and other common words were removed. We then used n-grams and term frequency–inverse document frequency (TF–IDF). An n-gram is a sequence of n consecutive words in text. For example, ‘The sun is shining’ can be split into unigrams (n=1; ‘The’, ‘sun’, ‘is’, ‘shining’), bi-grams (n=2; ‘The sun’, ‘sun is’, ‘is shining’), tri-grams (n=3; ‘The sun is’, ‘sun is shining’), or a quadri-gram (n=4; ‘The sun is shining’). TF–IDF uses a mathematical formula to quantify the importance of a word (or n-gram) in a set of documents.²⁴ It weighs the frequency of a word in a specific document against its frequency across all documents in the data set. The more common the word is in the entire data set (i.e. across all documents), the closer its TF–IDF value will be to 0; and the more frequent the word is in a single document, the higher its TF–IDF value will be. The n-gram features (n≤2) captured both semantic and context information in the text. To prevent overfitting, features that occurred fewer than 1000 times and appeared in fewer than 100 notes were excluded. Cross-validation was completed to verify model performance and indicated similar performance between the train-test and validation settings.²³ The remaining n-grams were weighted with TF–IDF to represent their importance in the text.^24,25 Finally, the features were aggregated from clinical notes for each encounter. Using word2vec,²⁴ an NLP method, we represented patient status in each encounter as an array of semantic and contextual features.
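The n-gram and TF–IDF steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names (`ngrams`, `extract_features`, `tf_idf`), the whitespace tokenizer, the example notes, and the specific weighting (relative term frequency multiplied by the logarithm of the inverse document frequency) are assumptions chosen for illustration, and lemmatization, stop-word removal, and frequency thresholds are omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Lower-case, split on whitespace, and collect all n-grams with n <= max_n."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


def tf_idf(documents, max_n=2):
    """Weight every n-gram in every document by TF x log(N / document frequency)."""
    counts = [Counter(extract_features(doc, max_n)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())  # each document counts once per term
    weights = []
    for count in counts:
        total = sum(count.values())
        weights.append({
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in count.items()
        })
    return weights


# Hypothetical note fragments, invented for this example only.
notes = [
    "walks independently without device",
    "uses power wheelchair",
    "walks with walker",
]
weights = tf_idf(notes)
print(ngrams("The sun is shining".split(), 2))
print(sorted(weights[1].items(), key=lambda item: -item[1])[:3])
```

Under this weighting, a term that appears in every document receives a weight of exactly 0, which matches the behavior described in the text.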

Because ambulation status and GMFCS levels are important functional distinctions, a multi-label prediction task was formalized in two stages. Stage 1 was the prediction of ambulatory status (ambulatory vs non-ambulatory), while stage 2 was the prediction of GMFCS level (I–II vs III, and IV vs V). We formatted prediction of ambulatory status as a binary-class classification and implemented two machine-learning classifiers: (1) logistic regression with L1/L2 normalization, which measures the linear relation between linguistic features and prediction outcomes;²⁶ and (2) support vector machines with polynomial kernels, which construct hyperplanes in linear and nonlinear dimensions to separate the two classes.²⁷ As the best-performing models could not be determined a priori, we chose these standard classifiers to allow for the possibility of linear and nonlinear relations between linguistic features and prediction outcomes. The classifiers predicted probabilities of ambulatory status for each encounter, and the predictions were aggregated at patient level using sum aggregation to represent a patient’s GMFCS level. Encounter- and patient-level models were both explored; however, patient-level predictions were unlikely to change because GMFCS levels are stable over time for individuals with CP.^28–32 The same process was then applied to each ambulation status separately. Within the ambulatory group, prediction of GMFCS level was structured as a binary-class classification of levels I–II versus III. Within the non-ambulatory group, prediction of GMFCS level was structured as a binary-class classification between levels IV and V. A binary class was chosen over a multi-label target because there were too few individuals in each GMFCS level to allow adequate multi-level analysis and modeling. Additionally, clinical documentation text used to define GMFCS levels I and II was highly overlapping compared with text used to differentiate between levels III, IV, and V. For these reasons, the ability of NLP methodology to distinguish between GMFCS levels I and II was limited. The same predictive modeling process (logistic regression with L1/L2 normalization and support vector machines with polynomial kernels) was applied for encounter-level prediction. Patient-level prediction was constructed using a sum aggregation of encounter-level predictions.
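The sum aggregation and the two-stage routing can be sketched as follows. The text does not specify how the summed encounter-level predictions are converted into a patient-level label, so this sketch assumes one plausible reading: the class probabilities are summed over a patient's encounters and the class with the larger sum is chosen. The function names, the patient identifiers, and the probabilities are hypothetical.

```python
from collections import defaultdict


def aggregate_by_patient(encounter_predictions):
    """Sum encounter-level class probabilities per patient; keep the larger sum.

    encounter_predictions: iterable of (patient_id, p_positive) pairs, where
    p_positive is the classifier's probability for the positive class.
    Returns {patient_id: 1 if the positive class wins, else 0}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # [negative sum, positive sum]
    for patient_id, p_positive in encounter_predictions:
        sums[patient_id][0] += 1.0 - p_positive
        sums[patient_id][1] += p_positive
    return {pid: int(pos > neg) for pid, (neg, pos) in sums.items()}


def two_stage_labels(stage1, stage2_ambulatory, stage2_non_ambulatory):
    """Route each patient through stage 1, then the matching stage 2 model.

    stage1: {patient_id: 1 if non-ambulatory, else 0}
    stage2_ambulatory: {patient_id: 1 if level III, else 0 (levels I-II)}
    stage2_non_ambulatory: {patient_id: 1 if level V, else 0 (level IV)}
    """
    labels = {}
    for pid, non_ambulatory in stage1.items():
        if non_ambulatory:
            labels[pid] = "V" if stage2_non_ambulatory.get(pid) else "IV"
        else:
            labels[pid] = "III" if stage2_ambulatory.get(pid) else "I-II"
    return labels


# Hypothetical encounter-level probabilities of non-ambulatory status.
stage1 = aggregate_by_patient([("a", 0.9), ("a", 0.4), ("b", 0.2), ("b", 0.6)])
print(two_stage_labels(stage1, {"b": 1}, {"a": 0}))
```

With class probabilities that sum to 1, this rule is equivalent to thresholding the mean encounter probability at 0.5; other summation schemes (for example, summing log-odds) would weight confident encounters more heavily.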

The two-stage binary label prediction process was applied to the data. The data set was stratified by random sampling into two data sets (70% for train-test and 30% for validation). Tenfold cross-validation was applied on the train-test set to tune model parameters. During the process, parameters achieving the highest performance on the train-test set were considered optimal parameters for the models. The classifiers with optimal parameters were then applied to the validation set for performance comparison and error analysis.
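The 70%/30% split followed by tenfold cross-validation could be set up along the following lines. This is a sketch under stated assumptions: the text does not say which variable the random sampling was stratified on, so stratification by outcome label is assumed here, and the function names, seeds, and toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, validation_fraction=0.3, seed=0):
    """Split item indices into train-test and validation sets within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train_test, validation = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = round(len(indices) * validation_fraction)
        validation.extend(indices[:cut])
        train_test.extend(indices[cut:])
    return sorted(train_test), sorted(validation)


def k_folds(indices, k=10, seed=0):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# Toy labels: 60 items of class 0 and 40 items of class 1.
labels = [0] * 60 + [1] * 40
train_test, validation = stratified_split(labels)
for fit, held_out in k_folds(train_test):
    # Candidate parameters would be fitted on `fit` and scored on `held_out`;
    # the best setting is then applied once to `validation`.
    pass
print(len(train_test), len(validation))
```

Keeping the validation indices out of the cross-validation loop means the parameter search never sees the data used for the final performance comparison.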

Statistical analysis

Descriptive statistics were used to describe the cohort. Race and ethnicity were defined on the basis of commonly accepted standards for US-based research projects.³³ Independent t-tests and χ² tests were used to compare groups. Logistic regression modeling was used to calculate sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating curve (AUC).³⁴ The sensitivity, specificity, positive predictive value, negative predictive value, and AUC were determined using logistic regression modeling at the patient and encounter levels for the following binary dependent variables: ambulatory versus non-ambulatory status; GMFCS levels I–II versus III; and GMFCS level IV versus V.
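For reference, the reported performance measures can be computed from binary labels and classifier outputs as in the sketch below. The function names and the toy data are illustrative assumptions; the AUC is calculated as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example with four cases.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
y_pred = [int(s >= 0.6) for s in scores]
print(confusion_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

The sketch assumes each denominator is non-zero, that is, both classes are present and at least one case is predicted in each class.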

RESULTS

A total of 2621 patients with 789 246 encounters and 1 993 405 clinical notes were identified in the EHR for possible inclusion, with 2483 patients meeting inclusion criteria. The 138 patients who were excluded did not have notes within our selected note types to be included in the study. Because multiple notes can be associated with a single encounter, there were more notes than encounters (Figure S1). The median age of individuals in the cohort was 15 years (interquartile range [IQR] 10–20 years); 56% were male, 75% White, and less than 1% Hispanic. Clinical notes represented therapy visits, office visits, hospital encounters, special needs summaries, therapy progress summaries, telemedicine, and early intervention. Thirty-two percent of clinical specialists’ notes were excluded, including physical therapy (22.3%) and rehabilitation (9.7%). The train-test group (70% of the data) included 1974 patients from the cohort associated with 147 439 encounters and 325 577 clinical notes. The median age and demographics in the train-test group were similar to the total cohort: 15 years (IQR 10–20 years), with 56% male, 75% White, and less than 1% Hispanic. The validation group (30% of the data) consisted of 509 patients from the cohort associated with 32 305 encounters and 77 839 clinical notes. The patients in the validation group had a median age of 14 years (IQR 10–20 years), with 56% male, 75% White, and less than 1% Hispanic. No significant difference was found in demographics, ambulatory status, or GMFCS levels I–II and III between the train-test and validation groups. A significant difference (p<0.001) was found in the distribution of GMFCS levels IV and V between the train-test and validation groups (Table 1).

TABLE 1.

Cohort demographics

Cohort group	Total, n (%)	Train-test, n (%)	Validation, n (%)	p
Patient sample	2483 (100)	1974 (70)	509 (30)
Median age, years (IQR)	15 (10–20)	15 (10–20)	14 (10–20)	0.25
Sex				0.995
Male	1390 (56)	1105 (56)	285 (56)
Female	1093 (44)	869 (44)	224 (44)
Race^a				0.99
White	1862 (75)	1481 (75)	382 (75)
Non-White	621 (25)	493 (25)	127 (25)
Ambulatory	1486 (59.8)	1186 (60.1)	300 (58.9)	0.64
Non-ambulatory	997 (40.2)	788 (39.9)	209 (41.1)	0.64
GMFCS levels I–II	1230 (49.5)	984 (49.8)	246 (48.3)	0.69
GMFCS level III	256 (10.3)	202 (10.3)	54 (10.6)	0.69
GMFCS level IV	567 (22.9)	481 (24.3)	86 (16.9)	<0.001^b
GMFCS level V	430 (17.3)	307 (15.6)	123 (24.2)	<0.001^b


^a Ethnicity over 99% non-Hispanic.

^b Significance level is p<0.05.

Abbreviations: GMFCS, Gross Motor Function Classification System; IQR, interquartile range.

NLP and machine-learning-based model

An NLP-based classification model for ambulatory versus non-ambulatory status was initially applied at both the encounter and patient levels. Sixty percent of the encounters in the train-test set and 58.9% of the encounters in the validation set were labeled as ambulatory (i.e. GMFCS levels I, II, or III). NLP was then applied to encounters within the ambulatory group to classify GMFCS levels I–II (83% for train-test and validation) versus level III (17% for train-test and validation). NLP-based classification of encounters was also performed within the non-ambulatory group for GMFCS level IV (61% for train-test and 41% for validation) versus level V (39% for train-test and 59% for validation) (Table 2).

TABLE 2.

Results of train-test and validation sets at the encounter and patient levels

Model set	Train-test	Validation
Ambulatory versus non-ambulatory
Encounter level
Sensitivity	68.0%	70.4%
Specificity	88.6%	88.3%
Positive predictive value	79.8%	80.8%
Negative predictive value	80.7%	81.1%
Area under the receiver operating curve	0.88	0.89
Patient level
Sensitivity	84.2%	82.0%
Specificity	92.3%	92.4%
Positive predictive value	88.4%	88.0%
Negative predictive value	89.2%	88.3%
Area under the receiver operating curve	0.92	0.93
GMFCS levels I–II versus III
Encounter level
Sensitivity	17.3%	14.6%
Specificity	95.1%	96.3%
Positive predictive value	41.9%	45.6%
Negative predictive value	84.9%	84.3%
Area under the receiver operating curve	0.68	0.71
Patient level
Sensitivity	21.8%	14.3%
Specificity	95.8%	98%
Positive predictive value	53.9%	61.5%
Negative predictive value	84.3%	83.5%
Area under the receiver operating curve	0.68	0.67
GMFCS level IV versus V
Encounter level
Sensitivity	79.4%	81.1%
Specificity	51.2%	51.4%
Positive predictive value	71.2%	70.4%
Negative predictive value	61.4%	65.7%
Area under the receiver operating curve	0.74	0.75
Patient level
Sensitivity	87.8%	86.9%
Specificity	58.3%	57.1%
Positive predictive value	76.3%	74.7%
Negative predictive value	75.7%	75%
Area under the receiver operating curve	0.80	0.80


Abbreviation: GMFCS, Gross Motor Function Classification System.

Ambulatory status prediction model based on NLP

The train-test group had 88 663 encounters for 1186 patients who were ambulatory and 58 776 encounters for 788 patients who were non-ambulatory. The validation set had 19 042 encounters for 300 ambulatory patients and 13 263 encounters for 209 non-ambulatory patients. At the encounter level, the logistic regression with L1/L2 normalization achieved the best prediction performance on the train-test set, with sensitivity, specificity, and predictive values ranging from 68% to 89% for detecting non-ambulatory status. Using the optimized parameters, the logistic regression achieved performance of 70% to 88% for detecting non-ambulatory status on the validation set. The AUC was high for non-ambulatory status on both the train-test (0.88) and validation (0.89) sets. The same trend was observed at the patient level, where the logistic regression with L1/L2 normalization achieved high performance for detecting non-ambulatory patients on the train-test (AUC 0.92) and validation (AUC 0.93) sets. Table 2 shows the sensitivity, specificity, positive predictive value, negative predictive value, and AUC for the encounter and patient levels on the train-test and validation sets.
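As a consistency check on the encounter-level validation results, the predictive values can be recomputed from the sensitivity, specificity, and the proportion of non-ambulatory encounters using Bayes' rule. The sketch below is an illustration rather than part of the study's analysis: the function name is hypothetical, and the inputs are the rounded values from Table 2 together with the validation encounter counts (13 263 non-ambulatory and 19 042 ambulatory).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Proportion of validation encounters labeled non-ambulatory (the positive class).
prevalence = 13263 / (13263 + 19042)
ppv, npv = predictive_values(0.704, 0.883, prevalence)
print(f"PPV {ppv:.3f}, NPV {npv:.3f}")
```

The recomputed values fall close to the reported 80.8% positive predictive value and 81.1% negative predictive value; small differences are expected because the inputs are rounded.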

GMFCS level prediction model based on NLP

The train-test group had 73 609 encounters for 984 patients in GMFCS levels I–II and 15 054 for 202 patients in level III. The validation set had 15 728 encounters for 246 patients in GMFCS levels I–II and 3314 for 54 patients in level III. At the encounter level, the logistic regression with L1/L2 normalization achieved performance of 17% to 95% for detecting GMFCS level III status on the train-test set. Using the optimized parameters, the logistic regression achieved performance of 15% to 96% for detecting GMFCS level III status on the validation set. The AUC was 0.68 and 0.71 on the train-test and validation sets respectively. The same trend was observed at the patient level, where the logistic regression achieved similar performance in detecting patients in GMFCS level III (Table 2).

The train-test group had 35 845 encounters for 481 patients in GMFCS level IV and 22 931 for 307 patients in level V. The validation set had 5475 encounters for 86 patients in GMFCS level IV and 7788 for 123 patients in level V. At the encounter level, the best-performing logistic regression with L1/L2 normalization achieved performance of 51% to 79% for detecting GMFCS level V status on the train-test set with an AUC of 0.74. The optimized logistic regression achieved performance of 51% to 81% for detecting GMFCS level V status on the validation set with an AUC of 0.75. The same trend was observed at the patient level, where the logistic regression achieved similar performance in detecting patients in GMFCS level V (Table 2).

DISCUSSION

The findings of this study support the use of NLP to predict ambulatory status and GMFCS levels. When NLP was used to characterize functional status in children with CP, prediction of ambulatory versus non-ambulatory status was excellent. The models were less robust for differentiating between GMFCS levels within ambulatory and non-ambulatory groups. NLP applied to the EHR differentiated between GMFCS levels IV and V better than between levels I–II and III. For levels I–II versus III, sensitivity was low (~15%), indicating it is difficult to rule out GMFCS levels I–II when considering individuals in levels I, II, and III; however, specificity was high (~95%) for ruling in levels I–II. When considering all GMFCS levels (i.e. I, II, III, IV, and V), sensitivity was moderate (~70–80%) for ruling out, and specificity was high (~90%) for ruling in, ambulatory status (I–III) versus non-ambulatory status (IV and V) at both encounter and patient levels. The prediction of ambulatory status resulted in high performance on both the train-test and validation sets at the encounter and patient levels. The AUC was high for both encounter (train-test 0.88, validation 0.89) and patient (train-test 0.92, validation 0.93) levels. Generalizability of the model to determine ambulatory status shows promise for individuals with CP. Good prediction model results were found for classifying between GMFCS levels IV and V at the encounter and patient levels; performance was likewise maintained between the train-test and validation sets. Clinical applications using NLP may help optimize and individualize functional characterization in children with CP.

To our knowledge, this is the first study applying NLP to the EHR to distinguish function using GMFCS levels in children with CP. Like previous NLP work,³ this project had similar challenges in distinguishing between ambulatory and non-ambulatory status. Our results indicate that performance decreased when attempting to differentiate between individual GMFCS levels, which is consistent with clinical uncertainty when determining levels within ambulatory (GMFCS levels I–III) and non-ambulatory (GMFCS levels IV and V) subgroups.³⁵ Detecting individuals classified in GMFCS level III is achieved more easily than levels I–II because the use of an assistive device (e.g. walker, gait trainer) with ambulation eliminates GMFCS levels I and II.³⁶ When attempting to distinguish between GMFCS levels I and II,¹⁷ it is challenging to differentiate an individual walking with few limitations (level II) or without limitations (level I). For individuals with more severe CP, it is less complicated to differentiate between GMFCS levels IV and V. This is probably due to the type of equipment associated with level V compared with level IV. For example, an individual in GMFCS level V is fully dependent upon caregivers for transfers and mobility, thus requiring a manual wheelchair, adaptive stroller, transfer lift, and stander. Because individuals in GMFCS level IV demonstrate some means of floor mobility (e.g. rolling or crawling) or self-mobility (e.g. power wheelchair or gait trainer), there was better distinction between levels IV and V in the EHR. With continued improvement in identifying text, NLP applied to the EHR could define GMFCS levels more accurately and allow better distinction between levels when there is clinical uncertainty. Overall, NLP can be applied to the EHR to predict GMFCS levels, but there may be ways to further optimize the prediction model, especially when distinguishing between levels I–II and III.

Limitations

This was a single-center study at a tertiary pediatric medical center in southwest Ohio, USA; thus generalizability to other institutions is uncertain. The median age of the cohort was 15 years at the time of data extraction; however, the information obtained from the EHR spanned approximately 12 years (January 2009 to November 2020), so the models included a wide age-range of patients with CP. Age should be considered when interpreting the findings and generalizability. Our population was primarily White; therefore, future studies should evaluate these models in more diverse populations with consideration of race, ethnicity, sex, and age. Additionally, word clustering of text and phrases for the NLP model may have been missing. For example, discrete text for specific equipment, transfers, need for a lift team, and fall risk assessments may have helped distinguish between GMFCS levels. In the future, more sophisticated NLP methods should be considered, including use of phrases in addition to discrete text^4,11 and complicated classifiers such as neural network and deep learning to see whether model complexity affects predictive performance. Moreover, because GMFCS level III constituted only 17% of the ambulatory group, there was less information from that population to inform our model to differentiate between GMFCS levels I–II and III.

Clinical implications

In pediatric and adult populations, NLP has been applied to the EHR to screen for new medical diagnoses, categorize diagnosis types,^5–7,9,10 and look at outcomes broadly.^3,37 Clinical applications using the NLP-based prediction model across healthcare systems may help individualize functional characterization in patients with CP, assist with management across the lifespan, and optimize outcomes. Training all clinicians on measures specific to CP is difficult, especially in rural areas where subspecialists may not be prevalent. Using discrete data from imaging reports and clinical notes, a child could be identified as being at risk for CP at a younger age. In a clinical setting, the NLP methods described here could run in the background of an EHR system and the results made available to a clinician in real-time. A next step would be to expand the application of NLP and link in evidence-based suggestions; for example, if a patient were classified in GMFCS level IV, then a suggestion for when a hip X-ray was indicated could be supplied in real-time to a clinician through the EHR system. Future research should consider using NLP to determine the critical text to improve GMFCS prediction and how NLP could be used to assist with linking to evidence-based management suggestions for an individual patient. The NLP methodology used in this study could also have application for other developmental and neuromotor conditions.

Supplementary Material


Figure S1: Flow diagram.


What this paper adds.

Natural language processing (NLP) applied to the electronic health record (EHR) can predict ambulatory status in children with CP.
NLP provides good prediction of Gross Motor Function Classification System level in children with CP using the EHR.
NLP methods described could be integrated in an EHR system to provide real-time information.

ACKNOWLEDGEMENTS

The authors have stated that they had no interests that might be perceived as posing a conflict or bias.

Funding information

This work was supported by Cincinnati Children’s Research Foundation and the National Institutes of Health (R01 HD103654).

Abbreviations:

AUC: Area under the receiver operating curve
EHR: Electronic health record
NLP: Natural language processing
TF–IDF: Term frequency–inverse document frequency


REFERENCES

1. Rosenbaum P, Paneth N, Leviton A, et al. A report: the definition and classification of cerebral palsy April 2006. Dev Med Child Neurol Suppl. Feb 2007;109:8–14.
2. Smithers-Sheedy H, Badawi N, Blair E, et al. What constitutes cerebral palsy in the twenty-first century? Dev Med Child Neurol. Apr 2014;56(4):323–8. doi: 10.1111/dmcn.12262
3. Agaronnik ND, Lindvall C, El-Jawahri A, He W, Iezzoni LI. Challenges of Developing a Natural Language Processing Method With Electronic Health Records to Identify Persons With Chronic Mobility Disability. Arch Phys Med Rehabil. Oct 2020;101(10):1739–1746. doi: 10.1016/j.apmr.2020.04.024
4. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. Sep-Oct 2011;18(5):544–51. doi: 10.1136/amiajnl-2011-000464
5. Seol HY, Rolfes MC, Chung W, et al. Expert artificial intelligence-based natural language processing characterises childhood asthma. BMJ open respiratory research. 2020;7(1):e000524. doi: 10.1136/bmjresp-2019-000524
6. Kaur H, Sohn S, Wi CI, et al. Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC Pulm Med. Feb 13 2018;18(1):34. doi: 10.1186/s12890-018-0593-9
7. Liang H, Tsui BY, Ni H, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. Mar 2019;25(3):433–438. doi: 10.1038/s41591-018-0335-9
8. Kurowski JA, Milinovich A, Ji X, et al. Differences in Biologic Utilization and Surgery Rates in Pediatric and Adult Crohn’s Disease: Results From a Large Electronic Medical Record-derived Cohort. Inflamm Bowel Dis. Sep 11 2020. doi: 10.1093/ibd/izaa239
9. Clark MM, Hildreth A, Batalov S, et al. Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Sci Transl Med. Apr 24 2019;11(489). doi: 10.1126/scitranslmed.aat6177
10. Yadav K, Sarioglu E, Choi HA, Cartwright WBt, Hinds PS, Chamberlain JM. Automated Outcome Classification of Computed Tomography Imaging Reports for Pediatric Traumatic Brain Injury. Acad Emerg Med. Feb 2016;23(2):171–8. doi: 10.1111/acem.12859
11. Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform. Apr 27 2019;7(2):e12239. doi: 10.2196/12239
12. Graham SA, Lee EE, Jeste DV, et al. Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review. Psychiatry Res. Feb 2020;284:112732. doi: 10.1016/j.psychres.2019.112732
13. Graham S, Depp C, Lee EE, et al. Artificial Intelligence for Mental Health and Mental Illnesses: an Overview. Curr Psychiatry Rep. Nov 7 2019;21(11):116. doi: 10.1007/s11920-019-1094-0
14. Hanna SE, Bartlett DJ, Rivard LM, Russell DJ. Reference curves for the Gross Motor Function Measure: percentiles for clinical description and tracking over time among children with cerebral palsy. Phys Ther. May 2008;88(5):596–607. doi: 10.2522/ptj.20070314
15. Palisano R, Rosenbaum P, Walter S, Russell D, Wood E, Galuppi B. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev Med Child Neurol. Apr 1997;39(4):214–23.
16. Rosenbaum PL, Walter SD, Hanna SE, et al. Prognosis for gross motor function in cerebral palsy: creation of motor development curves. JAMA. Sep 18 2002;288(11):1357–63. doi: 10.1001/jama.288.11.1357
17. Palisano RJ, Rosenbaum P, Bartlett D, Livingston MH. Content validity of the expanded and revised Gross Motor Function Classification System. Dev Med Child Neurol. Oct 2008;50(10):744–50. doi: 10.1111/j.1469-8749.2008.03089.x
18. Hanna SE, Rosenbaum PL, Bartlett DJ, et al. Stability and decline in gross motor function among children and youth with cerebral palsy aged 2 to 21 years. Dev Med Child Neurol. Apr 2009;51(4):295–302. doi: 10.1111/j.1469-8749.2008.03196.x
19. Li Q, Spooner SA, Kaiser M, et al. An end-to-end hybrid algorithm for automated medication discrepancy detection. BMC Med Inform Decis Mak. May 2015;15:37. doi: 10.1186/s12911-015-0160-8
20. Ni Y, Kennebeck S, Dexheimer JW, et al. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. J Am Med Inform Assoc. Jan 2015;22(1):166–78. doi: 10.1136/amiajnl-2014-002887
21. Ni Y, Wright J, Perentesis J, et al. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients. BMC medical informatics and decision making. 2015;15:28–28. doi: 10.1186/s12911-015-0149-3
22. Ni Y, Barzman D, Bachtel A, Griffey M, Osborn A, Sorter M. Finding warning markers: Leveraging natural language processing and machine learning technologies to detect risk of school violence. Int J Med Inform. Jul 2020;139:104137. doi: 10.1016/j.ijmedinf.2020.104137
23. Salciccioli JD, Crutain Y, Komorowski M, Marshall DC. Sensitivity Analysis and Model Validation. Secondary Analysis of Electronic Health Records. Springer International Publishing; 2016:263–271.
24. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. 3111–3119.
25.Manning C, Schutze H. Foundations of statistical natural language processing. MIT press; 1999. [Google Scholar]
26.Bishop CM. Pattern recognition. Machine learning. 2006;128(9) [Google Scholar]
27.Shawe-Taylor J, Cristianini N. Kernel methods for pattern analysis. Cambridge university press; 2004. [Google Scholar]
28.Alriksson-Schmidt A, Nordmark E, Czuba T, Westbom L. Stability of the Gross Motor Function Classification System in children and adolescents with cerebral palsy: a retrospective cohort registry study. Dev Med Child Neurol. Jun 2017;59(6):641–646. doi: 10.1111/dmcn.13385 [DOI] [PubMed] [Google Scholar]
29.Imms C, Carlin J, Eliasson AC. Stability of caregiver-reported manual ability and gross motor function classifications of cerebral palsy. Dev Med Child Neurol. Feb 2010;52(2):153–9. doi: 10.1111/j.1469-8749.2009.03346.x [DOI] [PubMed] [Google Scholar]
30.Palisano RJ, Avery L, Gorter JW, Galuppi B, McCoy SW. Stability of the Gross Motor Function Classification System, Manual Ability Classification System, and Communication Function Classification System. Dev Med Child Neurol. Oct 2018;60(10):1026–1032. doi: 10.1111/dmcn.13903 [DOI] [PubMed] [Google Scholar]
31.Rutz E, Tirosh O, Thomason P, Barg A, Graham HK. Stability of the Gross Motor Function Classification System after single-event multilevel surgery in children with cerebral palsy. Dev Med Child Neurol. Dec 2012;54(12):1109–13. doi: 10.1111/dmcn.12011 [DOI] [PubMed] [Google Scholar]
32.Wood E, Rosenbaum P. The gross motor function classification system for cerebral palsy: a study of reliability and stability over time. Dev Med Child Neurol. May 2000;42(5):292–6. [DOI] [PubMed] [Google Scholar]
33.Flanagin A, Frey T, Christiansen SL. Updated Guidance on the Reporting of Race and Ethnicity in Medical and Science Journals. JAMA. 2021–08-17 2021;326(7):621. doi: 10.1001/jama.2021.13304 [DOI] [PubMed] [Google Scholar]
34.Coughlin SS, Trock B, Criqui MH, Pickle LW, Browner D, Tefft MC. The logistic modeling of sensitivity, specificity, and predictive value of a diagnostic test. J Clin Epidemiol. Jan 1992;45(1):1–7. doi: 10.1016/0895-4356(92)90180-u [DOI] [PubMed] [Google Scholar]
35.Nordmark E, Jarnlo GB, Hägglund G. Comparison of the Gross Motor Function Measure and Paediatric Evaluation of Disability Inventory in assessing motor function in children undergoing selective dorsal rhizotomy. Dev Med Child Neurol. Apr 2000;42(4):245–52. doi: 10.1017/s0012162200000426 [DOI] [PubMed] [Google Scholar]
36.Reid SM, Carlin JB, Reddihough DS. Using the Gross Motor Function Classification System to describe patterns of motor severity in cerebral palsy. Dev Med Child Neurol. Nov 2011;53(11):1007–12. doi: 10.1111/j.1469-8749.2011.04044.x [DOI] [PubMed] [Google Scholar]
37.Newman-Griffis D, Camacho Maldonado J, Ho P-S, et al. Linking Free Text Documentation of Functioning and Disability to the ICF With Natural Language Processing. Original Research. Frontiers in Rehabilitation Sciences. 2021;2 [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Figure S1: Flow diagram (NIHMS1811744-supplement-fS1.pdf; 451KB).
