Author manuscript; available in PMC: 2025 Oct 1.
Published in final edited form as: Health Informatics J. 2024 Oct-Dec;30(4):14604582241295930. doi: 10.1177/14604582241295930

BoneScore: A natural language processing algorithm to extract bone mineral density data from DXA scans

Samah Fodeh 1, Rixin Wang 2, Terrence E Murphy 3, Farah Kidwai-Khan 4, Linda S Leo-Summers 5, Baylah Tessier-Sherman 6, Evelyn Hsieh 7, Julie A Womack 8
PMCID: PMC11872221  NIHMSID: NIHMS2056513  PMID: 39526751

Abstract

Objective:

To develop and test an NLP algorithm that accurately detects the presence of information reported from DXA scans containing femoral neck T-scores of the patients scanned.

Methods:

We developed a rule-based NLP algorithm, called ‘BoneScore’, by iteratively building a collection of regular expressions on training data consisting of 889 snippets of text pulled from DXA reports. Clinical experts manually checked the snippets to determine the proportion of manually verified annotations containing T-score information that the algorithm detected. Window widths of 30 and 50 words on each side of the key term ‘femoral’ were tested until adequate accuracy was achieved. A separate clinical validation regressed the extracted T-score values on five risk factors with established associations.

Results:

BoneScore built a set of 20 regular expressions that in concert with a width of 50 words on each side of the key term yielded an accuracy of 98% in the testing data. The extracted T-scores, when modeled with multivariable linear regression, consistently exhibited associations supported by the literature.

Conclusion:

BoneScore uses regular expressions to accurately extract annotations of T-score values of bone mineral density with a width of 50 words on each side of the key term. The extracted T-scores exhibit clinical face validity.

Keywords: NLP, dual-energy X-ray absorptiometry, femoral neck t-score, veterans, electronic health record

Introduction

As the number of older adults increases,1 so does the importance of early and accurate identification of age-related comorbidities such as osteoporosis and fragility fractures. A key limitation for studying fragility fractures is their very low incidence. Our prior research suggests an annual rate of occurrence of any osteoporotic fracture (wrist, hip, vertebral, upper arm) of approximately 1%.2 Large electronic health record (EHR)-based cohorts, such as the Veterans Aging Cohort Study-HIV (VACS-HIV)3 and the VACS-National Cohort, are useful when exploring these outcomes as they provide adequate power in the context of a longitudinal study. Nonetheless, most EHR-based cohorts are unable to exploit bone mineral density (BMD) as a predictor for fragility fractures as it is typically absent from structured data fields. Studies have dealt with the challenge of identifying BMD in different ways. Previous approaches have included use of administrative data (e.g., presence of diagnostic codes for osteoporosis or osteopenia), personal interviews or questionnaires, manual chart review, and prospective recruitment.4-9 As might be expected, these labor-intensive approaches have been restricted to small studies. A more efficient approach that lends itself to application to large samples was presented by LaFleur and colleagues,10,11 who developed a natural language processing (NLP) pipeline to extract measures of BMD from radiology reports and clinical notes. The accuracy of their algorithm for identifying standardized measures of BMD (T-scores) was 82.8%.

Objective

We proposed to develop “BoneScore,” an NLP tool to extract femoral neck T-scores of patients who undergo dual-energy X-ray absorptiometry (DXA), that we expect to demonstrate better performance than the algorithm developed by LaFleur and colleagues.10,11 The femoral neck is the narrow section of the femur that connects the femoral shaft with the femoral head. Because femoral neck BMD is highly predictive of hip fractures, it represents a specific area of clinical interest. We will limit our analysis to T-scores, which are standardized measures of femoral neck BMD that can be directly entered in the Fracture Risk Assessment Tool (FRAX) to identify individuals at risk for fragility fractures.12,13 T-scores are a comparison of the patient’s bone density with that of healthy, young individuals of the same sex. A T-score of between −1 and −2.5 is diagnostic of osteopenia (reduced bone mass); a T-score of −2.5 or lower is diagnostic of osteoporosis (severely reduced bone mass). In addition to validating the accuracy of our tool, as an illustration of its utility, we will also characterize the distribution of T-scores extracted from a separate, temporally distinct sample of Veterans living with HIV who are clinically at risk for low BMD. Lastly, we will demonstrate that the T-scores extracted from this sample display associations that reflect those of well-established risk factors including age, history of fragility fracture, body mass index (BMI), female sex, and non-white race.
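
The diagnostic thresholds above can be expressed as a small function. This is a minimal illustrative sketch, not part of BoneScore; the name `classify_t_score` is hypothetical, and labeling T-scores of −1 or higher as "normal" is an assumption that the text does not state explicitly.

```python
def classify_t_score(t_score: float) -> str:
    """Map a femoral neck T-score to the categories described in the text."""
    if t_score <= -2.5:
        # -2.5 or lower: osteoporosis (severely reduced bone mass)
        return "osteoporosis"
    if t_score < -1.0:
        # between -1 and -2.5: osteopenia (reduced bone mass)
        return "osteopenia"
    # Assumption: -1 or higher is treated as normal bone density
    return "normal"


for value in (-0.7, -1.5, -2.5):
    print(value, classify_t_score(value))
```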

Materials and methods

Amongst the VACS-National Cohort of 13 million Veterans (917,596 women) followed from 2000 to 2020, 695,645 Veterans (including 159,454 women) had DXA imaging, comprising a total of 1,387,479 DXA reports. To ensure that we covered a large variety of the linguistic expressions and patterns documented by radiologists across the VA, we randomly selected 1000 DXA scan reports spanning the years 2015 to 2020. DXA reports typically include extensive text, most of which is unrelated to T-scores. Accordingly, we focused on snippets of text centered around a key term, which may present multiple times in a report. This means that multiple snippets may be extracted from a single DXA report. We selected all snippets from the 1000 DXA scan reports, resulting in 1089 snippets that were partitioned into training and test samples. The training sample contained 889 snippets and the testing sample consisted of the remaining 200 snippets (Figure 1).

Figure 1.

Development of training and test sets for BoneScore.

Extraction of snippets

A DXA report can have multiple snippets corresponding to T-score values taken from measurements of BMD from both the “Left” and the “Right” femoral necks or from the mean of the two. For this reason, multiple snippets were extracted from each patient’s DXA report. The T-score value usually appears in the context of the term “femoral neck,” so we used the word “femoral” as the search keyword. This approach examined the context of an initial window size of 30 words on each side of the keyword “femoral.” We then expanded the window size to 50 words on each side and achieved more accurate results.
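
The windowing step described above might look like the following sketch. It assumes whitespace tokenization and case-insensitive substring matching on the keyword, neither of which is specified in the text; the function name `extract_snippets` and its parameters are hypothetical.

```python
from typing import List


def extract_snippets(report: str, keyword: str = "femoral", width: int = 50) -> List[str]:
    """Return one snippet per keyword occurrence, with `width` words on each side."""
    words = report.split()  # assumption: words are separated by whitespace
    snippets = []
    for i, word in enumerate(words):
        if keyword in word.lower():
            start = max(0, i - width)               # clip at the start of the report
            end = min(len(words), i + width + 1)    # clip at the end of the report
            snippets.append(" ".join(words[start:end]))
    return snippets


report = "Left femoral neck T-score of -1.4 . Right femoral neck T-score of -1.1 ."
for snippet in extract_snippets(report, width=3):
    print(snippet)
```

Because every occurrence of the keyword yields its own snippet, a single report can contribute several snippets, as noted in the text.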

Manual annotation

Our process of manual annotation involved reviewing DXA scan reports to verify representations of femoral neck T-score measurements, focusing on the value of the T-score, the site of the T-score (femoral neck), and the laterality of the value (right, left, or the mean of the two). Because DXA scan reports are written by radiologists, language and presentation of DXA scan results varied considerably across clinical sites. For this reason, we used an iterative approach to capture the variation in terminology that would need to be reflected in the regular expressions. This provided the gold standard against which we could confidently assess whether the annotations that contained T-score information were in fact detected by BoneScore. We found that the T-score was generally reported in one of two formats: some were embedded as text in the body of the narrative, while others were presented within data tables. In the narratives, T-scores were presented either as a signed real number or in units of standard deviations below (negative) or above (positive) the mean. Table 1 shows the sequence of identified patterns that were iteratively updated with new terminology to identify femoral neck T-score annotations that were subsequently discovered among previously unexamined DXA reports. It also provides an example of a table found in a particular report; we note that tables with different formats appeared throughout the reports.

Table 1.

Examples of how T-scores appeared in the snippets.

Examples of T-score mention in the narratives
The right femoral neck is 0.830 grams per centimeter squared with a T score of −0.7grams per centimeter
Right proximal femur is 0.987 g/cm2, with a T-score of −0.2
Femoral neck T-Score (compares patient with a 30 year old normal of the same sex and ethnic group) is −2.3
Femoral neck= T score −2.4
Score is −1.8
T-score -- −1.4 %
T – score = −0.1
T-score −0.9SD
T-score (−3.5)SD
T score of 0.1
T-score value of −1.4
The left femoral neck bone mineral density is 0.864 g/cm2 which is 1.6 standard deviations below the mean corresponding to the WHO criteria for osteopenia.
Example of a table in a DXA scan report
Region BMD T-score Z-score Classification
AP Spine (L1-L4) 0.919 1.6 1.0 Osteopenic
Femoral Neck (Left) 0.638 2.1 1.2 Osteopenic
Total Hip (Left) 0.797 1.6 1.2 Osteopenic

To be as thorough as possible in the development of our NLP approach, our clinical expert performed multiple rounds of manual annotation to ensure a high degree of confidence that we considered all possible patterns of T-score information as they manifested within the snippets. The DXA reports were written by radiologists from across the VA, resulting in significant variations in the way BMD information was presented including differences in formatting, details, and metrics included. We therefore selected multiple samples from the training data to capture the full variation of documentation included in the reports.

BoneScore: A rule-based NLP algorithm

As discussed previously, the T-score values from DXA scans were reported in two possible formats: text or data table. In both formats, the text around the T-score values demonstrated heterogeneous linguistic patterns. To extract T-scores from the snippets of text in an efficient and effective manner, we developed a rule-based algorithm and applied it to the reports. We annotated 1089 snippets with T-scores (total of 1346 annotations) and developed two sets of regular expressions to extract them from the particular formats in which they occurred. Specifically, we developed one set of regular expressions that targeted the relevant patterns within snippets of text and a second set of expressions that retrieved the T-score values from within data tables.

Set of text regular expressions.

A set of regular expressions was used to identify those T-scores that were embedded within the text of the DXA scan reports. Based on manual annotation, we catalogued many patterns of text containing T-scores and wrote corresponding regular expressions that captured these patterns. We grouped these regular expressions into two types: the first captured the T-score in units of standard deviations (above and below the mean), whereas the second retrieved T-scores reported as signed real numbers. For T-scores reported in units of standard deviations, we used the regular expressions listed in Table 2 to fetch the site, the T-score, and the +/− sign.

Table 2.

Regular Expressions to capture T-scores reported in units of standard deviations.

Regular expression for T-score presented as standard deviation
((left∣right∣both∣)\s+femoral\s+neck(?:(?!standard deviations∣standard deviation∣T[− ]+score∣L-1 - L-4∣L1-L4∣L2-L3∣neck of the∣−?\s*\d\.?\d*\s*to\s+minus\s*−?\s*\d\.?\d*∣−?\s*\d\.?\d*\s*and\s*−?\s*\d\.?\d*).)*\s+(−?\d\.?\d*)\s+standard\s+deviation[s ](?:(?!standard deviations∣Z[− ]score).)*below)
((left∣right∣both∣)\s+femoral\s+neck(?:(?!standard deviations∣standard deviation∣T[− ]+score∣L-1 - L-4∣L1-L4∣L2-L3∣neck of the∣−?\s*\d\.?\d*\s*to\s+minus\s*−?\s*\d\.?\d*∣−?\s*\d\.?\d*\s*and\s*−?\s*\d\.?\d*).)*\s+(−?\d\.?\d*)\s+standard\s+deviation[s ](?:(?!standard deviations∣Z[− ]score).)*above)
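
To illustrate how a standard-deviation phrase can be converted into a signed T-score, the sketch below uses a deliberately simplified pattern. It is not one of the Table 2 expressions, which contain additional exclusions; the names `SD_PATTERN` and `t_score_from_sd` are hypothetical, and the pattern assumes the magnitude is written without a sign.

```python
import re
from typing import Optional, Tuple

# Simplified stand-in for the Table 2 expressions (assumption, not the authors' regex):
# laterality, then "femoral neck", then "<number> standard deviation(s) below|above".
SD_PATTERN = re.compile(
    r"(?P<side>left|right|both)?\s*femoral\s+neck.*?"
    r"(?P<value>\d+\.?\d*)\s+standard\s+deviations?\s+(?P<direction>below|above)",
    re.IGNORECASE | re.DOTALL,
)


def t_score_from_sd(snippet: str) -> Optional[Tuple[str, float]]:
    """Return (laterality, signed T-score), or None when the pattern is absent."""
    match = SD_PATTERN.search(snippet)
    if match is None:
        return None
    magnitude = float(match.group("value"))
    # "below the mean" maps to a negative T-score, "above" to a positive one
    sign = -1.0 if match.group("direction").lower() == "below" else 1.0
    return (match.group("side") or "").lower(), sign * magnitude


text = ("The left femoral neck bone mineral density is 0.864 g/cm2 which is "
        "1.6 standard deviations below the mean")
print(t_score_from_sd(text))
```

Requiring the number to be immediately followed by "standard deviation" is what lets the pattern skip the BMD value (0.864) that precedes it in the sentence.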

Several patterns were used to describe the T-scores in the reports. To capture these, we developed regular expressions based on the annotated snippets obtained from DXA reports. Although we made every effort to adjust existing regular expressions to accommodate new patterns, in almost every iteration of the manual annotation process we identified new pattern(s). When necessary, we created a new expression that was appended to the end of the list. If this new expression conflicted with the performance of a prior expression, the new regular expression was added to the beginning of the list. This emphasized the importance of the order in which we applied the regular expressions to the snippets. The expression with order number 0 was the first created; it was followed by 15 regular expressions that were added to the list to address subsequently discovered patterns. To avoid conflicts with pre-existing regular expressions, four more regular expressions were later added to the top of the list (Table 3).

Table 3.

Regular expressions to detect patterns of T-score in report narratives.

Order Regular Expression
−4 (t[− ]*score\s+\(.*\)\s*is\s+(?:(?!femoral neck).)*\s+(−?\s*\d\.?\d*)\s+at the\s+(left∣right∣both∣)\s*femoral neck)
−3 ((?:(?!L∣R∣left∣right∣both∣left hip density scores∣right hip density scores).)*\W+femoral neck\W+(left∣right)(?:(?!t[− ]*score∣femoral neck∣l1-l4).)*t[− ]*score\s(\([^\)]*\) of∣\([^\)]*\) is∣ value of∣ (calculated ∣ measuring ∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
−2 ((left∣right∣both)\s+hip(?:(?!L\W+femoral neck∣R\W+femoral neck∣left\W+femoral neck∣right\W+femoral neck∣both\W+femoral neck∣density scores\W+femoral neck∣density scoresW+femoral neck∣right hip∣left hip).)*\W+femoral neck(?:(?!t[− ]*score∣femoral neck∣l1-l4∣forearm).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated ∣ measuring ∣ is∣of∣=∣:∣–∣\(∣ )\s*([−+]?\s*\d\.?\d*)\s*(below\s+the\s+mean∣above\s*the\s*mean∣))
−1 ((left∣right)\s+hip:?\s+(BMD\s*\(Total\)\s*=\s*∣study\s+shows?\s+total\s+bone\s+mineral\s+density\s+measures\s+)[\s*∣\d∣\.]+\s+(t-score\s*=\s*∣gm/cm2\s+which\s+correspond\s+to\s+a\s+T\s+score\s+of\s+[−∣\d∣\.]+\s+and\s+Z\s+score\s+of\s+)(−?\s*\d\.?\d*)\s+femoral\s+neck:?\s*(BMD\s*=\s*∣\s+study\s+shows?\s+total\s+bone\s+mineral\s+density\s+measures)\s*(−?\s*\d\.?\d*)\s*(T-score\s*=\s*∣\s+gm/cm2\s+which\s+correspond\s+to\s+a\s+T\s+score\s+of\s+)(−?\s*\d\.?\d*))
0 ((L∣R∣left∣right∣both∣left hip density scores∣right hip density scores∣)\W+femoral neck(?:(?!t[− ]*score∣femoral neck∣l1-l4∣Lumbar\s+Spine∣distal\s+radius).)*t[− ]*score\s*(\([^\)]*\)of∣\([^\)]*\) is∣\), this is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*)\s*(below\s+the\s+mean∣above\s*the\s*mean∣S\.D\.\s+below\s+the\s+mean\s+value∣S\.D\.\s+above\s+the\s+mean\s+value ∣))
1 ((left∣right∣both∣)\s+hip\W+femoral neck(?:(?!t[− ]*score∣femoral neck∣l1-l4).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
2 neck of the ((left∣right∣both∣)\s+femur(?:(?!t[− ]*score∣ femur).)*t[− ]*score\s*(\([^\)]*\)\s*:∣\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\d\.?\d*))
3 (t[− ]*score\s*(of∣for)\s+the\s+(left∣right∣both∣)\s+femoral\s+neck\s*(is∣=∣:∣–∣measures∣\(∣ )\s*(−?\s*\d\.?\d*))
4 (t\s+value\s+of\s+(−?\s*\d\.?\d*)\s+within\s+the\s*(left∣right∣both∣)\s*femoral neck)
5 ((left∣right∣both∣)\s+femoral\s+neck(?:(?!t[− ]*score∣femoral neck∣Lumbar\s+Spine∣distal\s+radius).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣is∣of∣=∣:∣–∣\(∣)\s*[\(](−?\s*\d\.?\d*))
6 (femoral\s+neck\s+(left∣right∣both∣)\s+(?:(?!t[− ]*score∣femoral neck).)*\s+(−?\s*\d\.?\d*)\s*t[− ]*score)
7 ((left∣right∣)\W+femoral neck BMD(?:(?!femoral neck).)*femoral neck %(?:(?!femoral neck).)*T\s*(=)\s*(−?\s*\d\.?\d*))
8 (mean t[− ]*score(?:(?!femoral neck).)*in\s+(left∣right∣the∣)*\s+femoral neck\s*(−?\s*\d\.?\d*))
9 ((Lt\.∣Rt\∣left\right)\W+Hip-neck(?:(?!t[− ]*score∣femoral neck∣l1-l4).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
10 ((left∣right∣both∣left hip density scores∣right hip density scores∣)\W*femur neck(?:(?!t[− ]*score∣femoral neck∣l1-l4).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
11 (mean\s+bone\s+mineral\s+density\s+of\s+the\s+(left∣right∣both∣)\s*femoral\s+necks*(?:(?!t[∣ ]*score∣femoral neck).)*corresponds\s+to\s+a\s+T\s+value\s+ohs*(−?\s*\d\.?\d*))
12 ((left∣right∣both∣)\s+proximal femur(?:(?!t[∣ ]*score∣femoral neck∣l1-l4).)*neck(?:(?!t[∣ ]*score∣femoral neck∣l1-l4).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
13 ((left∣right∣both∣)\s+femoral head neck.*the hips(?:(?!t[− ]*score∣femoral neck).)*t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
14 (t[− ]*score\s+for\s+the\s+(left∣right∣both∣)\s+femoral\s+neck(?:(?!t[− ]*score(femoral neck).)*(\([^\))*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣?\s*\d\.?\d*))
15 ((left∣right∣both∣)\s+femoral\s+neck.*(?:(?!t[− ]*score∣femoral neck).)*femur\s+t[− ]*score\s*(\([^\)]*\) of∣\([^\)]*\) is∣value of∣calculated∣measuring∣is∣of∣=∣:∣–∣\(∣ )\s*(−?\s*\d\.?\d*))
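
The ordering scheme described above, in which new expressions are appended unless they conflict with an earlier one and are then moved to the front, can be sketched as follows. The two patterns are simplified illustrations rather than entries from Table 3, the helper names `add_pattern` and `first_match` are hypothetical, and ASCII hyphens are assumed in place of the typographic minus signs shown in the tables.

```python
import re
from typing import List, Optional

patterns: List["re.Pattern[str]"] = []


def add_pattern(regex: str, conflicts_with_existing: bool = False) -> None:
    """Append a new expression, or put it first when it must override earlier ones."""
    compiled = re.compile(regex, re.IGNORECASE)
    if conflicts_with_existing:
        patterns.insert(0, compiled)
    else:
        patterns.append(compiled)


def first_match(snippet: str) -> Optional[float]:
    """Apply the expressions in list order and return the first captured value."""
    for pattern in patterns:
        match = pattern.search(snippet)
        if match:
            return float(match.group(1))
    return None


# A broad pattern: the first number after "T-score". On its own it would capture
# the "30" inside a parenthetical such as "(compares patient with a 30 year old ...)".
add_pattern(r"t[- ]*score\D*?(-?\d\.?\d*)")
# A narrower pattern for "T-score (...) is <value>", placed first to take priority.
add_pattern(r"t[- ]*score\s*\([^)]*\)\s*is\s*(-?\d\.?\d*)", conflicts_with_existing=True)

print(first_match("Femoral neck T-Score (compares patient with a 30 year old normal) is -2.3"))
print(first_match("Right proximal femur is 0.987 g/cm2, with a T-score of -0.2"))
```

Because the first matching expression wins, list position acts as a priority, which is why a conflicting expression has to be inserted ahead of the one it would otherwise lose to.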

Set of table regular expressions.

T-scores were extracted from tables in two steps. We first wrote regular expressions to match the table header in the snippet. We then used a regular expression specific to that table to parse its context and select the T-score value. Based on the first round of annotations, we identified a single table format. More formats were identified through subsequent rounds of annotation. Table 4 presents the different regular expressions that were developed to extract the T-scores from all the tables encountered in the manually annotated reports.

Table 4.

Regular expressions to detect T-scores contained within tables.

Regular expression to detect table headers / Regular expression to find T-score values
REGION\s+BMD\s+v?\s+T-SCORE ((left∣right∣both∣)\s+femoral\s+neck[s]*\s*\s+−?\d\.?\d*\s+(−?\d\.?\d*))
(femoral\s+neck\s*\((left∣right∣both∣)\)\s+−?\d\.?\d*\s+(−?\d\.?\d*))
'\(g.cm2\)\s+\(t-Score\)\s+\(z-score\)∣bmd\s+t score\s+z score\s*∣bmd\s+t−\s+z−' ((left∣right∣both∣)\s+femoral\s+neck\s*\s+−?\d\.?\d*\s+(−?\d\.?\d*)\s+(−?\d\.?\d*))
(femoral\s+neck\s*\((left∣right∣both∣)\)\s+−?\d\.?\d*\s+(−?\d\.?\d*)\s+(−?\d\.?\d*))
L2-4\(AP\)\s+Femoral Neck\(L\)\s+Femoral Neck\(R\) (t[− ]*score:\s+−?\s*\d\.?\d* SD\*\s*(−?\s*\d\.?\d*)\s*SD\s*(−?\s*\d\.?\d*)\s*SD)
Region\s+Exam\s+Age\s+BMD\s+T-score\s+BMD Change\s+BMD Change ((left∣right∣both∣)\s+femoral\s+neck\s*\s+\d\d/\d\d/\d\d\d\d\s+\d+\s+−?\d\.?\d*\s+(−?\d\.?\d*))
Region\s+Exam\s+Age\s+BMD\s+T-score\s+BMD Change\s+BMD Change (Date\s+T-Score\s+\(L1-L4\)\s+T-Score\(Neck\)\s*T-Score\(Total\s+Hip\) (femoral\s+neck\s*\((left∣right∣both∣)\)\s+\d\d/\d\d/\d\d\d\d\s+\d+\s+−?\d\.?\d*\s+(−?\d\.?\d*))
(\d\d?/\d\d?/\d\d\d\d)\s+(−?\d\.?\d*)\s+(−?\d\.?\d*)\s+(−?\d\.?\d*))
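
The two-step table procedure (detect the header, then parse the matching row) might be sketched as below, using the example table from Table 1. The header pattern is a simplified assumption, the row pattern is an ASCII transcription of one of the expressions above, and the function name `t_score_from_table` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Step 1: recognise the table layout from its header (simplified, assumed pattern).
HEADER = re.compile(r"region\s+bmd\s+t-score", re.IGNORECASE)
# Step 2: row pattern for that layout: "Femoral Neck (Left) <BMD> <T-score>".
ROW = re.compile(
    r"femoral\s+neck\s*\((left|right|both|)\)\s+-?\d\.?\d*\s+(-?\d\.?\d*)",
    re.IGNORECASE,
)


def t_score_from_table(snippet: str) -> Optional[Tuple[str, float]]:
    """Return (laterality, T-score) if the snippet contains the expected table."""
    if not HEADER.search(snippet):
        return None  # not this table format; other formats or text rules apply
    row = ROW.search(snippet)
    if row is None:
        return None
    return row.group(1).lower(), float(row.group(2))


table = (
    "Region BMD T-score Z-score Classification\n"
    "AP Spine (L1-L4) 0.919 1.6 1.0 Osteopenic\n"
    "Femoral Neck (Left) 0.638 2.1 1.2 Osteopenic\n"
    "Total Hip (Left) 0.797 1.6 1.2 Osteopenic"
)
print(t_score_from_table(table))
```

Keying the row pattern to a detected header keeps the positional logic (skip the BMD column, capture the next number) from being applied to narrative text, where column order carries no meaning.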

While experimenting, we first applied the table regular expressions followed by the text regular expressions. We used this approach because the text regular expressions were more complex than the table regular expressions, and therefore we were more likely to encounter mismatches. By applying the table regular expressions first, we were able to reduce the number of mismatches. We found that after each round, we had to re-apply the updated regular expression list to the snippets of text to ensure that we captured all new patterns.

Evaluation.

We used the snippets in the training set to develop the NLP method and to generate the lists of regular expressions for both text and tables. Once satisfied with the overall performance, we tested the NLP approach by comparing the T-scores identified by our method against instances that were manually identified. To evaluate these comparisons, we computed the following measures:

Accuracy:

The proportion of instances wherein the BoneScore approach correctly identified T-scores out of the total of all annotations that had been manually confirmed as containing T-score information.

To check our progress, we computed accuracy during all rounds of annotation and would subsequently decide whether more patterns needed to be identified and whether the NLP rules needed to be adjusted. Once satisfied with the performance, we applied our method to the test data and reported the results.

Validating BoneScore on a second cohort.

To demonstrate whether the T-scores obtained by BoneScore displayed clinical face validity, we further analyzed them in a second cohort. This cohort consisted of Veterans with HIV who were 50 years of age or older who were at elevated risk of having decreased BMD. We took those who had undergone a DXA scan in either 2008 or 2009 which resulted in a sample of 372 persons. As this was a sample where many were younger (mean age 62 years) than the age at which routine screening would be expected (65 years for women and 70 years for men14), we assumed that these were patients who had risk factors for low BMD and therefore had a positive likelihood of experiencing a fracture or of having low BMD. We applied BoneScore to this cohort and extracted the T-scores for each person. We subsequently used multivariable linear regression to model the T-scores.

Results

The top half of Table 5 shows that when parsing out sections of text with a width of 30 words on either side of the key term ‘femoral’, the algorithm did not achieve monotonically increasing accuracy with additional rounds of data evaluation. This constitutes evidence that this width was inadequate to consistently detect the annotations of interest. The bottom half of Table 5 shows that when the width was extended to 50 words on either side of the key term, the algorithm exhibited monotonically increasing accuracy across rounds of annotation, culminating in its performance on the 263 manually verified annotations present within the test data. We were satisfied with this performance in our training data and tested the final set of regular expressions on the test data of 200 snippets that contained 263 annotations with T-score information. In the test data, we obtained an accuracy (95% confidence interval) of 98.5% (96.7 – 100.2). Out of all annotations of T-score instances that were manually identified, this represents the proportion that were correctly detected by BoneScore (259/263). Precision (positive predictive value) was 1.00, and recall (sensitivity) was 0.98. We note that because the gold standard, that is, manual identification, is limited to confirming the presence of a relevant annotation rather than its absence, there is no performance metric equivalent to specificity for these algorithms. The other metric of performance routinely reported for NLP algorithms is the F-measure, which is the harmonic mean of precision and recall. Our F-measure was 99%.
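
The reported test-set metrics can be reproduced from the counts given above. This is a small worked sketch; treating recall as 259/263 is an assumption based on the reported value of 0.98, and `f_measure` is a hypothetical helper name.

```python
def f_measure(precision: float, recall: float) -> float:
    """Equally weighted F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


correct, total = 259, 263        # detected / manually verified annotations (test data)
accuracy = correct / total
precision = 1.00                 # as reported in the text
recall = correct / total         # assumption: recall shares the 259/263 ratio
f_score = f_measure(precision, recall)

print(f"accuracy  = {accuracy:.1%}")
print(f"F-measure = {f_score:.0%}")
```

Under these assumptions the script prints an accuracy of 98.5% and an F-measure of 99%, in agreement with the values reported above.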

Table 5.

Performance of the regular expressions at multiple rounds of annotation.

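Dataset | Size of context | Round | # Snippets | # Annotations | # Errors | Accuracy (%)
Training | 30 words left & right of the keyword | 1 | 67 | 75 | 3 | 96
Training | 30 words left & right of the keyword | 2 | 100 | 114 | 4 | 96
Training | 30 words left & right of the keyword | 3 | 80 | 103 | 16 | 85
Training | 30 words left & right of the keyword | 4 | 171 | 201 | 24 | 88
Training | 30 words left & right of the keyword | 5 | 171 | 205 | 27 | 87
Training | 50 words left & right of the keyword | 6 | 100 | 125 | 10 | 92
Training | 50 words left & right of the keyword | 7 | 100 | 127 | 9 | 93
Training | 50 words left & right of the keyword | 8 | 100 | 133 | 5 | 96
Testing | 50 words | - | 200 | 263 | 4 | 98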

Demonstration of clinical validity of BoneScore

We first plotted a histogram of the T-scores that were extracted from the second cohort. The histogram showed a symmetric, unimodal distribution that supported an assumption of normality (Figure 2), with the mean of −1.5 falling squarely within the range of T-scores for persons with osteopenia.15

Figure 2.

Distribution of femoral neck T-scores.

Prior to modeling the T-scores, we calculated descriptive statistics for five factors whose associations with BMD are well established: age, body mass index (BMI), female sex, history of fragility fracture, and non-white race16 (Table 6).

Table 6.

Explanatory factors of variability in T-scores.

Variables N Minimum Maximum Mean Median
Age as of 12/31/2010 372 50.00 89.00 61.98 61.00
History of fracture within last 5 years 372 0.00 1.00 0.07 0.00
Body mass index 372 15.01 42.93 26.01 25.60
Female sex 372 0.00 1.00 0.09 0.00
Non-white race 372 0.00 1.00 0.42 0.00

These values attested to the relative youth of this cohort (mean age of 62 years), which was made up almost entirely of men (9% female), was slightly overweight (mean BMI of 26), and demonstrated a good balance of white and non-white races (42% non-white). The non-white group included participants who self-reported their race as one of the following: American Indian, Asian, Black, mixed race, other, or Pacific Islander. We note that because fragility fractures are a low incidence outcome, the proportion with history of fragility fracture in the prior 5 years was commensurately low (7%). Using multivariable linear regression, we modeled the T-scores on the five aforementioned factors (Figure 3). Age, history of prior fracture, BMI, and non-white race each exhibited significant associations with magnitudes and directions supported by the literature. Although female sex, owing to its wide confidence interval, was not significant, its negative point estimate was also corroborated by the literature. Age exhibited a small magnitude association of −0.02, meaning that for each additional year of age, the T-score decreased by approximately 0.02 units.

Figure 3.

Forest Plot of multivariable associations with T-scores.

This is in accordance with the literature, which has clearly established that aging is often associated with a slow, progressive decrease in BMD.17 In contrast, BMI exhibits a larger point estimate of 0.10 that is positive, meaning that incremental gain in BMI is associated with higher BMD. This is also supported by the literature as heavier persons require greater bone mass to sustain their weight.18 History of fragility fracture exhibits a large negative association of −0.53, which is also supported by the literature.19 Lastly, the positive association of non-white race is also well supported in prior studies.20 Although female sex did not achieve statistical significance, its negative point estimate (−0.22) is also consistent with prior literature.21 It is likely that the very small proportion of women explains the lack of significance. In summary, while this illustration does not add new insight regarding the performance of BoneScore, the overall distribution of the extracted T-scores, in concert with their well-established associations with five important factors, provides strong support for the clinical validity of the T-scores extracted by BoneScore.
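
In a multivariable linear regression, each coefficient is the expected change in T-score per one-unit change in that factor, holding the other factors fixed. The sketch below applies the point estimates quoted in the text; the dictionary keys and the function name `expected_difference` are hypothetical, and the non-white race coefficient is omitted because its value is not given in the text.

```python
# Point estimates quoted in the text (T-score units per unit of each factor).
COEFFICIENTS = {
    "age_years": -0.02,       # per additional year of age
    "bmi": 0.10,              # per additional BMI unit
    "prior_fracture": -0.53,  # history of fragility fracture (1) vs none (0)
    "female": -0.22,          # female (1) vs male (0); not statistically significant
}


def expected_difference(**changes: float) -> float:
    """Expected T-score difference for the given changes, other factors held fixed."""
    return sum(COEFFICIENTS[name] * delta for name, delta in changes.items())


# Ten additional years of age, all else equal
print(round(expected_difference(age_years=10), 2))
# Five additional BMI units together with a prior fragility fracture
print(round(expected_difference(bmi=5, prior_fracture=1), 2))
```

For example, under these estimates a ten-year age difference corresponds to an expected T-score difference of about −0.2, which is small relative to the −0.53 associated with a prior fragility fracture.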

Discussion

We have developed BoneScore using regular expressions that identify femoral neck T-scores from DXA scan reports with high accuracy. LaFleur and colleagues developed an algorithm that had an accuracy of 82.8%, while the accuracy of BoneScore was 98%. Furthermore, we have demonstrated that the T-scores extracted from this sample align with values reported in the literature and display associations that reflect those of well-established risk factors for decreased bone mineral density and fragility fractures including age, history of fragility fracture, BMI, female sex, and non-white race.16,22 However, in EHR-based studies that are large enough to explore fragility fractures, identifying T-scores has been problematic as they are not included as structured data fields. Using BoneScore, we will be able to include femoral neck T-scores in our predictive models for fragility fracture.

In our analysis, we found that the width of the snippet window was an important factor in matching T-score patterns. Narrower snippet windows missed important word combinations, which resulted in a failure to extract certain T-score values. Although wider snippet windows may be desirable, overly wide snippets can introduce noise and thereby generate false positives, especially when historical T-score values were reported in the snippet. Our first window width of 30 words on each side of the keyword femoral was insufficient and resulted in the omission of many T-scores and/or truncation of their values. The results improved when we extended the width of the window to 40 words on each side of the keyword. The rationale for this change was that the T-score was expected to follow the mention of the word “femoral”. Nonetheless, because we noticed that 40-word widths on each side of the keyword were insufficient, we increased the widths to 50 words on each side of the keyword. This change yielded the best performance for our NLP tool.

This study has several important strengths and limitations. The strengths include a large sample size and annotation performed by clinical experts. The sample from our second cohort had a mean femoral neck T-score of −1.5, well within the range of osteopenia. This indicates that our sample of persons, who were younger than the commonly used screening ages of ≥65 and ≥70 years for women and men, respectively, included a higher proportion of low T-scores than would be expected in the general population of older adults. Our choice of regular expressions as the base approach for building BoneScore was driven by the limited and well-defined vocabulary typically used to describe T-scores in DXA scan reports. However, despite our efforts to select a representative sample of DXA scan reports, it is possible that we incorrectly identified some T-scores or missed them altogether because our tool cannot recognize linguistic expressions that were not encountered during the training of our algorithm. To its advantage, BoneScore can be easily scaled to accommodate large volumes of data, which will facilitate future research and clinical studies.

Conclusion

In conclusion, we have developed a highly accurate NLP tool, BoneScore, that can extract femoral neck T-scores from DXA scan reports. Our tool demonstrates a substantial improvement in accuracy relative to existing methods. BoneScore can be easily scaled to accommodate large volumes of data. The data produced by this tool will be useful in developing predictive models for fractures among persons with HIV (PWH) and other persons at risk for fragility fractures.

Acknowledgements

The authors would like to acknowledge all Veterans who contributed data to this analysis.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (R01 AR078715), the Yale Claude D. Pepper Older Americans Independence Center (P30AG021342) and the National Center for Advancing Translational Sciences at the National Institutes of Health (UL1 TR002014).

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Disclaimer

The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.

Ethical Approval

The VACS National Cohort and VACS HIV were approved by the Institutional Review Boards of VA Connecticut Healthcare System at West Haven (AJ0001) and by Yale University (0309025943). Both are EHR-based cohorts that have been granted waivers of informed consent by the VA and Yale University and are HIPAA compliant.

Contributor Information

Samah Fodeh, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA.

Rixin Wang, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA.

Terrence E Murphy, Penn State College of Medicine, Hershey, PA, USA.

Farah Kidwai-Khan, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA.

Linda S Leo-Summers, Yale School of Medicine, New Haven, CT, USA.

Baylah Tessier-Sherman, Yale School of Medicine, New Haven, CT, USA.

Evelyn Hsieh, Yale School of Medicine, New Haven, CT, USA; VA Connecticut Healthcare System, West Haven, CT, USA.

Julie A Womack, VA Connecticut Healthcare System, West Haven, CT, USA; Yale School of Nursing, New Haven, CT, USA.

