AMIA Summits on Translational Science Proceedings. 2019 May 6;2019:562–571.

Predictive Modeling of the Total Joint Replacement Surgery Risk: a Deep Learning Based Approach with Claims Data

Riyi Qiu 1,3, Yugang Jia 1,*, Fei Wang 2, Pramod Divakarmurthy 4, Samuel Vinod 4, Behlool Sabir 4, Mirsad Hadzikadic 3
PMCID: PMC6568108  PMID: 31259011

Abstract

Total joint replacement (TJR) is one of the most commonly performed and fastest-growing elective surgical procedures in the United States. Given its huge volume and cost variation, the industry regards it as one of the top opportunities to reduce health care costs. Identifying patients with a high chance of undergoing TJR surgery and engaging them to shop for care is key to success for plan sponsors. In this paper, we experimented with different machine learning algorithms and developed a novel deep learning approach to predict TJR surgery from a large commercial claims dataset. Our results demonstrate that the gated recurrent neural network outperforms the other methods regardless of the data representation method (multi-hot encoding or embedding). An additional pooling mechanism can further improve the performance of the deep learning models in our case.

Introduction and Background

Total joint replacement (TJR) is one of the most commonly performed elective surgical procedures in the United States, with over 1 million total hip and total knee replacement procedures performed each year1. The volume of primary and revision TJR procedures has risen continuously in recent decades. By 2030, primary total hip replacement (THR) is projected to grow by 171% and primary total knee replacement (TKR) by up to 189%, for a projected 635,000 and 1.28 million procedures, respectively2.

Given its volume and growth rate, the total cost of TJR has been scrutinized for opportunities to improve providers' margins or reduce payers' healthcare burden. One important finding is that TJR procedures show significant cost variation. According to a report published by the Health Care Cost Institute (HCCI) in 20163, inpatient facility services for TJR are a top shoppable service in the United States for the employer-sponsored insurance (ESI) population younger than 65, accounting for 1.3% of total ESI spending in 2011. Another report, published by BlueCross BlueShield (BCBSA) and Blue Health Intelligence (BHI) in 2015, showed that the cost of identical TJR procedures can vary fourfold depending on which hospital is selected within a market. A more recent study4 showed that the average cost of care for total knee arthroplasty varied across hospitals by a factor of about 2 to 1, despite similar patient demographics, readmission rates, and complication rates.

Based on those findings, various cost transparency tools have been developed to enable patients (consumers) to shop for care and choose lower-priced, higher-quality providers. Employers usually offer these tools to their employees for free through third parties or carriers. To maximize the return on investment in such tools, it is crucial to identify those who might benefit from them (e.g., people who will need TJR surgery in the future) and engage them in time.

In view of this, we proposed to leverage claims data to identify patients who might need a TJR surgery in the future. Compared with clinical data, claims data are easy to obtain and deploy at large scale, especially in non-clinical settings. However, claims data are usually noisy, high-dimensional, sparse, incomplete, and heterogeneous5–10. To tackle such challenges, researchers have been applying deep neural network models such as Convolutional Neural Networks (CNN)7,8,11,12 and Recurrent Neural Networks (RNN)5,6,9–11,13–19 to predict clinical events.

In this paper, we investigated the performance of various CNN and RNN algorithms for predicting TJR on a large-scale commercial claims dataset. More specifically, we are interested in the following aspects:

  • Compared with the baseline algorithms (LASSO and random forest), how much performance gain can we achieve by using a more complex deep learning approach? The baseline algorithms aggregate the medical events along the time dimension, thereby losing temporal and contextual information, while the deep learning based approaches should be able to capture more complex structure in the data at the expense of computational complexity.

  • Which deep learning model is better for elective surgery prediction? It is well known that RNN algorithms capture long-term dependencies better than CNN algorithms; however, results in the literature are data dependent20.

  • How do data representation methods impact the performance of the deep learning models? We implemented two data representation methods in this paper: multi-hot encoding and embedding. Previous studies have shown mixed results for acute cases, and we want to investigate their role for elective surgery5,6,14.

  • Will the hidden state information help our prediction task? In a traditional RNN, only the last hidden state is used for prediction. Given that TJR is an elective procedure, a patient may delay the procedure even after meeting the criteria. From this perspective, we believed the intermediate state information could also be useful.

Data Description

Data are extracted from the MarketScan commercial claims and encounters database, which covers employees and their dependents younger than 65 years. The cohort is defined as follows:

  • Fully enrolled in years 2014, 2015, and 2016.

  • Diagnosed with Rheumatoid Arthritis/Osteoarthritis based on CMS-CCW Chronic Condition Algorithms.

  • No TJR surgery§ in 2014 and 2015.

  • Age over 45 in 2014.

A cohort of 540,000 patients was selected, with around 3.5% positive cases (patients who had a TJR surgery in 2016). Basic statistics of the dataset are listed in Table 1.

Table 1:

Basic statistics of the TJR dataset by year.

Year                                    2014          2015
# patients with records              535,499       537,205
# days with events                18,300,352    19,863,997
# medical codes                  134,071,176   146,802,340
Avg. days with events per patient         34            37
Avg. codes per patient                   250           273

The following data elements from the MarketScan database are selected as features for modeling:

  • Demographic variables: age and gender. For the deep learning models, the demographic variables are concatenated with the other variables in the last layer.

  • Diagnosis codes: 283 distinct CCS diagnosis codes, mapped from both ICD-9-CM|| and ICD-10-CM** codes in the MarketScan database.

  • Procedure codes: 240 CCS procedure codes††, mapped from ICD-10-PCS‡‡, Current Procedural Terminology (CPT), and Healthcare Common Procedure Coding System (HCPCS) codes in the database.

  • Therapeutic classes: 222 therapeutic classes derived from drug information by the data vendor.

  • Revenue codes: 651 standard revenue codes defined by the Health Care Finance Administration (HCFA).

  • Place of service: 45 codes, such as pharmacy, home, ambulance, hospital, or other facilities.

  • Provider types: 131 provider types, such as birthing center, radiology, or dentist.

  • Service sub-category codes: 498 sub-service types, such as mammograms, MRIs, or PET scans.

Data Representation

Three data representation methods are used in this paper. An aggregated occurrence vector is used for the conventional machine learning models; for the deep neural networks, a simple multi-hot encoding and a medical feature embedding are explored.

  • Aggregated occurrence vector. We create a binary vector for each patient and each year, with length equal to the number of unique codes. If a code appears in that year, the corresponding element of the vector is set to 1 for this patient. The final feature vector for each patient is the concatenation of the binary vectors for all years in the observation window.

  • Multi-hot encoding. Every patient record is formed as a code-by-time binary matrix: the (i, j)-th element of the matrix is 1 if the i-th code appeared on the j-th day for a specific patient. Detailed explanations can be found in the works of Cheng et al.7 and Che et al.8

  • Embedding. To reduce the dimensionality of the feature matrix, we use the Skip-gram method for code embedding21,22. More specifically, we use a sliding time window of 14 days to collect unique codes and reshuffle them into a ‘sentence’. Detailed explanations are given by Choi et al.23 and Farhan et al.24 The output embedding dimension is set to 100. (A sketch of these representations follows below.)
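To make the three representations concrete, here is a minimal Python sketch of how they could be constructed for one patient. The code names, record layout, and matrix orientation are illustrative, not taken from the paper; the paper indexes codes by row, while the sketch uses a days-by-codes layout to match the (time, features) convention used later for Keras.

```python
import random
import numpy as np
from gensim.models import Word2Vec

# Toy record for one patient: (day_index, code) pairs. All names are made up.
record = [(0, "CCS_203"), (0, "RX_14"), (5, "CCS_203"), (5, "CPT_20610")]
code_index = {"CCS_203": 0, "RX_14": 1, "CPT_20610": 2}
n_days, n_codes = 365, len(code_index)

# Multi-hot encoding: one binary time-by-code matrix per patient.
multi_hot = np.zeros((n_days, n_codes), dtype=np.int8)
for day, code in record:
    multi_hot[day, code_index[code]] = 1

# Aggregated occurrence vector: collapse the time axis to a 0/1 vector.
occurrence = multi_hot.max(axis=0)

# Skip-gram embedding: unique codes in each 14-day window form one shuffled
# "sentence"; sg=1 selects the skip-gram architecture, output dimension 100.
sentences = []
for start in range(0, n_days, 14):
    window = sorted({c for d, c in record if start <= d < start + 14})
    if window:
        random.shuffle(window)
        sentences.append(window)

w2v = Word2Vec(sentences, vector_size=100, sg=1, min_count=1)  # gensim >= 4
code_vector = w2v.wv["CCS_203"]  # 100-dimensional vector for one medical code
```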

Implemented Models and Experimental Details

The experiments are implemented in Python. The medical feature embedding is trained with Gensim, and the neural network models are implemented in Keras and TensorFlow. For each method, the result is reported as the mean and standard deviation over a 5-fold cross validation.
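As a sketch of this evaluation protocol, the helper below reports the mean and standard deviation of AUC over 5 folds. Stratified folds and the random seed are assumptions; the paper only states that 5-fold cross validation was used.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cv_mean_std(fit_and_score, X, y, n_splits=5, seed=42):
    """Mean and std of AUC over n_splits folds.

    fit_and_score(X_tr, y_tr, X_te) must fit a model on the training fold
    and return risk scores for the test fold.
    """
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = [roc_auc_score(y[te], fit_and_score(X[tr], y[tr], X[te]))
            for tr, te in folds.split(X, y)]
    return float(np.mean(aucs)), float(np.std(aucs))
```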

Implementation of LASSO and random forest

The alpha of LASSO is set to 0.001 after performing a grid search.

The detailed parameters for RF are as follows: the number of trees is 100; the maximum depth of each tree is 100; the minimum number of samples required to split an internal node is 10; the minimum number of samples required at a leaf node is 10; and the number of features considered when looking for the best split is the square root of the total number of features.
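A minimal scikit-learn sketch with the stated hyper-parameters follows. The paper does not name its implementation library, and using the regression form of LASSO on the binary label to produce risk scores is an assumption; an L1-penalized logistic regression would be an equivalent alternative.

```python
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier

# LASSO with the alpha chosen by grid search in the paper (0.001).
lasso = Lasso(alpha=0.001)

# Random forest with the hyper-parameters reported in the paper.
rf = RandomForestClassifier(
    n_estimators=100,      # number of trees
    max_depth=100,         # maximum depth of each tree
    min_samples_split=10,  # min samples to split an internal node
    min_samples_leaf=10,   # min samples required at a leaf node
    max_features="sqrt",   # sqrt of total features at each split
)

# Usage: lasso.fit(X_tr, y_tr); rf.fit(X_tr, y_tr)
# Scores: lasso.predict(X_te) and rf.predict_proba(X_te)[:, 1]
```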

Implementation of CNN

The CNN model we use is similar to that of Cheng et al.7 The network has 4 layers: the input layer is the same size as the number of features (codes); the second layer is a one-dimensional convolutional layer, in which convolutions of different sizes slide along the time axis to extract features; the third layer performs dropout, pooling, and normalization operations to speed up computation and control over-fitting; and the last layer is a fully connected output layer with a logistic regression that makes the prediction. After tuning, we use 3 types of filters, with filter lengths of 3, 4, and 5, and set the number of filters to 100. We use Adam as the optimizer with a learning rate of 0.001, and the batch size is set to 250.
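A hedged Keras sketch of such a multi-filter 1D CNN is given below. The input dimensions, dropout rate, and the exact placement of batch normalization are assumptions, since the paper describes the architecture only at a high level.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_DAYS, N_CODES, N_DEMO = 365, 2070, 2  # illustrative dimensions

seq_in = keras.Input(shape=(N_DAYS, N_CODES), name="events")
demo_in = keras.Input(shape=(N_DEMO,), name="demographics")

# One convolutional branch per filter length (3, 4, 5), 100 filters each,
# sliding along the time axis; each branch is max-pooled over time.
branches = []
for k in (3, 4, 5):
    x = layers.Conv1D(100, k, activation="relu")(seq_in)
    x = layers.GlobalMaxPooling1D()(x)
    branches.append(x)

x = layers.Concatenate()(branches)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.2)(x)              # dropout rate is an assumption
x = layers.Concatenate()([x, demo_in])  # demographics join the last layer
out = layers.Dense(1, activation="sigmoid")(x)  # logistic regression output

model = keras.Model([seq_in, demo_in], out)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
# model.fit([X_seq, X_demo], y, batch_size=250, ...)
```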

Implementation of RNN

We selected the gated recurrent unit (GRU) to implement the RNN for simplicity. As shown in Figure 1, given the input x_t and the previous hidden state h_{t-1} at each time step, the GRU updates the hidden state h_t. The GRU cell is built with a gating mechanism: it contains a reset gate r_t and an update gate z_t. The computations inside the solid-line box of Figure 1 are as follows:

$$z_t = \sigma(U_z x_t + W_z h_{t-1} + b_z)$$
$$r_t = \sigma(U_r x_t + W_r h_{t-1} + b_r)$$
$$\tilde{h}_t = \tanh(U_h x_t + r_t \odot (W_h h_{t-1}) + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

Figure 1: The RNN architecture and pooling operations.

where σ(·) denotes the sigmoid function and ⊙ is the Hadamard product (i.e., element-wise multiplication). At each time step, the visit information x_t and the previous hidden state h_{t-1} are the inputs; the three sets of U, W, and b are the weights and biases used to compute the two gates and the candidate memory h̃_t. The sigmoid function keeps the values of both gates between 0 and 1. The reset gate retains the useful information and drops the rest, and the update gate decides how much of the previous hidden state is passed on to the next state. We apply dropout to the final hidden state h_T for regularization, concatenate it with the patient's demographic data, and make the TJR prediction with logistic regression (shown as the "Typical RNN" in Figure 1).
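The four equations translate directly into a few lines of NumPy. The following sketch of a single GRU step uses toy dimensions and random weights purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU update following the equations above.

    p holds the weight matrices U_*, W_* and biases b_* for the update
    gate (z), reset gate (r), and candidate memory (h).
    """
    z = sigmoid(p["Uz"] @ x_t + p["Wz"] @ h_prev + p["bz"])
    r = sigmoid(p["Ur"] @ x_t + p["Wr"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Uh"] @ x_t + r * (p["Wh"] @ h_prev) + p["bh"])
    return z * h_prev + (1.0 - z) * h_tilde

# Toy usage: 5 input features, hidden size 4, three time steps.
rng = np.random.default_rng(0)
d, k = 5, 4
p = {n: rng.normal(size=(k, d)) for n in ("Uz", "Ur", "Uh")}
p.update({n: rng.normal(size=(k, k)) for n in ("Wz", "Wr", "Wh")})
p.update({n: np.zeros(k) for n in ("bz", "br", "bh")})
h = np.zeros(k)
for x_t in rng.normal(size=(3, d)):
    h = gru_step(x_t, h, p)
```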

After tuning, we set the hyper-parameters as follows: the hidden layer dimension is 200; the dropout rate is 0.2; and the optimizer, learning rate, and batch size are Adam, 0.001, and 250, respectively.
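A minimal Keras sketch of this "Typical RNN" head under the stated hyper-parameters; the input layout and dimensions are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_DAYS, N_CODES, N_DEMO = 365, 2070, 2  # illustrative dimensions

seq_in = keras.Input(shape=(N_DAYS, N_CODES), name="events")
demo_in = keras.Input(shape=(N_DEMO,), name="demographics")

# GRU over the daily multi-hot vectors; only the final state h_T is kept.
h_T = layers.GRU(200)(seq_in)
h_T = layers.Dropout(0.2)(h_T)

x = layers.Concatenate()([h_T, demo_in])        # append age and gender
out = layers.Dense(1, activation="sigmoid")(x)  # logistic regression layer

model = keras.Model([seq_in, demo_in], out)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
# model.fit([X_seq, X_demo], y, batch_size=250, ...)
```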

Implementation of pooling with RNN

In a typical GRU model, the output depends only on a logistic regression over the last hidden state h_T. Recently, researchers have started to explore how to leverage the other hidden states to boost performance25,26. We likewise apply different pooling operations to all hidden states and concatenate the results with the final state for the prediction. The rationale is that the positive or negative signal is often contained in just a few visits at any point in the whole time period, and that signal may be ignored or weakened as it is passed along to the last hidden state. As an elective surgery, the decision for TJR may be made early but postponed due to other, more urgent health problems. The three kinds of pooling operations are as follows:

  • Max pooling layer. This layer outputs the maximum value of each dimension over the whole time period. By sending the strongest signals directly to the final state, the network becomes more sensitive to the important events.

  • Average pooling layer. The idea of average pooling is to make the loss function consider all intermediate states, leading to better convergence and generalization.

  • Minimum pooling layer. This layer is equivalent to -maxpool(-x). Proposed by Skinner26, min-pooling is intended to pass along the other extreme of the activations, complementing max-pooling and making the network more “balanced” and “expressive”.

We can use one or more pooling strategies and concatenate the pooled vectors together to test their effect on performance, as shown in Figure 1 and in the sketch below.
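The following Keras sketch extends the typical GRU model above with all three pooling views. As before, the dimensions are illustrative, and the exact way the paper wires dropout into the pooled head is an assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_DAYS, N_CODES, N_DEMO = 365, 2070, 2  # illustrative dimensions

seq_in = keras.Input(shape=(N_DAYS, N_CODES), name="events")
demo_in = keras.Input(shape=(N_DEMO,), name="demographics")

# return_sequences=True exposes every intermediate hidden state h_1 ... h_T.
states = layers.GRU(200, return_sequences=True)(seq_in)

last = layers.Lambda(lambda s: s[:, -1, :])(states)  # final state h_T
max_pool = layers.GlobalMaxPooling1D()(states)       # max over time
avg_pool = layers.GlobalAveragePooling1D()(states)   # average over time
negated = layers.Lambda(lambda s: -s)(states)
min_pool = layers.Lambda(lambda s: -s)(
    layers.GlobalMaxPooling1D()(negated))            # min as -maxpool(-x)

x = layers.Concatenate()([last, max_pool, min_pool, avg_pool, demo_in])
x = layers.Dropout(0.2)(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model([seq_in, demo_in], out)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])
```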

Results and Discussion

We perform experiments with two different observation windows. For the first setting, we only use 2015 data to predict TJR event in 2016. For the second setting, we use both 2014 and 2015 data to predict TJR event in 2016.

Two metrics are used for performance comparison. The first is the area under the ROC curve (AUC), which measures the overall performance of the model. The second is precision with recall set to 0.9, a measure of the model's real-world performance chosen after consulting with business partners.
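One way to compute the second metric is to take the best precision over thresholds that still achieve the target recall; the paper does not spell out its exact thresholding rule, so the helper below is an assumption.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def precision_at_recall(y_true, y_score, target=0.9):
    """Best precision among thresholds whose recall is still >= target."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    mask = recall >= target  # operating points satisfying the recall floor
    return float(np.max(precision[mask]))
```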

  • The deep learning approaches perform much better than traditional algorithms such as LASSO and random forest (RF).

    Table 2 shows the performance of RF, LASSO, the CNN with multi-hot encoding (CNN-MH), and the GRU with multi-hot encoding (RNN-MH). Pair-wise t-tests show that the deep learning methods (CNN-MH and RNN-MH) perform significantly better than RF and LASSO in all scenarios (p = 0.001).

    Another interesting observation is that the performance of RF and LASSO does not increase when more data (the 2014 data) are included, whereas the performance of the deep learning methods increases significantly with more data. This indicates that the deep learning methods are more capable of exploiting the complex relationships in time series data.

  • The RNN based algorithm outperforms the CNN based approach regardless of the data representation (multi-hot or embedding).

    From Table 2, it is clear that the RNN is much better than the CNN in all scenarios, especially when the 2014 data are included in the observation window. A pair-wise t-test shows that the difference is significant (p = 0.001). In view of this, we investigate only RNN algorithms from here on.

  • The data representation has a limited effect on the performance and training efficiency of RNN algorithms.

    As shown in Table 3, the differences between the RNNs with the two data representation methods (RNN-EMB: RNN with embedding) are not significant in any scenario. The training time per epoch for RNN-MH is almost 2-3 times that of RNN-EMB; however, RNN-MH converges in far fewer epochs than RNN-EMB, so the total training times of the two models are similar in our experiments. Since we formatted our input data as a 3-D matrix (the dimensions being feature, time, and patient), the training time of both models increases linearly with the number of patients, which differs from what is reported by Choi et al.6

  • Additional maximum and minimum pooling mechanisms can improve the performance of the baseline RNN algorithm. The best performance is achieved when we add the pooling mechanisms to the RNN-MH algorithm.

    Table 4 shows the performance of RNN-MH and RNN-EMB with the pooling methods. Pair-wise t-tests demonstrate that the performance of RNN-EMB improves when the maximum or minimum pooling is included (p = 0.005). However, there is no significant difference when the average pooling is used with RNN-EMB. When all three pooling methods are used, the performance of both RNN-MH and RNN-EMB improves significantly.

Table 2:

Performance comparison between baseline models and deep learning models.

Model     Trained with 2015 data                 Trained with 2014 and 2015 data
          AUC              Precision@Recall=0.9  AUC              Precision@Recall=0.9
LASSO     0.7616 ± 0.0048  0.0527 ± 0.0003       0.7682 ± 0.0046  0.0532 ± 0.0013
RF        0.7853 ± 0.0050  0.0533 ± 0.0015       0.7887 ± 0.0040  0.0541 ± 0.0007
CNN-MH    0.8086 ± 0.0036  0.0572 ± 0.0012       0.8218 ± 0.0053  0.0645 ± 0.0015
RNN-MH    0.8200 ± 0.0073  0.0577 ± 0.0029       0.8339 ± 0.0024  0.0662 ± 0.0008

Table 3:

Running time comparison between RNN-MH and RNN-EMB.

Measurement                        RNN-MH    RNN-EMB
Trained with 2015 data
  Avg. training time per epoch     2589 s    1028 s
  No. of epochs to converge        3         7
  Total training time              7767 s    7196 s
Trained with 2014 and 2015 data
  Avg. training time per epoch     3450 s    1246 s
  No. of epochs to converge        4         11
  Total training time              13,800 s  13,706 s

Table 4:

Performance of RNN with different pooling methods.

Model                 Trained with 2015 data                 Trained with 2014 and 2015 data
                      AUC              Precision@Recall=0.9  AUC              Precision@Recall=0.9
RNN-MH                0.8200 ± 0.0073  0.0577 ± 0.0029       0.8339 ± 0.0024  0.0662 ± 0.0008
RNN-MH-MAX-MIN-AVG    0.8289 ± 0.0043  0.0601 ± 0.0015       0.8423 ± 0.0042  0.0693 ± 0.0025
RNN-EMB               0.8154 ± 0.0060  0.0574 ± 0.0016       0.8349 ± 0.0053  0.0668 ± 0.0020
RNN-EMB-MAX-MIN-AVG   0.8234 ± 0.0056  0.0591 ± 0.0019       0.8402 ± 0.0051  0.0685 ± 0.0024
RNN-EMB-MAX           0.8222 ± 0.0060  0.0594 ± 0.0018       0.8379 ± 0.0032  0.0681 ± 0.0017
RNN-EMB-MIN           0.8241 ± 0.0043  0.0598 ± 0.0018       0.8387 ± 0.0055  0.0675 ± 0.0020
RNN-EMB-AVG           0.8173 ± 0.0061  0.0575 ± 0.0017       0.8334 ± 0.0065  0.0667 ± 0.0032
RNN-EMB-MAX-MIN       0.8218 ± 0.0044  0.0591 ± 0.0014       0.8405 ± 0.0045  0.0687 ± 0.0020

In Figure 2, we plot the ROC curves of the baseline models and the three deep learning models with multi-hot encoding, trained with 2014 and 2015 data. The figure demonstrates that the deep learning methods are much better than the traditional methods and that adding the pooling mechanisms further improves the performance of the RNN.

Figure 2: Comparison of ROC curves for different models trained with 2014 and 2015 data.

Model explanation

We analyze the predictions made by the best model (RNN-MH-MAX-MIN-AVG) trained with both 2014 and 2015 data. Similar to the procedure described by Choi et al.6, we sample 200 patients each from the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). This is done for all 5 folds, giving 1,000 patients per category. We then calculate the number of days with each code per patient and compare the distributions across the 4 categories (TP, FP, TN, and FN).

Figure 3 shows 3 representative codes that explain the behavior of our predictions. The box plot in each sub-figure shows the distribution of the number of days with that specific code for each of the 4 categories (TP, FP, TN, and FN). The model assigns more weight to patients who have more interactions with health care providers, as shown in the left sub-figure. More specifically, our model prefers those who received services from orthopedic surgery more often and underwent the arthrocentesis procedure more frequently.
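A sketch of this error analysis, assuming arrays y_true and y_score from the best model and a per-patient days-with-code matrix; all names and the decision threshold are hypothetical.

```python
import numpy as np
import pandas as pd

def sample_outcome_groups(y_true, y_score, threshold, n=200, seed=0):
    """Sample up to n patient indices from each of TP, FP, TN, FN."""
    rng = np.random.default_rng(seed)
    y_pred = (y_score >= threshold).astype(int)
    groups = {
        "TP": np.where((y_pred == 1) & (y_true == 1))[0],
        "FP": np.where((y_pred == 1) & (y_true == 0))[0],
        "TN": np.where((y_pred == 0) & (y_true == 0))[0],
        "FN": np.where((y_pred == 0) & (y_true == 1))[0],
    }
    return {g: rng.choice(idx, size=min(n, len(idx)), replace=False)
            for g, idx in groups.items()}

def code_day_distribution(days_with_code, samples, code_j):
    """Days-with-code counts per group for code_j, ready for a box plot."""
    return pd.DataFrame({g: pd.Series(days_with_code[idx, code_j])
                         for g, idx in samples.items()})

# Usage: code_day_distribution(days_with_code, samples, code_j).boxplot()
```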

Figure 3: Explanation of model behavior with 3 examples. Outliers are not shown, and the green triangle indicates the mean of each group.

Conclusion

We investigated several deep learning methods to predict TJR surgery based on a large commercial claims dataset with more than 2,000 variables and 540,000 patients. Unsurprisingly, the deep learning based approaches perform much better than traditional methods (e.g., random forest and LASSO). Among the investigated deep learning methods, the RNN with a pooling mechanism works best for our use case. We tested two data representation methods and found that the embedding technique improves neither the performance nor the training efficiency of the RNN in any scenario. Our experiments also suggest that pooling mechanisms can discover additional signals in the intermediate hidden states and hence improve the performance of the baseline RNN algorithm.

Footnotes

©2017 Truven Health Analytics LLC, All rights reserved

https://www.ccwdata.org/documents/10280/19139608/ccw-cond-algo-arthritis.pdf

§ TJR surgery is identified by DRG = 269 or DRG = 270.

Clinical Classifications Software (CCS) provided by AHRQ: https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp

|| International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).

** International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM).

†† Clinical Classifications Software (CCS) provided by AHRQ: https://www.hcup-us.ahrq.gov/toolssoftware/ccs10/ccs10.jsp

‡‡ International Classification of Diseases, Tenth Revision, Procedure Coding System (ICD-10-PCS).

References

  • 1. Steiner C, Andrews R, Barrett M, Weiss A. HCUP projections: mobility/orthopedic procedures 2003 to 2012. HCUP Projections Report 2012-03; 2012.
  • 2. Sloan M, Sheth NP. Projected volume of primary and revision total joint arthroplasty in the United States, 2030-2060. In: 2018 Annual Meeting of the American Academy of Orthopaedic Surgeons (AAOS); 2018.
  • 3. Spending on shoppable services in health care. Health Care Cost Institute, Issue Brief #11; March 2016.
  • 4. Haas DA, Kaplan RS. Variation in the cost of care for primary total knee arthroplasties. Arthroplasty Today. 2016;3(1):33–37. doi: 10.1016/j.artd.2016.08.001.
  • 5. Xiao C, Ma T, Dieng AB, Blei DM, Wang F. Readmission prediction via deep contextual embedding of clinical concepts. PLoS ONE. 2018;13(4):e0195024. doi: 10.1371/journal.pone.0195024.
  • 6. Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association. 2016;24(2):361–370. doi: 10.1093/jamia/ocw112.
  • 7. Cheng Y, Wang F, Zhang P, Hu J. Risk prediction with electronic health records: a deep learning approach. In: Proceedings of the 2016 SIAM International Conference on Data Mining; SIAM; 2016. pp. 432–440.
  • 8. Che Z, Cheng Y, Sun Z, Liu Y. Exploiting convolutional neural network for risk prediction with medical feature embedding. arXiv preprint arXiv:1701.07474. 2017.
  • 9. Choi E, Bahadori MT, Sun J, Kulas J, Schuetz A, Stewart W. RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems; 2016. pp. 3504–3512.
  • 10. Che C, Xiao C, Liang J, Jin B, Zhou J, Wang F. An RNN architecture with dynamic temporal matching for personalized predictions of Parkinson's disease. In: Proceedings of the 2017 SIAM International Conference on Data Mining; SIAM; 2017. pp. 198–206.
  • 11. Razavian N, Marcus J, Sontag D. Multi-task prediction of disease onsets from longitudinal lab tests. arXiv preprint arXiv:1608.00647. 2016.
  • 12. Razavian N, Sontag D. Temporal convolutional neural networks for diagnosis from lab tests. arXiv preprint arXiv:1511.07938. 2015.
  • 13. Baytas IM, Xiao C, Zhang X, Wang F, Jain AK, Zhou J. Patient subtyping via time-aware LSTM networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM; 2017. pp. 65–74.
  • 14. Jagannatha AN, Yu H. Bidirectional RNN for medical event detection in electronic health records. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL); 2016. p. 473.
  • 15. Lipton ZC, Kale DC, Elkan C, Wetzel R. Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677. 2015.
  • 16. Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Scientific Reports. 2018;8(1):6085. doi: 10.1038/s41598-018-24271-9.
  • 17. Esteban C, Staeck O, Baier S, Yang Y, Tresp V. Predicting clinical events by combining static and dynamic information using recurrent neural networks. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI); IEEE; 2016. pp. 93–101.
  • 18. Nickerson P, Tighe P, Shickel B, Rashidi P. Deep neural network architectures for forecasting analgesic response. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2016. p. 2966.
  • 19. Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. In: Machine Learning for Healthcare Conference; 2016. pp. 301–318.
  • 20. Ma T, Xiao C, Wang F. Health-ATM: a deep architecture for multifaceted patient health record representation and risk prediction. In: Proceedings of the 2018 SIAM International Conference on Data Mining; SIAM; 2018. pp. 261–269.
  • 21. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems; 2013. pp. 3111–3119.
  • 22. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
  • 23. Choi Y, Chiu CYI, Sontag D. Learning low-dimensional representations of medical concepts. AMIA Summits on Translational Science Proceedings. 2016;2016:41.
  • 24. Farhan W, Wang Z, Huang Y, Wang S, Wang F, Jiang X. A predictive model for medical events based on contextual embedding of temporal sequences. JMIR Medical Informatics. 2016;4(4). doi: 10.2196/medinform.5977.
  • 25. Howard J, Ruder S. Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2018. pp. 328–339.
  • 26. Skinner M. Product categorization with LSTMs and balanced pooling views. In: SIGIR 2018 Workshop on eCommerce (ECOM 18); 2018.
