Abstract
Objective:
Amyloid A (AA) amyloidosis is found in humans and non-human primates, but quantifying disease risk prior to clinical symptoms is challenging. We applied machine learning to identify the best predictors of amyloidosis in rhesus macaques from available clinical and pathology records. To explore potential biomarkers, we also assessed whether changes in circulating SAA or lipoprotein profiles accompany disease.
Methods:
We conducted a retrospective study of 86 cases and 163 controls matched for age and sex. We performed data reduction on 62 clinical, pathological, and demographic variables, and applied multivariate modeling and model selection with cross-validation. To test the performance of our final model, we applied it to a replication cohort of 2,775 macaques.
Results:
The strongest predictors of disease were colitis, gastrointestinal adenocarcinoma, endometriosis, arthritis, trauma, diarrhea, and number of pregnancies. Sensitivity and specificity of the risk model were each predicted to be 82% on held-out test data, and were measured at 79% and 72%, respectively, in the replication cohort. Total, LDL, and HDL cholesterol levels were significantly lower, and SAA levels and triglyceride-to-HDL ratios significantly higher, in cases than in controls.
Conclusion:
Machine learning is a powerful approach to identifying macaques at risk of AA amyloidosis, which is accompanied by increased circulating SAA and altered lipoprotein profiles.
Keywords: AA amyloidosis, machine learning, Macaca mulatta, dyslipidemia, inflammation
Introduction
Reactive (secondary) or AA amyloidosis is characterized by the extracellular deposition of misfolded and insoluble amyloid A (AA) fibrils generated from the precursor serum amyloid A (SAA) protein [1]. Deposition of AA fibrils occurs in major organs, most frequently the kidney and gastrointestinal (GI) tract, and may result in significant morbidity and mortality caused by tissue and organ failure [2]. In humans, AA amyloidosis occurs secondary to chronic inflammatory diseases, primarily in rheumatoid arthritis and inflammatory bowel disease [3].
SAA is highly conserved among primate lineages, and many non-human primate (NHP) species are also susceptible to AA amyloidosis [4]. Indeed, systemic AA amyloidosis is a significant cause of morbidity and mortality in rhesus macaques (Macaca mulatta) housed at the Oregon National Primate Research Center (ONPRC) in Beaverton, Oregon. In NHPs, AA amyloidosis is difficult to diagnose because its clinical presentation is highly variable, and many cases are diagnosed only at post-mortem examination. Amyloid is deposited most frequently in the small intestine, spleen, colon, and liver, and clinical signs depend on the organs affected. Several studies have described disorders that frequently co-occur with AA amyloidosis in NHPs, including enterocolitis, osteo- and rheumatoid arthritis, and retroperitoneal fibromatosis resulting from retroviral infection [5–8]. Although no single disorder has been solely associated with the development of AA amyloidosis in NHPs, all associated disorders involve chronic, systemic inflammation. However, systematic studies of AA amyloidosis in NHPs are rare, its etiology is poorly understood, and there is no systematic way to identify animals at risk before the disease manifests.
In humans, SAA (primarily the SAA1 isoform) is upregulated in the liver during the acute phase response (APR) and is carried in the circulation primarily by high-density lipoprotein (HDL), particularly the HDL3 subfraction. During the APR, SAA may displace apolipoprotein A-I (Apo-A1) as the primary apolipoprotein of HDL; concurrently, HDL cholesterol (HDL-C) levels decrease and triglyceride levels increase [9]. The association of SAA with HDL has been implicated both in reducing the anti-inflammatory effects and in increasing the pro-inflammatory effects of HDL [10–13]. Thus, HDL may play a critical role in the etiology of this disease.
The goals of this study were to identify risk factors that would allow us to predict AA amyloidosis in macaques before clinical symptoms appear, and to explore potential new biomarkers of disease that can be easily measured on a large scale. We used electronic health records (EHR) available at the ONPRC on >37,000 animals spanning more than 50 years as a source of data on clinical symptoms, pathological diagnoses obtained at necropsy, lipoprotein profiles, and demographic information. EHR are a relatively recent addition to human healthcare; they store large amounts of data that can be analyzed retrospectively [14–15]. In conjunction with the rise of EHR, computational scientists have developed powerful new approaches to mine these data, including machine (or statistical) learning. These approaches seek the simplest and most informative algorithm that predicts an outcome, without assuming, as more traditional statistical approaches do, that the data were generated by a known underlying process or statistical model [16]. Our aims were (1) to estimate the prevalence of AA amyloidosis in the ONPRC colony, (2) to apply a machine learning approach to EHR data to identify and quantify the best predictors of disease, and (3) to assess whether SAA levels and/or lipoprotein profiles differed between affected and unaffected animals.
Materials and methods
Study cohort and electronic health records (EHR) data
We performed a retrospective, case-control study on Indian-origin rhesus macaques at the ONPRC based on clinical and pathology data available in the Primate Records and Information Management (PRIMe) database, a LabKey-based electronic health records (EHR) system [17]. Indian-origin rhesus macaques that died from all causes between January 1, 2010 and January 1, 2015 (N=3,061) were included for consideration in the study, excluding macaques aged <1 year. Disease diagnoses were based on Systematized Nomenclature of Medicine (SNOMED) codes assigned at necropsy, and included D-38912 (‘Amyloidosis, Secondary Systemic’), M-55100 (‘Deposition, Amyloid’), and D-38900 (‘Amyloidosis, NOS’), confirmed post-mortem by gross and microscopic evaluation conducted by veterinary pathologists. We included AA amyloidosis diagnoses due to amyloid deposition in kidney, liver, spleen, and intestine, but excluded amyloidosis due to deposition in the islets of Langerhans, as this form of amyloidosis has an unrelated etiology [18–20].
All clinical and diagnostic data recorded in the EHR for each animal during the study period were included in the initial modeling. EHR data included other pathology diagnoses confirmed at necropsy, clinical presentations, sex, species, number of infants born by either vaginal delivery or Caesarean section, and age at diagnosis or clinical presentation. Disease diagnoses included instances in which the presence of disease was confirmed by histopathology, coded as a binary variable (i.e., with or without diagnosis). Clinical data included the integer number of occurrences of each clinical problem (e.g., episodes of diarrhea). Note that different clinical problems can occur multiple times in the same animal over its lifetime. For example, an animal may experience an episode of diarrhea at one year of age, a wound at three and at four years of age, and diarrhea again at five years of age. We removed extremely rare diagnoses and clinical problems with fewer than 3 instances in the entire dataset. Based on the criteria outlined above, we identified 283 cases of systemic amyloidosis in our colony. Controls were defined as all animals that died during this time period but did not meet the case criteria (i.e., 2,778 of the initial 3,061 animals). Because some macaques at the ONPRC may be closely related, we ensured the independence of our data by removing animals related at the level of second cousins or closer, using kinship estimated by the approach described by Lange [21] and implemented in the R package kinship2 [22]. After removing close relatives, we performed case-control matching for age and sex with two controls per case where possible. Five cases could not be fully matched: 4 had no corresponding controls, and 1 matched only a single control; we nonetheless retained these 5 cases in our analyses. The final analysis cohort thus comprised 249 monkeys: 86 cases and 163 controls.
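The matching step can be illustrated with a minimal greedy sketch in Python. It assumes each animal is described only by a hypothetical `Animal` record (ID, sex, age) and uses a hypothetical age caliper of 1 year, since the matching tolerance used in the study is not reported; the kinship filtering with kinship2 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    animal_id: str
    sex: str          # e.g. "F" or "M"
    age_years: float  # age at death (necropsy)


def match_controls(cases, controls, ratio=2, max_age_diff=1.0):
    """Greedily match up to `ratio` controls to each case by sex and age.

    `max_age_diff` (years) is a hypothetical caliper; the study does not
    report the matching tolerance. Each control is used at most once.
    """
    available = list(controls)
    matches = {}
    for case in cases:
        # Same-sex controls within the age caliper, closest in age first
        # (ties broken by ID so the result is deterministic).
        candidates = sorted(
            (c for c in available
             if c.sex == case.sex
             and abs(c.age_years - case.age_years) <= max_age_diff),
            key=lambda c: (abs(c.age_years - case.age_years), c.animal_id),
        )
        chosen = candidates[:ratio]
        for c in chosen:
            available.remove(c)
        # Cases with fewer than `ratio` matches (possibly none) are kept.
        matches[case.animal_id] = [c.animal_id for c in chosen]
    return matches


cases = [Animal("A1", "F", 10.0), Animal("A2", "M", 20.0)]
controls = [Animal("C1", "F", 10.5), Animal("C2", "F", 9.0),
            Animal("C3", "F", 12.0), Animal("C4", "M", 25.0)]
print(match_controls(cases, controls))  # {'A1': ['C1', 'C2'], 'A2': []}
```

Greedy matching depends on the order in which cases are processed; an optimal matching (for example, one minimizing the total age difference) could assign controls differently.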
Differences in descriptive variables between cases and controls were tested using 2-sample t-tests, 2-sample tests for equality of proportions, or Mann-Whitney U tests, as appropriate. All EHR data were obtained using the LabKey Python API and custom Python (v.2.7) scripts.
Machine learning approach
We applied a supervised machine learning approach to our analysis of AA amyloidosis in cases and controls. The initial set of 62 variables included age at necropsy, sex, number of pregnancies, all clinical problems, and all post-mortem diagnoses other than amyloidosis. We first split the data into ‘training’ and ‘test’ datasets, using 75% of the data for model training and setting aside 25% for evaluating the final model. Using the training dataset, we reduced the number of candidate variables by fitting a univariate logistic regression model to each and removing any variable with a P-value >0.25 from further consideration. To develop a final model from this reduced set of variables and to estimate its error rate, we used 10-fold cross-validation. We split the ‘training’ data into 10 equal parts, using 9 parts for model development and the remaining part for validation. We then fitted a multivariate logistic regression model to the reduced set of variables, using backward stepwise selection with the Akaike information criterion (AIC) [23] for model selection. For each of the 10 resulting models, we calculated the accuracy in predicting disease status in the corresponding validation part. This process was repeated until each of the 10 parts had been used once for validation, and the 10 accuracy values were averaged to estimate the accuracy expected of the final model. Following this cross-validation procedure, we built the final multivariate logistic regression model using all of the ‘training’ data. Odds ratios and confidence intervals for each variable in the final model were calculated using standard methods.
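The univariate screening step can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's R code: the text does not say whether Wald or likelihood-ratio P-values were used, so the sketch assumes a likelihood-ratio test, and the function names are hypothetical. Each candidate variable is fitted in a one-predictor logistic regression by Newton-Raphson, and variables with P > 0.25 are dropped.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _softplus(t):
    # log(1 + exp(t)) without overflow.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def univariate_lrt_p(x, y, max_iter=50, tol=1e-10):
    """Likelihood-ratio P-value for b1 in logit(p) = b0 + b1*x.

    Fits the model by Newton-Raphson; y must contain both 0s and 1s.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p               # score (gradient) for b0
            g1 += (yi - p) * xi        # score (gradient) for b1
            h00 += w                   # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:  # constant x or (quasi-)separation: stop iterating
            break
        # Newton step: solve the 2x2 system [h00 h01; h01 h11] d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    ll_full = sum(yi * (b0 + b1 * xi) - _softplus(b0 + b1 * xi)
                  for xi, yi in zip(x, y))
    n1 = sum(y)
    n0 = len(y) - n1
    ll_null = n1 * math.log(n1 / len(y)) + n0 * math.log(n0 / len(y))
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    # Chi-square (1 df) upper tail: P(X > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def screen_variables(variables, y, threshold=0.25):
    """Keep variables whose univariate P-value does not exceed `threshold`."""
    return [name for name, x in variables.items()
            if univariate_lrt_p(x, y) <= threshold]
```

Only the retained variables would then enter the cross-validated backward AIC selection, which is not shown here.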
Using the ‘test’ dataset, we evaluated how well the final model discriminated cases from controls using the area under the receiver operating characteristic (ROC) curve (AUROC, or AUC) [24] (Figure 2). We calculated 95% confidence intervals on the ROC curve for true negative rates from 0 to 1.0 in increments of 0.1 using 2,000 stratified bootstrap replicates, and calculated the standard deviation of the AUC as defined by DeLong et al. [25] using the algorithm of Sun and Xu [26]. To identify the optimal threshold on the curve, which jointly maximizes the true positive and true negative rates, we used a modified Youden J statistic and the maximum distance from the identity line [27]. Because of the high clinical burden associated with amyloidosis in this colony and the relatively low cost of follow-up after a positive classification, we weighted false negatives (i.e., failing to classify an animal with disease) 10-fold more heavily than false positives when calculating the optimal threshold. We tested the significance of the AUC using a Z-test and by permutation tests that randomized predictions over 10,000 replications [28]. All analyses were conducted in R (v.3.4.1) [29], using the pROC (1.10.0) [30], dplyr (0.7.4) [31], and RColorBrewer (1.1–2) [32] packages.
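The AUC and the cost-weighted optimal threshold can be sketched as follows. The AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. For the threshold, the sketch assumes the weighted Youden form associated with pROC's `best.weights` option, maximizing sensitivity + r × specificity with r = (1 − prevalence)/(cost × prevalence), where cost is the relative cost of a false negative (10 here). Function names are hypothetical, candidate thresholds are simply the observed scores, and the bootstrap and DeLong computations are omitted.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(case score > control score), counting ties as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_youden_threshold(scores, labels, cost_fn=10.0, prevalence=None):
    """Threshold maximizing sensitivity + r * specificity.

    r = (1 - prevalence) / (cost_fn * prevalence), where cost_fn is the cost
    of a false negative relative to a false positive. Scores >= threshold
    are called positive. If prevalence is None, the sample prevalence is used.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if prevalence is None:
        prevalence = n_pos / len(labels)
    r = (1.0 - prevalence) / (cost_fn * prevalence)
    best_j, best_t = None, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        tn = sum(1 for s, lab in zip(scores, labels) if s < t and lab == 0)
        j = tp / n_pos + r * tn / n_neg
        if best_j is None or j > best_j:
            best_j, best_t = j, t
    return best_t


scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
print(auc_mann_whitney(scores, labels))           # 0.8125
print(weighted_youden_threshold(scores, labels))  # 0.3
```

With `cost_fn=1` and a prevalence of 0.5, r = 1 and the criterion reduces to the ordinary Youden index; larger costs shrink r and favor lower thresholds with higher sensitivity.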
Assessing model performance in a replication cohort
To assess the performance of our risk model in a replication data set, we applied it to the cohort of 2,775 macaques that were removed initially from analysis due to kinship with the model-building cohort (37 animals were not considered in the replication analysis due to incomplete case/control matching). The replication cohort thus included 197 cases and 2,578 controls with disease status confirmed by the same criteria applied in the model-building cohort. Performance parameters were measured as described above, with the addition of the F1 score parameter calculated as 2*(precision*sensitivity)/(precision + sensitivity). The F1 confidence intervals were calculated using a basic bootstrap interval with 2,000 bootstrap replicates, to produce an equi-tailed two-sided nonparametric confidence interval [33–34]. The P-value for the F1 score was calculated using a permutation test with 10,000 permutations [35].
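The F1 calculation, basic bootstrap interval, and permutation P-value can be sketched in Python with only the standard library. This is a simplified illustration with hypothetical function names, taking quantiles directly from the sorted replicates rather than applying any interpolation a statistical package may use; it is not the code used in the study.

```python
import random


def f1_score(y_true, y_pred):
    """F1 = 2*precision*sensitivity/(precision + sensitivity) = 2TP/(2TP+FP+FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def basic_bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=1):
    """Equi-tailed basic (reverse-percentile) bootstrap interval for F1."""
    rng = random.Random(seed)
    n = len(y_true)
    theta = f1_score(y_true, y_pred)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample animals
        stats.append(f1_score([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    q_lo = stats[int((alpha / 2) * n_boot)]
    q_hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    # Basic interval: [2*theta - q_hi, 2*theta - q_lo].
    return 2 * theta - q_hi, 2 * theta - q_lo


def permutation_p_value(y_true, y_pred, n_perm=10000, seed=1):
    """P(F1 under randomly shuffled predictions >= observed F1)."""
    rng = random.Random(seed)
    observed = f1_score(y_true, y_pred)
    shuffled = list(y_pred)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f1_score(y_true, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With the defaults, `basic_bootstrap_ci` uses 2,000 replicates and `permutation_p_value` uses 10,000 permutations, mirroring the settings described above; in pure Python the permutation loop is slow for a cohort of 2,775 animals but still practical.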
SAA levels and lipoprotein profiles
SAA levels were measured in 20 cases and 20 controls selected from the subset of the replication cohort that had plasma available and could be matched closely by age, sex, and weight. Circulating SAA was measured in plasma collected at necropsy and diluted 1:10,000, using a monkey-specific SAA1/SAA/serum amyloid A sandwich ELISA (LifeSpan BioSciences, Inc., Seattle, WA, cat. #LS-F36154) according to the manufacturer’s protocol. Lipoprotein cholesterol levels measured closest to the necropsy date were obtained from the EHR where available, excluding any animals fed an experimental diet; this subset included 9 cases and 40 controls, not matched for age or sex. Lipoprotein data obtained from the EHR included direct measures of total, HDL, and LDL cholesterol and triglycerides, obtained using standard methods on an on-site Horiba Pentra 400 clinical chemistry analyzer. All data were adjusted for age and sex, and cases and controls were compared using either a paired t-test (SAA levels) or a one-sided Mann-Whitney U test with normal approximation and continuity correction (lipoprotein profiles).
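A one-sided Mann-Whitney U test with normal approximation and continuity correction can be sketched as follows. The function name is hypothetical, and the age and sex adjustment described above is not included. The first argument is the group hypothesized to have larger values, e.g. controls for cholesterol or cases for triglycerides.

```python
import math


def mann_whitney_greater(x, y):
    """One-sided test of H1: values in x tend to exceed values in y.

    Uses mid-ranks for ties, a tie-corrected normal approximation, and a
    0.5 continuity correction. Returns (U for x, one-sided P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank of this tied block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # x occupies the first n1 slots
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - mu - 0.5) / math.sqrt(var)
    return u1, 0.5 * math.erfc(z / math.sqrt(2))


u, p = mann_whitney_greater([5, 6, 7, 8], [1, 2, 3, 4])
print(u, round(p, 4))
```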
Results
Prevalence of AA amyloidosis
The set of rhesus macaques at least 1 year of age that died between January 1, 2010 and January 1, 2015 comprised 3,061 monkeys, including 1,609 females and 1,452 males (Figure 1, Table 1). The number of animals necropsied for all causes per year during this period ranged from 539 to 695, and the proportion of these with AA amyloidosis diagnosed at necropsy ranged from 8% to 10% (53–65 animals per year). The mean prevalence of AA amyloidosis estimated from diagnoses at necropsy over this 5-year period was 9.2%. In our study cohort, cases and controls did not differ significantly in age, sex, average number of clinical problems, or average frequency of clinical problems. However, cases had significantly more pregnancies and more post-mortem diagnoses than controls.
TABLE 1:
Variable | Cases | Controls | P-Value |
---|---|---|---|
N | 86 | 163 | - |
Mean age (years) | 14.61 ± 7.20 | 15.02 ± 6.30 | 0.65 |
Female, n (%) | 68 (79.07%) | 130 (79.75%) | 1.00 |
Mean no. pregnancies | 5.06 | 2.38 | <0.001 |
Mean no. clinical problems | 2.29 ± 1.41 | 2.29 ± 1.62 | 0.42 |
Mean frequency, clinical problems | 4.51 ± 4.31 | 3.80 ± 3.57 | 0.12 |
Mean no. diagnoses | 2.83 ± 1.69 | 0.25 ± 0.64 | <0.001 |
Machine learning algorithm applied to the study cohort
The initial univariate modeling included 19 unique clinical problems, 40 unique post-mortem diagnoses, and 3 demographic variables among our study cohort of 249 macaques, totaling 62 potential covariates (see Supplemental Table 1). Eleven of these variables met the screening threshold of P < 0.25. These included 3 post-mortem diagnoses (colitis NOS, GI adenocarcinoma, and endometriosis), 7 clinical problems (episodes of trauma/wounding; GI, diarrhea; other GI abnormality; musculoskeletal abnormality, including osteo- or reactive arthritis and bone fractures; preventative care; obstetric and gynecological conditions; and weight loss), and 1 demographic variable (number of pregnancies). These 11 variables were used in the cross-validation procedure to develop the final multivariate logistic regression model, with model selection by AIC. This procedure returned an expected error rate of 13.22% ± 0.06 in predicting the disease status of animals. Model selection indicated that the most informative model required only 8 of the 11 variables that entered the final model-building analysis. This final model (Table 2) is expressed as:

$$y = \beta_0 + \beta_{COL}x_{COL} + \beta_{ADC}x_{ADC} + \beta_{END}x_{END} + \beta_{MSC}x_{MSC} + \beta_{TRM}x_{TRM} + \beta_{PRG}x_{PRG} + \beta_{NCD}x_{NCD} + \beta_{GIO}x_{GIO}$$

where $x_{COL}$ is a diagnosis of colitis, $x_{ADC}$ is a diagnosis of GI adenocarcinoma, and $x_{END}$ is a diagnosis of endometriosis (each coded 0/1); $x_{MSC}$ is the number of episodes of musculoskeletal abnormality, $x_{TRM}$ is the number of trauma events, $x_{PRG}$ is the number of pregnancies, $x_{NCD}$ is the number of episodes of non-colitis diarrhea, and $x_{GIO}$ is the number of episodes of other GI abnormalities; $y$ is the log-odds of amyloidosis, $\beta_0$ is the intercept, and each $\beta$ is the natural logarithm of the corresponding odds ratio in Table 2. The probability of amyloidosis, $z$, is then found as:

$$z = \frac{1}{1 + e^{-y}}$$
TABLE 2:
Category | Factor | Odds Ratio | 95% CI | P-Value |
---|---|---|---|---|
Baseline | Intercept | 0.05 | (0.02, 0.12) | <0.001 |
Diagnoses | Colitis, NOS | 54.13 | (15.56, 241.81) | <0.001 |
Diagnoses | GI adenocarcinoma | 41.72 | (8.47, 270.88) | <0.001 |
Diagnoses | Endometriosis | 26.45 | (1.83, 746.88) | 0.02 |
Clinical problems | MS abnormalities | 9.40 | (2.95, 58.76) | <0.01 |
Clinical problems | Trauma/wounding | 1.52 | (1.05, 2.27) | 0.03 |
Clinical problems | GI, diarrhea | 1.24 | (1.00, 1.63) | 0.08 |
Clinical problems | GI, other | 0.44 | (0.17, 0.87) | 0.049 |
Demographic variables | Pregnancies | 1.28 | (1.10, 1.51) | <0.01 |
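Because each logistic regression coefficient is the natural logarithm of its odds ratio, the model can be applied approximately from Table 2. The Python sketch below does this for a hypothetical animal record keyed by the model's variable codes (COL, ADC, END, MSC, TRM, PRG, NCD, GIO). Because the published odds ratios are rounded, its probabilities only approximate those of the fitted model. It flags an animal when its predicted probability exceeds the reported optimal threshold of 0.393.

```python
import math

# Odds ratios from Table 2 (rounded), keyed by the model's variable codes.
ODDS_RATIOS = {
    "COL": 54.13,  # colitis, NOS (0/1)
    "ADC": 41.72,  # GI adenocarcinoma (0/1)
    "END": 26.45,  # endometriosis (0/1)
    "MSC": 9.40,   # musculoskeletal abnormality (episodes)
    "TRM": 1.52,   # trauma/wounding (episodes)
    "PRG": 1.28,   # pregnancies (count)
    "NCD": 1.24,   # GI, diarrhea (episodes)
    "GIO": 0.44,   # GI, other (episodes)
}
BASELINE_ODDS = 0.05  # intercept, reported as an odds
THRESHOLD = 0.393     # reported optimal ROC threshold


def predicted_risk(animal):
    """Approximate probability of AA amyloidosis for one animal record.

    `animal` maps variable codes to 0/1 diagnoses or counts; missing codes
    count as 0. Coefficients are ln(odds ratio), so results are approximate.
    """
    y = math.log(BASELINE_ODDS) + sum(
        math.log(ratio) * animal.get(code, 0)
        for code, ratio in ODDS_RATIOS.items()
    )
    return 1.0 / (1.0 + math.exp(-y))


def classify(animal, threshold=THRESHOLD):
    """True if the animal would be flagged (probability above threshold)."""
    return predicted_risk(animal) > threshold


print(round(predicted_risk({}), 3))          # 0.048
print(round(predicted_risk({"COL": 1}), 3))  # 0.73
```

For example, a hypothetical animal with no recorded risk factors has a predicted probability of 0.05/1.05 ≈ 0.048, whereas colitis alone raises the odds to 0.05 × 54.13 ≈ 2.71 and the probability to about 0.73, above the threshold.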
Evaluation of the final model using the test data indicated good power to distinguish affected from unaffected animals (Figure 2). The baseline odds of 0.05 (the intercept) indicate that the expected odds of disease are very low for an animal with none of the 8 risk factors. Odds ratios for post-mortem diagnoses ranged from 26.45 for endometriosis to 54.13 for colitis. Odds ratios for clinical problems were considerably lower, ranging from 0.44 for other GI abnormalities to 9.40 for musculoskeletal abnormalities. Applied to the test dataset, the final model had an AUC of 0.814 ± 0.120 (P < 0.01 by permutation, and P < 0.01 for Z = 2.611) and a 17.03% classification error rate. The optimal threshold calculated from the ROC curve is 0.393; animals with predicted probabilities above this value are classified as affected. At this threshold, the model is expected to have a true positive rate (sensitivity) of 0.821 and a false positive rate of 0.176, corresponding to a specificity of 0.824.
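The Z-test on the AUC can be checked with a short calculation, assuming Z = (AUC − 0.5)/SD with the DeLong standard deviation reported above (the function name is hypothetical). Using the rounded values 0.814 and 0.120 gives Z ≈ 2.62, close to the reported 2.611, and normal P-values of about 0.004 (one-sided) and 0.009 (two-sided).

```python
import math


def auc_z_test(auc, sd, null_auc=0.5):
    """Z statistic and normal P-values for H0: AUC = null_auc."""
    z = (auc - null_auc) / sd
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_sided, p_two_sided


z, p_one, p_two = auc_z_test(0.814, 0.120)
print(f"Z = {z:.2f}, one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```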
Model evaluation in the replication cohort
As shown in Table 3, the replication cohort consisted of 2,775 macaques, of which 197 were positive for amyloid at necropsy, indicating a prevalence of 0.071 in this population. Applying the final model to this dataset produced 867 predictions of “disease” and 1,908 predictions of “no disease”, comprising 155 true positives, 1,866 true negatives, 712 false positives, and 42 false negatives. These counts correspond to a sensitivity (true positive rate) of 0.786, a specificity (true negative rate) of 0.724, an accuracy of 0.728, and a positive predictive value (precision) of 0.179. All performance metrics were highly significant improvements over random classification (Table 4).
TABLE 3:
Actual status | No disease (predicted) | Disease (predicted) | Total |
---|---|---|---|
No disease | 1,866 (TN) | 712 (FP) | 2,578 |
Disease | 42 (FN) | 155 (TP) | 197 |
Total | 1,908 | 867 | 2,775 |
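The Table 4 point estimates follow directly from the counts in Table 3; the short Python sketch below (hypothetical function name) reproduces them to within about 0.001. It also computes the ratio of precision to prevalence, the expected precision of random classification, which comes to about 2.5.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    prevalence = (tp + fn) / n
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "prevalence": prevalence,
        # Random classification has expected precision equal to prevalence.
        "lift": precision / prevalence,
    }


metrics = confusion_metrics(tp=155, fp=712, tn=1866, fn=42)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```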
TABLE 4.
Metric | Value | 95% Confidence Interval | P-Value |
---|---|---|---|
Sensitivity | 0.786 | (0.723, 0.842) | << 0.0001 |
Specificity | 0.724 | (0.706, 0.741) | << 0.0001 |
Precision | 0.179 | (0.154, 0.206) | << 0.0001 |
Error Rate | 0.272 | (0.255, 0.289) | << 0.0001 |
Accuracy | 0.728 | (0.711, 0.745) | << 0.0001 |
F1 score | 0.291 | (0.256, 0.326) | << 0.0001 |
AUC | 0.755 | (0.725, 0.785) | << 0.0001 |
SAA levels and lipoprotein profiles in cases and controls
Cholesterol levels were significantly lower in cases than in controls, including total cholesterol (median 149 vs. 166 mg/dL; P = 4.0 × 10⁻⁵), HDL cholesterol (median 61 vs. 75 mg/dL; P = 2.5 × 10⁻⁴), and LDL cholesterol (median 59 vs. 66 mg/dL; P = 3.1 × 10⁻⁴). Conversely, SAA levels (median 1,076.7 vs. 718.5 ng/mL; P = 1.3 × 10⁻⁴), triglyceride levels (median 58 vs. 53 mg/dL; P = 2.7 × 10⁻³), and triglyceride-to-HDL ratios (0.96 vs. 0.74; P = 2.3 × 10⁻³) were significantly greater in cases than in controls (see Figure 3; some results not shown).
Discussion
AA amyloidosis continues to be a serious consequence of long-term inflammatory disease in both humans and NHPs, and is associated with substantial morbidity and mortality. We found that chronic colitis, GI adenocarcinoma, endometriosis, and osteo- or reactive arthritis, all characterized by chronic and systemic inflammation, are major risk factors for AA amyloidosis in the rhesus macaque. Minor risk factors included trauma, number of pregnancies, and diarrhea not associated with colitis. Many risk factors identified in this study have been associated previously with AA amyloidosis in NHPs, in particular enterocolitis, arthritis, and trauma [5–7, 36]. However, this study extends this body of knowledge by developing a model of AA amyloidosis risk, and demonstrating that it has sufficient power to correctly identify animals likely to develop disease, potentially before clinical manifestation.
All four major risk factors identified in this study are associated with chronic, systemic inflammation. Chronic colitis, the risk factor with the largest odds ratio, is characterized in macaques by diffuse inflammatory infiltrates of lymphocytes, plasma cells, and neutrophils, as well as mucosal proliferation and crypt abscesses in the cecum and colon. The second-largest risk factor, GI adenocarcinoma, is the most common malignant neoplasm in macaques. These tumors develop most often within the cecum and ascending colon, frequently with ulceration of the affected mucosa and chronic inflammation in the adjacent intestine. GI clinical problems other than chronic colitis or adenocarcinoma appear to increase susceptibility only weakly (e.g., ‘GI, diarrhea’) or are associated with reduced risk (‘GI, other’). While ‘GI, other’ includes an array of non-inflammatory or short-lived inflammatory clinical problems, the weak relationship of ‘GI, diarrhea’ with AA amyloidosis is consistent with evidence that diarrhea is a symptom of more than one GI disorder and is only partially correlated with the morphologic changes in the alimentary tract found in colitis [37]. In humans, AA amyloidosis is a serious complication of inflammatory bowel disease (IBD). It occurs significantly more often in Crohn’s disease (CD) than in ulcerative colitis (UC), although the reasons for this are unknown. Although rare overall (estimated at 0.9% among 1,709 CD patients), it contributes to nephrotic syndrome and renal failure, which were the primary causes of death for 10 of 13 CD patients who died during the study period reported by Greenstein et al. [38]. Chronic colitis in macaques resembles human IBD in that both are chronic inflammatory conditions that preferentially affect the large intestine, and both immune dysregulation and the gut microbiome appear to be important in their pathogenesis [39].
Another major risk factor for AA amyloidosis in macaques, reactive arthritis or chronic polyarthritis, is an inflammatory, non-infectious arthritis that often follows enteric and urogenital infections [40–42]. Clinically, it is characterized by acute onset of lameness and joint swelling one to two months after an episode of enteric disease. Stifles, elbows, and tarsal and interphalangeal joints are most commonly affected, with marked joint effusion containing mature neutrophils and fibrin, as well as synovitis and enthesitis. Osteoarthritis, also included among musculoskeletal abnormalities, encompasses a group of degenerative joint diseases of unknown cause in NHPs, characterized by progressive change in the articular cartilage and subchondral bone. Although originally considered non-inflammatory, human osteoarthritis is now recognized to include a significant inflammatory component mediated by cytokines and chemokines that are produced by, or act upon, synoviocytes and chondrocytes, which induce matrix metalloproteinases and other proteinases involved in the degradation of cartilage [43].
Consistent with arthritis as a risk factor for amyloidosis in macaques, AA amyloidosis is an uncommon but serious complication of rheumatoid arthritis (RA) in humans. Indeed, in two studies, renal failure due to AA amyloidosis was the cause of death for ~9% of RA patients [44–45]. Like inflammatory arthritis in macaques, human RA is a bilaterally symmetric, chronic, and systemic inflammatory disease with proliferative synovitis, joint pain, and swelling. Immune dysregulation is suspected in the pathogenesis of this disease in both humans and macaques. Differences between RA in humans and inflammatory arthritis in NHPs include a lower prevalence and less frequent rheumatoid factor involvement in NHPs [40].
The association of endometriosis and number of pregnancies with AA amyloidosis in macaques was unexpected and, to our knowledge, has not been reported previously in human AA amyloidosis. Nevertheless, systemic inflammation is a known characteristic of both normal reproduction and reproductive disorders in women. Inflammation increases during human pregnancy and persists after birth: in a longitudinal study of 37 women, IL-6 levels increased from week 7 of pregnancy and remained elevated at 10 weeks post-partum [46]. Systemic inflammation also occurs in endometriosis [47], a disease of women and menstruating Old World primates characterized by ectopic endometrial tissue that, like normal endometrium, undergoes regular cyclical changes under the influence of estrogen and progesterone. Interestingly, although endometriosis and pregnancy are specific to females, these risk factors were retained in the final model, whereas sex as a variable was not. This result suggests that, apart from these female-specific clinical findings, sex does not otherwise influence susceptibility to this disease.
Similarities between macaques and humans with AA amyloidosis extend to the pathologies that accompany this disease. We found that macaques with amyloidosis had higher circulating SAA, triglycerides, and triglyceride-to-HDL ratios, as well as lower total, HDL, and LDL cholesterol levels, than unaffected controls. These results are consistent with reports of dysregulated SAA function and lipoprotein metabolism in chronic inflammation. During the acute phase response, hepatic production of SAA is greatly upregulated, and SAA circulates bound primarily to HDL3. With chronic inflammation, HDL becomes increasingly enriched with SAA, which may displace Apo-A1 as the primary apolipoprotein of HDL [48], and HDL is remodeled to carry proportionately more triglycerides and less cholesterol. These shifts in HDL composition are associated with reduced anti-oxidant and anti-inflammatory capacity, as well as diminished cholesterol efflux capacity, of the HDL particle. The lipoprotein profiles described here are well known to accompany poorly controlled inflammatory disease in humans [49], including IBD and RA [50–54], and are linked to poorer outcomes in both diseases. For example, IBD patients in the lowest quartile for total cholesterol and the highest quartile for triglycerides were significantly more likely to have severe disease, as indicated by disease-related hospitalizations and surgery [50]. While our study replicates other reports of an association between persistently elevated SAA levels and AA amyloidosis [55], our finding that dysregulated cholesterol homeostasis is also robustly associated with this disease is more novel and warrants further research.
There are several limitations to our study. First, the model we describe is based in part on diagnoses obtained at necropsy. Future research may improve on our disease risk model by repeating this study using only clinical variables and the biomarkers implicated here. Second, our analysis of lipoprotein cholesterol profiles was secondary, limited to data collected for clinical reasons unrelated to this study. Discrepancies between the intended and secondary use of data have caused similar difficulties in analogous human EHR, where data may be incomplete, inaccurate, or inconsistent [56]. Finally, although our model of disease risk has good sensitivity and specificity, its estimated precision leaves room for improvement. This result is partly due to the low prevalence of the disease in our replication cohort. Even at the current level of precision, the proportion of truly affected animals among those the model flags (0.179) is ~2.5-fold higher than expected from random classification, whose expected precision equals the disease prevalence (0.071). In general, machine learning can produce predictive models that are more accurate than traditional statistical models because these methods focus on correct prediction rather than statistical significance, and can therefore use variables that would not individually meet statistical criteria but together substantially improve predictions. In summary, this study has produced a powerful, data-driven clinical support tool that may be used to mitigate the development of AA amyloidosis in this and other NHP species. Moreover, we describe robust biomarkers of AA amyloidosis in macaques that can be implemented in the clinic at reasonable cost and used to improve our model in future research.
Supplementary Material
Acknowledgements
This work was supported by the NIH NLM award T15LM007088, and by the NIH OD award P51 OD011092 for operation of the Oregon National Primate Research Center, Oregon Health & Science University, Portland, Oregon. We gratefully acknowledge all of the dedicated veterinary and animal care staff at the ONPRC facility that make this study possible.
Abbreviations:
- AA
amyloid A
- AIC
Akaike information criterion
- Apo-A1
apolipoprotein A-I
- APR
acute phase response
- CD
Crohn’s disease
- EHR
electronic health records
- GI
gastrointestinal
- HDL
high density lipoprotein
- HDL-C
HDL cholesterol
- IBD
inflammatory bowel disease
- LDL
low density lipoprotein
- NHP
non-human primate
- ONPRC
Oregon National Primate Research Center
- RA
rheumatoid arthritis
- ROC
receiver operating characteristic
- SAA
serum amyloid A
Footnotes
Disclosure statement
The authors declare no conflicts of interest.
References
- [1]. Westermark GT, Fändrich M, Westermark P. AA amyloidosis: pathogenesis and targeted therapy. Annu Rev Pathol Mech Dis. 2015;10:321–344.
- [2]. Picken MM. Modern approaches to the treatment of amyloidosis: the critical importance of early detection in surgical pathology. Adv Anat Pathol. 2013;20:424–439.
- [3]. Ebert EC, Nagar M. Gastrointestinal manifestations of amyloidosis. Am J Gastroenterol. 2008;103:776–787.
- [4]. Blanchard JL. Generalized amyloidosis, nonhuman primates. In: Jones TC, Mohr U, Hunt RD, editors. Nonhuman Primates I. Berlin, Heidelberg: Springer; 1993. p. 194–197.
- [5]. Blanchard JL, Baskin GB, Watson EA. Generalized amyloidosis in rhesus monkeys. Vet Pathol. 1986;23:425–430.
- [6]. Chapman W Jr, Crowell W. Amyloidosis in rhesus monkeys with rheumatoid arthritis and enterocolitis. J Am Vet Med Assoc. 1977;171:855–858.
- [7]. Ellsworth L, Farley S, DiGiacomo RF, et al. Factors associated with intestinal amyloidosis in pigtailed macaques (Macaca nemestrina). Lab Anim Sci. 1992;42:352–355.
- [8]. Slattum M, Tsai C, DiGiacomo R, et al. Amyloidosis in pigtailed macaques (Macaca nemestrina): pathologic aspects. Lab Anim Sci. 1989;39:567–570.
- [9]. Kontush A, Chapman MJ. High-density lipoproteins: structure, metabolism, function and therapeutics. Hoboken (NJ): John Wiley & Sons, Inc.; 2012.
- [10]. Han CY, Tang C, Guevara ME, et al. Serum amyloid A impairs the antiinflammatory properties of HDL. J Clin Invest. 2016;126:266–281.
- [11]. Tölle M, Huang T, Schuchardt M, et al. High-density lipoprotein loses its anti-inflammatory capacity by accumulation of pro-inflammatory-serum amyloid A. Cardiovasc Res. 2012;94:154–162.
- [12]. Khovidhunkit W, Kim M-S, Memon RA, et al. Effects of infection and inflammation on lipid and lipoprotein metabolism: mechanisms and consequences to the host. J Lipid Res. 2004;45:1169–1196.
- [13]. Esteve E, Ricart W, Fernández-Real JM. Dyslipidemia and inflammation: an evolutionary conserved mechanism. Clin Nutr. 2005;24:16–31.
- [14]. Hebert C, Du H, Peterson LR, et al. Electronic health record–based detection of risk factors for Clostridium difficile infection relapse. Infect Control Hosp Epidemiol. 2013;34:407–414.
- [15]. Sun J, Hu J, Luo D, et al. Combining knowledge and data driven insights for identifying risk factors using electronic health records. AMIA Annu Symp Proc. 2012;2012:901–910.
- [16]. Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci. 2001;16:199–231.
- [17]. Nelson EK, Piehler B, Eckels J, et al. LabKey Server: an open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics. 2011;12:71.
- [18]. Westermark P. The pancreatic islets in systemic amyloidosis: the occurrence of two different types of amyloid. Virchows Arch A Pathol Anat Histol. 1974;363:179–182.
- [19]. Arslan HH, Nisbet C, Guvenc T. Spontaneous amyloidosis and diabetes mellitus in a rhesus macaque (Macaca mulatta). Bull Vet Inst Pulawy. 2007;51:655.
- [20]. Howard CF, Fang T-Y, Southwick C, et al. Islet cell antibodies in Sulawesi macaques. Am J Primatol. 1999;47:223–229.
- [21]. Lange K. Mathematical and statistical methods for genetic analysis. 1st ed. New York (NY): Springer; 1997. (Statistics for Biology and Health).
- [22]. Sinnwell JP, Therneau TM, Schaid DJ. The kinship2 R package for pedigree data. Hum Hered. 2014;78:91–93.
- [23]. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19:716–723.
- [24]. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27:861–874.
- [25]. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
- [26]. Sun X, Xu W. Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Process Lett. 2014;21:1389–1393.
- [27]. Perkins NJ, Schisterman EF. The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol. 2006;163:670–675.
- [28]. Mason SJ, Graham NE. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: statistical significance and interpretation. Q J Roy Meteor Soc. 2002;128:2145–2166.
- [29]. R Core Team. R: a language and environment for statistical computing. Vienna (Austria): R Foundation for Statistical Computing; 2017.
- [30]. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
- [31]. Wickham H, François R, Henry L, et al. dplyr: a grammar of data manipulation. 2017. R package version 0.7.6.
- [32]. Neuwirth E. RColorBrewer: ColorBrewer palettes. 2014. R package version 1.1-2.
- [33]. Canty A, Ripley B. boot: bootstrap R (S-Plus) functions. 2017. R package version 1.3-20.
- [34]. Davison AC, Hinkley DV. Bootstrap methods and their application. Cambridge (UK): Cambridge University Press; 1997.
- [35]. Yeh A. More accurate tests for the statistical significance of result differences. Proceedings of the 18th Conference on Computational Linguistics. 2000;2:947–953.
- [36]. Naumenko ES, Krylova RI. Amyloidosis in macaques in Adler Primatological Center. Bull Exp Biol Med. 2003;136:80–83.
- [37]. Holmberg CA, Leininger R, Wheeldon E, et al. Clinicopathological studies of gastrointestinal disease in macaques. Vet Pathol. 1982;19:163–170.
- [38]. Greenstein AJ, Sachar DB, Panday AKN, et al. Amyloidosis and inflammatory bowel disease: a 50-year experience with 25 patients. Medicine. 1992;71:261–270.
- [39]. de Lange KM, Barrett JC. Understanding inflammatory bowel disease via immunogenetics. J Autoimmun. 2015;64:91–100.
- [40]. Pritzker KPH, Kessler MJ. Arthritis, muscle, adipose tissue, and bone diseases of nonhuman primates. In: Abee CR, Mansfield K, Tardif S, Morris T, editors. Nonhuman Primates in Biomedical Research. 2nd ed. Academic Press; 2012. p. 629–697.
- [41]. Simmons HA. Age-associated pathology in rhesus macaques (Macaca mulatta). Vet Pathol. 2016;53:399–416.
- [42]. Urvater JM, McAdam SN, Loehrke JH, et al. A high incidence of Shigella-induced arthritis in a primate species: major histocompatibility complex class I molecules associated with resistance and susceptibility, and their relationship to HLA-B27. Immunogenetics. 2000;51:314–325.
- [43]. Goldring MB, Otero M. Inflammation in osteoarthritis. Curr Opin Rheumatol. 2011;23:471–478.
- [44]. Mutru O, Laakso M, Isomäki H, et al. Ten year mortality and causes of death in patients with rheumatoid arthritis. Br Med J (Clin Res Ed). 1985;290:1797–1799.
- [45]. Koivuniemi R, Paimela L, Suomalainen R, et al. Causes of death in patients with rheumatoid arthritis autopsied during a 40-year period. Rheumatol Int. 2008;28:1245–1252.
- [46]. Palm M, Axelsson O, Wernroth L, et al. Involvement of inflammation in normal pregnancy. Acta Obstet Gynecol Scand. 2013;92:601–605.
- [47]. Symons LK, Miller JE, Kay VR, et al. The immunopathophysiology of endometriosis. Trends Mol Med. 2018;24:748–762.
- [48]. Coetzee GA, Strachan AF, van der Westhuyzen DR, et al. Serum amyloid A-containing human high density lipoprotein 3. J Biol Chem. 1986;261:9644–9651.
- [49]. Kerekes G, Nurmohamed MT, Gonzalez-Gay MA, et al. Rheumatoid arthritis and metabolic syndrome. Nat Rev Rheumatol. 2014;10:691–696.
- [50]. Koutroumpakis E, Ramos-Rivers C, Regueiro M, et al. Association between long-term lipid profiles and disease severity in a large cohort of patients with inflammatory bowel disease. Dig Dis Sci. 2016;61:865–871.
- [51]. Sappati Biyyani RSR, Putka BS, Mullen KD. Dyslipidemia and lipoprotein profiles in patients with inflammatory bowel disease. J Clin Lipidol. 2010;4:478–482.
- [52]. Ripolles Piquer B, Nazih H, Bourreille A, et al. Altered lipid, apolipoprotein, and lipoprotein profiles in inflammatory bowel disease: consequences on the cholesterol efflux capacity of serum using Fu5AH cell system. Metab Clin Exp. 2006;55:980–988.
- [53]. Myasoedova E, Crowson CS, Kremers HM, et al. Lipid paradox in rheumatoid arthritis: the impact of serum lipid measures and systemic inflammation on the risk of cardiovascular disease. Ann Rheum Dis. 2011;70:482–487.
- [54]. Bag-Ozbek A, Giles JT. Inflammation, adiposity, and atherogenic dyslipidemia in rheumatoid arthritis: is there a paradoxical relationship? Curr Allergy Asthma Rep. 2015;15:497.
- [55]. Rice KA, Chen ES, Metcalf Pate KA, et al. Diagnosis of amyloidosis and differentiation from chronic, idiopathic enterocolitis in rhesus (Macaca mulatta) and pig-tailed (M. nemestrina) macaques. Comp Med. 2013;63:262–271.
- [56]. Botsis T, Hartvigsen G, Chen F, et al. Secondary use of EHR: data quality issues and informatics opportunities. Summit on Translat Bioinforma. 2010;2010:1–5.