Cell Reports Medicine. 2023 Dec 19;4(12):101328. doi: 10.1016/j.xcrm.2023.101328

Pan-viral serology uncovers distinct virome patterns as risk predictors of hepatocellular carcinoma and intrahepatic cholangiocarcinoma

Whitney L Do 1,11, Limin Wang 1,11, Marshonna Forgues 1, Jinping Liu 1, Siritida Rabibhadana 2, Benjarath Pupacdi 2, Yongmei Zhao 3, Heelah Gholian 1, Vajarabhongsa Bhudhisawasdi 4, Chawalit Pairojkul 4, Wattana Sukeepaisarnjaroen 4, Ake Pugkhem 4, Vor Luvira 4, Nirush Lertprasertsuke 5, Anon Chotirosniramit 5, Chirayu U Auewarakul 6, Teerapat Ungtrakul 6, Thaniya Sricharunrat 6, Suleeporn Sangrajrang 7, Kannika Phornphutkul 8, Anuradha Budhu 1,9, Curtis C Harris 1, Chulabhorn Mahidol 2, Mathuros Ruchirawat 1,10, Xin Wei Wang 1,9,12,∗∗
PMCID: PMC10772458  PMID: 38118412

Summary

This study evaluates the pan-serological profiles of hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (iCCA) compared to several diseased and non-diseased control populations to identify risk factors and biomarkers of liver cancer. We used phage immunoprecipitation sequencing, an anti-viral antibody screening method using a synthetic-phage-displayed human virome epitope library, to screen patient serum samples for exposure to over 1,280 strains of pathogenic and non-pathogenic viruses. Using machine learning methods to develop an HCC or iCCA viral score, we discovered that both viral scores were positively associated with several liver function markers in two separate at-risk populations independent of viral hepatitis status. The HCC score predicted all-cause mortality over 8 years in patients with chronic liver disease at risk of HCC, while the viral hepatitis status was not predictive of survival. These results suggest that non-hepatitis viral infections may contribute to HCC and iCCA development and could be biomarkers in at-risk populations.

Keywords: liver cancer, viruses, phage immunoprecipitation sequencing, phip-seq, hepatocellular carcinoma, cholangiocarcinoma, viral history, serology, HCC, iCCA


Highlights

  • Pan-viral serological profiling can distinguish liver cancer from healthy individuals

  • Exposure to hepatitis and non-hepatitis viruses associated with liver cancer

  • HCC viral score can determine clinical risk and mortality in at-risk population


Liver cancer is a leading cause of cancer-related mortality due in part to poor screening and diagnostic tools. Do et al. found that comprehensively profiling viral exposure history, beyond hepatitis B and C strains, may reveal unique biomarkers for populations at risk of liver cancer development.

Introduction

Liver cancer is the seventh leading cause of cancer incidence and the third leading cause of cancer-related mortality.1 In the US, it was estimated that 71.2% of cases of liver cancer were attributable to preventable risk factors.2 Several epidemiological factors have been found to associate with liver cancer onset, though these factors differ between the two primary liver cancer subtypes, hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (iCCA). Risk factors for HCC include infection with hepatitis C virus (HCV) or hepatitis B virus (HBV), alcohol addiction, metabolic disease, and non-alcoholic fatty liver disease.3 These risk factors serve as the basis for screening populations for HCC development, with recommendations that individuals with cirrhosis or a history of HBV be regularly monitored. Nevertheless, it is unclear whether these screening procedures have ultimately succeeded in reducing liver cancer mortality,4,5,6 primarily due to a lack of participation of at-risk populations, with less than 20% of eligible individuals participating in screening and even lower rates in non-White people and individuals of low socioeconomic status.7 The risk profile for iCCA includes the same factors as those listed for HCC. However, the degree of risk tends to differ between these two cancer types, with more heterogeneity in patients with iCCA. In contrast to HCC, the etiological factors for iCCA remain poorly defined, with nearly 50% of affected individuals diagnosed without any known risk factor.8 Some risk factors for iCCA include HCC risk factors as well as several liver diseases (e.g., bile duct cysts, Caroli’s disease, cholangitis, gastric ulcers, hemochromatosis), chemical exposures, smoking, and liver fluke infection.8 For iCCA, given the high degree of heterogeneity, there are no current screening recommendations. Identifying novel risk factors for HCC and iCCA could allow for more targeted screening and improved detection of both cancers.

While HCV and HBV are well-known causal factors in liver cancer, no studies have comprehensively examined whether other viruses may be predictive of liver cancer development. Viruses play a role in host biology through immunomodulation during acute and chronic infection.9 There is increasing evidence of pathogenic and non-pathogenic acute viral exposures leaving a molecular footprint on host machinery long after viral clearance, with sustained effects on the immune system.10,11,12 Examining viral serology has traditionally been hindered due to the technical difficulty of testing individual viruses through targeted laboratory assays. To overcome this limitation, phage immunoprecipitation sequencing (PhIP-Seq) was developed as a viral screening method using a synthetic phage-displayed virome library to screen patient serum samples for exposure to over 1,280 strains of pathogenic and non-pathogenic viruses13 and has been examined in several disease contexts.14,15,16,17

We have recently found that viral exposures may be early biomarkers of HCC in several populations,18 yet it is unknown whether pathogenic or non-pathogenic viral features are associated with iCCA. Additionally, these associations have not been evaluated with consideration to multiple confounding factors and in terms of the potential utility in targeting at-risk populations. This study evaluates the pan-serological profiles of HCC and iCCA compared to several diseased and non-diseased control populations in a Thai population-based case-control study to identify risk factors and biomarkers of liver cancer (Figure 1A).

Figure 1.

Disease status is the largest predictor of virome

(A) Description of the study design. Created with BioRender.com.

(B) Number of enriched strains and proportion enriched by group among the top six viral families represented in the cohort. Differential enrichment tested using analysis of variance between enriched strains and proportion enriched by group with adjustment for multiple comparisons using Tukey’s all-pair comparisons.

(C) Partition of variation within virome examining lifestyle and biological factors calculated using permutational multivariate analysis of variance and reported by predictor category. Variables had an Adonis R2 FDR <0.05.

(D) Average number of enriched species per participant by group. Differential enrichment examined using analysis of variance.

(E) Prevalence of the top 10 enriched viral species by disease group.

See also Figure S2 and Table S1.

Results

Disease status is the largest predictor of virome

Participants were derived from the Thailand Initiative in Genomics and Expression Research for Liver Cancer (TIGER-LC) study. This study enrolled patients diagnosed with HCC (n = 663), iCCA (n = 1,115), and chronic liver disease (CLD; n = 199), patients infected by liver fluke (Opisthorchis viverrini [OV], n = 271), and healthy population control (PC) participants (n = 686; Table S1), sequentially recruited in 5 regions that cover 85% of Thailand. Patients with CLD and OV were defined as the populations at risk of HCC and iCCA, respectively, and served as comparators for each cancer type. As expected, liver cancer risk factors were all significantly different between the study populations, with HCC participants significantly more likely to be men, live in central Thailand, be current or former smokers, consume higher amounts of alcohol, have low BMI, and be HCV antibody (anti-HCV) and hepatitis B surface Ag (HBsAg) positive compared to control populations. Participants with iCCA were more likely to live in northeast Thailand, work as farmers, be current or former smokers, have low BMI, be of older age, and be HBsAg positive compared to control populations.

We assessed the performance of PhIP-Seq compared to internal replicates and clinical measures. After read alignment, there were on average 1,463,790 mapped reads per sample (Figure S1). The average concordance across replicate plates for individual samples was 82%. We examined how well PhIP-Seq compared with clinical diagnostic tests for HBV and HCV including HBsAg and anti-HCV, respectively. The overall accuracies were 0.69 and 0.70 for HBsAg and anti-HCV, respectively. The sensitivity was low for HBsAg detection using PhIP-Seq (0.43); however, the specificity was high (0.75). Both sensitivity and specificity were high for anti-HCV (0.92 and 0.68, respectively), consistent with previously published data.18 The poor sensitivity for HBsAg could be because clinical diagnostic tests measure the HBV antigen, whereas PhIP-Seq measures serum antibody levels. Additionally, positive HBsAg is indicative of an individual who is infectious acutely or chronically, though past infection and recovery may yield a negative result. As another validation, we compared the PhIP-Seq peptide reactivity to computationally predicted epitope reactivity from BepiPred 2.0.19 We show that peptide reactivity of HCV 6α (the most enriched HCV strain within this study) aligned with predicted B cell epitope reactivity from BepiPred 2.0 (Figure S2). The most reactive peptides from PhIP-Seq aligned with the predicted amino acid reactivity from BepiPred 2.0.
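
The agreement metrics quoted above can be illustrated with a minimal sketch. It assumes a standard 2×2 comparison with the clinical test as the reference; the function names and the counts are hypothetical and are not data from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from 2x2 counts.

    tp/fn: reference-positive samples called positive/negative by the assay.
    tn/fp: reference-negative samples called negative/positive by the assay.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy is a prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical counts (assumption, for illustration only)
sens, spec, acc = diagnostic_metrics(tp=43, fp=25, tn=75, fn=57)
prevalence = (43 + 57) / 200
acc_check = accuracy_from_rates(sens, spec, prevalence)
```

Because accuracy is a weighted average of the two rates, it always lies between sensitivity and specificity, and it moves toward specificity when most reference results are negative.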

We found that disease status was a significant predictor of the overall virome. We initially examined the differences in the population abundance and number of enriched strains by viral family. Ninety-five percent of enriched strains were members of six viral families (Figure 1B). HCC had a larger proportion of enriched strains in Flaviviridae compared to all groups and a larger proportion of strains in Orthomyxoviridae compared to PC participants. In contrast, PC participants had a larger proportion of enriched strains in Picornaviridae compared to HCC. iCCA differed from CLD with a smaller proportion of enriched strains from Orthomyxoviridae, Picornaviridae, and Flaviviridae. Disease group also accounted for the largest percentage of variance in the virome among the environmental factors evaluated using permutational multivariate analysis of variance tests. However, other factors that remained significant predictors included region, alcohol, age, job, smoking status, and gender (Adonis R2 false discovery rate [FDR] < 0.05; Figure 1C). At the species level, we found no difference in the median enrichment across groups, with an average of 25 species enriched (Figure 1D). Additionally, there were few differences by group in the top enriched species, which include several strains of herpesvirus, respiratory syncytial virus (RSV), adenovirus, rhinovirus, and enterovirus (Figure 1E). Overall, these data align well with epidemiological seroprevalence data in Thai20,21,22 and other populations.23,24,25

Viral signatures can distinguish patients with liver cancer from control populations

We next aimed to identify specific viral features more highly enriched in patients with HCC compared to PC participants. To identify viral features associated with HCC, we used the XGBoost algorithm to discriminate patients with HCC from PC participants. After cross-validation and dimension reduction of the 1,280 strains, 53 viral features remained important features in 80% of models tested 100 times. These features were examined with XGBoost in the training set of patients with HCC and PC participants, with 46 identified as important viral features (Figure 2A; Table S2). In the test set, the model could discriminate patients with HCC from PC participants and from patients with CLD with areas under the curve (AUCs) of 0.77 and 0.60, respectively (Figure S3A). The model had a lower AUC for patients with HCC vs. CLD than for patients with HCC vs. PC participants, potentially because CLD may be a transient state between PC and HCC. The phylogenetic similarity between strains in the model clustered among those enriched in participants with HCC and those enriched in PC participants (Figure 2B). Among the 46 strains, ten strains were HCV variants, three were HBV variants, and two were hepatitis E virus (HEV) variants (Figure 2C). This was to be expected, with nearly 21% and 51% of the patients with HCC being HCV and HBV positive, respectively. We examined whether eliminating hepatitis strains (including HCV, HBV, and HEV26) would change the predictive ability of features. Models were not statistically different when compared with previous models including hepatitis strains (DeLong and bootstrap permutation test comparing AUC p > 0.05; Figure S3B).
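
The repeated-model filter described above (keeping features that remain important in at least 80% of 100 models) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `stable_features`, the input format (one collection of important feature names per fitted model), and the toy feature names are all assumptions.

```python
from collections import Counter


def stable_features(important_per_model, min_fraction=0.8):
    """Return features flagged as important in at least `min_fraction` of models.

    important_per_model: list with one collection of feature names per fitted
    model (hypothetical input; how "important" is decided is left to the caller).
    """
    n_models = len(important_per_model)
    counts = Counter(
        feature for features in important_per_model for feature in set(features)
    )
    return sorted(
        feature for feature, count in counts.items()
        if count / n_models >= min_fraction
    )


# Toy example with 100 hypothetical model fits and made-up feature names
runs = (
    [{"strain_A", "strain_B", "strain_C"}] * 70
    + [{"strain_A", "strain_B"}] * 15
    + [{"strain_A"}] * 15
)
kept = stable_features(runs)
```

Here `strain_A` appears in all 100 fits and `strain_B` in 85, so both pass the 80% threshold, while `strain_C` (70 fits) is dropped.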

Figure 2.

HCC viral signatures can distinguish patients with HCC from control populations

(A) Receiver operating characteristic (ROC) curve representing performance of XGBoost model in the training and test sets comparing HCC to PC populations. p value was calculated using methods from Mason and Graham.27

(B) Phylogenetic tree diagram of identified viral features from HCC model annotated by the group they are more highly enriched in.

(C) HCC model viral features represented by prevalence within groups on the bottom x axis. The top x axis is shown by a bar plot of the XGBoost model gain of the viral strains in the HCC XGBoost model.

See also Figure S3 and Table S2.

We completed a similar analysis of iCCA versus PC with 54 viral features tested in the training set and 52 identified as important features (Figure 3A; Table S3). In the test set, the model could discriminate iCCA from all control and case populations with AUCs between 0.60 and 0.70 (Figure S3C). Like the HCC model, the iCCA model performed better comparing iCCA vs. PC than in the other comparisons. As in the HCC model, the strains more enriched in patients with iCCA tended to be more genetically similar (Figure 3B). Notably, HBV and HCV variants were not identified as relevant predictors of iCCA from the XGBoost model. Twenty viral strains were identified in both models of HCC and iCCA, with 13 concordant in the direction of effect. Unlike HCC, the iCCA model was largely dominated by features that are depleted in patients with iCCA and enriched in PC participants, suggesting that non-pathological features may be playing a role in these populations (Figure 3C).
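
As a reading aid for the AUC values reported for both models, an AUC can be interpreted as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below implements that pairwise definition directly; the function name and the toy scores are hypothetical.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores (assumption, not study data): 8 of the 9 pairs are ranked correctly
auc_value = pairwise_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

The quadratic loop is fine for illustration; for cohorts of this size a rank-based implementation would be the more practical choice.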

Figure 3.

iCCA viral signatures can distinguish patients with iCCA from control populations

(A) ROC curve representing performance of XGBoost model in the training and test sets comparing iCCA to PC populations. p value was calculated using methods from Mason and Graham.27

(B) Phylogenetic tree diagram of identified viral features from iCCA model annotated by the group they are more highly enriched in.

(C) iCCA model viral features represented by prevalence within groups on the bottom x axis. The top x axis is shown by a bar plot of the XGBoost model gain of the viral strains in the iCCA XGBoost model.

See also Figure S3 and Table S3.

Viral antibodies significantly differ between case and non-case groups

As we identified several biological and lifestyle confounding factors that accounted for a significant percentage of the variance in the virome, we next examined whether the viral features identified in XGBoost models differed statistically between liver cancer case groups (HCC and iCCA) and the at-risk and PC populations after adjusting for several relevant biological and lifestyle confounders (age, gender, alcohol intake, region, job category, and smoking). HCC was compared with PC and CLD, and iCCA was compared with PC and OV. Thirty viral strains differed significantly between HCC and PC, and two differed significantly between HCC and CLD (Figure 4A; Table S4; FDR q < 0.05), adjusting for biological and lifestyle confounders. In iCCA models, nine viral strains differed significantly between iCCA and PC participants, and five between iCCA and OV participants (Figure 4B; Table S5; FDR q < 0.05), adjusting for biological and lifestyle confounders. Rhinovirus, norovirus, and influenza A virus strains were more highly enriched in PC compared to both cancer types, hinting at a potentially protective role for these viruses in tumorigenesis.
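
The text reports FDR q values without naming the correction procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method purely for illustration; the function name and the p values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each q is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


# Hypothetical per-strain p values (assumption, not study data)
p = [0.01, 0.04, 0.03, 0.20]
q = bh_qvalues(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
```

In this toy input only the first test survives at q < 0.05, even though three raw p values are below 0.05.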

Figure 4.

Viral antibodies significantly differ between case and non-case groups

(A and B) HCC (A) and iCCA (B) model features that significantly differed between cancer and PC, respectively. Logistic regression models tested differential enrichment of individual model viral features between HCC and iCCA and control populations, respectively, adjusting for biological and lifestyle confounding factors. Significance defined by FDR <0.05. The reference group is represented in the heading.

(C and D) Distribution of HCC (C) and iCCA (D) viral scores by group, with differences tested using analysis of variance in top two panels. Logistic regression models individually tested HCC (C) and iCCA (D) viral scores between control populations after adjusting for biological and lifestyle confounders in bottom two panels.

See also Tables S4, S5, S6, and S7.

Using the 46 and 51 features identified in the HCC and iCCA XGBoost models, respectively, we developed individual HCC and iCCA viral scores using the pooled Shapley additive explanations (SHAP) values28 of important features from the XGBoost models. Both viral scores discriminated significantly between all groups (Figures 4C and 4D). Compared to PC and CLD, every 10-percentage-point increase in the HCC viral score (ranging from 0 to 1) was associated with 5.8× and 77% higher odds of HCC, respectively, adjusting for biological and lifestyle confounders (PC odds ratio [OR]: 5.83, 95% confidence interval [CI]: 3.96, 9.36; CLD OR: 1.77, 95% CI: 1.33, 2.41; Table S6). The iCCA viral score was associated with 4.8× and 2.1× higher odds of iCCA compared to PC and OV, respectively, adjusting for biological and lifestyle confounders (PC OR: 4.82, 95% CI: 4.02, 5.84; OV OR: 2.11, 95% CI: 1.78, 2.53; Table S7).
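
To make the scaling of these odds ratios concrete: for a score that runs from 0 to 1, a 10-percentage-point step is 0.1 score units, so a logistic regression coefficient expressed per full unit converts to an odds ratio per step by exponentiating 0.1 times the coefficient. The sketch below shows that arithmetic; the function names are hypothetical, and the coefficient is back-calculated from the reported OR only to demonstrate the round trip.

```python
import math


def odds_ratio_per_step(beta_per_unit, step=0.1):
    """Odds ratio for a `step` increase, given a log-odds coefficient per unit."""
    return math.exp(beta_per_unit * step)


def percent_higher_odds(odds_ratio):
    """Express an odds ratio as a percent increase in odds."""
    return (odds_ratio - 1.0) * 100.0


# Round trip using the reported HCC vs. PC odds ratio of 5.83 per 0.1 units
beta_per_unit = math.log(5.83) / 0.1  # implied coefficient per full score unit
or_pc = odds_ratio_per_step(beta_per_unit)

# The reported HCC vs. CLD odds ratio of 1.77 corresponds to 77% higher odds
pct_cld = percent_higher_odds(1.77)
```

Note that an odds ratio of 1.77 reads naturally as "77% higher odds", whereas an odds ratio of 5.83 means the odds are 5.83 times as large.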

HCC viral score associated with liver function and survival in CLD

We wanted to further explore the at-risk (CLD and OV) populations to determine whether the viral scores could provide some utility as a biomarker of cancer in an at-risk population. We examined whether the HCC or iCCA viral score was associated with liver function in the CLD or OV population, respectively. Using several liver function metrics, we used linear regression to model the association between the HCC/iCCA viral score and continuous liver function values. In the CLD population, the HCC viral score was positively associated with alpha-fetoprotein (AFP; p = 0.0003; Figure 5A), albumin-bilirubin (ALBI) score (p = 0.02; Figure 5B),29 fibrosis-4 (FIB-4) score (p = 0.04; Figure 5C), and aspartate transaminase (AST; p = 0.0004; Figure 5D), adjusting for age, gender, smoking, alcohol intake, and HBsAg and anti-HCV status. We found a significant interaction between HCV status and the HCC viral score in their association with alanine aminotransferase (ALT). In HCV-positive individuals, the HCC viral score was more positively associated with ALT than in HCV-negative participants (p interaction = 0.03; Figure 5E). Every 10-percentage-point increase in the HCC viral score was associated with an 18.1 ng/mL (SE = 4.4) increase in AFP, a 0.1 unit (SE = 0.3) increase in the ALBI score, a 0.28 unit (SE = 0.3) increase in the FIB-4 score, and a 5.5 U/L (SE = 4.2) increase in AST. In HCV-positive individuals, every 10-percentage-point increase in the viral score was associated with a 9.2 U/L (SE = 4.9) increase in ALT (Table S8). The iCCA viral score was positively associated with CA19-9 in the OV population, adjusting for age, gender, smoking, and HBsAg and anti-HCV status (p = 0.03; Figure 5F). Every 10-percentage-point increase in the iCCA viral score was associated with a 0.12 unit (SE = 0.06) increase in CA19-9 in OV participants (Table S9).

Figure 5.

HCC viral score associated with liver function and survival in CLD

(A–E) Scatterplots showing the association between the HCC viral score and alpha-fetoprotein (AFP; A), albumin-bilirubin (ALBI) score (B), fibrosis-4 (FIB-4) score (C), aspartate transaminase (AST; D), and alanine transaminase (ALT; E) on the log10 scale. Linear regression models regressed HCC viral score on liver function markers adjusted for age, gender, smoking, and HBV and HCV status (A–D), with an interaction between HCC viral score and HCV status identified in the ALT model (E).

(F) Scatterplot showing the association between the iCCA viral score and CA19-9 on the log10 scale.

(G) Example representation of potential mediation identified via mediation analysis between HCC viral score, AFP, and odds of HCC compared to CLD.

(H) Survival probability of all-cause mortality in CLD population by low versus high viral score in TIGER-LC population.

(I) ROC curve representing performance of TIGER-LC HCC XGBoost model in the UMD-NCI cohort comparing HCC vs. PC and HCC vs. CLD.

See also Figure S4 and Tables S8 and S9.

To examine potential directionality of these associations, we examined whether these liver function markers might be mediating the association between the viral score and HCC development. We found the AFP, ALT, AST, and FIB-4 score were significant mediators in the association between the viral score and odds of HCC between CLD and HCC, with 64%, 13%, 48%, and 10% of the association between the viral score and odds of HCC mediated by AFP, ALT, AST, and FIB-4 score, respectively (all tests p < 2E−16; Figure 5G). These models were adjusted for HBsAg and anti-HCV status, suggesting that the HCC viral score may have a direct influence on liver function independent of hepatitis C or B infection. We did not find evidence of mediation between the iCCA viral score, CA19-9, and iCCA status.
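
The percentages above are "proportion mediated" figures. The source does not spell out the estimator, so the sketch below only shows the usual definition, the indirect effect divided by the total (direct plus indirect) effect; the function name and the effect sizes are hypothetical.

```python
def proportion_mediated(indirect_effect, direct_effect):
    """Share of the total effect that passes through the mediator."""
    total_effect = indirect_effect + direct_effect
    if total_effect == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return indirect_effect / total_effect


# Hypothetical effects on a common scale (assumption, not study estimates)
share = proportion_mediated(indirect_effect=0.64, direct_effect=0.36)
```

If each marker is tested in its own single-mediator model, which is an assumption here, the resulting proportions need not sum to 100%, because correlated markers can carry overlapping parts of the same pathway.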

We examined whether the HCC or iCCA viral score associated with all-cause mortality in the CLD or OV participants, respectively, ascertained from the National Death Index. Over a follow-up period of 8.1 years, a higher HCC viral score was associated with higher incidence of all-cause mortality in the CLD population (hazard ratio [HR]: 2.01, 95% CI: 1.06, 3.82, p = 0.03; Figure 5H). Results remained after adjustment for age, gender, and smoking (HR: 2.00, 95% CI: 1.04, 3.86, p = 0.036). While we do not know whether these deaths are due to liver cancer, the 10-year incidence of HCC in patients with cirrhosis has been found to be anywhere between 4% and 16%.30,31,32,33 Therefore, we estimate that at least a portion of these deaths are due to underlying HCC development. In contrast to the HCC viral score, clinical anti-HCV and HBsAg status were not associated with survival (anti-HCV HR: 0.44, 95% CI: 0.05, 3.66, p = 0.45; HBsAg HR: 0.61, 95% CI: 0.30, 1.26, p = 0.19). We did not find an association between the iCCA viral score and all-cause mortality in the OV population (HR: 2.16, 95% CI: 0.74, 6.32, p = 0.16). In contrast to the iCCA viral score, anti-HCV status (though not HBsAg) was positively associated with all-cause mortality in the OV population (anti-HCV status HR: 23.9, 95% CI: 4.95, 115.6, p = 7.81E−05).
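
A reported hazard ratio and its 95% CI can be sanity-checked against the quoted p value, since a Wald interval is symmetric on the log scale. The sketch below assumes a Wald-type interval and a normal approximation, which may not be exactly how the published values were computed; the function name is hypothetical.

```python
import math


def wald_check(hazard_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Recover the log-scale SE, z statistic, and two-sided p from an HR and CI."""
    standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = math.log(hazard_ratio) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return standard_error, z, p_value


# Unadjusted HCC viral score result in the CLD population, as reported above
se, z, p_value = wald_check(2.01, 1.06, 3.82)
```

For the unadjusted CLD result this gives a p value of roughly 0.03, in line with the reported figure; small differences are expected from rounding of the published HR and CI.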

To explore whether the model and features we identified in a Thai population would translate to a US population, we examined whether the HCC model and score replicated in the University of Maryland-National Cancer Institute (UMD-NCI) cohort. The HCC XGBoost model could discriminate between HCC and PC participants (AUC: 0.76, 95% CI: 0.71, 0.81; Figure 5I), though the AUC for HCC vs. CLD participants was lower (AUC: 0.57, 95% CI: 0.52, 0.63; Figure 5I). We additionally tested whether the HCC viral score was associated with liver function markers in this cohort. The HCC viral score was positively associated with AST (p = 0.032) and ALT (p = 0.037) in the HCC population when adjusting for anti-HCV and HBsAg status (Figure S4). None of the liver function markers were significantly associated with the HCC viral score in the CLD population when adjusted for anti-HCV and HBsAg status. We additionally found that the HCC viral score was not statistically associated with all-cause or liver-specific mortality in the HCC population. Nevertheless, the association was in the same direction (all-cause mortality HR: 1.43, 95% CI: 0.93, 2.17, p = 0.096; liver-specific mortality HR: 1.37, 95% CI: 0.83, 2.26, p = 0.22). In the HR population, the viral score was not associated with all-cause or liver-specific mortality (all-cause mortality HR: 0.74, 95% CI: 0.48, 1.15, p = 0.18; liver-specific mortality HR: 0.89, 95% CI: 0.40, 1.97, p = 0.77).

Discussion

Our study indicates that both hepatitis and non-hepatitis viral exposures were able to distinguish HCC and iCCA from control populations and that the HCC and iCCA viral strains may be a useful biomarker in populations at risk of HCC and iCCA, respectively. Using an aggregate 46 viral feature score that distinguishes HCC from control populations, we found improved utility of the HCC viral score at determining clinical risk and mortality in patients with CLD compared to the utility of clinical hepatitis status. Finally, the HCC score could be validated in an independent cohort.

It is well known that viral hepatitis infection is a causal driver of HCC development. However, we found that additional patient viral reactivity beyond hepatitis viruses could predict liver function markers in two populations at risk of HCC and iCCA and could predict mortality in the population at risk of HCC alone. The additional strains in the viral scores include pathogenic and non-pathogenic viruses. While it is unknown whether they may be biologically influencing tumorigenesis, there are several potential mechanisms whereby previous viral infection could be inducing changes in the immune environment.

We found three viruses whose peptide antibody levels were less enriched in HCC and iCCA compared to PC: several strains of norovirus, rhinovirus, and influenza A virus. Norovirus strains (including echovirus 11, Lordsdale virus HUNV, Southampton virus, and norovirus 1, GII and MD145) were more significantly enriched in PC participants compared to HCC and iCCA participants. Norovirus is an enteric virus that has been shown to play a role in intestinal stability. In murine models, norovirus has been found to be essential for homeostasis of intraepithelial lymphocytes,34,35 restore intestinal morphology after antibiotics,35 and protect against type 1 diabetes by increasing T regulatory cell populations.36 This could explain our results, as alterations in the intestinal barrier have been shown to play a critical role in promoting liver cancer.37

Rhinovirus and influenza virus are common respiratory viruses we have found to be depleted in HCC and iCCA populations compared to PC. Rhinovirus has been used as an oncolytic virus and could be inducing protective effects in PC participants like the documented effects from oncolytic immunotherapy inducing oncolytic cell death38 and indirectly harnessing host immunity.39 These viruses may play an important role in the tumorigenic pathway.

There is also the possibility that these serological profiles are reflective of immunological changes as a consequence of tumorigenesis (reverse causation). There is evidence that the circulating antibodies may be influenced by tumorigenesis. While circulating immunoglobulin G (IgG) has not been found to substantially differ in individuals with or without cancer,40 there is some evidence that tumor-derived IgG levels including B lymphocyte tumor infiltration and antibody expression could be prognostic markers in patients with cancer,41,42 thus potentially mitigating factors during tumorigenesis. This may explain the similarity in the serological profiles between HCC and CLD participants, where immunological changes could be underway.

Overall, the HCC viral score was superior at delineating risk in the CLD population as compared to clinical anti-HCV and HBsAg alone in the TIGER-LC study. We also examined the utility of the HCC viral score in a separate population (UMD-NCI cohort). We found that the results of these two cohorts are largely consistent with a few discrepancies, despite the fact that UMD-NCI and TIGER-LC are two different populations with different race/ethnicity across two different continents with a wide range of exposure history. These differences may therefore account for some of the discrepancy in the model validation. Nevertheless, the direction of effect remained consistent across both cohorts and supports the utility of validating this marker in other populations. We additionally found that the iCCA viral score was positively associated with CA19-9 in the OV population from the TIGER-LC study adjusted for additional risk factors. However, the viral score was not associated with all-cause mortality.

Several serum markers have been identified as being an early or prognostic biomarker for liver cancer.43 Our study proposes the utility of a non-invasive method for targeting populations at risk of developing liver cancer. This is particularly relevant because it is estimated that <30% of individuals with HCV or HBV participated in regular surveillance in the US.44 While the majority of cases of HCC worldwide derive from HCV and HBV infection, the incidence of HCC in patients with cirrhosis caused by chronic HBV and HCV is 2%–5%,45 making it challenging to identify the most vulnerable populations. For iCCA, few screening factors are available to identify patients at risk. Integrating patient serological profiles could improve targeting and predicting risk of liver cancer development, though more research in prospective studies is needed. Future research efforts should evaluate serological profiles in a prospective study of HCC to examine the performance of these metrics. Furthermore, investigation of these viral exposures in model systems may elucidate whether mechanisms or markers are contributing to these cancers.

Limitations of the study

In our study population, we found that differential immune reactivity to viral strains is associated with higher odds of liver cancer. Viral exposures could be highly confounded by several factors. Viral transmissibility is affected by the properties of each virus, whether respiratory or enteric.46 Individuals of lower socioeconomic conditions with limited access to sanitation may be more prone to viral infections.47 However, these factors could also independently associate with liver cancer.48 We adjusted for several proxies of socioeconomic status including job category and region. However, there may be unknown confounders that were not accounted for in our analyses. There are also several important limitations related to the cohort design. The study was designed and coordinated at five large hospital centers, allowing for a large coverage of several Thai regions. The PC population was based on a population visit to a hospital, allowing for the potential of selection bias if the hospital-based PC population had higher or lower viral exposure history. Nevertheless, we found no difference in overall number of species enriched in the population, and we hypothesize that hospital-based control subjects might have higher viral exposure, suggesting that our findings may be a more conservative estimate of the true effect in the target population. There are also several important questions that remain with this technology. The persistence of serum antibodies often differs depending on the virus. Thus, we do not know whether the enrichment of a viral strain using PhIP-Seq represents more recent infection or long-term immunity. There is also the potential that human proteins may share homology with small viral peptides and account for some of the differences between disease groups. Similarly, protein structure similarity, rather than sequence similarity, between host proteins and microbiome may be possible via a molecular mimicry mechanism. However, the initial viral sequence design was based on publicly available sequence databases such as UniProt and GenBank. Thus, this would likely play a modest role, if any, in the discriminating viruses we identified between disease groups. Additionally, we tested and found some differences by batch, leading to the exclusion of several plates. However, no strategy has been developed to account for these differences using PhIP-Seq data. Future research efforts should focus on examining and accounting for these differences.

Overall, we found a set of viral strains, including viral hepatitis and non-hepatitis viruses, that were positively and negatively associated with liver cancer, potentially reflective of a causal relationship between non-pathogenic viral infection and tumorigenesis. We further demonstrated the utility of examining serological profiles in patients with CLD and OV, establishing their association with liver function markers and mortality. Future research efforts should examine these data in prospective studies of liver cancer and integrate them with additional molecular data to elucidate potential pathways that could account for these associations.

STAR★Methods

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Bacterial and virus strains

BLT5403 E. coli strain Novagen (EMD Millipore) 69142-0.2ML
T7 phage Novagen (EMD Millipore) 70010-3

Biological samples

Blood samples from TIGER-LC consortium (Phip-seq data) Thailand

Chemicals, peptides, and recombinant proteins

Q5® Hot Start High-Fidelity DNA Polymerase for PCR New England Biolabs Cat# M0493L
Deoxynucleotide (dNTP) Solution Mix for PCR New England Biolabs Cat# N0447L
UltraPure™ DNase/RNase-Free Distilled Water Thermo Fisher Cat# 10977015
Chloramphenicol Sigma Aldrich Cat# C0378-5G
Kanamycin sulfate Sigma Aldrich Cat# 60615-5G
BSA Sigma Aldrich Cat# A3983
1M Tris-HCl, pH 8.0 Thermo Fisher 15568-025
1M Tris-HCl, pH 7.5 Thermo Fisher 15567-027
1M MgSO4 KD Medical CAC-5310
5M NaCl KD Medical RGC-5280
NP-40 Sigma Aldrich 492018
20x TBST buffer Thermo Scientific 28358

Critical commercial assays

QIAquick® Gel Extraction Kit QIAGEN 28704

Deposited data

Phip-seq data from TIGER-LC consortium Thailand Harvard dataverse: https://doi.org/10.7910/DVN/NIT39Z

Oligonucleotides

PCR amplification primer 1 Fwd 5’ACACTCTTTCCCTACACGACTCCAGTCAGGTGTGATGCTC3’ IDT DNA N/A
PCR amplification primer 1 Rv 5’GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCGAGCTTATCGTCGTCATCC3’ IDT DNA N/A
PCR amplification primer 2 Fwd 5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACTCCAGT3’ IDT DNA N/A
PCR amplification primer 2 Rv 5’CAAGCAGAAGACGGCATACGAGATtcgcaggGTGACTGGAGTTCAGACGTGT3’ IDT DNA N/A
Library sequencing primer 5’TGCTCGGGGATCCAGGAATTCCGCTGCGT3’ IDT DNA N/A

Software and algorithms

R www.r-project.org Version 3.6.3
Bowtie http://bowtie-bio.sourceforge.net/index.shtml Version 2.3.5.1
GraphPad Prism 8 www.graphpad.com Version 8
XGBoost https://xgboost.readthedocs.io/en/latest/ Version 0.90.0.2

Other

96-deep-well plates BrandTech Cat# 701350
protein A Dynabeads Thermo Fisher Cat# 10008D
protein G Dynabeads Thermo Fisher Cat# 10009D
gel extraction kit Qiagen Cat# 28704
NextSeq 500 Sequencer Illumina N/A

Resource availability

Lead contact

  • Further information and requests should be directed to Xin Wei Wang, xw3u@nih.gov.

Materials availability

  • This study did not generate new unique reagents.

Data and code availability

  • The phage immunoprecipitation sequencing data reported in this paper have been deposited in the Harvard Dataverse and the accession number is listed in the key resources table and here (https://doi.org/10.7910/DVN/NIT39Z).

  • This paper does not report original code.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Experimental model and study participant details

Study population

The study population derives from the TIGER-LC consortium. Individuals living in Thailand were enrolled, including patients diagnosed with HCC or iCCA, high-risk controls comprising CLD patients and individuals exposed to Opisthorchis viverrini (OV), and healthy controls (PC). Recruitment for HCC, iCCA, CLD and OV took place at five hospitals: Maharaj Nakorn Chiang Mai Hospital, Roi Et Hospital, Chulabhorn Hospital-Bangkok, National Cancer Institute of Thailand, and Srinagarind Hospital. HCC cases were identified based on physician diagnosis of primary liver cancer within the last year based on AFP and ultrasound or pathological diagnosis. iCCA cases were identified based on physician diagnosis of primary liver cancer (iCCA) using ultrasound/imaging (at least 2) and pathological diagnosis of primary liver cancer. Cases were identified through several resources, including medical records and pathology, with medical personnel identifying all cases diagnosed that day with liver cancer or presenting with risk factors associated with liver cancer, including from the list of weekly scheduled surgeries at the participating hospitals. CLD participants were identified as those having been diagnosed with HBV, HCV or alcoholic liver disease based on clinical report. OV participants were identified as hospital patients infected with liver fluke based on a stool test for OV eggs. PC participants were identified among hospital visitors, that is, individuals visiting the hospital for business, for non-illness-related reasons such as an annual checkup, or for non-liver-related illness, in all hospital regions except Khon Kaen University Hospital. In this area, PC participants were identified from routine population screenings for OV in a province near Khon Kaen, based on a community outreach campaign tracking individual stool samples for OV.
In all populations, exclusion criteria included individuals <20 years and >80 years, individuals diagnosed with HIV infection, individuals residing in an institution, and individuals who were severely ill. Institutional review board approvals were granted from all participating centers (Maharaj Nakorn Chiang Mai Hospital, Roi Et Hospital, Chulabhorn Hospital-Bangkok, National Cancer Institute of Thailand, and Srinagarind Hospital). All participants provided written informed consent.

The validation cohort derived from the UMD-NCI cohort. Participants were recruited from the greater Baltimore area starting in 2003. All participants consented to the collection of a blood sample and participation in the survey. The study was approved by the NIH IRB and the UMD IRB. This cohort included HCC patients, at-risk patients, and PC. HCC cases were pathologically confirmed primary liver cancer. Exclusion criteria included non-English speakers, patients in critical care, individuals residing in an institution, and individuals aged <18 or >90. At-risk patients were included if they had been diagnosed with HBV, HCV, non-alcoholic steatosis, or alcoholic liver disease with chronicity present for at least 6 months. Exclusion criteria for at-risk patients included previous exposure to HIV infection and age <18 or >90. VirScan was measured in all participants.

Method details

Lifestyle and biological variables

Several lifestyle and biological variables were obtained from patient questionnaires and clinical files. Participants were administered a socio-demographic and health questionnaire, and urine and blood were collected from all study subjects. The questionnaire was administered in person by trained interviewers, who obtained information on (1) demographics, including age, gender and region of residence; (2) socioeconomic information, including occupation; and (3) lifestyle factors, including alcohol consumption and tobacco smoking. All factors listed in the questionnaire were self-reported. Region of residence was defined as south and central, north, and northeast Thailand. Occupation was self-reported as the longest job held during the participant's lifetime. Study coordinators then categorized job category as (1) rice/crop/tree farmer, (2) construction/day worker and skilled laborer, (3) merchants and drivers, (4) government/office workers/teachers, and (5) all others including homemakers. Smoking status was defined as current, former or never. Alcohol intake was defined as never drinker, daily or a few times a week, weekly, or monthly/rarely. Clinical variables including AFP (ng/mL), CA19-9 (U/mL), ALT (U/L), total bilirubin (μmol/L), albumin (g/dL), platelet count (platelets/μL), weight (kg), height (m), and anti-HCV and HBsAg status were obtained from clinical files.
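
Two composite liver function indices used in the statistical analysis, ALBI and FIB-4, are computed from clinical variables of this kind. The following is a minimal sketch using the standard published formulas, with the unit conversions implied by the units listed above made explicit. The function and parameter names are illustrative, and AST (U/L), which is not in the list above, is assumed to be available from the same clinical files.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_dl: float) -> float:
    """ALBI = 0.66 * log10(bilirubin in umol/L) - 0.085 * (albumin in g/L)."""
    albumin_g_l = albumin_g_dl * 10.0  # convert g/dL to g/L
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def fib4_score(age_years: float, ast_u_l: float, alt_u_l: float,
               platelets_per_ul: float) -> float:
    """FIB-4 = (age * AST) / (platelets in 10^9/L * sqrt(ALT))."""
    platelets_1e9_l = platelets_per_ul / 1000.0  # platelets/uL to 10^9/L
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


if __name__ == "__main__":
    print(albi_score(bilirubin_umol_l=10.0, albumin_g_dl=4.0))
    print(fib4_score(age_years=50, ast_u_l=40, alt_u_l=25, platelets_per_ul=200_000))
```

Because FIB-4 already contains age, a regression of FIB-4 on another variable would typically not be adjusted for age again.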

Phage library

The phage library seed was obtained from Stephen Elledge’s lab as described.13 Briefly, the T7 phage library was created using 56-amino acid (AA) peptides encompassing viral genomes of all known viruses, with 28 AA overlap. Methods to amplify the T7 phage library seed have been previously described.13 The seed T7 phage library was propagated using BLT5403 E. coli. Briefly, to preserve the diversity of phage-displayed epitopes during T7 phage propagation, the agar plate method was used. The seed phage library was mixed with BLT5403 E. coli, spread on LB agar plates, and incubated at 37°C for 3 to 4 h. The plates were checked every 20 min after 2.5 h until they had just cleared, so that the final phage titer was high and the heterogeneity of the phage library was maintained. To elute the phage, each plate was covered with 10 mL of Phage Extraction Buffer and placed on a rocking platform at 4°C overnight. The phage library was harvested by tipping the plate slightly. The extraction buffer from all plates was combined, and 0.5 mL chloroform was added to each 50 mL Falcon tube and gently mixed to lyse the E. coli. The tubes were centrifuged at 4500 rpm for 15 min to clarify the lysate, and the supernatant was transferred to a sterile bottle. DMSO was added to the supernatant to a final concentration of 10%. The expanded library was aliquoted, snap-frozen in liquid nitrogen, and stored in a −80°C freezer. The titer of the amplified library was determined by plaque assay.
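
The tiling design described at the start of this section (56-AA peptides with a 28-AA overlap) can be illustrated with a short sketch. This is not the library design code; the function name is hypothetical, and the handling of the C-terminus (adding one final full-length tile when the sequence length does not fit the step size) is an assumption made for illustration.

```python
from typing import List


def tile_peptides(protein: str, length: int = 56, overlap: int = 28) -> List[str]:
    """Split a protein sequence into fixed-length tiles that overlap their neighbours."""
    if overlap >= length:
        raise ValueError("overlap must be smaller than the tile length")
    if len(protein) <= length:
        return [protein]
    step = length - overlap
    tiles = [protein[i:i + length] for i in range(0, len(protein) - length + 1, step)]
    # Assumption: cover the C-terminus with one extra full-length tile if needed.
    if (len(protein) - length) % step != 0:
        tiles.append(protein[-length:])
    return tiles


if __name__ == "__main__":
    toy_protein = "".join(chr(65 + i % 20) for i in range(112))  # toy 112-residue string
    for tile in tile_peptides(toy_protein):
        print(len(tile), tile[:10] + "...")
```

With these defaults, each residue away from the termini is covered by two tiles, so an epitope of up to 29 residues that is split by one tile boundary is still presented intact on the neighbouring tile.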

Library quality control

The library was sampled and lysed at 95°C for 10 min. Two rounds of PCR were then performed to amplify and index the lysed bacteriophage DNA product. After gel extraction, the size and quality of the libraries were assessed on an Agilent Bioanalyzer instrument. The DNA samples were then sequenced at the Sequencing Facility - Illumina (CCR) using 50-bp single-read sequencing (1 × 50 bp) on the Illumina NextSeq 4000 platform, yielding 100 million to 200 million reads per lane (around 1,000,000 reads per sample). The total coverage rate was more than 99.99% of the designed peptides displayed in the T7 phage library.
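
A coverage rate of this kind can be computed as the fraction of designed peptides detected in the sequenced library. The sketch below assumes a hypothetical mapping from peptide identifier to read count that includes designed peptides with zero reads, and assumes that a single read is enough to count a peptide as detected.

```python
from typing import Dict


def library_coverage(read_counts: Dict[str, int], min_reads: int = 1) -> float:
    """Fraction of designed peptides observed with at least `min_reads` reads."""
    if not read_counts:
        raise ValueError("read_counts must contain every designed peptide")
    detected = sum(1 for count in read_counts.values() if count >= min_reads)
    return detected / len(read_counts)


if __name__ == "__main__":
    toy_counts = {"pep_1": 5, "pep_2": 0, "pep_3": 1, "pep_4": 12}  # toy data
    print(f"coverage = {library_coverage(toy_counts):.2%}")
```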

PhIP-Seq

The PhIP-Seq procedure has been described in detail. Briefly, serum samples were mixed with the bacteriophage library and rotated overnight. The next day, magnetic beads were added to each well and allowed to immunoprecipitate. After 4 h, the beads were washed three times with wash buffer. The beads were then resuspended in water and lysed. The lysed phage material then went through two rounds of PCR amplification and was indexed and combined for sequencing using the Illumina NextSeq 500. All samples were run with technical duplicates. Additionally, each plate included three empty wells (negative controls) and three wells filled with a single healthy control sample from one individual which is utilized across plates (positive control).

Quantification and statistical analysis

Statistical analysis

Data normalization has been described in detail.49 Briefly, after read alignment using Bowtie, the counts per peptide were transformed to a relative epitope binding signal (EBS). Peptides were binned in descending rank order based on the negative control counts with a bin size of 300. After establishment of the order, the middle 90% of sample peptide count data served as the null distribution, and Z scores were calculated relative to its mean and standard deviation. After calculating the Z score in the duplicate plates, an antibody hit was defined as a Z score >6.5 on both plates for an individual peptide. To account for peptide cross-reactivity, any hits with a shared subsequence of 7 amino acids in individual samples across viruses were considered cross-reactive and were eliminated. The EBS score was averaged between the replicates and log-transformed. Data were analyzed at the strain level, with peptide data converted to the average EBS among enriched peptides per strain. A strain was considered seropositive if any of its peptides was enriched.
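
The binning and hit-calling steps above can be sketched as follows. This is an illustrative reading of the description, not the published pipeline: the function names are hypothetical, the middle 90% is taken by trimming 5% of values from each tail within a bin, the population standard deviation is used, and a bin with zero variance is assigned Z scores of 0.

```python
import statistics
from typing import List, Sequence


def binned_z_scores(sample: Sequence[float], negative: Sequence[float],
                    bin_size: int = 300, trim: float = 0.05) -> List[float]:
    """Z score of each peptide count against the trimmed null of its bin."""
    # Rank peptides by negative-control count, highest first, then cut into bins.
    order = sorted(range(len(sample)), key=lambda i: negative[i], reverse=True)
    z = [0.0] * len(sample)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        values = sorted(sample[i] for i in idx)
        k = int(len(values) * trim)
        core = values[k:len(values) - k]  # middle 90% of the bin
        if len(core) < 2:
            core = values
        mu = statistics.fmean(core)
        sd = statistics.pstdev(core)
        for i in idx:
            z[i] = (sample[i] - mu) / sd if sd > 0 else 0.0  # assumption for sd == 0
    return z


def call_hits(z_rep1: Sequence[float], z_rep2: Sequence[float],
              threshold: float = 6.5) -> List[bool]:
    """A peptide is a hit only if it exceeds the threshold in both replicates."""
    return [a > threshold and b > threshold for a, b in zip(z_rep1, z_rep2)]


if __name__ == "__main__":
    counts = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
              10, 11, 9, 10, 12, 8, 10, 11, 9, 1000]  # toy data, one enriched peptide
    controls = [0] * len(counts)
    z = binned_z_scores(counts, controls, bin_size=20)
    print(call_hits(z, z))
```

Trimming the tails before estimating the mean and standard deviation keeps strongly enriched peptides from inflating the null distribution they are compared against.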

We examined whether any difference in our data could be explained by the presence of batch effects using several analyses. We tested for batch effects using principal component analysis of the normalized data. According to this analysis, we found minimal differences by plate (except in plate 39). We also examined the total number of enriched epitopes across plates using a linear model regressing the total number of enriched epitopes on plate. We found that several plates were significantly different (including plates 35–39). However, we additionally examined whether this difference led to differential relationships between population groups across the plates (Figure S5). We tested for an interaction between plate and population group. None of the tests found a significant interaction across the groups tested after multiple testing adjustment (FDR p < 0.05). Given the significant dispersion in plates 35–39, these plates were excluded from the analysis (Figure S6).
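
A linear model with plate as a categorical predictor is equivalent to a one-way analysis of variance, so the plate check on enriched-epitope counts can be sketched with an F statistic. To stay dependency-free, the sketch below swaps the parametric p value for a permutation p value; that substitution, the function names, and the toy data are assumptions for illustration and are not the analysis performed in the study.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def f_statistic(counts: Sequence[float], plates: Sequence[Hashable]) -> float:
    """One-way ANOVA F statistic for enriched-epitope counts grouped by plate."""
    groups = defaultdict(list)
    for count, plate in zip(counts, plates):
        groups[plate].append(count)
    n, k = len(counts), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two plates and more samples than plates")
    grand = sum(counts) / n
    means = {p: sum(g) / len(g) for p, g in groups.items()}
    ss_between = sum(len(g) * (means[p] - grand) ** 2 for p, g in groups.items())
    ss_within = sum((c - means[p]) ** 2 for p, g in groups.items() for c in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(counts: Sequence[float], plates: Sequence[Hashable],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Share of plate-label shuffles with an F statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = f_statistic(counts, plates)
    shuffled: List[Hashable] = list(plates)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(counts, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    counts = [100, 102, 98, 101, 99, 100, 101, 99, 102, 98, 150, 152, 148, 151, 149]
    plates = ["A"] * 5 + ["B"] * 5 + ["C"] * 5  # toy data: plate C is shifted
    print(f_statistic(counts, plates), permutation_p_value(counts, plates))
```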

We conducted descriptive analyses using chi-square tests, Mann-Whitney U tests and analysis of variance tests. Differences in viral family abundance were tested using permutational multivariate analysis of variance with distance matrices, as implemented in the vegan package in R.50 We examined two models comparing HCC and iCCA with PC using XGBoost. We selected XGBoost because it is a widely used machine-learning method that is particularly useful for analyzing non-normal and sparse data types with high performance and computational speed. Thus, this model was used to build a prediction model discriminating populations with liver cancer from PC. To attenuate overfitting of the XGBoost model on the training set, we used a dimension reduction strategy based on cross-validation to identify the top features selected in >80% of models. We analyzed the full set of data (1,280 strains) between HCC/iCCA and PC using 10-fold cross-validation repeated 100 times and selected the features that were identified as important predictors in 80% of the models. Using only the features identified after dimension reduction, we divided the data randomly into a training set (80%) and a test set (20%), ran 10-fold cross-validation for model tuning on the training data, and ultimately selected the final model distinguishing HCC/iCCA vs. PC. These models were evaluated in the test set and compared to the remaining populations by examining the area under the curve (AUC). As the high-risk control populations (CLD and OV) have unique liver cancer risk, they were compared individually to HCC and iCCA, respectively. We used logistic regression comparing important viral features from both models between HCC/iCCA and CLD/OV and PC. Models were adjusted for lifestyle and biological confounders, including gender, age, job category, region, alcohol intake and smoking. Significance was defined as a false discovery rate (FDR) < 0.05.
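
The dimension reduction step, keeping only features that are selected in more than 80% of models across repeated cross-validation, can be sketched independently of the model used. In the sketch below the per-model feature selector is pluggable; the default `top_mean_difference` is a deliberately simple stand-in (not XGBoost feature importance) so that the example has no dependencies. All names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def top_mean_difference(X: Matrix, y: Sequence[int], top_k: int = 2) -> Set[int]:
    """Stand-in selector: the top_k features by absolute case/control mean difference."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    if not cases or not controls:
        return set()
    scores = [abs(sum(r[j] for r in cases) / len(cases)
                  - sum(r[j] for r in controls) / len(controls))
              for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:top_k])


def stable_features(X: Matrix, y: Sequence[int],
                    select: Callable[[Matrix, Sequence[int]], Set[int]] = top_mean_difference,
                    n_folds: int = 10, n_repeats: int = 100,
                    min_fraction: float = 0.8, seed: int = 0) -> List[int]:
    """Features selected in more than `min_fraction` of all training-fold models."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    n_models = 0
    indices = list(range(len(X)))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        for f in range(n_folds):
            held_out = set(indices[f::n_folds])
            train = [i for i in indices if i not in held_out]
            counts.update(select([X[i] for i in train], [y[i] for i in train]))
            n_models += 1
    return sorted(j for j, c in counts.items() if c / n_models > min_fraction)


if __name__ == "__main__":
    y = [1] * 20 + [0] * 20
    # Toy data: features 0 and 1 carry signal, features 2-4 are small deterministic noise.
    X = [[(5.0 if lab else 0.0) + ((i * 7) % 5 - 2) * 0.1,
          (3.0 if lab else 0.0) + ((i * 7 + 3) % 5 - 2) * 0.1,
          ((i * 7 + 6) % 5 - 2) * 0.1,
          ((i * 7 + 9) % 5 - 2) * 0.1,
          ((i * 7 + 12) % 5 - 2) * 0.1] for i, lab in enumerate(y)]
    print(stable_features(X, y, n_repeats=5))
```

Filtering on selection frequency rather than on a single fit favors features whose importance does not depend on which samples happen to fall in the training fold.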

Using the important features from the models, we created a viral score using the SHAP values.28 For the HCC and iCCA models, the SHAP values of the important features were pooled and converted to a 0–1 value (0 indicates more likely predicted PC, 1 indicates more likely predicted HCC or iCCA). The viral score was compared between the groups using the Wilcoxon rank-sum test and logistic regression. Models were adjusted for the same confounders as above, with significance defined as p < 0.05. We examined whether the HCC or iCCA viral score was associated with liver function markers including AFP, ALT, AST, ALBI, FIB-4 score, and CA19-9. We regressed log-normalized clinical markers (except ALBI, which was not skewed) on viral score, adjusted for HBsAg status, anti-HCV status, age (except FIB-4 score), smoking status, and alcohol intake. For ease of interpretation, figures and data were described using clinical values. We tested the viral score for mediation with AFP, AST, ALT, FIB-4 score and ALBI using the ‘mediation’ package in R with 100 simulations and heteroskedasticity-consistent standard errors.51 Finally, we used Cox proportional hazards analysis to examine the association between the HCC viral score and all-cause mortality in the CLD population, adjusted for age, gender and smoking.
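
One plausible way to turn pooled SHAP values into a 0–1 score is sketched below. It assumes that the SHAP values are expressed in log-odds units, as is the case for a tree model trained with a logistic objective, so that the sum of a sample's SHAP values plus the model's base value can be passed through the logistic function. This is an assumption about the transformation, not a description of the exact procedure used in the study, and the names are hypothetical.

```python
import math
from typing import Sequence


def viral_score(shap_values: Sequence[float], base_value: float = 0.0) -> float:
    """Map the summed per-feature SHAP values of one sample to the 0-1 range."""
    margin = base_value + sum(shap_values)  # log-odds of cancer vs. control
    # Numerically stable logistic function.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(viral_score([0.8, 0.4, -0.1]))    # leans toward the cancer class
    print(viral_score([-0.9, -0.3, 0.05]))  # leans toward the control class
```

Under this reading, a score near 0 corresponds to a profile predicted as PC and a score near 1 to a profile predicted as HCC or iCCA.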

Acknowledgments

We thank the patients, families, and nurses for their contributions to this study. This work was supported in part by grants (Z01 BC 010877, Z01 BC 010876, Z01 BC 010313, and ZIA BC 011870) from the intramural research program of the Center for Cancer Research, National Cancer Institute of the US to X.W.W. W.L.D. was supported by the Intramural Continuing Umbrella of Research Experiences (iCURE) program at the NCI.

Author contributions

W.L.D. and X.W.W. developed the study concept; W.L.D. performed data analysis; L.W., M.F., J.L., S.R., B.P., Y.Z., and H.G. performed sample processing, data collection, and initial data quality control; L.W. and H.G. generated phip-seq data; V.B., C.P., W.S., A.P., V.L., N.L., A.C., C.U.A., T.U., T.S., S.S., K.P., A.B., C.C.H., C.M., M.R., and X.W.W. performed and managed subject recruitment on behalf of the TIGER-LC consortium; and W.L.D. and X.W.W. interpreted data and wrote the manuscript. All authors read, edited, and approved the manuscript.

Declaration of interests

The authors declare no competing interests.

Inclusions and diversity

We support inclusive, diverse, and equitable conduct of research.

Published: December 19, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2023.101328.

Contributor Information

Mathuros Ruchirawat, Email: mathuros@cri.or.th.

Xin Wei Wang, Email: xw3u@nih.gov.

Supplemental information

Document S1. Figures S1–S6
mmc1.pdf (484.7KB, pdf)
Table S1. Descriptive statistics for TIGER-LC cohort, related to Figure 1

Means and standard error (SE) are provided for continuous data and counts and percentages provided for categorical data. Differences by group were tested using analysis of variance or chi-square test.

mmc2.xlsx (11.9KB, xlsx)
Table S2. Importance matrix output from HCC XGBoost model compared to PC, related to Figure 2
mmc3.xlsx (12.6KB, xlsx)
Table S3. Importance matrix output from iCCA XGBoost model compared to PC, related to Figure 3
mmc4.xlsx (12.8KB, xlsx)
Table S4. Logistic regression examining association of HCC important features among HCC vs. PC and HCC vs. CLD, related to Figure 4

Model report CLD as null and PC as reference. Model was adjusted for age, gender, alcohol intake, region, job category and smoking.

mmc5.xlsx (19.4KB, xlsx)
Table S5. Logistic regression examining association of iCCA important features among iCCA vs. PC and iCCA vs. OV, related to Figure 4

Model report OV as null and PC as null. Model was adjusted for age, gender, alcohol intake, region (PC only), job category and smoking.

mmc6.xlsx (64.6KB, xlsx)
Table S6. Linear regression model regressing HCC viral score on HCC status compared to PC and CLD, respectively, related to Figure 4

Models adjusted for age, gender, alcohol, smoking status, job category and region.

mmc7.xlsx (9.1KB, xlsx)
Table S7. Linear regression model regressing iCCA viral score on iCCA status compared to PC and OV, respectively, related to Figure 4

Models adjusted for age, gender, alcohol, smoking status, job category and region (PC model only).

mmc8.xlsx (9.1KB, xlsx)
Table S8. Association between liver function markers and HCC viral score in the CLD population, related to Figure 5

Linear regression models used regressing HCC viral score on log-normalized liver function markers adjusted for HbsAg status, anti-HCV status, age, smoking, and alcohol intake. Viral score is measured per 10 percentage point change.

mmc9.xlsx (9.2KB, xlsx)
Table S9. Association between liver function markers and iCCA viral score in the OV population, related to Figure 5

Linear regression models used regressing iCCA viral score on log-normalized liver function markers adjusted for HbsAg status, anti-HCV status, age, smoking and alcohol intake. AST and FIB-4 score were not measured in the population.

mmc10.xlsx (9.1KB, xlsx)
Document S2. Article plus supplemental information
mmc11.pdf (4.5MB, pdf)

References

  • 1.Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA. Cancer J. Clin. 2021;71:209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]
  • 2.Islami F., Goding Sauer A., Miller K.D., Siegel R.L., Fedewa S.A., Jacobs E.J., McCullough M.L., Patel A.V., Ma J., Soerjomataram I., et al. Proportion and number of cancer cases and deaths attributable to potentially modifiable risk factors in the United States. CA. Cancer J. Clin. 2018;68:31–54. doi: 10.3322/caac.21440. [DOI] [PubMed] [Google Scholar]
  • 3.Yang J.D., Hainaut P., Gores G.J., Amadou A., Plymoth A., Roberts L.R. A global view of hepatocellular carcinoma: trends, risk, prevention and management. Nat. Rev. Gastroenterol. Hepatol. 2019;16:589–604. doi: 10.1038/s41575-019-0186-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kansagara D., Papak J., Pasha A.S., O’Neil M., Freeman M., Relevo R., Quiñones A., Motu’apuaka M., Jou J.H. Screening for Hepatocellular Carcinoma in Chronic Liver Disease. Ann. Intern. Med. 2014;161:261–269. doi: 10.7326/M14-0558. [DOI] [PubMed] [Google Scholar]
  • 5.Moon A.M., Weiss N.S., Beste L.A., Su F., Ho S.B., Jin G.Y., Lowy E., Berry K., Ioannou G.N. No Association Between Screening for Hepatocellular Carcinoma and Reduced Cancer-Related Mortality in Patients With Cirrhosis. Gastroenterology. 2018;155:1128–1139.e6. doi: 10.1053/j.gastro.2018.06.079. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Ji M., Liu Z., Chang E.T., Yu X., Wu B., Deng L., Feng Q., Wei K., Liang X., Lian S., et al. Mass screening for liver cancer: results from a demonstration screening project in Zhongshan City, China. Sci. Rep. 2018;8 doi: 10.1038/s41598-018-31119-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Singal A.G., Yopp A., S Skinner C., Packer M., Lee W.M., Tiro J.A. Utilization of hepatocellular carcinoma surveillance among American patients: a systematic review. J. Gen. Intern. Med. 2012;27:861–867. doi: 10.1007/s11606-011-1952-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Khan S.A., Tavolari S., Brandi G. Cholangiocarcinoma: Epidemiology and risk factors. Liver Int. 2019;39(Suppl 1):19–31. doi: 10.1111/liv.14095. [DOI] [PubMed] [Google Scholar]
  • 9.Cadwell K. The virome in host health and disease. Immunity. 2015;42:805–813. doi: 10.1016/j.immuni.2015.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Foxman E.F., Iwasaki A. Genome-virome interactions: examining the role of common viral infections in complex disease. Nat. Rev. Microbiol. 2011;9:254–264. doi: 10.1038/nrmicro2541. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Lopez Angel C.J., Pham E.A., Du H., Vallania F., Fram B.J., Perez K., Nguyen T., Rosenberg-Hasson Y., Ahmed A., Dekker C.L., et al. Signatures of immune dysfunction in HIV and HCV infection share features with chronic inflammation in aging and persist after viral reduction or elimination. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.2022928118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Sparks R., Lau W.W., Liu C., Han K.L., Vrindten K.L., Sun G., Cox M., Andrews S.F., Bansal N., Failla L.E., et al. Influenza vaccination reveals sex dimorphic imprints of prior mild COVID-19. Nature. 2023;614:752–761. doi: 10.1038/s41586-022-05670-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Xu G.J., Kula T., Xu Q., Li M.Z., Vernon S.D., Ndung'u T., Ruxrungtham K., Sanchez J., Brander C., Chung R.T., et al. Viral immunology. Comprehensive serological profiling of human populations using a synthetic human virome. Science. 2015;348:aaa0698. doi: 10.1126/science.aaa0698. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Shrock E., Fujimura E., Kula T., Timms R.T., Lee I.-H., Leng Y., Robinson M.L., Sie B.M., Li M.Z., Chen Y., et al. Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity. Science. 2020;370 doi: 10.1126/science.abd4250. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Shrock E.L., Timms R.T., Kula T., Mena E.L., West A.P., Guo R., Lee I.-H., Cohen A.A., McKay L.G.A., Bi C., et al. Germline-encoded amino acid–binding motifs drive immunodominant public antibody responses. Science. 2023;380 doi: 10.1126/science.adc9498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Bjornevik K., Cortese M., Healy B.C., Kuhle J., Mina M.J., Leng Y., Elledge S.J., Niebuhr D.W., Scher A.I., Munger K.L., Ascherio A. Longitudinal analysis reveals high prevalence of Epstein-Barr virus associated with multiple sclerosis. Science. 2022;375:296–301. doi: 10.1126/science.abj8222. [DOI] [PubMed] [Google Scholar]
  • 17.Venkataraman T., Valencia C., Mangino M., Morgenlander W., Clipman S.J., Liechti T., Valencia A., Christofidou P., Spector T., Roederer M., et al. Analysis of antibody binding specificities in twin and SNP-genotyped cohorts reveals that antiviral antibody epitope selection is a heritable trait. Immunity. 2022;55:174–184.e5. doi: 10.1016/j.immuni.2021.12.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Liu J., Tang W., Budhu A., Forgues M., Hernandez M.O., Candia J., Kim Y., Bowman E.D., Ambs S., Zhao Y., et al. A Viral Exposure Signature Defines Early Onset of Hepatocellular Carcinoma. Cell. 2020;182:317–328.e10. doi: 10.1016/j.cell.2020.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Jespersen M.C., Peters B., Nielsen M., Marcatili P. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. 2017;45:W24–W29. doi: 10.1093/nar/gkx346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Suntornlohanakul R., Wanlapakorn N., Vongpunsawad S., Thongmee T., Chansaenroj J., Poovorawan Y. Seroprevalence of Anti-EBV IgG among Various Age Groups from Khon Kaen Province, Thailand. Asian Pac. J. Cancer Prev. 2015;16:7583–7587. doi: 10.7314/apjcp.2015.16.17.7583. [DOI] [PubMed] [Google Scholar]
  • 21.Linsuwanon P., Puenpa J., Huang S.-W., Wang Y.-F., Mauleekoonphairoj J., Wang J.-R., Poovorawan Y. Epidemiology and seroepidemiology of human enterovirus 71 among Thai populations. J. Biomed. Sci. 2014;21:16. doi: 10.1186/1423-0127-21-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Kositanont U., Wasi C., Ekpatcha N., Poomchart A., Likanonsakul S., Suphanip I., Balachandra K., Yamanishi K. Seroprevalence of human herpesvirus 6 and 7 infections in the Thai population. Asian Pac. J. Allergy Immunol. 1995;13:151–157. [PubMed] [Google Scholar]
  • 23.Mäkelä M.J., Puhakka T., Ruuskanen O., Leinonen M., Saikku P., Kimpimäki M., Blomqvist S., Hyypiä T., Arstila P. Viruses and Bacteria in the Etiology of the Common Cold. J. Clin. Microbiol. 1998;36:539–542. doi: 10.1128/jcm.36.2.539-542.1998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Sun Y., Miao Z., Yan J., Gong L., Chen Y., Chen Y., Mao H., Zhang Y. Sero-molecular epidemiology of enterovirus-associated encephalitis in Zhejiang Province, China, from 2014 to 2017. Int. J. Infect. Dis. 2019;79:58–64. doi: 10.1016/j.ijid.2018.11.002. [DOI] [PubMed] [Google Scholar]
  • 25.Al-Sadeq D.W., Zedan H.T., Aldewik N., Elkhider A., Hicazi A., Younes N., Ayoub H.H., Raddad L.A., Yassine H.M., Nasrallah G.K. Human herpes simplex virus-6 (HHV-6) detection and seroprevalence among Qatari nationals and immigrants residing in Qatar. IJID Reg. 2022;2:90–95. doi: 10.1016/j.ijregi.2021.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Klöhn M., Schrader J.A., Brüggemann Y., Todt D., Steinmann E. Beyond the Usual Suspects: Hepatitis E Virus and Its Implications in Hepatocellular Carcinoma. Cancers. 2021;13 doi: 10.3390/cancers13225867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Mason S.J., Graham N.E. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Q. J. R. Meteorol. Soc. 2002;128:2145–2166. [Google Scholar]
  • 28.Lundberg S.M., Lee S.-I. Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc; 2017. A unified approach to interpreting model predictions. [Google Scholar]
  • 29.Johnson P.J., Berhane S., Kagebayashi C., Satomura S., Teng M., Reeves H.L., O'Beirne J., Fox R., Skowronska A., Palmer D., et al. Assessment of liver function in patients with hepatocellular carcinoma: a new evidence-based approach-the ALBI grade. J. Clin. Oncol. 2015;33:550–558. doi: 10.1200/JCO.2014.57.9151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Poh Z., Goh B.B.G., Chang P.E.J., Tan C.K. Rates of cirrhosis and hepatocellular carcinoma in chronic hepatitis B and the role of surveillance: a 10-year follow-up of 673 patients. Eur. J. Gastroenterol. Hepatol. 2015;27:638–643. doi: 10.1097/MEG.0000000000000341. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Sharma S.A., Kowgier M., Hansen B.E., Brouwer W.P., Maan R., Wong D., Shah H., Khalili K., Yim C., Heathcote E.J., et al. Toronto HCC risk index: A validated scoring system to predict 10-year risk of HCC in patients with cirrhosis. J. Hepatol. 2018;68:92–99. doi: 10.1016/j.jhep.2017.07.033. [DOI] [PubMed] [Google Scholar]
  • 32.West J., Card T.R., Aithal G.P., Fleming K.M. Risk of hepatocellular carcinoma among individuals with different aetiologies of cirrhosis: a population-based cohort study. Aliment. Pharmacol. Ther. 2017;45:983–990. doi: 10.1111/apt.13961. [DOI] [PubMed] [Google Scholar]
  • 33.Kanwal F., Khaderi S., Singal A.G., Marrero J.A., Loo N., Asrani S.K., Amos C.I., Thrift A.P., Gu X., Luster M., et al. Risk factors for HCC in contemporary cohorts of patients with cirrhosis. Hepatology. [DOI] [PMC free article] [PubMed]
  • 34.Liu L., Gong T., Tao W., Lin B., Li C., Zheng X., Zhu S., Jiang W., Zhou R. Commensal viruses maintain intestinal intraepithelial lymphocytes via noncanonical RIG-I signaling. Nat. Immunol. 2019;20:1681–1691. doi: 10.1038/s41590-019-0513-z. [DOI] [PubMed] [Google Scholar]
  • 35.Kernbauer E., Ding Y., Cadwell K. An enteric virus can replace the beneficial function of commensal bacteria. Nature. 2014;516:94–98. doi: 10.1038/nature13960. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Pearson J.A., Tai N., Ekanayake-Alper D.K., Peng J., Hu Y., Hager K., Compton S., Wong F.S., Smith P.C., Wen L. Norovirus Changes Susceptibility to Type 1 Diabetes by Altering Intestinal Microbiota and Immune Cell Functions. Front. Immunol. 2019;10:2654. doi: 10.3389/fimmu.2019.02654. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Plaza-Díaz J., Solís-Urra P., Rodríguez-Rodríguez F., Olivares-Arancibia J., Navarro-Oliveros M., Abadía-Molina F., Álvarez-Mercado A.I. The Gut Barrier, Intestinal Microbiota, and Liver Disease: Molecular Mechanisms and Strategies to Manage. Int. J. Mol. Sci. 2020;21 doi: 10.3390/ijms21218351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Moaven O., Mangieri C.W., Stauffer J.A., Anastasiadis P.Z., Borad M.J. Strategies to Develop Potent Oncolytic Viruses and Enhance Their Therapeutic Efficacy. JCO Precis. Oncol. 2021;5:733–743. doi: 10.1200/PO.21.00003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Parato K.A., Senger D., Forsyth P.A.J., Bell J.C. Recent progress in the battle between oncolytic viruses and tumours. Nat. Rev. Cancer. 2005;5:965–976. doi: 10.1038/nrc1750. [DOI] [PubMed] [Google Scholar]
  • 40.Monroy-Iglesias M.J., Crescioli S., Beckmann K., Le N., Karagiannis S.N., Van Hemelrijck M., Santaolalla A. Antibodies as biomarkers for cancer risk: a systematic review. Clin. Exp. Immunol. 2022;209:46–63. doi: 10.1093/cei/uxac030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Ladányi A., Kiss J., Mohos A., Somlai B., Liszkay G., Gilde K., Fejös Z., Gaudi I., Dobos J., Tímár J. Prognostic impact of B-cell density in cutaneous melanoma. Cancer Immunol. Immunother. 2011;60:1729–1738. doi: 10.1007/s00262-011-1071-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Harris R.J., Cheung A., Ng J.C.F., Laddach R., Chenoweth A.M., Crescioli S., Fittall M., Dominguez-Rodriguez D., Roberts J., Levi D., et al. Tumor-Infiltrating B Lymphocyte Profiling Identifies IgG-Biased, Clonally Expanded Prognostic Phenotypes in Triple-Negative Breast Cancer. Cancer Res. 2021;81:4290–4304. doi: 10.1158/0008-5472.CAN-20-3773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Johnson P., Zhou Q., Dao D.Y., Lo Y.M.D. Circulating biomarkers in the diagnosis and management of hepatocellular carcinoma. Nat. Rev. Gastroenterol. Hepatol. 2022;19:670–681. doi: 10.1038/s41575-022-00620-y. [DOI] [PubMed] [Google Scholar]
  • 44.Davila J.A., Morgan R.O., Richardson P.A., Du X.L., McGlynn K.A., El-Serag H.B. Use of surveillance for hepatocellular carcinoma among patients with cirrhosis in the United States. Hepatology. 2010;52:132–141. doi: 10.1002/hep.23615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.de Martel C., Georges D., Bray F., Ferlay J., Clifford G.M. Global burden of cancer attributable to infections in 2018: a worldwide incidence analysis. Lancet Glob. Health. 2020;8:e180–e190. doi: 10.1016/S2214-109X(19)30488-7. [DOI] [PubMed] [Google Scholar]
  • 46.Burrell C.J., Howard C.R., Murphy F.A. Epidemiology of Viral Infections. Fenner and White's Med. Virol. 2017;185:185–203. [Google Scholar]
  • 47.Stebbins R.C., Noppert G.A., Aiello A.E., Cordoba E., Ward J.B., Feinstein L. Persistent socioeconomic and racial and ethnic disparities in pathogen burden in the United States, 1999–2014. Epidemiol. Infect. 2019;147. doi: 10.1017/S0950268819001894. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Flores Y.N., Datta G.D., Yang L., Corona E., Devineni D., Glenn B.A., Bastani R., May F.P. Disparities in Hepatocellular Carcinoma Incidence, Stage, and Survival: A Large Population-Based Study. Cancer Epidemiol. Biomarkers Prev. 2021;30:1193–1199. doi: 10.1158/1055-9965.EPI-20-1088. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Mina M.J., Kula T., Leng Y., Li M., de Vries R.D., Knip M., Siljander H., Rewers M., Choy D.F., Wilson M.S., et al. Measles virus infection diminishes preexisting antibodies that offer protection from other pathogens. Science. 2019;366:599–606. doi: 10.1126/science.aay6485. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Oksanen J., Simpson G., Blanchet F., Kindt R., Legendre P., Minchin P., O'Hara R., Solymos P., Stevens M., Szoecs E., Wagner H., Barbour M., et al. vegan: Community Ecology Package. 2022. [Google Scholar]
  • 51.Yamamoto T. mediation: R package for Causal Mediation Analysis. J. Stat. Software. 2013;59:1–38. [Google Scholar]

Associated Data

Supplementary Materials

Document S1. Figures S1–S6
mmc1.pdf (484.7KB, pdf)
Table S1. Descriptive statistics for TIGER-LC cohort, related to Figure 1

Means and standard errors (SE) are provided for continuous data, and counts and percentages are provided for categorical data. Differences between groups were tested using analysis of variance or the chi-square test.

mmc2.xlsx (11.9KB, xlsx)
Table S2. Importance matrix output from HCC XGBoost model compared to PC, related to Figure 2
mmc3.xlsx (12.6KB, xlsx)
Table S3. Importance matrix output from iCCA XGBoost model compared to PC, related to Figure 3
mmc4.xlsx (12.8KB, xlsx)
Table S4. Logistic regression examining association of HCC important features among HCC vs. PC and HCC vs. CLD, related to Figure 4

The model reports CLD as null and PC as reference. The model was adjusted for age, gender, alcohol intake, region, job category, and smoking.

mmc5.xlsx (19.4KB, xlsx)
Table S5. Logistic regression examining association of iCCA important features among iCCA vs. PC and iCCA vs. OV, related to Figure 4

The model reports OV as null and PC as null. The model was adjusted for age, gender, alcohol intake, region (PC only), job category, and smoking.

mmc6.xlsx (64.6KB, xlsx)
Table S6. Linear regression model regressing HCC viral score on HCC status compared to PC and CLD, respectively, related to Figure 4

Models were adjusted for age, gender, alcohol, smoking status, job category, and region.

mmc7.xlsx (9.1KB, xlsx)
Table S7. Linear regression model regressing iCCA viral score on iCCA status compared to PC and OV, respectively, related to Figure 4

Models were adjusted for age, gender, alcohol, smoking status, job category, and region (PC model only).

mmc8.xlsx (9.1KB, xlsx)
Table S8. Association between liver function markers and HCC viral score in the CLD population, related to Figure 5

Linear regression models were used to regress the HCC viral score on log-normalized liver function markers, adjusted for HBsAg status, anti-HCV status, age, smoking, and alcohol intake. The viral score is measured per 10-percentage-point change.
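
As an illustration only (the paper reports no original code), the adjusted model described for Table S8 can be sketched as follows. The sketch assumes one reading of the caption: the natural-log-transformed liver function marker is the outcome, and the viral score, rescaled so that one unit equals 10 percentage points, is the predictor of interest. The data are synthetic, and `fit_ols`, `TRUE_BETA`, and every variable name and coding are hypothetical rather than taken from the study.

```python
import numpy as np

def fit_ols(y, X):
    """Ordinary least squares with an intercept; returns coefficients (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic data only: none of these values come from the study.
rng = np.random.default_rng(0)
n = 500
viral_score = rng.uniform(0.0, 1.0, n)  # hypothetical score on a 0-1 scale
hbsag = rng.integers(0, 2, n)           # assumed binary coding of covariates
anti_hcv = rng.integers(0, 2, n)
age = rng.normal(50, 10, n)
smoking = rng.integers(0, 2, n)
alcohol = rng.integers(0, 2, n)

# Rescale so that one unit equals 10 percentage points (0.1 on the 0-1 scale).
score_per_10pp = viral_score * 10.0

TRUE_BETA = 0.05  # assumed effect on log(marker) per 10 percentage points
log_marker = (3.0 + TRUE_BETA * score_per_10pp + 0.2 * hbsag + 0.1 * anti_hcv
              + 0.005 * age + rng.normal(0, 0.1, n))

# Adjust for the covariates named in the caption.
X = np.column_stack([score_per_10pp, hbsag, anti_hcv, age, smoking, alcohol])
coef = fit_ols(log_marker, X)
beta_score = coef[1]  # coefficient for the rescaled viral score

# With a log-transformed outcome, exp(beta) - 1 is the relative change in the marker.
pct_change = (np.exp(beta_score) - 1.0) * 100.0
print(f"beta per 10 percentage points: {beta_score:.3f} ({pct_change:.1f}% change in marker)")
```

Rescaling the score before fitting means the coefficient can be read directly as the change in the log marker per 10-percentage-point increase in the score, instead of requiring division by 10 afterwards.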

mmc9.xlsx (9.2KB, xlsx)
Table S9. Association between liver function markers and iCCA viral score in the OV population, related to Figure 5

Linear regression models were used to regress the iCCA viral score on log-normalized liver function markers, adjusted for HBsAg status, anti-HCV status, age, smoking, and alcohol intake. AST and FIB-4 score were not measured in this population.

mmc10.xlsx (9.1KB, xlsx)
Document S2. Article plus supplemental information
mmc11.pdf (4.5MB, pdf)

Data Availability Statement

  • The phage immunoprecipitation sequencing data reported in this paper have been deposited in the Harvard Dataverse and the accession number is listed in the key resources table and here (https://doi.org/10.7910/DVN/NIT39Z).

  • This paper does not report original code.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.


Articles from Cell Reports Medicine are provided here courtesy of Elsevier
