Proceedings of the National Academy of Sciences of the United States of America. 2020 Sep 17;117(40):25036–25042. doi: 10.1073/pnas.2006212117

Development of a serum miRNA panel for detection of early stage non-small cell lung cancer

Lisha Ying a,b,1, Lingbin Du b,c,1, Ruiyang Zou d, Lei Shi b,e, Nan Zhang a,b, Jiaoyue Jin b,f, Chenyang Xu a,b, Fanrong Zhang b,g, Chen Zhu b,c, Junzhou Wu a,b, Kaiyan Chen b,h, Minran Huang b,f, Yingxue Wu a,b, Yimin Zhang b,i, Weihui Zheng b,j, Xiaodan Pan b,k, Baofu Chen l, Aifen Lin m, John Kit Chung Tam n, Rob Martinus van Dam o,p, David Tien Min Lai n, Kee Seng Chia p, Lihan Zhou d, Heng-Phon Too q, Herbert Yu r, Weimin Mao b,j, Dan Su b,f,2
PMCID: PMC7547174  PMID: 32943537

Significance

Lung cancer is the most prevalent and deadly cancer worldwide. Minimally invasive testing for early detection of lung cancer is key to improving prognosis. In this multicenter, multiethnic study, we compared the serum miRNA profiles of 744 NSCLC cases and 944 healthy controls. We discovered 35 candidate miRNA biomarkers, verified 22 of them, and optimized a panel of 5 miRNAs for detection of early stage NSCLC with AUC > 0.90 in three independent validation cohorts. The sensitivity and specificity of the five-miR panel for detection of stage I NSCLC were 83.0% and 90.7%, respectively. Our study will help to accurately detect early stage NSCLC through minimally invasive testing in the clinic.

Keywords: blood biomarker, microRNA, early detection

Abstract

Minimally invasive testing for early detection of lung cancer to improve patient survival is a major unmet clinical need. This study aimed to develop and validate a serum multi-microRNA (multimiR) panel as a minimally invasive test for early detection of nonsmall cell lung cancer (NSCLC) regardless of smoking status, gender, and ethnicity. Our study included 744 NSCLC cases and 944 matched controls, including smokers and nonsmokers, males and females, and Asian and Caucasian subjects. Using RT-qPCR and a tightly controlled workflow, we quantified the absolute expression of 520 circulating microRNAs (miRNAs) in a Chinese cohort of 180 early stage NSCLC cases and 216 healthy controls (male smokers). Candidate biomarkers were verified in two case-control cohorts of 432 Chinese and 218 Caucasians, respectively (including females and nonsmokers). A multimiR panel for NSCLC detection was developed using a twofold cross-validation and validated in three additional Asian cohorts comprising 642 subjects. We discovered 35 candidate miRNA biomarkers, verified 22 of them, and developed a five-miR panel that detected NSCLC with area under curve (AUC) of 0.936–0.984 in the discovery and verification cohorts. The panel was validated in three independent cohorts with AUCs of 0.973, 0.916, and 0.917. The sensitivity of the five-miR test was 81.3% for all stages, 82.9% for stages I and II, and 83.0% for stage I NSCLC at a specificity of 90.7%. We developed a minimally invasive five-miR serum test for detecting early stage NSCLC and validated its performance in multiple patient cohorts independent of smoking status, gender, and ethnicity.


Lung cancer is the most prevalent cancer and the leading cause of cancer death worldwide, with 610,000 and 130,000 lung cancer deaths annually in China and the United States, respectively (1, 2). This high mortality can be attributed to late detection and lack of effective treatment for advanced diseases (2, 3). Five-year survival rate for lung cancer patients diagnosed early (stages I–II) is 70%, whereas survival is 20% or lower for patients diagnosed in later stages (4, 5). Nonsmall cell lung cancer (NSCLC) accounts for over 80% of all lung cancer (3), and early detection of localized NSCLC for improved patient survival is a major unmet clinical need.

Low-dose spiral computed tomography (LDCT)-based screening has shown improved early detection of lung cancer and disease outcome (6). However, due to low specificity, LDCT leads to high rates of overdiagnosis (about 18%) and false-positive results (about 96%) (7, 8). LDCT screening also exposes subjects to ionizing radiation, which is known to increase cancer risk (9). Nevertheless, LDCT-based lung cancer screening has been recommended for heavy smokers in selected countries. With successful smoking cessation campaigns and growing lung cancer incidence in never-smokers, LDCT-based screening alone will have increasingly limited impact on overall lung cancer prognosis. Therefore, developing an alternative and complementary means of screening in the form of a blood-based, minimally invasive biomarker is highly desirable. Two large-scale, population-based studies were reported recently, showing promising results of blood-based protein and microRNA (miRNA) biomarkers in complementing LDCT-based lung cancer screening in high-risk Scottish and Italian populations, respectively (10, 11). Detection of circulating tumor DNA (ctDNA) through sequencing has also emerged as a promising approach for minimally invasive early detection of lung cancer (12).

miRNAs are small noncoding RNAs, typically between 19 and 24 nucleotides in length, that regulate gene expression posttranscriptionally (13). A growing body of evidence has established the association of both intracellular and cell-free circulating miRNAs with the tumor burden, diagnosis, and prognosis of lung cancer, especially that of NSCLC (14, 15). However, the majority of lung cancer circulating miRNA studies have focused on high-risk smokers and on populations of a single ethnicity. It has not been established whether circulating miRNA biomarkers are consistent between smokers and nonsmokers, as well as across genders and ethnicities. Further, since miRNA measurement is prone to both preanalytical and analytical variability, another key challenge is the integration of data from independent studies that employed nonstandardized preanalytical protocols and different technology platforms with disparate performance.

We have shown that an analytically validated qPCR method could generate high-quality expression profiles of circulating miRNAs in a wide range of concentrations, and the concurrent implementation of exogenous and endogenous control measures could effectively minimize the effect of preanalytical and analytical variables, improving the signal-to-noise ratios and data accuracy in large-scale miRNA biomarker discovery efforts (16–18). Using this method, we identified high-performance circulating miRNA biomarkers associated with heart failure and insulin resistance and demonstrated consistency of miRNA expression profiles across Chinese, Malay, and Indian ethnicities.

In this study, we applied the same methodology to evaluate the performance and consistency of circulating miRNA biomarkers in distinguishing early stage lung cancer patients from matched healthy controls in both smokers and nonsmokers of Asian and Caucasian populations. The aim of this study is to develop a circulating multimiRNA (multimiR) panel that could serve as minimally invasive biomarkers for early detection of NSCLC independent of smoking status, gender, and ethnicity.

Results

Identification of miRNA Biomarkers for Early Stage NSCLC.

Candidate miRNA biomarkers for early stage (stage I and II) NSCLC were identified through retrospective analysis of a well-defined Discovery Cohort of Chinese male smokers. Absolute expression levels of 520 miRNAs were profiled in 104 stage I and 76 stage II NSCLC patients and compared to those in 216 male Chinese control subjects (Table 1). Among the 520 miRNAs analyzed, 272 miRNAs were found to be consistently expressed at 500 or more copies per milliliter of serum in all cancer and control subjects, above the detection limit and within the dynamic range of standard curves in this workflow (SI Appendix, Fig. S1 A and B). Precision of measurement was shown with narrow variability in repeated measurements of reference serum samples (SI Appendix, Fig. S1C). Two miRNAs, miR-361-5p and miR-425-5p, were found to have stable expression across cancer and control subjects and could thus serve as endogenous references (SI Appendix, Fig. S2). Among the 272 miRNAs expressed, 35 miRNAs were found to be either significantly higher (25 miRNAs with false discovery rate [FDR] adjusted P < 0.01 and z-score > 1) or lower (10 miRNAs with FDR adjusted P < 0.01 and z-score < −1) in NSCLC cases than in the matched controls. Only one miRNA (miR-205-5p) was differentially expressed (with FDR adjusted P < 0.05) between lung squamous cell carcinoma and adenocarcinoma. Most of the NSCLC subjects were clustered together by the 35 miRNAs under unsupervised hierarchical clustering analysis (Fig. 1).

Table 1.

Patient cohorts used in each phase of the study

Values are given as controls / cases; stage and subtype rows apply to cases only.

| | Discovery (Phase 1) | Verification 1 (Phase 2) | Verification 2 (Phase 2) | Validation 1 (Phase 3) | Validation 2 (Phase 3) | Validation 3 (Phase 3) |
|---|---|---|---|---|---|---|
| Ethnicity | Asian (Chinese) | Asian (Chinese) | Caucasian | Asian (Chinese) | Asian (Chinese) | Asian (Chinese, Malay, Indian) |
| Source | Zhejiang Cancer Hospital and Keyan City | Zhejiang Cancer Hospital, Keyan City and Hangzhou City | Asterand (Biobank) | Zhejiang Cancer Hospital and Kecheng City | Taizhou Hospital and Kecheng City | National University of Singapore |
| No. of subjects | 216 / 180 | 190 / 242 | 117 / 101 | 117 / 120 | 273 / 67 | 31 / 34 |
| Median age | 60.6 / 56.5 | 64.0 / 63.0 | 55.0 / 60.0 | 67.0 / 67.0 | 51.0 / 61.0 | 60.5 / 63.5 |
| Age range | 49–70 / 41–65 | 44–77 / 44–77 | 45–64 / 45–75 | 46–77 / 46–77 | 40–68 / 35–76 | 40–71 / 46–85 |
| Gender: male | 216 / 180 | 150 / 200 | 88 / 80 | 117 / 120 | 121 / 56 | 21 / 23 |
| Gender: female | 0 / 0 | 40 / 42 | 29 / 21 | 0 / 0 | 152 / 11 | 10 / 11 |
| Smoking history: smoker | 216 / 180 | 150 / 153 | 14 / 152 | 117 / 120 | 273 / 61 | No data |
| Smoking history: nonsmoker | 0 / 0 | 40 / 89 | 109 / 17 | 0 / 0 | 0 / 19 | No data |
| Stage I | 104 | 138 | 71 | 65 | 19 | 17 |
| Stage II | 76 | 104 | 30 | 55 | 23 | 9 |
| Stage III | 0 | 0 | 0 | 0 | 11 | 4 |
| Stage IV | 0 | 0 | 0 | 0 | 3 | 4 |
| Stage unknown | 0 | 0 | 0 | 0 | 11 | 0 |
| Subtype: adenocarcinoma | 48 | 134 | 39 | 97 | 33 | 24 |
| Subtype: squamous cell carcinoma | 132 | 103 | 43 | 20 | 16 | 4 |
| Subtype: others | 0 | 5 | 19 | 3 | 18 | 6 |

Fig. 1.

Candidate biomarkers segregate NSCLC cases from noncancer controls. Heatmap of normalized miRNA expression levels following unsupervised hierarchical clustering of all 424 subjects in the discovery cohort using 272 expressed miRNAs (Upper) and 35 candidate biomarkers (Lower). The horizontal axis represents the NSCLC status of the samples, with black for NSCLC subjects and white for non-NSCLC control subjects. The heatmap shows relative miRNA expression levels, with red indicating higher expression and green indicating lower expression.

Verification of miRNA Biomarkers.

We evaluated the performance and consistency of the 35 candidate miRNAs and the two reference miRNAs in two independent Verification cohorts of Chinese (n = 432) and Caucasians (n = 218). The two cohorts continued to focus on stage I and II NSCLC cases but included female and nonsmoker subjects (Table 1). Of 35 candidate biomarkers, 34 and 22 were verified to have consistent up- or down-regulation in Verification cohorts 1 and 2, respectively, when using a less stringent z-score threshold (FDR adjusted P < 0.01 and z-score > 0.4 or z-score < −0.4) to account for the inclusion of Caucasian, female, and nonsmoking subjects in these cohorts. Strong positive correlations in miRNA biomarker z-scores were observed between the Discovery cohort and the two Verification cohorts. Pearson’s correlation coefficient, r, was 0.97 (P < 0.001) for z-scores in the Discovery cohort versus Verification cohort 1, both from the same source. Pearson’s r was 0.62 (P < 0.001) for z-scores in the Discovery cohort (Asian) versus Verification cohort 2 (Caucasian) (SI Appendix, Fig. S3).

To investigate whether the identified biomarkers were consistently expressed across gender and smoking history, we performed correlation analysis of z-scores between males and females as well as between smokers and nonsmokers in each of the Verification cohorts (Fig. 2A). The analysis showed consistent associations between miRNAs and NSCLC by gender and smoking status, as the z-scores were well correlated between males and females and between smokers and nonsmokers, suggesting that these biomarkers differ between early stage NSCLC and healthy controls regardless of gender, smoking status, and ethnicity. Unsupervised hierarchical clustering of 523 NSCLC and 523 matched control subjects across the Discovery and two Verification cohorts showed that most of the NSCLC patients were clustered together by the 22 verified miRNA biomarkers (FDR-adjusted P < 0.01 and z-score > 0.4 or < −0.4) (Fig. 2B).

Fig. 2.

Biomarkers verified in cohorts including subjects with different ethnicities, gender, and smoking status. (A) Comparison of biomarker fold changes between female and male subjects and between smokers and nonsmokers in Verification cohorts 1 (Asian) and 2 (Caucasian). (B) Heatmap of normalized miRNA expression levels following unsupervised hierarchical clustering of all 1,070 subjects in the combined cohort (Discovery, Validation 1 and 2) using 22 validated biomarkers.

Biomarker Panel Building and Optimization.

Next, we selected and cross-validated multimiR panels based on their AUC in distinguishing NSCLC. We combined all 1,046 samples (523 NSCLC and 523 controls) in Discovery and Verification cohorts and partitioned the samples into equally sized training and test sets with matched cancer stage, age, gender, ethnicity, and smoking status. We derived multimiR panels in training sets using Sequential Forward Floating Search (SFFS) and Support Vector Machine (SVM), built the algorithm through logistic regression, and evaluated multimiR panel performance in the test set. An increasing AUC in the test set was observed when the number of miRNAs in the panel was increased, but this improvement plateaued at five miRNAs (Fig. 3A). The median AUC for a five-miR panel from 200 iterations of cross-validation was ∼0.96 in the test set, with a range of <0.03 between the 25th and 75th percentiles. Incorporating more miRNAs into the panel did not significantly improve AUC. The AUCs of the final five-miR panel were 0.986 (95% CI, 0.975–0.992), 0.936 (95% CI, 0.910–0.956), and 0.971 (95% CI, 0.942–0.986) in the Discovery cohort, Verification cohort 1, and Verification cohort 2, respectively (Fig. 3B). The final five-miR panel included two miRNAs that were down-regulated in NSCLC and three miRNAs that were up-regulated in NSCLC according to our analysis (Fig. 3C).

Fig. 3.

Development of five-miR biomarker panel for NSCLC detection. (A) Boxplots of AUC values calculated from 200 rounds of the twofold cross-validation procedure for biomarker panels comprising two to eight miRNAs. Box represents the 25th, 50th, and 75th percentiles of AUC values calculated for the detection of NSCLC by the biomarker panels containing miRNAs from 2 to 8. (B) ROC curves of NSCLC detection sensitivity and specificity for the five-miR biomarker panel in Discovery and Verification Cohorts 1–2. (C) Table of miRNAs included in the final five-miR panel showing their expression in NSCLC and coefficients used for constructing the panel.

The selected five-miR panel, with a prespecified prediction algorithm, was validated in three additional cohorts of Chinese and Singaporean patients and controls (Table 1). The AUCs of the five-miR panel were 0.973 (95% CI, 0.947–0.987), 0.916 (95% CI, 0.849–0.951), and 0.917 (95% CI, 0.826–0.964) in Validation Cohorts 1, 2, and 3, respectively (Fig. 4A). Testing other multimiR panels in the Validation cohorts showed that the five-miR panel gave the optimal AUC across the three cohorts (SI Appendix, Fig. S4). The five-miR biomarker panel scores for each sample, calculated from a logistic regression model, were able to differentiate NSCLC cases of all stages from noncancer controls in all six study cohorts (Fig. 4B). There were no significant differences in five-miR panel scores between lung adenocarcinoma and squamous cell carcinoma in five of the six study cohorts (SI Appendix, Fig. S5).

Fig. 4.

Validation of five-miR biomarker panel for detection of all stages of NSCLC. (A) ROC curves of NSCLC detection sensitivity and specificity for the five-miR panel in Validation Cohorts 1–3. (B) Sample scores calculated from the five-miR panel prediction model for every subject in each case-control cohort, classified by cancer stage where available. (C) AUC for the five-miR biomarker panel in all of the cohorts of this study. The AUCs were calculated for all stage cancers, early stage cancers (stages I and II), and stage I cancers.

Since Validation cohorts 2 and 3 contained stage III or IV NSCLC cases, we further tested the performance of the five-miR panel in stage I and II patients in Validation cohorts 2 (n = 315) and 3 (n = 57). The AUCs for stage I–II NSCLC were 0.935 (95% CI, 0.850–0.968) and 0.900 (95% CI, 0.791–0.958), respectively (Fig. 4C). The AUCs for stage I NSCLC were 0.960 (95% CI, 0.894–0.987) and 0.886 (95% CI, 0.734–0.962), respectively (Fig. 4C).

We estimated that the sensitivity was 81.3% (95% CI, 78.2–84.1%) for all cancer stages, 82.9% (95% CI, 79.8–85.7%) for stages I and II, and 83.0% (95% CI, 79.6–85.9%) for stage I NSCLC when the specificity was at 90.7% (95% CI, 88.3–92.8%) in the three validation cohorts, demonstrating that the panel has a promising capacity in differentiating stages I and II NSCLC patients from matched controls regardless of gender, ethnicity, and smoking status.
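To illustrate what reporting sensitivity at a fixed specificity involves, the following is a minimal sketch in Python. It assumes that a higher panel score indicates NSCLC and that the cutoff is taken from the distribution of control scores; the article does not state how its operating point was chosen, so the function names and the cutoff rule here are illustrative assumptions rather than the authors' procedure.

```python
import math


def cutoff_for_specificity(control_scores, target_specificity):
    """Return the smallest observed control score such that at least the
    target fraction of controls score at or below it (assumed rule)."""
    ordered = sorted(control_scores)
    # Small epsilon guards against floating-point noise in the product.
    k = math.ceil(target_specificity * len(ordered) - 1e-9)
    k = max(k, 1)
    return ordered[k - 1]


def specificity(control_scores, cutoff):
    """Fraction of controls called negative (score at or below the cutoff)."""
    return sum(s <= cutoff for s in control_scores) / len(control_scores)


def sensitivity(case_scores, cutoff):
    """Fraction of cases called positive (score above the cutoff)."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)
```

With real panel scores, one would fix the cutoff once on the controls and then report sensitivity separately for all stages, stages I and II, and stage I, as in the paragraph above.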

Discussion

Lung cancer is clinically detected through chest X-ray, LDCT, and other imaging methods. Minimally invasive biomarkers, on their own or in conjunction with imaging, can potentially improve lung cancer diagnosis by enhancing sensitivity and specificity and by increasing screening compliance. Potential protein biomarkers, such as carcinoembryonic antigen (CEA), cytokeratin-19 fragment (CYFRA21-1), neuron-specific enolase (NSE), cancer-associated antigens CA125 and CA19-9, and chromogranin A, as well as molecular markers, such as mutations in cancer-associated genes KRAS and TP53, have been identified. However, these biomarkers are limited in sensitivity, specificity, and reproducibility in detecting lung cancer (19).

Tumor-associated DNA or RNA biomarkers, including miRNAs, can help more accurately diagnose and monitor cancer (20). In this study, we showed that a five-miR panel was optimal in detecting NSCLC of all stages, with a validated AUC between 0.91 and 0.97, regardless of gender, ethnicity, and smoking status. We further showed that the panel could detect early stage (stages I and II) NSCLC with an AUC of 0.93 and 0.90 in two independent cohorts. The accuracy of the five-miR panel in detecting NSCLC was significantly higher than that of CEA (SI Appendix, Fig. S6), CYFRA21-1, or NSE (21). In comparison, ctDNA-based liquid biopsy approaches have shown an AUC of about 0.8, with 70% sensitivity and 80% specificity in small-scale validation studies (12).

Multiple circulating miRNA biomarker panels have been proposed for NSCLC detection from studies in different countries (14, 22–31) (SI Appendix, Table S1). Some proposed miRNA panels reported high sensitivity and specificity for NSCLC detection, but none were developed systematically with a robust method and validated by a large multicenter and multiethnic study. Furthermore, few panels had miRNA biomarkers in common (19, 32). This discrepancy is likely due to the heterogeneity of patients in demographics and clinicopathological characteristics. Additionally, preanalytical and analytical variables may also contribute to the difference (19). To overcome these challenges, we developed a miRNA RT-qPCR assay platform, which was shown to have greater sensitivity and reproducibility in detecting circulating miRNAs across ethnicities (16, 18). We observed similar results in unsupervised clustering analysis of the samples from different cohorts in our study. Our five-miR panel was validated in large and independent patient cohorts that comprised diverse ethnicities from different sources. The ability of the serum five-miR panel to detect stage I NSCLC with 83% sensitivity and 91% specificity means that the panel may be used as a screening tool prior to confirmatory diagnosis using LDCT or other imaging methods with tissue biopsy. A minimally invasive blood test may improve screening compliance compared to LDCT screening. There is also potential for the five-miR panel to be implemented together with LDCT in current lung cancer screening programs as a blood biomarker to reduce false positive results and unnecessary biopsies by improving the specificity of LDCT screening, which has been reported to be 73.4% (33). The clinical utility of the five-miR panel in NSCLC diagnosis and screening will be further investigated and validated in large-scale prospective studies.

The five miRNAs included in our panel have been associated with lung cancer. Two (let-7a-5p and miR-375) had lower expression and three (miR-1-3p, miR-1291, and miR-214-3p) had higher expression in NSCLC compared to healthy individuals. miR-375 has been reported before as a prognostic biomarker for NSCLC (23, 31). In addition, miR-375, let-7a-5p, and miR-1291 were also prognostic biomarkers for lung cancer (34–36). Circulating let-7a-5p was shown to suppress lung cancer (37), and miR-375 was found to have tumor suppressive activities (38), consistent with our study where the expressions of these miRNAs were low in NSCLC. MiR-1-3p was shown to have oncogenic effects in NSCLC (39), which was also consistent with our finding of high expression in NSCLC. Circulating miR-214 was shown to affect NSCLC with regard to drug resistance (40), and miR-214-3p may be oncogenic in NSCLC (41).

Limitations of our study include testing in a research rather than a clinical setting, lack of patients with ethnicities beyond Asian and Caucasian (e.g., African), and the use of retrospective case-control cohorts, some of which were not completely matched for demographics and clinicopathological characteristics. Since the study focused on biomarkers for early stage NSCLC, we had fewer patients with stage III (n = 14) or stage IV (n = 8) cancers in the study. Implementing the five-miR panel in the clinic will require the development of a prediction model to generate a score for risk stratification. Future work will focus on validating the five-miR biomarker panel in prospective cohorts designed to screen for individuals with early stage NSCLC in a clinical setting, and on extending the study beyond Asian and Caucasian patients.

Methods

Study Design and Population.

Six case-control cohorts, comprising a total of 744 NSCLC cases and 944 control subjects aged between 40 and 85 y, were recruited from seven independent sources (Table 1). NSCLC patients were recruited by convenience sampling between 2004 and 2018 at Zhejiang Cancer Hospital and Taizhou Hospital in Zhejiang, China; National University Hospital in Singapore; and from various hospitals in Europe (Asterand biobank). Blood was collected before cancer surgery and treatment. Matched healthy controls were recruited by convenience sampling between 2011 and 2017 from lung cancer screening programs in Keyan city, Hangzhou city, and Kecheng city in Zhejiang, China, as well as from healthy subjects at the National University Hospital in Singapore, and various hospitals in Europe (Asterand biobank). All of the samples and patient data were deidentified prior to use. Patients recruited in China were all of Chinese ethnicity while those from the Asterand biobank were all Caucasian. The Singapore cohort comprised patients of Chinese, Malay, and Indian ethnicity. Smokers were defined as subjects who smoked more than 10 cigarettes per day for a period of 10 y or longer. All control subjects were confirmed by CT/LDCT to be nodule-free at the time of blood collection. All lung cancer cases were confirmed by histopathological examination of biopsy tissues. All studies were approved by the Institutional Review Board of the respective site with written informed consent from study participants.

Blood Collection and Serum Processing.

Fasting blood samples (20 mL) were collected using venipuncture in plain serum tubes (BD vacutainer plus plastic serum tube). Blood samples were allowed to clot for 30–60 min at room temperature and centrifuged at 3,000 rpm for 10 min at 4 °C. After centrifugation, sera were transferred using syringes and aliquoted into cryotubes for immediate storage at −80 °C.

RNA Isolation.

We extracted total RNA from 200 µL of each serum sample using the miRNeasy Serum/Plasma Kit (Qiagen). This was done according to the manufacturer’s recommendations, except for the following modifications. (i) We added a set of three proprietary spike-in controls (MiRXES), representing high, medium, and low levels of RNA, into the sample lysis buffer (QIAzol Lysis Reagent, Qiagen) prior to sample RNA isolation. The spike-in controls are 20-nucleotide RNAs with unique sequences (distinct from any of the 2,588 annotated mature human miRNAs in miRBase version 21). These control RNAs are used to monitor RNA isolation efficiency and to normalize for technical variations during RNA isolation. (ii) We added bacteriophage MS2 RNA into the sample lysis buffer (1 µg/mL QIAzol) to improve RNA isolation yield. (iii) We centrifuged the samples at 18,000 × g for 15 min at room temperature after mixing with chloroform. (iv) We eluted the RNA in 25 µL of RNase-free water.

RT-qPCR Detection of miRNA Expression.

We used a tightly controlled RT-qPCR workflow to quantify the expression of miRNAs in each blood sample. We reverse-transcribed serum RNA using miRNA-specific reverse transcription (RT) primers according to the manufacturer’s instructions (MiRXES) on a Veriti Thermal Cycler (Applied Biosystems). Multiplexed RT reactions were performed using RT primers specific for each miRNA. For discovery, 520 RT primers were divided into 10 multiplex primer pools (50- to 60-plex per pool) to minimize nonspecific cross-overs and primer–primer interactions. For each RNA sample, we performed 10 multiplex RT reactions, each with 2 µL of isolated RNA. Synthetic templates for standard curves of each miRNA (6-log serial dilution of 10^7 to 10^2 copies) and a nontemplate control (nuclease-free water spiked with MS2) were reverse-transcribed concurrently with the serum RNA samples. We preamplified all cDNAs, including those from synthetic miRNA standards, using a 14-cycle PCR with Augmentation Primer Pools (MiRXES) on the Veriti Thermal Cycler. We then performed single qPCR on the amplified cDNA samples using a miRNA-specific qPCR assay and ID3EAL miRNA qPCR Master Mix according to the manufacturer’s instructions (MiRXES). We carried out the qPCRs with technical duplicates on the ViiA qPCR system (384-well configuration, Applied Biosystems). We calculated raw threshold cycle (Ct) values using the ViiA 7 RUO software with automatic baseline setting and a threshold of 0.5. We assessed RT-qPCR efficiency and potential cDNA amplification bias by analyzing the Ct values of the synthetic miRNA standards. We calculated the absolute expression of each miRNA (number of copies present) in the serum sample by interpolation of sample Ct values with synthetic miRNA standard curves after correcting for variations in RT-qPCR efficiency.
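The interpolation step described above can be sketched as a generic qPCR standard-curve calculation. This is only an illustration under stated assumptions: Ct is modeled as linear in log10(copies), the fit is ordinary least squares, and the function names are hypothetical. The article does not describe its exact fitting or efficiency-correction procedure, so none of this should be read as the authors' implementation.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept
    from a serial dilution of synthetic standards."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate a sample Ct on the fitted curve to an absolute copy number."""
    return 10 ** ((ct - intercept) / slope)
```

In use, the standards would be the six dilution points from 10^7 to 10^2 copies, and each sample Ct would be passed to `copies_from_ct` with the slope and intercept fitted for that miRNA.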

CEA Protein Quantification.

CEA protein was quantified using a CEA ELISA Kit (Elabscience Biotechnology Inc.) according to the manufacturer’s protocol. NSCLC cases (n = 233) from the Zhejiang Cancer Hospital and healthy controls (n = 230) from the Keyan city cohort were assayed for CEA concentration.

Biomarker Discovery.

We used geNorm (42) and NormFinder (43) software to identify endogenous reference miRNAs that had stable expression across all samples and could be used to normalize for varying sample RNA inputs for RT-qPCR. We used the normalized miRNA expression values to compare the expression levels of individual miRNAs between NSCLC cases and healthy controls. Unsupervised hierarchical clustering was carried out based on Euclidean distance.
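As a rough illustration of reference-based normalization followed by a Euclidean distance computation, consider the sketch below. It assumes that each sample is a mapping from miRNA name to absolute copy number, and that normalization shifts log2 expression by the sample's reference level relative to the cohort mean. The article does not spell out its normalization formula, so this rule and the function names are assumptions; only the two reference miRNA names come from the Results.

```python
import math

# Endogenous reference miRNAs named in the Results.
REFERENCE_MIRNAS = ("miR-361-5p", "miR-425-5p")


def log2_reference_level(sample, refs=REFERENCE_MIRNAS):
    """Mean log2 copy number of the reference miRNAs (log of their geometric mean)."""
    return sum(math.log2(sample[name]) for name in refs) / len(refs)


def normalize(samples, refs=REFERENCE_MIRNAS):
    """Return log2 expression per sample, shifted so that every sample has
    the same reference level (the cohort mean). Assumed rule, for illustration."""
    levels = [log2_reference_level(s, refs) for s in samples]
    cohort_mean = sum(levels) / len(levels)
    normalized = []
    for sample, level in zip(samples, levels):
        shift = level - cohort_mean
        normalized.append({name: math.log2(v) - shift for name, v in sample.items()})
    return normalized


def euclidean(a, b, names):
    """Euclidean distance between two normalized profiles over the given miRNAs."""
    return math.sqrt(sum((a[n] - b[n]) ** 2 for n in names))
```

A pairwise distance matrix built with `euclidean` could then be passed to any hierarchical clustering routine; the linkage method is not stated in the article.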

Biomarker Panel Building and Optimization.

We used a twofold cross-validation procedure, incorporating a feature selection algorithm and a logistic regression predictive model, to build and optimize miRNA biomarker panels. Prediction model performance was evaluated using the area under the curve (AUC) based on the receiver operating characteristics (ROC) curves. We carried out 200 rounds of the twofold cross-validation procedure for each biomarker panel comprising two to eight miRNAs. We used the sequential forward floating search (SFFS) algorithm (44) to select miRNA biomarkers for inclusion in each biomarker panel. A logistic regression model was used to train predictive models for calculating the probability of a patient to have NSCLC given the expression levels of miRNAs included in the biomarker panel (45).
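One round of the twofold cross-validation described above might look like the following sketch. It is deliberately simplified: the split is stratified by case/control label only (not by stage, age, gender, ethnicity, or smoking status), the logistic regression is fitted by plain gradient descent, and the SFFS feature selection step is omitted. All function names and hyperparameters are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression weights and intercept by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Predicted probability of NSCLC for each expression vector."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_halves(labels, rng):
    """Split sample indices into two halves with balanced cases and controls."""
    half_a, half_b = [], []
    for cls in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        mid = len(idx) // 2
        half_a.extend(idx[:mid])
        half_b.extend(idx[mid:])
    return half_a, half_b


def twofold_auc(X, y, rng):
    """Train on one half and test on the other, then swap; return both AUCs."""
    half_a, half_b = stratified_halves(y, rng)
    results = []
    for train, test in ((half_a, half_b), (half_b, half_a)):
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = predict([X[i] for i in test], w, b)
        results.append(auc(scores, [y[i] for i in test]))
    return results
```

Repeating `twofold_auc` with fresh random splits (200 times in the article) gives the distribution of test-set AUCs summarized in the boxplots of Fig. 3A.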

Statistical Analysis.

Fold change in absolute miRNA expression (copy number) was standardized using a z-score (standard score), which was calculated using the formula: z-score = log2(FC/SD), where FC is the fold change of miRNA expression between NSCLC and controls, and SD is the standard deviation of expression levels for each miRNA. We determined whether changes in miRNA expression were statistically significant using Student’s t test. All P values were two-sided and corrected for multiple hypothesis testing using the FDR adjustment (46, 47). Correlations in miRNA biomarker z-scores between different cohorts were calculated using Pearson’s correlation coefficient, r.
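A short sketch of the multiple-testing step may help make the selection rule concrete. It assumes the FDR adjustment is the Benjamini-Hochberg step-up procedure, which the text does not name explicitly, and it takes precomputed P values and z-scores as inputs. The default thresholds mirror those used for the Discovery cohort (adjusted P < 0.01 and |z-score| > 1); the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(names, p_values, z_scores, alpha=0.01, z_cutoff=1.0):
    """Keep miRNAs passing both the adjusted-P and the absolute z-score thresholds."""
    adjusted = benjamini_hochberg(p_values)
    return [
        name
        for name, q, z in zip(names, adjusted, z_scores)
        if q < alpha and abs(z) > z_cutoff
    ]
```

For the Verification cohorts, the same function could be called with `z_cutoff=0.4` to reflect the less stringent threshold described in the Results.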

Acknowledgments

This work was supported by Major Science and Technology Project of Zhejiang Province of China Grants 2014C03029 and 2020C03023, Public Welfare Technology Foundation of Zhejiang Province of China Grant 2017C34001, National Natural Science Foundation of China Grant 81972917, Major Science and Technology Project of Medical and Health of Zhejiang Province of China Grant WKJ-ZJ-1902, the Zhejiang high-level innovative talent program, and the 1022 program of Zhejiang Cancer Hospital.

Footnotes

Competing interest statement: R.Z. is the chairman and chief executive officer (CEO) of MIRXES (Hangzhou) Biotechnology Co., Ltd and has ownership interest (including patents) in the same. L.Z. is co-CEO of MIRXES (Hangzhou) Biotechnology Co., Ltd and has ownership interest (including patents) in the same.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2006212117/-/DCSupplemental.

Data Availability.

All study data are included in the article and supporting information.
