Carcinogenesis. 2016 May 27;37(8):751–758. doi: 10.1093/carcin/bgw064

Association of a let-7 miRNA binding region of TGFBR1 with hereditary mismatch repair proficient colorectal cancer (MSS HNPCC)

Rosa M Xicola 1, Sneha Bontu 1, Brian J Doyle 2, Jamie Rawson 2, Pilar Garre 3, Esther Lee 2, Miguel de la Hoya 3, Xavier Bessa 4, Joan Clofent 5, Luis Bujanda 6, Francesc Balaguer 7, Sergi Castellví-Bel 7, Cristina Alenda 8,9, Rodrigo Jover 8,9, Clara Ruiz-Ponte 10, Sapna Syngal 11,12, Montserrat Andreu 4, Angel Carracedo 10, Antoni Castells 7, Polly A Newcomb 13, Noralane Lindor 14, John D Potter 15,16,17, John A Baron 18, Nathan A Ellis 19, Trinidad Caldes 3, Xavier LLor 1,*
PMCID: PMC4967215  PMID: 27234654

Summary

We describe an association between TGFBR1 polymorphism rs868 and mismatch repair proficient hereditary colorectal cancers. The risk allele of this polymorphism favors binding of let-7b-5p miRNA to TGFBR1, which would result in lower expression of TGFBR1 and thus higher colorectal cancer risk.

Abstract

The purpose of this study was to identify novel colorectal cancer (CRC)-causing alleles in unexplained familial CRC cases. To do so, coding regions in five candidate genes (MGMT, AXIN2, CTNNB1, TGFBR1 and TGFBR2) were sequenced in 11 unrelated microsatellite-stable hereditary non-polyposis CRC (MSS HNPCC) cases. Selected genetic variants were genotyped in a discovery set of 27 MSS HNPCC cases and 85 controls. One genetic variant, rs67687202, in TGFBR1 emerged as significant (P = 0.002), and it was genotyped in a replication set of 87 additional MSS HNPCC-like cases and 338 controls, where it was also significantly associated with MSS HNPCC cases (P = 0.041). In the combined genotype data, rs67687202 was associated with a moderate increase in CRC risk (OR = 1.68; 95% CI = 1.13–2.50; P = 0.010). We tested a highly correlated SNP, rs868, in 723 non-familial CRC cases compared with 629 controls, and it was not significantly associated with CRC risk (P = 0.370). rs868 lies in a let-7 miRNA binding site in the 3′UTR of TGFBR1, which might provide a functional basis for the association in MSS HNPCC. In luciferase assays, the risk-associated allele of rs868 was associated with half the luciferase expression of the protective allele in the presence of miRNA let-7b-5p, suggesting stronger let-7b-5p binding and lower TGFBR1 expression. Thus, rs868 is potentially a CRC risk-causing allele. Our results support the concept that rs868 is associated with lower TGFBR1 expression, thereby increasing CRC risk.

Introduction

Among common malignancies, colorectal cancer (CRC) shows one of the highest percentages of familial clustering: 15–20% of cases develop in families with at least one other affected member (1). Individuals in these families have a 2- to 3-fold higher risk of developing CRC (2), and in most cases no associated genetic defect has been identified.

Mutations in several genes have been identified that cause well-defined, high-penetrance CRC syndromes, including familial adenomatous polyposis, MUTYH-associated polyposis, and Lynch syndrome. Lynch syndrome (LS) is an autosomal dominant disorder that accounts for 2–3% of all CRCs. LS is caused by mutations in DNA mismatch repair (MMR) genes, resulting in tumors characterized by microsatellite instability (3).

One way to enrich for cases that carry genetic risk factors is to use family history criteria. Historically, LS was identified using the Amsterdam criteria (4). A family satisfies the Amsterdam criteria when three or more relatives have CRC or another LS-related cancer, one of them being a first-degree relative of the other two, at least two consecutive generations are affected, and at least one cancer was diagnosed before age 50. Approximately half of the families who fulfill the Amsterdam criteria have LS (5). Most LS-mutation-negative Amsterdam families with MMR-proficient tumors lack a genetic explanation. These families are referred to as familial colorectal cancer “type X” (6) or MSS HNPCC (microsatellite-stable hereditary non-polyposis colorectal cancer) (5).
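The Amsterdam criteria amount to a rule check over a family history. The sketch below is a minimal illustration, assuming a hypothetical `AffectedRelative` record and the standard formulation of the criteria (at least three relatives with CRC or another LS-related cancer, one of them a first-degree relative of the other two, at least two consecutive generations affected, at least one diagnosis before age 50). It does not model the exclusion of familial adenomatous polyposis or pathology verification, and it is not part of the study's pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class AffectedRelative:
    """Hypothetical record for one relative with CRC or another LS-related cancer."""
    person_id: str
    generation: int            # 1 = oldest generation in the pedigree, 2 = next, ...
    age_at_diagnosis: int
    first_degree: set[str] = field(default_factory=set)  # ids of affected first-degree relatives


def meets_amsterdam(family: list[AffectedRelative]) -> bool:
    """Simplified Amsterdam check; FAP exclusion and pathology review are not modeled."""
    if len(family) < 3:
        return False
    ids = {r.person_id for r in family}
    # One affected relative must be a first-degree relative of at least two others.
    has_hub = any(len(r.first_degree & (ids - {r.person_id})) >= 2 for r in family)
    # At least two consecutive generations must be affected.
    gens = sorted({r.generation for r in family})
    consecutive = any(b - a == 1 for a, b in zip(gens, gens[1:]))
    # At least one cancer diagnosed before age 50.
    early_onset = any(r.age_at_diagnosis < 50 for r in family)
    return has_hub and consecutive and early_onset


family = [
    AffectedRelative("grandmother", 1, 62, {"mother", "uncle"}),
    AffectedRelative("mother", 2, 48, {"grandmother"}),
    AffectedRelative("uncle", 2, 55, {"grandmother"}),
]
print(meets_amsterdam(family))  # True
```

Representing first-degree links explicitly keeps the "one relative is a first-degree relative of the other two" condition checkable without building a full pedigree graph.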

Unlike LS families, MSS HNPCC families develop mainly left-sided CRCs; affected members are older at diagnosis; fewer family members are affected; and these families generally do not have a higher risk of extra-colonic malignancies (5–7). This phenotype represented about 1.1% of all CRCs in EPICOLON, a multi-institutional cohort of consecutive cases throughout Spain (5). Tumors from these patients show a higher prevalence of codon 12 KRAS mutations than non-familial MMR-proficient tumors (8,9), and they exhibit a lower level of global methylation than any other CRC subtype (9).

More than 30 low-risk genetic polymorphisms that contribute to CRC susceptibility have been identified through genome-wide association studies (GWAS) (10). Most of these variants are markers for genetic risk and are not strong candidates for causality by themselves, but recent efforts have successfully refined the association signals, thus identifying smaller regions of interest that are more likely to contain the risk-causing variants (11). One such variant, rs6983267 in 8q24.21, is contained in a novel long noncoding RNA transcript (lncRNA) named CCAT2. The CCAT2 transcript is highly over-expressed in microsatellite-stable colorectal cancer and promotes tumor growth, metastasis, and chromosomal instability. The single-nucleotide polymorphism (SNP) status seems to affect the levels of CCAT2 expression and the risk-associated guanine base correlates with higher levels of CCAT2 transcript, providing an explanation for the SNP association (12). Intriguingly, rs6983267 is the only one of the ~30 GWAS variants that is more common in patients with a family history of CRC (13).

Recent studies of CRC families have identified some genetic risk factors, though most remain unknown. For instance, we identified two rare variants and one common polymorphism in a base excision repair (BER) gene that are associated with CRC risk in a small subset of these families (14). Palles et al. (15) recently identified germline mutations in the DNA polymerases POLE and POLD1 in a small group of families with an attenuated polyposis phenotype. Several linkage analysis studies have identified two regions (9q22 and 3q21-24) that co-segregate with hereditary MSS CRC. Sequencing of candidate genes in those regions has produced null results (16–21), except for two deleterious variants in GALNT12 in the 9q22.33 region, identified in two unrelated families (22). TGFBR1 is located in the same region and has been associated with CRC in some candidate gene studies (23,24).

The main goal of the present study was to identify novel germline CRC susceptibility factors by analyzing individuals with an MSS HNPCC phenotype, and to extend the study of the alterations found to a broader group of familial and non-familial CRC patients.

Materials and methods

Study sample

This study included subjects of European ancestry recruited through EPICOLON I/II projects, the Colon Cancer Family Registry (CCFR), and the high-risk clinics of Hospital Clinico (Madrid, Spain) and Dana–Farber Cancer Institute (DFCI, Boston, MA). To avoid potential population stratification, individuals with Jewish and Basque ancestry were excluded from the study.

The EPICOLON I/II projects enrolled prospective, multi-center, nationwide cohorts of consecutive patients with newly diagnosed CRC throughout Spain (25). From EPICOLON, we included 7 MSS HNPCC patients (individuals fulfilling the Amsterdam criteria for HNPCC with MMR-proficient tumors), 308 patients with non-familial MSS CRCs (no relatives diagnosed with CRC or any other LS-related cancer) and 423 healthy controls (cancer-free individuals with no family history of CRC) matched by age and sex with the non-familial cases. The mean age at diagnosis of the cases was 71 years. The healthy controls served as the control group for the genetic analyses described below.

The CCFR was established by the National Cancer Institute (26). This registry identified CRCs using population-based and clinic-based strategies. In the present study, we included a total of 87 familial cases with MMR-proficient tumors: 74 were MSS HNPCC and 13 had either a child or a parent diagnosed with CRC, with one of the two diagnosed at the age of 50 or younger. If more than one affected family member was available per family, we only included the youngest to reduce the potential inclusion of sporadic phenocopies.

Additionally, we included 17 MSS HNPCC patients and 11 non-affected family members from Hospital Clinico and 3 MSS HNPCC patients from DFCI.

The Institutional Ethics Committee of each participating institution approved the study, and written informed consent was obtained from all participants.

Candidate gene analysis

We divided familial cases into two sets: (i) the discovery set included 27 unrelated MSS HNPCC patients, of whom 24 were from Spain and 3 were from DFCI (mean age at diagnosis 53 years; 44% men); and (ii) the replication set included 87 familial cases from the CCFR (mean age at diagnosis 48.5 years; 46% men).

We selected five candidate genes that have each been associated with CRC in at least one study (27): MGMT, AXIN2, CTNNB1, TGFBR1 and TGFBR2. To identify genetic variation potentially associated with MSS HNPCC, we sequenced all coding regions of each gene, including exon–intron boundaries and parts of the 5′ and 3′ untranslated regions (UTRs), in germline DNA from 11 MSS HNPCC patients of the discovery set. All genetic variants identified in this manner were then genotyped in the remaining MSS HNPCC cases of the discovery set.

Variants were classified as novel if they had not been reported in the Ensembl.org database. We compared genotype frequencies of SNPs with minor allele frequencies (MAF) > 0.05 between MSS HNPCC cases and the CEU (Utah Residents with Northern and Western European ancestry) population. For variants with a nominal P-value ≤ 0.05, we expanded genotyping to a group of 85 controls and re-assessed genotype frequency differences. Variants without public genotype data in Ensembl.org were genotyped in the 85 controls. When SNPs were in strong linkage disequilibrium (LD) with one another, we selected one of them as a marker for the rest. A Bonferroni correction for multiple comparisons was applied, and only candidate SNPs from the discovery set that remained significant after correction were genotyped in the replication set of 87 CCFR familial cases and 338 controls.
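Keeping one SNP per group of variants in strong LD can be sketched as a greedy grouping over pairwise r² values. This is a minimal illustration only: the `select_markers` helper, the r² cutoff of 0.8 and the r² values in the example are assumptions for demonstration; the text does not state the cutoff used, and the values are not study data.

```python
def select_markers(snps, r2, threshold=0.8):
    """Greedy LD grouping: a SNP joins the first existing group whose marker it
    tags with r2 >= threshold; otherwise it becomes the marker of a new group."""
    groups = {}  # marker SNP -> SNPs it represents (insertion order preserved)
    for snp in snps:
        for marker in groups:
            if r2.get(frozenset((snp, marker)), 0.0) >= threshold:
                groups[marker].append(snp)
                break
        else:  # no existing marker tags this SNP
            groups[snp] = [snp]
    return groups


# Hypothetical pairwise r2 values, for illustration only (unlisted pairs count as 0).
r2 = {
    frozenset(("rs67687202", "rs868")): 0.95,
    frozenset(("rs67687202", "rs334354")): 0.93,
    frozenset(("rs12917", "rs1803965")): 0.90,
    frozenset(("rs2308327", "rs2308321")): 0.88,
}
snps = ["rs67687202", "rs868", "rs334354", "rs12917", "rs1803965", "rs2308327", "rs2308321"]
markers = select_markers(snps, r2)
print(list(markers))  # ['rs67687202', 'rs12917', 'rs2308327']
```

A greedy pass compares each SNP only with existing markers, which is enough when LD sets are small and tight; dedicated tagging tools use more thorough selection.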

Germline DNA was extracted from blood and quantified using standard techniques. Whole genome amplification by isothermal strand displacement was performed on blood DNA samples using the GenomiPhi V2 DNA Amplification Kit (GE Healthcare UK Limited, Little Chalfont, Buckinghamshire, UK). For sequencing, we designed primers using Primer 3 software (http://bioinfo.ut.ee/primer3/). Polymerase chain reaction (PCR) fragments were cycle-sequenced with the ABI Big Dye Terminator Sequencing Kit, v3.1 (Life Technologies, Grand Island, NY). The resulting products were purified with the ABI BigDye Xterminator Purification kit (Life Technologies) and loaded into the 3730 ABI DNA Analyzer (Life Technologies). Sequence chromatograms were inspected for heterozygous substitutions, and the sequences were then aligned with the wild-type sequences using CLC workbench software, version 4.6 (CLC Bio A/S, Aarhus, Denmark) to identify homozygous genotypes. Genotyping of candidate SNPs was also performed by Sanger sequencing.

Genotype data from the EPICOLON project GWAS (28) were used to investigate the association of a candidate SNP with CRC in non-familial cases.

Expression levels of hsa-let-7 family of microRNAs in colon adenocarcinoma and non-involved tissue

We accessed next-generation microRNA (miRNA) sequencing data from The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) for colon adenocarcinoma and non-involved tissues. Tumors with microsatellite instability were excluded, and only tumor tissue data from patients of White European ancestry were included. The analysis was performed using the R package “DESeq2”, following its analytical pipeline, including normalization and exclusion of outliers (29). Normalized counts were used to visualize expression levels of the different let-7 family members in 159 tumors and 8 non-involved tissues.
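DESeq2 normalizes raw counts by per-sample size factors estimated with the median-of-ratios method. The Python sketch below reimplements that idea for illustration only: it is not the DESeq2 package, it skips the outlier handling mentioned above, and the `size_factors` and `normalized_counts` helpers and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts maps sample -> raw counts in a shared gene order."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_ratios = {s: [] for s in samples}
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) == 0:
            continue  # a zero in any sample leaves no finite geometric mean, so skip the gene
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            log_ratios[s].append(math.log(counts[s][g]) - log_geo_mean)
    # Size factor = exp(median log ratio of the sample's counts to the per-gene geometric mean).
    return {s: math.exp(median(r)) for s, r in log_ratios.items()}


def normalized_counts(counts):
    """Divide every raw count by its sample's size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in cs] for s, cs in counts.items()}


toy = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}  # hypothetical; B sequenced about twice as deeply
print(size_factors(toy))
print(normalized_counts(toy))
```

The median makes the size factor robust to a minority of genes that are genuinely differentially expressed between samples.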

Expression levels of TGFBR1 gene in colon adenocarcinoma and healthy tissues

We accessed next-generation RNA-sequencing data from TCGA for colon adenocarcinoma. Tumors with microsatellite instability were excluded, and only tumor tissue data from patients of White European ancestry were included. We also accessed data from the Genotype-Tissue Expression (GTEx) project (http://www.gtexportal.org), a resource for studying human gene expression across multiple tissues and its relationship to genetic variation. Processed genotype data and normalized expression matrices from expression arrays were available for a set of 338 healthy colon tissue samples. As described in the previous section, we used “DESeq2” to analyze differential expression of TGFBR1 stratified by SNP genotype.

Effect of rs868 genetic variant in let-7b binding site on TGFBR1 3′UTR

For the luciferase assays, custom vectors were obtained from Active Motif (Carlsbad, CA). A 145-nucleotide sequence from the TGFBR1 3′UTR containing the predicted let-7 binding site, with flanking DNA, was cloned downstream of the luciferase reporter gene. One vector carried an adenine at rs868 in this 145-nucleotide sequence and another carried a guanine, corresponding to the two alleles of rs868. Three control vectors were used for normalization: (i) a vector with a non-genic sequence cloned downstream of the luciferase reporter, (ii) a vector containing the 3′UTR of beta-actin (ACTB) and (iii) an empty vector with no additional sequence.

To measure the effect of rs868 on gene expression through let-7b regulation, HEK 293 cells were co-transfected with 100 ng of luciferase vector and 50 nM hsa-let-7b-5p miRNA or non-targeting control miRNA (scrambled miRNA) (Active Motif) using 300 ng of DharmaFECT DUO transfection reagent (GE Life Sciences, Pittsburgh, PA) per well. Twenty-four hours after transfection, cells were lysed and 1 μl of luciferase substrate was added per well according to the manufacturer’s instructions (Active Motif). Luciferase activity was measured in relative luminescence units (RLU) with the Synergy 2 Multi-Mode reader (BioTek, Winooski, VT). Each experimental condition was tested in triplicate in each of three independent experiments. RLU values for the two experimental conditions under study (A-rs868 and G-rs868 3′UTR vectors) were normalized to the average RLU of the three control vectors, which was calculated separately for each treatment (let-7b-5p and scrambled miRNA). Normalized values were used to test for differences between conditions and treatments.
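The normalization above divides each A-rs868 and G-rs868 reading by the average of the three control vectors, computed separately for each treatment. A minimal sketch, assuming a hypothetical nested-dictionary layout, hypothetical vector labels and made-up RLU values (not the study's measurements). The control average is taken here as the mean of the three per-vector means, which equals the mean over all control replicates when each vector has the same number of replicates.

```python
from statistics import mean

CONTROLS = ("non_genic", "ACTB_3UTR", "empty")  # hypothetical labels for the three control vectors


def normalize_rlu(rlu, controls=CONTROLS):
    """rlu: treatment -> vector -> replicate RLU values.
    Returns treatment -> test vector -> replicate RLU divided by that treatment's control average."""
    normalized = {}
    for treatment, by_vector in rlu.items():
        # Average of the three control vectors, computed separately for each treatment.
        control_avg = mean(mean(by_vector[c]) for c in controls)
        normalized[treatment] = {
            vector: [value / control_avg for value in values]
            for vector, values in by_vector.items()
            if vector not in controls
        }
    return normalized


# Made-up RLU values for one experiment in triplicate (not study data).
rlu = {
    "let-7b-5p": {"non_genic": [100, 100, 100], "ACTB_3UTR": [100, 100, 100], "empty": [100, 100, 100],
                  "A-rs868": [50, 50, 50], "G-rs868": [100, 100, 100]},
    "scrambled": {"non_genic": [200, 200, 200], "ACTB_3UTR": [200, 200, 200], "empty": [200, 200, 200],
                  "A-rs868": [200, 200, 200], "G-rs868": [200, 200, 200]},
}
norm = normalize_rlu(rlu)
print(norm["let-7b-5p"]["A-rs868"])  # [0.5, 0.5, 0.5]
```

Normalizing within each treatment removes any global effect of the transfected miRNA on reporter output, so A/G differences under let-7b-5p can be compared with those under the scrambled control.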

Algorithm-based prediction tools

We used SIFT (30) and PolyPhen (31) to predict whether amino acid substitutions were likely to affect protein function. Human Splicing Finder (32) was used to predict the effect of intronic or exonic variants on splice sites. miRBase (33), miRanda (34), TarBase (35) and RNAHybrid (36) were used to predict miRNA binding sites and the mRNA targets of miRNAs.

Statistical analysis

Fisher's exact test was used to assess differences in genotype frequencies between cases and controls in the discovery and replication sets. We applied a Bonferroni correction for multiple comparisons based on the number of SNPs identified in cases from the discovery set. Departure from Hardy–Weinberg equilibrium was assessed in controls using a χ² test; all SNPs genotyped in the discovery and replication sets were in Hardy–Weinberg equilibrium (P-value > 0.05). Logistic regression assuming a log-additive genetic model was used to estimate odds ratios (ORs), 95% confidence intervals (CIs) and P-values, with sex and age as covariates. When comparing familial cases and controls, age was not included as a covariate because controls were purposely chosen to be substantially older than hereditary CRC cases, to reduce the likelihood that disease-free controls were actually carriers of risk variants and at risk of CRC. R software (37) and PLINK (38) were used for statistical analysis, and Haploview 4.2 (39) was used to visualize haplotypes. All reported P-values are two-sided. A non-parametric test was used to identify significant differences in normalized relative luciferase values.
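The Hardy–Weinberg check can be illustrated with the usual one-degree-of-freedom χ² goodness-of-fit test. The sketch below assumes this standard form (no continuity correction); it is not a reproduction of the authors' R or PLINK code, and it assumes both alleles are observed so that no expected count is zero. The χ² survival function with one degree of freedom is computed as erfc(√(χ²/2)).

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """One-df chi-square test for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


chi2, p_value = hwe_chisq(192, 127, 19)  # replication controls, Table 1
print(round(chi2, 3), round(p_value, 3))
```

Applied to the replication controls in Table 1, the P-value is well above 0.05, in line with the equilibrium reported above.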

Results

Candidate gene analysis

To identify potentially disease-causing genetic variants, we sequenced the coding sequences and proximal 5′ and 3′ UTRs of MGMT, AXIN2, CTNNB1, TGFBR1 and TGFBR2 in 11 MSS HNPCC cases. We identified 24 variants, of which 5 were previously unrecognized and 19 were reported in public databases (Supplementary Table 1, available at Carcinogenesis Online). Of the 24 variants, 17 were common SNPs (MAF > 5%), 2 were low-frequency SNPs (1% < MAF < 5%) and 5 were rare variants (MAF ≤ 1%). Four of the 24 variants were synonymous, 4 were missense, 10 were intronic and 4 were located in UTRs; the remaining two were insertion/deletion variants, one exonic and one intronic. The low-frequency intronic SNPs rs117258157 and rs116931989 in AXIN2 were present in the same individual and absent from the other affected family member available for study. Among the five rare variants, two were intronic (AXIN2 c.1060-41T>C and MGMT c.414+15C>T), two were 3′UTR variants (AXIN2 c.*309C>A and c.*489C>T) and one was synonymous (CTNNB1 L781L). The CTNNB1 and MGMT variants were each present in only one of the two affected cases available for study in the family, suggesting that they do not co-segregate with disease. None of the five rare variants was predicted by the prediction tools to alter protein function, splicing or miRNA binding sites. Consequently, we did not analyze these variants further.
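The MAF classes used above can be written as a small helper, together with a MAF estimate from genotype counts. This is a sketch: the helper names are hypothetical, and because the text leaves a MAF of exactly 5% unassigned, placing that value in the low-frequency class is an assumption.

```python
def maf_from_genotypes(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency from biallelic genotype counts."""
    alt = (n_het + 2 * n_hom_alt) / (2 * (n_hom_ref + n_het + n_hom_alt))
    return min(alt, 1 - alt)


def maf_class(maf):
    """Classes used in the text; MAF == 0.05 is assigned to 'low-frequency' (an assumption)."""
    if not 0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf > 0.05:
        return "common"
    if maf > 0.01:
        return "low-frequency"
    return "rare"


print(maf_class(maf_from_genotypes(192, 127, 19)))  # common
```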

In the original 11 sequenced MSS HNPCC cases, we identified several sets of SNPs in strong LD, namely rs67687202–rs868–rs334354 in TGFBR1, and rs12917–rs1803965 and rs2308327–rs2308321 in MGMT. These LD relationships were also observed in HapMap data. Consequently, we selected one SNP from each set as a marker for the others. A Bonferroni correction for multiple comparisons was applied based on the 19 SNPs reported in public databases, setting the significance threshold at approximately P = 0.003 (0.05/19 ≈ 0.0026). After genotyping the 27 MSS HNPCC cases and 85 healthy controls in the discovery phase, only one SNP, rs67687202 in TGFBR1, passed this threshold (Fisher's exact P = 0.002). Thus, rs67687202 was the only variant tested in the final replication phase (Table 1). Notably, it is the major (ancestral) allele of rs67687202 that is over-represented in MSS HNPCC cases compared with controls.
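The Bonferroni threshold is the nominal α divided by the number of tests. A minimal sketch of that arithmetic, using the 19 database-reported SNPs and the discovery-set P-value for rs67687202 given above; the helper name is hypothetical. The exact value 0.05/19 ≈ 0.00263 is slightly stricter than the rounded 0.003, and P = 0.002 passes either way.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 19)   # 19 database-reported SNPs
discovery_p = {"rs67687202": 0.002}          # Fisher's exact P from the discovery set
passing = [snp for snp, p in discovery_p.items() if p <= threshold]
print(f"{threshold:.5f}", passing)  # 0.00263 ['rs67687202']
```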

Table 1.

Genotype distribution for rs67687202 in the hereditary and non-familial CRC cases

Discovery TCTTT/TCTTT TCTTT/− −/− P
Controls 54.1% (46) 43.5% (37) 2.4% (2) 0.002
Hereditary 88.8% (24) 11.2% (3) 0%
Replication TCTTT/TCTTT TCTTT/− −/− P
Controls 56.8% (192) 37.6% (127) 5.6% (19) 0.041
Hereditary (CCFR series) 63.2% (55) 36.8% (32) 0%
Non-familial TCTTT/TCTTT TCTTT/− −/− P
Controls 56.3% (238) 38.8% (164) 4.9% (21) 0.370
Non-familial CRC 61.4% (189) 33.8% (104) 4.8% (15)

rs67687202, rs868 and rs334354 in TGFBR1 are in LD

In the original samples sequenced, rs67687202 was in strong LD with rs334354, located in the intron between exons 7 and 8, and with rs868, located in the 3′UTR of TGFBR1. We genotyped rs334354 and rs868 in all MSS HNPCC cases and in a set of 154 controls and 95 non-familial colorectal cancer cases. Using Haploview 4.2 (39), we calculated LD values and estimated haplotype frequencies in the three groups. As initially observed, rs334354 and rs868 were in strong LD in the analyzed controls and non-familial CRC cases (Supplementary Figure 1, available at Carcinogenesis Online). Data from 1000 Genomes show a MAF of 0.20 for rs868 and 0.21 for rs334354 in the European population. Moreover, based on analysis with Tagger in Haploview (39), genotyping rs334354 and rs868 captures the variation of 17 other SNPs in TGFBR1 with r² > 0.8. Therefore, rs67687202, rs868 and rs334354 belong to a much larger LD block together with other SNPs in the region (Supplementary Figure 2, available at Carcinogenesis Online).
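LD between two biallelic SNPs is usually summarized by D′ and r², both derived from the disequilibrium coefficient D = f(AB) − f(A)f(B). Haploview estimates haplotype frequencies from unphased genotypes by expectation–maximization; the sketch below skips that step and assumes haplotype frequencies are already known. The `ld_stats` helper and the example frequencies are hypothetical and do not come from the study; both alleles must be present at each locus to avoid division by zero.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """r^2 and |D'| from the four two-locus haplotype frequencies (must sum to 1)."""
    if abs(f_AB + f_Ab + f_aB + f_ab - 1) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab                 # allele A frequency at locus 1
    p_B = f_AB + f_aB                 # allele B frequency at locus 2
    d = f_AB - p_A * p_B              # disequilibrium coefficient D
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r2, abs(d) / d_max


r2, d_prime = ld_stats(0.70, 0.05, 0.05, 0.20)  # hypothetical haplotype frequencies
print(round(r2, 3), round(d_prime, 3))  # 0.538 0.733
```

The two measures answer different questions: D′ near 1 indicates little historical recombination, whereas r² near 1 means one SNP can stand in for the other in association testing, which is why tagging uses r².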

Association of TGFBR1 variants with MMR-proficient hereditary colorectal cancers

To test whether the rs67687202 association replicated, we genotyped rs67687202 in the replication set of familial cases from the CCFR series, where it was also significantly associated with MSS HNPCC (Fisher's exact P = 0.041) (Table 1). In the combined genotype data from the discovery and replication sets (all 114 MMR-proficient familial cases and all controls), carriers of the major (ancestral) allele showed a statistically significant increase in CRC risk (OR = 1.68; 95% CI = 1.13–2.50; P = 0.010). Thus, MMR-proficient hereditary cases showed a significantly higher frequency of an ancestral haplotype than controls.
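The OR above was estimated by logistic regression under a log-additive model. As a simpler illustration of what an OR for the major allele expresses, the sketch below computes a crude allelic OR with a Woolf (log-based) 95% CI from genotype counts. It is not the authors' model and will generally not reproduce the regression estimate; the `allelic_or` helper and the genotype counts in the example are hypothetical.

```python
import math


def allelic_or(case_genotypes, control_genotypes, z=1.959964):
    """Crude major-allele OR with a Woolf 95% CI.
    Genotypes are given as (hom_major, het, hom_minor) counts."""
    def allele_counts(hom_major, het, hom_minor):
        return 2 * hom_major + het, het + 2 * hom_minor   # (major, minor)

    a, b = allele_counts(*case_genotypes)      # cases: major, minor
    c, d = allele_counts(*control_genotypes)   # controls: major, minor
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical genotype counts, for illustration only.
print(allelic_or((60, 30, 10), (30, 40, 30)))
```

Counting alleles rather than individuals assumes each copy contributes independently, which mirrors the log-additive assumption but ignores covariates.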

To determine whether rs67687202 is associated with non-familial CRC, we genotyped rs67687202 in 308 non-familial CRC cases and compared its frequency with that in 338 controls, both groups drawn from a subset of individuals in the EPICOLON cohort. We did not identify an association of rs67687202 with non-familial CRC (OR = 1.16; 95% CI = 0.90–1.49; P = 0.250) (Table 1). To confirm these findings, we accessed GWAS data from the entire EPICOLON cohort (28) and analyzed the genotype data available for rs868 in 723 non-familial CRC cases and 629 controls. Again, we did not identify a significant association (OR = 1.04; 95% CI = 0.87–1.26; P = 0.65). Because previous work had found that rs11466445 (TGFBR1*6A) in exon 1 of TGFBR1 is associated with CRC, we evaluated whether this variant was linked to rs67687202. None of the patients in the discovery set carrying the alternate allele of rs67687202 also carried the 6A allele of rs11466445. Moreover, the frequency of the TGFBR1*6A allele in the discovery set did not differ significantly from that in controls (MAF = 0.09 versus MAF = 0.109, respectively) (40). These data suggest that there is no relationship between the rs11466445 and rs67687202 haplotypes in MSS HNPCC patients.

Interaction of miRNA let-7 and TGFBR1 in colorectal cancer

miRNAs promote translational repression or mRNA degradation by binding to complementary sequences in their target mRNA transcripts. Based on prediction algorithms, rs868 in the 3′UTR of TGFBR1 lies within a binding site for the let-7 family of miRNAs (Figure 1) (34). Based on sequence analysis of this let-7 binding site, an adenine at rs868 would give the TGFBR1 3′UTR better complementarity to all let-7 family members than the alternative guanine. We reasoned that increased complementarity would translate into stronger let-7 binding and decreased TGFBR1 expression. We therefore quantified the expression of let-7 family members in 159 sporadic CRC cases and 8 non-involved colon tissues from CRC patients using TCGA miRNA-seq data. Combining data from all let-7 miRNAs, the let-7 family showed higher expression in tumors than in non-involved tissues, although the difference did not reach significance, possibly because of the small number of available non-involved tissue samples (Figure 2A). let-7a, let-7b and let-7f were the most highly expressed members in colon cancers (Figure 2B).
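The complementarity argument can be illustrated by pairing the let-7 seed (nucleotides 2–8, GAGGUAG, shared by the let-7 family) against a target site: an adenine in the mRNA forms a Watson–Crick A:U pair with a uridine in the miRNA, whereas a guanine at the same position can only form a weaker G:U wobble. The sketch below uses a hypothetical canonical 7-mer site and places the A/G change opposite seed position 6 purely for illustration; the actual TGFBR1 sequence and the exact position of rs868 within the site are not given here, and the 2/1/0 scoring is a toy scheme, not the energy models used by miRanda or RNAhybrid.

```python
# Toy pairing scores (an assumption for illustration): Watson-Crick = 2, G:U wobble = 1, mismatch = 0.
PAIR_SCORE = {("A", "U"): 2, ("U", "A"): 2, ("G", "C"): 2, ("C", "G"): 2,
              ("G", "U"): 1, ("U", "G"): 1}

LET7_SEED = "GAGGUAG"  # let-7 nucleotides 2-8, written 5'->3'


def seed_pair_scores(seed, site):
    """Score each seed position against a target site written 5'->3'.
    The site is reversed so it runs 3'->5' and aligns antiparallel with the seed."""
    antiparallel = site[::-1]
    if len(antiparallel) != len(seed):
        raise ValueError("site and seed must have the same length")
    return [PAIR_SCORE.get((m, t), 0) for m, t in zip(seed, antiparallel)]


site_a = "CUACCUC"  # hypothetical perfect 7-mer site (adenine allele)
site_g = "CUGCCUC"  # same site with a hypothetical A->G change opposite seed position 6
print(sum(seed_pair_scores(LET7_SEED, site_a)), sum(seed_pair_scores(LET7_SEED, site_g)))  # 14 13
```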

Figure 1.

Complementary binding between hsa-let-7 and TGFBR1 with rs868. Graphical representation of the complementary binding nucleotides between TGFBR1 3′UTR sequence and let-7 with detailed position of rs868 in TGFBR1 (www.microRNA.org).

Figure 2.

Expression levels of let-7 miRNA family members in colon adenocarcinoma and non-involved colon tissues. (A) Bars represent the mean overall expression and standard error of the aggregated let-7 family members from miRNA-seq data in non-involved and tumor colon tissues. (B) Bars represent the mean normalized counts and standard error from miRNA-seq data for the different members of the let-7 family in colon adenocarcinoma tissues.

To determine whether let-7 binding differs between the adenine and guanine alleles at rs868, we performed cell-based functional assays. We compared luciferase expression in HEK 293 cells co-transfected with let-7b (or a scrambled control miRNA) and a luciferase construct containing one of the two rs868 alleles. After normalization, in the presence of let-7b, luciferase expression from the rs868 adenine-containing construct was approximately half that from the rs868 guanine-containing construct (P = 0.008), whereas in the presence of scrambled control miRNA, luciferase expression did not differ significantly (Figure 3). Thus, the data suggest that TGFBR1 expression would be higher from the guanine allele of rs868 than from the adenine allele.
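The text does not name the non-parametric test used for the luciferase comparison; the Mann–Whitney U (Wilcoxon rank-sum) test is one common choice and is assumed here. The sketch computes an exact two-sided P-value by enumerating every possible assignment of ranks to the first group, which is feasible for replicate numbers of this size. It assumes no tied values, and the example values are hypothetical normalized RLU, not the study's data.

```python
from itertools import combinations


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test by full enumeration (no ties, small samples)."""
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n1, n2 = len(x), len(y)

    def u_from_rank_sum(rank_sum):
        return rank_sum - n1 * (n1 + 1) / 2

    u_obs = u_from_rank_sum(sum(rank[v] for v in x))
    center = n1 * n2 / 2  # expected U under the null hypothesis
    extreme = total = 0
    # Under the null, every choice of n1 ranks for group x is equally likely.
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if abs(u_from_rank_sum(sum(ranks)) - center) >= abs(u_obs - center) - 1e-12:
            extreme += 1
    return u_obs, extreme / total


a_rs868 = [0.40, 0.50, 0.45]   # hypothetical normalized RLU, A allele + let-7b-5p
g_rs868 = [0.90, 1.00, 0.95]   # hypothetical normalized RLU, G allele + let-7b-5p
print(*mann_whitney_exact(a_rs868, g_rs868))  # 0.0 0.1
```

With only three values per group, even complete separation gives the smallest attainable exact two-sided P-value, 2/20 = 0.1.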

Figure 3.

Luciferase luminescence according to rs868 genotype in the let-7b-5p binding site of the TGFBR1 3′UTR. Bars represent the mean normalized relative luminescence units and standard error from luciferase reporter assays. HEK 293 cells were transfected with a plasmid containing the sequence for A-rs868 or G-rs868 and treated with let-7b-5p or the non-targeting control miRNA (scrambled). P-values for significant differences between conditions and treatments are shown.

We then analyzed TGFBR1 expression by rs868 genotype in human tumor and healthy tissues. For tumor tissue, we used data available from TCGA. CRCs showed the expected lower expression of TGFBR1 for the major ancestral genotype (AA), which is the genotype associated with increased cancer risk (Figure 4A): the AA genotype showed two-thirds the expression of the GG genotype. However, possibly because of the small number of individuals with the GG genotype, the difference did not reach statistical significance. For healthy colon tissues, we used the GTEx project. We identified data for the SNP rs10283455, which is perfectly correlated with rs868, and analyzed TGFBR1 expression. In healthy tissue, the major ancestral genotype of rs10283455 was not associated with lower TGFBR1 expression (Figure 4B).

Figure 4.

TGFBR1 expression levels in colon adenocarcinoma and healthy colon tissues stratified by SNP genotype. (A) Box plot represents normalized counts and TGFBR1 mean expression level for each rs868 genotype (GG = 3; AA = 88 samples) including tumor tissues from the TCGA. (B) Box plot represents normalized counts and TGFBR1 mean expression level for each rs10283455 genotype (GG = 16; AA = 182 samples) including healthy tissues from GTEx.

Discussion

Since the discovery of the MMR genes and the base excision repair gene MUTYH as responsible for autosomal dominant (MMR) and autosomal recessive (MUTYH) CRC syndromes, respectively, the finding of new genetically determined CRC syndromes has been limited to a few scattered families. Recent GWAS have identified more than 30 low-risk alleles associated with CRC; however, these variants explain only a small portion of the remaining genetic variance in CRC and only rs6983267 has been found in excess in MSS HNPCC cases compared with controls.

Here we found an association of a TGFBR1 haplotype (rs334354, rs67687202 and rs868) with MSS HNPCC cases. The association was statistically significant in both the discovery and replication sets. We did not identify an association with non-familial CRC cases. Försti et al. (41) studied a similar number of non-familial cases from two prospective population-based cohorts in Sweden and found a statistically significant association of rs334354 in TGFBR1 with CRC, with an increased OR for the ancestral allele. On the other hand, SNPs in TGFBR1 have not been identified as risk factors for non-familial CRC in published GWAS. A prior study did not identify a significant association of rs334354 with familial CRC; however, only 17 of the 262 patients studied met the Amsterdam criteria for MSS HNPCC (42). These data suggest that the rs334354–rs868 haplotype is associated with higher CRC risk in the hereditary setting but not in the sporadic setting, and that the haplotype may act as a modifier of risk. Further genotype analyses in both hereditary and sporadic CRC case–control series are needed to better quantify the effect of the TGFBR1 haplotype on CRC development.

We asked whether the haplotype we describe is in LD with rs11466445, a polymorphism referred to as TGFBR1*6A that has been associated with different types of cancer (23,43). rs11466445 was reported to be more frequent in MSS HNPCC individuals than in Lynch syndrome patients (44). In the present study, rs11466445 was not in LD with the risk-associated TGFBR1 haplotype, nor was it associated with MSS HNPCC, consistent with reports from studies of CRC series enriched for hereditary cases (42,45). Consistent with our findings, Försti et al. did not find an association of rs11466445 with non-familial CRC (41).

We analyzed the region containing the risk haplotype in search of a potential biological explanation. Two mechanisms seemed plausible. First, a founder mutation present in MMR-proficient hereditary CRC cases could lie on the risk haplotype and be absent from non-familial cases. Three linkage studies (17,19,21) have identified the 9q22 region as co-segregating with hereditary CRC. 9q22 is a chromosomal band of approximately 25 Mb, and TGFBR1 is located at its distal end. Two of those linkage studies identified the strongest linkage signal specifically in families with individuals affected at young ages (17,21). The third found linkage in individuals with MMR-proficient CRC tumors in a large family with carriers of an MSH2 gene mutation (19). Interestingly, the narrowing of a previously described linkage signal on chromosome 9q (17) did not include TGFBR1 (46); however, accurate localization of linkage signals is challenging for moderate-penetrance risk factors. We note that the GALNT12 gene, in which putatively disease-causing mutations have been identified in several families, is located 2.5 Mb proximal to TGFBR1. The impact of the association identified here on the linkage signals is yet to be determined.

The second mechanism, more plausible in light of the in vivo and in vitro data shown here, could involve gene regulation by miRNAs. We have shown that an adenine at rs868 of TGFBR1 is associated with lower luciferase expression than a guanine after transfection of let-7b. This is compatible with tighter binding of let-7 to TGFBR1 when rs868 carries adenine, resulting in lower TGFBR1 expression, consistent with the trend we observed in CRCs. Our interpretation of these data is that the guanine base disrupts binding of one of the most abundant let-7 miRNAs in colon tumors. The consequence of weaker binding should be higher levels of TGFBR1 and protection from colorectal carcinogenesis. Indeed, TGFBR1 has also been identified as a let-7 target in liver, where over-expression of let-7c inhibits the expression of TGFBR1 mRNA and protein, as well as the expression of a luciferase reporter gene fused to the TGFBR1 3′UTR (47). From this point of view, the guanine allele of rs868 protects against CRC.

The regulation of TGFBR1 by let-7 miRNAs (34,48) could account, at least in part, for the previously reported allele-specific expression (ASE) of TGFBR1 (24). This ASE reduces expression of one TGFBR1 allele, is dominantly inherited, segregates in families and also occurs in non-familial CRC cases. Two major TGFBR1 haplotypes predominate among ASE cases, which suggests ancestral mutations, but causative germline changes have not been identified. Other investigators, however, have not been able to replicate these findings (45). ASE was associated with differences in haplotype frequency distribution across a genomic region extending from the 3′ end of intron 3 to ~5 kb downstream of the 3′UTR, and the strongest association was observed in the region containing rs868 (24). The effect of rs868 on miRNA binding could therefore be central to pathogenesis. Unfortunately, we were not able to assess ASE in our series.

A similar mechanism has been proposed for the KRAS LCS6 variant, which lies in a let-7 complementary site in the KRAS 3′UTR and is significantly associated with an increased risk of non-small cell lung cancer among smokers (49).

Finally, the non-reference allele of rs334348, located in the 3′UTR of TGFBR1, has been reported to alter the modulation of TGFBR1 protein levels by miR-628-5p. This SNP lies 815 bp from rs868 in a region of high LD and has been associated with familial breast cancer (50).

In summary, our study supports the hypothesis that TGFBR1, located at 9q22.33, is a modifying factor for CRC development, particularly in patients with MSS HNPCC. We propose a potential biological mechanism by which allele-dependent differences in miRNA binding alter TGFBR1 expression and thereby promote colorectal carcinogenesis.

Supplementary material

Supplementary Table 1 and Figures 1 and 2 can be found at http://carcin.oxfordjournals.org/

Funding

This work was supported in part by the Sirazi Foundation; an Area of Excellence Award from the University of Illinois at Chicago, Office of the Vice Chancellor for Research; and the Department of Medicine and Cancer Center of the University of Illinois (X.L.). T.C., P.G. and M.d.H. were supported by PI13/02588 and RD12/0036/006 from ISCIII (Spain) and the European Regional Development FEDER funds. The Colon Cancer Family Registry (CCFR) was supported by grant UM1 CA167551 from the National Cancer Institute. The following CCFR centers provided samples and data for this study: Australasian Colorectal Cancer Family Registry (U01/U24 CA097735), USC Consortium Colorectal Cancer Family Registry (U01/U24 CA074799), Mayo Clinic Cooperative Family Registry for Colon Cancer Studies (U01/U24 CA074800), Ontario Registry for Studies of Familial Colorectal Cancer (U01/U24 CA074783), Seattle Colorectal Cancer Family Registry (U01/U24 CA074794) and University of Hawaii Colorectal Cancer Family Registry (U01/U24 CA074806).

The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the CCFR, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the CCFR. The study sponsors had no role in the design of the study; no role in the collection, analysis or interpretation of the data; no role in the writing of the manuscript and no role in the decision to submit the manuscript for publication.

Acknowledgements

R.M.X. and X.L. designed and supervised the overall project, analyzed and interpreted the data, and drafted the manuscript. They take full responsibility for the integrity of the data and the accuracy of the data analysis. T.C. and N.A.E. provided critical analysis of the results and helped revise the manuscript for important intellectual content. R.M.X., S.B., B.J.D., J.R., E.L., M.d.H. and P.G. processed, prepared and analyzed the samples. X.B., J.C., L.B., F.B., S.C.-B., R.J., C.R.-P., S.S., M.A., A. Carracedo and A. Castells supervised different aspects of the recruitment and data acquisition and revised the manuscript. C.A. reviewed and selected all pathology specimens. R.M.X. performed the statistical analysis.

Conflict of Interest Statement: None declared.

Glossary

Abbreviations

ASE

allele-specific expression

BER

base excision repair

CEU

Utah Residents with Northern and Western European ancestry

CRC

colorectal cancer

HNPCC

hereditary non-polyposis colorectal cancer

LD

linkage disequilibrium

LS

Lynch syndrome

MAF

minor allele frequency

MMR

mismatch repair system

MSS HNPCC

microsatellite-stable hereditary non-polyposis colorectal cancer

SNP

single-nucleotide polymorphism

References

  • 1. Pinol V., et al. (2004) Frequency of hereditary non-polyposis colorectal cancer and other colorectal cancer familial forms in Spain: a multicentre, prospective, nationwide study. Eur. J. Gastroenterol. Hepatol., 16, 39–45.
  • 2. Johns L.E., et al. (2001) A systematic review and meta-analysis of familial colorectal cancer risk. Am. J. Gastroenterol., 96, 2992–3003.
  • 3. Xicola R.M., et al. (2009) Hereditary colorectal cancer. In Kim K. (ed) Colorectal Cancer: Early Detection and Prevention. Slack, Thorofare, New Jersey, pp. 149–166.
  • 4. Vasen H.F., et al. (1999) New clinical criteria for hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the International Collaborative group on HNPCC. Gastroenterology, 116, 1453–1456.
  • 5. Llor X., et al. (2005) Differential features of colorectal cancers fulfilling Amsterdam criteria without involvement of the mutator pathway. Clin. Cancer Res., 11, 7304–7310.
  • 6. Lindor N.M., et al. (2005) Lower cancer incidence in Amsterdam-I criteria families without mismatch repair deficiency: familial colorectal cancer type X. JAMA, 293, 1979–1985.
  • 7. Koh P.K., et al. (2011) Familial colorectal cancer type X: polyp burden and cancer risk stratification via a family history score. ANZ J. Surg., 81, 537–542.
  • 8. Sanchez-de-Abajo A., et al. (2007) Molecular analysis of colorectal cancer tumors from patients with mismatch repair proficient hereditary nonpolyposis colorectal cancer suggests novel carcinogenic pathways. Clin. Cancer Res., 13, 5729–5735.
  • 9. Goel A., et al. (2010) Aberrant DNA methylation in hereditary nonpolyposis colorectal cancer without mismatch repair deficiency. Gastroenterology, 138, 1854–1862.
  • 10. Esteban-Jurado C., et al. (2014) New genes emerging for colorectal cancer predisposition. World J. Gastroenterol., 20, 1961–1971.
  • 11. Whiffin N., et al. (2013) Deciphering the genetic architecture of low-penetrance susceptibility to colorectal cancer. Hum. Mol. Genet., 22, 5075–5082.
  • 12. Ling H., et al. (2013) CCAT2, a novel noncoding RNA mapping to 8q24, underlies metastatic progression and chromosomal instability in colon cancer. Genome Res., 23, 1446–1461.
  • 13. Abuli A., et al. (2010) Susceptibility genetic variants associated with colorectal cancer risk correlate with cancer phenotype. Gastroenterology, 139, 788–796.
  • 14. Garre P., et al. (2011) Analysis of the oxidative damage repair genes NUDT1, OGG1, and MUTYH in patients from mismatch repair proficient HNPCC families (MSS-HNPCC). Clin. Cancer Res., 17, 1701–1712.
  • 15. Palles C., et al. (2013) Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet., 45, 136–144.
  • 16. Papaemmanuil E., et al. (2008) Deciphering the genetics of hereditary non-syndromic colorectal cancer. Eur. J. Hum. Genet., 16, 1477–1486.
  • 17. Wiesner G.L., et al. (2003) A subset of familial colorectal neoplasia kindreds linked to chromosome 9q22.2–31.2. Proc. Natl Acad. Sci. USA, 100, 12961–12965.
  • 18. Picelli S., et al. (2008) Genome-wide linkage scan for colorectal cancer susceptibility genes supports linkage to chromosome 3q. BMC Cancer, 8, 87.
  • 19. Skoglund J., et al. (2006) Linkage analysis in a large Swedish family supports the presence of a susceptibility locus for adenoma and colorectal cancer on chromosome 9q22.32-31.1. J. Med. Genet., 43, e7.
  • 20. Kemp Z., et al. (2006) Evidence for a colorectal cancer susceptibility locus on chromosome 3q21-q24 from a high-density SNP genome-wide linkage scan. Hum. Mol. Genet., 15, 2903–2910.
  • 21. Kemp Z.E., et al. (2006) Evidence of linkage to chromosome 9q22.33 in colorectal cancer kindreds from the United Kingdom. Cancer Res., 66, 5003–5006.
  • 22. Clarke E., et al. (2012) Inherited deleterious variants in GALNT12 are associated with CRC susceptibility. Hum. Mutat., 33, 1056–1058.
  • 23. Pasche B., et al. (1998) Type I transforming growth factor beta receptor maps to 9q22 and exhibits a polymorphism and a rare variant within a polyalanine tract. Cancer Res., 58, 2727–2732.
  • 24. Valle L., et al. (2008) Germline allele-specific expression of TGFBR1 confers an increased risk of colorectal cancer. Science, 321, 1361–1365.
  • 25. Pérez-Carbonell L., et al. (2012) Comparison between universal molecular screening for Lynch syndrome and revised Bethesda guidelines in a large population-based cohort of patients with colorectal cancer. Gut, 61, 865–872.
  • 26. Newcomb P.A., et al. (2007) Colon Cancer Family Registry: an international resource for studies of the genetic epidemiology of colon cancer. Cancer Epidemiol. Biomarkers Prev., 16, 2331–2343.
  • 27. Cancer Genome Atlas Network. (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487, 330–337.
  • 28. Fernandez-Rozadilla C., et al. (2014) A genome-wide association study on copy-number variation identifies a 11q11 loss as a candidate susceptibility variant for colorectal cancer. Hum. Genet., 133, 525–534.
  • 29. Love M.I., et al. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15, 550.
  • 30. Ng P.C., et al. (2001) Predicting deleterious amino acid substitutions. Genome Res., 11, 863–874.
  • 31. Adzhubei I.A., et al. (2010) A method and server for predicting damaging missense mutations. Nat. Methods, 7, 248–249.
  • 32. Desmet F.O., et al. (2009) Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res., 37, e67.
  • 33. Kozomara A., et al. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157.
  • 34. John B., et al. (2004) Human MicroRNA targets. PLoS Biol., 2, e363.
  • 35. Vergoulis T., et al. (2012) TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucleic Acids Res., 40, D222–D229.
  • 36. Kruger J., et al. (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res., 34, W451–W454.
  • 37. R Core Team. (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • 38. Purcell S., et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81, 559–575.
  • 39. Barrett J.C., et al. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263–265.
  • 40. Abuli A., et al. (2011) A two-phase case-control study for colorectal cancer genetic susceptibility: candidate genes from chromosomal regions 9q22 and 3q22. Br. J. Cancer, 105, 870–875.
  • 41. Försti A., et al. (2010) Polymorphisms in the transforming growth factor beta 1 pathway in relation to colorectal cancer progression. Genes Chromosomes Cancer, 49, 270–281.
  • 42. Skoglund Lundin J., et al. (2009) TGFBR1 variants TGFBR1*6A and Int7G24A are not associated with an increased familial colorectal cancer risk. Br. J. Cancer, 100, 1674–1679.
  • 43. Valle L. (2012) Debate about TGFBR1 and the susceptibility to colorectal cancer. World J. Gastrointest. Oncol., 4, 1–8.
  • 44. Bian Y., et al. (2005) TGFBR1*6A may contribute to hereditary colorectal cancer. J. Clin. Oncol., 23, 3074–3078.
  • 45. Carvajal-Carmona L.G., et al. (2010) Comprehensive assessment of variation at the transforming growth factor beta type 1 receptor locus and colorectal cancer predisposition. Proc. Natl Acad. Sci. USA, 107, 7858–7862.
  • 46. Gray-McGuire C., et al. (2010) Confirmation of linkage to and localization of familial colon cancer risk haplotype on chromosome 9q22. Cancer Res., 70, 5409–5418.
  • 47. Tzur G., et al. (2009) Comprehensive gene and microRNA expression profiling reveals a role for microRNAs in human liver development. PLoS One, 4, e7511.
  • 48. Issabekova A., et al. (2012) Interactions of intergenic microRNAs with mRNAs of genes involved in carcinogenesis. Bioinformation, 8, 513–518.
  • 49. Chin L.J., et al. (2008) A SNP in a let-7 microRNA complementary site in the KRAS 3′ untranslated region increases non-small cell lung cancer risk. Cancer Res., 68, 8535–8540.
  • 50. Nicoloso M.S., et al. (2010) Single-nucleotide polymorphisms inside microRNA target sites influence tumor susceptibility. Cancer Res., 70, 2789–2798.

Articles from Carcinogenesis are provided here courtesy of Oxford University Press