Nucleic Acids Research. 2003 Apr 1;31(7):e33. doi: 10.1093/nar/gng033

Amplification of repeat-containing transcribed sequences (ARTS): a transcriptome fingerprinting strategy to detect functionally relevant microsatellite mutations in cancer

Martina Olivero, Tina Ruggiero, Nadia Coltella, Antonella Maffe’, Raffaele Calogero, Enzo Medico, Maria Flavia Di Renzo
PMCID: PMC152818  PMID: 12655021

Abstract

Cancer is a genetic disease caused by mutations in somatic cells. Those that carry advantageous mutations are favoured by natural selection. In most cancers, genetic instability increases mutation rate and facilitates cancer cell evolution. Microsatellite instability (MSI), due to defects of the DNA mismatch repair system, affects in particular repeat sequences (microsatellites) scattered throughout the genome. As mutations in expressed genes are more likely to be functional, we developed a procedure for the systematic identification of mutant repeat-containing expressed sequences (amplification of repeat-containing transcribed sequences, ARTS). The entire cell mRNA was converted into short double-stranded cDNA fragments linked to an adapter at both ends. Repeat-containing cDNA fragments were PCR amplified using the adapter-specific primer in combination with different arbitrary primers including the repeat. ARTS yielded discrete PCR products with lengths that were directly correlated to the lengths of the endogenous repeats. Comparison between ARTS products obtained from control cells and cancer cells with microsatellite instability (MSI+) revealed mRNAs carrying insertions or deletions at repeats. The subsequent sequencing allowed the identification of a series of frameshift-mutated mRNAs in MSI+ cancer cells, including the already described mutant BAX transcript. These data show that ARTS provides an unbiased genome-wide approach to the discovery of functionally relevant genes that could be affected by MSI in cancer.

INTRODUCTION

Cancer initiates as a result of mutation in a single gene in a progenitor cell. Subsequent mutations in the offspring of this cell and waves of clonal expansion give rise to daughter cells showing a growth advantage (reviewed in 1). The initiators and the targets of this multi-step progression are activated oncogenes and inactivated tumour suppressor genes, of which altered caretaker genes increase variants for the forces of selection to act upon. In tumours an abnormally high level of genetic instability might cause alterations to accumulate throughout the many tumour cell divisions. There are several completely distinct forms of genetic instability (reviewed in 2). One of the best understood is that arising from inactivation of DNA mismatch repair genes that gives rise to instability at the nucleotide level, because naturally occurring replication errors cannot be repaired effectively (reviewed in 3). This instability is easier to observe at short DNA sequences scattered throughout the genome called microsatellites and is thus known as microsatellite instability (MSI).

MSI is the hallmark of tumours arising in hereditary non-polyposis colorectal cancer syndrome (4), one of the commonest hereditary cancer syndromes, and it is also observed in 10–15% of sporadic colorectal (5), gastric (6) and endometrial (7) cancers. The base level of genetic instability characterising these tumours makes it difficult to establish which genetic alteration plays a role in tumourigenesis. The definition of ‘real’ target genes for MSI involves studying their mutation frequency in tumours and the effect of mutation on the expression and activity of the encoded proteins (reviewed in 8,9). Mutations due to MSI were originally described mainly in non-coding sequences (10). However, using the candidate gene approach, i.e. studying sequences relevant to critical cellular pathways, approximately 30 genes were found mutated within their coding sequences (CDS) due to mismatch repair defects (reviewed in 9,11). More recently, the huge amount of sequence information contained in public databases enabled genome scanning for new candidate target genes. In MSI+ tumours, the slippage mistakes made by DNA polymerase at repeat sequences are not effectively repaired due to defects in DNA mismatch repair. Tracts of repeat sequences are therefore particularly vulnerable to mutations (reviewed in 12). Thus, the reported genome-wide scans were focused on coding DNA microsatellites, and mutational screenings were performed of the identified loci in MSI+ cancer (13–17). Several mutations have been identified, but it is again difficult to distinguish the few cancer-driving mutations from a possible vast majority of so-called bystander or passenger alterations. Most of the mutational screenings of MSI+ cancers are focused on tracts of mononucleotide repeats, where base insertion and deletion are especially harmful as the resulting frameshifts are predicted to generate aberrant transcripts. 
Aberrant transcripts are either translated into non-functioning proteins or degraded by the mRNA surveillance system. Therefore, in all cases, functional studies should first determine if a mutated gene is expressed in normal tissue and if and how its mutant version is expressed in corresponding tumours. We were interested to know if the already identified putative MSI target genes are expressed in the relevant tissues, for example in colorectal tissues, and carried out their expression analysis in these using gene chips. We report here that most of the putative MSI targets identified are barely expressed in normal and cancerous human colorectal tissues. Then, we conceived a procedure for the systematic identification of mutant repeat-containing expressed sequences. Here we show the set-up and application of a novel transcriptome-wide strategy that allows identification of mutations in expressed genes in cancer.

MATERIALS AND METHODS

Cell lines

The following colorectal carcinoma-derived cell lines were used: the non-MSI SW480 and the MSI+ DLD1/HCT15 and HCT116, purchased from the American Type Culture Collection (ATCC, Manassas, VA).

mRNA isolation

Cytoplasmic RNA was isolated from cultured cells using the Concert™ Cytoplasmic RNA Reagent (Invitrogen, Carlsbad, CA); mRNA was isolated from cytoplasmic RNA using a mRNA Isolation Kit (Roche, Mannheim, Germany). Total and poly(A)+ RNA integrity and quality were critical and were thoroughly checked by gel electrophoresis, northern blot and quantitative RT–PCR with the TaqMan assay using housekeeping gene probes.

cDNA synthesis

First-strand cDNA was synthesised from 5 µg of mRNA in a reaction buffer containing 1× M-MLVRT reaction buffer (Promega, Madison, WI), 250 ng random hexanucleotides (Invitrogen), 5 mM dithiothreitol (Invitrogen), 0.25 mM dNTPs (Invitrogen), 5 µl of M-MLVRT(H-) enzyme (Promega) in DEPC-treated H2O; the reaction was incubated for 1 h at 37°C. Second-strand synthesis was performed on the first-strand reaction product in a buffer containing 1× Second Strand Buffer (Invitrogen), 0.2 mM dNTPs, 5 U Escherichia coli T4 DNA ligase (MBI Fermentas, Vilnius, Lithuania), 5 U Klenow fragment (MBI Fermentas), 2 U E.coli RNase H (Invitrogen) in DEPC-treated H2O. The reaction was incubated for 2 h at 16°C, then 10 U of T4 DNA polymerase was added and incubated at 16°C for 5 min. The reaction was stopped with 33 mM EDTA and heated to 70°C for 10 min. Double-stranded (ds)-cDNA was then extracted with phenol: chloroform (1:1) pH 8 and precipitated with ammonium acetate and ethanol.

Digestion and ligation

The ds-cDNA was digested with NlaIII (New England Biolabs, Beverly, MA) at 37°C for 1 h. As digestion is a critical step, it was thoroughly checked by means of control amplifications. Each fragment was ligated to an adapter using T4 DNA ligase (Invitrogen) at 16°C for 1 h. The adapter sequence was designed according to one sequence used as a linker in SAGE protocols (18), and is as follows: 5′-GGATTTGCTGGTGCAGTACACATG-3′ (Ad primer). The adapter was phosphorylated and annealed to the complementary primer: 5′-TGTACTGCACCAGCAAATCC-3′. We verified that these primers do not amplify any sequence from human cDNA.
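The digestion step can be illustrated in silico. The following is a minimal Python sketch, not part of the original procedure: it assumes a single-stranded (top strand) view of the cDNA and the known NlaIII recognition site CATG, with cleavage immediately 3′ of the site. The function name `nlaiii_fragments` and the example sequence are hypothetical.

```python
# Illustrative sketch only: cut a cDNA sequence at every NlaIII site (CATG).
# Only the top strand is modelled; the 3' overhang is not represented.
NLAIII_SITE = "CATG"


def nlaiii_fragments(seq: str) -> list[str]:
    """Return the fragments obtained by cutting seq just after each CATG."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(NLAIII_SITE)
    while pos != -1:
        cut = pos + len(NLAIII_SITE)  # cleavage point, 3' of the site
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(NLAIII_SITE, cut)
    if start < len(seq):
        fragments.append(seq[start:])  # trailing piece without a site
    return fragments


if __name__ == "__main__":
    # Hypothetical toy sequence with two NlaIII sites.
    print(nlaiii_fragments("AACATGTTTTCATGGG"))
```

Each internal fragment produced this way ends in CATG, which matches the 3′ end of the adapter sequence given above.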

Extension of repeat-containing sequences

cDNA fragments ligated to adapters were first linearly amplified using 0.7 µM specific repeat-anchoring primer (R-An primer, Fig. 1), made of the following modules: the universal sequence (US) (5′-CTAATACGACTCACTATAGGGCGC-3′) followed by the arbitrary repeat-anchoring trinucleotide (RAT) and the 6 nt repeat-specific sequence (RSS). The reaction was carried out for 3 min at 95°C, for five cycles of 30 s at 94°C and for 2 min at either 42°C in the case of (A)6-R-An primers or 47°C in the case of (G)6-R-An primers, in a reaction buffer containing 1× PCR buffer minus Mg (Invitrogen), 1 mM MgCl2, 0.2 mM dNTPs, 0.1 µCi/µl [33P]dCTP (Amersham-Biosciences, Freiburg, Germany) and 0.05 U/µl Platinum Taq DNA polymerase (Invitrogen).
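The modular design of the R-An primer (US, then RAT, then RSS) lends itself to a short illustration. The sketch below is an assumption-laden example rather than the authors' tool: the function name `r_an_primer` and the trinucleotide `GCT` are hypothetical, and the actual primers used are those listed in Figure 1.

```python
# Illustrative sketch: assemble a repeat-anchoring (R-An) primer from its
# three modules, in 5'->3' order: US + RAT + RSS.
US = "CTAATACGACTCACTATAGGGCGC"  # universal sequence given in the text


def r_an_primer(rat: str, repeat_base: str, rss_len: int = 6) -> str:
    """Return US + RAT + a homopolymeric RSS of rss_len bases."""
    rat = rat.upper()
    repeat_base = repeat_base.upper()
    if len(rat) != 3 or set(rat) - set("ACGT"):
        raise ValueError("RAT must be three bases from A, C, G, T")
    if len(repeat_base) != 1 or repeat_base not in "ACGT":
        raise ValueError("repeat_base must be a single base")
    return US + rat + repeat_base * rss_len


if __name__ == "__main__":
    # "GCT" is a hypothetical anchoring trinucleotide chosen for illustration.
    print(r_an_primer("GCT", "A"))
```

With a 24 nt US, a 3 nt RAT and a 6 nt RSS, every primer built this way is 33 nt long.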

Figure 1

Description of the ARTS procedure and expected results. (A) Schematic flowchart representation of the procedure for obtaining ARTS products from a total transcriptome. As a first step, mRNAs are purified, reverse transcribed and converted into ds-cDNAs. The ds-cDNA population is then digested with a restriction endonuclease (NlaIII) that is expected to cleave most reverse transcripts at least once, generating 200–500 bp long ds-cDNA fragments. The fragments are subsequently ligated to an adapter (Ad), here indicated as ARTS-Ad. To specifically enrich for repeat-containing sequences, linear amplification is performed using a repeat-anchoring primer (R-An primer), designed as described in (B). Five linear amplification cycles are performed so that the anchoring (RAT) and repeat-specific (RSS) parts of the R-An primer are both required to achieve specific annealing. After the enrichment step, repeat-containing sequences are PCR amplified at high stringency using the same R-An primer and the ARTS-Ad primer, the latter specifically designed to avoid adapter–adapter amplification. (B) Design of the repeat-anchoring (R-An) primer. The R-An primer is composed of three modules, in 5′→3′ order: (i) a universal sequence (US), common to all R-An primers, which increases the efficiency of the final PCR amplification step; (ii) an arbitrary repeat-anchoring trinucleotide (RAT), which not only prevents slippage, but also defines the three bases immediately 5′ to the repeat being located at the 5′ end; (iii) a repeat-specific sequence (RSS), which contains the repeat to be explored. (C) Graphical representation of the expected ARTS products and their migration in PAGE, highlighting the differences between a PCR product containing a wild-type repeat or its longer and shorter versions, deriving from insertions and deletions, respectively. Due to the fact that the R-An primer is anchored 5′ of the repeat, the size of the ARTS product will be directly dependent on the repeat length.

PCR step

The linearly amplified cDNA products were exponentially amplified with 0.7 µM specific R-An primer and 0.7 µM Ad primer in the same reaction buffer used for the linear extension, containing [33P]dCTP (see above). The amplification was carried out as follows: 3 min at 95°C; five cycles of 30 s at 94°C, 1 min at either 42 or 47°C [in the case of either (A)6-R-An primers or (G)6-R-An primers, respectively] and 1 min at 72°C; 30 cycles of 30 s at 94°C, 30 s at either 50 or 55°C [in the case of either (A)6-R-An primers or (G)6-R-An primers, respectively] and 1 min at 72°C. The final elongation was carried out for 30 min at 72°C.

PCR product separation and sequencing

ARTS products were separated on 6% denaturing polyacrylamide gel, dried and exposed. Selected bands were cut from the gel and DNA was extracted by boiling gel pieces for 15 min. The extracted products were re-amplified with the specific R-An primer and the Ad primer under the same conditions used for exponential amplification. The PCR products were purified by a JETquick Gel Extraction Spin Kit (GENOMED) and sequenced using a Thermo Sequenase Radiolabeled Terminator Cycle Sequencing Kit (UBS Corp., Cleveland, OH).

Microarray screening

Total RNA was extracted from four samples of normal colorectal mucosa obtained from informed patients operated on for either diverticulitis or rectum prolapse. RNA was isolated from freshly frozen tissues with guanidinium thiocyanate. Its integrity and quality were critical and were thoroughly checked by gel electrophoresis, northern blot and quantitative RT–PCR with the TaqMan assay using housekeeping gene probes. Generation of the hybridisation mix for Affymetrix GeneChip analysis was carried out according to standard Affymetrix protocols. Briefly, ds-cDNA was synthesised from 20 µg total RNA using the Superscript Choice System (Gibco BRL) with an HPLC-purified oligo(dT) primer containing a T7 RNA polymerase promoter (Sigma Genosys) following the manufacturer’s instructions. In vitro transcription was carried out with 1 µg cDNA using an Enzo BioArray RNA transcript labeling kit (Enzo Diagnostics, Farmingdale, NY). Aliquots of 20 µg of biotinylated cRNA were fragmented and hybridised to Human Genome HG-U133Av2 GeneChips, which were subsequently washed, stained and scanned according to the Affymetrix protocols at the BioGeM Affymetrix Facility (Naples, Italy). Affymetrix MAS5.0 software was used to normalise and scale results. For each sample, out of 22 091 probe sets explored, 10 200 ± 420 were called expressed (present or marginal expression call). Considering the whole experiment, 9619 probe sets (43%) were called expressed in at least three of four samples, of which 7795 (35%) were called expressed in all four samples. This indicates good quality of the starting material and of the microarray analysis procedure. By studying the distribution of the expression level with respect to the expression call, we subdivided the probe sets into three subgroups, expressed, borderline and not expressed (data not shown), respectively.

Sequence database scanning for repeats

A systematic screening of human CDSs and 3′-untranslated regions (3′-UTRs) containing stretches of homopolymeric repeats, as well as di- and trinucleotide repeats, was done by means of Perl scripts. In the screening, a minimum of seven mononucleotides, four dinucleotides and three trinucleotides was required as the lower limit of the repeats. The number of sequences containing at least one of the above-mentioned repeats was searched in 15 181 sequences present in release 14 of the 3′-UTR database (19) and in 45 425 CDSs present in the human reference coding sequences database (ftp.ncbi.nih.gov/refseq) downloaded in April 2002.
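The original screening was done with Perl scripts that are not reproduced here. As a rough illustration of the same idea, the following Python sketch scans a sequence with regular expressions. It assumes that the stated minima refer to copies of the repeat unit (seven for mononucleotides, four for dinucleotides, three for trinucleotides) and that homopolymeric runs are reported as mononucleotide repeats only; the function name `find_repeats` and the toy sequence are hypothetical.

```python
import re

# >= 7 identical bases
MONO = re.compile(r"([ACGT])\1{6,}")
# >= 4 copies of a 2-nt unit whose two bases differ (group 2 = first base)
DI = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{3,}")
# >= 3 copies of a 3-nt unit that is not a homopolymer (e.g. not AAA)
TRI = re.compile(r"((?!([ACGT])\2\2)[ACGT]{3})\1{2,}")


def find_repeats(seq: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, unit, 0-based start, copy number) for each repeat found."""
    seq = seq.upper()
    hits = []
    for kind, pattern in (("mono", MONO), ("di", DI), ("tri", TRI)):
        for m in pattern.finditer(seq):
            unit = m.group(1)
            copies = len(m.group(0)) // len(unit)
            hits.append((kind, unit, m.start(), copies))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: (A)7, (GT)4 and (CAG)3 separated by spacers.
    toy = "GG" + "A" * 7 + "CC" + "GT" * 4 + "CC" + "CAG" * 3 + "TT"
    for hit in find_repeats(toy):
        print(hit)
```

Excluding homopolymeric units inside the di- and trinucleotide patterns, rather than filtering matches afterwards, avoids a long homopolymer run masking a genuine dinucleotide repeat that starts at its last base.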

Expression database analysis

SAGE tags databases of normal and tumour colorectal tissues were retrieved from http://www.ncbi.nlm.nih.gov/geo (NC1, GSM728; NC2, GSM729; TU98, GSM756; TU102, GSM755) and tags related to 98 putative MSI target genes were extracted using a Perl script. A list of the 98 genes studied is available from the authors. The medium level of expression was considered to be below 60 tags per million, which incorporates 90% of the transcripts detected by SAGE. Expression information for the 41 of 98 putative MSI target genes represented in the microarray dataset of normal and colorectal cancer tissues obtained by Notterman and co-workers was extracted from http://microarray.princeton.edu/oncology/Data/CarcinomaNormalDatasetCancerResearch.txt using Perl scripts. These data were obtained using the MAS4 Affymetrix software and centred to an average signal of 50 intensity units. We therefore divided the data set into three groups: unexpressed genes, showing an average expression level <0 intensity units (21.7% of genes); genes expressed at medium to low level, showing an average level of expression between 0 and 400 intensity units (75.5% of genes); highly expressed genes, showing an average expression level >400 intensity units (2.56% of genes).
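The three-way grouping of the microarray data can be written as a simple threshold rule. The sketch below is illustrative only: the function name `expression_group` and the group labels are hypothetical, and the handling of values exactly equal to 0 or 400 (placed in the medium-to-low group) follows a literal reading of the thresholds stated above.

```python
def expression_group(mean_signal: float) -> str:
    """Assign an average MAS4 signal (intensity units) to one of three groups."""
    if mean_signal < 0:
        return "unexpressed"        # average level < 0
    if mean_signal <= 400:
        return "medium-to-low"      # average level between 0 and 400
    return "high"                   # average level > 400


if __name__ == "__main__":
    # Hypothetical signal values, for illustration only.
    for value in (-12.0, 50.0, 650.0):
        print(value, expression_group(value))
```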

Accession numbers

BMP7 (bone morphogenetic protein 7), GenBank accession no. NM_001719; Porin 31HM cDNA, LOC158553, GenBank accession no. XM_018142; PP610, GenBank accession no. AF177331; IMPD2 transcript, GenBank accession no. BC015567; BAX β gene CDS, GenBank accession no. L22474; PIG-T mRNA, GenBank accession no. AB057724; PACE mRNA (paired basic amino acid cleaving enzyme), GenBank accession no. NM_002569; WNT16 mRNA, GenBank accession no. NM_016087; DSTN, GenBank accession no. NM_006870; NDUFC2, GenBank accession no. NM_004549; ZMPSTE24, GenBank accession no. NM_005857; KIAA0905, GenBank accession no. AB020712; P4HB, GenBank accession no. NM_000918; MBD4, GenBank accession no. NM_003925.

RESULTS

Expression of known putative target genes in human colorectal mucosa

Several studies have recently reported a number of candidate target genes of MSI in colorectal cancers. We studied expression of 98 genes found mutated at coding microsatellites in 5–58% of MSI+ colorectal cancers (9,13–17). We first examined expression of their CDS in human normal colorectal mucosa by high density oligonucleotide (HDO; Affymetrix) microarrays. Within this chip, 77 of 98 of the above-mentioned putative MSI target genes were represented by at least one probe set. Among these 77, only 18 genes were found to be consistently expressed. Only the five (6%) genes DSTN, NDUFC2, ZMPSTE24, KIAA0905 and P4HB were found expressed at high level, i.e. not only ‘called as present’ but also showing a high signal.

To confirm our experimental data, we also checked the expression levels of these putative target genes reported in databases available online. In serial analysis of gene expression databases of normal colorectal mucosa and colorectal cancers, only two (MBD4 and P4HB) out of the 98 putative MSI target genes were found expressed over a medium level. Expression of the same 98 genes was also investigated in a microarray dataset obtained by Notterman and co-workers from normal and neoplastic colorectal tissues (20). Forty-one out of 98 genes were represented in Notterman’s data set. Ten of these 41 genes were not expressed, while the remaining ones were expressed at a medium to low level. There were no highly expressed genes.

In conclusion, most of the genes identified as putative MSI targets are barely expressed in normal and cancerous human colorectal tissues, with a distribution overlapping that of the whole set of explored genes. This indicates that genes found with DNA-scanning methods are probably not any more functionally relevant in colon tissues than any other gene, and that their mutant versions are unlikely to have been selected during tumour progression.

Identification of messenger RNAs containing repeats in the human transcriptome

To identify mutations in expressed genes, we conceived an RT–PCR-based procedure. Again we focused analysis on repeats, these being particularly susceptible to MSI, as mentioned before. To predict the number of repeats to be analysed in a transcriptome-wide approach, we examined those extracted from the CDS human reference sequences and 3′-UTR databases for mononucleotide, dinucleotide and trinucleotide repeats. Neither database analysed contains redundant sequences. We found that essentially all the extracted sequences contain at least one repeat, the short mononucleotide repeats being more frequent than di- and trinucleotide repeats, and A/T repeats being more frequent than G/C repeats. As an example, out of 45 425 CDSs listed in the database, we found that 2628 (5.7%) CDSs contain (A)7, 2754 (6%) contain (T)7, 112 (0.25%) contain (G)7 and 224 (0.4%) contain (C)7. Out of 15 181 3′-UTRs, 1883 (12.4%) contain (A)7, 2223 (14.6%) contain (T)7, 43 (0.28%) contain (G)7 and 58 (0.38%) contain (C)7. As expected, we found that short repeats are more frequent than the longer ones in CDSs and that a threshold length of six bases allows detection of repeats in almost all CDSs.

Validation of ARTS for the detection of transcripts containing mononucleotide repeats

To identify transcripts mutated because of either insertion or deletion of bases in MSI+ cancer cells, we developed an RT–PCR-based approach aimed at amplifying repeats and the flanking sequences (ARTS) (Fig. 1). All the cell mRNAs were converted into ds-cDNA and subsequently digested to yield ds-cDNA fragments of 200–500 bp. Each fragment was ligated to an adapter at both ends. PCR products were obtained by amplifying sequences encompassing the adapter and any given repeat recognised by a repeat-anchoring primer (R-An primer, Fig. 1B). As MSI frequently causes insertion and deletion of bases at repeat sequences (frameshift mutations) that are particularly deleterious, we focused on mutations detectable as length changes of PCR products containing repeats (Fig. 1C). The direct comparison of the length of each PCR product obtained from MSI+ tumour cells and from cells without MSI allows the identification of mutated transcripts.
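The reasoning that product length tracks repeat length can be made concrete with a schematic calculation. The Python sketch below is an idealised model, not the authors' analysis: it treats the product as the US tail plus the fragment from the RAT onward plus the adapter, ignores how the CATG overhang is shared between fragment and adapter, and uses a hypothetical function name (`arts_product_length`), RAT (`GCT`) and toy fragments. Only the difference between the two lengths is meaningful.

```python
import re

US = "CTAATACGACTCACTATAGGGCGC"   # universal sequence of the R-An primer
AD = "GGATTTGCTGGTGCAGTACACATG"   # adapter (Ad primer) sequence


def arts_product_length(fragment, rat, base, min_run=6):
    """Schematic product length for the first RAT + repeat site, or None."""
    site = re.search(rat + base + "{%d,}" % min_run, fragment.upper())
    if site is None:
        return None  # primer finds no anchored repeat in this fragment
    # US tail + (RAT, repeat and downstream flank) + adapter
    return len(US) + (len(fragment) - site.start()) + len(AD)


if __name__ == "__main__":
    flank = "GGTCCATTCATG"                       # hypothetical downstream flank
    wild_type = "TTGACC" + "GCT" + "A" * 8 + flank
    deleted = "TTGACC" + "GCT" + "A" * 7 + flank  # 1 bp deletion in the repeat
    wt_len = arts_product_length(wild_type, "GCT", "A")
    del_len = arts_product_length(deleted, "GCT", "A")
    print(wt_len, del_len, wt_len - del_len)
```

Because the primer is anchored 5′ of the repeat, the whole repeat lies inside the product, so a 1 bp deletion shortens the modelled product by exactly one base.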

We used either MSI+ or non-MSI colorectal cancer cell lines to set up ARTS. Using an R-An primer (Fig. 1) made of three arbitrary bases and either (A)6 or (G)6 sequences, we obtained ARTS PCR products that can be separated as discrete bands in sequencing gels (Figs 1–3). After purification from a gel and sequencing, these bands showed, as expected, the same or a higher number of mononucleotides corresponding or complementary to those included in the R-An primer (Figs 2 and 3). Thanks to the use of ds-cDNAs linked to the adapter at both ends, each (A)6-R-An primer (listed in Fig. 1) amplified either (A)n > 5 or (T)n > 5 containing cDNA fragments (Fig. 2) and each (G)6-R-An primer amplified either (G)n > 5 or (C)n > 5 containing cDNAs (Fig. 3). It is worth noting that the poly(A)+ tails of mRNAs are not amplified, as this (A)n is not flanked by any bases at their 3′ end. In each ARTS product, the flanking sequence contained between one and three bases identical to those of the RAT of the specific R-An primer (Figs 2 and 3). Thanks to the use of randomly primed reverse transcription, ARTS evenly detected short and long transcripts and ARTS PCR products equally represented sequences near the 3′ and the 5′ mRNA ends. In most cases the co-migrating bands amplified from different colorectal cancer cell lines had the same sequence (data not shown), although sometimes expressed at different levels. In many cases, the amplified bands correspond to transcripts of genes expressed at very low level in SAGE and microarray experiments (data not shown). In conclusion, ARTS succeeded in generating a microsatellite fingerprint specifically restricted to mRNAs of expressed genes. On the basis of our results, 24 R-An primers are sufficient to analyse all the cell transcripts containing any given mononucleotide repeat of six bases or more (Fig. 1). 
Each of these primers is expected to generate ∼100–500 PCR products (of 50–500 bp), which can be efficiently separated on conventional sequencing gels and compared between MSI+ and non-MSI cells or tissue samples. The same procedure can be used to identify transcripts containing di- or trinucleotide repeats, given that PCR conditions are changed.

Figure 3

ARTS identifies mutated (G/C)n repeats in transcripts from colorectal cancer cell lines. 33P-labelled ARTS products were obtained with R-An primer 21 [shown at the bottom of (A)] and separated on a polyacrylamide gel. Arbitrary bases in the R-An primer are in bold. Messenger RNAs from the following colorectal cancer cell lines were analysed: SW480 (non-MSI; lane 1), HCT15/DLD1 (MSI+; lane 2) and HCT116 (MSI+; lane 3). (A) Complete gel comparing all the obtained ARTS products. Bands showing different relative intensities are indicated by dotted lines; in some cases the bands were barely detectable in one of the cell lines. As in the case of Figure 2, sequencing of ARTS products from the non-MSI SW480 cells demonstrated the absence of artifactual insertions or deletions at repeats introduced by ARTS. Boxes indicate gel segments that are enlarged in (B)–(E). (B) ARTS identified the (G)8 repeat in the BAX β gene CDS carrying a 1 bp deletion in both MSI+ cell lines; bases that flank the repeat and are recognised by the arbitrary R-An primer bases are in bold. (C) ARTS identified the (C)7 repeat in the PIG-T mRNA carrying a 2 bp insertion in HCT116 cells and which is expressed but not mutated in HCT15 cells. (D) ARTS identified the (C)7 repeat in the PACE mRNA (paired basic amino acid cleaving enzyme) carrying a 1 bp insertion in HCT116 cells and which is expressed but not mutated in HCT15 cells. (E) ARTS identified the (G)7 repeat in the WNT16 mRNA carrying a 1 bp deletion in HCT116 cells and which is expressed but not mutated in HCT15 cells.

Figure 2

ARTS identifies mutated (A/T)n repeats in transcripts from colorectal cancer cell lines. 33P-labelled ARTS products were obtained with R-An primer 7 [shown at the bottom of (A)] and separated on a polyacrylamide gel. Arbitrary bases in the R-An primer are in bold. Messenger RNAs from the following colorectal cancer cell lines were analysed: SW480 (non-MSI; lane 1), HCT15/DLD1 (MSI+; lane 2) and HCT116 (MSI+; lane 3). (A) Complete gel comparing all the obtained ARTS products. Bands showing different relative intensities are indicated by dotted lines; in some cases the bands were barely detectable in one of the cell lines. As examples, arrows point to ARTS products corresponding to differentially expressed transcripts of BMP7 (bone morphogenetic protein 7, top arrow) and of cDNA similar to Porin 31HM cDNA (bottom arrow). These transcripts are either not or barely expressed, respectively, in the tumor cell lines. Several repeat-containing ARTS products from the non-MSI SW480 cells were re-amplified and sequenced. Alignment of these sequences with GenBank-deposited sequences demonstrated the absence of artifactual insertions or deletions at repeats introduced by ARTS. Boxes indicate gel segments that are enlarged in (B) and (C). (B) ARTS identified a mutation in the (T)6 repeat of the 3′-UTR of an unknown mRNA named clone PP610 that carries a 1 bp insertion in HCT116 cells and was not detected in HCT15 cells; bases flanking the repeat that are recognised by the arbitrary ones of the R-An primer are in bold. (C) ARTS identified the (T)9 repeat in the 3′-UTR of the IMPD2 transcript that carries a 1 bp deletion in HCT15 cells and a 2 bp deletion in HCT116 cells; in the former cells the wild-type allele and in the latter cells the 1 bp deleted allele is also expressed.

Detection by ARTS of transcripts mutated at mononucleotide repeats

As shown in Figure 2, using one of the (A)6-R-An primers we identified ARTS products whose migration differed by one or two bases between cell lines. The bands were cut from the gel and the DNA they contained was re-amplified using the same R-An primer. Sequencing of the resulting PCR products showed that ARTS effectively identified insertions and deletions at A repeats, primarily located in the mRNA 3′-UTRs. This was expected, as the (A)n > 5 repeats are slightly more frequent in the 3′-UTRs than in CDSs and repeat mutations in UTRs are less likely to be counter-selected. All the mutations found were confirmed by analysing genomic DNA from the same cells (data not shown).

To our surprise, the experiment shown in Figure 2 did not identify mutated transcripts of the TGFβRII, caspase 5 and MBD4 genes, whose mutations at (A/T)n repeats were found in the DNA of the cell lines examined (reviewed in 9). It is known that the consequence of TGFβRII gene mutation in MSI+ colorectal cancer samples and cell lines is transcript instability (21). By using specific RT–PCR we found barely detectable amounts of TGFβRII mRNA in MSI+ cell lines examined. Mutant TGFβRII mRNA was detectable with the ARTS approach only if an (A)6 primer with TGFβRII-specific bases was used at more stringent amplification conditions (data not shown). Conversely, caspase 5-specific RT–PCR showed that this gene is not expressed in the cell lines analysed (data not shown). For MBD4, the corresponding cDNA was not amplified in the experiment shown in Figure 2, as none of the arbitrary bases of the used R-An primer flank the (A)10 repeat present in the MBD4 transcript.

Using one of the (G)6-R-An primers (Fig. 3), frameshift mutations were identified in translated sequences of transcripts more frequently than in their 3′-UTRs, as expected; in fact, (G)n > 5 and (C)n > 5 homopolymeric repeats were more frequent in the translated mRNA sequence. All the mutations found were confirmed by analysing genomic DNA from the same cells (data not shown). Among the mRNAs identified, we found the BAX gene transcript, showing the G deletion in the ATG(G8)AG stretch in exon 3 already described in MSI+ cell lines (22). Using the same primer, we also found mutations in the WNT16, PACE and PIG-T gene transcripts (Fig. 3).

DISCUSSION

We have described here a novel procedure, that we named ARTS, and have shown that it is a genome-wide and unbiased method for recognising insertions and deletions at repeat sequences of expressed genes in MSI+ cancers. The putative MSI target genes identified by ARTS will need further validation (8). However, other than its value as a mutational scanning method, ARTS provides functional clues to the identification of mutated genes in MSI+ tumours, as: (i) it studies only genes that are expressed; (ii) it amplifies short mononucleotide repeats that are far more frequent in CDSs, but in general less frequently mutated than the longer ones in MSI+ cancers; (iii) by introducing a PCR-based amplification step, it allows the exploration of genes expressed at low levels and consequently it permits detection of genes other than housekeeping or structural protein coding genes; and (iv) it detects CDS frameshift mutations that result in truncated proteins, which are likely to turn into inactive proteins or, less frequently but importantly, into proteins with transdominant or novel functions (23).

ARTS allows an unbiased search of MSI target genes, as it is based on the random fragmentation of mRNAs and amplification of unknown sequences separating any repeat and an exogenous adapter. Unlike candidate gene studies, the ARTS approach to gene mutation discovery offers the advantage of removing preconceived biases regarding specific genes in cancer. Thus ARTS has the potential to identify novel cancer-related genes and new pathways of growth control. Although it has been demonstrated that a finite number of critical pathways controlling cell birth and death are likely to be affected in tumour progression (reviewed in 1), it is expected that different members of the same or related pathways are involved in different cancers. A good example involves the members of the RAS–RAF–ERK pathway, which mediates cell responses to growth signals and is implicated in the onset and progression of several human malignancies. In >50% of all colorectal cancers RAS is mutated to an oncogenic form. The MSI+ subset of sporadic colorectal cancers have a lower incidence of K-RAS mutation but a higher incidence of B-RAF mutation than tumours showing no defect in the mismatch repair system (24). These data support the idea that each tumour type progresses through mutations in the same cellular pathways, but that the spectrum of genes and mutations depends on the nature of the underlying causes and mechanisms, for example on different kinds of genetic instability.

To attain a genome-wide screening we have exploited the properties of primers created for arbitrarily primed PCR (25) and inter-simple sequence repeat (inter-SSR) PCR (26), which are global strategies originally designed to quantitate genetic instability. Arbitrarily primed PCR demonstrated that a single arbitrary primer allows the amplification of several related sequences, when used in PCR including a few cycles at low stringency. In inter-SSR PCR, short repeat-containing primers, anchored at the 3′ end, amplify products representative of the entire genomic DNA; the obtained products purposely do not contain the repeat sequences, but sequences between repeats in those cases where repeats are within a few kilobases of each other and in inverted orientation. In procedures that apply inter-SSR PCR to sequence identification (27,28), PCR is followed by sub-cloning of the amplified products, as the inter-SSR PCR products do not contain any known sequence useful for direct sequencing. In ARTS the goals were different. We amplified repeats together with the unknown flanking sequences by anchoring the repeat-containing primers at the 5′ end. In addition, by linking cDNA fragments to adapters we were able to directly re-amplify PCR products and to identify the intervening sequences.

ARTS enables the identification of frameshift mutations not only in long repeats but also in short ones, which are more frequent in coding sequences but less frequently mutated in general. By in silico selection of potential targets, based on a search for coding microsatellites, we found that virtually all expressed genes are potential targets, as >90% of the coding sequences deposited in GenBank contain repeats: a threshold length of six bases allows detection of repeats in almost all CDSs. It is well known from studies of both mammalian and lower eukaryotic cells that the frequency of mutation in microsatellites depends on the number of repeat units in the tract, and that A or G tracts of 9 bp or fewer are mutated in less than 5% of cancers with MSI. Therefore, when ARTS detects frameshift mutations in short expressed microsatellites, it is more likely that the identified mutations have been selected during tumour progression.
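The kind of in silico search described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: it assumes, for simplicity, that only mononucleotide tracts are of interest, and the function name `find_mononucleotide_tracts` and the toy sequence are hypothetical.

```python
import re

def find_mononucleotide_tracts(cds, min_len=6):
    """Return (start, base, length) for each run of a single base
    that is at least min_len nucleotides long (0-based start)."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(cds.upper())]

# Hypothetical toy sequence: an A8 tract, a G6 tract and a T5 tract.
toy_cds = "ATGGCC" + "A" * 8 + "GGTCT" + "G" * 6 + "CA" + "T" * 5 + "GA"

print(find_mononucleotide_tracts(toy_cds))             # threshold of six bases
print(find_mononucleotide_tracts(toy_cds, min_len=5))  # lower threshold
```

With the six-base threshold the T5 tract is not reported; lowering `min_len` to 5 includes it, which shows how strongly the number of candidate targets depends on the chosen threshold.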

ARTS allows direct mutational screening of expressed genes and the direct comparison of normal and tumour transcripts. We reasoned that, although in MSI cancers the high background of instability generates thousands of genomic alterations, only expressed genes whose mutation leads to functional consequences are selected during tumour progression. We verified that several putative MSI target genes, found mutated in colorectal cancers, are barely expressed or not expressed at all in normal tissue, in colorectal tumour tissue, or in both. Therefore we developed ARTS, which decreases the chance of coming across ‘innocent’ bystander mutated sequences and non-pathway mutations.

We cannot rule out that ARTS underestimates the number of frameshift mutations, owing to a failure to detect transcripts of very low abundance or unstable transcripts. Indeed, it has already been demonstrated that aberrant transcripts can be degraded by a specific RNA surveillance complex (reviewed in 29). This is the case for mutant TGFβRII (21), which we did not detect with ARTS. We did not highlight this feature of the ARTS results, i.e. different intensities or absence of some PCR products in the direct comparison of parental and neoplastic cells, as ARTS cannot be regarded as a quantitative method. However, a striking increase or decrease in the amount of a given ARTS product could also provide information on variation in the expression of the corresponding transcript.

Using ARTS we have identified a number of frameshift-mutated transcripts in MSI+ colorectal cancer cell lines. These transcripts may be relevant to primary tumours as well. In fact, although it has been shown that, at the genomic level, mutational events are significantly more frequent in MSI+ cell lines than in primary tumours (15), the same genes have been found mutated in cell lines and tumours in most cases. The identification by ARTS of a mutated BAX transcript, already described by many authors in MSI+ cancer cells (22,30,31), provides proof of concept for the suitability of the protocol.

The mutant transcripts we identified, and others that ARTS can detect, should be validated as functionally relevant according to the criteria proposed at the Bethesda consensus meeting to define target genes for instability in human cancers (8), bearing in mind more recent observations (9). By detecting frameshift mutations in genes expressed in normal and tumour tissues, ARTS gives preliminary clues to their functional relevance. Frameshift mutations can result in truncated proteins that are likely to be inactive. Although rare, the translation of a frameshift peptide that could acquire a transdominant or novel function might be particularly important. In addition, ARTS can amplify both the mutant and the wild-type alleles in each cell line or tumour. Therefore, ARTS profiles in which the wild-type counterpart of a mutant allele is not amplified are likely to be highly relevant. In fact, this kind of ARTS profile can indicate biallelic inactivation due to mutation of the same repeat tract in both alleles or to either loss or epigenetic silencing of the wild-type allele.
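As an illustration of why a small insertion or deletion at a coding repeat can truncate the protein, the following Python sketch translates a toy coding sequence before and after the loss of one base from an A8 tract. The sequences and the helper names `translate` and `is_frameshift` are hypothetical and are not taken from the transcripts identified in this study; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate from position 0 up to the first stop codon or the last complete codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def is_frameshift(wild_type_len, mutant_len):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return (mutant_len - wild_type_len) % 3 != 0

# Hypothetical toy CDS with an A8 tract, and the same CDS with one A deleted.
wild_type = "ATG" + "A" * 8 + "GCTAGCTTAA"
mutant = "ATG" + "A" * 7 + "GCTAGCTTAA"

print(translate(wild_type), translate(mutant))
print(is_frameshift(len(wild_type), len(mutant)))
```

In this toy case the one-base deletion brings an in-frame TAG into register, so the mutant product is shorter than the wild-type one and ends in a different residue.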

In conclusion, in the post-genomic era we have developed a tool useful for the identification of mutated expressed genes in cancer, with the potential of contributing to the identification of new cancer genes and pathways. By increasing the probability of identifying functional genetic alterations, ARTS-driven transcriptome-wide mutational screening of expressed sequences can help in focusing more complex functional assays onto a restricted number of genes.

The value of this approach to cancer gene identification goes beyond the MSI model. In fact, genetic and epigenetic processes may lead to gene activation or inactivation through different mechanisms, but the recurring theme is the same: mutations in crucial genes controlling cell proliferation and homeostasis confer a selective advantage to cancer cells. Therefore, expressed genes identified by ARTS as mutated at repeats in MSI+ cancers might indeed be of general importance if altered through other mechanisms in non-MSI cancers.

ACKNOWLEDGEMENTS

We thank Mr Enzo De Sio and Mrs Raffaella Albano for technical help and Mrs Elaine Wright for reading the English. This work was supported by the Italian Ministry of Research and Education (MIUR) Cofin and FIRB project funding to M.F.D., E.M. and R.C. and the Italian Association for Cancer Research (AIRC) funding to M.F.D. N.C. is a FIRC fellowship recipient.

REFERENCES

  • 1. Cahill D.P., Kinzler,K.W., Vogelstein,B. and Lengauer,C. (1999) Genetic instability and darwinian selection in tumours. Trends Cell Biol., 9, M57–M60.
  • 2. Lengauer C., Kinzler,K.W. and Vogelstein,B. (1997) Genetic instability in colorectal cancers. Nature, 386, 623–627.
  • 3. Jiricny J. and Nystrom-Lahti,M. (2000) Mismatch repair defects in cancer. Curr. Opin. Genet. Dev., 10, 157–161.
  • 4. Aaltonen L.A., Peltomaki,P., Leach,F.S., Sistonen,P., Pylkkanen,L., Mecklin,J.P., Jarvinen,H., Powell,S.M., Jen,J., Hamilton,S.R. et al. (1993) Clues to the pathogenesis of familial colorectal cancer. Science, 260, 812–816.
  • 5. Peltomaki P. (2001) Deficient DNA mismatch repair: a common etiologic factor for colon cancer. Hum. Mol. Genet., 10, 735–740.
  • 6. Chung Y.J., Park,S.W., Song,J.M., Lee,K.Y., Seo,E.J., Choi,S.W. and Rhyu,M.G. (1997) Evidence of genetic progression in human gastric carcinomas with microsatellite instability. Oncogene, 15, 1719–1726.
  • 7. Risinger J.I., Berchuck,A., Kohler,M.F., Watson,P., Lynch,H.T. and Boyd,J. (1993) Genetic instability of microsatellites in endometrial carcinoma. Cancer Res., 53, 5100–5103.
  • 8. Boland C.R., Thibodeau,S.N., Hamilton,S.R., Sidransky,D., Eshleman,J.R., Burt,R.W., Meltzer,S.J., Rodriguez-Bigas,M.A., Fodde,R., Ranzani,G.N. and Srivastava,S. (1998) A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res., 58, 5248–5257.
  • 9. Duval A. and Hamelin,R. (2002) Mutations at coding repeat sequences in mismatch repair-deficient human cancers: toward a new concept of target genes for instability. Cancer Res., 62, 2447–2454.
  • 10. Ionov Y., Peinado,M.A., Malkhosyan,S., Shibata,D. and Perucho,M. (1993) Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis. Nature, 363, 558–561.
  • 11. Vilkki S., Launonen,V., Karhu,A., Sistonen,P., Vastrik,I. and Aaltonen,L.A. (2002) Screening for microsatellite instability target genes in colorectal cancers. J. Med. Genet., 39, 785–789.
  • 12. Jiricny J. (1998) Replication errors: cha(lle)nging the genome. EMBO J., 17, 6427–6436.
  • 13. Mori Y., Yin,J., Rashid,A., Leggett,B.A., Young,J., Simms,L., Kuehl,P.M., Langenberg,P., Meltzer,S.J. and Stine,O.C. (2001) Instabilotyping: comprehensive identification of frameshift mutations caused by coding region microsatellite instability. Cancer Res., 61, 6046–6049.
  • 14. Woerner S.M., Gebert,J., Yuan,Y.P., Sutter,C., Ridder,R., Bork,P. and von Knebel Doeberitz,M. (2001) Systematic identification of genes with coding microsatellites mutated in DNA mismatch repair-deficient cancer cells. Int. J. Cancer, 93, 12–19.
  • 15. Duval A., Rolland,S., Compoint,A., Tubacher,E., Iacopetta,B., Thomas,G. and Hamelin,R. (2001) Evolution of instability at coding and non-coding repeat sequences in human MSI-H colorectal cancers. Hum. Mol. Genet., 10, 513–518.
  • 16. Kim N.G., Rhee,H., Li,L.S., Kim,H., Lee,J.S., Kim,J.H., Kim,N.K. and Kim,H. (2002) Identification of MARCKS, FLJ11383 and TAF1B as putative novel target genes in colorectal carcinomas with microsatellite instability. Oncogene, 21, 5081–5087.
  • 17. Park J., Betel,D., Gryfe,R., Michalickova,K., Di Nicola,N., Gallinger,S., Hogue,C.W. and Redston,M. (2002) Mutation profiling of mismatch repair-deficient colorectal cancers using an in silico genome scan to identify coding microsatellites. Cancer Res., 62, 1284–1288.
  • 18. Velculescu V.E., Zhang,L., Vogelstein,B. and Kinzler,K.W. (1995) Serial analysis of gene expression. Science, 270, 484–487.
  • 19. Pesole G., Liuni,S., Grillo,G., Licciulli,F., Mignone,F., Gissi,C. and Saccone,C. (2002) UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Res., 30, 335–340.
  • 20. Notterman D.A., Alon,U., Sierk,A.J. and Levine,A.J. (2001) Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma and normal tissue examined by oligonucleotide arrays. Cancer Res., 61, 3124–3130.
  • 21. Markowitz S., Wang,J., Myeroff,L., Parsons,R., Sun,L., Lutterbaugh,J., Fan,R.S., Zborowska,E., Kinzler,K.W., Vogelstein,B. et al. (1995) Inactivation of the type II TGF-beta receptor in colon cancer cells with microsatellite instability. Science, 268, 1336–1338.
  • 22. Rampino N., Yamamoto,H., Ionov,Y., Li,Y., Sawai,H., Reed,J.C. and Perucho,M. (1997) Somatic frameshift mutations in the BAX gene in colon cancers of the microsatellite mutator phenotype. Science, 275, 967–969.
  • 23. Linnebacher M., Gebert,J., Rudy,W., Woerner,S., Yuan,Y.P., Bork,P. and von Knebel Doeberitz,M. (2001) Frameshift peptide-derived T-cell epitopes: a source of novel tumor-specific antigens. Int. J. Cancer, 93, 6–11.
  • 24. Rajagopalan H., Bardelli,A., Lengauer,C., Kinzler,K.W., Vogelstein,B. and Velculescu,V.E. (2002) Tumorigenesis: RAF/RAS oncogenes and mismatch-repair status. Nature, 418, 934.
  • 25. Welsh J. and McClelland,M. (1990) Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res., 18, 7213–7218.
  • 26. Basik M., Stoler,D.L., Kontzoglou,K.C., Rodriguez-Bigas,M.A., Petrelli,N.J. and Anderson,G.R. (1997) Genomic instability in sporadic colorectal cancer quantitated by inter-simple sequence repeat PCR analysis. Genes Chromosomes Cancer, 18, 19–29.
  • 27. Anderson G.R., Brenner,B.M., Swede,H., Chen,N., Henry,W.M., Conroy,J.M., Karpenko,M.J., Issa,J.P., Bartos,J.D., Brunelle,J.K., Jahreis,G.P., Kahlenberg,M.S., Basik,M., Sait,S., Rodriguez-Bigas,M.A., Nowak,N.J., Petrelli,N.J., Shows,T.B. and Stoler,D.L. (2001) Intrachromosomal genomic instability in human sporadic colorectal cancer measured by genome-wide allelotyping and inter-(simple sequence repeat) PCR. Cancer Res., 61, 8274–8283.
  • 28. Tang J.C., Lam,K.Y., Law,S., Wong,J. and Srivastava,G. (2001) Detection of genetic alterations in esophageal squamous cell carcinomas and adjacent normal epithelia by comparative DNA fingerprinting using inter-simple sequence repeat PCR. Clin. Cancer Res., 7, 1539–1545.
  • 29. Hilleren P. and Parker,R. (1999) Mechanisms of mRNA surveillance in eukaryotes. Annu. Rev. Genet., 33, 229–260.
  • 30. Abdel-Rahman W.M., Georgiades,I.B., Curtis,L.J., Arends,M.J. and Wyllie,A.H. (1999) Role of BAX mutations in mismatch repair-deficient colorectal carcinogenesis. Oncogene, 18, 2139–2142.
  • 31. Ionov Y., Yamamoto,H., Krajewski,S., Reed,J.C. and Perucho,M. (2000) Mutational inactivation of the proapoptotic gene BAX confers selective advantage during tumor clonal evolution. Proc. Natl Acad. Sci. USA, 97, 10872–10877.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press