Author manuscript; available in PMC: 2016 Jun 1.
Published in final edited form as: Parasitol Res. 2015 Mar 19;114(6):2263–2272. doi: 10.1007/s00436-015-4419-x

Genome-wide characterization of microsatellites and marker development in the carcinogenic liver fluke Clonorchis sinensis

Thao TB Nguyen a,b, Yuji Arimatsu a, Sung-Jong Hong c, Paul J Brindley d, David Blair e, Thewarach Laha f, Banchob Sripa a,*
PMCID: PMC4454773  NIHMSID: NIHMS673347  PMID: 25782682

Abstract

Clonorchis sinensis is an important carcinogenic human liver fluke endemic in East and Southeast Asia. Several conventional molecular markers have been used for identification and genetic diversity studies; however, no information on microsatellites of this liver fluke has been published so far. Here we report microsatellite characterization and marker development for genetic diversity studies in C. sinensis using a genome-wide bioinformatics approach. Based on our search criteria, a total of 256,990 microsatellites (≥ 12 base pairs) were identified from the genome database of C. sinensis, with hexa-nucleotide motifs being the most abundant (51.5%), followed by penta-nucleotide (18.3%) and tri-nucleotide (12.7%) motifs. The tetra-nucleotide, di-nucleotide and mono-nucleotide motifs accounted for 9.75%, 7.63% and 0.14%, respectively. The total length of all microsatellites accounts for 0.72% of the 547 Mb genome, and microsatellites occurred at a frequency of one per 2.13 kb of DNA. For the di-, tri- and tetra-nucleotide motifs, the most frequent repeat numbers were six (28%), four (45%) and three (76%), respectively. ATC was the most abundant repeat, followed by AT, AAT and AC. Of the 40 microsatellite loci developed, 24 markers showed potential to differentiate between C. sinensis and Opisthorchis viverrini. Seven of the 24 loci were heterozygous, with observed heterozygosity ranging from 0.467 to 1. Four primer sets amplified both C. sinensis and O. viverrini DNA, giving products of different sizes. This study provides basic information on C. sinensis microsatellites, and the genome-wide markers developed may be a useful tool for genetic studies of C. sinensis.

Keywords: Clonorchis sinensis, microsatellite, characterization, SciRoKo, genetic diversity

1. Introduction

Infection with Clonorchis sinensis is a major neglected tropical disease endemic in China, the Republic of Korea, Taiwan, eastern Russia and northern Vietnam, with an estimated 15 million people infected globally (Dorny et al. 2009; Hong and Fang 2012; Keiser and Utzinger 2009). Infection with this liver fluke can lead to several hepatobiliary diseases, including cholangiocarcinoma, and the parasite is classified as a Group 1 carcinogen by the International Agency for Research on Cancer, WHO (Bouvard et al. 2009; Hong and Fang 2012). Because the parasite is distributed over a large part of Asia, genetic diversity in the endemic regions has been investigated using conventional molecular markers such as mtDNA and rRNA (Lee and Huh 2004; Liu et al. 2012; Park 2007; Tatonova et al. 2012; Tatonova et al. 2013; Xiao et al. 2013). Such genetic variation in C. sinensis may influence disease presentation, host specificity and other biological functions (Choi 1984; Saijuntha et al. 2007; Hong and Fang 2012; Sun et al. 2013). Microsatellites are increasingly popular for genetic diversity studies. However, to date, no information on C. sinensis microsatellites has been reported.

Microsatellites, or simple sequence repeats (SSRs), are stretches of DNA consisting of tandemly repeated short motifs of 1–6 base pairs and are powerful markers for population genetic studies of different species (Gower et al. 2011; Tang et al. 2005; Toth et al. 2000). Compared with other molecular markers, microsatellites offer the advantages of high abundance in the genome, co-dominant inheritance and easy detection by PCR amplification (Kim et al. 2012; Toth et al. 2000). Several studies have used microsatellites to examine the population genetics of parasitic trematodes (Hurtrez-Boussès et al. 2004; Laoprom et al. 2010; Shrivastava et al. 2003). However, these studies developed microsatellite markers by the conventional method of screening libraries for repeated sequences, which is laborious, costly and often inefficient. In the post-genomic era, identification of microsatellites by bioinformatics has become faster and more cost effective for species whose genome sequences are available, such as Schistosoma japonicum (Xiao et al. 2011). A draft genome of C. sinensis has recently been published (Wang et al. 2011). Hence, the aim of this study was to identify and characterize microsatellites in the C. sinensis genome using a bioinformatic approach, and to develop potential microsatellite markers for studying genetic diversity.

2. Materials and methods

2.1 Parasite and DNA extraction

C. sinensis metacercariae were obtained from naturally infected Pungtungia herzi fish from Jinju, Gyeongsangnam-do, Republic of Korea. The metacercariae were used to infect male Syrian golden hamsters at the animal husbandry facilities of the Faculty of Medicine, Khon Kaen University. All animals were kept under conventional conditions and given a stock diet and water ad libitum. The hamsters were sacrificed at 2 months post-infection, and adult C. sinensis flukes were recovered from the bile ducts and gallbladder. Protocols for vertebrate animal studies complied with the Ethics Guideline of the Animal Experimentation of the National Research Council of Thailand. Adult worms of C. sinensis were also kindly provided by Dr. Do Trung Dung, National Institute of Malariology, Parasitology and Entomology, Hanoi, Vietnam. C. sinensis adult worms from China were obtained from Professor Xing-Guan Zhu in Guangzhou, China.

Genomic DNA of C. sinensis adult worms was extracted using the Gentra Puregene Tissue Kit (QIAGEN, Germany) according to the manufacturer’s instructions. The DNA was dissolved in 30–50 μl of sterile water and stored at −20 °C until use. Control DNA from O. viverrini was extracted by the same method as described for C. sinensis adults.

2.2 Identification of microsatellites

The draft genome of C. sinensis, generated by de novo sequencing, has been published (Wang et al. 2011). The genome data were downloaded from http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=BADR02. To identify C. sinensis microsatellites, we used the bioinformatics tool SciRoKo version 3.3 (Kofler et al. 2007). The identification criteria were perfect mono- to hexa-nucleotide repeats with a minimum length of 12 bases and a minimum of 2 repeat units. This corresponds to at least twelve occurrences of a mono-nucleotide, six of a di-, four of a tri-, three of a tetra-, 2.5 of a penta- and two of a hexa-nucleotide motif (Toth et al. 2000). The microsatellite numbers, motifs, repeat numbers, repeat lengths, repeat types, start and end positions, and microsatellite sequences were analyzed.
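The search criteria can be illustrated with a minimal Python sketch. This is not SciRoKo's algorithm; it is a simple scan for perfect tandem repeats under stated assumptions. The names `MIN_REPEATS`, `is_primitive` and `find_perfect_ssrs` are hypothetical, and because the sketch only counts whole repeat units, penta-nucleotides are required to have three units here rather than the 2.5 quoted in the text.

```python
# Minimum whole-unit repeat counts per motif length (total length >= 12 bases).
# Assumption: penta-nucleotides need 3 whole units in this simplified sketch.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 2}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_perfect_ssrs(sequence):
    """Return sorted (start, end, motif, repeats) tuples; 0-based, end-exclusive."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        i = 0
        while i + unit_len * min_rep <= len(seq):
            motif = seq[i:i + unit_len]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count how many times the motif repeats back to back.
                count = 1
                while seq.startswith(motif, i + count * unit_len):
                    count += 1
                if count >= min_rep:
                    hits.append((i, i + count * unit_len, motif, count))
                    i += count * unit_len  # skip past the reported repeat
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "AT" * 7 + "CCC" + "TAGA" * 3 + "G" + "A" * 12 + "C"
    for hit in find_perfect_ssrs(demo):
        print(hit)
```

The primitive-motif check prevents a run such as (AT)6 from also being reported as (ATATAT)2, which would otherwise inflate the hexa-nucleotide count.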

2.3 Localization of microsatellites

Genomic scaffolds for C. sinensis were downloaded from http://fluke.sysu.edu.cn/ and an index of the genome was built using the GMAP program (Wu and Nacu 2010). All microsatellites and their flanking sequences were then mapped to the index using GMAP, and their coordinates were compared with those of annotated genes, contained in a GFF file downloaded from http://fluke.sysu.edu.cn/, using the ‘intersect’ tool from bedTools v2.22 (Quinlan and Hall 2010). The remaining sequences were then compared with annotated C. sinensis coding sequences, also downloaded from http://fluke.sysu.edu.cn/, using BLAST. Sequences with high-scoring matches to known coding sequences were discarded. Finally, the remaining sequences were compared with the genome using BLAST, and those without high-scoring matches were discarded.
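The coordinate comparison step can be pictured as an interval-overlap test. The sketch below is an illustration of the idea only, not the bedTools implementation; the function names, the labels "genic" and "intergenic", and the example coordinates are all hypothetical, and intervals are assumed to be 0-based and half-open.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_ssrs(ssrs, genes):
    """Label each (contig, start, end) microsatellite as 'genic' or 'intergenic'."""
    genes_by_contig = defaultdict(list)
    for contig, start, end in genes:
        genes_by_contig[contig].append((start, end))

    labels = {}
    for contig, start, end in ssrs:
        # Linear scan per contig; adequate for a sketch, slow for a whole genome.
        hit = any(
            overlaps(start, end, g_start, g_end)
            for g_start, g_end in genes_by_contig[contig]
        )
        labels[(contig, start, end)] = "genic" if hit else "intergenic"
    return labels


if __name__ == "__main__":
    genes = [("contig1", 100, 500)]
    ssrs = [("contig1", 90, 110), ("contig1", 500, 520), ("contig2", 100, 120)]
    for ssr, label in classify_ssrs(ssrs, genes).items():
        print(ssr, label)
```

With half-open coordinates, a microsatellite starting exactly where a gene ends does not count as overlapping, which is why the second example interval is labelled intergenic.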

2.4 Mining of microsatellites

From all microsatellites detected by SciRoKo, we selected only di-, tri- and tetra-nucleotide repeats with > 10 and < 25 repeat units that were far apart from other microsatellites, to avoid linkage disequilibrium, and were not close to either end of a contig. The BatchPrimer3 program (You et al. 2008) was used to design primers flanking the putative microsatellites. The input parameters were: primer length 20 bp, optimal primer GC content 50% (range 40–60%), optimal primer Tm 60 °C (range 59.9–60.1 °C) and an estimated product size of 100–600 bp. We chose microsatellites that had a single hit to the C. sinensis genome, were located on different contigs, and generated a range of product sizes suitable for future multiplex PCR.
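The selection rules can be sketched as a simple filter. The text does not state how far a locus had to be from its neighbours or from the contig ends, so `edge_margin` and `min_spacing` below are purely hypothetical values, as are the function and field names. The repeat-number window is taken from the text, and treating the bounds as inclusive is an assumption.

```python
def select_candidates(ssrs, contig_lengths, min_repeats=10, max_repeats=25,
                      edge_margin=300, min_spacing=1000):
    """Keep di- to tetra-nucleotide loci that pass the repeat and spacing rules.

    ssrs: list of dicts with keys contig, start, end, motif, repeats.
    edge_margin and min_spacing (in bp) are hypothetical thresholds.
    """
    by_contig = {}
    for ssr in ssrs:
        by_contig.setdefault(ssr["contig"], []).append(ssr)

    selected = []
    for contig, items in by_contig.items():
        items.sort(key=lambda s: s["start"])
        for idx, ssr in enumerate(items):
            if not 2 <= len(ssr["motif"]) <= 4:
                continue  # only di-, tri- and tetra-nucleotide motifs
            if not min_repeats <= ssr["repeats"] <= max_repeats:
                continue
            if ssr["start"] < edge_margin:
                continue  # too close to the start of the contig
            if contig_lengths[contig] - ssr["end"] < edge_margin:
                continue  # too close to the end of the contig
            gaps = []
            if idx > 0:
                gaps.append(ssr["start"] - items[idx - 1]["end"])
            if idx + 1 < len(items):
                gaps.append(items[idx + 1]["start"] - ssr["end"])
            if any(gap < min_spacing for gap in gaps):
                continue  # a neighbouring microsatellite is too close
            selected.append(ssr)
    return selected
```

Loci that pass such a filter would then go to primer design; the primer parameters themselves (length, GC content, Tm, product size) are set in BatchPrimer3 and are not modelled here.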

2.5 PCR and gel electrophoresis

Genomic DNA of adult C. sinensis and O. viverrini was used as the PCR template. The microsatellite markers of C. sinensis were amplified using the microsatellite-specific primers described above. PCR amplification was performed on a thermocycler (Applied Biosystems). All PCR assays were carried out in a volume of 25 μl containing 12.5 μl of 2X GoTaq Colorless Master Mix (Promega Corporation, WI, USA), 0.2 μM of each primer and 50 ng of DNA template. Cycling conditions were an initial denaturation at 95 °C for 5 minutes, followed by 30 cycles of 95 °C for 30 seconds, annealing (59 °C, 60 °C or 61 °C) for 30 seconds and 72 °C for 30 seconds, and a final extension at 72 °C for 5 minutes. Amplified products were electrophoresed on a 1.5% agarose gel in 1X TAE buffer, stained with ethidium bromide (EtBr) and photographed under UV illumination.

2.6 Polymorphism analysis

The number of alleles and the observed and expected heterozygosity at each locus were estimated using the following basic formulas:

$$H_o = \frac{\text{number of heterozygous individuals}}{\text{total number of individuals genotyped}}, \qquad H_e = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i$ = frequency of the $i$-th of $n$ marker alleles.

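The polymorphism information content (PIC) was calculated using this formula:

$$\mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2 = 1 - \sum_{i=1}^{n} p_i^2 - \left( \sum_{i=1}^{n} p_i^2 \right)^2 + \sum_{i=1}^{n} p_i^4$$

where $p_i$ = frequency of the $i$-th marker allele, and $n$ = number of different alleles (Hildebrand et al. 1992).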
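As a worked illustration, the sketch below computes these statistics for one locus from diploid genotypes, assuming the standard definitions: observed heterozygosity as the fraction of heterozygous individuals, expected heterozygosity as one minus the sum of squared allele frequencies, and PIC in its closed form 1 − Σp² − (Σp²)² + Σp⁴. The function names and the example genotypes (allele sizes in bp) are hypothetical and are not data from this study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (allele_a, allele_b)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p ** 2 for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p^2) - (sum(p^2))^2 + sum(p^4)."""
    s2 = sum(p ** 2 for p in freqs.values())
    s4 = sum(p ** 4 for p in freqs.values())
    return 1.0 - s2 - s2 ** 2 + s4


if __name__ == "__main__":
    # Hypothetical genotypes at one locus for four worms.
    genotypes = [(144, 144), (144, 150), (150, 156), (144, 156)]
    freqs = allele_frequencies(genotypes)
    print("Ho  =", observed_heterozygosity(genotypes))
    print("He  =", expected_heterozygosity(freqs))
    print("PIC =", pic(freqs))
```

The closed form avoids the double sum over allele pairs and gives the same value, because the sum of 2p_i²p_j² over all pairs i < j equals (Σp²)² − Σp⁴.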

2.7 Genetic diversity analysis

Amplicons were cloned into a plasmid T-vector for propagation using the pJET cloning kit (Thermo Scientific, USA). Escherichia coli (strain JM109) cells were transformed with the cloned products, and plasmid DNA was isolated from the bacteria using the GeneJET Plasmid Miniprep Kit (Thermo Scientific, USA). The recombinant plasmids were sequenced by automated fluorescent DNA sequencing using the BigDye terminator method (1st Base, Malaysia). Sequences were edited, assembled into contigs and aligned using ClustalW in the BioEdit software package (Hall 1999).

3. Results

3.1 Abundance and microsatellite characteristics

We searched for perfect microsatellites (no mismatches) with repeat motifs of one to six nucleotides and a minimum length of 12 bases. A total of 256,990 perfect microsatellites were identified from 6,190 contigs of the C. sinensis genome using SciRoKo. The microsatellite density was 467.7 loci per megabase (Mb). Microsatellites were not present in every contig. Among the 1–6 nucleotide repeat motifs, the most abundant were the hexa-nucleotides, which accounted for 51.5% of the total, followed by penta-nucleotides (18.3%) and tri-nucleotides (12.7%). The tetra-nucleotide, di-nucleotide and mono-nucleotide motifs accounted for only 9.75%, 7.63% and 0.14%, respectively. The microsatellite distribution in the C. sinensis genome is shown in Table 1.

Table 1.

Motif statistics of C. sinensis microsatellites.

Motif             Total counts   Distribution (%)   Average length (bp)   Counts/Mbp
Mononucleotide             363               0.14                 12.41         0.66
Dinucleotide            19,611               7.63                 16.83        35.85
Trinucleotide           32,639               12.7                 23.76        59.66
Tetranucleotide         25,063               9.75                 21.33        45.81
Pentanucleotide         47,037               18.3                 13.46        85.97
Hexanucleotide         132,277               51.5                 12.58       241.78
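The distribution and density columns can be recomputed from the raw counts as a quick consistency check. The sketch below assumes a genome size of exactly 547 Mb, the figure quoted in the abstract; the exact length used by the authors is not given here, so small differences from the published figures are to be expected. The names `COUNTS`, `GENOME_MB` and `motif_summary` are hypothetical.

```python
# Counts per motif class, copied from Table 1.
COUNTS = {
    "Mononucleotide": 363,
    "Dinucleotide": 19611,
    "Trinucleotide": 32639,
    "Tetranucleotide": 25063,
    "Pentanucleotide": 47037,
    "Hexanucleotide": 132277,
}
GENOME_MB = 547.0  # assumption: genome size as quoted in the abstract


def motif_summary(counts, genome_mb):
    """Return {motif: (percent of all loci, loci per Mb)}."""
    total = sum(counts.values())
    return {
        motif: (100.0 * n / total, n / genome_mb)
        for motif, n in counts.items()
    }


summary = motif_summary(COUNTS, GENOME_MB)
total = sum(COUNTS.values())
kb_per_locus = GENOME_MB * 1000.0 / total  # average spacing between loci

if __name__ == "__main__":
    for motif, (percent, per_mb) in summary.items():
        print(f"{motif:<16} {percent:6.2f} %  {per_mb:7.2f} per Mb")
    print(f"total loci: {total}")
    print(f"one microsatellite per {kb_per_locus:.2f} kb")
```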

Hexa-nucleotides were the most abundant motif class, and 99% of them consisted of only 2 repeats. Repeat numbers from 5 to 20 were very rare, and no locus had more than 100 repeats. Similarly, about 90% of penta-nucleotide microsatellites had 2 repeats; the longest C. sinensis microsatellite, (GTTCA)650, belonged to this group. Microsatellite abundance decreased markedly as the repeat number increased. For the di-, tri- and tetra-nucleotides, the most frequent repeat numbers were 6 (28%), 4 (45%) and 3 (76%), respectively (Figure 1 and Table 2). In decreasing order, the 20 most frequently occurring microsatellites were ATC, AT, AAT, AC, AAAAAT, AAAAAC, AG, AAAAC, AAAC, AAAT, AATC, AAAAT, AAAATG, ATAG, AGAGGC, AAC, AACTG, AAATGC, AAATG and AAAAAG (Figure 2).

Figure 1.


Distribution with respect to the motif repeat number of the individual mono- to hexanucleotide repeat microsatellites in the whole genome sequences of C. sinensis. The vertical axis shows the abundances of microsatellites that have different motif repeat number (from 2 to >20), which are discriminated by legends of different colours.

Table 2.

Abundance and distribution of microsatellite motif number repeats.

Repeats            Mono-              Di-             Tri-           Tetra-           Penta-            Hexa-
          Counts Dist(%)   Counts Dist(%)   Counts Dist(%)   Counts Dist(%)   Counts Dist(%)   Counts Dist(%)
2              -       -        -       -        -       -        -       -   41,906    89.1  131,191    99.2
3              -       -        -       -        -       -   19,156    76.4    3,998     8.5      913     0.7
4              -       -        -       -   14,728    45.1    2,380     9.5      606     1.3      111     0.1
5              -       -        -       -    6,882    21.1      919     3.7      151     0.3       18     0.0
6              -       -    5,443    27.8    3,466    10.6      389     1.6       50     0.1       15     0.0
7              -       -    4,525    23.1    1,718     5.3      267     1.1       41     0.1        4     0.0
8              -       -    3,639    18.6    1,034     3.2      146     0.6       24     0.1        3     0.0
9              -       -    2,422    12.4      667     2.0      105     0.4       28     0.1        3     0.0
10             -       -    1,310     6.7      455     1.4      168     0.7       19     0.0        1     0.0
11             -       -      729     3.7      372     1.1       54     0.2       13     0.0        0     0.0
12           284      78      439     2.2      284     0.9       61     0.2       13     0.0        6     0.0
13            41      11      283     1.4      274     0.8       64     0.3       18     0.0        0     0.0
14            22       6      189     1.0      162     0.5       57     0.2       16     0.0        2     0.0
15            10       3      118     0.6      146     0.4       43     0.2        8     0.0        0     0.0
16             4       1       82     0.4      138     0.4       39     0.2       22     0.0        1     0.0
17             1       0       73     0.4      122     0.4       56     0.2       13     0.0        1     0.0
18             0       0       76     0.4       96     0.3       45     0.2       15     0.0        0     0.0
19             0       0       76     0.4       86     0.3       46     0.2       14     0.0        1     0.0
20             0       0       63     0.3       90     0.3       46     0.2       14     0.0        0     0.0
>20            1       0      143     0.7    1,954     6.0    1,122     4.5       67     0.1        9     0.0

Total        363           19,611           32,639           25,063           47,037          132,277

Figure 2.


The 20 most frequently occurring microsatellites in C. sinensis.

Among di-nucleotide repeats, AT was the most abundant (56%), followed by AC (29.3%), AG (14.3%) and CG (0.4%). Among tri-nucleotide repeats, the most abundant type was ATC (58.5%), followed by AAT (21.8%); CCG repeats accounted for only 0.2%. AAAC (10.6%) was the most abundant tetra-nucleotide repeat, followed by AAAT (10.5%) and AATC (8.7%). AAAAC (5.9%) and AAAAAT (2.4%) were the most abundant penta- and hexa-nucleotide repeats, respectively. The longest mono-, di-, tri-, tetra-, penta- and hexa-nucleotide microsatellites were (A)24, (GT)265, (TCA)209, (CTAT)682, (GTTCA)650 and (CTAACC)61, respectively (Table 3).

Table 3.

Most common and the longest microsatellites of the motifs.

Repeat     Mono-           Di-             Tri-            Tetra-          Penta-           Hexa-
           Motif   (%)     Motif   (%)     Motif   (%)     Motif   (%)     Motif   (%)      Motif    (%)
Common     C/G    69.40    AT     55.94    ATC    58.48    AAAC   10.62    AAAAC   5.90     AAAAAT   2.42
           A/T    30.58    AC     29.33    AAT    21.80    AAAT   10.51    AAAAT   4.63     AAAAAC   2.27
           -      -        AG     14.27    AAC     5.64    AATC    8.74    AACTG   3.80     AAAATG   1.46
           -      -        CG      0.44    ACC     3.89    ATAG    7.56    AAATG   3.31     AGAGGC   1.42
           -      -        -      -        AGC     3.30    AATG    6.11    AAAAG   2.68     AAATGC   1.22
           -      -        -      -        other   6.90    other  56.46    other  79.68     other   91.21

Longest    (A)24           (GT)265         (TCA)209        (CTAT)682       (GTTCA)650       (CTAACC)61

3.2 Localization of microsatellites

Of the total 256,990 microsatellites, 148,450 microsatellites and their flanking regions did not belong to any known gene in the C. sinensis genome, and 103,018 microsatellites were localized to intragenic or coding DNA regions. The remaining 5,522 sequences could not be aligned with any genes in the parasite genome.

3.3 Microsatellite PCR

We selected 1,677 microsatellites for primer design with BatchPrimer3 according to the criteria described in the Materials and Methods, and 541 primer pairs were satisfactorily generated. From these, 40 of 72 microsatellite loci were randomly chosen based on the criteria described above. Of the 40 microsatellite PCR assays, 7 did not amplify C. sinensis DNA and 4 primer sets amplified both C. sinensis and O. viverrini DNA (Figure 3). Twenty-nine primer sets (72.5%) amplified products of the correct size (Figure 4). However, 5 of these 29 primer sets gave weak bands. We therefore selected 24 primer sets for further study; their details are shown in Table 4.

Figure 3.


Agarose gel electrophoresis of specific amplification of C. sinensis (Cs) and O. viverrini (Ov). Lane M represents a DNA size marker.

Figure 4.


Agarose gel electrophoresis of specific amplification of the microsatellites of C. sinensis using C. sinensis-specific primers. Lanes 1–40 represent PCR products amplified from genomic DNA extracted from C. sinensis using CsMs1 to CsMs40. Lane M represents a DNA size marker.

Table 4.

Characteristics of 24 microsatellites and primer sets.

Name     Forward primer (5′-3′)    Reverse primer (5′-3′)    Repeat motif   Product size (bp)   Contig   Position
CsMs1    AACGACCACATGGTCACTCA      AACAGTGCCCAAGTCCAAAC      (TA)11         212                 174      52353–52374
CsMs6    ATTCGCCACCACACCATAAT      GTCACAGCAGGAACTGCAAA      (TAA)12        282                 1769     18689–18725
CsMs7    GCATCAGCAGTCATCCTTGA      GAGATCACGGTCACCTGGTT      (AT)14         198                 1796     165825–165853
CsMs10   ATCCGTGTATCCCCATTTCA      CGTGAATGTACGACCACCTG      (CA)10         239                 4119     6420–6439
CsMs11   ATTGAAGGAGCGGAACCTTT      CCAACGATGTGTGACTGTCC      (ATC)11        144                 737      119036–119069
CsMs13   TCAAGGGGTTCAGTGAAAGG      TGCCCAAAGATTTTCCAAAG      (AT)16         190                 6190     253296–253327
CsMs14   TGGTTGGTTCATTGCTTCAA      ACACAACCCCCAACCTATCA      (AT)10         199                 377      197583–197602
CsMs17   CTCCATTTGGGCAGACAGAT      GACCACCTACGCTGTTTGGT      (ATC)16        259                 552      200071–200119
CsMs18   ACTGCGTTCTCCTGTTCGTT      TCGGAATCCTCAGAATGGAC      (TAGA)15       272                 734      49232–49294
CsMs20   ATGTCCACCCTTCTGTCTGG      AGCATGCTAAAACCGAATGG      (TA)13         295                 3114     231204–231229
CsMs21   GTGCTTCAGGTTTGGTGGTT      GCCTAATGAGCCCAACGATA      (TA)11         303                 118      184400–184421
CsMs22   GAAAGTAGCCATTCGGACCA      TAAGGCCAAATCAGCCATTC      (TGA)17        326                 3452     61392–61442
CsMs23   TTCTATCGCTCGTGTGTTGC      TACTCGAATGGATGGGAAGC      (GT)10         358                 2191     7596–7616
CsMs24   CCTTGATCAGGTCAGGGAAA      ATGAGGGCCATTGCAGTTAC      (AT)11         368                 667      25071–25092
CsMs25   TATGGCCACTGAAAACCACA      GACCACCACCTCAGAGGAAA      (AT)15         377                 334      379501–379531
CsMs26   GTCTTCACCCGACGTTTGTT      CCCTGTTTGGAAGCACAAAT      (TTTG)12       411                 830      125495–125545
CsMs28   TTGTTGGTCAGTACGCTTGC      GCTCCGTCACTGTGAGAACA      (TC)12         430                 619      97134–97158
CsMs29   CACCGGTGTCTTCCTTTGTT      AGGAGATGAGAAACGCGAGA      (TA)11         441                 1630     147989–148011
CsMs31   GCTTCACAGACAGTGCGGTA      AAAACAGGGCATTCGACATC      (AT)11         479                 578      14935–14956
CsMs33   ATCAAAGGCCACAGTGAACC      TATGCGAGCGAAAATGTCAG      (TC)14         496                 1853     82134–82162
CsMs35   GTTTGCGCTTAGTCACCACA      GTCGGTCAACAATCGGAACT      (AT)12         514                 602      58509–58533
CsMs37   TTCTCATCAACCCGGTAAGC      GGTTGTTTCGGACAGGAAAA      (TG)18         536                 1879     45493–45529
CsMs39   TTTGGGAAGTTGGAAACAGG      CGTTAGCAAACGTGCGATTA      (TG)10         561                 3391     62759–62779
CsMs40   ATTTTGGGTGGGATGAAACA      TGTGGATCACCGAACACAGT      (ATC)23        579                 2082     20496–20564

3.4 Microsatellite polymorphism

Of the 24 primer pairs, 7 amplified distinct bands among individual worms from different locations. These seven polymorphic markers are shown in Table 5. Two typical microsatellite profiles, obtained using primer pairs CsMs11 and CsMs17, are shown in Figure 5. The number of alleles varied from 3 to 8, with an average of 5 alleles per locus. The observed and expected heterozygosity ranged from 0.467 to 1 and from 0.591 to 0.781, respectively. The polymorphism information content ranged from 0.504 to 0.733, with an average of 0.622; all 7 microsatellite markers were therefore highly informative (PIC > 0.5).

Table 5.

Microsatellite markers and their polymorphism characteristics.

Name     Forward primer (5′-3′)    Reverse primer (5′-3′)    Repeat motif   Size (bp)   No. of alleles   Ho      He      Fs
CsMs1    AACGACCACATGGTCACTCA      AACAGTGCCCAAGTCCAAAC      (TA)11         212–283     3                0.467   0.629   0.257552
CsMs10   ATCCGTGTATCCCCATTTCA      CGTGAATGTACGACCACCTG      (CA)10         240–590     5                0.5     0.679   0.263623
CsMs11   ATTGAAGGAGCGGAACCTTT      CCAACGATGTGTGACTGTCC      (ATC)11        144–285     6                0.762   0.709   −0.07475
CsMs17   CTCCATTTGGGCAGACAGAT      GACCACCTACGCTGTTTGGT      (ATC)16        238–692     8                0.619   0.761   0.186597
CsMs18   ACTGCGTTCTCCTGTTCGTT      TCGGAATCCTCAGAATGGAC      (TAGA)15       302–495     7                1       0.781   −0.28041
CsMs21   GTGCTTCAGGTTTGGTGGTT      GCCTAATGAGCCCAACGATA      (TA)11         300–447     3                0.474   0.591   0.19797
CsMs25   TATGGCCACTGAAAACCACA      GACCACCACCTCAGAGGAAA      (AT)15         390–410     3                0.765   0.651   −0.17512
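The Fs column is not defined in the text. Its values appear consistent with a fixation index computed as 1 − Ho/He, which is an assumption on our part rather than a stated method. The short sketch below recomputes that quantity from the Ho and He values in Table 5; the names `TABLE5` and `fixation_index` are hypothetical.

```python
# (Ho, He, published Fs) per locus, copied from Table 5.
TABLE5 = {
    "CsMs1": (0.467, 0.629, 0.257552),
    "CsMs10": (0.5, 0.679, 0.263623),
    "CsMs11": (0.762, 0.709, -0.07475),
    "CsMs17": (0.619, 0.761, 0.186597),
    "CsMs18": (1.0, 0.781, -0.28041),
    "CsMs21": (0.474, 0.591, 0.19797),
    "CsMs25": (0.765, 0.651, -0.17512),
}


def fixation_index(ho, he):
    """Assumed definition: F = 1 - Ho / He (negative when heterozygotes are in excess)."""
    return 1.0 - ho / he


if __name__ == "__main__":
    for locus, (ho, he, published) in TABLE5.items():
        print(f"{locus:<7} computed {fixation_index(ho, he):+.5f}  published {published:+.5f}")
```

Under this reading, positive values (for example CsMs1 and CsMs10) indicate fewer heterozygotes than expected, and negative values (for example CsMs18) indicate an excess.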

Figure 5.


Agarose gel electrophoresis of microsatellites CsMs11 and CsMs17 of 19 samples. Samples from Vietnam (V), from Korea (K) and from China (C). Lane M represents a DNA size marker.

3.5 Genetic diversity of C. sinensis by sequencing

Three microsatellites (CsMs26, CsMs28 and CsMs33) were used to genotype 15 individual C. sinensis worms from China, Korea and Vietnam. We were interested only in the number of repeats at each microsatellite. Sequencing showed no difference in the repeat number of CsMs28 between Chinese and Vietnamese samples. However, C. sinensis samples from Korea presented two different repeat numbers compared with those from China and Vietnam. The repeat numbers of all samples were lower than that of the reference sequence in the database. For CsMs26, only sample V5 from Vietnam had 8 repeats, while the remaining isolates from Vietnam, China and Korea showed no variation. For CsMs33, there were 11 alleles among individual samples from the three locations, ranging from 9 to 23 repeats. Overall, the allele numbers of CsMs26, CsMs28 and CsMs33 were 2, 3 and 11, respectively. CsMs28 was monomorphic in the liver fluke samples from China and Vietnam, and CsMs26 was monomorphic in the samples from China and Korea. CsMs33 showed the highest polymorphism among the three microsatellites examined.

4. Discussion

Microsatellites have been documented as a molecular tool with high potential for genetic diversity studies in various organisms, including trematodes (Gower et al. 2011; Hurtrez-Boussès et al. 2004; Laoprom et al. 2010; Shrivastava et al. 2003; Xiao et al. 2011). To our knowledge, this study is the first report of genome-wide identification and characterization of microsatellites in the liver fluke C. sinensis. We identified C. sinensis microsatellites using bioinformatic tools and successfully developed a set of microsatellite markers. The level of polymorphism differed among the markers, as determined from C. sinensis isolates from 3 different locations. Our study provides basic information on C. sinensis microsatellites.

Microsatellite distribution and density in animal genomes vary among species. The SciRoKo program has been used successfully to mine microsatellites in several species, including trematode parasites (Xiao et al. 2011) and plants (Joshi et al. 2010). In our investigation, 256,990 perfect microsatellites were found in the C. sinensis genome using SciRoKo, accounting for 0.72% of the total genome sequence. The microsatellite density is estimated at one microsatellite per 2.13 kb of DNA, and an ATC repeat occurs every 29.4 kb of DNA. By comparison, one microsatellite is found in every 6 kb of the human genome (Beckmann and Weber 1992). Among trematode genomes analysed with SciRoKo, the S. japonicum genome contains more microsatellites (316,225) than that of C. sinensis, even though the genome of S. japonicum (402.7 Mb) is smaller (Nguyen B.T., unpublished). In protozoa, genome coverage by microsatellites in Plasmodium falciparum was 23% (Sharma et al. 2007), and the SSR frequency was 9 times greater than that of C. sinensis.

The number and sequence of microsatellite repeats also vary among species. Our study showed a high abundance of hexa-nucleotide repeats (51.5%) relative to other microsatellites, which may be characteristic of C. sinensis. In general, mono-nucleotide repeats are more frequent than other repeat motifs in eukaryotic genomes (Sharma et al. 2007; Toth et al. 2000). In addition, the major motifs (ATC, AT, AAT, AAAAAT, AAAT) were all A/T rich, while the scarce motifs were mostly C/G rich. In our C. sinensis results, the ATC repeat was the most abundant microsatellite. In other species, for example, di-nucleotide AC repeats are the most abundant in Takifugu rubripes, and CAG/CAA is the most abundant type in the Ceratocystis fimbriata genome (Cui et al. 2005; Simpson et al. 2013). In our study, poly(C/G) was more abundant than poly(A/T), in contrast to other eukaryotes (Toth et al. 2000). Nevertheless, the most common di-nucleotide microsatellites of C. sinensis (TA and AC/GT) were similar to those of many eukaryotic genomes, including O. viverrini and some other parasites (Laoprom et al. 2010; Toth et al. 2000). The number and sequences of microsatellites detected may also vary depending on the software used, the database, the selection criteria and the parameters used for mining.

In the analysis of genetic diversity of C. sinensis from three regions (Korea, China and Vietnam), we found both heterozygous and homozygous microsatellite loci. High levels of polymorphism were found at 7 heterozygous loci, with observed heterozygosity from 0.467 to 1. We also observed that the number of alleles per locus was positively correlated with the length of the repeat region, because long loci have higher mutation rates than short ones (Ellegren 2000; Schlotterer 2000). For the three microsatellites studied by sequencing, Chinese and Vietnamese C. sinensis isolates had similar repeat numbers at CsMs28. CsMs33 was the most polymorphic marker (11 alleles), while CsMs26 had only 2 alleles. Microsatellite variation in our study lay mainly in repeat number, whereas for other genetic markers, such as rDNA or mtDNA, the differences lie in the sequence (Lee and Huh 2004; Liu et al. 2012; Xiao et al. 2013).

In conclusion, this study is the first report of genome-wide microsatellites in C. sinensis. We identified 256,990 microsatellites in the whole genome of C. sinensis using the SciRoKo program. Twenty-four microsatellite markers were developed that amplify C. sinensis DNA and show promise for the study of genetic diversity. However, further research is needed with more microsatellite markers and larger numbers of samples from more geographic locations.

Acknowledgments

This work was supported by the Higher Education Research Promotion and National Research University Project of Thailand, Office of the Higher Education Commission, through the Health Cluster (SHeP-GMS), Khon Kaen University, Thailand; the Thailand Research Fund (TRF); the National Institute of Allergy and Infectious Diseases (NIAID), award number P50AI098639; and the United States Army Medical Research and Materiel Command (USAMRMC), contract number W81XWH-12-C-0267. NBT is supported by a Faculty of Medicine, Khon Kaen University scholarship. BS is a Senior TRF Scholar.

Footnotes

The content is solely the responsibility of the authors and does not necessarily represent the official views of the USAMRMC, NIAID or the NIH or the funders.

References

  1. Beckmann JS, Weber JL. Survey of human and rat microsatellites. Genomics. 1992;12:627–631. doi: 10.1016/0888-7543(92)90285-z. http://dx.doi.org/10.1016/0888-7543(92)90285-Z. [DOI] [PubMed] [Google Scholar]
  2. Bouvard V, et al. A review of human carcinogens--Part B: biological agents. Lancet Oncol. 2009;10:321–322. doi: 10.1016/s1470-2045(09)70096-8. [DOI] [PubMed] [Google Scholar]
  3. Choi DW. Clonorchis sinensis: life cycle, intermediate hosts, transmission to man and geographical distribution in Korea. Arzneimittelforschung. 1984;34:1145–1151. [PubMed] [Google Scholar]
  4. Cui J-Z, Shen X-Y, Yang G-P, Gong Q-L, Gu Q-Q. Characterization of microsatellite DNAs in Takifugu rubripes genome and their utilization in the genetic diversity analysis of T. rubripes and T. pseudommus. Aquaculture. 2005;250:129–137. http://dx.doi.org/10.1016/j.aquaculture.2005.04.041. [Google Scholar]
  5. Dorny P, Praet N, Deckers N, Gabriel S. Emerging food-borne parasites. Vet Parasitol. 2009;163:196–206. doi: 10.1016/j.vetpar.2009.05.026. [DOI] [PubMed] [Google Scholar]
  6. Ellegren H. Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet. 2000;16:551–558. doi: 10.1016/s0168-9525(00)02139-9. S0168-9525(00)02139-9 [pii] [DOI] [PubMed] [Google Scholar]
  7. Gower CM, et al. Population genetics of Schistosoma haematobium: development of novel microsatellite markers and their application to schistosomiasis control in Mali. Parasitology. 2011;138:978–994. doi: 10.1017/S0031182011000722. S0031182011000722 [pii] [DOI] [PubMed] [Google Scholar]
  8. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acid Symposium Series. 1999;4:95–98. [Google Scholar]
  9. Hildebrand CE, Torney DC, Wagner RP. Informativeness of polymorphic DNA markers. Los Alamos Sci. 1992;20:100–102. [Google Scholar]
  10. Hong S-T, Fang Y. Clonorchis sinensis and clonorchiasis, an update. Parasitol Int. 2012;61:17–24. doi: 10.1016/j.parint.2011.06.007. [DOI] [PubMed] [Google Scholar]
  11. Hurtrez-Boussès S, et al. Isolation and characterization of microsatellite markers in the liver fluke (Fasciola hepatica) Molecular Ecology Notes. 2004;4:689–690. doi: 10.1111/j.1471-8286.2004.00786.x. [DOI] [Google Scholar]
  12. Joshi RK, Kuanar A, Mohanty S, Subudhi E, Nayak S. Mining and characterization of EST derived microsatellites in Curcuma longa L. Bioinformation. 2010;5:128–131. doi: 10.6026/97320630005128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Keiser J, Utzinger J. Food-borne trematodiases. Clin Microbiol Rev. 2009;22:466–483. doi: 10.1128/CMR.00012-09. 22/3/466 [pii] [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Kim J, Choi J-P, Ahmad R, Oh S-K, Kwon S-Y, Hur C-G. RISA: a new web-tool for Rapid Identification of SSRs and Analysis of primers. Genes & Genomics. 2012;34:583–590. doi: 10.1007/s13258-012-0032-x. [DOI] [Google Scholar]
  15. Kofler R, Schlotterer C, Lelley T. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007;23:1683–1685. doi: 10.1093/bioinformatics/btm157. btm157 [pii] [DOI] [PubMed] [Google Scholar]
  16. Laoprom N, et al. Microsatellite loci in the carcinogenic liver fluke, Opisthorchis viverrini and their application as population genetic markers. Infect Genet Evol. 2010;10:146–153. doi: 10.1016/j.meegid.2009.11.005. S1567-1348(09)00232-9 [pii] [DOI] [PubMed] [Google Scholar]
  17. Lee SU, Huh S. Variation of nuclear and mitochondrial DNAs in Korean and Chinese isolates of Clonorchis sinensis. Korean J Parasitol. 2004;42:145–148. doi: 10.3347/kjp.2004.42.3.145. D - NLM: PMC2717366 EDAT- 2004/09/24 05:00 MHDA- 2004/11/09 09:00 CRDT- 2004/09/24 05:00 AID - 200409145 [pii] PST - ppublish. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Liu GH, et al. Genetic variation among Clonorchis sinensis isolates from different geographic regions in China revealed by sequence analyses of four mitochondrial genes. J Helminthol. 2012;86:479–484. doi: 10.1017/S0022149X11000757. [DOI] [PubMed] [Google Scholar]
  19. Park GM. Genetic comparison of liver flukes, Clonorchis sinensis and Opisthorchis viverrini, based on rDNA and mtDNA gene sequences. Parasitol Res. 2007;100:351–357. doi: 10.1007/s00436-006-0269-x. [DOI] [PubMed] [Google Scholar]
  20. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. D - NLM: PMC2832824 EDAT- 2010/01/30 06:00 MHDA- 2010/06/22 06:00 CRDT- 2010/01/30 06:00 PHST- 2010/01/28 [aheadofprint] PHST- 2010/02/03 [aheadofprint] AID - btq033 [pii] AID. PST - ppublish. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Saijuntha W, et al. Evidence of a species complex within the food-borne trematode Opisthorchis viverrini and possible co-evolution with their first intermediate hosts. Int J Parasitol. 2007;37:695–703. doi: 10.1016/j.ijpara.2006.12.008.
  22. Schlotterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma. 2000;109:365–371. doi: 10.1007/s004120000089.
  23. Sharma PC, Grover A, Kahl G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol. 2007;25:490–498. doi: 10.1016/j.tibtech.2007.07.013.
  24. Shrivastava J, Barker GC, Johansen MV, Xiaonong Z, Aligui GD, McGarvey ST, Webster JP. Isolation and characterization of polymorphic DNA microsatellite markers from Schistosoma japonicum. Molecular Ecology Notes. 2003;3:406–408. doi: 10.1046/j.1471-8286.2003.00466.x.
  25. Simpson MC, Wilken PM, Coetzee MP, Wingfield MJ, Wingfield BD. Analysis of microsatellite markers in the genome of the plant pathogen Ceratocystis fimbriata. Fungal Biol. 2013;117:545–555. doi: 10.1016/j.funbio.2013.06.004.
  26. Sun J, et al. Low Divergence of Clonorchis sinensis in China Based on Multilocus Analysis. PLoS One. 2013;8:e67006. doi: 10.1371/journal.pone.0067006.
  27. Tang S, Popongviwat A, Klinbunga S, Tassanakajon A, Jarayabhand P, Menasveta P. Genetic heterogeneity of the tropical abalone (Haliotis asinina) revealed by RAPD and microsatellite analyses. J Biochem Mol Biol. 2005;38:182–190. doi: 10.5483/bmbrep.2005.38.2.182.
  28. Tatonova YV, Chelomina GN, Besprosvannykh VV. Genetic diversity of nuclear ITS1–5.8S–ITS2 rDNA sequence in Clonorchis sinensis Cobbold, 1875 (Trematoda: Opisthorchidae) from the Russian Far East. Parasitol Int. 2012;61:664–674. doi: 10.1016/j.parint.2012.07.005.
  29. Tatonova YV, Chelomina GN, Besprozvannykh VV. Genetic diversity of Clonorchis sinensis (Trematoda: Opisthorchiidae) in the Russian southern Far East based on mtDNA cox1 sequence variation. Folia Parasitologica. 2013;60:155–162. doi: 10.14411/fp.2013.017.
  30. Toth G, Gaspari Z, Jurka J. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000;10:967–981. doi: 10.1101/gr.10.7.967.
  31. Wang X, et al. The draft genome of the carcinogenic human liver fluke Clonorchis sinensis. Genome Biol. 2011;12:R107. doi: 10.1186/gb-2011-12-10-r107.
  32. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010;26:873–881. doi: 10.1093/bioinformatics/btq057.
  33. Xiao JY, et al. Genetic variation among Clonorchis sinensis isolates from different hosts and geographical locations revealed by sequence analysis of mitochondrial and ribosomal DNA regions. Mitochondrial DNA. 2013;24:559–564. doi: 10.3109/19401736.2013.770490.
  34. Xiao N, Remais J, Brindley PJ, Qiu D, Spear R, Lei Y, Blair D. Polymorphic microsatellites in the human bloodfluke, Schistosoma japonicum, identified using a genomic resource. Parasit Vectors. 2011;4:13. doi: 10.1186/1756-3305-4-13.
  35. You F, et al. BatchPrimer3: A high throughput web application for PCR and sequencing primer design. BMC Bioinformatics. 2008;9:253. doi: 10.1186/1471-2105-9-253.
