Abstract
The Orang Asli are the aboriginal people of Peninsular Malaysia, recognized as indigenous to the country and still practising a traditional lifestyle. Molecular interest in the Orang Asli stems from the earliest prehistoric human migration, which began approximately 200 kya and reached Peninsular Malaysia in stages about 50 kya. Three groups of Orang Asli are present in Peninsular Malaysia, namely the Negrito (also known as Semang), the Senoi and the Proto Malays. According to available records, no research has been conducted on mtDNA variation in the Semoq Beri population, one of the tribes in the Senoi group. In this report, mtDNA variation was analysed in the Semoq Beri population of Hulu Terengganu as an initial effort to establish its genetic characterisation and to elucidate the history of Orang Asli expansion in Peninsular Malaysia. An array of mtDNA parameters was estimated, and the observed polymorphisms relative to the rCRS, together with their respective haplogroups, were inferred. The DNA sequences are registered in the NCBI with accession numbers KY853670-KY853753.
Specifications table
| Subject area | Forensic science |
| More specific subject area | Forensic genetics |
| Type of data | Tables and figures |
| How data were acquired | Data were acquired by extracting, amplifying, purifying, sequencing and analysing the target mtDNA region using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA), QIAquick Purification Kit (QIAGEN Ag., Germany), a DNA sequencer (First Base Laboratories, Malaysia), Sequencher 5.4 software (https://genecodes.com), ClustalW2 MUSCLE (https://www.ebi.ac.uk), MEGA 7 software [1], DnaSP 5.1 software [2] and Haplogroup software (https://dna.jameslick.com) |
| Data format | Raw and analysed |
| Experimental factors | Blood sample collection, DNA extraction, PCR amplification, DNA purification, sequencing and data interpretation |
| Experimental features | Sequence analysis followed by haplogroup identification |
| Data source location | Kampung Sungai Berua, Hulu Terengganu, Terengganu, Malaysia |
| Data accessibility | The mtDNA sequences are registered in the NCBI with accession numbers KY853670-KY853753 [Table S1] |
| Related research article | Zahidin [3] |
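Readers who want to retrieve the deposited sequences may find it convenient to expand the accession range into individual identifiers. The following is a minimal sketch, assuming the range is a simple run of consecutive numbers behind a shared letter prefix; the function name `expand_accession_range` is hypothetical and not part of any NCBI tool.

```python
import re


def expand_accession_range(span: str) -> list:
    """Expand 'KY853670-KY853753' into every accession in the inclusive range."""
    start, end = span.split("-")
    m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
    m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
    # Both ends must share the same letter prefix for the range to make sense.
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"not a simple accession range: {span!r}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep any zero padding in the numeric part
    first, last = int(m_start.group(2)), int(m_end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


accessions = expand_accession_range("KY853670-KY853753")
print(len(accessions), accessions[0], accessions[-1])
```

The range expands to 84 identifiers, which is consistent with the 40 HVS-I and 44 HVS-II sequences counted in Table 4.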
Value of the data
- Presently, there are 533 Semoq Beri, who are likely to be a threatened population in Hulu Terengganu owing to cultural assimilation and intermarriage [3], [4], [5], [6].
- The data provide baseline information, inferred from the mtDNA control region, for future genetic and evolutionary studies.
- The data will enhance the DNA database of the Semoq Beri population and help elucidate the history of Orang Asli expansion in Peninsular Malaysia.
- The data allow other researchers focusing on this population to start genome-wide analysis.
1. Data
This data article is based on blood samples from unrelated individuals that were successfully sequenced for Hypervariable Segment I (HVSI) and HVSII of mtDNA (Table 1). Each sequence was processed with Sequencher 5.4 software (https://genecodes.com), ClustalW2 MUSCLE (https://www.ebi.ac.uk) and MEGA 7 software [1] to identify the sequence polymorphisms (Table S1), C-stretch (Table 2) and nucleotide composition (Table 3). Haplotype data (Table 4) were obtained with DnaSP 5.1 software [2]. Haplogroup classification was performed on the HVSI sequences using Haplogroup software (https://dna.jameslick.com). Schematic diagrams representing the two major haplogroups, M and N, were drawn (Figs. 1 and 2).
Table 1.
Details of the primers used for PCR amplification [3].
| mtDNA region | Nucleotide position | Primers | Primer sequences (5′-3′) | Size (bp) |
|---|---|---|---|---|
| HVI | 16,024–16,569 | conL1 (F) | TCAAGCTTACACCAGTCTTGTAAACC | 600 |
| HVI | 16,024–16,569 | conH1 (R) | CCTGAAGTAGGAACCAGATG | 600 |
| HVII | 0–576 | conL4 (F) | GGTCTATCACCCTATTAACCAC | 600 |
| HVII | 0–576 | conH4 (R) | CTGTTAAAAGTGCATACCGCCA | 600 |
Table 2.
C-stretch of the HVII region between nucleotide positions 233 (*233C) and 250 (250C).
| ns | Bases across positions 233–250 | n |
|---|---|---|
| rCRS | N N N C C C C C C C T C C C C C G C | |
| 24 | C C C C C C C C | 8 |
| 13 | C C C C C C C | 7 |
| 6 | C C C C C C C C C | 9 |
| 1 | C C C C C C C C C C C C C C C C | 16 |
| 1 | C C C C C C C C C C C C C C C C C | 17 |
N - deleted base, ns - total number of sequences, n - total number of bases in the unbroken C series.
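The value n in Table 2 is the length of the longest unbroken run of C bases. A minimal sketch of that count is given below; the helper name `longest_c_run` is hypothetical, and the input string is simply the rCRS row of Table 2 written without separators.

```python
def longest_c_run(seq: str) -> int:
    """Length of the longest uninterrupted run of C in a sequence window."""
    best = current = 0
    for base in seq.upper():
        # Any base other than C (including N or a gap) breaks the run.
        current = current + 1 if base == "C" else 0
        best = max(best, current)
    return best


# rCRS row of Table 2 (positions 233-250, N = deleted base).
rcrs_window = "NNNCCCCCCCTCCCCCGC"
print(longest_c_run(rcrs_window))  # 7
```

In the rCRS row the run is interrupted by T and G, so the longest unbroken series is 7, whereas the sampled sequences in Table 2 carry uninterrupted runs of 7 to 17.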
Table 3.
Sequence variation for the HVI and HVII regions.
| Variation indices | HVI region | HVII region |
|---|---|---|
| Nucleotide position (%) | 16,024 to 16,504 (88%) | 72 to 351 (49%) |
| Base pair | 481 bp | 280 bp |
| No. of polymorphic sites | 18 | 26 |
| No. of observed transitions | 16 | 17 |
| No. of observed transversions | 2 | 9 |
| No. of indels | – | 5 |
| Nucleotide composition (%): C | 31.17 | 27.66 |
| Nucleotide composition (%): T | 23.74 | 27.39 |
| Nucleotide composition (%): A | 31.20 | 28.75 |
| Nucleotide composition (%): G | 13.89 | 16.20 |
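The nucleotide composition in Table 3 was obtained with MEGA 7. The sketch below only illustrates the underlying arithmetic, assuming bases are pooled over all aligned sequences and that gaps and ambiguity codes are ignored; MEGA's exact averaging may differ. The toy sequences are hypothetical, and the final loop checks that the percentages reported in Table 3 sum to 100 for each region.

```python
from collections import Counter


def nucleotide_composition(sequences):
    """Percentage of A, C, G and T pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only unambiguous bases; '-' and 'N' are skipped.
        counts.update(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {b: round(100 * counts[b] / total, 2) for b in "ACGT"}


# Hypothetical toy alignment, not taken from the dataset.
toy = ["ACCTAC", "ACC-AC"]
print(nucleotide_composition(toy))

# Percentages as reported in Table 3.
table3 = {
    "HVI": {"C": 31.17, "T": 23.74, "A": 31.20, "G": 13.89},
    "HVII": {"C": 27.66, "T": 27.39, "A": 28.75, "G": 16.20},
}
for region, comp in table3.items():
    print(region, round(sum(comp.values()), 2))
```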
Table 4.
Frequency distribution of the mtDNA haplotypes.
| Region | Haplotype | N | Samples | Frequency |
|---|---|---|---|---|
| HVS-I | Hap 1 | 1 | Semaq Beri 19 | 0.025 |
| HVS-I | Hap 2 | 2 | Semaq Beri 3, 43 | 0.050 |
| HVS-I | Hap 3 | 1 | Semaq Beri 45 | 0.025 |
| HVS-I | Hap 4 | 18 | Semaq Beri 1, 5, 6, 8, 11, 12, 17, 18, 21, 24, 29, 32, 33, 35, 39, 40, 42, 47 | 0.450 |
| HVS-I | Hap 5 | 7 | Semaq Beri 7, 13, 23, 30, 36, 37, 49 | 0.175 |
| HVS-I | Hap 6 | 4 | Semaq Beri 20, 27, 28, 34 | 0.100 |
| HVS-I | Hap 7 | 7 | Semaq Beri 2, 9, 14, 22, 31, 38, 48 | 0.175 |
| HVS-II | Hap 8 | 1 | Semaq Beri 44 | 0.023 |
| HVS-II | Hap 9 | 1 | Semaq Beri 46 | 0.023 |
| HVS-II | Hap 10 | 1 | Semaq Beri 21 | 0.023 |
| HVS-II | Hap 11 | 1 | Semaq Beri 36 | 0.023 |
| HVS-II | Hap 12 | 1 | Semaq Beri 35 | 0.023 |
| HVS-II | Hap 13 | 10 | Semaq Beri 2, 9, 14, 20, 22, 27, 28, 31, 38, 48 | 0.227 |
| HVS-II | Hap 14 | 1 | Semaq Beri 3 | 0.023 |
| HVS-II | Hap 15 | 1 | Semaq Beri 25 | 0.023 |
| HVS-II | Hap 16 | 1 | Semaq Beri 26 | 0.023 |
| HVS-II | Hap 17 | 3 | Semaq Beri 10, 15, 41 | 0.068 |
| HVS-II | Hap 18 | 2 | Semaq Beri 19, 50 | 0.045 |
| HVS-II | Hap 19 | 14 | Semaq Beri 1, 5, 6, 8, 11, 12, 17, 18, 24, 29, 32, 39, 40, 47 | 0.318 |
| HVS-II | Hap 20 | 7 | Semaq Beri 4, 13, 16, 23, 30, 43, 49 | 0.159 |
N - number of samples carrying the haplotype.
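Each frequency in Table 4 is the haplotype count N divided by the number of sequences for that segment. The sketch below reproduces the arithmetic from the counts in the table; the dictionary and function names are hypothetical and are not output of DnaSP.

```python
# Haplotype counts (N) copied from Table 4.
hvs1_counts = {"Hap 1": 1, "Hap 2": 2, "Hap 3": 1, "Hap 4": 18,
               "Hap 5": 7, "Hap 6": 4, "Hap 7": 7}
hvs2_counts = {"Hap 8": 1, "Hap 9": 1, "Hap 10": 1, "Hap 11": 1,
               "Hap 12": 1, "Hap 13": 10, "Hap 14": 1, "Hap 15": 1,
               "Hap 16": 1, "Hap 17": 3, "Hap 18": 2, "Hap 19": 14,
               "Hap 20": 7}


def haplotype_frequencies(counts):
    """Relative frequency of each haplotype: count / total sequences."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


for label, counts in (("HVS-I", hvs1_counts), ("HVS-II", hvs2_counts)):
    freqs = haplotype_frequencies(counts)
    print(label, "sequences:", sum(counts.values()))
    for hap, f in freqs.items():
        print(f"  {hap}: {f:.3f}")
```

With these counts the totals are 40 sequences for HVS-I and 44 for HVS-II, and the rounded frequencies match the values listed in the table.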
Fig. 1.
The current Asian and Pacific mtDNA lineages within the Manju clan. The tree was reconstructed based on [11]. The uppercase letters (E-East, N-North, S-South, NA-North Asia, EA-East Asia, SEA-Southeast Asia and PM-Peninsular Malaysia) refer to geographical locations.
Fig. 2.
The current Asian and Pacific mtDNA lineages within the Nasreen clan. The tree was reconstructed based on [11]. The uppercase letters (E-East, N-North, S-South, NA-North Asia, EA-East Asia, SEA-Southeast Asia and PM-Peninsular Malaysia) refer to geographical locations.
2. Experimental design, materials, and methods
2.1. Sample collection and genomic DNA extraction
All sequence data were generated from DNA samples collected with informed written consent, under approval from the Universiti Sultan Zainal Abidin (UniSZA) Human Research Ethics Committee, Malaysia. Blood samples were collected from unrelated Semoq Beri individuals in Kampung Sungai Berua, Hulu Terengganu, Malaysia. Genomic DNA was extracted from the blood samples using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA) following the manufacturer's protocol.
2.2. PCR amplification, DNA purification and sequencing
The isolated genomic DNA was amplified using forward and reverse primer sets for partial HVI and HVII, respectively (Table 1) [7]. Negative, amplification and reagent blank controls were used to monitor for contamination at any stage of the laboratory work. PCR amplification was carried out in a final volume of 25 μl (Table S2) in an Arktik Thermal Cycler (Thermo Scientific, USA), and the PCR profile is given in Table S3. The amplified PCR products were purified using the QIAquick Purification Kit (QIAGEN Ag., Germany). The DNA products were visualized by 1% agarose gel electrophoresis to check the size of the amplified product. Sequencing was carried out at First Base Laboratories Sdn Bhd (Malaysia) using an ABI PRISM® 377 DNA Sequencer with the BigDye® Terminator 3.0 Cycle Sequencing Kit.
2.3. Statistical sequence analyses
The fluorescent base calls of the DNA sequence segments were visualized and read using Sequencher 5.4 (https://genecodes.com). The sequences were aligned against the revised Cambridge Reference Sequence (rCRS) [8], [9] using ClustalW2 MUSCLE (Multiple Sequence Comparison by Log-Expectation) (https://www.ebi.ac.uk). The C-stretch of each sequence was checked and counted (Table 2). Nucleotide composition was calculated in MEGA 7 [1] (Table 3). The Arlequin haplotype data were generated using DnaSP 5.1 [2] (Table 4). Haplogroup classification was performed using the Haplogroup online software (https://dna.jameslick.com), whose haplogroup data are compatible with PhyloTree Build 17 [10]. The schematic diagrams were drawn based on [10] and [11] (Figs. 1 and 2). GenBank accession numbers and haplogroup identifications for HVI and HVII of the Semoq Beri population are provided in Table S1.
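The polymorphism counts in Table 3 separate transitions, transversions and indels relative to the rCRS. The sketch below shows one way such a comparison can be made once a sample has been aligned to the reference. It is an illustration under stated assumptions, not the procedure used in Sequencher or MEGA: the function names are hypothetical, both inputs are assumed to be equal-length aligned strings with `-` for gaps, positions are numbered along the reference, and the toy alignment is invented for the example.

```python
BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Transition = purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError("need two different unambiguous bases")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def list_variants(ref_aln: str, sample_aln: str, start: int):
    """List (position, ref, sample, type) differences between two aligned strings."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    pos = start - 1
    for r, s in zip(ref_aln.upper(), sample_aln.upper()):
        if r != "-":
            pos += 1  # gaps in the reference do not advance the numbering
        if r == s:
            continue
        if r == "-" or s == "-":
            variants.append((pos, r, s, "indel"))
        elif r in BASES and s in BASES:
            variants.append((pos, r, s, classify_substitution(r, s)))
        # Ambiguous calls such as N are skipped rather than classified.
    return variants


# Hypothetical toy alignment starting at reference position 16,024.
for variant in list_variants("ACGT-A", "ATGTCG", start=16024):
    print(variant)
```

An insertion is reported at the last reference position before the gap, which is a simplification of the usual forensic notation for insertions.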
Acknowledgments
We thank Universiti Malaysia Terengganu (UMT), the Semoq Beri population, the Department of Orang Asli Development (JAKOA/PP30/052/9/39), UHREC (UniSZA/C/1/UHREC/628-1/56), the Hulu Terengganu District Office (PKD/HTR/19/1/10) and the Department of Wildlife and National Park Peninsular Malaysia (JPHL&TN/IP/100-34/1/24/4/11) for administrative support, ethical clearance and permit approvals. This study was funded by the Ministry of Education (MOE) Malaysia Transdisciplinary Research Grant Scheme (TRGS/2015/59373) and the UMT Geran Galakan Penyelidikan (GGP/68007/2014/127).
Footnotes
Transparency data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.10.158.
Supplementary data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.10.158.
Appendix A. Supplementary material
Supplementary Table S1.
Supplementary Table S2.
Supplementary Table S3.
References
- 1. Kumar S., Stecher G., Tamura K. MEGA 7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 2016;33:1870–1874. doi: 10.1093/molbev/msw054.
- 2. Rozas J., Sanchez-Delbarrio J.C., Messeguer X., Rozas R. DnaSP: DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19:2496–2497. doi: 10.1093/bioinformatics/btg359.
- 3. Zahidin M.A. Mitochondrial DNA Hypervariable Segment I (HVSI) Analysis of Semoq Beri Population in Peninsular Malaysia (M.Sc. thesis). Universiti Malaysia Terengganu; Kuala Nerus: 2018.
- 4. Bartholomew C.V. Hunting of Threatened Wildlife Species by Indigenous Peoples in Kenyir, Terengganu, Peninsular Malaysia: Prevalence, Predictors, Perceptions and Practices (M.Sc. thesis). Universiti Malaysia Terengganu; Kuala Nerus: 2017.
- 5. Zahidin M.A., Wan Bayani W.O., Wan Rohani W.T., Zilfalil B.A., Abdullah M.T., Rovie-Ryan J.J., Azuan H. Sejarah migrasi dan kedudukan Orang Asli di Semenanjung Malaysia. In: Abdullah M.T., Abdullah M.F., Bartholomew C.V., Jani R., editors. Kelestarian Masyarakat Orang Asli Terengganu. Penerbit Universiti Malaysia Terengganu; Kuala Nerus: 2016. pp. 15–20.
- 6. Abdullah M.T., Abdullah M.F., Bartholomew C.V., Jani R. Kelestarian Masyarakat Orang Asli Terengganu. Penerbit Universiti Malaysia Terengganu; Kuala Nerus: 2016.
- 7. Hill C., Soares P., Mormina M., Macaulay V., Meehan W., Blackburn J., Clarke D., Raja J.M., Ismail P., Bulbeck D., Oppenheimer S., Richards M. Phylogeography and ethnogenesis of aboriginal Southeast Asians. Mol. Biol. Evol. 2006;23:2480–2491. doi: 10.1093/molbev/msl124.
- 8. Anderson S., Bankier A.T., Barrell B.G., de Bruijn M.H., Coulson A.R., Drouin J., Eperon I.C., Nierlich D.P., Roe B.A., Sanger F., Schreier P.H., Smith A.J.H., Staden R., Young I.G. Sequence and organisation of the human mitochondrial genome. Nature. 1981;290:457–465. doi: 10.1038/290457a0.
- 9. Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999;23:147. doi: 10.1038/13779.
- 10. van Oven M., Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 2009;30:386–394. doi: 10.1002/humu.20921.
- 11. Oppenheimer S. Out of Eden: The Peopling of the World. Constable and Robinson Ltd.; London: 2003.