Summary
Ancient genomics has revolutionized our understanding of human evolution and migration history in recent years. Here, we present a protocol to prepare samples for ancient genomics research. We describe steps for releasing DNA from human remains, DNA library construction, hybridization capture, quantification, and sequencing. We then detail procedures for mapping sequence reads and population genetics analysis. This protocol also outlines challenges in extracting ancient DNA samples and authenticating ancient DNA to uncover the genetic history and diversity of ancient populations.
For complete details on the use and execution of this protocol, please refer to Tao et al.1
Subject areas: Sequence analysis, Genetics, Genomics, Sequencing
Highlights
• Comprehensive workflow of ancient DNA from extracting DNA to data analysis
• Detailed protocol for extracting ancient DNA, library preparation, and capture
• Key steps for double-stranded library preparation robust for ancient DNA
• Analytic steps for ancient DNA authentication and population history analysis
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
The sample preparation protocol below covers ancient DNA extraction, library building, and in-solution hybridization capture. All steps before DNA sequencing should be performed in an ancient DNA cleanroom facility, ideally one with several isolated rooms to avoid cross-contamination, located away from large amounts of DNA such as PCR products from other labs. Notably, each step should be performed in a separate, isolated cleanroom to avoid contamination from the library preparation procedures; for example, an amplified ancient DNA library could contaminate every wet-lab process. Basic knowledge of molecular biology and basic equipment (centrifuge, pipettes, thermal cycler, etc.) are required.
The protocol below describes some basic steps for population genetics analyses. Basic knowledge of Linux shell, python and R is required to understand and apply this protocol.
Clean workplace preparation
Timing: 2 h
1. Prepare a cleaned workspace with all necessary reagents and equipment.
   a. Clean all surfaces with 75% ethanol and 10% NaClO, especially the super clean bench.
   b. Expose all workspaces, including the super clean bench, to UV radiation for over 1 h before the experiment begins.
   c. Ensure all pipette tips, tubes, and water are DNA- and DNase-free.
Download the relevant software
Timing: 3–6 h
2. In this protocol, all analyses are performed with the existing software listed in the key resources table. We provide step-by-step scripts to run these ancient DNA analyses.
Download the scripts and test dataset
Timing: 4–6 h
3. We use Tao et al. as a data source and the Human Reference Genome hs37d5 in this protocol.1 Some previously published ancient DNA datasets are also included in the analyses.
4. Download the test dataset from Google Drive.
https://drive.google.com/drive/folders/1zoKse3cUX-p6h7l_VxVdAFaZzNGYDNtZ?usp=sharing
5. Unzip test data.
>unzip TESTDATA.zip
6. Download the Human Reference Genome hs37d5.
>cd /PATH/TO/TESTDATA/hs37d5
>wget
>wget
7. Rename the index file.
>mv hs37d5.fa.gz.fai hs37d5.fa.fai
8. Use md5sum to check data integrity.
>md5sum hs37d5.fa*
9. Unzip and index the Human Reference Genome hs37d5 using bwa.
>gunzip hs37d5.fa.gz
>bwa index hs37d5.fa
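The integrity check in step 8 can also be scripted, which is convenient when the same reference is installed on several machines. The following is a minimal Python sketch, not part of the published scripts: the function names are hypothetical, and the expected checksum must be taken from the download source, since it is not stated in this protocol.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Compare a file's MD5 digest with an expected value (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `verify_md5("hs37d5.fa.gz", expected)` returns `True` only when the downloaded archive matches the checksum published alongside it.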
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Chemicals, peptides, and recombinant proteins | ||
| Ethanol | Sinopharm | 100092008 |
| NaClO | Sinopharm | 80010428 |
| 0.5 M EDTA, pH 8.0 | Thermo Fisher Scientific | AM9262 |
| Proteinase K | Beyotime | ST533 |
| Guanidine hydrochloride | Sigma | G3272 |
| Isopropanol | Sigma | 34863 |
| Acetic acid | Sigma | 695092 |
| Sodium acetate | Sigma | S7899 |
| Tween 20 | Sigma | P7947 |
| Isothermal amplification buffer | NEB | B0537S |
| Deoxynucleotide (dNTP) solution mix | NEB | N0447L |
| Bst 2.0 DNA polymerase | NEB | M0537L |
| AMPure XP Beads | Beckman | A63881 |
| Agarose | Biowest | 111860 |
| Tris-EDTA buffer solution (100×) | Sigma | T9285 |
| Critical commercial assays | ||
| MinElute PCR Purification Kit | QIAGEN | 28006 |
| NEBNext Ultra II DNA Library Prep Kit | NEB | E7645 |
| Twist ancient human DNA panel | Twist | 106658 |
| Twist mitochondrial panel | Twist | 104562 |
| Twist Hybridization and Wash Kit (2 boxes) | Twist | 101026 |
| Twist wash buffers | Twist | 100846 |
| Equinox Library Amp Mix | Twist | 104108 |
| Oligonucleotides | ||
| A∗C∗A∗C∗TCTTTCCCTACACGACGCTCTTCCGA∗T∗C∗T∗T | Modified from Meyer and Kircher2 | IS1_adapter_P5; ∗ indicates a PTO bond |
| G∗T∗G∗A∗CTGGAGTTCAGACGTGTGCTCTTCCGA∗T∗C∗T∗T | Modified from Meyer and Kircher2 | IS2_adapter_P7; ∗ indicates a PTO bond |
| A∗G∗A∗T∗CGGAA∗G∗A∗G∗C | Meyer and Kircher2 | IS3_adapter_P5+P7; ∗ indicates a PTO bond |
| AATGATACGGCGACCACCGAGATCTACACxxxxxxxACACTCTTTCCCTACACGACGCTCTT | Meyer and Kircher2 | IS4_index_P5 |
| CAAGCAGAAGACGGCATACGAGATxxxxxxxGTGACTGGAGTTCAGACGTGT | Meyer and Kircher2 | IS4_index_P7 |
| Deposited data | ||
| Test dataset | This protocol | https://drive.google.com/drive/folders/1zoKse3cUX-p6h7l_VxVdAFaZzNGYDNtZ?usp=sharing |
| Software and algorithms | ||
| AdapterRemoval v2.3.1 | Schubert et al.3 | https://github.com/MikkelSchubert/adapterremoval; RRID:SCR_011834 |
| ADMIXTOOLS v7.0.2 | Patterson et al.4 | https://github.com/DReichLab/AdmixTools/; RRID:SCR_018495 |
| ADMIXTURE v1.3.0 | Alexander et al.5 | http://dalexander.github.io/admixture/index.html; RRID:SCR_001263 |
| BamUtil v1.0.14 | https://github.com/statgen/bamUtil | https://github.com/statgen/bamUtil |
| BWA v0.7.17 | Li et al.6 | https://bio-bwa.sourceforge.net/; RRID:SCR_010910 |
| SAMtools | Danecek et al.7 | https://github.com/samtools/samtools; RRID:SCR_002105 |
| DeDup v0.12.3 | Peltzer et al.8 | https://github.com/apeltzer/DeDup |
| EIGENSOFT v7.2.1 | Patterson et al.9 | https://github.com/DReichLab/EIG; RRID:SCR_004965 |
| PMDtools v0.60 | Skoglund et al.10 | https://github.com/pontussk/PMDtools |
| PileupCaller | https://github.com/stschiff/sequenceTools | https://github.com/stschiff/sequenceTools |
| PLINK v1.9 | Purcell et al.11 | https://www.cog-genomics.org/plink |
| Python 2.x | https://www.python.org/ | https://www.python.org/ |
| R v4.0.2 | https://www.r-project.org/ | https://www.r-project.org/ |
| READ | Monroy Kuhn et al.12 | https://bitbucket.org/tguenther/read/src/master/ |
| Schmutzi v1.5.5.5 | Renaud et al.13 | https://github.com/grenaud/schmutzi |
| Others | ||
| Eppendorf Research plus pipettes, single channel | Eppendorf | - |
| Axygen Maxymum Recovery pipette tips | Axygen | - |
| 0.2-mL PCR eight-tube strips | Eppendorf | 0030124359 |
| 1.5-mL Safe-lock LoBind tubes | Eppendorf | 0030120086 |
| Tubes (15 mL, 50 mL) | Corning | 430052 and 430829 |
| Centrifuge 5425 | Eppendorf | 5405000298 |
| Horizontal low-speed centrifuge | Gallop | DL-400A |
| Practum analytical balance | Sartorius | PRACTUM124-ICN |
| Super-clean bench | Kuncheng | SW-CJ-1D |
| Mastercycler nexus gradient thermal cycler | Eppendorf | 6331000076 |
| Thermo mixer | Ruicheng | TS-100 |
| Handheld centrifuge | Beyotime | E6686 |
| DynaMag-2 magnetic rack | Thermo Fisher Scientific | 12321D |
| DynaMag-PCR magnetic rack | Thermo Fisher Scientific | 492025 |
| Qubit 4 fluorometer | Thermo Fisher Scientific | Q33226 |
| Gel imaging workstations | Azure Biosystems | c150 |
| Lab vacuum freeze dryer | Foring Technology | LGJ-10C/E |
| Illumina sequencing instrument (MiSeq, HiSeq, NovaSeq platforms) and related sequencing chemistry | Illumina | - |
| Dell PowerEdge R940xa CTO Rack Server | Dell | R940xa |
Materials and equipment
Timing: 4 h
• Adapter preparation.
| Reagent | Final concentration | Amount |
|---|---|---|
| IS1_adapter_P5 (or IS2_adapter_P7) | 45 μM | 22.5 μL |
| IS3_adapter_P5+P7 | 45 μM | 22.5 μL |
| Oligo hybridization buffer (10×) | 1× | 5 μL |
| Total | N/A | 50 μL |
Storage conditions: Store at −20°C for up to one year.
• Dilute adapters (sequences are provided in key resources table).
• If the adapters are delivered as dry powder, centrifuge at 13,000 × g for 1 min before dissolving.
• Dilute to 100 μM with ddH2O or TE.
• Mix IS1_adapter_P5 and IS3_adapter_P5+P7 in a 1:1 volume ratio in a PCR tube.
• Mix IS2_adapter_P7 and IS3_adapter_P5+P7 in a 1:1 volume ratio in a PCR tube.
• Add Oligo hybridization buffer (10×) to each of the adapter mixtures above.
• Recipe of Oligo hybridization buffer (10×): 500 mM NaCl, 10 mM Tris-HCl (pH = 8.0), 1 mM EDTA (pH = 8.0). Oligo hybridization buffer (10×) can be stored at −20°C for at least 1 year.
• The final Oligo hybridization buffer concentration should be 1×.
• Place the mixtures in a thermal cycler at 95°C for 10 s, followed by a ramp to 12°C at 0.1 °C/s.
• Combine the contents of the two tubes in a new 1.5 mL tube (45 μM adapter) and dilute to 10 μM with ddH2O or TE.
• Split into aliquots of approximately 50 μL per tube and freeze at −20°C; avoid repeated freezing and thawing.
Alternatives: Oligo hybridization buffer can be replaced by T4 DNA polymerase/ligase buffer.
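The concentrations in the adapter preparation follow from simple dilution arithmetic (C1 × V1 = C2 × V2). The Python sketch below is only an illustration with hypothetical function names. It assumes that the two 50 μL annealing reactions are combined into 100 μL of 45 μM adapter before dilution to 10 μM.

```python
def mixed_concentration(stock_um, stock_ul, total_ul):
    """Concentration (uM) after adding stock_ul of stock to a total volume of total_ul."""
    return stock_um * stock_ul / total_ul


def diluent_to_add(current_um, current_ul, target_um):
    """Volume of ddH2O or TE (uL) needed to dilute a solution to target_um."""
    if target_um <= 0 or target_um > current_um:
        raise ValueError("target concentration must be positive and not exceed the current one")
    final_ul = current_um * current_ul / target_um
    return final_ul - current_ul


# 22.5 uL of a 100 uM oligo in a 50 uL annealing reaction gives 45 uM
annealed_um = mixed_concentration(100.0, 22.5, 50.0)

# Two combined 50 uL reactions (100 uL at 45 uM) diluted to 10 uM
water_ul = diluent_to_add(annealed_um, 100.0, 10.0)
print(annealed_um, water_ul)
```

Under these assumptions, 350 μL of diluent brings the combined adapters to 10 μM in a final volume of 450 μL.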
• Binding buffer (pH: 4.5–5.5).
| Reagent | Final concentration | Amount |
|---|---|---|
| Guanidine Hydrochloride | 5 M | 238.825 g |
| Sodium acetate | 90 mM | 3.66 g |
| Tween-20 | 0.05% (v/v) | 250 μL |
| Isopropanol | 40% (v/v) | 200 mL |
| Acetic acid | 2.8% (v/v) | 7 mL |
| ddH2O | N/A | ∼220 mL |
| Total | N/A | 500 mL |
Storage conditions: Store at 23°C for up to two months, pH = 4.5–5.5.
• Make sure all reagents and equipment are DNA- and DNase-free. The stir bar, the scoop, and the beakers should be cleaned and exposed to UV light for at least 30 min.
• Weigh 238.83 g of guanidine hydrochloride. Transfer it into a clean 500 mL beaker and add ddH2O until the liquid level reaches the 250 mL scale line on the beaker, ensuring it does not exceed the 280 mL scale line. Place the beaker on a magnetic stirrer and start heating while stirring.
• This step is an endothermic dissolution, so the wall of the beaker feels cold. Wait until the solution becomes transparent (if it remains opaque, cautiously add a small quantity of additional water). Continue to heat and stir until dissolution is complete (the solution is transparent and the beaker has reached 23°C).
• Add 3.66 g sodium acetate and 200 mL isopropanol to the solution and keep stirring.
• Slowly add 250 μL Tween-20 to the solution; keep stirring.
• Use acetic acid to adjust the pH to 4.5–5.5, then stop stirring. Transfer the solution to a 500 mL volumetric flask to adjust the volume.
• This binding buffer recipe comes from Dabney et al.14
Note: Be aware of the hazardous reaction between NaClO and guanidine hydrochloride; do not let the two come into contact.
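When a batch other than 500 mL is needed, the fixed amounts in the binding buffer table can be scaled linearly. The sketch below is a hypothetical helper, not part of the published scripts. It covers only the components with fixed amounts; acetic acid and ddH2O are still added to reach the target pH and final volume. The molar mass of guanidine hydrochloride (95.53 g/mol) is used only as a cross-check of the tabulated mass.

```python
# Amounts for a 500 mL batch, taken from the binding buffer table
BINDING_BUFFER_500ML = {
    "guanidine hydrochloride (g)": 238.825,
    "sodium acetate (g)": 3.66,
    "Tween-20 (uL)": 250.0,
    "isopropanol (mL)": 200.0,
}


def scale_recipe(recipe, base_ml, target_ml):
    """Scale every amount in a recipe from base_ml to target_ml of final buffer."""
    factor = target_ml / base_ml
    return {name: amount * factor for name, amount in recipe.items()}


def grams_for_molarity(molar, volume_l, molar_mass):
    """Mass (g) of solute needed for a given molarity and final volume."""
    return molar * volume_l * molar_mass


half_batch = scale_recipe(BINDING_BUFFER_500ML, 500.0, 250.0)
for name, amount in half_batch.items():
    print(f"{name}: {amount:.3f}")

# Cross-check: 5 M guanidine hydrochloride in 0.5 L
print(round(grams_for_molarity(5.0, 0.5, 95.53), 3))
```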
• Software and scripts used in this protocol are provided in the key resources table.
• Basic knowledge of Linux shell, python and R is required to understand and apply this protocol.
• All tests are run on Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30 GHz Linux servers. Computing clusters are highly recommended to perform this protocol.
Step-by-step method details
Part 1: Obtain bone powder
Timing: 5 h for about 15 samples (depending on the condition of the sample)
In this section, we describe the steps to obtain bone powder.
1. Clean up the surface of the sample.
   a. Clean the dust and soil from the sample surface with 75% ethanol.
   b. Polish the surface of the sampling site using a dental drill.
   c. Clean the sample surface using 10% NaClO.
   d. Expose the sample to ultraviolet light for at least 30 min.
2. Obtain bone powder.
   a. Drill deeply into the dental pulp of the tooth or the petrous part of the temporal bone.
   b. Drill the limb bone (do not drill into the marrow cavity, to avoid contamination from impurities in the hollow part).
   c. Collect 60–150 mg of the powder into a 15 mL tube.
Note: The process described here is one of many approaches for obtaining bone powder. Other processes that yield a sufficient amount of powder can be used, depending on your instruments. All human remains used for ancient DNA extraction should be approved by an ethics committee. Ancient DNA is best preserved in dense, interior parts of the skeleton. Teeth and the petrous part of the temporal bone are ideal for DNA extraction, and dense limb bones are also acceptable. For poorly preserved samples, several subsamples can be processed and analyzed individually to improve data quality.
Part 2: Lysis of the bone powder
Timing: 5 min per sample to handle and incubation for 16–20 h
After collecting the bone powder, we add EDTA and Proteinase K to bind impurities and release DNA into the supernatant.
3. Add 1 mL of 0.5 M EDTA and 1 μL of 20 mg/mL Proteinase K to each sample.
Note: If more than 150 mg bone powder is obtained in the last step, increase the volume of EDTA and Proteinase K proportionately.
Note: A negative control containing 1 mL of 0.5 M EDTA and 1 μL of 20 mg/mL Proteinase K should be prepared with the same reagents used for bone powder lysis.
4. Agitate at 300 rpm and incubate at 37°C for 10 h.
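The proportional scaling mentioned in the note to step 3 can be written down explicitly. The sketch below rests on an assumption: it treats 150 mg of powder per 1 mL EDTA and 1 μL Proteinase K as the baseline ratio, which is one reading of "proportionately" that the protocol does not spell out. The function name is hypothetical.

```python
def lysis_volumes(powder_mg, base_mg=150.0, edta_ml=1.0, proteinase_k_ul=1.0):
    """Return (EDTA in mL, Proteinase K in uL) for a given mass of bone powder.

    Volumes are never scaled below the standard amounts; above base_mg they
    grow linearly with the powder mass (assumed baseline: 150 mg per 1 mL EDTA).
    """
    if powder_mg <= 0:
        raise ValueError("powder mass must be positive")
    factor = max(1.0, powder_mg / base_mg)
    return edta_ml * factor, proteinase_k_ul * factor


print(lysis_volumes(100))  # standard volumes for up to 150 mg
print(lysis_volumes(300))  # doubled volumes for 300 mg
```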
Part 3: Extract DNA
Timing: 20 min per sample
We apply a commercial silica column (MinElute kit, provided in key resources table) for DNA extraction.
5. Add 12.5 mL Binding buffer to each lysate. Mix by inverting the tube at least 10 times.
Note: If more than 1 mL EDTA is used in the last step, increase the volume of the binding buffer proportionately.
6. Bind the DNA to the MinElute column.
   a. Apply the well-mixed solution to a MinElute column and centrifuge at 13,000 × g for 1 min.
   b. Discard flow-through. Place the MinElute column back into the collection tube.
   c. Repeat steps 6.a and 6.b until all of the solution has passed through the column.
Note: A reservoir can be connected to the column to reduce the time. A vacuum can also be used to draw the solution through the column in this step.
7. Wash the column.
   a. Add 750 μL Buffer PE to the MinElute column.
   b. Centrifuge at 13,000 × g for 1 min.
   c. Discard flow-through and place the MinElute column back into the collection tube.
   d. Centrifuge the column for an additional 2 min at 13,000 × g.
   e. Prepare a new 1.5 mL tube to replace the collection tube.
8. Elute DNA.
   a. Add 52 μL ddH2O or TE to the center of the membrane to elute DNA.
   b. Centrifuge at 13,000 × g for 1 min.
   c. Collect 50 μL DNA as the final extract.
   d. DNA can be stored at −20°C or −80°C for at least one week.
Part 4: Double-strand DNA library construction
Timing: 6 h for about 16 samples (including a negative control)
In this part, we describe the steps to generate a double-strand DNA library.
9. End Prep.
   a. Add the following components to a clean new tube:
| Reagent | Volume |
|---|---|
| NEBNext Ultra II End Prep Enzyme Mix | 3 μL |
| NEBNext Ultra II End Prep Reaction Buffer | 7 μL |
| DNA extracts | 50 μL |
| Total Volume | 60 μL |
   Note: A negative control should be included in place of the DNA extracts in the recipe above: the extraction blank, i.e., the negative control extract obtained at the end of step 8 in part 3 from the same extraction batch. An extra library preparation blank can be added using ddH2O instead of DNA extracts in this step.
   b. Pipette the entire volume up and down at least 10 times to mix thoroughly. Spin quickly to collect all liquid from the sides of the tube.
   Note: It is important to mix well. The presence of a small amount of bubbles does not interfere with the performance.
   c. Place in a thermal cycler with the heated lid set to ≥75°C, and run the following program:
| Temperature | Time |
|---|---|
| 20°C | 30 min |
| 65°C | 30 min |
| 4°C | Hold |
10. Adapter ligation.
   a. Add the following components to the End Prep Reaction Mixture (step 9):
| Reagent | Volume |
|---|---|
| End Prep Reaction Mixture (from the end of step 9) | 60 μL |
| NEBNext Ultra II Ligation Master Mix | 30 μL |
| NEBNext Ligation Enhancer | 1 μL |
| Illumina Adapter (10 μM) | 2 μL |
| Total Volume | 93 μL |
   b. Pipette the entire volume up and down at least 10 times to mix thoroughly. Spin quickly to collect all liquid from the sides of the tube.
   Note: The NEBNext Ultra II Ligation Master Mix is very viscous. Ensure adequate mixing of the ligation reaction, as incomplete mixing will result in reduced ligation efficiency.
   c. Incubate at 20°C for 15 min in a thermal cycler with the heated lid off.
11. Purify the product with a MinElute Kit.
   a. Add 465 μL PB binding buffer to the ligation reaction mixture (step 10 in part 4) and mix well by pipetting the entire volume up and down at least 10 times.
   b. Apply the solution from the last step to the MinElute column and centrifuge at 13,000 × g for 1 min.
   c. Discard flow-through. Place the MinElute column back into the collection tube.
   d. Add 750 μL Buffer PE to wash. Centrifuge at 13,000 × g for 1 min and discard flow-through.
   e. Place the MinElute column into a new 1.5 mL tube.
   f. Add 16 μL ddH2O or TE to the center of the membrane and centrifuge at 13,000 × g for 1 min to elute DNA.
12. Adapter fill-in.
   a. Prepare a master mix containing the reagents below for the required number of reactions:
| Reagent | Volume |
|---|---|
| Isothermal Amplification Buffer (10×) | 3 μL |
| dNTP solution mix (10 mM each) | 1.5 μL |
| Bst 2.0 DNA Polymerase | 1.5 μL |
| ddH2O | 9 μL |
| Purified DNA product (step 11 in part 4) | 15 μL |
| Total | 30 μL |
   b. Incubate the mix in a thermal cycler:
| Temperature | Time |
|---|---|
| 37°C | 20 min |
| 80°C | 20 min |
| 4°C | 5 min |
| 4°C | Hold |
   c. Store half of the product (15 μL) at −20°C or −80°C.
   Pause point: Fill-in product can be stored at −20°C or −80°C for at least two months.
13. Index PCR.
   a. Prepare a master mix containing the reagents below and record the index used for each library:
| Reagent | Volume |
|---|---|
| NEBNext Ultra II Q5 Master Mix | 25 μL |
| IS4_index_P5 | 5 μL |
| IS4_index_P7 | 5 μL |
| DNA fill-in product (step 12 in part 4) | 15 μL |
| Total | 50 μL |
   b. Pipette the entire volume up and down at least 10 times to mix thoroughly. Spin quickly to collect all liquid from the sides of the tube.
   c. Transfer the tube to the thermal cycler and start the program below (heated lid set to 105°C):
| Steps | Temperature | Time | Cycles |
|---|---|---|---|
| Initial Denaturation | 98°C | 30 s | 1 |
| Denaturation | 98°C | 10 s | 15 |
| Annealing/Extension | 65°C | 75 s | |
| Final extension | 72°C | 5 min | 1 |
| Hold | 4°C | Hold | |
14. Purify PCR product.
Note: The AMPure XP beads should equilibrate to 23°C for at least 30 min before use. Resuspend the beads by repeatedly inverting the bottle until well mixed.
   a. Prepare freshly made 80% ethanol, 500 μL per sample.
   b. Add 90 μL (1.8×) resuspended beads to the PCR solution.
   c. Mix well by vortexing for 5 s or pipetting at least 10 times.
   Note: Expel all the liquid from the tip in the last mix. A vortex for 3–5 s at high speed can also be used. Use a mini centrifuge for 5–10 s to collect all liquid from the side and lid of the tube, and be sure to stop the centrifugation before the beads start to settle out.
   CRITICAL: Perform a pre-experiment to verify the bead volume needed to bind DNA fragments of 100 bp or larger. Different batches of beads may differ in their ability to bind DNA fragments. Use a volume ratio that ensures your beads can bind DNA fragments ranging from 100 bp to 200 bp (insert DNA ranging from about 0–100 bp). Otherwise, you may lose some short but valuable DNA fragments.
   d. Incubate samples on a bench top for at least 5 min at 23°C.
   e. Place the tube on an appropriate magnetic rack to separate the beads from the supernatant.
   f. After 3 min (or when the solution is clear), carefully remove and discard the supernatant with an appropriate pipette.
   CRITICAL: Do not disturb the beads.
   g. Add 200 μL fresh 80% ethanol to the tube while it is on the magnetic rack. Incubate at 23°C for 30 s and then carefully remove and discard the supernatant. Be careful not to disturb the beads.
   h. Repeat steps 14.e to 14.g for a total of 2 washes. Be sure to remove all visible liquid after the second wash.
   i. Air dry the beads for up to 5 min with the lid open.
   CRITICAL: Do not over-dry the beads. This may result in lower recovery of DNA. Elute the samples when the beads are still dark brown and glossy but all visible liquid has evaporated. Over-dried beads turn lighter brown and start to crack.
   j. Remove the tube from the magnetic rack. Add 30–40 μL ddH2O or TE and mix well by pipetting or vortexing.
   k. Incubate for at least 2 min at 23°C.
   l. Place the tube on the magnetic rack. Transfer the DNA eluate to a new tube after 3 min (or when the solution is clear).
Note: The DNA library construction process provided here is mainly based on the NEBNext Ultra II DNA Library Prep Kit (provided in key resources table), with an adapter from blunt-ended ligation-based approaches instead of a circular NEBNext Adapter. Other library construction processes that have been proven in the ancient DNA field can be used to replace this part (e.g., Meyer et al.2). Notably, our downstream analysis procedure is based on this library construction process and may not be compatible with other processes.
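For a batch of libraries, the per-reaction volumes in the adapter fill-in table (step 12) can be multiplied out into a master mix. The sketch below is a hypothetical helper, not part of the published scripts. The 10% pipetting overage is an assumption and is not specified in the protocol. The purified DNA product (15 μL) is added to each tube individually rather than to the master mix.

```python
# Per-reaction volumes (uL) from the adapter fill-in table, excluding the DNA
FILL_IN_PER_REACTION_UL = {
    "Isothermal Amplification Buffer (10x)": 3.0,
    "dNTP solution mix (10 mM each)": 1.5,
    "Bst 2.0 DNA Polymerase": 1.5,
    "ddH2O": 9.0,
}


def master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Total volume (uL) of each reagent for n_reactions plus a fractional overage."""
    if n_reactions < 1:
        raise ValueError("at least one reaction is required")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


# 15 samples plus one negative control, with the assumed 10% overage
mix = master_mix(FILL_IN_PER_REACTION_UL, 16)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} uL")

# Volume of master mix to dispense per tube before adding 15 uL DNA
print(sum(FILL_IN_PER_REACTION_UL.values()))
```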
Part 5: Hybridization capture
Timing: 2 days
In this part, we show the procedures to enrich ancient human DNA for population genetic analyses.
Note: The steps in this part mostly follow the Twist Target Enrichment Protocol (for use with the Harvard Ancient DNA panel),15 with some dosage and reagent changes.
15. Prepare libraries for hybridization with AMPure XP beads.
Note: The AMPure XP beads should equilibrate to 23°C for at least 30 min. Resuspend the beads by repeatedly inverting the bottle until well mixed.
   a. Determine the quality of each library (see steps 21 and 22 in part 6).
   b. Measure the library concentration using Qubit or NanoDrop.
   c. Calculate the volume of each library that contains 1.2 μg of DNA.
   d. Transfer and mix the calculated volumes of 2 libraries in one 0.2 mL PCR tube; each tube contains 2.4 μg total DNA for one hybridization reaction.
   CRITICAL: More than 4 μg total DNA used in this step may lead to reduced performance of hybridization enrichment. However, less than 1 μg per sample may decrease library complexity.
   e. Prepare freshly made 80% ethanol, 500 μL per sample.
   f. Add 1.8× (v/v) resuspended beads to the library solution. Mix well by vortexing for 5 s or pipetting at least 10 times, and incubate the tube for 5 min at 23°C.
   g. Place the tube on an appropriate magnetic rack for 3 min (or until the solution is clear), then carefully remove and discard the supernatant with an appropriate pipette.
   CRITICAL: Do not disturb the beads.
   h. Add 200 μL fresh 80% ethanol to the tube while it is on the magnetic rack. Incubate at 23°C for 30 s and then carefully remove and discard the supernatant. Be careful not to disturb the beads.
   i. Repeat the last step (step 15.h) for a total of 2 washes. Be sure to remove all visible liquid after the second wash.
   j. Air dry the beads for up to 5 min with the lid open.
   k. Close the lid, remove the tube from the magnetic rack, and leave it in a rack on the bench.
16. Hybridization capture solution preparation.
   a. Thaw all reagents below on ice.
| Reagent |
|---|
| Twist Ancient Human DNA Panel |
| Twist Mitochondrial Panel |
| Hybridization Mix |
| Hybridization Enhancer |
| Universal Blockers |
| Blocker Solution |
   b. Heat the Hybridization Mix at 65°C in a heat block until all precipitate is dissolved, and cool to 23°C for 5 min on the bench top.
   c. Prepare a probe solution containing the reagents below; mix well by pipetting.
| Reagent | Volume |
|---|---|
| Hybridization Mix | 5 μL |
| Twist Ancient Human DNA Panel | 1 μL |
| Twist Mitochondrial Panel | 0.167 μL |
| Total | 6.167 μL |
   d. Prepare a blocker solution containing the reagents below, add it to the tube from step 15.k (the tube containing the dried beads), and resuspend the beads by pipetting.
| Reagent | Volume |
|---|---|
| Blocker Solution | 5 μL |
| Universal Blockers | 7 μL |
| Total | 12 μL |
   e. Heat the probe solution at 95°C with a heated lid at 105°C for 5 min, cool to 4°C for 5 min, then equilibrate the mix to 23°C on the bench top for 5 min.
   f. Heat the blocker solution at 95°C with a heated lid at 105°C for 5 min, then equilibrate the solution to 23°C on the bench top for 5 min.
   g. Perform a quick spin to ensure all solution is at the bottom of the tube.
   h. Add all of the probe solution to the tube containing the blocker solution and beads, and mix well by vortexing or pipetting.
   Note: Some bubbles in this step after vortexing or pipetting do not affect the hybridization reaction.
   i. Add 30 μL Hybridization Enhancer slowly to the top of the entire solution.
   CRITICAL: Add Hybridization Enhancer slowly; DO NOT MIX.
   Note: Bubbles will float and disappear after adding Hybridization Enhancer. If some bubbles remain, pulse-spin the tube or use a tip to remove them.
   j. Incubate the tube at 62°C for 16 h in a thermal cycler with the lid at 65°C.
17. Prepare the beads.
   a. Equilibrate Wash Buffer 1 (provided in Twist Wash kit) to 23°C.
   b. Heat Wash Buffer 2 (provided in Twist Wash kit) to 49°C and hold.
   c. Equilibrate the Streptavidin Binding Beads to 23°C for at least 30 min. Resuspend the beads by repeatedly inverting the bottle until well mixed.
   d. Add 300 μL Streptavidin Binding Beads to a 1.5 mL tube. Prepare one tube for each hybridization reaction.
   e. Add 600 μL Binding Buffer (provided in Twist Wash kit) to the tube and mix by pipetting.
   f. Place the tube on an appropriate magnetic rack for 3 min (or until the solution is transparent), and carefully remove and discard the supernatant with an appropriate pipette. Remove the tube from the magnetic rack.
   CRITICAL: Do not disturb the beads.
   g. Repeat the wash (steps 17.e and 17.f) two more times for a total of three washes.
   h. After removing the clear supernatant from the third wash, add a final 200 μL Binding Buffer and resuspend the beads by vortexing until homogenized.
18. Bind the target.
   a. After the hybridization (step 16.j) is complete, open the thermal cycler lid and directly transfer the volume of each hybridization reaction into a corresponding tube of washed Streptavidin Binding Beads from step 17.h. Mix by pipetting and flicking.
   CRITICAL: Rapid transfer directly from the thermal cycler at 62°C is critical for minimizing off-target binding. Do not remove the tube of hybridization reaction from the thermal cycler or otherwise allow it to cool below 62°C before transferring the solution to the washed Streptavidin Binding Beads. Allowing it to cool to 23°C, even for less than 5 min, can result in as much as a 10–20% increase in off-target binding.
   b. Mix the tube of the hybridization reaction with the Streptavidin Binding Beads for 30 min at 23°C on a shaker, rocker, or rotator at a speed sufficient to keep the solution mixed.
   CRITICAL: Do not vortex. Aggressive mixing is not required.
   c. Remove the tube containing the hybridization reaction with Streptavidin Binding Beads from the mixer and spin to ensure all solution is at the bottom of the tube.
   d. Place the tube on a magnetic rack for 1 min (or until the solution is clear).
   e. Remove and discard the clear supernatant, including the Hybridization Enhancer. Do not disturb the bead pellet.
   Note: Some Hybridization Enhancer may still be visible after removing the supernatant and throughout each wash step. It does not affect the final capture product.
   f. Remove the tube from the magnetic rack and add 200 μL Wash Buffer 1. Mix by pipetting. Pulse-spin to ensure all solution is at the bottom of the tube.
   g. Transfer the entire volume from step 18.f (∼200 μL) into a new 1.5 mL tube, one per hybridization reaction. Place the tube on a magnetic rack for 1 min (or until the solution is clear). Carefully remove and discard the supernatant.
   CRITICAL: Do not disturb the beads.
   h. Remove the tube from the magnetic rack and add 200 μL of 49°C Wash Buffer 2 (keep it on the dry bath). Mix by pipetting, then pulse-spin to ensure all solution is at the bottom of the tube.
   i. Incubate the tube for 5 min at 49°C.
   j. Place the tube on a magnetic rack for 1 min (or until the solution is clear). Carefully remove and discard the supernatant with an appropriate pipette.
   CRITICAL: Do not disturb the beads.
   k. Repeat the wash (steps 18.h to 18.j) two more times, for a total of three washes.
   l. After the final wash, use a 10 μL pipette to remove all traces of liquid at the bottom of the tube. Proceed immediately to the next step. Do not allow the beads to dry.
   m. Remove the tube from the magnetic rack and add 45 μL water. Mix by pipetting until homogenized and incubate for 5 min at 23°C. Keep this solution, hereafter referred to as the Streptavidin Binding Bead slurry, on ice.
19. Post-capture PCR amplification and purification.
   a. Transfer 22.5 μL of the Streptavidin Binding Bead slurry to a new 0.2 mL PCR tube. Keep on ice until ready to use in the next step.
   b. Store the remaining 22.5 μL Streptavidin Binding Bead slurry at −20°C for future use.
   c. Prepare a PCR mixture by adding the following reagents to the tube containing the Streptavidin Binding Bead slurry. Mix by pipetting.
| Reagent | Volume |
|---|---|
| Beads slurry | 22.5 μL |
| Amplification Primers, ILMN | 2.5 μL |
| Equinox Library Amp Mix (2×) | 25 μL |
| Total | 50 μL |
   d. Pulse-spin the tubes, transfer them to the thermal cycler and start the program below (heated lid set to 105°C):
| Step | Temperature | Time | Cycles |
|---|---|---|---|
| Initialization | 98°C | 45 s | 1 |
| Denaturation | 98°C | 15 s | 23 |
| Annealing | 60°C | 30 s | |
| Extension | 72°C | 30 s | |
| Final extension | 72°C | 1 min | 1 |
| Final hold | 4°C | Hold | - |
   e. When the thermal cycler program is complete, remove the tube from the block and immediately proceed to the purification step.
20. Purify PCR product.
Purify the final product with AMPure XP beads as described in step 14.
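The pooling arithmetic in steps 15.c, 15.d, and 15.f can be sketched as follows. This is a hypothetical helper, not part of the published scripts. The concentrations in the example are invented for illustration, and the limits checked are the ones given in the CRITICAL note (at least 1 μg per library, no more than 4 μg in total).

```python
def volume_for_mass(conc_ng_per_ul, mass_ug=1.2):
    """Volume (uL) of a library that contains mass_ug of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug * 1000.0 / conc_ng_per_ul


def pool_libraries(concs_ng_per_ul, mass_ug=1.2, bead_ratio=1.8):
    """Return per-library volumes, pooled volume, and bead volume (all in uL)."""
    total_ug = mass_ug * len(concs_ng_per_ul)
    if mass_ug < 1.0:
        raise ValueError("less than 1 ug per library may decrease library complexity")
    if total_ug > 4.0:
        raise ValueError("more than 4 ug total DNA may reduce enrichment performance")
    volumes = [volume_for_mass(c, mass_ug) for c in concs_ng_per_ul]
    pooled = sum(volumes)
    return volumes, pooled, bead_ratio * pooled


# Two hypothetical libraries measured at 60 and 80 ng/uL
volumes, pooled, beads = pool_libraries([60.0, 80.0])
print(volumes, pooled, round(beads, 1))
```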
Part 6: Library quantification and sequencing
Timing: 6–8 days
In this part, we provide several methods for library quantification, along with recommendations for sequencing platform and strategy selection.
21. Measure the library concentration by Qubit or NanoDrop. Troubleshooting 1.
22. Determine the presence and length of each library or capture product by 2% agarose gel electrophoresis (Figure 1).
   a. Use a 50 bp DNA ladder and run at 80 V for 30 min. Troubleshooting 2.
Note: The band should trail (appear as a smear) and start between 150 and 250 bp. Troubleshooting 3.
23. Measure the DNA concentration on a DNA 1000 chip using a Bioanalyzer 2100 (optional). Troubleshooting 3.
24. Use a TapeStation system to analyze the size, quantity, and integrity of your samples (optional). Troubleshooting 3.
25. For sequencing, follow the instructions provided by Illumina or BGI for multiplexed sequencing.
Note: Compared to common adapters, the adapters in this protocol contain an additional Thymine deoxyribonucleotide (T). Therefore, the first sequencing cycle should be a dark cycle (no fluorescence measurement). We recommend PE100 strategy for library sequencing.
Alternatives: If the Sequencer does not support the dark cycle, manually remove the first base of sequencing data before downstream analysis.
CRITICAL: The first base must be removed when using standard Illumina sequencing primer, or it will affect sequencing reads mapping to the human reference genome.
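If the sequencer cannot run a dark cycle, the first base can be removed from the FASTQ files before adapter trimming. The following is a minimal Python sketch of that manual step, not part of the published scripts. It assumes standard four-line FASTQ records and would be applied to each read file (R1 and R2); the function names are hypothetical.

```python
import gzip


def _open_text(path, mode):
    """Open plain or gzip-compressed text files based on the file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def trim_first_base(in_path, out_path):
    """Remove the first base and its quality score from every FASTQ record.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    n_records = 0
    with _open_text(in_path, "r") as fin, _open_text(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            fout.write(header)
            fout.write(seq[1:] + "\n")
            fout.write(plus)
            fout.write(qual[1:] + "\n")
            n_records += 1
    return n_records
```

A call such as `trim_first_base("CDG56_R1.fq.gz", "CDG56_R1.trimmed.fq.gz")` (the output name is illustrative) would produce a file that can be passed on to adapter removal.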
Figure 1. Map of electrophoresis
Ancient DNA bands should range from 120–200 bp. The size of the markers (M) corresponds to library molecules (loaded on lanes L). DNA band should not be observed in the Negative control (NC) lane; otherwise, the samples may be contaminated by exogenous DNA. The brightness of this graph has been entirely adjusted to present a clear DNA marker.
Part 7: Mapping sequencing reads to human reference genome
Timing: 1 h
In this part, we map the raw sequencing reads to the human reference genome hs37d5, which is commonly used in population genetics analyses, and then post-process the alignments so that the output files are suitable for downstream analyses. To speed up the protocol, the mapping pipeline uses only 100 thousand reads from each end (paired-end R1 and R2) of ten samples.
-
26.
Remove adapters from paired-end reads using AdapterRemoval.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/1.bamfiles/CDG56 ; cd ${dir}/1.bamfiles/CDG56
>AdapterRemoval \
--adapter1 AAGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNATCTCGTATGCCGTCT \
--adapter2 AAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTNNNNNNNGTGTAGATCTCGGTGGT \
--file1 ${dir}/0.fq/CDG56_R1.fq.gz \
--file2 ${dir}/0.fq/CDG56_R2.fq.gz \
--minlength 30 \
--trim3p 0 --preserve5p \
--trimns --trimqualities --minquality 20 \
--collapse --gzip --basename CDG56
Note: Ancient DNA is highly degraded and therefore short. For this reason, in the bwa mapping step we usually map only the collapsed reads, i.e., overlapping paired-end reads merged into a single sequence (CDG56.collapsed.gz, Figure 2), to the reference genome. Adapter sequences differ between sequencing platforms and strategies.
Figure 2.
Screenshot of the sample1 output of AdapterRemoval
CRITICAL: Spaces or special characters should not be included in the path.
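To illustrate what collapsing overlapping paired-end reads means, the following Python sketch merges one adapter-trimmed read pair into a single fragment. It is a simplified illustration under stated assumptions, not AdapterRemoval's algorithm: it requires an exact match in the overlap and ignores base qualities, whereas AdapterRemoval tolerates mismatches and uses quality scores. The function names and the `min_overlap` threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_pair(read1, read2, min_overlap=11):
    """Merge a read pair into one fragment if the mates overlap exactly.

    read2 is given as sequenced (reverse strand). Returns None when no
    exact overlap of at least min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    # Try the longest possible overlap first.
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Made-up 30 bp fragment sequenced with 22 bp reads from each end.
fragment = "ACGTTGCAAGGCTTAACCGGATCGATTACG"
r1 = fragment[:22]
r2 = reverse_complement(fragment[-22:])
print(collapse_pair(r1, r2) == fragment)
```

Pairs that return None here correspond to fragments longer than the combined read length, which are less typical of highly degraded ancient DNA.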
-
27.
Map the collapsed reads to the reference genome using bwa.
>bwa aln -l 1024 -n 0.01 ${dir}/hs37d5/hs37d5.fa CDG56.collapsed.gz > CDG56.sai
>bwa samse -r "@RG\tID:CDG56\tLB:LB\tSM:CDG56\tPL:Illumina" ${dir}/hs37d5/hs37d5.fa CDG56.sai CDG56.collapsed.gz | samtools view -Shb -o CDG56.aln.bam -
Note: BWA aln -o argument sets the maximum number of gap opens. We use the default parameter (-o 1) in our study, while (-o 2) is also used in various aDNA studies to potentially accommodate more divergent sequences.
CRITICAL: Do not forget the “-” symbol at the end of the bwa samse command; it tells samtools to read its input from the pipe (standard input) (Troubleshooting 4).
-
28.
Use Samtools flagstat to generate statistics of binary alignment map files.
>samtools flagstat CDG56.aln.bam > CDG56.aln.flagstats
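The flagstat file can be used to estimate the proportion of reads that map to the reference, which serves as a rough proxy for the endogenous rate. The Python sketch below parses the "in total" and "mapped" lines of `samtools flagstat` output; the function name `mapped_fraction` and the numbers in the example are hypothetical, and the value refers to collapsed reads before duplicate removal.

```python
import re


def mapped_fraction(flagstat_text):
    """Return mapped reads / total reads from `samtools flagstat` output."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        m = re.match(r"(\d+) \+ \d+ in total", line)
        if m:
            total = int(m.group(1))
        # Matches "N + 0 mapped (..." but not "primary mapped" or "with mate mapped".
        m = re.match(r"(\d+) \+ \d+ mapped \(", line)
        if m:
            mapped = int(m.group(1))
    if total is None or mapped is None:
        raise ValueError("could not find 'in total' and 'mapped' lines")
    return mapped / total if total else 0.0


# Made-up flagstat excerpt for illustration only.
example = (
    "100000 + 0 in total (QC-passed reads + QC-failed reads)\n"
    "0 + 0 secondary\n"
    "1250 + 0 mapped (1.25% : N/A)\n"
)
print(mapped_fraction(example))
```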
-
29.Remove duplicated reads.
-
a.Sort binary alignment map files before removing duplicated reads.
>samtools sort CDG56.aln.bam -o CDG56.sort.bam
>samtools index CDG56.sort.bam
-
b.Remove duplicated reads.
>dedup -m -i CDG56.sort.bam -o ./
>samtools sort CDG56.sort_rmdup.bam -o CDG56.bam
-
30.Extract reads that map to the human reference genome.
-
a.Generate statistics using samtools stats and idxstats.
>samtools index CDG56.bam
>samtools stats CDG56.bam > CDG56.stats
>samtools idxstats CDG56.bam > CDG56.idxstats
-
b.Extract reads mapped to chromosomes 1 to 22, X, Y and MT.
>merge=""
>samtools view -F0x4 -b CDG56.bam > CDG56.map.bam
>samtools index CDG56.map.bam
>for i in {1..22} X Y MT ;do
samtools view -b CDG56.map.bam ${i} > CDG56.${i}.bam
merge=${merge}" CDG56.${i}.bam"
done
>samtools merge -p -c CDG56.mapped.bam ${merge}
>samtools index CDG56.mapped.bam
>samtools idxstats CDG56.mapped.bam > CDG56.mapped.idxstats
>samtools stats CDG56.mapped.bam > CDG56.mapped.stats
>rm -v CDG56.map.bam CDG56.map.bam.bai
>rm -v ${merge}
-
31.
Steps 26–30 walk through the mapping pipeline step by step for one sample (Figure 3). For the other samples, we use a script for mapping.
>cd /PATH/TO/TESTDATA/0.scripts
>sh 1.mapping.sh
Note: The multi-thread setting (default = 5) should be modified according to the user’s device.
Figure 3.
Screenshot of the sample1 output of part 1
Part 8: Ancient DNA authentication
Timing: 6 h
To speed up the protocol, the previous part used only 100 thousand reads from each end of ten samples. For downstream analyses, we provide the full alignment results of reads mapped to chromosomes 1 to 22, X, Y and MT at /PATH/TO/TESTDATA/0.bam. In this part, we evaluate the quality of the samples, including ancient DNA damage patterns, contamination rate and kin relationships between samples.
-
32.
Authentication of ancient DNA using PMDtools (Figure 4).
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p 2.PMDtools; cd 2.PMDtools
>li=$(ls ${dir}/0.bam/*.mapped.bam)
>cp ${dir}/0.scripts/src/PMDtools/* ./
>for i in ${li};do
sample=$(basename ${i} .mapped.bam)
total=$(samtools stats ${i} | grep "reads mapped:" | awk '{print $4}')
sub=$(python ${dir}/0.scripts/subbc.py 500000/${total}) # calculate the sample rate
samtools view -@ 5 ${sub} ${i} | python pmdtools.0.60.py3 --platypus \
--requirebaseq 30 > ${sample}.pmd
Rscript plotPMD.v2.edit.R ${sample}
done
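The loop above downsamples each BAM file to roughly 500,000 mapped reads before running PMDtools. The helper script subbc.py is not shown in this protocol, so the Python sketch below is only an assumption of how such a sampling option could be derived, not the authors' implementation. It relies on the `samtools view -s` convention in which the integer part of the value is the random seed and the decimal part is the fraction of reads to keep; the function name, seed and four-decimal precision are hypothetical.

```python
def subsample_option(total_reads, target_reads=500000, seed=1):
    """Build a `samtools view -s SEED.FRACTION` option, or '' if not needed."""
    if total_reads <= target_reads:
        return ""  # fewer reads than the target: keep everything
    # Cap the fraction so that rounding never produces 1.0000,
    # which samtools would read as seed 1 with fraction 0.
    fraction = min(target_reads / total_reads, 0.9999)
    decimals = "{:.4f}".format(fraction)[1:]  # e.g. "0.2500" -> ".2500"
    return "-s {}{}".format(seed, decimals)


print(subsample_option(2000000))  # keeps about one quarter of the reads
print(repr(subsample_option(400000)))  # empty string: no subsampling
```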
-
33.
Contamination rate estimation using Schmutzi.
>cd /PATH/TO/TESTDATA/0.scripts
>dir=$(dirname $(pwd))
>li="CDG56 CDG60 CDG71 CDG86 CDG87 JHC14 JHC20 JHC21 JHC27 JHC28 JHC29"
>for i in ${li};do
sh 2.schmutzi.sh ${dir}/0.bam/${i}.mapped.bam ${dir}/2.schmutzi
done
Note: 2.schmutzi.sh includes two path variables in Line 13 and Line 14, which need to be modified according to the user’s circumstances. This will produce results for each sample at /PATH/TO/TESTDATA/2.schmutzi.
CRITICAL: In this analysis, all path variables should use absolute paths rather than relative paths; this includes the two path arguments after 2.schmutzi.sh and the two path variables in 2.schmutzi.sh Line 13 and Line 14 (Troubleshooting 5). The R packages “fitdistrplus” and “MASS” are required to plot the schmutzi results (Troubleshooting 6).
-
34.
Use bamUtil to trim excess C->T and G->A transitions at the ends of the sequences.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>samples=$(ls ${dir}/0.bam/*.mapped.bam)
>mkdir -p ${dir}/2.trimBam ; cd ${dir}/2.trimBam
>for i in ${samples};do
sample=$(basename ${i} .mapped.bam)
bam trimBam ${i} ./${sample}.trim.bam 10
samtools index ./${sample}.trim.bam
done
-
35.
Use pileupCaller to sample alleles from low coverage sequence data in EIGENSTRAT format.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/2.geno ; cd ${dir}/2.geno
>bamfiles=$(ls ${dir}/2.trimBam/*.trim.bam)
>ref=${dir}/hs37d5/hs37d5.fa
>ref_snp=${dir}/0.map/1240k.snp
>ref_pos=${dir}/0.map/1240k.pos
>samples=""
>bams=""
>for i in ${bamfiles};do
bams="${bams} $i"
samples=${samples},$(basename $i .trim.bam)
done
>samples=${samples:1}
>echo ${samples}
>echo ${bams}
>samtools mpileup -R -B -q30 -Q30 -l ${ref_pos} \
-f ${ref} \
${bams} | \
pileupCaller --randomHaploid --sampleNames ${samples} \
-f ${ref_snp} \
-e 1240k > pileupCaller.log 2>&1
# Modify the individual (.ind) file: label each sample with its population (CDG or JHC)
>cp 1240k.ind 1240k.ind.bak
>cat 1240k.ind.bak | awk '{ if ($1 ~ /CDG/) {print $1,"U","CDG"} else {print $1,"U","JHC"}}' > 1240k.ind
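The idea behind random haploid calling for low-coverage data can be sketched in Python: at each SNP, one read base is drawn at random and the genotype is recorded as if the individual were homozygous for that allele. This is a conceptual illustration under stated assumptions, not pileupCaller's code. The function name is hypothetical, the input is assumed to be already-decoded read bases rather than a raw mpileup string, and bases matching neither allele are assumed to be ignored. The returned codes follow the EIGENSTRAT convention of counting copies of the reference allele, with 9 for missing data.

```python
import random


def random_haploid_call(bases, ref, alt, rng):
    """Pick one read base at random and return an EIGENSTRAT genotype code.

    2 = reference allele drawn, 0 = alternative allele drawn, 9 = missing.
    Bases matching neither allele are ignored.
    """
    usable = [b for b in bases.upper() if b in (ref, alt)]
    if not usable:
        return 9
    return 2 if rng.choice(usable) == ref else 0


rng = random.Random(0)  # fixed seed only to make the illustration repeatable
print(random_haploid_call("AAAA", "A", "G", rng))  # all reads carry the reference allele
print(random_haploid_call("", "A", "G", rng))      # no coverage at this site
```

Because only one allele is sampled per site, the resulting genotypes are pseudo-haploid, which downstream tools such as READ and the f-statistics programs need to take into account.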
-
36.
Use READ to estimate kin relationships between samples (Figure 5).
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/2.READ ; cd ${dir}/2.READ
>cp ${dir}/0.scripts/src/READ/* ./
# Write the convertf parameter file and run convertf
# JHC and CDG need to be converted separately
>echo JHC > pop
>for i in "convertf.par"; do
echo "genotypename: ${dir}/2.geno/1240k.geno"
echo "snpname: ${dir}/2.geno/1240k.snp"
echo "indivname: ${dir}/2.geno/1240k.ind"
echo "outputformat: PACKEDPED"
echo "genotypeoutname: JHC.bed"
echo "snpoutname: JHC.bim"
echo "indivoutname: JHC.fam"
echo "poplistname: pop"
done > convertf.par ; convertf -p convertf.par
>echo CDG > pop
>sed -i "s/JHC/CDG/g" convertf.par
>convertf -p convertf.par
# Convert to .tped/.tfam format
>plink --bfile JHC --recode transpose --out JHC
>plink --bfile CDG --recode transpose --out CDG
# Run READ analysis using Python 2.x
>python2 READ.py JHC
>mkdir JHC ; mv READ_* JHC
>python2 READ.py CDG
>mkdir CDG ; mv READ_* CDG
CRITICAL: READ.py runs only under Python 2.x; running it with Python 3.x causes an error (Troubleshooting 7). The user should modify the last line according to the circumstances.
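READ infers relatedness from the proportion of non-matching alleles (P0) between two pseudo-haploid individuals, normalized by the value expected for an unrelated pair. The Python sketch below shows only this core quantity. It is a simplification under stated assumptions and not the READ implementation: it computes one genome-wide rate instead of averaging over windows, the function names are hypothetical, and "0" is taken as the missing-allele code used in .tped files.

```python
def mismatch_rate(geno_a, geno_b, missing="0"):
    """Proportion of sites with different alleles, among sites called in both."""
    compared = mismatches = 0
    for a, b in zip(geno_a, geno_b):
        if a == missing or b == missing:
            continue  # skip sites missing in either individual
        compared += 1
        mismatches += a != b
    return mismatches / compared if compared else float("nan")


def normalize(p0, background_p0):
    """Scale a pair's P0 by the P0 expected for an unrelated pair."""
    return p0 / background_p0


# Made-up pseudo-haploid alleles at six SNPs for two individuals.
p0 = mismatch_rate("ACGT0A", "ACTT00")
print(p0)
```

Normalized values close to 1 are expected for unrelated pairs, while lower values indicate closer kinship; the classification cutoffs are applied by READ itself.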
Figure 4.
Ancient DNA damage pattern of sample1
Nucleotide misincorporation patterns caused by cytosine deamination (C->T) at both ends of ancient DNA sequences.
Figure 5.
Genetic kin relationships estimated by READ
Part 9: Population genetic analyses
Timing: 1–2 days
Timing: 3–6 h (for step 39)
Above, we examined the quality of our sequenced samples. Next, we investigate the genetic profiles of the test populations using PCA, ADMIXTURE and f-statistics.
-
37.
Merge our data with previously published datasets.
>cd /PATH/TO/TESTDATA/0.dataset
>mergeit -p mergeit_HO.par
>mergeit -p mergeit_1240k.par
-
38.Principal component analysis (PCA) using smartpca.
-
a.Create a directory and copy scripts for smartpca from src.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/3.smartpca ; cd ${dir}/3.smartpca
>cp ${dir}/0.scripts/src/smartpca/* ./
-
b.Write the smartpca parameters into “smartpca.par” and run smartpca.
>for i in "smartpca.par";do
echo "genotypename: ${dir}/0.dataset/JHCCDG_HO.geno"
echo "snpname: ${dir}/0.dataset/JHCCDG_HO.snp"
echo "indivname: ${dir}/0.dataset/JHCCDG_HO.ind"
echo "evecoutname: smartpca.evec"
echo "evaloutname: smartpca.eval"
echo "poplistname: modern.poplist"
echo "lsqproject: YES"
echo "numoutevec: 5"
echo "altnormstyle: NO"
echo "numthreads: 20"
done > smartpca.par
>smartpca -p smartpca.par > smartpca.log 2>&1
Note: Smartpca uses the merged dataset “JHCCDG_HO” to calculate the PCA. The populations listed in “modern.poplist” are used to calculate the PCs, and the ancient data are then projected onto these PCs. The results are written to “smartpca.evec” and “smartpca.eval”. In ancient DNA analyses, we usually use only modern populations to calculate PCs.
-
c.Visualize the PCA result (Figure 6).
>tail -n+2 smartpca.evec | awk '{print $7,$1,$2,$3}' > plot.txt
>Rscript smartpca.r
-
39.Unsupervised admixture analysis using ADMIXTURE.
-
a.Create a directory and copy scripts for ADMIXTURE from src.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/3.ADMIXTURE ; cd ${dir}/3.ADMIXTURE
>cp -r ${dir}/0.scripts/src/ADMIXTURE/* ./
-
b.Convert the dataset to PLINK format and perform linkage disequilibrium (LD) pruning.
# Extract ADMIXTURE poplist
>for i in "eig2bed.par";do
echo "genotypename: ${dir}/0.dataset/JHCCDG_HO.geno"
echo "snpname: ${dir}/0.dataset/JHCCDG_HO.snp"
echo "indivname: ${dir}/0.dataset/JHCCDG_HO.ind"
echo "genotypeoutname: admixture.geno"
echo "snpoutname: admixture.snp"
echo "indivoutname: admixture.ind"
echo "poplistname: admixture.poplist"
done > convertf.par ; convertf -p convertf.par
# Convert to bed,bim,fam
>for i in "eig2bed.par";do
echo "genotypename: admixture.geno"
echo "snpname: admixture.snp"
echo "indivname: admixture.ind"
echo "outputformat: PACKEDPED"
echo "genotypeoutname: extract.bed"
echo "snpoutname: extract.bim"
echo "indivoutname: extract.fam"
done > eig2bed.par ; convertf -p eig2bed.par
>plink --bfile extract --indep-pairwise 200 25 0.4 --out plink --allow-no-sex
>plink --bfile extract --extract plink.prune.in --make-bed --out prune --allow-no-sex
-
c.Modify the .fam file to match the ADMIXTURE format and run ADMIXTURE with K ranging from 2 to 5; this takes 3–6 h depending on the multi-thread setting (-j).
>cat admixture.ind | awk '{print $3,$1,$2}' > prune.fam
>parallel -j 4 --verbose admixture -s time -j10 --cv prune.bed {} "|" tee -a result.out ::: $(seq 2 5)
-
d.Visualize the ADMIXTURE result (Figure 7).
>cat result.out | grep "CV error" | sort -k 4 -n > CV_error.txt
>cp fancy_admixture/* ./
>Rscript admixture_plot.r # output plot.pdf
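The cross-validation (CV) errors collected in result.out are commonly used to compare values of K, with the lowest CV error often taken as the best-supported model. As a hedged illustration, the Python sketch below extracts the "CV error (K=...)" lines that ADMIXTURE prints with --cv and reports the K with the smallest error. The function name `best_k` and the numbers in the example are hypothetical.

```python
import re


def best_k(result_text):
    """Return (K, cv_error) with the lowest cross-validation error."""
    errors = {}
    for line in result_text.splitlines():
        m = re.search(r"CV error \(K=(\d+)\): ([0-9.]+)", line)
        if m:
            errors[int(m.group(1))] = float(m.group(2))
    if not errors:
        raise ValueError("no 'CV error' lines found")
    k = min(errors, key=errors.get)
    return k, errors[k]


# Made-up log excerpt for illustration only.
example = (
    "CV error (K=2): 0.5120\n"
    "CV error (K=3): 0.5034\n"
    "CV error (K=4): 0.5078\n"
)
print(best_k(example))
```

The CV error is only a guide; ancestry components at neighboring K values are usually inspected as well.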
-
40.Use outgroup-f3 statistics to measure the shared genetic drift. Outgroup-f3 in the form of f3(X, Y; Z) calculates the shared genetic drift between population X and population Y with respect to population Z.
-
a.Create a directory and copy scripts for outgroup-f3 statistics from src.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/3.outgroupf3 ; cd ${dir}/3.outgroupf3
>cp ${dir}/0.scripts/src/f3.poplist ./
-
b.Write the qp3Pop parameters into “qp3.par”. The combinations of tested populations are written in f3.poplist in the form of f3(test, Ref populations; Mbuti), where Mbuti acts as an outgroup.
-
c.Run outgroup-f3 statistics and grep results.
>for i in "qp3.par";do
echo "genotypename: ${dir}/0.dataset/JHCCDG_1240k.geno"
echo "snpname: ${dir}/0.dataset/JHCCDG_1240k.snp"
echo "indivname: ${dir}/0.dataset/JHCCDG_1240k.ind"
echo "popfilename: f3.poplist"
echo "inbreed: YES"
done > qp3.par
>qp3Pop -p qp3.par > qp3.result
>cat qp3.result | grep result: | sort -nrk5 > result.txt
Note: qp3Pop reports the f3 value in the fifth column of each result line. A larger f3(X, Y; Z) value indicates more shared genetic drift between X and Y with respect to population Z (Figure 8).
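The quantity behind outgroup-f3 can be written as the average over SNPs of (z − x)(z − y), where x, y and z are the allele frequencies in populations X, Y and the outgroup Z. The Python sketch below computes this plain average for made-up frequencies. It is a conceptual illustration only: qp3Pop additionally applies a finite-sample bias correction and estimates standard errors with a block jackknife, both of which are omitted here, and the function name is hypothetical.

```python
def outgroup_f3(freq_x, freq_y, freq_z):
    """Average of (z - x) * (z - y) over SNPs, given allele frequencies."""
    terms = [(z - x) * (z - y) for x, y, z in zip(freq_x, freq_y, freq_z)]
    return sum(terms) / len(terms)


# Made-up allele frequencies at two SNPs.
x = [0.1, 0.5]
y = [0.2, 0.5]
z = [0.5, 0.9]
print(outgroup_f3(x, y, z))
```

When X and Y drift together away from Z at a SNP, both differences have the same sign and the product is positive, which is why larger values indicate more shared drift.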
-
41.Use f4-statistics to measure the differences between Haimenkou (JHC) and Gaoshan (CDG). f4-statistics offer another way to test for admixture events. In the form of f4(A, B; C, D), we usually set a divergent outgroup as population A and then test for gene flow between B and C (Z-score < −3) or between B and D (Z-score > 3).
-
a.Create a directory and copy scripts for f4-statistics from src.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/3.f4 ; cd ${dir}/3.f4
>cp ${dir}/0.scripts/src/f4.poplist ./
-
b.Write the qpDstat parameters into “qp4.par”. The combinations of tested populations are written in f4.poplist in the form of f4(Mbuti, Ref populations; CDG, JHC), where Mbuti acts as an outgroup.
>for i in "qp4.par";do
echo "genotypename: ${dir}/0.dataset/JHCCDG_1240k.geno"
echo "snpname: ${dir}/0.dataset/JHCCDG_1240k.snp"
echo "indivname: ${dir}/0.dataset/JHCCDG_1240k.ind"
echo "popfilename: f4.poplist"
echo "f4mode: YES"
echo "printsd: YES"
echo "inbreed: YES"
done > qp4.par
-
c.Run qpDstat and sort results according to Z-scores (Figure 9).
>qpDstat -p qp4.par > qp4.result
>cat qp4.result | grep result: | sort -nk8 > result.txt
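To make the Z-score thresholds above concrete, the following Python sketch computes per-SNP f4 terms (a − b)(c − d) from allele frequencies and derives a Z-score with a delete-one-block jackknife. This is a simplified illustration under stated assumptions, not qpDstat's implementation: it uses equal-sized blocks of consecutive SNPs rather than blocks defined by genetic distance, it applies no weighting, and all function names and numbers are hypothetical.

```python
import math


def f4_per_snp(freq_a, freq_b, freq_c, freq_d):
    """Per-SNP terms (a - b) * (c - d) from population allele frequencies."""
    return [(a - b) * (c - d)
            for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d)]


def jackknife_z(values, n_blocks):
    """Delete-one-block jackknife: return (estimate, standard error, Z)."""
    size = len(values) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    used = values[:size * n_blocks]  # SNPs beyond the last full block are dropped
    total = sum(used)
    estimate = total / len(used)
    leave_one_out = []
    for b in range(n_blocks):
        block_sum = sum(used[b * size:(b + 1) * size])
        leave_one_out.append((total - block_sum) / (len(used) - size))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (v - mean_loo) ** 2 for v in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Made-up per-SNP values split into two blocks.
print(jackknife_z([1.0, 2.0, 3.0, 4.0], 2))
```

A positive estimate with Z > 3 corresponds to excess allele sharing between B and D, and a negative estimate with Z < −3 to excess sharing between B and C, matching the interpretation given in step 41.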
-
42.Estimate the admixture proportion of CDG and JHC using qpAdm.
-
a.Create a directory and copy parameter files from src.
>cd /PATH/TO/TESTDATA
>dir=$(pwd)
>mkdir -p ${dir}/3.qpAdm ; cd ${dir}/3.qpAdm
>cp ${dir}/0.scripts/src/qpAdm/* ./
-
b.Prepare parameter files for CDG and JHC. Here, we write two parameter files (.left/.par) for CDG and JHC and use the same set of outgroup and reference populations for two-way admixture modeling.
>for i in CDG JHC;do
for j in "par";do
echo "genotypename: ${dir}/0.dataset/JHCCDG_1240k.geno"
echo "snpname: ${dir}/0.dataset/JHCCDG_1240k.snp"
echo "indivname: ${dir}/0.dataset/JHCCDG_1240k.ind"
echo "popleft: ${i}.left"
echo "popright: right.pops"
echo "inbreed: NO"
echo "details: YES"
echo "allsnps: YES"
done > ${i}.par
echo "${i}" > ${i}.left
cat left.pops >> ${i}.left
done
-
c.Run qpAdm (Figure 10).
>for i in CDG JHC;do
qpAdm -p ${i}.par > ${i}.result
done
Figure 6.
Visualization of first two PCs of the output of smartPCA
Figure 7.
Admixture ancestry estimation based on the model-based ADMIXTURE analysis
Figure 8.
The top and bottom five results of outgroup-f3 statistics show the close relationship between Upper Yellow River populations and the test population
Figure 9.
The top and bottom five results of f4-statistics show the close genetic profiles of CDG and JHC
Figure 10.
Screenshot of the output of the qpAdm provided in CDG.result and JHC.result
Expected outcomes
Agarose gel electrophoresis (part 6)
Ancient DNA bands should appear as a smear (trailing) in the range of 120–200 bp. No DNA band should be observed in the negative control (NC) lane; otherwise, the samples may be contaminated by exogenous DNA.
Mapping sequencing reads to the human reference genome (part 7)
In this part of the protocol, we map the raw sequencing data to the human reference genome to generate binary alignment map files for downstream analyses: BAM files (.bam) that contain all sequencing reads (mapped and unmapped) and BAM files (.mapped.bam) that contain only the reads mapped to the reference genome (Figure 3). Mapped BAM files take up far less space than the original BAM files and are suitable for most population genetics analyses. In addition, summary statistics are generated in this part, such as the endogenous rate in the .flagstats files and the average read length in the .stats files (Figure 3).
Ancient DNA authentication (part 8)
In this part, we evaluate the quality of the ancient sequencing data. These analyses include authentication of characteristic ancient DNA damage patterns (Figure 4), estimation of the contamination rate, assessment of kin relationships among samples (Figure 5), generation of the EIGENSTRAT format files used for population genetic analyses (.ind/.geno/.snp) and evaluation of sample coverage. The contamination rate estimates are written to the _final.cont.est files. Additionally, the pileupCaller.log file reports the number of SNPs that overlap with the 1240k dataset.
Population genetic analyses (part 9)
This part includes a few basic population genetic analyses: PCA, ADMIXTURE, f-statistics and qpAdm. We first obtain the overall genetic structure of the test populations from the PCA and ADMIXTURE analyses (Figures 6 and 7), which show a close relationship of CDG and JHC with the Yellow River populations. Then, we use f-statistics and qpAdm to quantify the genetic profiles of the test populations (Figures 8, 9, and 10).
Limitations
There are numerous analytical methods in ancient DNA studies. This protocol performs only the most fundamental and commonly used analysis procedures. Additional analyses should be included according to the research requirements.
Troubleshooting
Problem 1
Low yield of DNA library molecules (step 21 in part 6: Library quantification and sequencing).
Potential solution
-
•
Repeat library preparation with more DNA extract.
-
•
Visually inspect the success of mixing for each tube.
Problem 2
Poor separation of markers (step 22.a in part 6: Library quantification and sequencing).
Potential solution
-
•
Allow the gel to come to 23°C before starting electrophoresis.
-
•
Increase run time.
Problem 3
The main fragment size of the DNA library does not fall in the range from 150 bp to 250 bp (step 22.b, 23 and 24 in part 6: Library quantification and sequencing).
Potential solution
Use a 50 bp DNA marker to verify a proper ratio of beads, and make sure the beads are sufficient to bind DNA fragments 100 bp and larger. Return to step 13 in part 4, and use the previously frozen adapter fill-in to continue the remaining process.
Problem 4
The bwa samse command fails (step 27 in part 7: Mapping sequencing reads to human reference genome).
Potential solution
Early versions of samtools view require “-” to read input from stdin. An error in this step may be due to a missing “-” at the end of the command.
Problem 5
Schmutzi command fails with an error message “2.schmutzi.sh: line 39: /PATH/TO/schmutzi/src/contDeam.pl: No such file or directory” (step 33 in part 8: Ancient DNA authentication).
Potential solution
Ensure that Schmutzi has been installed correctly and modify Lines 13 and 14 in 2.schmutzi.sh.
Problem 6
Schmutzi command fails with error messages “Error in library(fitdistrplus): there is no package called ‘fitdistrplus’” and “Error in library(MASS): there is no package called ‘MASS’” (step 33 in part 8: Ancient DNA authentication).
Potential solution
Run “install.packages("fitdistrplus")” and “install.packages("MASS")” in R.
Problem 7
The READ.py command fails with an error message “SyntaxError: invalid syntax” (step 36 in part 8: Ancient DNA authentication).
Potential solution
READ.py was written in Python 2.x, and attempting to run it with Python 3.x causes a SyntaxError. Make sure that you use the correct Python version. An alternative solution is to use the 2to3 conversion tool distributed with Python to transform READ.py into a Python 3.x version.
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Chuan-Chao Wang (wang@xmu.edu.cn).
Technical contact
Technical support and information will be provided by the technical contacts, Kongyang Zhu (kyanzhu@foxmail.com), Haifeng He (HeHaifeng_H2F@outlook.com) and Le Tao (taole@stu.xmu.edu.cn).
Materials availability
This study did not generate new reagents.
Data and code availability
The test dataset is freely available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/ and at Google Drive: https://drive.google.com/drive/folders/1zoKse3cUX-p6h7l_VxVdAFaZzNGYDNtZ?usp=sharing.
Acknowledgments
The work was funded by the National Natural Science Foundation of China (32270667), the Natural Science Foundation of Fujian Province of China (no. 2023J06013), the Major Project of the National Social Science Foundation of China granted to C.-C.W. (21&ZD285), Open Research Fund of State Key Laboratory of Genetic Engineering at Fudan University (no. SKLGE-2310), Major Special Project of Philosophy and Social Sciences Research of the Ministry of Education (2022JZDZ023), the “Double First-Class University Plan” key construction project of Xiamen University (0310/X2106027), Nanqiang Outstanding Young Talents Program of Xiamen University (X2123302), and European Research Council (ERC) grant (ERC-2019-ADG-883700-TRAM).
Author contributions
C.-C.W. and J.G. conceived and supervised the project. H.H., L.T., H.M., and X.Y. conducted the wet experiments. K.Z., L.T., and R.W. analyzed the data. K.Z., H.H., L.T., and C.-C.W. wrote and edited the paper. All the authors revised the paper.
Declaration of interests
The authors declare no competing interests.
Contributor Information
Jianxin Guo, Email: jxguo@xmu.edu.cn.
Chuan-Chao Wang, Email: wang@xmu.edu.cn.
References
- 1.Tao L., Yuan H., Zhu K., Liu X., Guo J., Min R., He H., Cao D., Yang X., Zhou Z., et al. Ancient genomes reveal millet farming-related demic diffusion from the Yellow River into southwest China. Curr. Biol. 2023;33:4995–5002.e7. doi: 10.1016/j.cub.2023.09.055.
- 2.Meyer M., Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010;2010:pdb.prot5448. doi: 10.1101/pdb.prot5448.
- 3.Schubert M., Lindgreen S., Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes. 2016;9:88. doi: 10.1186/s13104-016-1900-2.
- 4.Patterson N., Moorjani P., Luo Y., Mallick S., Rohland N., Zhan Y., Genschoreck T., Webster T., Reich D. Ancient admixture in human history. Genetics. 2012;192:1065–1093. doi: 10.1534/genetics.112.145037.
- 5.Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
- 6.Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 7.Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M., Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10 doi: 10.1093/gigascience/giab008.
- 8.Peltzer A., Jäger G., Herbig A., Seitz A., Kniep C., Krause J., Nieselt K. EAGER: efficient ancient genome reconstruction. Genome Biol. 2016;17:60. doi: 10.1186/s13059-016-0918-z.
- 9.Patterson N., Price A.L., Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2 doi: 10.1371/journal.pgen.0020190.
- 10.Skoglund P., Northoff B.H., Shunkov M.V., Derevianko A.P., Pääbo S., Krause J., Jakobsson M. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl. Acad. Sci. USA. 2014;111:2229–2234. doi: 10.1073/pnas.1318934111.
- 11.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 12.Monroy Kuhn J.M., Jakobsson M., Günther T. Estimating genetic kin relationships in prehistoric populations. PLoS One. 2018;13 doi: 10.1371/journal.pone.0195491.
- 13.Renaud G., Slon V., Duggan A.T., Kelso J. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biol. 2015;16:224. doi: 10.1186/s13059-015-0776-0.
- 14.Dabney J., Knapp M., Glocke I., Gansauge M.T., Weihmann A., Nickel B., Valdiosera C., García N., Pääbo S., Arsuaga J.L., Meyer M. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. USA. 2013;110:15758–15763. doi: 10.1073/pnas.1314445110.
- 15.Rohland N., Mallick S., Mah M., Maier R., Patterson N., Reich D. Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs. Genome Res. 2022;32:2068–2078. doi: 10.1101/gr.276728.122.