iScience. 2023 Oct 4;26(11):108066. doi: 10.1016/j.isci.2023.108066

Review: Computational analysis of human skeletal remains in ancient DNA and forensic genetics

Ainash Childebayeva 1,2, Elena I Zavala 3,4
PMCID: PMC10622734  PMID: 37927550

Summary

Degraded DNA is used to answer questions in the fields of ancient DNA (aDNA) and forensic genetics. While aDNA studies typically center around human evolution and past history, and forensic genetics is often more concerned with identifying a specific individual, scientists in both fields face similar challenges. The overlap in source material has prompted periodic discussions and studies on the advantages of collaboration between fields toward mutually beneficial methodological advancements. However, most have been centered around wet laboratory methods (sampling, DNA extraction, library preparation, etc.). In this review, we focus on the computational side of the analytical workflow. We discuss limitations and considerations to keep in mind when working with degraded DNA. We hope this review provides a framework for researchers new to computational workflows on how to think about analyzing highly degraded DNA and prompts increased collaboration between the forensic genetics and aDNA fields.

Subject areas: Molecular biology, Computational bioinformatics, Paleogenetics, Archeology


Introduction

Human genetics is a cornerstone of both the fields of forensic genetics and ancient DNA (aDNA). In forensic genetics, this typically relates to linking DNA recovered from a piece of evidence to a specific individual. This can include not only matching DNA profiles, but also information about an individual’s phenotype,1 genetic ancestry,2 and/or relatives3 that may be paired with non-genetic evidence to narrow the investigative space. In aDNA, recovered DNA has been used to learn more about past human interactions, kinship structures and migrations,4,5,6,7 test evolutionary hypotheses,8 and to study phylogenetic relationships between archaic lineages and their modern representatives.9,10,11 Degraded DNA is a hallmark of aDNA, due to the time periods of the remains from which data are generated. Human identification (HID) casework, a subset of forensic genetics that includes disaster victim identification, active and cold cases, and historical identifications, also deals with degraded DNA depending on the time periods and environmental conditions of the recovered human remains.12,13,14 The challenges faced with generating DNA profiles from degraded DNA are therefore shared between the forensic genetics and aDNA fields. Overlaps and benefits of exchanging laboratory protocols between these fields have been previously outlined.15,16 In this review, we build on this foundation by focusing on the impacts of degraded DNA on the computational portion of analysis while highlighting overlapping and distinct features between forensic genetics and aDNA.

The key characteristics of degraded DNA are its relatively short fragment length (30–70 base pairs), limited quantity, and damage patterns,17,18,19,20 each of which presents a challenge that both fields have worked to overcome for data generation and analysis. Conventional laboratory methods for isolating non-degraded DNA from different sources (DNA extraction) and preparing it for downstream analysis favor the exclusion of small DNA fragments, which are typically thought to be artifacts or uninformative. These protocols have thus needed to be altered for application to degraded DNA. Early exchanges of DNA extraction protocols between the fields21,22,23,24 have continued through the decades, leading to the recovery of DNA fragments shorter than 50 base pairs14,25,26,27,28,29 and establishing pre-treatment protocols for contamination removal.30,31,32 The latter are needed because of the low endogenous DNA content of degraded samples, which makes them susceptible to exogenous contamination from other DNA sources (i.e., microbial and non-degraded human DNA). This has resulted in all pre-DNA amplification steps being carried out in specialized clean room laboratories dedicated to aDNA work,20,33,34 with similar guidelines being established for forensic analyses.35

Degraded DNA is typically fragmented and present in low quantities, which poses a challenge for preparing the extracted DNA for downstream analyses. DNA cloning was used to identify the first aDNA fragments,36,37 but this method often generated artifacts that led to false positives. While the advent of PCR helped to overcome this challenge, damage patterns and short fragment sizes resulted in low amplification efficiency and the co-amplification of often indistinguishable contaminant DNA.19,20 The advent of next-generation sequencing (NGS) technology has provided an avenue for data generation through parallel sequencing of millions of DNA molecules and downstream bioinformatic processing. As with the initial data generation steps, the analysis of NGS data has required the development of bioinformatic tools and techniques to address the difficulties arising from the degraded nature of the DNA source, which is the focus of this review.

Despite all these challenges, in the last decade, the publication of more than 10,000 genome-wide and whole-genome datasets from ancient humans has made it possible to learn more about human evolutionary history and genetics, even in areas that are known to be challenging for DNA preservation due to high ambient temperatures and humidity (Figures 1A, 1C, and 1D). NGS-based methodologies have expanded the information that can be gained from degraded DNA samples beyond the traditional forensic DNA profile standard of short tandem repeats (STRs).38,39,40,41,42 Although STRs are unlikely to be replaced for routine forensic casework due to their prevalence in existing databases and the ease of generating STR profiles, NGS analysis of SNPs has gained traction. SNPs have been shown to be more effective for generating DNA profiles from degraded remains,43,44,45,46,47 including enabling the identification of individuals through more distant relatives (investigative genetic genealogy, IGG) instead of via first-generation relatives or a direct match.48,49 The growing interest in NGS in forensic studies is exemplified by the marked increase in studies related to NGS in forensics in the last decade (Figure 1B). The increase in the number of laboratories performing research on degraded DNA, publications of step-by-step laboratory protocols,26,50,51,52,53,54,55 computational pipelines,56,57,58 and workflow primers55 has helped to ensure transparency and reproducibility of processing between aDNA datasets. Within forensics, organizations such as the Scientific Working Group on DNA Analysis Methods (SWGDAM)59 (in the US), the European Network of Forensic Science Institutes (ENFSI),60 and the International Society of Forensic Genetics (ISFG)61 have served as platforms for discussion, sharing of protocols, and the development of guidelines for quality assurance of forensic DNA analysis. However, in both fields, step-by-step descriptions of computational workflows, including discussions of their limitations, remain scarce.

Figure 1.


Distribution of published ancient DNA data and frequency of forensic genetics NGS studies

(A) Map of published aDNA data. Frequency indicates the number of individuals, Years BP = thousands of years before present, Data source = capture data, shotgun data, or a combination of both. AADR v54.1 (ref. 63) was used for metadata; (B) A histogram of the number of articles with titles, abstracts, or keywords that include forensics and mention NGS or massively parallel sequencing (gray) and degraded (yellow) based on a search on scopus.com; (C) Average Annual Temperature map (The Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison); (D) Average Annual Relative Humidity map (The Nelson Institute Center for Sustainability and the Global Environment, University of Wisconsin-Madison).

Since the advent of NGS, a natural widening has occurred between the forensic and aDNA fields. The legal connotations of forensic casework and its consequences on people living today require forensic laboratories to adhere to strict quality assurance standards for laboratory accreditation and strict IT requirements.62 This includes performing verification and validation studies before the implementation of new wet laboratory methods and software (including any version updates). These rigorous standards necessarily slow the integration of new technology into forensic genetics practice, emphasizing the importance of understanding the limits and factors impacting the accuracy of new methods and techniques. Leveraging the flexibility of the aDNA field to explore and test new methods has the potential to narrow the search space for advancing forensic genetics technology, as has already been discussed for laboratory methods.15 In this review, we focus on the computational workflows performed in forensic genetics and aDNA analysis when working with low-coverage NGS data. This includes a discussion of limitations and contextualizing the decision-making processes involved at different steps. We hope this not only provides a solid foundation for those new to computational analysis in either field but also prompts interdisciplinary conversations that will lead to mutually beneficial advancements in forensics and aDNA.

Sampling and laboratory work

The general laboratory workflow for degraded DNA analysis can be divided into five steps: sample preparation, DNA extraction, library preparation, in some cases targeted enrichment, and sequencing (Figure 2). The genetic material used for the analyses covered in this review is typically recovered from skeletal material; however, rootless hair has also been shown to yield degraded DNA,64,65,66 including in commercial forensics applications.67 DNA is extracted and purified from bone or tooth powder that has been drilled or ground from a particular skeletal element. The resulting DNA extract contains all DNA extracted from the sample, including microbial DNA and other non-endogenous DNA (contamination) from individuals or animals who may have come into contact with the remains of interest. Extracted DNA is then converted into libraries by adding adapter sequences, which allow it to be sequenced. Included in this library preparation step is the addition of indices (short oligos) to the DNA library molecule, which work as a tag to identify which DNA sequences are associated with which library. Depending on the data quality and types of questions that are being addressed, enrichment of specific loci may also be performed prior to sequencing. While this provides a brief overview, there are different considerations that must be made for each step.

Figure 2.


Simplified workflow for forensic and ancient genetic analyses

The first step in the workflow is DNA extraction in a dedicated pre-PCR lab room. Following the DNA extraction, a genetic library is constructed by adding barcodes/indices and adapters to the DNA for downstream sequencing. The library is then sequenced for downstream bioinformatic analyses. In some cases, targeted capture is done to enrich for specific targets or types of DNA, also followed by NGS and downstream analyses.

Due to the limited availability of ancient remains, the importance of these remains for relatives and descendant communities, or the historical and biological record, many studies have focused on reducing the amount of bone material used, while maximizing the amount of DNA recovered.68,69,70 While the petrous portion of the temporal bone has been found to be a rich source of DNA in both forensic and aDNA studies,71 due to its importance for understanding hominin evolution and associated invasive sampling requirements, teeth and long bones are often substituted depending on the sample quality and availability.70 For the recovery of highly degraded DNA (<100 base pairs in length), different versions of an inorganic DNA extraction method26,72,73,74 are used in both aDNA and forensics, paired with either double- or single-stranded (ss) DNA library preparation.53,75,76,77,78 Library preparation for forensics is typically amplicon based and sequencing directly follows the completion of this step. For “younger” (<∼30,000 years) aDNA samples that have relatively better quality DNA, the double-stranded library preparation method is typically coupled with a partial uracil-DNA glycosylase (UDG) treatment to repair DNA damage while preserving the deamination on the terminal ends to allow for aDNA authentication.79 Decisions around library preparation methods and whether whole-genome shotgun or targeted enrichment is performed impact how the data are analyzed downstream, as will be discussed in the following section.

Specific regions or types of DNA are sometimes targeted for enrichment (e.g., hybridization capture) in order to decrease sequencing costs and/or restrict the type of genetic data that is produced.44,80,81,82 The regions of the genome targeted for analysis differ between forensic and aDNA analyses. Forensic studies typically focus on common SNPs that are informative for individualization and may also include SNPs that provide insights about genetic ancestry, phenotype (skin, hair, and eye color), or can be used for identifying genetic relatives44,49,83 (Figure 3A). Concerns around genetic privacy of individuals whose data may be collected for forensic databases also motivate the use of SNP panels instead of whole-genome sequencing (WGS) in forensic casework to minimize collecting medically informative data.84 Ancient data capture arrays target SNPs that are informative for evaluating genetic variation on a population rather than individual scale. One array commonly used in aDNA studies is the “1240k” SNP capture array that targets ∼1.2 million single-nucleotide positions representing global genetic variation, as well as functional SNPs and SNPs under selection,4,85 which has been commercially available since 2021.86 An updated version of the array, known as the “Twist Ancient DNA” assay,87 is able to enrich for ∼1.4 million SNPs, containing additional SNPs not present on the 1240k array. A larger set of ∼3.7 million SNPs adds additional SNP panels to the 1240k set that are informative for genetic variation observed in Neanderthals and Denisovans.85 In addition, the aDNA field generally promotes open data sharing.88 However, there are cases when open data sharing is discouraged, for example, when working with Indigenous groups89 (see the Public Databases section for further discussion).

Figure 3.


Comparison of aDNA and forensic genetics analyses and quality control measures

(A) The different types of conventional downstream analysis currently performed in aDNA and forensic genetics fields as well as (B) quality control measures for monitoring contamination and confirming endogenous DNA.

Initial bioinformatic processing

In order to perform downstream analysis, raw sequence data must undergo preliminary bioinformatic processing (Figure 4). This generally includes demultiplexing (assigning sequences to their specific libraries based on their assigned indices), trimming of adapter sequences, removal of PCR duplicates, filtering based on length and quality metrics, and mapping of sequences to a reference genome. These preliminary steps are performed in both fields. Subsequent NGS forensic genetics analysis can be split into three categories (amplicon-based sequencing, enrichment or capture, and WGS), all of which are typically performed with commercial kits. Amplicon-based sequencing is widely used within forensics as it maintains similarity to previous amplicon-based genetic workflows and compatibility with existing databases, and is required for correctly identifying STR alleles due to challenges in determining the start and end positions of these short repetitive regions. Unless unique molecular identifiers are used, PCR duplicate removal is not performed for amplicon-based sequencing. For mtDNA analysis, different commercial software packages are available that have been specifically designed for mapping to mtDNA (a circular reference) and for improved calling of indels (e.g., QIAGEN’s CLC Genomics Workbench,44,90 AQME,91 and SoftGenetics’s GeneMarker HTS92), and that also allow users to visualize the pileup of reads. Nuclear DNA kits are often paired with commercial bioinformatic workflows that take in sequencing data, perform demultiplexing, adapter trimming, and mapping, and then present the user with genotype calls (e.g., Verogen’s FGx system,93 or Thermo Fisher Scientific’s HID Ion GeneStudio S5 System94,95). The development of bioinformatics workflows for sequencing applications in forensics is rare as forensic laboratories may not have the personnel (bioinformaticians) or the flexibility to develop such pipelines when adhering to the information management and IT standards of government data security systems. Thus, commercial software packages such as Parabon Fχ Forensic Analysis Software Platform, QIAGEN’s CLC Genomics Workbench, and SoftGenetics’s NextGENe may be used for analysis of WGS and capture data, each of which includes a deduplication step in addition to the adapter trimming and mapping. While helpful for reproducibility, it is often difficult for the general public to directly test how different data qualities impact specific workflow elements via simulations or other testing as these workflows are typically not open source.
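
As an illustration of the demultiplexing and length-filtering steps listed at the start of this section, the following is a minimal Python sketch. It is not taken from any of the commercial or open-source tools named above. The read layout (an i7 index, an i5 index, and an insert sequence), the function name `demultiplex`, the exact-match index lookup, and the default `min_length` of 30 bases are all assumptions made for illustration; production demultiplexers typically also tolerate index mismatches and operate on FASTQ records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read layout: (i7 index, i5 index, insert sequence).
Read = Tuple[str, str, str]


def demultiplex(
    reads: Iterable[Read],
    index_to_library: Dict[Tuple[str, str], str],
    min_length: int = 30,  # assumed cutoff, chosen only for illustration
) -> Dict[str, List[str]]:
    """Assign reads to libraries by exact index-pair match and drop short reads."""
    libraries: Dict[str, List[str]] = defaultdict(list)
    for i7, i5, seq in reads:
        if len(seq) < min_length:
            continue  # length filter: discard fragments below the cutoff
        # Index pairs that match no known library are collected separately.
        library = index_to_library.get((i7, i5), "undetermined")
        libraries[library].append(seq)
    return dict(libraries)


if __name__ == "__main__":
    index_map = {("ACGT", "TTAA"): "libA", ("GGCC", "TTAA"): "libB"}
    example_reads = [
        ("ACGT", "TTAA", "A" * 40),
        ("GGCC", "TTAA", "C" * 35),
        ("ACGT", "TTAA", "G" * 10),  # too short, filtered out
        ("NNNN", "TTAA", "T" * 50),  # unknown index pair
    ]
    for name, seqs in demultiplex(example_reads, index_map).items():
        print(name, len(seqs))
```

Keeping unmatched index pairs in an "undetermined" bin, rather than discarding them, makes it possible to check how much of a sequencing run could not be assigned to any library.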

Figure 4.


General bioinformatic processing steps in aDNA and forensic genetics

Light blue color indicates steps that are more relevant for aDNA, while light gray for forensics.

Initial bioinformatic analysis of aDNA data is conventionally performed as follows. After demultiplexing, raw sequencing FASTQ files are trimmed to remove sequencing adapters,96 and the reads are mapped to the reference genome using tools like bwa97 or Bowtie98 with relaxed parameters99 to produce BAM files. Duplicate reads are removed using tools such as Picard MarkDuplicates100 or Dedup.101 The aDNA status is then authenticated based on deamination, represented by C-to-T substitutions on the 5′ end (and G-to-A substitutions on the 3′ end in double-stranded libraries) (see Deamination in Figure 3B), using tools like MapDamage2.0102 or DamageProfiler,103 which allow the user to both visualize and quantify the damage patterns in the data. Based on the damage profile, read trimming is performed to remove the DNA damage, which accumulates at the terminal ends of the reads, with the number of bases trimmed depending on the type and architecture of the library preparation. Two to three bases can be trimmed from a double-stranded UDG-half library,104 eight to ten bases from a double-stranded non-UDG library (assessed per library), while the double-stranded full-UDG libraries are not trimmed. In the case of non-UDG ssDNA libraries, trimming is typically performed on the terminal bases, with the number of bases trimmed determined per library.
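
To make the idea of a terminal damage profile concrete, the sketch below computes the fraction of reference cytosines that are read as thymine at each of the first few positions from the 5′ end. It is a simplified illustration and not the algorithm used by MapDamage2.0 or DamageProfiler. The input format (gap-free pairs of aligned reference and read strings, both oriented 5′ to 3′) and the function name `five_prime_c_to_t` are assumptions made for this example.

```python
from typing import Iterable, List, Tuple


def five_prime_c_to_t(
    pairs: Iterable[Tuple[str, str]], n_positions: int = 5
) -> List[float]:
    """Per-position C-to-T frequency at the 5' end.

    Each pair is (reference, read): two aligned strings without gaps,
    both written 5' to 3' (an assumed, simplified input format).
    """
    ref_c = [0] * n_positions   # reference C observed at each position
    c_to_t = [0] * n_positions  # of those, how many were read as T
    for ref, read in pairs:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i].upper() == "C":
                ref_c[i] += 1
                if read[i].upper() == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0.
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(n_positions)]


if __name__ == "__main__":
    toy_pairs = [
        ("CACG", "TACG"),
        ("CCGT", "CCGT"),
        ("CAAA", "TAAA"),
        ("GCTT", "GTTT"),
    ]
    print(five_prime_c_to_t(toy_pairs, n_positions=3))
```

In real data, a frequency that is elevated at the first position and decays toward the interior of the read is the pattern that motivates trimming a library-specific number of terminal bases.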

Identifying contamination

Contamination in the context of genetic analysis of an individual’s skeletal remains refers to the presence of any DNA that is not from that individual. The presence of contamination is a concern for both forensic and aDNA analyses and each field has developed different techniques for monitoring its presence (Figure 3B). Forensic laboratories maintain databases of DNA profiles from potential contamination sources (individuals who are involved in casework, have been in the laboratory, etc.) and previous casework which can be compared to genotyped casework samples.35 Internal validation studies are used to set coverage thresholds for calling consensus alleles and determining that a profile is predominantly from a single individual.105,106 This is also part of what drives the high-coverage thresholds in forensic analysis. Data that are determined to represent multiple individuals are excluded from downstream analyses. Finally, in some forensics laboratories, when possible at least two skeletal elements or bone powder aliquots per individual are processed and the DNA profiles between these extracts are required to match.12 This independent replication of results also serves as a means to check for concordance between replicates, and to monitor for sample switches that may have occurred during batched downstream processing. Both forensic and aDNA fields carry contamination controls, such as no-template controls also known as negative controls and reagent blanks containing all reagents used in the experiment minus the sample,107 from DNA extraction and library preparation steps through to sequencing to identify potential contamination from reagents or handling in the laboratory.

Methods for estimating contamination in aDNA studies are often based on the haploid DNA elements and rely on measuring heterozygosity levels: X chromosome in males,108,109 or mtDNA (contamMix110,111 and schmutzi112) in both males and females. Other methods include comparing the contamination in reads with and without aDNA damage,113 or evaluating the breakdown of linkage disequilibrium.114 Unambiguous sex determination can also be used as a metric for contamination assessment, except in instances of same-sex contamination, as well as the unambiguous determination of mitochondrial and Y chromosome haplogroups. The methods listed previously that are based on confirming the presence of a single individual can also be used in HID casework as exemplified in a recent study on the analysis of hair from Ludwig van Beethoven.115

Limitations of contamination metrics

Each of the contamination evaluation methods described previously has its own limitations, the impacts of which are dependent on the data that are being evaluated. Methods based on the coverage of the X chromosome are most effective when applied to male individuals for detecting female contamination. Detecting male contamination is still possible, but requires determining if there are multiple X chromosomes resulting in polymorphic loci. Methods that rely on mtDNA overcome this issue, but still require at least 3-fold coverage depth (e.g., schmutzi112) and an even coverage of the mitogenome. Other methods were designed for specific sample preparation and sequencing parameters (e.g., AuthentiCT113), such as single-stranded library preparation and paired-end sequencing. More generally, contamination estimation methods tend to work more reliably on higher coverage data, and are not as accurate when used on low-coverage samples. Users must consider which tools are most applicable to their data and may decide to use multiple methods. It should also be noted that contamination estimates for the mtDNA and nuclear DNA can differ and it is therefore necessary to monitor the mt/nuclear (nc) DNA ratio.116 The mt/nc ratio is known to vary between and within the same bone sample,117 and can influence the contamination estimates by underestimating nDNA contamination when extrapolated from mtDNA in cases of high mt/nc ratio.116 Overall, for aDNA analysis, a contamination level of 5% is often considered an upper threshold for inclusion in the downstream analyses.109,114

Chromosomal sex determination

Sex determination based on evidence for the presence of different sex chromosomes is common practice in forensic and aDNA analysis for both inferring biological sex and evaluating the presence of contamination. Within the forensic field, sex determination is sometimes performed by testing for the X- and Y-linked copies of the amelogenin gene, where detection of both copies indicates a male and detection of only the X-linked copy indicates a female. However, this test is not always reliable as a deletion of this gene in males has been observed,118,119 leading to the use of other regions of the Y chromosome.120,121

In aDNA studies, different methods for chromosomal sex determination are used based on the data analyzed. The first and perhaps most straightforward approach is to evaluate the ratio of average coverage across the X and Y chromosomes compared to the average coverage across autosomal chromosomes,122 while normalizing for the target size of each chromosome.123 The expectation for this method (after normalization) is that males with one X chromosome would have half the coverage on the X compared to females. This method has been shown to work with at least 1,000 reads, but should only be applied to WGS data. When using capture data, different methods are used to correct for the preferential enrichment of regions of the autosomal and sex chromosomes, which impacts expectations around coverage ratios. When limiting analysis to a set of “390k” array SNPs (a subset of the 1240k SNP panel), a ratio of reads mapping to the Y and X chromosomes is calculated as Y/(Y + X).4 When expanding the analysis to the full 1240k SNP panel, these ratios are corrected by the number of bases targeted on each chromosome.124 Another technique, which has been used for both low-coverage WGS and capture data, is to calculate the ratio of the average coverage of the X chromosome to the average coverage of X and autosomal chromosomes (X/(X + auto)).125 Care should be taken when applying this technique to capture data, as it is already known not to work as expected with the 1240k SNP panel. Sanity checks that compare calculations with similarly processed data for individuals of known chromosomal sex can be helpful in these situations. All methods mentioned will be impacted by the presence of contamination and are designed to distinguish between only two possible outcomes, XX and XY. Alternative methods have been developed for identifying other karyotypes, which have resulted in the identification of ancient individuals who may have had Klinefelter syndrome (XXY).126,127
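
The ratios described above can be written down in a few lines of Python. This is a sketch under stated assumptions rather than a reimplementation of any cited method: the function names are hypothetical, read counts stand in for coverage, and the chromosome or target sizes must be supplied by the user. No cutoffs for assigning XX or XY are included because the appropriate values depend on the data type and are not given here.

```python
def share(numerator: float, other: float) -> float:
    """Return numerator / (numerator + other).

    With (Y, X) this gives Y/(Y + X); with (X, autosomes) it gives X/(X + auto).
    """
    total = numerator + other
    if total <= 0:
        raise ValueError("no reads or coverage to compute a ratio from")
    return numerator / total


def normalized_coverage_ratio(
    chrom_reads: float, chrom_size: float, auto_reads: float, auto_size: float
) -> float:
    """Reads per targeted base on one chromosome relative to the autosomes.

    Sizes are the number of bases considered on each (user-supplied values).
    """
    if chrom_size <= 0 or auto_size <= 0 or auto_reads <= 0:
        raise ValueError("sizes and autosomal read count must be positive")
    return (chrom_reads / chrom_size) / (auto_reads / auto_size)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    x_reads, y_reads, auto_reads = 900, 100, 20000
    print("Y/(Y + X):", share(y_reads, x_reads))
    print("X/(X + auto):", share(x_reads, auto_reads))
```

After normalization, an individual with one X chromosome is expected to show roughly half the X-to-autosome ratio of an individual with two, which is why comparing results against similarly processed individuals of known chromosomal sex is a useful check.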

Summary statistics

The points for making decisions around data quality and downstream processing differ between the forensic genetics and aDNA fields. In forensic workflows, qPCR is performed on DNA extracts to quantify the amount (nanograms) of DNA present in the extract and to detect potential PCR inhibition.128,129,130 This step is used for calculating input volumes for library preparation. Evaluation of contamination and coverage estimates is then combined with the genotyping and analysis steps. The motivation behind this workflow is the large number of samples (predominantly non-degraded), time constraints, and the requirement that validated workflows remain unchanged in order to enable direct comparisons across laboratories and to maintain laboratory accreditations.

In contrast, many aDNA laboratories perform low-coverage WGS to evaluate the data quality and make decisions around how, and if, to generate more data. In studies with large numbers of individuals that are presumed to have DNA of similar quality, a subset of skeletal elements may be evaluated for their DNA preservation before making decisions that are applied to the full set of skeletal remains. The set of summary statistics used for this initial data quality evaluation typically includes the percent of sequences that map to the reference genome (in total and for a certain length cutoff), duplication rate, deamination percentages, coverage of the reference genome, complexity of each library, and contamination. The percent of mapped reads informs on how much human DNA (endogenous and contamination) is present in a DNA library relative to all sequences (and categorized by a minimum length, typically 25–35 base pairs). If a library contains a high percentage of short DNA fragments, gel cuts (the physical separation of DNA band(s) above a certain fragment length from an agarose gel for downstream analysis) may aid in decreasing sequencing costs.53 As this is a time-intensive and complex protocol, it is not recommended for routine use. Duplication rates can be a reflection of the amount of unique DNA molecules in a library, since the more times that the same original DNA molecule is sequenced the less likely it is that new, unsequenced molecules are still present in a library. High duplication rates can indicate that increased sequencing depth is unlikely to result in increased genome coverage. Deamination rates are used to determine the aDNA status, and an arbitrary cutoff of at least 10% observed C-to-T substitutions on terminal ends is used to indicate the presence of aDNA in non-UDG-treated libraries, or C-to-T substitutions in only the two terminal bases of reads in UDG-half libraries. Coverage estimates begin to provide information about how much data are sequenced from a given library; however, this is only informative for the portion of the library that was sequenced. To determine how much data may still be available in the library, we recommend using complexity estimates. This can be measured by first calculating the number of informative sequences present in the library (percentage of mapped reads above a certain length and quality threshold multiplied by the number of molecules present in the library as determined by qPCR).131 The number of informative sequences is then multiplied by the average fragment length of filtered reads and divided by the genome target size (i.e., 3 billion for the human genome). This metric provides an estimate for the theoretical coverage that can be obtained from a library if every DNA molecule is sequenced. Alternatively, library complexity can be calculated bioinformatically after sequencing using tools like Picard (GATK). Contamination estimates (discussed previously) will provide information as to what percentage of the previously calculated complexity is endogenous DNA. These numbers can then be used to estimate the cost and feasibility of generating different genome coverages. It is at this point that decisions are made as to whether to proceed with data generation and if WGS or SNP capture-based approaches should be pursued. While WGS is the gold standard when it comes to the amount of data generated and the potential analyses available, SNP capture is a more cost-effective approach when taking into account the low endogenous content of the aDNA data.
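
The complexity calculation described above is simple arithmetic and can be sketched as follows. The function names and the example numbers (1% informative reads, 10 billion library molecules, a 45 base pair mean fragment length, 5% contamination) are hypothetical and chosen only to make the calculation easy to follow; the 3 billion base pair target size is the value given in the text.

```python
GENOME_TARGET_SIZE_BP = 3_000_000_000  # human genome target size used in the text


def informative_sequences(fraction_informative: float, library_molecules: float) -> float:
    """Fraction of mapped reads passing length and quality filters,
    multiplied by the number of molecules in the library (e.g., from qPCR)."""
    return fraction_informative * library_molecules


def theoretical_coverage(
    fraction_informative: float,
    library_molecules: float,
    mean_fragment_length: float,
    target_size: float = GENOME_TARGET_SIZE_BP,
) -> float:
    """Coverage expected if every molecule in the library were sequenced."""
    n_informative = informative_sequences(fraction_informative, library_molecules)
    return n_informative * mean_fragment_length / target_size


def endogenous_coverage(coverage: float, contamination_fraction: float) -> float:
    """Portion of the theoretical coverage attributed to endogenous DNA."""
    return coverage * (1.0 - contamination_fraction)


if __name__ == "__main__":
    # Hypothetical library: 1% informative reads, 1e10 molecules, 45 bp fragments.
    cov = theoretical_coverage(0.01, 1e10, 45)
    print(f"theoretical coverage: {cov:.2f}x")
    print(f"endogenous share at 5% contamination: {endogenous_coverage(cov, 0.05):.3f}x")
```

With these example inputs, 1e8 informative molecules of 45 base pairs each spread over a 3 billion base pair target give a theoretical coverage of 1.5-fold, which is the kind of figure that can then be weighed against sequencing cost.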

Genotyping

Forensic genetics

The term genotyping, while focused on allele determination at specific loci, has been used to refer to different segments of workflows from sample preparation to allele determination using various methods (e.g., capillary electrophoresis). In this review, we will refer to genotyping as the process by which sequencing data are used to determine alleles at specific loci.

Software coupled with amplicon-based NGS kits for forensic applications uses a binary or threshold genotyping approach where the sequencing read pileup at each position is examined to determine if a locus is homozygous or heterozygous. Genotype calls are based on predetermined analytical thresholds for allelic coverage and heterozygous balance and typically require relatively high coverages at each locus (e.g., >650 reads for amplicon-based sequencing93). For degraded samples profiled with NGS, a 10X reporting threshold has been used for both mtDNA12 and SNPs.48 Validation studies are key for setting reporting thresholds as outlined in SWGDAM guidelines,132 FBI Quality Assurance Standards,133 and ENFSI best practices manuals.134,135
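
A threshold-based call of the kind described above can be sketched in Python as follows. This is an illustrative simplification and not the logic of any commercial forensic software. The function name `threshold_genotype` and both default thresholds are assumptions: the coverage default of 10 reads echoes the 10X reporting threshold mentioned above, while the heterozygote balance of 0.3 is a placeholder that a laboratory would replace with values from its own validation studies.

```python
from typing import Dict, Optional, Tuple


def threshold_genotype(
    allele_counts: Dict[str, int],
    min_coverage: int = 10,        # assumed reporting threshold
    min_het_balance: float = 0.3,  # placeholder value, not from a validation study
) -> Optional[Tuple[str, str]]:
    """Return a genotype as a sorted allele pair, or None if coverage is too low.

    A locus is called heterozygous when the second most common allele has at
    least `min_het_balance` times the read count of the most common allele.
    """
    total = sum(allele_counts.values())
    if total == 0 or total < min_coverage:
        return None  # below the reporting threshold: no call
    # Rank alleles by read count (ties broken alphabetically for determinism).
    ranked = sorted(allele_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_allele, top_count = ranked[0]
    if len(ranked) > 1:
        second_allele, second_count = ranked[1]
        if second_count > 0 and second_count / top_count >= min_het_balance:
            a, b = sorted((top_allele, second_allele))
            return (a, b)
    return (top_allele, top_allele)


if __name__ == "__main__":
    print(threshold_genotype({"A": 12, "G": 10}))  # balanced alleles
    print(threshold_genotype({"A": 20, "G": 1}))   # minor allele below balance
    print(threshold_genotype({"A": 3}))            # below coverage threshold
```

Because the outcome is a hard call with no associated uncertainty, the choice of thresholds directly determines how many loci are reported for low-coverage samples.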

Probabilistic genotyping offers an alternative to the binary approach that can include models evaluating multiple factors (e.g., number of individuals, heterozygosity, amount of data) and incorporate prior knowledge based on available reference data (e.g., sequencing error, allele frequency errors, patterns of linkage disequilibrium),136 to provide a probability that each genotyped SNP is correct. Genotype probabilities can then be incorporated into downstream analyses and allow for more informed decisions on which SNPs to include in a final DNA profile. The benefits of integrating probabilistic genotyping methods into forensic genetics have been recognized via validation studies and guidelines.137,138 While most of these studies are focused on STR and mixture analysis, the application of probabilistic genotyping to degraded, single-source samples has in recent years begun to be explored for both identification of human remains and for identifying potential perpetrators in criminal cases.45,139 Notably, the probabilistic genotyping method used in the study by Gordan et al., 202245 (ATLAS140) was developed for aDNA data and allows the user to take into account deamination rates, which have been shown to be present in historical remains.14 Due to the highly degraded quality of DNA in ancient studies, it is unsurprising that methods from this field may be useful for forensic casework involving historical and/or degraded remains.

Ancient DNA

Genotyping of aDNA is often split into two categories depending on the data quality and planned downstream analyses. The first is pseudo-haploid genotyping, which involves randomly selecting a single read per position in place of calling a true diploid genotype. This is typically performed when working with low-coverage data, genome-wide array data, or when analyzing a large number of genomes where the majority are low coverage. Several software tools are available for performing pseudo-haploid genotyping, including pileupCaller and bam-caller (Table S1). Each of these tools allows users to specify which SNPs should be genotyped and allows filtering based on coverage, mapping quality, and base quality. They can either randomly select a read from the pileup or, given sufficient coverage, select the allele supported by the majority of the reads. This step should be performed after additional end trimming to minimize impacts of deamination. For non-UDG or partial-UDG ssDNA libraries, deamination or contamination impacts can be further minimized by limiting calls of C-to-T SNPs to the reverse strands, and G-to-A to the forward strands only. In addition, after genotyping, one can quantify the observed number of transitions and transversions (C>T, A>G, A>C, etc.) to determine if their ratio (also known as Ti/Tv) differs from the expected value of 2–2.1 for WGS data.141 However, this method will not work for capture arrays where the Ti/Tv ratios significantly deviate from the expectation.
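The two selection modes and the Ti/Tv check can be illustrated with a short sketch. The function names are hypothetical and the code is not the implementation of pileupCaller or bam-caller; in particular, ties in majority mode are broken alphabetically here purely for simplicity, and no quality or strand filtering is applied.

```python
import random

VALID_BASES = ("A", "C", "G", "T")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def pseudo_haploid_call(bases, rng, mode="random"):
    """Pick a single allele from the pileup at one site (None if no reads)."""
    bases = [b for b in bases if b in VALID_BASES]
    if not bases:
        return None
    if mode == "majority":
        # Allele supported by most reads; ties resolved alphabetically (assumption)
        return max(sorted(set(bases)), key=bases.count)
    return rng.choice(bases)  # random draw of one read

def ti_tv_ratio(variants):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    variants = list(variants)
    ti = sum(1 for v in variants if v in TRANSITIONS)
    tv = sum(1 for v in variants if v not in TRANSITIONS and v[0] != v[1])
    return ti / tv if tv else float("inf")

rng = random.Random(0)  # fixed seed so the random draw is reproducible
random_call = pseudo_haploid_call(["A", "A", "G"], rng)
majority_call = pseudo_haploid_call(["A", "A", "G"], rng, mode="majority")
```

A Ti/Tv value well above the expected range in WGS-derived calls would be consistent with residual deamination inflating the number of apparent C>T and G>A transitions.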

Probabilistic genotyping is generally used for aDNA samples with better coverage. Several software tools can be used to determine genotype likelihoods in ancient samples, including snpAD,142 ATLAS,143 bcftools,144 GATK,145 ANGSD,108 and others (Table S1). A set of reference genotypes from modern data is often employed to aid genotyping of ancient samples, a commonly used one being the 1000 Genomes reference dataset.146 Again, trimming of termini is important prior to diploid genotype calling. However, tools like ATLAS143 are able to take aDNA damage into account when determining genotype likelihoods, and thus additional preprocessing is not necessary prior to genotyping. Moreover, when using non-UDG data, genotyping can be restricted to transversions only, which are not affected by aDNA deamination and are thus more reliable for downstream analyses; analyses can also be restricted to damaged reads with PMDtools.147 Non-UDG ssDNA library data further allow for processing reads separately, which can serve as an additional control.

Limitations and considerations

Limitations of genotyping can be examined from two different perspectives: genotype accuracy (error rate and allelic dropout) and the impact of this accuracy on downstream analyses. Here, we focus on the former for currently used methods in forensic and aDNA work based on evaluations with simulation studies, which allow decoupling of laboratory and bioinformatic parameters. This excludes analysis pipelines paired with commercial kits.

Low coverage is a known concern for genotyping as it can result in allelic dropout and increased stochasticity in allele sampling, complicating differentiation between heterozygous and homozygous loci. A recent forensic case solving a 16-year-old double murder in Sweden48 encouraged pairing WGS with SNP panels as a check for these issues. Conventional forensic genotyping also does not take into account the presence of damage patterns, which have been observed in DNA recovered from historical and forensic casework.14,148,149,150 Due to the high-coverage values required for typical forensic casework, low rates of damage are negligible and are not expected to impact downstream analyses, but may have an impact on low-coverage samples. Practices from the aDNA field of trimming ends of reads to remove damage patterns or utilizing probabilistic genotypers that take postmortem damage into account may open up more degraded samples for HID analysis.
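The effect of coverage on allelic dropout can be quantified with simple arithmetic. The sketch below assumes an idealized heterozygous site where each read is equally likely to come from either allele, with no sequencing error, damage, or reference bias; under a Poisson model of read depth with mean coverage c, the reads from each allele are independent Poisson variables with mean c/2. Function names are illustrative.

```python
from math import exp

def p_both_alleles_seen(n_reads):
    """P(both alleles observed) at a heterozygous site covered by n reads."""
    if n_reads < 2:
        return 0.0
    # All reads show the same allele with probability 2 * (1/2)^n
    return 1.0 - 2.0 * 0.5 ** n_reads

def p_both_alleles_seen_poisson(mean_coverage):
    """Same probability when depth is Poisson-distributed with the given mean."""
    # Each allele must be covered by at least one read: (1 - e^(-c/2))^2
    return (1.0 - exp(-mean_coverage / 2.0)) ** 2
```

Under these assumptions, both alleles are seen only half the time with two reads and more than 99% of the time with ten reads, while at a mean coverage of 1X the probability is about 15%. This is why heterozygous sites in low-coverage data are frequently mistaken for homozygous ones.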

Genotyping of low-coverage aDNA data with pseudo-haploid calling in theory does not have a lower coverage limit because, given coverage by at least one read, an allele can always be selected. However, the presence of contamination decreases the chance that a randomly sampled read is endogenous. Repeating analyses in parallel on putatively deaminated fragments only can serve as a sanity check for evaluating whether certain signals are contamination driven. Another check is to use the f4-statistic, a summary statistic measuring correlations in allele frequencies between four populations151 (see the downstream analyses section for explanation of f statistics), in the form f4 (all fragments, deaminated fragments; set of test modern populations, outgroup). If there is no contamination, the resulting statistic should be ∼0, i.e., statistically indistinguishable from 0. Reference bias is also a concern and can be checked with an f4-statistic if a diploid version of the genotype is available as well (i.e., in scenarios where higher- and low-coverage data are co-analyzed). In this case, the statistic is set in the form f4 (diploid genotypes, pseudo-haploid genotypes; reference genome, outgroup). A significantly negative f4-statistic would indicate attraction between the pseudo-haploid data and the reference (reference bias). In the case of archaic individuals, pseudo-haploid data may be attracted to the outgroup via so-called “long-branch” attraction.

Reference bias continues to be a concern for probabilistic genotyping, which is relevant for both modern and ancient applications. This bias was identified when researchers discovered that higher genotype probabilities are assigned to calls that are homozygous with respect to the reference used for alignment, which is composed predominantly of sequence of European and African ancestry.152 Reference bias in genotyping individuals from underrepresented populations continues to be assessed,153 and many studies focused on these groups start with an evaluation of genotyping accuracy for identifying rare variants. Modern reference databases may not fully represent the genetic variation of individuals involved in forensic casework or past populations studied in aDNA. Moreover, modern individuals represent already admixed states, and thus exhibit different patterns of linkage disequilibrium and potentially shorter haplotypes compared to the more ancient un-admixed sources. When deciding which methods to use, it is important to remember that different degrees of uncertainty can be tolerated for genotyping in forensics and aDNA. In the case of aDNA, more relaxed quality control metrics and reliance on population-wide estimates allow for the use of lower quality and quantity data compared to forensic data, where the goal is identification of individuals. The potential implications of making an error are also significantly greater in forensics, where genetic evidence is used in court cases. However, there is also potential for aDNA to directly impact present-day people; for example, aDNA evidence can be used by Native American tribes to gain U.S. federal recognition.154

Imputation

Imputation, or filling in missing genotype information, is a common procedure in both modern and ancient datasets that allows the inclusion of additional genetic information based on patterns of linkage disequilibrium across the genome.155,156 Reference datasets provide information on what alleles are more likely to be inherited together. Imputation methods often involve a phasing step where maternal and paternal chromosomes are separated into haplotypes (for review, see the study by De Marino et al.157). Phasing often relies on using a reference dataset, or related trios (parents and offspring), as short read sequencing is not informative on the background of alleles. Common uses for data after imputation and phasing include identity-by-descent (IBD) calling, local ancestry inference, selection scans, demographic modeling, and other analyses. When performing imputation, reference panels of worldwide populations are regularly used (such as the 1000 Genomes Reference panel146); however, in cases with large numbers of test samples, the use of a reference set may not be necessary.158

Forensic genetics

Imputation is just beginning to be explored for forensic applications with low-coverage data.48,159 Direct-to-consumer testing companies, such as FamilyTree,160 and GEDmatch,161 have databases that have been used for IGG and typically type between 0.7 and 1.6 million SNPs. Commercial forensic sequencing labs like Astrea Forensics use imputation in their pipeline for recovery of low-quality DNA for comparison to direct-to-consumer tests (Astrea Forensics, California, USA). Imputation has the potential to increase the ease of comparing DNA profiles generated from these different platforms and also improve chances of identification from low-quality samples with partial DNA profiles by producing a more complete profile. It could even allow for matching between STR and SNP profiles.162 However, as the reference panels used for imputation are predominantly derived from populations of European ancestry, questions have been raised about the accuracy of using them for inferring SNP genotypes for individuals from underrepresented populations.163 A recent study evaluated (1) the accuracy of two imputation programs (Beagle164 and Gencove) and (2) the impact of using currently available reference panels for samples from different African populations.153 It was found that, at 4X coverage for the five African populations included in the study, 38% of common variants and ∼50% of rare variants could not be imputed, likely due to variance in genetic distance to the reference panels.153 Continued studies are needed to explore potential biases introduced by available reference panels when applied to diverse populations for individualization purposes.

Ancient DNA

Common imputation software used in aDNA studies includes Beagle,164 GeneImp,165 and GLIMPSE.166 Generally, imputation starts with producing genotype likelihoods. These likelihoods are then used together with a panel of modern high-coverage populations to determine the most likely genotypes based on linkage disequilibrium patterns observed in the reference data.167 The DNA coverage cutoff of 0.5X is often used as the inclusion criterion for imputation, since samples with lower coverage are less likely to produce accurate genotype calls after imputation.168,169,170 In the future, greater availability of high-coverage WGS ancient genomes may be able to overcome this issue by generating curated high-quality reference datasets of ancient individuals only, and thus removing the need for modern reference datasets.

Although powerful, there are important limitations to consider when applying imputation to degraded DNA. These include damage patterns, contamination, and whether one is using WGS or capture data. WGS data allow for imputation of a greater number of positions in the genome and are less prone to ascertainment bias. With capture data, it is not possible to detect new variation or private variation present in a population not used in the ascertainment. Using modern reference panels raises the same concerns as those outlined in the genotyping section. In addition, due to the nature of the imputation procedure, homozygous reference genotypes are more likely to get high imputation scores, while homozygous alternative and heterozygous genotypes often have lower post-imputation accuracies, which can result in a reference bias after imputation.168 For ancient individuals, this limits the types of populations that can be successfully imputed. Despite this limitation, a recent study171 found that, at 1X coverage, imputation with Beagle (v4.0)172 can result in >99% genotype concordance when a minor allele frequency threshold of 0.1 is applied. The accuracy of imputation can be assessed by downsampling high-coverage ancient or modern DNA data.170 When evaluating imputation accuracy, it is important to remember that aDNA coverage and damage are often non-randomly distributed, and thus a randomly downsampled genome does not necessarily represent the true low-coverage state.
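A downsampling test of this kind ultimately reduces to comparing imputed genotypes against high-coverage "truth" genotypes. The sketch below is one simple way to score that comparison, with genotypes encoded as the number of alternative alleles (0, 1, 2) and missing calls as None; the function name and the choice of metrics are assumptions for illustration. Reporting concordance separately for sites where the truth is not homozygous reference helps expose the reference bias described above, since homozygous reference sites tend to dominate and inflate the overall value.

```python
def genotype_concordance(truth, imputed):
    """Return (overall, non_reference) concordance between two genotype lists.

    truth, imputed: lists of 0/1/2 alternative-allele counts, None if missing.
    non_reference concordance uses only sites where truth is 1 or 2.
    """
    pairs = [(t, i) for t, i in zip(truth, imputed)
             if t is not None and i is not None]
    if not pairs:
        return None, None
    overall = sum(t == i for t, i in pairs) / len(pairs)
    non_ref = [(t, i) for t, i in pairs if t != 0]
    non_ref_conc = (sum(t == i for t, i in non_ref) / len(non_ref)
                    if non_ref else None)
    return overall, non_ref_conc

# Toy data: one heterozygous site is wrongly imputed as homozygous reference
truth_calls = [0, 0, 0, 0, 1, 2, 1, None]
imputed_calls = [0, 0, 0, 0, 0, 2, 1, 1]
overall_conc, non_ref_conc = genotype_concordance(truth_calls, imputed_calls)
```

In this toy example the overall concordance (6 of 7 sites) looks considerably better than the concordance at non-reference sites (2 of 3), illustrating why a single summary number can hide reference bias.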

A general consideration when using imputation both in forensics and aDNA is the quality of the reference dataset. Worldwide genetic variation is not represented equally in publicly available reference datasets and thus comparison of imputed samples from various parts of the world should be done with caution.173

Downstream analyses

General population genetics analyses

Ancestry-informative markers are commonly included in forensic sequencing kits with the aim of providing investigative leads and/or aiding in improved accuracy for downstream population genetics analyses. The goal of such work is to evaluate the ancestry of an individual. Although conceptually different, the term ancestry is often used interchangeably with race and ethnicity in various fields, including medical genetics, pharmacogenetics, forensics, and others.174,175 Thus, guidelines for communicating this information to law enforcement in a way that decouples genetic ancestry from race and ethnicity, particularly as categories differ country by country and through time, are an area of active discussion and attention in the field due to concerns around interpretation and dissemination of results.176,177

Principal component analysis (PCA) and ADMIXTURE178 or STRUCTURE179 analyses are among the most common population genetics methods that are performed on modern and aDNA to better understand the population structure and broader genetic affinity among individuals and populations. Projection PCA, wherein ancient samples are projected upon modern genetic variation, is commonly used to overcome the low coverage and the presence of damage patterns in aDNA, and the SmartPCA software from EIGENSOFT v7.2.1 (http://www.hsph.harvard.edu/alkes-price/software/) is often used for this purpose.151 Briefly, ancient samples are merged with modern worldwide populations, and a set of populations is chosen as a reference set upon which all other samples are projected. Multidimensional scaling (MDS) is another dimension-reduction method that has been used to assess broad relationships between sets of individuals in a hypothesis-free manner, similar to PCA. Ancient individuals with as low as 1% endogenous DNA have been assigned correctly to their geographic origin using MDS.180 It is important to note that population genetics analyses like PCA and MDS are most informative when used in concert with other methods like F-statistics, ADMIXTURE, and reference population sets. ADMIXTURE178 analysis is used to cluster individuals based on their genetic ancestry. In an ADMIXTURE analysis, ancient samples can be analyzed using a modern reference, or without one if a sufficiently large number of individuals from relevant population sources are available. Based on the admixture analysis and PCA, genetic ancestry and admixture components are often formally tested using qpAdm and F-statistics.151,181
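The idea behind projection can be shown with a dependency-free toy example. The sketch below computes the first principal axis from a small reference panel by power iteration and then places a sample with missing genotypes on that axis using a least-squares fit restricted to the SNPs it covers. All names and data are hypothetical, only one component is computed, and this is not the algorithm implemented in SmartPCA; it is intended only to show why missing data need to be handled explicitly when projecting low-coverage samples.

```python
def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def top_pc(centered, n_iter=50):
    """First principal axis of a centered matrix via power iteration."""
    n_snps = len(centered[0])
    v = [1.0] + [0.0] * (n_snps - 1)  # arbitrary start vector (toy choice)
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(n_snps)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return v

def project(sample, means, pc):
    """Least-squares score on one PC using only non-missing SNPs."""
    observed = [(g - m, w) for g, m, w in zip(sample, means, pc)
                if g is not None]
    denom = sum(w * w for _, w in observed)
    return sum(x * w for x, w in observed) / denom if denom else None

# Two toy reference populations (rows: individuals, columns: SNPs, values: 0/1/2)
reference = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
means = column_means(reference)
centered = [[g - m for g, m in zip(row, means)] for row in reference]
pc1 = top_pc(centered)

ancient = [2, None, 0, None]  # half of the SNPs are missing
ancient_score = project(ancient, means, pc1)
reference_scores = [project(row, means, pc1) for row in reference]
```

Here the sample with half of its SNPs missing receives the same score as the first reference population. If missing genotypes were instead treated as equal to the panel mean, the score would shrink toward the origin, which is a known artifact when projecting sparse data.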

Biological relatedness

The identification of relatives is of interest in both forensic and aDNA studies. In forensic human identification casework, in the absence of a direct match in available databases, kinship analysis with STRs (familial searching) has been used to identify potential perpetrators of violent crimes, unknown remains, or determine paternity. False negative and false positive rates of this analysis have been evaluated and found to be impacted by the likelihood ratio cutoffs used, the type of relationship in question, and the individual’s genetic ancestry.182,183,184 One of the commonly used software packages for this analysis, Familias,185,186 uses allele sharing between genetic relatives that are identical by descent (IBD) to identify first- to second-degree relatives.

To identify more distant genetic relatives, IGG utilizes SNPs either to calculate total shared segments of DNA (measured in centiMorgans, cM) or to compute kinship coefficients based on allele sharing and pairwise differences. There are several informative, in-depth reviews and studies on this approach and its application in forensic settings.187,188,189,190 For the first category, a threshold is used to differentiate between segments that are identical by state and IBD.189 The total number of shared cM used for IGG relationship estimates is largely based on tests using European populations for relatively close familial relationships (1st to 3rd degree). The frequency with which more distant genetic relatives are misidentified as closer genetic relatives across various population backgrounds based on the number of shared cM is unknown. The second category relies on reference panels to determine allele frequencies and to control for potential population substructure. This approach is commonly used in the medical genetics and aDNA fields (as described in the following section) and has the benefit of working on smaller subsets of SNPs, as it is not dependent on identifying IBD tracks.189 Another category estimates IBD tracks between individuals and therefore does not rely on allelic frequencies.191,192,193 Verogen has recently leveraged the likelihood approach to develop their ForenSeq Kintelligence kit, which contains 10,230 SNPs identified as being maximally informative for identifying genetic relatives.194 The limitations of these methods are still being explored. A recent study found that while incomplete SNP profiles (>50%) had a minimal impact on relative identification, 1%–5% of genotyping error resulted in reduced accuracy for the segment-based identification methods often used in IGG.195

The certainty of individual identification in forensic casework is often based on a likelihood ratio that expresses how likely a certain DNA profile would be observed from the individual in question versus from a random person in a specific population. This calculation utilizes allele frequency data from select populations to determine likelihood ratios per population. SWGDAM guidelines indicate how to report and describe these ratios.196
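For a single-source profile, the calculation described above can be illustrated as follows. This sketch assumes Hardy-Weinberg proportions and independence between loci, and it omits the population substructure (theta) correction that is applied in practice; the loci, alleles, and frequencies are made up for the example. Under these simplifying assumptions, the likelihood ratio is the reciprocal of the random match probability.

```python
def random_match_probability(profile, freqs):
    """Probability that a random person from the population has this profile.

    profile: {locus: (allele1, allele2)}
    freqs:   {locus: {allele: population frequency}}
    Assumes Hardy-Weinberg proportions and independent loci (no theta correction).
    """
    rmp = 1.0
    for locus, (a1, a2) in profile.items():
        p, q = freqs[locus][a1], freqs[locus][a2]
        rmp *= p * p if a1 == a2 else 2.0 * p * q
    return rmp

# Hypothetical two-locus profile and allele frequencies for one population
profile = {"locus1": ("A", "A"), "locus2": ("C", "T")}
freqs = {"locus1": {"A": 0.5, "G": 0.5}, "locus2": {"C": 0.2, "T": 0.8}}

rmp = random_match_probability(profile, freqs)
likelihood_ratio = 1.0 / rmp
```

Because the result depends directly on the allele frequencies supplied, repeating the calculation with frequencies from a different reference population yields a different likelihood ratio, which is why ratios are reported per population.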

Methods used to assess biological relatedness among aDNA samples include pairwise mismatch rate (PMR), ancIBD,197 READ,198 and lcMLkin.199 PMR, lcMLkin, and READ provide pairwise relatedness information, while ancIBD197 can be used to find links between more distant relatives based on the IBD sharing. PMR and READ can be used on genotype calls, while lcMLkin relies on genotype likelihoods, and ancIBD is based on imputed and phased data. Most methods that are used to determine biological relatedness in aDNA data are only able to determine biological kinship up to a second degree. Some, such as PMR and READ, do not separate between parent-offspring (PO) and full siblings (FS), while others, lcMLkin and ancIBD, can be used to differentiate PO from FS, and identify more distant relatives.
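The pairwise mismatch approach can be sketched for pseudo-haploid data as follows. For two unrelated individuals the mismatch rate equals a population-specific baseline; relative to that baseline, the expected rate is 0.5 for identical individuals or twins, 0.75 for first-degree relatives, and 0.875 for second-degree relatives. The classification cutoffs used below are simply the midpoints between these expectations and are an assumption for illustration, as are the function names; in practice the baseline is estimated from many pairs in the same dataset and the uncertainty of each estimate must be taken into account.

```python
def pairwise_mismatch_rate(calls_a, calls_b):
    """Fraction of shared sites where two pseudo-haploid calls differ.

    Returns (rate, number_of_shared_sites); missing calls are None.
    """
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        return None, 0
    return sum(a != b for a, b in shared) / len(shared), len(shared)

def classify_relatedness(pmr, baseline):
    """Assign a coarse relatedness class from the baseline-normalized PMR."""
    ratio = pmr / baseline
    if ratio < 0.625:      # midpoint between 0.5 and 0.75
        return "identical/twin"
    if ratio < 0.8125:     # midpoint between 0.75 and 0.875
        return "first degree"
    if ratio < 0.90625:    # midpoint between 0.875 and 1.0
        return "second degree"
    return "unrelated or more distant"

rate, n_sites = pairwise_mismatch_rate(list("AACGT"), ["A", "C", None, "G", "T"])
```

The sketch also shows why this class of methods cannot separate parent-offspring from full-sibling pairs: both have the same expected normalized mismatch rate of 0.75.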

More recent methods that have been developed to determine biological relatedness among individuals rely on imputed and phased data. One of these methods is ancIBD, which assesses pairwise haplotype sharing in a set of samples. IBD-based methods can generally differentiate between PO and FS and identify more distant relatedness, such as 4th- to 5th-degree relationships, as well as avuncular relatedness in some cases. Generally, a combination of several methods to estimate biological relatedness is used. Additionally, uniparental marker data (Y- and mtDNA haplotypes), age at death, and archaeological context are used when building family trees. There are important caveats and limitations to consider when estimating relatedness, such as consanguineous relationships, increased background relatedness due to a population bottleneck, and sample coverage (lower coverage may inflate relatedness estimates). Determining mtDNA haplotypes and heteroplasmy for low-quality data, including deconvoluting mixtures, is another area of overlap between aDNA and forensic genetics.40,149,200,201,202,203

Admixture and genetic introgression

F-statistics

F-statistics are a commonly used suite of methods in aDNA to test for various scenarios of admixture and population relationships.151,204 In forensics, F-statistics have been used for quality control and STR evaluation and analysis,205,206 as well as the analysis of ancestry-informative SNPs.207 These methods are based on two-, three-, or four-population comparisons, and are called, respectively, f2, f3, and f4.151 The f2-statistic measures the difference in allele frequencies between two populations. In comparison, f3 is a three-population test typically represented as f3(A,B; C), where A, B, and C are each a different population or individual. Depending on the configuration, it can be used to test if population C can be modeled as an admixture of populations A and B, or it can test for shared drift between populations A and B compared to the outgroup (C). In the case of admixture f3, the statistic is expected to be negative, while in the case of the outgroup f3 it is expected to be positive. With the addition of another population, the f4-statistic can be used to test for admixture and tree-ness using the formula f4(A,B; C,D) = (a-b)(c-d), averaged across SNPs, where a, b, c, and d are allele frequencies in populations A, B, C, and D. It is similar to the D-statistic,208 also known as the ABBA-BABA statistic, which was developed to test for admixture in closely related populations. When there is no additional admixture between the pairs (A, B) and (C, D), the statistic is not expected to differ significantly from zero. As mentioned in the contamination section, f4 can also be used to identify the presence of contamination within a dataset. The f-statistics calculation has been implemented in the software ADMIXTOOLS151 and the R package admixr,209 as well as treemix,210 each of which has primers.
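The f4 formula above can be computed directly from per-SNP allele frequencies. The sketch below also estimates a standard error with a delete-one-block jackknife over contiguous SNP blocks, so that a z-score can be used to judge whether the statistic differs from zero. It is a simplified illustration: the function names are hypothetical, blocks are defined by SNP count rather than genetic distance, and the unweighted jackknife used here is an assumption that differs from the weighted procedure implemented in dedicated software.

```python
from math import sqrt

def f4_terms(a, b, c, d):
    """Per-SNP products (a - b)(c - d) of allele-frequency differences."""
    return [(w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)]

def f4_with_z(a, b, c, d, n_blocks=5):
    """Return (f4 estimate, jackknife standard error, z-score)."""
    terms = f4_terms(a, b, c, d)
    n = len(terms)
    estimate = sum(terms) / n
    block_size = n // n_blocks
    leave_one_out = []
    for k in range(n_blocks):
        start = k * block_size
        end = (k + 1) * block_size if k < n_blocks - 1 else n
        kept = terms[:start] + terms[end:]  # drop one contiguous block
        leave_one_out.append(sum(kept) / len(kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out)
    se = sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

As a usage note, an absolute z-score above about 3 is a commonly used convention for calling an f4-statistic significantly different from zero; the sign then indicates which pair of populations shares excess alleles.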

Another admixture modeling method, qpAdm, relies on the basic idea of the f4-statistic.4 The main difference lies in the ability of qpAdm to estimate the admixture proportions in the target population. The limitation of qpAdm is the need for a set of reference populations/individuals used as a tree scaffold to understand the relationship between the potential sources and the target.211 Choosing the outgroups correctly then becomes crucial for being able to disentangle how the source populations are related to the target. The use and limitations of qpAdm have been recently described.211

Public databases

Both aDNA and forensic genetics use databases of modern populations to explore wider population genetics signatures and inform field-specific questions. In addition, in the aDNA field, the publication of genetic data from ancient individuals is commonplace. Databases are important resources that must be treated with care in relation to quality and accuracy of contributed data as well as considerations of privacy and respect for the individuals who have contributed their data. There have been vigorous discussions on best practices in both fields that include definitions of informed consent, acknowledgment of power dynamics, and potential implications of these databases on descendants and descendant communities, or genetic relatives of individuals in these databases.88,89,212,213,214,215,216

Within the forensics field, databases are typically used in three different ways. One is to provide references for determining haplogroup information for uniparental markers as exemplified by EMPOP (European DNA Profiling mtDNA Population Database)217 and YHRD (Y Chromosome Haplotype Reference Database).218 A second is to provide allele frequency information for different loci in order to aid in calculations for determining the certainty of an identification. These frequencies are also published for common forensic markers.219,220 The third is to provide searchable DNA profiles for identifying remains of missing people or providing leads to identify potential persons of interest in criminal cases. As of December 2022, EMPOP contains over 48,000 mitochondrial haplotypes that cover at least the hypervariable I region (over 4,200 complete mitochondrial genomes). This database has clear guidelines around nomenclature221 and quality control217,222 for the upload of new profiles. YHRD contains more than 350,000 Y-STR profiles and aims to provide accurate allele frequencies for Y chromosome STRs, although it contains Y-SNP data as well.223 Databases that provide autosomal frequency data vary by country and are typically divided based on either genetic ancestry or race, although there are ongoing debates as to the use (and accuracy) of these divisions.224

Laws and access to databases for identifying individuals vary by country, but are typically well defined and stringent.225 The inclusion of individuals in these databases also varies greatly and is an ongoing area of debate, with criteria for inclusion ranging from citizenship226 to arrestee status and category of criminal offense, alongside concerns about the biases of these databases due to racial and socioeconomic disparities.227,228 International database sharing among law enforcement also presents logistical and ethical questions, with international agencies such as INTERPOL creating protected measures for DNA profile searching.229,230 Outside of law enforcement, there are selected public databases of genetic data that law enforcement can search depending on the scope of the case in question. For example, the genetic profiles of the ∼1.4 million users of the genetic genealogical database GEDmatch are accessible for searching for missing person identifications if users make their profile public, but users can decide if their genetic data can also be used by law enforcement for searches related to violent crimes.231 However, there have been serious concerns that the profiles of GEDmatch users who opted out of sharing their genetic profiles with law enforcement were still accessible to the police.232,233 Due to the ability of genetic genealogical analysis to identify deep family connections,234 investigators have estimated that a database similar to GEDmatch could be used to identify over 60% of individuals of European descent in the United States,235 raising significant concerns about privacy.

There are multiple publicly available databases and repositories that are commonly used to house aDNA data, such as the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/), the Edmond Open Research Data Repository of the Max Planck Society (MPS) (https://edmond.mpdl.mpg.de/), the Allen Ancient DNA Resource (AADR),63 the Poseidon database (http://www.poseidon-adna.org/#/), and others. AADR contains downloadable genotypes of ancient and present-day DNA data that cover various panels with the 1.2 million SNPs described in the “sampling and laboratory work” section. The AADR database also includes rich annotation information for each sample in the dataset, including the age of the sample, contamination metrics, coverage, sex, and geographic location, to name a few. The Edmond database can be used by MPS members to upload any files associated with their publications, including genotypes. Another resource is the Allen Ancient Genome Diversity Project/John Templeton Ancient DNA Atlas containing medium- and high-coverage shotgun sequencing data for 216 individuals (https://reich.hms.harvard.edu/ancient-genome-diversity-project). The authors ask the community to observe the Fort Lauderdale principles entitling the authors to be the first to present and publish the dataset. The Poseidon (http://www.poseidon-adna.org/#/) framework features a decentralized repository of genotyping and sequence data. Poseidon is managed through the efforts of the Department of Archaeogenetics of the Max Planck Institute for Evolutionary Anthropology. However, individual researchers are encouraged to submit packages of their genotyping data, with annotation files, and links to the BAM and FASTQ files. Finally, most published aDNA studies make the data available as raw FASTQ files and/or BAM files aligned to the reference genome. ENA accession numbers can usually be found within the publication “Data Availability” statement.

Study design

Power analysis

One very important issue to consider when designing a study is the number of samples and/or individuals necessary to answer certain questions. This is relevant for both fields, as genetic analysis of skeletal remains typically requires destructive sampling.236 Different cultures and communities can have various points of view as to the implications of destruction of remains that should be weighed and discussed as part of a study’s initial design.236 In scenarios where destructive sampling is not a concern for cultural reasons, it can still result in the destruction of certain morphological features, such as teeth or petrous bone. Thus, a potential benefit of a large sample size has to be weighed against the consequences of this type of analysis. Power analysis is a way to assess the necessary sample size for certain research questions. For aDNA studies, this can refer either to the number of individuals, geographic locations, or number of loci covered. Demographic reconstructions,237 studies of natural selection,238 and single-locus analyses239 are examples of types of questions where a sufficient number of individuals and/or loci covered are necessary. For forensic casework related to HID, power analysis is still relevant for determining how much data are needed to reach conclusions around an identification. In addition, methodological and validation studies that seek to evaluate different steps within the wet and dry lab workflows should consider power analyses as part of the study design. It is also important to consider confounding factors, for example, how to isolate evaluations of the accuracy of genotyping and imputation methods. For methodological development and evaluation of bioinformatic tools, simulations are a powerful and necessary component.
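A simulation-based power analysis can be sketched as follows for one simple question: how many individuals per group are needed to detect a given allele frequency difference at a single locus between two groups. The scenario, function name, test statistic (a two-proportion z-test on allele counts), and default parameter values are all illustrative assumptions; a realistic design would also model missing data, damage, and genotyping error.

```python
import random

def simulated_power(n_per_group, freq1, freq2, n_sims=2000, z_crit=1.96, seed=1):
    """Fraction of simulated datasets in which the frequency difference is detected.

    Each individual contributes two chromosomes; detection uses a
    two-proportion z-test at the two-sided threshold z_crit.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_chrom = 2 * n_per_group
    detections = 0
    for _ in range(n_sims):
        x1 = sum(rng.random() < freq1 for _ in range(n_chrom))
        x2 = sum(rng.random() < freq2 for _ in range(n_chrom))
        p1, p2 = x1 / n_chrom, x2 / n_chrom
        pooled = (x1 + x2) / (2 * n_chrom)
        se = (pooled * (1 - pooled) * 2 / n_chrom) ** 0.5
        if se > 0 and abs(p1 - p2) / se > z_crit:
            detections += 1
    return detections / n_sims

power_small = simulated_power(10, 0.2, 0.4, n_sims=500)
power_large = simulated_power(100, 0.2, 0.4, n_sims=500)
```

Comparing the estimated power across candidate sample sizes before sampling begins gives a concrete basis for weighing the scientific benefit of additional individuals against the cost of further destructive sampling.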

Summary and outlook

The development of new methods has enabled researchers to trace ancient populations and solve decades-old cold cases. Each year seems to bring a new study that pushes the limits of what was previously thought possible for genetic research from degraded DNA. In addition, the number of aDNA labs and forensic genetic labs integrating DNA sequencing continues to increase, expanding the size of both fields and the number of people performing these types of work. We hope that this review serves as both a primer for those new to working with sequencing data from degraded DNA and a marker for the current guidelines and limitations of different types of analyses. Because both fields continue to change rapidly, we encourage new and old members of the field to join and stay active in international and regional field-specific organizations such as the International Society for Biomolecular Archaeology, American Association of Anthropological Genetics, ISFG (which has language-specific working groups), SWGDAM, and ENFSI. Where no studies exist on the limitations of certain methods for working with incomplete data showing the signs of degradation and potential contamination commonly seen in the forensics and aDNA fields, we also recommend performing simulations and/or downsampling tests in order to understand the power of any resulting associations from lower quality data. While this review has been limited to describing human genetic analysis from skeletal remains, these same methodological advancements have opened up new areas of study, including ancient pathogens and sediment DNA, that face similar (and added) challenges. As exploration into these new areas continues, we hope this review also serves to highlight the overlaps between forensics and ancient genetics and motivates future collaborations between these disciplines.

Acknowledgments

We would like to thank Wolfgang Haak, Mateja Hajdinjak, Thiseas Lamnidis, Charla Marshall, Priya Moorjani, Rori Rohlfs, and Irina Velsko for feedback and helpful conversations. Illustrations for Figures 2 and 3 were created by Petra Korlevic. E.I.Z. would like to thank the Miller Institute for Basic Research in Science, University of California Berkeley for funding her work on this project.

Author contributions

A.C. and E.I.Z.: Conceptualization, writing – original draft preparation, and review & editing.

Declaration of interests

The authors declare no competing interests.

Box 1. Glossary.

Library (next-generation sequencing (NGS)): A collection of DNA fragments from an organism with synthetic DNA attached that allows them to be sequenced and identified.

Library preparation: The addition of adapters to DNA molecules to allow them to adhere to the beads, flow cells, or chips used in NGS. This step often includes adding indices for identifying which DNA sequences are from the same library. Double-stranded (ds) DNA library preparation only uses dsDNA as a template, so single-stranded (ss) DNA fragments are lost; how much this loss matters depends on the extent of DNA degradation. ssDNA library preparation starts with a denaturation step and uses ssDNA as a template, so theoretically all DNA molecules are converted into libraries. This is useful for degraded DNA applications, but due to the time and consumable costs of existing protocols, it is less widely used. Uracil-DNA glycosylase (UDG) treatment removes uracil residues, which are often observed in degraded DNA as a result of the deamination of cytosines.

Adapter sequences: Short DNA fragments (oligos) that are used in library preparation to prepare DNA molecules for sequencing. For example, in the case of Illumina sequencing, adapter sequences are essential for binding to and generating clusters on the flow cell.

Reads: Sequences of base pairs corresponding to a fragment of DNA generated through sequencing.

Demultiplexing: Splitting reads from different libraries that were sequenced together into separate files per library. Demultiplexing is done based on the barcodes or indices (unique sequences) attached to the DNA molecules in each library on the sequencing run.

Enrichment: The process of targeting and amplifying certain parts of the genome for downstream sequencing. Enrichment allows us to focus on a specific organism, gene, set of SNPs, etc., depending on the design of the capture that is used.

PCR duplicates: Identical reads that belong to the same PCR clone/template. These are typically identified based on start and/or end coordinates.

Human identification (HID): In this paper, we define HID casework to encompass unknown human skeletal remains from cold cases, disaster victim identification (DVI), and historical identification cases.

Investigative genetic genealogy (IGG): The identification of an individual through distant relatives (typically 2nd to 4th degree genetic relatives) by using SNP profiles that are uploaded to large databases. The DNA profile is used to identify the genetic relative, and genealogists then use other information (census records, obituaries, etc.) to trace through family trees and identify the unknown individual.

Pileup: The alignment of all filtered reads to a reference sequence.

Ascertainment bias: Bias that results from the non-random selection of loci that are not representative of the full genetic diversity. A common source of ascertainment bias lies in the selection of the SNPs to include on a certain SNP array. For this reason, arrays are not capable of detecting new variation or private variation present in a population not used in the ascertainment.

Local ancestry inference: Decomposition of chromosomes into ancestral chunks in admixed populations.
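Two of the glossary entries above, demultiplexing and PCR duplicates, describe simple bookkeeping steps that can be illustrated in a few lines of Python. The sketch below is a simplification under stated assumptions: the `Read` record, its fields, the exact-match index lookup, and the choice to define duplicates by chromosome, start, end, and strand are all hypothetical choices for this example. Established tools additionally handle index mismatches, base qualities, and paired-end reads.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence


class Read(NamedTuple):
    """Hypothetical minimal record for a mapped, indexed read."""
    name: str
    index: str    # index (barcode) sequence attached during library prep
    chrom: str
    start: int
    end: int
    strand: str


def demultiplex(reads: Sequence[Read],
                index_to_library: Dict[str, str]) -> Dict[str, List[Read]]:
    """Split reads sequenced together into per-library groups by index.

    Reads whose index is not in the lookup go to "undetermined".
    Only exact index matches are accepted in this sketch.
    """
    by_library: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        library = index_to_library.get(read.index, "undetermined")
        by_library[library].append(read)
    return dict(by_library)


def collapse_duplicates(reads: Sequence[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, start, end, strand).

    Reads sharing both coordinates and strand are treated as copies of
    one original template, following the glossary definition.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.start, read.end, read.strand)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


if __name__ == "__main__":
    run = [
        Read("r1", "ACGT", "chr1", 100, 150, "+"),
        Read("r2", "ACGT", "chr1", 100, 150, "+"),
        Read("r3", "TTAG", "chr1", 100, 150, "+"),
        Read("r4", "GGGG", "chr2", 5, 40, "-"),
    ]
    libraries = demultiplex(run, {"ACGT": "libA", "TTAG": "libB"})
    for library, members in sorted(libraries.items()):
        kept = collapse_duplicates(members)
        print(library, len(members), "reads,", len(kept), "after collapsing")
```

Duplicates are collapsed within each library after demultiplexing, because reads from different libraries that share coordinates derive from different template molecules.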

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2023.108066.

Supporting citations

The following references appear in the supplemental information: 56,57,58,91,92,93,94,95,97,98,101,102,103,104,108,110,111,112,113,142,143,144,145,147,151,164,165,166,178,180,191,193,197,198,199,209,210,240,241,242,243,244,245,246,247,248,249,250,251,252,253.

Supplemental information

Table S1. Non-exhaustive list of software commonly used in forensic and aDNA analysis

References

  • 1. Kayser M. Forensic DNA Phenotyping: Predicting human appearance from crime scene material for investigative purposes. Forensic Sci. Int. Genet. 2015;18:33–48. doi: 10.1016/j.fsigen.2015.02.003.
  • 2. Phillips C. Forensic genetic analysis of bio-geographical ancestry. Forensic Sci. Int. Genet. 2015;18:49–65. doi: 10.1016/j.fsigen.2015.05.012.
  • 3. Ge J., Budowle B. Forensic investigation approaches of searching relatives in DNA databases. J. Forensic Sci. 2021;66:430–443. doi: 10.1111/1556-4029.14615.
  • 4. Haak W., Lazaridis I., Patterson N., Rohland N., Mallick S., Llamas B., Brandt G., Nordenfelt S., Harney E., Stewardson K., et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature. 2015;522:207–211. doi: 10.1038/nature14317.
  • 5. Patterson N., Isakov M., Booth T., Büster L., Fischer C.-E., Olalde I., Ringbauer H., Akbari A., Cheronet O., Bleasdale M., et al. Large-scale migration into Britain during the Middle to Late Bronze Age. Nature. 2022;601:588–594. doi: 10.1038/s41586-021-04287-4.
  • 6. Fowler C., Olalde I., Cummings V., Armit I., Büster L., Cuthbert S., Rohland N., Cheronet O., Pinhasi R., Reich D. A high-resolution picture of kinship practices in an Early Neolithic tomb. Nature. 2022;601:584–587. doi: 10.1038/s41586-021-04241-4.
  • 7. Ning C., Zhang F., Cao Y., Qin L., Hudson M.J., Gao S., Ma P., Li W., Zhu S., Li C., et al. Ancient genome analyses shed light on kinship organization and mating practice of Late Neolithic society in China. iScience. 2021;24:103352. doi: 10.1016/j.isci.2021.103352.
  • 8. Mathieson I., Lazaridis I., Rohland N., Mallick S., Patterson N., Roodenberg S.A., Harney E., Stewardson K., Fernandes D., Novak M., et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015;528:499–503. doi: 10.1038/nature16152.
  • 9. Meyer M., Kircher M., Gansauge M.-T., Li H., Racimo F., Mallick S., Schraiber J.G., Jay F., Prüfer K., de Filippo C., et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344.
  • 10. Prüfer K., Racimo F., Patterson N., Jay F., Sankararaman S., Sawyer S., Heinze A., Renaud G., Sudmant P.H., de Filippo C., et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–49. doi: 10.1038/nature12886.
  • 11. Reich D., Green R.E., Kircher M., Krause J., Patterson N., Durand E.Y., Viola B., Briggs A.W., Stenzel U., Johnson P.L.F., et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010;468:1053–1060. doi: 10.1038/nature09710.
  • 12. Marshall C., Sturk-Andreaggi K., Daniels-Higginbotham J., Oliver R.S., Barritt-Ross S., McMahon T.P. Performance evaluation of a mitogenome capture and Illumina sequencing protocol using non-probative, case-type skeletal samples: Implications for the use of a positive control in a next-generation sequencing procedure. Forensic Sci. Int. Genet. 2017;31:198–206. doi: 10.1016/j.fsigen.2017.09.001.
  • 13. Ambers A., Bus M.M., King J.L., Jones B., Durst J., Bruseth J.E., Gill-King H., Budowle B. Forensic genetic investigation of human skeletal remains recovered from the La Belle shipwreck. Forensic Sci. Int. 2020;306:110050. doi: 10.1016/j.forsciint.2019.110050.
  • 14. Zavala E.I., Thomas J.T., Sturk-Andreaggi K., Daniels-Higginbotham J., Meyers K.K., Barrit-Ross S., Aximu-Petri A., Richter J., Nickel B., Berg G.E., et al. Ancient DNA Methods Improve Forensic DNA Profiling of Korean War and World War II Unknowns. Genes. 2022;13:129. doi: 10.3390/genes13010129.
  • 15. Hofreiter M., Sneberger J., Pospisek M., Vanek D. Progress in forensic bone DNA analysis: Lessons learned from ancient DNA. Forensic Sci. Int. Genet. 2021;54:102538. doi: 10.1016/j.fsigen.2021.102538.
  • 16. Capelli C., Tschentscher F., Pascali V.L. “Ancient” protocols for the crime scene?: Similarities and differences between forensic genetics and ancient DNA analysis. Forensic Sci. Int. 2003;131:59–64. doi: 10.1016/s0379-0738(02)00396-1.
  • 17. Briggs A.W., Stenzel U., Johnson P.L.F., Green R.E., Kelso J., Prüfer K., Meyer M., Krause J., Ronan M.T., Lachmann M., Pääbo S. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. USA. 2007;104:14616–14621. doi: 10.1073/pnas.0704665104.
  • 18. Lindahl T. Instability and decay of the primary structure of DNA. Nature. 1993;362:709–715. doi: 10.1038/362709a0.
  • 19. Hebsgaard M.B., Phillips M.J., Willerslev E. Geologically ancient DNA: fact or artefact? Trends Microbiol. 2005;13:212–220. doi: 10.1016/j.tim.2005.03.010.
  • 20. Pääbo S., Poinar H., Serre D., Jaenicke-Despres V., Hebler J., Rohland N., Kuch M., Krause J., Vigilant L., Hofreiter M. Genetic analyses from ancient DNA. Annu. Rev. Genet. 2004;38:645–679. doi: 10.1146/annurev.genet.37.110801.143214.
  • 21. Hagelberg E., Sykes B., Hedges R. Ancient bone DNA amplified. Nature. 1989;342:485. doi: 10.1038/342485a0.
  • 22. Pääbo S., Gifford J.A., Wilson A.C. Mitochondrial DNA sequences from a 7000-year old brain. Nucleic Acids Res. 1988;16:9775–9787. doi: 10.1093/nar/16.20.9775.
  • 23. Hochmeister M.N., Budowle B., Jung J., Borer U.V., Comey C.T., Dirnhofer R. PCR-based typing of DNA extracted from cigarette butts. Int. J. Leg. Med. 1991;104:229–233. doi: 10.1007/BF01369812.
  • 24. Hochmeister M.N., Budowle B., Borer U.V., Eggmann U., Comey C.T., Dirnhofer R. Typing of deoxyribonucleic acid (DNA) extracted from compact bone from human remains. J. Forensic Sci. 1991;36:1649–1661.
  • 25. Dabney J., Knapp M., Glocke I., Gansauge M.-T., Weihmann A., Nickel B., Valdiosera C., García N., Pääbo S., Arsuaga J.-L., Meyer M. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci. USA. 2013;110:15758–15763. doi: 10.1073/pnas.1314445110.
  • 26. Rohland N., Glocke I., Aximu-Petri A., Meyer M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat. Protoc. 2018;13:2447–2461. doi: 10.1038/s41596-018-0050-5.
  • 27. Damgaard P.B., Margaryan A., Schroeder H., Orlando L., Willerslev E., Allentoft M.E. Improving access to endogenous DNA in ancient bones and teeth. Sci. Rep. 2015;5:11184. doi: 10.1038/srep11184.
  • 28. Gamba C., Hanghøj K., Gaunitz C., Alfarhan A.H., Alquraishi S.A., Al-Rasheid K.A.S., Bradley D.G., Orlando L. Comparing the performance of three ancient DNA extraction methods for high-throughput sequencing. Mol. Ecol. Resour. 2016;16:459–469. doi: 10.1111/1755-0998.12470.
  • 29. Zavala E.I., Rajagopal S., Perry G.H., Kruzic I., Bašić Ž., Parsons T.J., Holland M.M. Impact of DNA degradation on massively parallel sequencing-based autosomal STR, iiSNP, and mitochondrial DNA typing systems. Int. J. Leg. Med. 2019;133:1369–1380. doi: 10.1007/s00414-019-02110-4.
  • 30. Kemp B.M., Smith D.G. Use of bleach to eliminate contaminating DNA from the surface of bones and teeth. Forensic Sci. Int. 2005;154:53–61. doi: 10.1016/j.forsciint.2004.11.017.
  • 31. Korlević P., Meyer M. Pretreatment: Removing DNA Contamination from Ancient Bones and Teeth Using Sodium Hypochlorite and Phosphate. Methods Mol. Biol. 2019:15–19. doi: 10.1007/978-1-4939-9176-1_2.
  • 32. Hajdinjak M., Fu Q., Hübner A., Petr M., Mafessoni F., Grote S., Skoglund P., Narasimham V., Rougier H., Crevecoeur I., et al. Reconstructing the genetic history of late Neanderthals. Nature. 2018;555:652–656. doi: 10.1038/nature26151.
  • 33. Velsko I., Skourtanioti E., Brandt G. 2020. Ancient DNA Extraction from Skeletal Material V1.
  • 34. Fulton T.L., Shapiro B. Setting Up an Ancient DNA Laboratory. Methods Mol. Biol. 2019;1963:1–13. doi: 10.1007/978-1-4939-9176-1_1.
  • 35. Scientific Working Group on DNA Analysis Methods (SWGDAM). 2017. Contamination Prevention and Detection Guidelines for Forensic DNA Laboratories.
  • 36. Higuchi R., Bowman B., Freiberger M., Ryder O.A., Wilson A.C. DNA sequences from the quagga, an extinct member of the horse family. Nature. 1984;312:282–284. doi: 10.1038/312282a0.
  • 37. Pääbo S. Molecular cloning of Ancient Egyptian mummy DNA. Nature. 1985;314:644–645. doi: 10.1038/314644a0.
  • 38. Just R.S., Irwin J.A., Parson W. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing. Forensic Sci. Int. Genet. 2015;18:131–139. doi: 10.1016/j.fsigen.2015.05.003.
  • 39. Alonso A., Barrio P.A., Müller P., Köcher S., Berger B., Martin P., Bodner M., Willuweit S., Parson W., Roewer L., Budowle B. Current state-of-art of STR sequencing in forensic genetics. Electrophoresis. 2018;39:2655–2668. doi: 10.1002/elps.201800030.
  • 40. Marshall C., Parson W. Interpreting NUMTs in forensic genetics: Seeing the forest for the trees. Forensic Sci. Int. Genet. 2021;53:102497. doi: 10.1016/j.fsigen.2021.102497.
  • 41. Budowle B., Schmedes S.E., Wendt F.R. Increasing the reach of forensic genetics with massively parallel sequencing. Forensic Sci. Med. Pathol. 2017;13:342–349. doi: 10.1007/s12024-017-9882-5.
  • 42. Børsting C., Morling N. Next generation sequencing and its applications in forensic genetics. Forensic Sci. Int. Genet. 2015;18:78–89. doi: 10.1016/j.fsigen.2015.02.002.
  • 43. Parsons T.J., Huel R.M.L., Bajunović Z., Rizvić A. Large scale DNA identification: The ICMP experience. Forensic Sci. Int. Genet. 2019;38:236–244. doi: 10.1016/j.fsigen.2018.11.008.
  • 44. Gorden E.M., Sturk-Andreaggi K., Marshall C. Capture enrichment and massively parallel sequencing for human identification. Forensic Sci. Int. Genet. 2021;53:102496. doi: 10.1016/j.fsigen.2021.102496.
  • 45. Gorden E.M., Greytak E.M., Sturk-Andreaggi K., Cady J., McMahon T.P., Armentrout S., Marshall C. Extended kinship analysis of historical remains using SNP capture. Forensic Sci. Int. Genet. 2022;57:102636. doi: 10.1016/j.fsigen.2021.102636.
  • 46. Sanchez J.J., Endicott P. Developing multiplexed SNP assays with special reference to degraded DNA templates. Nat. Protoc. 2006;1:1370–1378. doi: 10.1038/nprot.2006.247.
  • 47. Quintáns B., Alvarez-Iglesias V., Salas A., Phillips C., Lareu M.V., Carracedo A. Typing of mitochondrial DNA coding region SNPs of forensic and anthropological interest using SNaPshot minisequencing. Forensic Sci. Int. 2004;140:251–257. doi: 10.1016/j.forsciint.2003.12.005.
  • 48. Tillmar A., Fagerholm S.A., Staaf J., Sjölund P., Ansell R. Getting the conclusive lead with investigative genetic genealogy – A successful case study of a 16 year old double murder in Sweden. Forensic Sci. Int. Genet. 2021;53:102525. doi: 10.1016/j.fsigen.2021.102525.
  • 49. Peck M.A., Koeppel A.F., Gorden E.M., Bouchet J.L., Heaton M.C., Russell D.A., Reedy C.R., Neal C.M., Turner S.D. Internal Validation of the ForenSeq Kintelligence Kit for Application to Forensic Genetic Genealogy. Preprint at bioRxiv. 2022. doi: 10.1101/2022.10.28.514056.
  • 50. Yates J.A.F., Aron F., Neumann G.U., Velsko I., Skourtanioti E., Orfanou E., Fagernas Z., et al. 2020. A–Z of Ancient DNA Protocols for Shotgun Illumina Next Generation Sequencing.
  • 51. Stahl R., Warinner C., Velsko I., Orfanou E., Aron F., Brandt G. Illumina Double-Stranded DNA Dual Indexing for Ancient DNA V2. doi: 10.17504/protocols.io.bvt8n6rw.
  • 52. Aron F., Neumann G.U., Brandt G. 2022. Half-UDG treated double-stranded ancient DNA library preparation for Illumina sequencing v1.
  • 53. Gansauge M.-T., Aximu-Petri A., Nagel S., Meyer M. Manual and automated preparation of single-stranded DNA libraries for the sequencing of DNA from ancient biological remains and other sources of highly degraded DNA. Nat. Protoc. 2020;15:2279–2300. doi: 10.1038/s41596-020-0338-0.
  • 54. Llamas B., Valverde G., Fehren-Schmitz L., Weyrich L.S., Cooper A., Haak W. From the field to the laboratory: Controlling DNA contamination in human ancient DNA research in the high-throughput sequencing era. STAR: Sci. Technol. Archaeol. Res. 2017;3:1–14.
  • 55. Orlando L., Allaby R., Skoglund P., Der Sarkissian C., Stockhammer P.W., Ávila-Arcos M.C., Fu Q., Krause J., Willerslev E., Stone A.C., Warinner C. Ancient DNA analysis. Nat. Rev. Methods Primers. 2021;1:14–26.
  • 56. Fellows Yates J.A., Lamnidis T.C., Borry M., Andrades Valtueña A., Fagernäs Z., Clayton S., Garcia M.U., Neukamm J., Peltzer A. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ. 2021;9:e10947. doi: 10.7717/peerj.10947.
  • 57. Schubert M., Ermini L., Der Sarkissian C., Jónsson H., Ginolhac A., Schaefer R., Martin M.D., Fernández R., Kircher M., McCue M., et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 2014;9:1056–1082. doi: 10.1038/nprot.2014.063.
  • 58. Neuenschwander S., Cruz Dávalos D.I., Anchieri L., Sousa da Mota B., Bozzi D., Rubinacci S., Delaneau O., Rasmussen S., Malaspinas A.-S. Mapache: a flexible pipeline to map ancient DNA. Bioinformatics. 2023;39:btad028. doi: 10.1093/bioinformatics/btad028.
  • 59. SWGDAM. Home. https://www.swgdam.org/
  • 60. ENFSI. European Network of Forensic Science Institutes. 2016. https://enfsi.eu/
  • 61. ISFG. https://www.isfg.org/
  • 62. Federal Bureau of Investigation. 2020. Quality Assurance Standards for Forensic DNA Testing Laboratories.
  • 63. Mallick S., Micco A., Mah M., Ringbauer H., Lazaridis I., Olalde I., Patterson N., Reich D. The Allen Ancient DNA Resource (AADR): A Curated Compendium of Ancient Human Genomes. Preprint at bioRxiv. 2023. doi: 10.1101/2023.04.06.535797.
  • 64. Loreille O., Tillmar A., Brandhagen M.D., Otterstatter L., Irwin J.A. Improved DNA Extraction and Illumina Sequencing of DNA Recovered from Aged Rootless Hair Shafts Found in Relics Associated with the Romanov Family. Genes. 2022;13:202. doi: 10.3390/genes13020202.
  • 65. Brandhagen M.D., Loreille O., Irwin J.A. Fragmented Nuclear DNA is the Predominant Genetic Material in Human Hair Shafts. Genes. 2018;9:640. doi: 10.3390/genes9120640.
  • 66. Gutierrez R., LaRue B., Houston R. Novel extraction chemistry and alternative amplification strategies for use with rootless hair shafts. J. Forensic Sci. 2021;66:1929–1936. doi: 10.1111/1556-4029.14763.
  • 67. Harkins Kincaid K. Solve Cold Cases with DNA from Rootless Hair Using Genetic Genealogy. The ISHI Report; 2020.
  • 68. Sirak K., Fernandes D., Cheronet O., Harney E., Mah M., Mallick S., Rohland N., Adamski N., Broomandkhoshbacht N., Callan K., et al. Human auditory ossicles as an alternative optimal source of ancient DNA. Genome Res. 2020;30:427–436. doi: 10.1101/gr.260141.119.
  • 69. Harney É., Cheronet O., Fernandes D.M., Sirak K., Mah M., Bernardos R., Adamski N., Broomandkhoshbacht N., Callan K., Lawson A.M., et al. A minimally destructive protocol for DNA extraction from ancient teeth. Genome Res. 2021;31:472–483. doi: 10.1101/gr.267534.120.
  • 70. Parker C.E., Bos K.I., Haak W., Krause J. Optimized Bone Sampling Protocols for the Retrieval of Ancient DNA from Archaeological Remains. J. Vis. Exp. 2021. doi: 10.3791/63250.
  • 71. Pinhasi R., Fernandes D., Sirak K., Novak M., Connell S., Alpaslan-Roodenberg S., Gerritsen F., Moiseyev V., Gromov A., Raczky P., et al. Optimal Ancient DNA Yields from the Inner Ear Part of the Human Petrous Bone. PLoS One. 2015;10:e0129102. doi: 10.1371/journal.pone.0129102.
  • 72. Rohland N., Hofreiter M. Comparison and optimization of ancient DNA extraction. Biotechniques. 2007;42:343–352. doi: 10.2144/000112383.
  • 73. Loreille O.M., Diegoli T.M., Irwin J.A., Coble M.D., Parsons T.J. High efficiency DNA extraction from bone by total demineralization. Forensic Sci. Int. Genet. 2007;1:191–195. doi: 10.1016/j.fsigen.2007.02.006.
  • 74. Amory S., Huel R., Bilić A., Loreille O., Parsons T.J. Automatable full demineralization DNA extraction procedure from degraded skeletal remains. Forensic Sci. Int. Genet. 2012;6:398–406. doi: 10.1016/j.fsigen.2011.08.004.
  • 75. Meyer M., Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010;2010:prot5448. doi: 10.1101/pdb.prot5448.
  • 76. Troll C.J., Kapp J., Rao V., Harkins K.M., Cole C., Naughton C., Morgan J.M., Shapiro B., Green R.E. A ligation-based single-stranded library preparation method to analyze cell-free DNA and synthetic oligos. BMC Genom. 2019;20:1023. doi: 10.1186/s12864-019-6355-0.
  • 77. Fortes G.G., Paijmans J.L.A. Analysis of Whole Mitogenomes from Ancient Samples. Methods Mol. Biol. 2015;1347:179–195. doi: 10.1007/978-1-4939-2990-0_13.
  • 78. Sproul J.S., Maddison D.R. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA. Mol. Ecol. Resour. 2017;17:1183–1201. doi: 10.1111/1755-0998.12660.
  • 79. Rohland N., Harney E., Mallick S., Nordenfelt S., Reich D. Partial uracil–DNA–glycosylase treatment for screening of ancient DNA. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2015;370:20130624. doi: 10.1098/rstb.2013.0624.
  • 80. Burbano H.A., Hodges E., Green R.E., Briggs A.W., Krause J., Meyer M., Good J.M., Maricic T., Johnson P.L.F., Xuan Z., et al. Targeted investigation of the Neandertal genome by array-based sequence capture. Science. 2010;328:723–725. doi: 10.1126/science.1188046.
  • 81. Avila-Arcos M.C., Cappellini E., Romero-Navarro J.A., Wales N., Moreno-Mayar J.V., Rasmussen M., Fordyce S.L., Montiel R., Vielle-Calzada J.-P., Willerslev E., Gilbert M.T.P. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA. Sci. Rep. 2011;1:74. doi: 10.1038/srep00074.
  • 82. Eduardoff M., Xavier C., Strobl C., Casas-Vargas A., Parson W. Optimized mtDNA Control Region Primer Extension Capture Analysis for Forensically Relevant Samples and Highly Compromised mtDNA of Different Age and Origin. Genes. 2017;8:237. doi: 10.3390/genes8100237.
  • 83. Tillmar A., Sturk-Andreaggi K., Daniels-Higginbotham J., Thomas J.T., Marshall C. The FORCE Panel: An All-in-One SNP Marker Set for Confirming Investigative Genetic Genealogy Leads and for General Forensic Applications. Genes. 2021;12:1968. doi: 10.3390/genes12121968.
  • 84. Schneider P.M. Basic issues in forensic DNA typing. Forensic Sci. Int. 1997;88:17–22. doi: 10.1016/s0379-0738(97)00079-0.
  • 85. Fu Q., Hajdinjak M., Moldovan O.T., Constantin S., Mallick S., Skoglund P., Patterson N., Rohland N., Lazaridis I., Nickel B., et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature. 2015;524:216–219. doi: 10.1038/nature14558.
  • 86. Daicel Arbor Biosciences. MyBaits Expert Human Affinities. https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-expert/mybaits-expert-human-affinities/
  • 87. Rohland N., Mallick S., Mah M., Maier R., Patterson N., Reich D. Three Reagents for In-Solution Enrichment of Ancient Human DNA at More than a Million SNPs. Preprint at bioRxiv. 2022. doi: 10.1101/2022.01.13.476259.
  • 88. Alpaslan-Roodenberg S., Anthony D., Babiker H., Bánffy E., Booth T., Capone P., Deshpande-Mukherjee A., Eisenmann S., Fehren-Schmitz L., Frachetti M., et al. Ethics of DNA research on human remains: five globally applicable guidelines. Nature. 2021;599:41–46. doi: 10.1038/s41586-021-04008-x.
  • 89. Kowal E., Weyrich L.S., Argüelles J.M., Bader A.C., Colwell C., Cortez A.D., Davis J.L., Figueiro G., Fox K., Malhi R.S., et al. Community Partnerships Are Fundamental to Ethical Ancient DNA Research. Hum. Genet. Genom. Adv. 2023:100161. doi: 10.1016/j.xhgg.2022.100161.
  • 90. Parson W., Huber G., Moreno L., Madel M.-B., Brandhagen M.D., Nagl S., Xavier C., Eduardoff M., Callaghan T.C., Irwin J.A. Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples. Forensic Sci. Int. Genet. 2015;15:8–15. doi: 10.1016/j.fsigen.2014.11.009.
  • 91. Sturk-Andreaggi K., Peck M.A., Boysen C., Dekker P., McMahon T.P., Marshall C.K. AQME: A forensic mitochondrial DNA analysis tool for next-generation sequencing data. Forensic Sci. Int. Genet. 2017;31:189–197. doi: 10.1016/j.fsigen.2017.09.010.
  • 92. Holland M.M., Pack E.D., McElhoe J.A. Evaluation of GeneMarker® HTS for improved alignment of mtDNA MPS data, haplotype determination, and heteroplasmy assessment. Forensic Sci. Int. Genet. 2017;28:90–98. doi: 10.1016/j.fsigen.2017.01.016.
  • 93. Jäger A.C., Alvarez M.L., Davis C.P., Guzmán E., Han Y., Way L., Walichiewicz P., Silva D., Pham N., Caves G., et al. Developmental validation of the MiSeq FGx Forensic Genomics System for Targeted Next Generation Sequencing in Forensic DNA Casework and Database Laboratories. Forensic Sci. Int. Genet. 2017;28:52–70. doi: 10.1016/j.fsigen.2017.01.011.
  • 94. Børsting C., Fordyce S.L., Olofsson J., Mogensen H.S., Morling N. Evaluation of the Ion Torrent™ HID SNP 169-plex: A SNP typing assay developed for human identification by second generation sequencing. Forensic Sci. Int. Genet. 2014;12:144–154. doi: 10.1016/j.fsigen.2014.06.004.
  • 95. Seo S.B., King J.L., Warshauer D.H., Davis C.P., Ge J., Budowle B. Single nucleotide polymorphism typing with massively parallel sequencing for human identification. Int. J. Leg. Med. 2013;127:1079–1086. doi: 10.1007/s00414-013-0879-7.
  • 96. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–12.
  • 97. Li H. Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM. Preprint at arXiv. 2013. doi: 10.48550/arXiv.1303.3997.
  • 98. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
  • 99. Schubert M., Ginolhac A., Lindgreen S., Thompson J.F., Al-Rasheid K.A.S., Willerslev E., Krogh A., Orlando L. Improving ancient DNA read mapping against modern reference genomes. BMC Genom. 2012;13:178. doi: 10.1186/1471-2164-13-178.
  • 100. Broad Institute. Picard Toolkit. Broad Institute, GitHub Repository; 2019.
  • 101. Peltzer A., Jäger G., Herbig A., Seitz A., Kniep C., Krause J., Nieselt K. EAGER: efficient ancient genome reconstruction. Genome Biol. 2016;17:60. doi: 10.1186/s13059-016-0918-z.
  • 102. Jónsson H., Ginolhac A., Schubert M., Johnson P.L.F., Orlando L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics. 2013;29:1682–1684. doi: 10.1093/bioinformatics/btt193.
  • 103. Neukamm J., Peltzer A., Nieselt K. DamageProfiler: fast damage pattern calculation for ancient DNA. Bioinformatics. 2021;37:3652–3653. doi: 10.1093/bioinformatics/btab190.
  • 104. eager: Introduction. https://nf-co.re/eager/2.4.7
  • 105. Scientific Working Group on DNA Analysis Methods (SWGDAM). 2015. Guidelines for the Validation of Probabilistic Genotyping Systems.
  • 106. SWGDAM. 2021. Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories.
  • 107. Wilson M.R., DiZinno J.A., Polanskey D., Replogle J., Budowle B. Validation of mitochondrial DNA sequencing for forensic casework analysis. Int. J. Leg. Med. 1995;108:68–74. doi: 10.1007/BF01369907.
  • 108. Korneliussen T.S., Albrechtsen A., Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinf. 2014;15:356. doi: 10.1186/s12859-014-0356-4.
  • 109. Huang Y., Ringbauer H. hapCon: Estimating Contamination of Ancient Genomes by Copying from Reference Haplotypes. Bioinformatics. 2022;38:3768–3777. doi: 10.1093/bioinformatics/btac390.
  • 110. Fu Q., Mittnik A., Johnson P.L.F., Bos K., Lari M., Bollongino R., Sun C., Giemsch L., Schmitz R., Burger J., et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 2013;23:553–559. doi: 10.1016/j.cub.2013.02.044.
  • 111. Fu Q., Li H., Moorjani P., Jay F., Slepchenko S.M., Bondarev A.A., Johnson P.L.F., Aximu-Petri A., Prüfer K., de Filippo C., et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature. 2014;514:445–449. doi: 10.1038/nature13810.
  • 112. Renaud G., Slon V., Duggan A.T., Kelso J. Schmutzi: estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biol. 2015;16:224. doi: 10.1186/s13059-015-0776-0.
  • 113. Peyrégne S., Peter B.M. AuthentiCT: a model of ancient DNA damage to estimate the proportion of present-day DNA contamination. Genome Biol. 2020;21:246. doi: 10.1186/s13059-020-02123-y.
  • 114. Nakatsuka N., Harney É., Mallick S., Mah M., Patterson N., Reich D. ContamLD: estimation of ancient nuclear DNA contamination using breakdown of linkage disequilibrium. Genome Biol. 2020;21:199. doi: 10.1186/s13059-020-02111-2.
  • 115. Begg T.J.A., Schmidt A., Kocher A., Larmuseau M.H.D., Runfeldt G., Maier P.A., Wilson J.D., Barquera R., Maj C., Szolek A., et al. Genomic analyses of hair from Ludwig van Beethoven. Curr. Biol. 2023;33:1431–1447.e22. doi: 10.1016/j.cub.2023.02.041.
  • 116. Furtwängler A., Reiter E., Neumann G.U., Siebke I., Steuri N., Hafner A., Lösch S., Anthes N., Schuenemann V.J., Krause J. Ratio of mitochondrial to nuclear DNA affects contamination estimates in ancient DNA analysis. Sci. Rep. 2018;8:1–8. doi: 10.1038/s41598-018-32083-0.
  • 117. Green R.E., Briggs A.W., Krause J., Prüfer K., Burbano H.A., Siebauer M., Lachmann M., Pääbo S. The Neandertal genome and ancient DNA authenticity. EMBO J. 2009;28:2494–2502. doi: 10.1038/emboj.2009.222.
  • 118. Steinlechner M., Berger B., Niederstätter H., Parson W. Rare failures in the amelogenin sex test. Int. J. Leg. Med. 2002;116:117–120. doi: 10.1007/s00414-001-0264-9.
  • 119. Kao L.-G., Tsai L.-C., Lee J.C., Hsieh H.-M. Controversial cases of human gender identification by amelogenin test. Forensic Sci. J. 2007;6:69–71.
  • 120. Drobnič K. A new primer set in a SRY gene for sex identification. Int. Congr. Ser. 2006;1288:268–270.
  • 121. Kayser M. Forensic use of Y-chromosome DNA: a general overview. Hum. Genet. 2017;136:621–635. doi: 10.1007/s00439-017-1776-9.
  • 122.Skoglund P., Storå J., Götherström A., Jakobsson M. Accurate sex identification of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 2013;40:4477–4482. [Google Scholar]
  • 123.Mittnik A., Wang C.-C., Svoboda J., Krause J. A Molecular Approach to the Sexing of the Triple Burial at the Upper Paleolithic Site of Dolní Věstonice. PLoS One. 2016;11:e0163019. doi: 10.1371/journal.pone.0163019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Fu Q., Posth C., Hajdinjak M., Petr M., Mallick S., Fernandes D., Furtwängler A., Haak W., Meyer M., Mittnik A., et al. The genetic history of Ice Age Europe. Nature. 2016;534:200–205. doi: 10.1038/nature17993. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Meyer M., Arsuaga J.-L., de Filippo C., Nagel S., Aximu-Petri A., Nickel B., Martínez I., Gracia A., Bermúdez de Castro J.M., Carbonell E., et al. Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature. 2016;531:504–507. doi: 10.1038/nature17405. [DOI] [PubMed] [Google Scholar]
  • 126.Moilanen U., Kirkinen T., Saari N.-J., Rohrlach A.B., Krause J., Onkamo P., Salmela E. A woman with a sword? – Weapon grave at Suontaka Vesitorninmäki, Finland. Eur. J. Archaeol. 2022;25:42–60. [Google Scholar]
  • 127.Roca-Rada X., Tereso S., Rohrlach A.B., Brito A., Williams M.P., Umbelino C., Curate F., Deveson I.W., Souilmi Y., Amorim A., et al. A 1000-year-old case of Klinefelter’s syndrome diagnosed by integrating morphology, osteology, and genetics. Lancet. 2022;400:691–692. doi: 10.1016/S0140-6736(22)01476-3. [DOI] [PubMed] [Google Scholar]
  • 128.Ewing M.M., Thompson J.M., McLaren R.S., Purpero V.M., Thomas K.J., Dobrowski P.A., DeGroot G.A., Romsos E.L., Storts D.R. Human DNA quantification and sample quality assessment: Developmental validation of the PowerQuant® system. Forensic Sci. Int. Genet. 2016;23:166–177. doi: 10.1016/j.fsigen.2016.04.007. [DOI] [PubMed] [Google Scholar]
  • 129.Vernarecci S., Ottaviani E., Agostino A., Mei E., Calandro L., Montagna P. Quantifiler® Trio Kit and forensic samples management: a matter of degradation. Forensic Sci. Int. Genet. 2015;16:77–85. doi: 10.1016/j.fsigen.2014.12.005. [DOI] [PubMed] [Google Scholar]
  • 130.Pineda G.M., Montgomery A.H., Thompson R., Indest B., Carroll M., Sinha S.K. Development and validation of InnoQuant™, a sensitive human DNA quantitation and degradation assessment method for forensic samples using high copy number mobile elements Alu and SVA. Forensic Sci. Int. Genet. 2014;13:224–235. doi: 10.1016/j.fsigen.2014.08.007. [DOI] [PubMed] [Google Scholar]
  • 131.Glocke I., Meyer M. Extending the spectrum of DNA sequences retrieved from ancient bones and teeth. Genome Res. 2017;27:1230–1237. doi: 10.1101/gr.219675.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.SWGDAM. 2017. SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories. [Google Scholar]
  • 133.FBI. 2020. Quality Assurance Standards for Forensic DNA Testing Laboratories. [Google Scholar]
  • 134.ENFSI. 2022. Best Practice Manual for Human Forensic Biology and DNA Profiling ENFSI-DNA-BPM-03. [Google Scholar]
  • 135.ENFSI. 2010. Recommended Minimum Criteria for the Validation of Various Aspects of the DNA Profiling Process. [Google Scholar]
  • 136.Nielsen R., Paul J.S., Albrechtsen A., Song Y.S. Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 2011;12:443–451. doi: 10.1038/nrg2986. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Haned H., Gill P., Lohmueller K., Inman K., Rudin N. Validation of probabilistic genotyping software for use in forensic DNA casework: Definitions and illustrations. Sci. Justice. 2016;56:104–108. doi: 10.1016/j.scijus.2015.11.007. [DOI] [PubMed] [Google Scholar]
  • 138.Bright J.-A., Taylor D., McGovern C., Cooper S., Russell L., Abarno D., Buckleton J. Developmental validation of STRmix™, expert software for the interpretation of forensic DNA profiles. Forensic Sci. Int. Genet. 2016;23:226–239. doi: 10.1016/j.fsigen.2016.05.007. [DOI] [PubMed] [Google Scholar]
  • 139.Nielsen M.B., Andersen M.M., Eriksen P.S., Mogensen H.S., Morling N. Probabilistic SNP genotyping at low DNA concentrations. Forensic Sci. Int.: Genet. Suppl. Series. 2022;8:151–152. [Google Scholar]
  • 140.Hofmanová Z., Kreutzer S., Hellenthal G., Sell C., Diekmann Y., Díez-Del-Molino D., van Dorp L., López S., Kousathanas A., Link V., et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. USA. 2016;113:6886–6891. doi: 10.1073/pnas.1523951113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141.Wang J., Raskin L., Samuels D.C., Shyr Y., Guo Y. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics. 2015;31:318–323. doi: 10.1093/bioinformatics/btu668. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Prüfer K. snpAD: an ancient DNA genotype caller. Bioinformatics. 2018;34:4165–4171. doi: 10.1093/bioinformatics/bty507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Link V., Kousathanas A., Veeramah K., Sell C., Scheu A., Wegmann D. ATLAS: Analysis Tools for Low-Depth and Ancient Samples. Preprint at bioRxiv. 2017. doi: 10.1101/105346. [DOI] [Google Scholar]
  • 144.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145.McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Skoglund P., Northoff B.H., Shunkov M.V., Derevianko A.P., Pääbo S., Krause J., Jakobsson M. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl. Acad. Sci. USA. 2014;111:2229–2234. doi: 10.1073/pnas.1318934111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.Holland C.A., McElhoe J.A., Gaston-Sanchez S., Holland M.M. Damage patterns observed in mtDNA control region MPS data for a range of template concentrations and when using different amplification approaches. Int. J. Leg. Med. 2021;135:91–106. doi: 10.1007/s00414-020-02410-0. [DOI] [PubMed] [Google Scholar]
  • 149.Rathbun M.M., McElhoe J.A., Parson W., Holland M.M. Considering DNA damage when interpreting mtDNA heteroplasmy in deep sequencing data. Forensic Sci. Int. Genet. 2017;26:1–11. doi: 10.1016/j.fsigen.2016.09.008. [DOI] [PubMed] [Google Scholar]
  • 150.Gorden E.M., Sturk-Andreaggi K., Marshall C. Repair of DNA damage caused by cytosine deamination in mitochondrial DNA of forensic case samples. Forensic Sci. Int. Genet. 2018;34:257–264. doi: 10.1016/j.fsigen.2018.02.015. [DOI] [PubMed] [Google Scholar]
  • 151.Patterson N., Moorjani P., Luo Y., Mallick S., Rohland N., Zhan Y., Genschoreck T., Webster T., Reich D. Ancient admixture in human history. Genetics. 2012;192:1065–1093. doi: 10.1534/genetics.112.145037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Carneiro M.O., Russ C., Ross M.G., Gabriel S.B., Nusbaum C., DePristo M.A. Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genom. 2012;13:375. doi: 10.1186/1471-2164-13-375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Martin A.R., Atkinson E.G., Chapman S.B., Stevenson A., Stroud R.E., Abebe T., Akena D., Alemayehu M., Ashaba F.K., Atwoli L., et al. Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations. Am. J. Hum. Genet. 2021;108:656–668. doi: 10.1016/j.ajhg.2021.03.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Imbler S. The New York Times; 2022. New DNA Analysis Supports an Unrecognized Tribe’s Ancient Roots in California. [Google Scholar]
  • 155.Li Y., Willer C., Sanna S., Abecasis G. Genotype Imputation. Annu. Rev. Genom. Hum. Genet. 2009;10:387–406. doi: 10.1146/annurev.genom.9.081307.164242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Browning S.R. Missing data imputation and haplotype phase inference for genome-wide association studies. Hum. Genet. 2008;124:439–450. doi: 10.1007/s00439-008-0568-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.De Marino A., Mahmoud A.A., Bose M., Bircan K.O., Terpolovsky A., Bamunusinghe V., Bohn S., Khan U., Novković B., Yazdi P.G. A comparative analysis of current phasing and imputation software. PLoS One. 2022;17:e0260177. doi: 10.1371/journal.pone.0260177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Davies R.W., Flint J., Myers S., Mott R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 2016;48:965–969. doi: 10.1038/ng.3594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159.Cady J., Greytak E.M. Whole-genome sequencing of degraded DNA for investigative genetic genealogy. Forensic Sci. Int.: Genet. Suppl. Series. 2022;8:20–22. [Google Scholar]
  • 160.DNA Testing for Ancestry & Genealogy. http://familytreedna.org
  • 161.DNA and genealogy tools to grow your family tree. GEDmatch - Comprehensive Solutions for Genetic Genealogy and Family Tree Research. 2022. http://gedmatch.com
  • 162.Kim J., Rosenberg N.A. Record-matching of STR Profiles with Fragmentary Genomic SNP Data. Preprint at bioRxiv. 2022. doi: 10.1101/2022.09.01.505545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Wojcik G.L., Fuchsberger C., Taliun D., Welch R., Martin A.R., Shringarpure S., Carlson C.S., Abecasis G., Kang H.M., Boehnke M., et al. Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 2018;8:3255–3267. doi: 10.1534/g3.118.200502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Browning B.L., Browning S.R. Genotype Imputation with Millions of Reference Samples. Am. J. Hum. Genet. 2016;98:116–126. doi: 10.1016/j.ajhg.2015.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165.Spiliopoulou A., Colombo M., Orchard P., Agakov F., McKeigue P. GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing. Genetics. 2017;206:91–104. doi: 10.1534/genetics.117.200063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Rubinacci S., Ribeiro D.M., Hofmeister R.J., Delaneau O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 2021;53:120–126. doi: 10.1038/s41588-020-00756-0. [DOI] [PubMed] [Google Scholar]
  • 167.Kabisch M., Hamann U., Lorenzo Bermejo J. Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure. BMC Genom. 2017;18:798. doi: 10.1186/s12864-017-4208-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Hui R., D’Atanasio E., Cassidy L.M., Scheib C.L., Kivisild T. Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci. Rep. 2020;10:18542. doi: 10.1038/s41598-020-75387-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Childebayeva A., Rohrlach A.B., Barquera R., Rivollat M., Aron F., Szolek A., Kohlbacher O., Nicklisch N., Alt K.W., Gronenborn D., et al. Population Genetics and Signatures of Selection in Early Neolithic European Farmers. Mol. Biol. Evol. 2022;39:msac108. doi: 10.1093/molbev/msac108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170.Sousa da Mota B., Rubinacci S., Cruz Dávalos D.I., G Amorim C.E., Sikora M., Johannsen N.N., Szmyt M.H., Włodarczak P., Szczepanek A., Przybyła M.M., et al. Imputation of ancient human genomes. Nat. Commun. 2023;14:3660. doi: 10.1038/s41467-023-39202-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 171.Ausmees K., Sanchez-Quinto F., Jakobsson M., Nettelblad C. An empirical evaluation of genotype imputation of ancient DNA. G3 (Bethesda) 2022;12:jkac089. doi: 10.1093/g3journal/jkac089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172.Browning S.R., Browning B.L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 2007;81:1084–1097. doi: 10.1086/521987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 173.Rubinacci S., Hofmeister R.J., Sousa da Mota B., Delaneau O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat. Genet. 2023;55:1088–1090. doi: 10.1038/s41588-023-01438-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174.Lu C., Ahmed R., Lamri A., Anand S.S. Use of race, ethnicity, and ancestry data in health research. PLOS Glob. Public Health. 2022;2:e0001060. doi: 10.1371/journal.pgph.0001060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175.Bonham V.L., Green E.D., Pérez-Stable E.J. Examining How Race, Ethnicity, and Ancestry Data Are Used in Biomedical Research. JAMA. 2018;320:1533–1534. doi: 10.1001/jama.2018.13609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176.Skinner D. Forensic genetics and the prediction of race: What is the problem? BioSocieties. 2020;15:329–349. [Google Scholar]
  • 177.Gannett L. Biogeographical ancestry and race. Stud. Hist. Philos. Biol. Biomed. Sci. 2014;47:173–184. doi: 10.1016/j.shpsc.2014.05.017. [DOI] [PubMed] [Google Scholar]
  • 178.Alexander D.H., Novembre J., Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179.Pritchard J.K., Stephens M., Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 180.Malaspinas A.-S., Tange O., Moreno-Mayar J.V., Rasmussen M., DeGiorgio M., Wang Y., Valdiosera C.E., Politis G., Willerslev E., Nielsen R. bammds: a tool for assessing the ancestry of low-depth whole-genome data using multidimensional scaling (MDS). Bioinformatics. 2014;30:2962–2964. doi: 10.1093/bioinformatics/btu410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 181.Reich D., Thangaraj K., Patterson N., Price A.L., Singh L. Reconstructing Indian population history. Nature. 2009;461:489–494. doi: 10.1038/nature08365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 182.Ge J., Budowle B. How many familial relationship testing results could be wrong? PLoS Genet. 2020;16:e1008929. doi: 10.1371/journal.pgen.1008929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 183.Rohlfs R.V., Fullerton S.M., Weir B.S. Familial identification: population structure and relationship distinguishability. PLoS Genet. 2012;8:e1002469. doi: 10.1371/journal.pgen.1002469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 184.Fortier A.L., Kim J., Rosenberg N.A. Human-Genetic Ancestry Inference and False Positives in Forensic Familial Searching. G3 (Bethesda) 2020;10:2893–2902. doi: 10.1534/g3.120.401473. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185.Kling D., Tillmar A.O., Egeland T. Familias 3 - Extensions and new functionality. Forensic Sci. Int. Genet. 2014;13:121–127. doi: 10.1016/j.fsigen.2014.07.004. [DOI] [PubMed] [Google Scholar]
  • 186.Egeland T., Mostad P.F., Mevâg B., Stenersen M. Beyond traditional paternity and identification cases. Selecting the most probable pedigree. Forensic Sci. Int. 2000;110:47–59. doi: 10.1016/s0379-0738(00)00147-x. [DOI] [PubMed] [Google Scholar]
  • 187.Kling D., Tillmar A. Forensic genealogy—A comparison of methods to infer distant relationships based on dense SNP data. Forensic Sci. Int. Genet. 2019;42:113–124. doi: 10.1016/j.fsigen.2019.06.019. [DOI] [PubMed] [Google Scholar]
  • 188.Kling D. On the use of dense sets of SNP markers and their potential in relationship inference. Forensic Sci. Int. Genet. 2019;39:19–31. doi: 10.1016/j.fsigen.2018.11.022. [DOI] [PubMed] [Google Scholar]
  • 189.Kling D., Phillips C., Kennett D., Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci. Int. Genet. 2021;52:102474. doi: 10.1016/j.fsigen.2021.102474. [DOI] [PubMed] [Google Scholar]
  • 190.Greytak E.M., Moore C., Armentrout S.L. Genetic genealogy for cold case and active investigations. Forensic Sci. Int. 2019;299:103–113. doi: 10.1016/j.forsciint.2019.03.039. [DOI] [PubMed] [Google Scholar]
  • 191.Conomos M.P., Reiner A.P., Weir B.S., Thornton T.A. Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 2016;98:127–148. doi: 10.1016/j.ajhg.2015.11.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 192.Goudet J., Kay T., Weir B.S. How to estimate kinship. Mol. Ecol. 2018;27:4121–4135. doi: 10.1111/mec.14833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 193.Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 194.Snedecor J., Fennell T., Stadick S., Homer N., Antunes J., Stephens K., Holt C. Fast and accurate kinship estimation using sparse SNPs in relatively large database searches. Forensic Sci. Int. Genet. 2022;61:102769. doi: 10.1016/j.fsigen.2022.102769. [DOI] [PubMed] [Google Scholar]
  • 195.Turner, N., Scholz, J., and Acevedo Evaluating the impact of dropout and genotyping error on SNP-based kinship analysis with forensic samples. Front. Genet. [DOI] [PMC free article] [PubMed]
  • 196.SWGDAM. Federal Bureau of Investigation’s Scientific Working Group on DNA Analysis Methods (SWGDAM); 2018. Recommendations of the SWGDAM Ad Hoc Working Group on Genotyping Results Reported as Likelihood Ratios. [Google Scholar]
  • 197.Ringbauer H., Huang Y., Akbari A., Mallick S., Patterson N., Reich D. ancIBD - Screening for Identity by Descent Segments in Human Ancient DNA. Preprint at bioRxiv. 2023. doi: 10.1101/2023.03.08.531671. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 198.Monroy Kuhn J.M., Jakobsson M., Günther T. Estimating genetic kin relationships in prehistoric populations. PLoS One. 2018;13:e0195491. doi: 10.1371/journal.pone.0195491. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 199.Lipatov M., Sanjeev K., Patro R., Veeramah K.R. Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data. Preprint at bioRxiv. 2015. doi: 10.1101/023374. [DOI] [Google Scholar]
  • 200.Santibanez-Koref M., Griffin H., Turnbull D.M., Chinnery P.F., Herbert M., Hudson G. Assessing mitochondrial heteroplasmy using next generation sequencing: A note of caution. Mitochondrion. 2019;46:302–306. doi: 10.1016/j.mito.2018.08.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 201.Churchill J.D., Stoljarova M., King J.L., Budowle B. Massively parallel sequencing-enabled mixture analysis of mitochondrial DNA samples. Int. J. Leg. Med. 2018;132:1263–1272. doi: 10.1007/s00414-018-1799-3. [DOI] [PubMed] [Google Scholar]
  • 202.Li M., Schönberg A., Schaefer M., Schroeder R., Nasidze I., Stoneking M. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am. J. Hum. Genet. 2010;87:237–249. doi: 10.1016/j.ajhg.2010.07.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 203.Vohr S.H., Gordon R., Eizenga J.M., Erlich H.A., Calloway C.D., Green R.E. A phylogenetic approach for haplotype analysis of sequence data from complex mitochondrial mixtures. Forensic Sci. Int. Genet. 2017;30:93–105. doi: 10.1016/j.fsigen.2017.05.007. [DOI] [PubMed] [Google Scholar]
  • 204.Peter B.M. Admixture, Population Structure, and F-Statistics. Genetics. 2016;202:1485–1501. doi: 10.1534/genetics.115.183913. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 205.Gouy A., Zieger M. STRAF-A convenient online tool for STR data evaluation in forensic genetics. Forensic Sci. Int. Genet. 2017;30:148–151. doi: 10.1016/j.fsigen.2017.07.007. [DOI] [PubMed] [Google Scholar]
  • 206.Buckleton J., Curran J., Goudet J., Taylor D., Thiery A., Weir B.S. Population-specific FST values for forensic STR markers: A worldwide survey. Forensic Sci. Int. Genet. 2016;23:91–100. doi: 10.1016/j.fsigen.2016.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 207.He G., Liu J., Wang M., Zou X., Ming T., Zhu S., Yeh H.-Y., Wang C., Wang Z., Hou Y. Massively parallel sequencing of 165 ancestry-informative SNPs and forensic biogeographical ancestry inference in three southern Chinese Sinitic/Tai-Kadai populations. Forensic Sci. Int. Genet. 2021;52:102475. doi: 10.1016/j.fsigen.2021.102475. [DOI] [PubMed] [Google Scholar]
  • 208.Durand E.Y., Patterson N., Reich D., Slatkin M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 2011;28:2239–2252. doi: 10.1093/molbev/msr048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 209.Petr M., Vernot B., Kelso J. admixr—R package for reproducible analyses using ADMIXTOOLS. Bioinformatics. 2019;35:3194–3195. doi: 10.1093/bioinformatics/btz030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 210.Pickrell J.K., Pritchard J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967. doi: 10.1371/journal.pgen.1002967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 211.Harney É., Patterson N., Reich D., Wakeley J. Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture. Genetics. 2021;217 doi: 10.1093/genetics/iyaa045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 212.Wagner J.K., Colwell C., Claw K.G., Stone A.C., Bolnick D.A., Hawks J., Brothers K.B., Garrison N.A. Fostering Responsible Research on Ancient DNA. Am. J. Hum. Genet. 2020;107:183–195. doi: 10.1016/j.ajhg.2020.06.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 213.Tsosie K.S., Begay R.L., Fox K., Garrison N.A. Generations of genomes: advances in paleogenomics technology and engagement for Indigenous people of the Americas. Curr. Opin. Genet. Dev. 2020;62:91–96. doi: 10.1016/j.gde.2020.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 214.Ávila-Arcos M.C., de la Fuente Castro C., Nieves-Colón M.A., Raghavan M. Recommendations for Sustainable Ancient DNA Research in the Global South: Voices From a New Generation of Paleogenomicists. Front. Genet. 2022;13:880170. doi: 10.3389/fgene.2022.880170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 215.Budowle B., Sajantila A. Revisiting informed consent in forensic genomics in light of current technologies and the times. Int. J. Leg. Med. 2023;137:551–565. doi: 10.1007/s00414-023-02947-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 216.Katsanis S.H., Snyder L., Arnholt K., Mundorff A.Z. Consent process for US-based family reference DNA samples. Forensic Sci. Int. Genet. 2018;32:71–79. doi: 10.1016/j.fsigen.2017.10.011. [DOI] [PubMed] [Google Scholar]
  • 217.Parson W., Dür A. EMPOP—A forensic mtDNA database. Forensic Sci. Int. Genet. 2007;1:88–92. doi: 10.1016/j.fsigen.2007.01.018. [DOI] [PubMed] [Google Scholar]
  • 218.Roewer L., Krawczak M., Willuweit S., Nagy M., Alves C., Amorim A., Anslinger K., Augustin C., Betz A., Bosch E., et al. Online reference database of European Y-chromosomal short tandem repeat (STR) haplotypes. Forensic Sci. Int. 2001;118:106–113. doi: 10.1016/s0379-0738(00)00478-3. [DOI] [PubMed] [Google Scholar]
  • 219.Moretti T.R., Moreno L.I., Smerick J.B., Pignone M.L., Hizon R., Buckleton J.S., Bright J.-A., Onorato A.J. Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the United States. Forensic Sci. Int. Genet. 2016;25:175–181. doi: 10.1016/j.fsigen.2016.07.022. [DOI] [PubMed] [Google Scholar]
  • 220.Kidd K.K., Soundararajan U., Rajeevan H., Pakstis A.J., Moore K.N., Ropero-Miller J.D. The redesigned Forensic Research/Reference on Genetics-knowledge base, FROG-kb. Forensic Sci. Int. Genet. 2018;33:33–37. doi: 10.1016/j.fsigen.2017.11.009. [DOI] [PubMed] [Google Scholar]
  • 221.Parson W., Gusmão L., Hares D.R., Irwin J.A., Mayr W.R., Morling N., Pokorak E., Prinz M., Salas A., Schneider P.M., et al. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci. Int. Genet. 2014;13:134–142. doi: 10.1016/j.fsigen.2014.07.010. [DOI] [PubMed] [Google Scholar]
  • 222.Zimmermann B., Röck A., Huber G., Krämer T., Schneider P.M., Parson W. Application of a west Eurasian-specific filter for quasi-median network analysis: Sharpening the blade for mtDNA error detection. Forensic Sci. Int. Genet. 2011;5:133–137. doi: 10.1016/j.fsigen.2010.10.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 223.Willuweit S., Roewer L. The new Y Chromosome Haplotype Reference Database. Forensic Sci. Int. Genet. 2015;15:43–48. doi: 10.1016/j.fsigen.2014.11.024. [DOI] [PubMed] [Google Scholar]
  • 224.Oldt R.F., Kanthaswamy S. Expanded CODIS STR allele frequencies – Evidence for the irrelevance of race-based DNA databases. Leg. Med. 2020;42:101642. doi: 10.1016/j.legalmed.2019.101642. [DOI] [PubMed] [Google Scholar]
  • 225.Edwards DNA Identification Act of 1993. 1993. https://www.govinfo.gov/app/details/BILLS-103s497is
  • 226.Joly Y., Marrocco G., Dupras C. Risks of compulsory genetic databases. Science. 2019;363:938–940. doi: 10.1126/science.aaw4347. [DOI] [PubMed] [Google Scholar]
  • 227.Chow-White P.A., Duster T. Do Health and Forensic DNA Databases Increase Racial Disparities? PLoS Med. 2011;8:e1001100. doi: 10.1371/journal.pmed.1001100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 228.Wickenheiser R.A. Expanding DNA database effectiveness. Forensic Sci. Int. Synerg. 2022;4:100226. doi: 10.1016/j.fsisyn.2022.100226. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 229.Amankwaa A.O. Trends in forensic DNA database: transnational exchange of DNA data. Forensic Sci. Res. 2020;5:8–14. doi: 10.1080/20961790.2019.1565651. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 230.Triverio S.C., Crespillo Márquez M. The need for cross-border exchange of genetic data for criminal investigation purposes in Latin America: implementation challenges. Spanish J. Leg. Med. 2022;48:158–165. [Google Scholar]
  • 231.GEDmatch. GEDmatch & Community Safety.
  • 232.Murphy H. The New York Times; 2020. Why a Data Breach at a Genealogy Site Has Privacy Experts Worried. [Google Scholar]
  • 233.Edge M.D., Coop G. Attacks on genetic privacy via uploads to genealogical databases. Elife. 2020;9:e51810. doi: 10.7554/eLife.51810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 234.Taylor D., Buckleton J., Evett I. Testing likelihood ratios produced from complex DNA profiles. Forensic Sci. Int. Genet. 2015;16:165–171. doi: 10.1016/j.fsigen.2015.01.008. [DOI] [PubMed] [Google Scholar]
  • 235.Erlich Y., Shor T., Pe’er I., Carmi S. Identity inference of genomic data using long-range familial searches. Science. 2018;362:690–694. doi: 10.1126/science.aau4832. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 236.Fox K., Hawks J. Use ancient remains more wisely. Nature. 2019;572:581–583. doi: 10.1038/d41586-019-02516-5. [DOI] [PubMed] [Google Scholar]
  • 237.Mourier T., Ho S.Y.W., Gilbert M.T.P., Willerslev E., Orlando L. Statistical guidelines for detecting past population shifts using ancient DNA. Mol. Biol. Evol. 2012;29:2241–2251. doi: 10.1093/molbev/mss094. [DOI] [PubMed] [Google Scholar]
  • 238.Malaspinas A.-S. Methods to characterize selective sweeps using time serial samples: an ancient DNA perspective. Mol. Ecol. 2016;25:24–41. doi: 10.1111/mec.13492. [DOI] [PubMed] [Google Scholar]
  • 239.Klunk J., Vilgalys T.P., Demeure C.E., Cheng X., Shiratori M., Madej J., Beau R., Elli D., Patino M.I., Redfern R., et al. Evolution of immune genes is associated with the Black Death. Nature. 2022;611:312–319. doi: 10.1038/s41586-022-05349-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 240.Renaud G., Hanghøj K., Willerslev E., Orlando L. gargammel: a sequence simulator for ancient DNA. Bioinformatics. 2017;33:577–579. doi: 10.1093/bioinformatics/btw670. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 241.Huang W., Li L., Myers J.R., Marth G.T. ART: a next-generation sequencing read simulator. Bioinformatics. 2012;28:593–594. doi: 10.1093/bioinformatics/btr708. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 242.Henriksen R.A., Zhao L., Korneliussen T.S. NGSNGS: next-generation simulator for next-generation sequencing data. Bioinformatics. 2023;39:btad041. doi: 10.1093/bioinformatics/btad041. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 243.Renaud G., Stenzel U., Kelso J. leeHom: adaptor trimming and merging for Illumina sequencing reads. Nucleic Acids Res. 2014;42:e141. doi: 10.1093/nar/gku699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 244.Li H. seqtk: Toolkit for processing sequences in FASTA/Q formats. GitHub.
  • 245.Schubert M., Lindgreen S., Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes. 2016;9:88. doi: 10.1186/s13104-016-1900-2.
  • 246.Lindgreen S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes. 2012;5:337. doi: 10.1186/1756-0500-5-337.
  • 247.Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
  • 248.Li H., Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 249.Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  • 250.Danecek P., Bonfield J.K., Liddle J., Marshall J., Ohan V., Pollard M.O., Whitwham A., Keane T., McCarthy S.A., Davies R.M., Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10:giab008. doi: 10.1093/gigascience/giab008.
  • 251.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
  • 252.Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
  • 253.Ringbauer H., Novembre J., Steinrücken M. Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat. Commun. 2021;12:1–11. doi: 10.1038/s41467-021-25289-w.

Supplementary Materials

Table S1. Non-exhaustive list of software commonly used in forensic and aDNA analysis
mmc1.xlsx (18KB, xlsx)

Articles from iScience are provided here courtesy of Elsevier
