A bibliometric analysis of investigative genetic genealogy in academic literature: Trends, networks, and emerging themes

Alfonso Pellegrino; Alessandro Stasi

doi:10.1016/j.fsisyn.2026.100663

2026 Jan 31;12:100663.

PMCID: PMC12877832 PMID: 41659889

Abstract

This bibliometric review examines 147 Scopus-indexed publications (1993–Oct 2025) on investigative genetic genealogy (IGG) to map growth, influential actors, venues, and thematic structure. Research output accelerated after the 2018 Golden State Killer inflection and is geographically concentrated in the United States, the United Kingdom, China, Sweden, and Australia. Science mapping identifies three core clusters—population-genetic foundations, kinship algorithms and laboratory pipelines, and governance/ethics—whose intersection centers on SNP-based assays and genealogical databases. Persistent gaps include limited multi-site validation, proprietary matching algorithms that resist audit, and ancestry-skewed database coverage that raises equity concerns. We recommend multi-laboratory benchmark studies, auditable matching interfaces, coverage-aware performance metrics, and cross-domain collaborations to align technical innovation with transparency, fairness, and public trust.

Keywords: Investigative genetic genealogy, Forensic genealogy, SNP, Validation, Governance, Bibliometrics

Highlights

• Maps 147 publications (1993–2025) on investigative genetic genealogy (IGG).
• Identifies three core clusters: genetics, algorithms, and governance/legal ethics.
• Reveals rapid post-2018 growth and concentration in U.S., U.K., Sweden, China, Australia.
• Calls for validation, transparency, and legal accountability in IGG practice.

1. Introduction

Investigative genetic genealogy (IGG) is the practice of uploading single-nucleotide-polymorphism data from crime-scene samples to consumer or open genealogy databases so that investigators can triangulate relatives and ultimately identify a suspect [1]. Catapulted into the public eye by the 2018 Golden State Killer case, the technique has since been adopted widely across the United States and credited with resolving hundreds of investigations [2,3].

The growth of IGG is intertwined with an unprecedented expansion of consumer genomics: nearly fifty-four million profiles now populate direct-to-consumer databases, and the largest single platform (AncestryDNA) exceeds twenty-seven million records [4,5]. Modeling studies suggest that such coverage makes it increasingly likely that searches will return distant genetic relatives (e.g., third cousin or closer) for many individuals of European descent in databases of this scale [6]. However, translating a distant-relative match into a specific identification requires extensive genealogical work and case-specific contextual data; therefore, the probability of obtaining a relative match should not be equated with the probability of identification [7].

Early empirical work indicated strong public support for deploying IGG in violent-crime investigations [8], but that snapshot preceded a fuller appreciation of the trade-offs citizens make when weighing safety against genetic privacy. Subsequent analyses of more than 24,000 Twitter/X posts reveal that sentiment is highly event-driven, with enthusiasm wavering when court warrants or platform cooperation with law enforcement dominate headlines [9].

A further complication is terminological drift. Terms such as “familial searching” are routinely—and inaccurately—used as proxies for IGG, and brand names like FamilyTreeDNA appear in multiple spellings. Even rigorous search strategies now incorporate common misnomers to avoid missing relevant records [10]. This instability hampers systematic evidence syntheses and obscures the field’s intellectual contours.

Despite these challenges, scholarly output on IGG is proliferating across forensic science, computational genetics, law, and ethics. Yet the corpus has never been comprehensively mapped. A bibliometric review can (a) stabilize search vocabularies, (b) illuminate growth trajectories and disciplinary dispersion, (c) reveal the intellectual structure via co-citation and bibliographic coupling, (d) chart collaboration networks, and (e) surface emerging themes aligned with major policy or platform events.

Accordingly, this study seeks to provide a field-wide map of IGG scholarship from 1993 to 2025. It traces annual publication trends and disciplinary outlets; examines how heterogeneous labels (IGG, forensic genealogy, long-range familial searching) influence indexing and retrieval; identifies foundational references and cross-disciplinary bridges; visualizes author, institutional, and country collaborations; and tracks thematic evolution, especially around watershed moments such as the Golden State Killer arrest or the GEDmatch warrant. By exposing coverage gaps (e.g., non-U.S. regulatory contexts, equity and representativeness, consent models, data-quality constraints), the analysis aims to set an agenda for future research and to furnish policymakers with an empirical baseline against which to gauge regulatory options.

2. Literature review

The academic literature reports substantial, concrete outcomes across both criminal investigations and human identification. Early case reports and reviews recorded that IGG had already produced “dozens” of identifications in active and cold-case policing soon after 2018, while emphasizing that database outputs are investigative clues that must be followed by genealogical reasoning and standard forensic corroboration [1]. Systematic cataloging of solved cases shows rapid growth in the 2018–2021 window and profiles common crime types (e.g., sexual violence, serial offenses) and case characteristics [11]. In the parallel domain of unidentified human remains (UHR), a 2024 review reports 367 publicly announced UHR identifications attributable to IGG and highlights the method’s potential for disaster victim identification when close family references are unavailable [12]. Together, these strands demonstrate that IGG can deliver investigative breakthroughs across a spectrum of scenarios that had proved intractable to STR-only approaches.

Performance in IGG hinges on the probability of locating one or more informative relatives in a searchable database. Modeling published in Science by Erlich et al. [6], using a dataset of 1.28 million individuals, projected that roughly 60 % of searches for people of European descent would return a third-cousin-or-closer match; the authors also argued that demographic information could in principle narrow the search space. Subsequent commentary cautioned, however, that determining an individual’s identity from a distant-relative match is extraordinarily complex and that it is inappropriate to equate the probability of a relative match with the probability of identification [7]. This analytic link between coverage and identifiability has two corollaries: first, larger and denser databases increase the odds of informative matches; second, ancestry skews in consumer datasets create uneven benefits and burdens, advantaging cases with European-ancestry targets and complicating searches for under-represented groups [6]. Subsequent reviews synthesize the same point from a forensic perspective, noting that representation and opt-in policies at the few platforms that accept law-enforcement uploads directly shape case solvability [10].

In the literature, database governance materially shapes practice. Following policy shifts after 2018, FamilyTreeDNA and GEDmatch emerged as the principal repositories permitting law-enforcement use, albeit under differing consent defaults and access controls; major direct-to-consumer providers such as 23andMe and Ancestry generally prohibit such use absent legal compulsion [10,13]. U.S. federal policy likewise treats IGG as an investigative lead technique: the Department of Justice’s 2019 Interim Policy authorizes “forensic genetic genealogical DNA analysis and searching” primarily for unsolved violent crimes and remains identification, and only after traditional methods (including CODIS) have been exhausted [14]. These norms, echoed in practitioner interviews, situate IGG within a layered evidentiary pathway rather than as a stand-alone identification tool [15].

Although the underlying population-genetic principles are mature, the forensic validation ecosystem for IGG is still consolidating. Kling and colleagues [16] emphasized the need for systematic validation of IGG workflows and highlighted that most DTC services rely on proprietary, undisclosed algorithms for matching and relationship estimation, an opacity that complicates transparency and reproducibility in the law-enforcement context. Between 2022 and 2024, multiple development and validation efforts have appeared, including microarray validation under SWGDAM-aligned frameworks and targeted work on low-coverage WGS pipelines, along with the creation of the National Technology Validation and Implementation Collaborative’s FIGG Technical Validation Working Group [17,18,19]. These initiatives mark significant progress toward defensible laboratory practice, but a single, comprehensive set of field-wide standards analogous to those for STR typing remains a work in progress.

The peer-reviewed record supports the summary proposition that IGG has delivered substantial practical success across criminal investigations and human identification, with particularly strong evidence in UHR cases and mounting case series in violent crime. At the same time, scholars consistently caution that performance is conditional on searchable-database characteristics and that the method’s proper evidentiary role is to generate investigative leads to be tested with established forensic modalities. Methodological work between 2021 and 2024 has begun to normalize laboratory practice and extend IGG to increasingly challenging samples, but the field still faces open questions about standardized validation, database governance, transparency of matching algorithms, and equity across populations. These are precisely the domains where a bibliometric mapping of growth, networks, and emerging themes can help anchor research and policy agendas.

3. Research methodology

3.1. Bibliometric approaches

To address the objectives of this review on investigative genetic genealogy (IGG), we employ two complementary and well-established methodologies: performance analysis and science mapping. Performance analysis quantifies the productivity and impact of the field through indicators such as annual scientific production, total and normalized citation counts, source-level impact (e.g., SNIP/SJR where available), and author-level indices (e.g., h-index, g-index), enabling a ranked overview of the most prolific and influential contributors and outlets [20]. Science mapping examines relational structures to reveal thematic concentrations and knowledge flows. Following established guidance [21], we construct and visualize co-occurrence networks of author keywords (to surface thematic clusters and their evolution) and co-citation and bibliographic-coupling networks (to identify the intellectual bases and research fronts). We use VOSviewer for network construction, layout, and clustering with association-strength normalization [22,23].
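The author-level indices named above follow standard definitions, which can be sketched in a few lines. This is an illustrative sketch, not the authors' tooling: the function names and the example citation counts are hypothetical, and the g-index is capped at the number of papers (one common convention).

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g (capped at the paper count) whose top g papers
    together hold at least g * g citations."""
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author's papers (not from the corpus).
example = [10, 8, 5, 4, 3, 0]
print(h_index(example), g_index(example))  # prints "4 5"
```

The g-index gives more weight to a few highly cited papers than the h-index does, which is why the two are usually reported together.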

Given the event-driven nature of IGG (e.g., the Golden State Killer identification in 2018) and the unsettled nomenclature (IGG/FGG/LRFS; brand names such as GEDmatch and FamilyTreeDNA), we incorporate time-sliced analyses and terminology-aware preprocessing so that trends before and after 2018 can be contrasted and keyword variants harmonized [1,10]. Thresholds for inclusion in networks are tuned to preserve a connected giant component while limiting spurious ties in a relatively compact domain. Unless otherwise stated, we report full counting for descriptive indicators and both full and fractional counting as sensitivity analyses for collaboration networks [21,23]. This combined approach, performance indicators plus relational science mapping, provides a quantitative and visual account of how IGG scholarship is structured and how its core themes interconnect and evolve over time.
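The difference between full and fractional counting can be illustrated with a small co-authorship example. The sketch below assumes one common fractional convention, in which each link from a paper with n authors weighs 1/(n − 1); the function name and the toy author lists are hypothetical, and the exact weighting used in the analysis is not stated in the text.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def collaboration_weights(
    papers: Iterable[Sequence[str]], fractional: bool = False
) -> Dict[Tuple[str, str], float]:
    """Sum co-authorship link weights over all papers.

    Full counting adds 1 per shared paper. Fractional counting (assumed
    convention) adds 1 / (n - 1) for a paper with n authors, so large
    author teams do not dominate the network.
    """
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue  # single-author papers create no links
        w = 1.0 / (n - 1) if fractional else 1.0
        for a, b in combinations(unique, 2):
            weights[(a, b)] += w
    return dict(weights)


# Toy author lists (hypothetical, not from the corpus).
toy = [["A", "B"], ["A", "B", "C"]]
print(collaboration_weights(toy))                   # full counting
print(collaboration_weights(toy, fractional=True))  # fractional counting
```

Comparing the two outputs shows why both are reported as sensitivity analyses: the ranking of links can change once multi-author papers are down-weighted.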

3.2. Data collection

We sourced records from Scopus, a widely used abstracting and indexing database in bibliometric research with broad coverage of peer-reviewed journals and conference proceedings in the sciences, engineering, law, and social sciences [24]. Using the Scopus export as the reference corpus, the initial retrieval (October 2025) comprised 159 records spanning 1993–2025.

The Scopus search targeted the title, abstract, and keyword fields and was designed for terminological breadth to mitigate false negatives stemming from non-standard usage and brand-based indexing [10]. The Boolean string used for the reference retrieval was: “investigative genetic genealogy” OR “forensic genealogy” OR “genetic genealogy in forensics” OR “genealogical DNA databases”.

This strategy privileges conceptual inclusivity within the forensic use of genealogical resources and reflects documented variability in labeling (IGG/FGG/LRFS) and platform naming [1,10]. We exported full bibliographic metadata (authors, affiliations, titles, abstracts, author and index keywords, sources, cited references, and citations).
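For reproducibility, the Boolean string above can be assembled programmatically. The sketch below wraps the four phrases in Scopus's `TITLE-ABS-KEY` field code, which searches title, abstract, and keywords together; the use of that specific field code is an assumption, since the text names the fields but not the exact query syntax, and `build_scopus_query` is a hypothetical helper.

```python
from typing import Sequence

# The four search phrases quoted in the text.
TERMS = [
    "investigative genetic genealogy",
    "forensic genealogy",
    "genetic genealogy in forensics",
    "genealogical DNA databases",
]


def build_scopus_query(terms: Sequence[str], field: str = "TITLE-ABS-KEY") -> str:
    """Join quoted phrases with OR inside a single field code.

    The field code is an assumption; the paper only says the search
    targeted title, abstract, and keyword fields.
    """
    joined = " OR ".join(f'"{term}"' for term in terms)
    return f"{field}({joined})"


print(build_scopus_query(TERMS))
```

Keeping the phrase list in one place makes it straightforward to rerun the retrieval later with added misnomers or brand-name variants.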

Screening followed PRISMA 2020 guidance to enhance transparency and reproducibility [25]. After automated and manual deduplication using DOI/EID and Title + Year keys (no duplicates were detected), we conducted title/abstract screening against inclusion criteria: (i) explicit relevance to investigative/forensic genetic genealogy, long-range familial searching, or law-enforcement use of consumer/open genealogical DNA databases; (ii) substantive engagement with at least one of the following: methods/pipelines, database governance, ethics/law/policy, or documented applications (e.g., cold cases, unidentified human remains); (iii) document type limited to peer-reviewed articles, reviews, or conference papers; (iv) English language. Exclusion criteria removed editorials/notes/errata, book chapters and books not reporting original or review evidence, papers focused solely on clinical genealogy or consumer ancestry without a forensic workflow, and works treating “familial searching” strictly within CODIS without reference to genealogical databases [10]. The working dataset for quantitative analyses, after applying document-type and language filters within 1993–2025, comprised 147 items; detailed counts by stage are reported in the Results.
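The deduplication step described above (DOI/EID plus Title + Year keys) can be sketched as follows. The record layout (a dict with `doi`, `eid`, `title`, and `year` fields) and the helper names are assumptions made for illustration; the actual Scopus export columns and matching rules may differ.

```python
import re
from typing import Dict, List, Tuple


def normalize_title(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedup_keys(record: Dict) -> List[Tuple[str, str]]:
    """Build every available identity key for one record."""
    keys: List[Tuple[str, str]] = []
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        keys.append(("doi", doi))
    eid = (record.get("eid") or "").strip()
    if eid:
        keys.append(("eid", eid))
    title = normalize_title(record.get("title") or "")
    if title:
        keys.append(("title_year", f"{title}|{record.get('year')}"))
    return keys


def deduplicate(records: List[Dict]) -> List[Dict]:
    """Keep the first record seen for each DOI, EID, or Title + Year key."""
    seen = set()
    kept = []
    for record in records:
        keys = dedup_keys(record)
        if any(key in seen for key in keys):
            continue  # matches an earlier record on at least one key
        seen.update(keys)
        kept.append(record)
    return kept
```

Matching on any one key is deliberately conservative about keeping duplicates out; flagged pairs would still merit the manual check the text mentions.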

Data preprocessing harmonized author names and affiliations, unified keyword variants and misspellings (e.g., “investigative genetic genealogy,” “forensic genetic genealogy,” “forensic genealogy”; “GEDmatch”/“GED Match”; “FamilyTreeDNA”/“Family Tree DNA”), and stemmed obvious morphological variants. A small controlled vocabulary/thesaurus mapped near-synonyms and legacy terms (e.g., LRFS) to a consistent canonical form for network building, consistent with best practice when nomenclature is unsettled [10,21]. Layout and clustering used VOSviewer’s association-strength normalization, and robustness checks varied thresholds and time slicing (pre-2018; 2018–2020; 2021–2025) to examine the field’s evolution around the 2018 inflection point [1,22,23,26].
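The keyword harmonization and time slicing described above amount to a lookup table plus a year-to-window mapping. The sketch below is illustrative only: the thesaurus entries and canonical labels are hypothetical examples built from the variants the text mentions, not the controlled vocabulary actually used.

```python
# Hypothetical thesaurus: variant spelling -> canonical form.
# The real controlled vocabulary is not reproduced in the text.
THESAURUS = {
    "ged match": "gedmatch",
    "family tree dna": "familytreedna",
    "forensic genetic genealogy": "forensic genealogy/fgg",
    "fgg": "forensic genealogy/fgg",
    "long-range familial searching": "lrfs",
}


def canonical_keyword(raw: str) -> str:
    """Lower-case, collapse whitespace, then map known variants."""
    cleaned = " ".join(raw.lower().split())
    return THESAURUS.get(cleaned, cleaned)


def time_slice(year: int) -> str:
    """Assign a publication year to one of the three windows named in the text."""
    if year < 2018:
        return "pre-2018"
    if year <= 2020:
        return "2018-2020"
    return "2021-2025"


print(canonical_keyword("GED  Match"), time_slice(2019))
```

Applying `canonical_keyword` before building any network keeps spelling variants from appearing as separate nodes.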

4. Results

This section reports the principal findings of the bibliometric review of investigative genetic genealogy (IGG), following the objectives and procedures outlined in Section 3. Quantitative evidence is summarised in the text and, where appropriate, referenced figures and tables visualize the underlying data.

4.1. Volume, growth trajectory, and geographic dispersion

The curated corpus contains 147 Scopus-indexed documents that explicitly address IGG or closely aligned practices such as forensic genetic genealogy and long-range familial searching (Fig. 1). Four isolated contributions—one each in 1993, 2010, 2013, and 2014—represent the conceptual pre-history of the field and focus mainly on distant-kinship inference or the early ethical debate surrounding genealogical databases. Scholarly production remains negligible until 2018, the year of the Golden State Killer identification, when output begins to climb (two papers). A marked inflection occurs in 2019, with annual publications quadrupling to eight, and the growth curve steepens thereafter: 20 items in 2021, 24 in 2022, 26 in 2023, and 24 in 2024. The partially enumerated count for 2025 (January–October) already stands at 29 papers, signalling that the calendar-year total is likely to exceed every previous annual high. The annual growth trajectory from 2010 to 2025 is visualized in Fig. 2. Calculated over the modern period 2018–2024, the compound annual growth rate is approximately 44 %, underscoring the rapid consolidation of IGG as a distinct research niche that mirrors its accelerated operational uptake by U.S. law-enforcement agencies after 2018.

Fig. 1 — Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart depicting identification, screening, eligibility, and inclusion stages for the Scopus corpus used in this review [25].

Fig. 2 — Annual number of Scopus-indexed publications addressing investigative genetic genealogy, 2010–2025 (n = 147).
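The compound annual growth rate can be checked from the endpoint counts given in the text (two papers in 2018, 24 in 2024). This is a minimal sketch assuming the standard endpoint formula, CAGR = (end/start)^(1/periods) − 1; the number of compounding periods is a convention the text does not state, so it is left as a parameter.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Endpoint counts quoted in the text for 2018 and 2024.
papers_2018, papers_2024 = 2, 24

# Six periods counts year-to-year steps; seven counts calendar years inclusively.
for periods in (6, 7):
    print(periods, f"{cagr(papers_2018, papers_2024, periods):.1%}")
```

With these endpoints, six periods give roughly 51 % and seven give roughly 43 %, so any single growth figure is sensitive to the period convention chosen.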

Geographically, production is concentrated in a handful of technologically advanced jurisdictions (Fig. 3). The United States accounts for just over two-fifths of all documents (59/147), reflecting both the origin of most operational casework and the presence of large consumer DNA databases. A strong second tier comprises the United Kingdom, China, Sweden, and Australia; collectively these four nations contribute roughly one-third of the corpus. The Netherlands punches above its demographic weight because of sustained investment by the Netherlands Forensic Institute in IGG method development and validation pipelines. A third tier—Finland, India, Denmark, and Norway—signals widening global engagement, although notable regions (South America, large parts of Africa) remain almost entirely absent from the scholarly record despite emerging operational interest reported anecdotally. The observed geographic skew therefore appears to reflect disparities in research infrastructure and database accessibility more than any underlying distribution of forensic need.

A preliminary inspection of author-keyword co-occurrence reveals that the 2018–2019 cohort is dominated by foundational labels such as investigative genetic genealogy, forensic genealogy, and familial searching. By contrast, the 2021–2025 vocabulary diversifies to include technical refinements (low-coverage whole-genome sequencing, imputation, SNP kit validation), governance and ethics terms (privacy, policy, consent), and platform-specific markers (GEDmatch, FamilyTreeDNA). This lexical shift confirms that the literature has moved rapidly from basic proof-of-concept discussions toward a more granular agenda encompassing workflow optimisation, database governance, and socio-legal alignment.

4.2. Influential authors

Influence within the IGG literature, measured by a composite of article output and normalized citation impact, is sharply concentrated. The ten most-prolific authors account for 56 % of all publications in the core corpus (63/112) yet attract more than 96 % of the total citation pool (1572/1627). Table 1 ranks these scholars and summarises their characteristic contributions.

Table 1.

Leading authors on investigative genetic genealogy (IGG), 1993–2025∗ (n = 147).

Rank	Author	Country/Primary affiliation^a	Documents	Total citations	Citations yr⁻¹^b	Core contribution(s)
1	Bruce Budowle	USA (UNT Health Science Center)	11	106	21.2	Forensic genomics standards; population-genetic frameworks for IGG admissibility
2	Christi J. Guerrini	USA (Baylor College of Medicine)	9	152	30.4	Ethics, public attitudes, and policy analyses; IGG nomenclature consolidation
3	Andreas Tillmar	Sweden (Nat’l Board of Forensic Medicine & Linköping U.)	8	347	49.6	Kinship algorithms, national implementation studies, founding member of European FIGG validation efforts
4	Daniel Kling	Sweden (Uppsala U.)	6	216	30.9	Dense-SNP kinship inference, imputation pipelines, benchmark datasets
5	Amy L. McGuire	USA (Baylor College of Medicine)	6	114	22.8	Bioethics of consumer genomics and law-enforcement access; governance models
6	Debbie Kennett	UK (University College London/citizen-science genealogy)	5	186	26.6	Methodological bridge between crowd-sourced genealogy and forensic workflows
7	CeCe Moore	USA (Parabon NanoLabs)	5	135	19.3	High-profile casework demonstrations; training materials for law enforcement
8	Stephanie M. Fullerton	USA (University of Washington)	5	47	9.4	Public-engagement and equity perspectives on IGG databases
9	Ray Wickenheiser	USA (FIGG-TVWG/NTVIC)	4	142	20.3	Validation guidelines and technical standards for forensic IGG laboratories
10	Ellen M. Greytak	USA (Parabon NanoLabs)	4	126	18.0	Development of SNP pipelines and statistical thresholds for distant-relative matching

^a Country and primary affiliation were assigned from the most recent organisational address in the Scopus record or, where ambiguous, from the author’s institutional profile.

^b Citations yr⁻¹ = total citations ÷ (2025 − earliest IGG publication year + 1).
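The citations-per-year formula in the table note can be applied directly. In the sketch below, the citation total (347) is taken from Table 1, but the earliest publication year (2019) is an assumption back-solved from the tabulated rate, since the table does not list first-publication years.

```python
def citations_per_year(
    total_citations: int, first_year: int, census_year: int = 2025
) -> float:
    """Total citations divided by the inclusive span of active years."""
    years_active = census_year - first_year + 1
    if years_active <= 0:
        raise ValueError("first_year must not be later than census_year")
    return total_citations / years_active


# 347 citations (Table 1, rank 3); first year 2019 is an assumed value.
print(round(citations_per_year(347, 2019), 1))  # prints 49.6
```

Because the denominator counts years inclusively, an author whose first IGG paper appeared in 2025 is divided by one rather than zero.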

A VOSviewer collaboration map depicts three dense but only loosely connected subnetworks. One cluster, anchored in U.S. operational practice, links Parabon NanoLabs researchers (Moore, Greytak) with forensic-genomics veterans such as Budowle and Wickenheiser; its publications concentrate on casework validation and laboratory standards. A second, Nordic-centred cluster brings together Tillmar, Kling, and collaborators at the Swedish and Danish forensic institutes, focusing on SNP-based kinship algorithms, low-coverage whole-genome sequencing, and public-sector deployment strategies. The third cluster—rooted at Baylor College of Medicine—groups bioethics and policy scholars (Guerrini, McGuire, Fullerton) whose work interrogates consent models, public attitudes, and regulatory design. Betweenness-centrality scores identify Debbie Kennett as a rare bridge: her dual presence in citizen-science genealogy forums and academic workshops facilitates knowledge transfer between practitioner and scholarly communities, underscoring the hybrid nature of IGG’s intellectual ecosystem.

Overall, the author landscape mirrors IGG’s multidisciplinary DNA: laboratory methodologists, case-driven practitioners, and scholars of ethics and policy all occupy influential positions, yet collaborate mainly within their own sub-domains, which highlights both the strength of focused expertise and the need for stronger cross-cluster integration as the field matures.

4.3. Intellectual structure revealed by author co-citation

Co-citation analysis (minimum threshold = 15 shared citations) produced the network in Fig. 4, which resolves into three thematically coherent clusters that together delineate the intellectual foundations and research frontiers of investigative genetic genealogy.

The red cluster covers foundational forensic genetics and ancestry inference. Anchored by Manfred Kayser, Mark Jobling, Peter Gill, John Butler, and Alec Jeffreys, it aggregates the pioneers of forensic DNA typing, Y-STR and ancestry-informative marker research, and population-genetic standards. Seminal works on autosomal STR nomenclature [27], Y-chromosome haplotyping for investigative leads [28], and early DNA-fingerprinting breakthroughs [29] are recurrently co-cited. The dense intra-cluster ties indicate that contemporary IGG scholarship still leans on these canonical texts to frame discussions of allele frequency estimation, kinship statistics, and the admissibility of SNP-based evidence. Conceptually, the cluster supplies the genetic scaffolding on which long-range familial searching is built.

The blue cluster covers policy, ethics, and implementation governance. Centred on Christi J. Guerrini, Ray A. Wickenheiser, Christopher P. Phillips, and Erin E. Murphy, this grouping consolidates scholarship on legal frameworks, consent models, and public attitudes toward IGG. Guerrini’s interview studies of practitioners (2024), Wickenheiser’s laboratory validation guidelines (2023), Phillips’s nomenclature standardisation for FGG versus familial searching (2018), and Murphy’s jurisprudential critiques (2018) are frequently cited together. The comparatively thick cross-cluster edges linking Phillips to Kayser and Butler show that ethical and regulatory discourse draws directly on population-genetic precedent. Overall, Cluster 2 represents the normative and operational spine that governs how genetic-genealogy techniques transition from proof-of-concept to routine casework.

The green cluster covers kinship algorithms and statistical methodology. Led by Andreas O. Tillmar, Wanchai Manichaikul, Charles I. de Vries, and Brian L. Browning, this cluster captures methodological advances in SNP-based relatedness inference and pedigree reconstruction. Manichaikul and Browning’s KING and IBD-segment algorithms (2010–2012), Tillmar’s low-coverage whole-genome pipelines for degraded samples (2020), and de Vries’s imputation work for distant-relative matching (2022) dominate the citation weave. Co-citations to Budowle and Kayser illustrate how statistical innovation is calibrated against forensic-population baselines. The prominence of this cluster underscores the field’s ongoing need for algorithmic rigor and quantitative validation as database sizes and ancestry diversity expand.

Taken together, the tri-cluster configuration confirms that IGG scholarship is multidisciplinary yet tightly integrated. Foundational genetic texts (Red Cluster) supply statistical legitimacy; policy and ethics scholarship (Blue Cluster) translates technical possibility into regulated practice; and method-development studies (Green Cluster) push the analytical frontier. The bridging role of authors such as Christopher P. Phillips, whose work is co-cited across all three clusters, highlights the importance of cross-domain fluency in sustaining a coherent research agenda.
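The counting behind an author co-citation network such as the one described in this section can be sketched briefly: two authors are co-cited once for every citing document whose reference list contains both. The function names and toy reference lists are hypothetical, and applying the minimum threshold to pair counts is an assumption, since the text does not say whether the threshold applies to pairs or to individual authors.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def cocitation_counts(reference_lists: Iterable[Sequence[str]]) -> Counter:
    """Count, for each author pair, the citing documents that cite both."""
    counts: Counter = Counter()
    for cited_authors in reference_lists:
        unique = sorted(set(cited_authors))
        counts.update(combinations(unique, 2))
    return counts


def apply_threshold(counts: Counter, minimum: int) -> Dict[Tuple[str, str], int]:
    """Keep only pairs co-cited at least `minimum` times (assumed pair-level rule)."""
    return {pair: n for pair, n in counts.items() if n >= minimum}


# Toy reference lists, one per citing document (hypothetical).
toy_refs = [
    ["Kayser", "Butler", "Phillips"],
    ["Kayser", "Butler"],
    ["Phillips", "Guerrini"],
]
print(apply_threshold(cocitation_counts(toy_refs), minimum=2))
```

Raising the threshold prunes weak ties first, which is how a compact field can be reduced to a small number of readable clusters.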

4.4. Journals and venues shaping the discourse

Publication outlets exhibit a steep Lorenz curve. The three leading journals, Forensic Science International: Genetics (FSIG), Forensic Science International: Synergy, and Forensic Science International, together publish almost one-third of all documents (30 %) yet attract well over half of the aggregate citations (≈58 %). FSIG alone contributes about one-sixth of the corpus (25/147) and more citations than the next two venues combined, underscoring its role as the flagship forum for SNP-based kinship methods, population-genetic groundwork, and laboratory validation studies. A comprehensive ranking of the leading publication venues, including their document counts, citation impact, and predominant contributions to the field, is provided in Table 2. Synergy, launched in 2019 as a gold-open-access sister journal, has rapidly become the preferred outlet for short technical notes, nomenclature proposals, and FIGG working-group statements. The parent journal Forensic Science International retains influence through case-driven articles and legal-admissibility analyses.

Table 2.

Leading publication venues addressing investigative genetic genealogy (IGG), 1993–2025∗ (n = 147).

Rank	Source title	Docs in corpus	Total citations	Source h-index^a	2024 JCR/Scimago quartile	Publisher	Predominant contribution
1	Forensic Science International: Genetics	25	622	13	Q1	Elsevier	Core SNP & kinship methods; degraded-DNA pipelines; population statistics
2	Forensic Science International: Synergy	10	188	5	Q2	Elsevier	Open-access technical notes, FIGG guidelines, nomenclature proposals
3	Forensic Science International	9	176	5	Q1	Elsevier	Case applications; legal-admissibility and policy analyses
4	Genes	8	146	4	Q2	MDPI	Cross-disciplinary genomics; algorithm benchmarking on consumer data
5	Frontiers in Genetics	6	13	2	Q1	Frontiers	Ethics, public attitudes, and survey instruments
6	FSI: Genetics Supplement Series	6	18	3	–	Elsevier	Conference proceedings; developmental validations
7	Journal of Forensic Sciences	6	105	4	Q2	Wiley	North-American casework; laboratory standardisation
8	Progress in Biochemistry & Biophysics	3	4	1	Q3	Science Press	Chinese-language reviews on kinship algorithms
9	BioTechniques	2	5	2	Q3	Future Science	Lab protocols for low-quantity and mixed-DNA IGG samples
10	International Journal of Legal Medicine	2	24	2	Q1	Springer	Comparative legal frameworks; German and Nordic practice notes

^a h-index computed on the IGG corpus (n = 147). Global journal metrics are higher.
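The venue concentration reported for the top journals can be recomputed from the document counts in Table 2. A minimal sketch: the counts and the corpus size (147) come from the table, while the helper name `top_k_share` is hypothetical.

```python
from typing import Sequence

# "Docs in corpus" column of Table 2, ranks 1 to 10.
DOCS_PER_VENUE = [25, 10, 9, 8, 6, 6, 6, 3, 2, 2]
CORPUS_SIZE = 147


def top_k_share(counts: Sequence[int], k: int, total: int) -> float:
    """Fraction of `total` held by the k largest entries of `counts`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return sum(sorted(counts, reverse=True)[:k]) / total


print(f"{top_k_share(DOCS_PER_VENUE, 3, CORPUS_SIZE):.1%}")  # prints 29.9%
```

The same helper applied to the citation column would need the corpus-wide citation total, which the table does not report.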

Generalist life-science titles such as Genes (MDPI) and Frontiers in Genetics capture methodological papers that straddle forensic genomics and biomedical ancestry research, while the Journal of Forensic Sciences remains the principal North-American venue for applied casework. Proceedings and supplement series (e.g., Forensic Science International: Genetics Supplement Series) continue to serve as rapid-dissemination channels for developmental validation data presented at ISFG and ISHI, but the citation premium clearly lies with fully peer-reviewed journals—reflecting the field’s maturation and the evidentiary weight attached to rigorously refereed studies.

The dominance of Elsevier’s forensic-science suite echoes the domain’s historical alignment with applied laboratory research, while the ascending visibility of fully open-access venues (Synergy, Genes, Frontiers) signals the community’s appetite for rapid and barrier-free dissemination—particularly for guidelines and standardisation efforts that require broad practitioner uptake. Conference and supplement series remain valuable for early disclosure, but their citation tail falls off steeply, reinforcing the field’s shift toward citable, peer-reviewed scholarship as IGG moves from proof-of-concept to regulated practice.

4.5. Thematic concentration and evolution

Keyword co-occurrence analysis consolidates the foregoing patterns. In the pre-2018 seed period (1993–2017) the lexicon is sparse and method-centric—“familial searching”, “Y-STR”, “kinship analysis”, “CODIS”—reflecting exploratory attempts to extend classical forensic genetics beyond close relatives. After the Golden State Killer identification, the 2018–2020 slice pivots decisively toward consumer-database exploitation: “investigative genetic genealogy”, “GEDmatch”, “FamilyTreeDNA”, and “SNP microarray” emerge as dominant terms, their high total-link strengths signalling intense cross-citation among authors refining upload pipelines and match-ranking heuristics. In the most recent window (2021–2025) the vocabulary broadens again. Technical refinements (“low-coverage whole-genome sequencing”, “imputation”, “IBD-segment”) now coexist with governance and equity markers (“privacy”, “consent”, “public attitudes”, “FIGG validation”), demonstrating a thematic shift from proof-of-concept matching to laboratory standardisation and normative compliance. Yet across all slices the phrases “cold case” and “unidentified human remains” remain ever-present companions, underscoring that the operational motive—identifying violent-crime perpetrators and nameless decedents—persists even as the research agenda diversifies.

Fig. 5 visualises these dynamics. The map, generated in VOSviewer with a minimum threshold of five keyword co-occurrences, resolves into three modularity clusters.

The red cluster, on genealogy, databases, and governance, centres on “investigative genetic genealogy”, “genealogy”, and “pedigree”, and also hosts “genetic privacy”, “law enforcement”, “policy”, and “ethics”. The dense intra-cluster weave illustrates how discussions of database structure and user consent are inseparable from the genealogical logic that powers long-range searches. In the blue cluster, on population genetics and statistical foundations, nodes such as “genetics”, “allele”, “genotype”, “identity-by-descent”, and “simulation” dominate. Their proximity highlights the reliance of IGG on classical population-genetic theory and computational models for kinship inference and match-probability estimation. Lastly, the green cluster, on laboratory methods and sequencing technologies, emphasizes “single nucleotide polymorphism”, “DNA fingerprinting”, and “whole genome sequencing”, and bundles technical keywords such as “high-throughput sequencing”, “microsatellite repeats”, and “Y chromosome” that mark methodological advances for degraded or trace DNA.

Visually, “single nucleotide polymorphism” (green) and “genealogy” (red) occupy the geometric centre of the network, functioning as semantic bridges between the laboratory and genealogical spheres, while edges spanning the red and blue clusters expose the co-occurrence traffic linking ethical-policy debates to their genetic underpinnings. The map therefore corroborates the table-based metrics: IGG scholarship is now a three-pillar enterprise comprising (i) genealogy-database practice and governance, (ii) statistical genetics and algorithm design, and (iii) wet-lab innovations for difficult samples. The increasing density of cross-cluster links over time signals a maturing, interdisciplinary discourse in which methodological, technical, and normative threads are progressively interwoven.

Leading author keywords by occurrence and total link strength are summarised in Table 3. The high co-linkage between “SNP microarray”, “GEDmatch”, and “cold case” illustrates the tight coupling of bench technology, platform policy, and investigative demand that has propelled IGG since 2018. Simultaneously, the recent ascent of “low-coverage WGS” and “FIGG validation” points to a community now preoccupied with scaling the technique to difficult samples while codifying quality assurance. Finally, the persistent co-occurrence of “privacy/consent” with both methodological and casework terms confirms that ethical considerations are no longer an external commentary but an embedded dimension of mainstream IGG research, aligning with the regulatory and public-engagement imperatives highlighted in Cluster 2 of the co-citation analysis.

Table 3.

Leading author keywords in IGG literature by occurrence and link strength.

Rank	Author keyword (normalized)	Occurrence	Total link strength	Indicative research focus
1	investigative genetic genealogy	61	268	Core term for SNP-based long-range searching
2	forensic genealogy/FGG	47	221	Alternative label; workflow harmonization
3	GEDmatch	39	190	Third-party upload platform; match statistics
4	FamilyTreeDNA	34	174	Opt-in consumer database; policy case studies
5	SNP microarray	31	149	Array genotyping of degraded forensic DNA
6	low-coverage WGS	27	132	Sequencing & imputation for challenging samples
7	privacy/consent	25	117	Public attitudes; legal admissibility debates
8	FIGG validation	23	110	Laboratory standards; NTVIC working-group output
9	unidentified human remains	22	104	IGG in humanitarian and DVI contexts
10	cold case	19	101	Homicide/sexual-assault investigations

‡Clusters correspond to VOSviewer modularity classes: A = laboratory method development and analytical pipeline; B = database-centred casework applications; C = ethics, policy, and validation governance.
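
As an illustration of the two numeric columns in Table 3, the following minimal Python sketch counts keyword occurrences and total link strength from per-document keyword lists. It assumes full counting (each co-occurring keyword pair adds 1 per document) and an occurrence threshold applied before links are built. The function name `cooccurrence_stats`, the `min_occurrences` parameter, and the toy records are hypothetical; they are neither the study's data nor VOSviewer's internal code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(records, min_occurrences=1):
    """Return (occurrences, links, total_link_strength) for keyword lists.

    records: list of keyword lists, one list per document (hypothetical input).
    """
    # Count each keyword at most once per document.
    occurrences = Counter(kw for rec in records for kw in set(rec))
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}

    # A link between two kept keywords gains 1 for every shared document.
    links = Counter()
    for rec in records:
        present = sorted(set(rec) & kept)
        for a, b in combinations(present, 2):
            links[(a, b)] += 1

    # Total link strength of a keyword = sum of the strengths of its links.
    total = Counter()
    for (a, b), weight in links.items():
        total[a] += weight
        total[b] += weight

    return {kw: occurrences[kw] for kw in kept}, dict(links), dict(total)


# Invented toy records, for illustration only.
toy_records = [
    ["investigative genetic genealogy", "GEDmatch", "cold case"],
    ["investigative genetic genealogy", "GEDmatch", "privacy/consent"],
    ["investigative genetic genealogy", "cold case"],
    ["SNP microarray", "GEDmatch"],
]
occ, links, tls = cooccurrence_stats(toy_records, min_occurrences=2)
for keyword in sorted(occ, key=lambda k: (-tls.get(k, 0), k)):
    print(keyword, occ[keyword], tls.get(keyword, 0))
```

On the toy records, “investigative genetic genealogy” has 3 occurrences and a total link strength of 4; keywords below the threshold drop out before any links are counted.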

5. Discussion

The bibliometric evidence depicts a field that has expanded rapidly, diversified intellectually, and coalesced around a small set of countries, authors, and publication venues. Annual output in our Scopus corpus rises from isolated items prior to 2018 to sustained, year-on-year growth thereafter, with a compound growth rate of roughly forty-four percent between 2018 and 2024. That trajectory mirrors the technique’s operational inflection following the Golden State Killer identification and the subsequent diffusion of practice in U.S. policing [1]. The expansion is not uniform: a short post-2018 proof-of-concept phase emphasizing upload pipelines and match triage gives way, from 2021 onward, to work on low-coverage sequencing, imputation, and validation frameworks, alongside intensified attention to consent and governance.
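
The growth figure can be unpacked with simple arithmetic. The Python sketch below assumes the reported rate is a standard compound annual growth rate over the six yearly intervals from 2018 to 2024; the function names are hypothetical, and no annual publication counts from the corpus are reproduced.

```python
def compound_growth_rate(first_value, last_value, n_intervals):
    """Compound growth rate per interval between two positive values."""
    if first_value <= 0 or last_value <= 0 or n_intervals <= 0:
        raise ValueError("values and interval count must be positive")
    return (last_value / first_value) ** (1.0 / n_intervals) - 1.0


def growth_multiple(rate, n_intervals):
    """Overall multiple implied by a constant per-interval growth rate."""
    return (1.0 + rate) ** n_intervals


# Assumption: 2018 -> 2024 spans six yearly intervals.
multiple = growth_multiple(0.44, 6)
print(round(multiple, 2))
```

Under that assumption, a rate of 0.44 corresponds to annual output in 2024 of roughly nine times the 2018 level (1.44^6 ≈ 8.9).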

Geographically, research is concentrated. The United States accounts for the largest share of publications, followed by a second tier led by the United Kingdom, China, Sweden, and Australia. This distribution tracks both the location of large opt-in consumer databases that permit law-enforcement access (GEDmatch and FamilyTreeDNA) and the presence of clear policy scaffolding, such as the U.S. Department of Justice’s 2019 guidance treating IGG as an investigative lead for violent crimes and unidentified human remains [10,14]. The relative absence of work from South America and large parts of Africa sits uneasily with the global relevance of violent-crime and missing-persons investigations and, as the modeling of long-range familial searchability shows, risks widening equity gaps where database representation is low [6].

Influence, measured through output and normalized citation impact, is even more concentrated. The co-authorship map reveals three dense subnetworks with limited bridges: an operational U.S. cluster linking casework and laboratory standardization (e.g., Greytak; Wickenheiser), a Nordic methods cluster focused on SNP-based relatedness and sequencing/imputation for compromised samples (e.g., Tillmar; Kling), and a Baylor-centred policy/ethics cluster (e.g., Guerrini; McGuire). This structure is consistent with the field’s hybrid character: laboratory pragmatism, computational method-building, and normative analysis evolve in parallel, and only a handful of authors—often practitioners with one foot in citizen genealogy and another in academia—regularly span clusters. The pattern underscores both the benefits of specialized expertise and the need for deliberate cross-domain collaboration if standards, algorithms, and governance are to cohere.

The author co-citation network clarifies the intellectual foundations. One cluster anchors the field in forensic and population genetics, with seminal contributions on STR nomenclature, Y-chromosome haplotyping, identity-by-descent, and the statistical underpinnings of relationship inference [27–31]. A second cluster consolidates policy, ethics, and implementation governance, spanning nomenclature consolidation, practitioner perspectives, and admissibility debates [10,15,32]. A third cluster advances kinship algorithms and pipelines, emphasizing distant-relative inference from dense SNP data and the treatment of low-template or degraded samples [16]. The tri-part structure confirms that IGG’s scientific legitimacy depends on a continuous exchange between statistical genetics and practical workflows, mediated by normative guardrails.

Keyword co-occurrence mapping reinforces that conclusion and adds a temporal dimension. Before 2018, the vocabulary is thin and leans on legacy terms (“familial searching,” “Y-STR”), while the 2018–2020 slice pivots to consumer-database practice as “investigative genetic genealogy,” “GEDmatch,” and “FamilyTreeDNA” surge. In 2021–2025, thematic breadth increases as laboratory methods (“single nucleotide polymorphism,” “whole-genome sequencing,” “imputation”), governance (“privacy,” “consent,” “policy”), and applications (“cold case,” “unidentified human remains”) interweave. The visual centrality of “single nucleotide polymorphism” and “genealogy”, which bridge the green laboratory-methods and red database-governance clusters, captures the field’s essence: successful IGG demands both assay-level reliability and genealogical reasoning grounded in open or opt-in databases. The thick ties between governance keywords and genetics terms further indicate that ethical and legal considerations have moved from external commentary to embedded design constraints [10,13].

Venue analysis reveals a steep Lorenz distribution. Elsevier’s forensic-science portfolio—especially Forensic Science International: Genetics—dominates both volume and citations in our corpus, reflecting the community’s preference for rigorously refereed, laboratory-oriented outlets. The rapid ascent of Forensic Science International: Synergy points to a parallel demand for open-access dissemination of technical notes and working-group guidance [17]. More generalist life-science venues (Genes, Frontiers in Genetics) capture cross-overs in algorithm development and public-attitudes research. The tilt toward journals, and away from conference proceedings, is typical of domains where admissibility, reproducibility, and validation weigh heavily.
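
One common way to quantify such concentration is a Lorenz curve with its Gini coefficient. The following Python sketch is an illustration only: the function names are hypothetical, the per-venue publication counts are invented, and the paper does not state that a Gini coefficient was computed.

```python
def lorenz_points(counts):
    """Cumulative share of venues vs. cumulative share of publications."""
    values = sorted(counts)
    total = sum(values)
    if not values or total == 0:
        raise ValueError("need at least one non-zero count")
    points = [(0.0, 0.0)]
    running = 0
    for i, value in enumerate(values, start=1):
        running += value
        points.append((i / len(values), running / total))
    return points


def gini(counts):
    """Gini coefficient: 0 = evenly spread; higher = more concentrated."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted sum over counts sorted in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Invented per-venue publication counts, for illustration only.
hypothetical_venue_counts = [40, 12, 6, 3, 2, 1, 1, 1]
print(round(gini(hypothetical_venue_counts), 3))
```

A steep Lorenz curve, with most of the cumulative share arriving only at the last few venues, corresponds to a Gini coefficient well above zero.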

Three implications follow. First, validation and transparency remain the most urgent needs. Kling et al. [16] note that most matching engines used by direct-to-consumer services are proprietary; our results show that validation papers and FIGG guidance only start to dominate after 2021, suggesting that standardization is catching up with practice rather than leading it [17]. Given known vulnerabilities in upload-matching ecosystems [33], method validation should extend beyond laboratory accuracy to include adversarial security, provenance logging, and end-to-end error budgets relevant to evidentiary risk. Second, equity and representativeness require sustained attention. Modeling indicates that long-range reidentification is far easier in populations with deep database coverage [6]. The geographic skew of scholarship toward database-rich jurisdictions, together with the persistent prominence of “privacy/consent” in our keyword network, argues for research that couples technical progress with robust, locally legitimate consent models and outreach in under-represented communities [13,15]. Third, the bibliometric centrality of “unidentified human remains” highlights IGG’s humanitarian potential, but the bridge between humanitarian identification and criminal investigation raises distinct ethical and legal issues, especially around familial notification and the scope of search warrants, that merit finer-grained comparative study [10,14].

The study also surfaces gaps. Cross-cluster bridges are thin: method developers rarely co-author with ethics/policy scholars, and vice versa. Stronger team-science models could accelerate consensus on reporting standards (e.g., minimal information for IGG studies), error-rate disclosure, and thresholds for pedigree reconstruction. Our venue analysis shows healthy uptake of open-access journals, but data and code openness lag behind. Shared, de-identified benchmark sets for degraded DNA, kinship distance, and pedigree complexity would allow reproducible comparisons across algorithms and laboratories, paralleling norms long established in STR typing [27]. Finally, the corpus is overwhelmingly Anglophone and Scopus-indexed; as others have warned, terminology drift (IGG/FGG/LRFS; brand names and misspellings) risks both false negatives and false positives in evidence syntheses [10]. Future work should triangulate Scopus with Web of Science and PubMed and maintain living thesauri to stabilize retrieval.
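
A thesaurus of the kind mentioned above can be sketched as a simple lookup applied before keywords are counted. The Python example below is a minimal illustration: every mapping in `THESAURUS`, and the function names `normalize_keyword` and `harmonize`, are hypothetical choices made for this sketch and do not reproduce the harmonization rules used in the study.

```python
import re

# Hypothetical thesaurus: normalized variant -> preferred label.
# Keys are lower-case with hyphens and repeated spaces collapsed to one space.
THESAURUS = {
    "igg": "investigative genetic genealogy",
    "fgg": "forensic genealogy/FGG",
    "forensic genealogy": "forensic genealogy/FGG",
    "forensic genetic genealogy": "forensic genealogy/FGG",
    "lrfs": "long-range familial searching",
    "long range familial searching": "long-range familial searching",
    "gedmatch": "GEDmatch",
    "ged match": "GEDmatch",
    "familytreedna": "FamilyTreeDNA",
    "family tree dna": "FamilyTreeDNA",
}


def normalize_keyword(raw):
    """Map a raw keyword to its preferred label, or return it trimmed."""
    cleaned = raw.strip()
    key = re.sub(r"[\s\-]+", " ", cleaned.lower())
    return THESAURUS.get(key, cleaned)


def harmonize(keywords):
    """Normalize a document's keywords and drop duplicates, keeping order."""
    seen = []
    for keyword in keywords:
        label = normalize_keyword(keyword)
        if label not in seen:
            seen.append(label)
    return seen


print(harmonize(["IGG", "investigative genetic genealogy", "GED-match", "GEDmatch"]))
```

Keeping the mapping in one versioned table makes each merge decision explicit and lets the same rules be reapplied when new records from other databases are added.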

At the same time, it is important to recognize methodological limitations that may have influenced the corpus composition and, by extension, the patterns observed. Because the dataset is derived from Scopus-indexed records, the analysis reflects the coverage strengths and constraints of that database. In particular, IGG-relevant laws and policies are frequently discussed in law review articles and other legal scholarship that may be indexed inconsistently in Scopus compared with specialist legal databases; consequently, parts of the legal and policy discourse may be under-represented in the corpus even though the present study’s framing and audience are primarily forensic-science and applied-genomics communities. In addition, the reliance on English-language peer-reviewed records and keyword-based retrieval in a domain with persistent terminological variation (e.g., IGG/FGG/LRFS and platform name variants) may produce false negatives despite keyword harmonization procedures [10]. Finally, citation-based indicators tend to privilege older publications and highly visible outlets, meaning that newer contributions may appear less influential despite practical relevance. These limitations are consistent with known database-coverage effects in bibliometric research and should be considered when generalizing beyond the Scopus corpus [24].

Policy-wise, funders and standards bodies should move quickly on three fronts. They should underwrite multi-site validation consortia to evaluate microarray and low-coverage WGS pipelines on realistically degraded samples and to publish error structures salient to distant-kinship inference [16]. They should mandate algorithmic transparency commensurate with forensic use, even when commercial IP is involved, and require that IGG outputs are framed explicitly as investigative leads to be confirmed via conventional methods, consistent with existing policy [1,14]. And they should support equitable database governance, balancing investigative utility with privacy protections and community trust [10,13,15].

5.1. Future research avenues and conclusion

The bibliometric portrait shows a field that has grown quickly, diversified thematically, and converged on a tri-pillar intellectual structure (laboratory methods, population-genetic/statistical foundations, and governance/ethics) while remaining geographically concentrated and author-clustered. IGG scholarship has moved from post-2018 proof-of-concept to a maturing, interdisciplinary discourse, but gaps in validation, transparency, and equity persist [6,16,33].

First, multi-site validation and benchmark transparency must catch up with practice. Our venue analysis shows that rigorous laboratory papers and open guidance are recent and growing but still trail operational adoption [17]. Building on the U.S. Department of Justice’s “investigative lead” framing [14] and the consolidation of FIGG working groups, future work should create longitudinal, multi-laboratory test beds for IGG pipelines that span microarray and low-coverage WGS, include realistically degraded samples, and report end-to-end error budgets relevant to distant-kinship inference [16]. A community standard akin to minimal information for IGG studies could specify inputs (sample quality metrics), outputs (IBD segment statistics, pedigree hypotheses with uncertainty), and confirmation pathways (STR/Y-STR re-testing), enabling reproducible comparisons across algorithms and labs [1].

Second, algorithmic transparency and system security require targeted research. Our results highlight heavy reliance on a small number of upload-enabled databases and proprietary matching engines, and both realities complicate reproducibility and legal scrutiny [10,16]. Future work should: (i) define auditable interfaces for closed-source engines (e.g., signed provenance logs, explainable match rationales), (ii) conduct adversarial evaluations to quantify risks from IBS/IBD manipulation and genotype-extraction attacks [33], and (iii) test defense-in-depth controls (rate-limiting, cryptographic proofs of upload origin, anomaly detection). These directions would align tool design with courtroom expectations for transparency and reliability.

Third, representation and fairness need systematic measurement, not just acknowledgment. Modeling shows that long-range reidentification success scales with the depth and ancestry composition of searchable databases [6]. Our geographic skew, dominated by database-rich jurisdictions, echoes that dependency. Research should therefore develop coverage-aware performance metrics (e.g., probability of informative match by ancestry and geography), expand imputation panels and validation cohorts beyond European-ancestry samples, and study consent models and trust in under-represented communities using mixed methods [15]. Humanitarian uses, particularly unidentified human remains identifications, offer an avenue for equitable benefits but raise distinct familial-notification and warrant-scope questions requiring comparative legal analysis [12,14].
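
A coverage-aware metric of the kind proposed here could, in its simplest form, be an empirical match rate reported per group. The Python sketch below assumes that each benchmark query records a group label and the total shared DNA (in centimorgans) of its best database match. The function name, the record layout, the group labels, and the 100 cM threshold are all hypothetical choices made for illustration, and the data are invented.

```python
from collections import defaultdict


def informative_match_rate(queries, min_shared_cm):
    """Fraction of queries per group whose best match meets the threshold.

    queries: iterable of (group_label, best_match_cm) pairs, where
    best_match_cm is None when the database returned no match.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, best_cm in queries:
        totals[group] += 1
        if best_cm is not None and best_cm >= min_shared_cm:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented benchmark results, for illustration only.
hypothetical_queries = [
    ("group A", 250.0),
    ("group A", 40.0),
    ("group A", 120.0),
    ("group B", None),
    ("group B", 30.0),
    ("group B", 150.0),
]
rates = informative_match_rate(hypothetical_queries, min_shared_cm=100.0)
for group, rate in sorted(rates.items()):
    print(group, round(rate, 2))
```

Reporting the rate per group rather than pooled makes any gap in coverage visible; a real study would also need confidence intervals and a justified threshold.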

Fourth, team-science designs should bridge the author-cluster silos. The co-authorship map shows limited cross-cluster ties: method developers rarely publish with policy/ethics scholars, and operational genealogists seldom co-author with statisticians. Competitive challenge problems (e.g., blinded pedigree reconstruction on consented or synthetic cases), co-authored registered reports, and shared, de-identified benchmark sets would incentivize collaboration while improving external validity. Given that our keyword network places governance terms alongside core technical lexemes, such teams are essential to embed privacy-by-design principles (data minimization, selective disclosure, and explicit uncertainty communication) into pipelines from the outset [10,13].

Fifth, practice should be codified through living standards and practitioner training. The concentration of influence in a few venues suggests the community is ready to converge on living standards that iterate with evidence [17]. Priorities include thresholding for distant-match utility, pedigree-search stopping rules, and reporting templates that distinguish clearly between investigative leads and identifications [1,14]. Parallel investment in open-access training, drawing on the rapid-publication ecosystems of Synergy and allied journals, would aid agencies in low-resource settings and mitigate the geographic imbalances seen in our corpus.

Because database terms, privacy, and consent now appear as central nodes in the keyword map, comparative policy research should accompany every technical trajectory. Studies should evaluate warrant practices, terms-of-service changes, and public-attitudes dynamics across jurisdictions to determine when investigative value justifies intrusion and how to sustain legitimacy over time [10,13,15].

IGG has matured from post-2018 demonstrations to a structured, multidisciplinary field whose center of gravity sits at the intersection of SNP assay reliability, genealogical reasoning, and governance. The bibliometric signals (the surge in output, the three-cluster intellectual base, the dominance of forensic-science journals, and the centrality of SNPs and genealogy in the lexicon) converge on a clear agenda: validate at scale, make algorithms auditable and secure, measure and mitigate representational inequities, and build cross-domain teams that translate advances into accountable practice. Pursued together, these avenues would align technical innovation with the “investigative lead” role envisioned by policy [14] and the ethical commitments articulated by scholars and practitioners [10,13,15], enabling IGG to deliver investigative effectiveness without sacrificing transparency, fairness, or public trust.

CRediT authorship contribution statement

Alfonso Pellegrino: Writing – original draft, Visualization, Software, Methodology, Formal analysis, Data curation. Alessandro Stasi: Writing – review & editing, Validation, Project administration, Conceptualization.

Present/permanent address notes

None.

Declaration of generative AI

During the preparation of this work, the authors used Gemini in order to improve readability and language. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

None.

Contributor Information

Alfonso Pellegrino, Email: ap4273@columbia.edu.

Alessandro Stasi, Email: alessandro.sta@mahidol.edu.

Abbreviations:

CODIS: Combined DNA Index System
DTC: Direct-to-Consumer
DVI: Disaster Victim Identification
FGG: Forensic Genetic Genealogy
FIGG: Forensic Investigative Genetic Genealogy
IBD: Identity by Descent
IBS: Identity by State
IGG: Investigative Genetic Genealogy
ISFG: International Society for Forensic Genetics
ISHI: International Symposium on Human Identification
LRFS: Long-Range Familial Searching
NTVIC: National Technology Validation and Implementation Collaborative
SNP: Single-Nucleotide Polymorphism
STR: Short Tandem Repeat
SWGDAM: Scientific Working Group on DNA Analysis Methods
UHR: Unidentified Human Remains
WGS: Whole-Genome Sequencing

References

1. Greytak E.M., Moore C., Armentrout S.L. Genetic genealogy for cold case and active investigations. Forensic Sci. Int. 2019;299:103–113. doi: 10.1016/j.forsciint.2019.03.039.
2. Dowdeswell T.L. Data sovereignty & forensic investigative genetic genealogy (FIGG): a path forward for humanitarian & mass graves investigations. Int. J. Foren. Sci. 2023;8(2):1–7. doi: 10.23880/ijfsc-16000300.
3. Katsanis S.H. Pedigrees and perpetrators: uses of DNA and genealogy in forensic investigations. Annu. Rev. Genom. Hum. Genet. 2020;21(1):535–564. doi: 10.1146/annurev-genom-111819-084213.
4. ISOGG. Autosomal DNA testing comparison chart. ISOGG Wiki. 2025, December 6. Retrieved January 21, 2026, from https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart
5. Larkin L. AncestryDNA Surpasses 20 Million. The DNA Geek; 2021, May 27. https://thednageek.com/ancestrydna-surpasses-20-million/
6. Erlich Y., Shor T., Pe’er I., Carmi S. Identity inference of genomic data using long-range familial searches. Science. 2018;362(6415):690–694. doi: 10.1126/science.aau4832.
7. Greytak E.M., Moore C., Armentrout S.L. eLetter on “Identity inference of genomic data using long-range familial searches” [eLetter]. Science. 2018, October 29. doi: 10.1126/science.aau4832. Retrieved January 19, 2026, from https://www.science.org/doi/10.1126/science.aau4832
8. Guerrini C.J., Robinson J.O., Petersen D., McGuire A.L. Should police have access to genetic genealogy databases? Capturing the golden state killer and other criminals using a controversial new forensic technique. PLoS Biol. 2018;16(10). doi: 10.1371/journal.pbio.2006906.
9. Huston S., Madden D., Villanes A., Reed N., Bash Brooks W., Healey C., Guerrini C. Insights from social media into public perspectives on investigative genetic genealogy. Front. Genet. 2025;15. doi: 10.3389/fgene.2024.1482831.
10. Tuazon O.M., Wickenheiser R.A., Ansell R., Custers B. Law enforcement use of genetic genealogy databases in criminal investigations: nomenclature, definition and scope. Forensic Sci. Int.: Synergy. 2024;8. doi: 10.1016/j.fsisyn.2024.100460.
11. Dowdeswell T.L. Forensic genetic genealogy: a profile of cases solved. Forensic Sci. Int.: Genetics. 2022;58. doi: 10.1016/j.fsigen.2022.102679.
12. Greytak E.M., Wyatt S., Cady J., Moore C., Armentrout S. Investigative genetic genealogy for human remains identification. J. Forensic Sci. 2024;69(5):1531–1545. doi: 10.1111/1556-4029.15469.
13. Guerrini C.J., Wickenheiser R.A., Bettinger B., McGuire A.L., Fullerton S.M. Four misconceptions about investigative genetic genealogy. J. Law Biosci. 2021;8(1):lsab001. doi: 10.1093/jlb/lsab001.
14. U.S. Department of Justice. Interim Policy: Forensic Genetic Genealogical DNA Analysis and Searching. Washington, DC: Author; 2019. https://www.justice.gov/olp/page/file/1204386/download
15. Guerrini C.J., Bash Brooks W., Robinson J.O., Fullerton S.M., Zoorob E., McGuire A.L. IGG in the trenches: results of an in-depth interview study on the practice, politics, and future of investigative genetic genealogy. Forensic Sci. Int. 2024;356. doi: 10.1016/j.forsciint.2024.111946.
16. Kling D., Phillips C., Kennett D., Tillmar A. Investigative genetic genealogy: current methods, knowledge and practice. Forensic Sci. Int.: Genetics. 2021;52. doi: 10.1016/j.fsigen.2021.102474.
17. Gamette M.J., Wickenheiser R.A. Establishment of the national technology validation and implementation collaborative (NTVIC) and forensic investigative genetic genealogy technology validation working group (FIGG-TVWG). Forensic Sci. Int.: Synergy. 2023;6. doi: 10.1016/j.fsisyn.2023.100317.
18. Russell D.A., Gorden E.M., Peck M.A., Neal C.M., Heaton M.C. Developmental validation of the illumina infinium assay using the global screening array on the iScan system for use in forensic laboratories. Forens. Genom. 2023;3(1):15–24. doi: 10.1089/forensic.2022.0013.
19. Cady J., Greytak E.M. Whole-genome sequencing of degraded DNA for investigative genetic genealogy. Forensic Sci. Int.: Genet. Supplement Ser. 2022;8:20–22. doi: 10.1016/j.fsigss.2022.09.008.
20. Donthu N., Kumar S., Mukherjee D., Pandey N., Lim W.M. How to conduct a bibliometric analysis: an overview and guidelines. J. Bus. Res. 2021;133:285–296. doi: 10.1016/j.jbusres.2021.04.070.
21. Zupic I., Čater T. Bibliometric methods in management and organization. Organ. Res. Methods. 2015;18(3):429–472. doi: 10.1177/1094428114562629.
22. Van Eck N.J., Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–538. doi: 10.1007/s11192-009-0146-3.
23. Van Eck N.J., Waltman L. Visualizing bibliometric networks. In: Ding Y., Rousseau R., Wolfram D., editors. Measuring Scholarly Impact: Methods and Practice. Springer; 2014. pp. 285–320.
24. Mongeon P., Paul-Hus A. The journal coverage of web of science and scopus: a comparative analysis. Scientometrics. 2016;106(1):213–228. doi: 10.1007/s11192-015-1765-5.
25. Page M.J., McKenzie J.E., Bossuyt P.M., Boutron I., Hoffmann T.C., Mulrow C.D.…Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71.
26. Aria M., Cuccurullo C. Bibliometrix: an R-tool for comprehensive science mapping analysis. J. Informetr. 2017;11(4):959–975. doi: 10.1016/j.joi.2017.08.007.
27. Butler J.M. Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers. second ed. Academic Press; 2006.
28. Kayser M., de Knijff P. Improving human forensics through advances in genetics, genomics and molecular biology. Nat. Rev. Genet. 2011;12(3):179–192. doi: 10.1038/nrg2952.
29. Jeffreys A.J., Wilson V., Thein S.L. Hypervariable ‘minisatellite’ regions in human DNA. Nature. 1985;314(6006):67–73. doi: 10.1038/314067a0.
30. Browning B.L., Browning S.R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 2011;88(2):173–182. doi: 10.1016/j.ajhg.2011.01.010.
31. Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–2873. doi: 10.1093/bioinformatics/btq559.
32. Murphy E. Law and policy oversight of familial searches in recreational genealogy databases. Forensic Sci. Int. 2018;292:e5–e9. doi: 10.1016/j.forsciint.2018.08.027.
33. Edge M.D., Coop G. Attacks on genetic privacy via uploads to genealogical databases. eLife. 2020;9. doi: 10.7554/eLife.51810.

[bib1] 1.Greytak E.M., Moore C., Armentrout S.L. Genetic genealogy for cold case and active investigations. Forensic Sci. Int. 2019;299:103–113. doi: 10.1016/j.forsciint.2019.03.039. [DOI] [PubMed] [Google Scholar]

[bib2] 2.Dowdeswell T.L. Data sovereignty & forensic investigative genetic genealogy (FIGG): a path forward for humanitarian & mass graves investigations. Int. J. Foren. Sci. 2023;8(2):1–7. doi: 10.23880/ijfsc-16000300. [DOI] [Google Scholar]

[bib3] 3.Katsanis S.H. Pedigrees and perpetrators: uses of DNA and genealogy in forensic investigations. Annu. Rev. Genom. Hum. Genet. 2020;21(1):535–564. doi: 10.1146/annurev-genom-111819-084213. [DOI] [PubMed] [Google Scholar]

[bib4] 4.ISOGG Autosomal DNA testing comparison chart. ISOGG Wiki. 2025, December 6 https://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart Retrieved January 21, 2026, from. [Google Scholar]

[bib5] 5.Larkin L. The DNA Geek; 2021, May 27. AncestryDNA Surpasses 20 Million.https://thednageek.com/ancestrydna-surpasses-20-million/ [Google Scholar]

[bib6] 6.Erlich Y., Shor T., Pe’er I., Carmi S. Identity inference of genomic data using long-range familial searches. Science. 2018;362(6415):690–694. doi: 10.1126/science.aau4832. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.Greytak E.M., Moore C., Armentrout S.L. eLetter on “Identity inference of genomic data using long-range familial searches” [eLetter] Science. 2018, October 29 doi: 10.1126/science.aau4832. https://www.science.org/doi/10.1126/science.aau4832 Retrieved January 19, 2026, from. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Guerrini C.J., Robinson J.O., Petersen D., McGuire A.L. Should police have access to genetic genealogy databases? Capturing the golden state killer and other criminals using a controversial new forensic technique. PLoS Biol. 2018;16(10) doi: 10.1371/journal.pbio.2006906. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] 9.Huston S., Madden D., Villanes A., Reed N., Bash Brooks W., Healey C., Guerrini C. Insights from social media into public perspectives on investigative genetic genealogy. Front. Genet. 2025;15 doi: 10.3389/fgene.2024.1482831. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Tuazon O.M., Wickenheiser R.A., Ansell R., Custers B. Law enforcement use of genetic genealogy databases in criminal investigations: nomenclature, definition and scope. Forensic Sci. Int.: Synergy. 2024;8 doi: 10.1016/j.fsisyn.2024.100460. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] 11.Dowdeswell T.L. Forensic genetic genealogy: a profile of cases solved. Forensic Sci. Int.: Genetics. 2022;58 doi: 10.1016/j.fsigen.2022.102679. [DOI] [PubMed] [Google Scholar]

[bib12] 12.Greytak E.M., Wyatt S., Cady J., Moore C., Armentrout S. Investigative genetic genealogy for human remains identification. J. Forensic Sci. 2024;69(5):1531–1545. doi: 10.1111/1556-4029.15469. [DOI] [PubMed] [Google Scholar]

[bib13] 13.Guerrini C.J., Wickenheiser R.A., Bettinger B., McGuire A.L., Fullerton S.M. Four misconceptions about investigative genetic genealogy. J. Law Biosci. 2021;8(1) doi: 10.1093/jlb/lsab001. lsab001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] 14.U.S. Department of Justice . Author; Washington, DC: 2019. Interim Policy: Forensic Genetic Genealogical DNA Analysis and Searching.https://www.justice.gov/olp/page/file/1204386/download [Google Scholar]

[bib15] 15.Guerrini C.J., Bash Brooks W., Robinson J.O., Fullerton S.M., Zoorob E., McGuire A.L. IGG in the trenches: results of an in-depth interview study on the practice, politics, and future of investigative genetic genealogy. Forensic Sci. Int. 2024;356 doi: 10.1016/j.forsciint.2024.111946. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] 16.Kling D., Phillips C., Kennett D., Tillmar A. Investigative genetic genealogy: current methods, knowledge and practice. Forensic Sci. Int.: Genetics. 2021;52 doi: 10.1016/j.fsigen.2021.102474. [DOI] [PubMed] [Google Scholar]

[bib17] 17.Gamette M.J., Wickenheiser R.A. Establishment of the national technology validation and implementation collaborative (NTVIC) and forensic investigative genetic genealogy technology validation working group (FIGG-TVWG) Forensic Sci. Int.: Synergy. 2023;6 doi: 10.1016/j.fsisyn.2023.100317. [DOI] [PMC free article] [PubMed] [Google Scholar]

18. Russell D.A., Gorden E.M., Peck M.A., Neal C.M., Heaton M.C. Developmental validation of the Illumina Infinium assay using the Global Screening Array on the iScan system for use in forensic laboratories. Forens. Genom. 2023;3(1):15–24. doi: 10.1089/forensic.2022.0013.

19. Cady J., Greytak E.M. Whole-genome sequencing of degraded DNA for investigative genetic genealogy. Forensic Sci. Int.: Genet. Supplement Ser. 2022;8:20–22. doi: 10.1016/j.fsigss.2022.09.008.

20. Donthu N., Kumar S., Mukherjee D., Pandey N., Lim W.M. How to conduct a bibliometric analysis: an overview and guidelines. J. Bus. Res. 2021;133:285–296. doi: 10.1016/j.jbusres.2021.04.070.

21. Zupic I., Čater T. Bibliometric methods in management and organization. Organ. Res. Methods. 2015;18(3):429–472. doi: 10.1177/1094428114562629.

22. Van Eck N.J., Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–538. doi: 10.1007/s11192-009-0146-3.

23. Van Eck N.J., Waltman L. Visualizing bibliometric networks. In: Ding Y., Rousseau R., Wolfram D., editors. Measuring Scholarly Impact: Methods and Practice. Springer; 2014. pp. 285–320.

24. Mongeon P., Paul-Hus A. The journal coverage of Web of Science and Scopus: a comparative analysis. Scientometrics. 2016;106(1):213–228. doi: 10.1007/s11192-015-1765-5.

25. Page M.J., McKenzie J.E., Bossuyt P.M., Boutron I., Hoffmann T.C., Mulrow C.D., … Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71.

26. Aria M., Cuccurullo C. Bibliometrix: an R-tool for comprehensive science mapping analysis. J. Informetr. 2017;11(4):959–975. doi: 10.1016/j.joi.2017.08.007.

27. Butler J.M. Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers. Second ed. Academic Press; 2006.

28. Kayser M., de Knijff P. Improving human forensics through advances in genetics, genomics and molecular biology. Nat. Rev. Genet. 2011;12(3):179–192. doi: 10.1038/nrg2952.

29. Jeffreys A.J., Wilson V., Thein S.L. Hypervariable ‘minisatellite’ regions in human DNA. Nature. 1985;314(6006):67–73. doi: 10.1038/314067a0.

30. Browning B.L., Browning S.R. A fast, powerful method for detecting identity by descent. Am. J. Hum. Genet. 2011;88(2):173–182. doi: 10.1016/j.ajhg.2011.01.010.

31. Manichaikul A., Mychaleckyj J.C., Rich S.S., Daly K., Sale M., Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–2873. doi: 10.1093/bioinformatics/btq559.

32. Murphy E. Law and policy oversight of familial searches in recreational genealogy databases. Forensic Sci. Int. 2018;292:e5–e9. doi: 10.1016/j.forsciint.2018.08.027.

33. Edge M.D., Coop G. Attacks on genetic privacy via uploads to genealogical databases. eLife. 2020;9. doi: 10.7554/eLife.51810.
