Abstract
The Kolbroek pig is indigenous to South Africa and a breed of choice for smallholder farmers, mainly because of characteristics such as disease resistance and adaptability to tropical agroecological environments. Despite these desirable traits, the genomic architecture of this breed has not been explored. In this study, we report a high-quality genome assembly of the South African Kolbroek pig, sequenced at 31 X coverage through a combination of PacBio Sequel IIe HiFi and Illumina NovaSeq 6000 Omni-C sequencing. The assembled genome is 2.6 Gb in length and comprises 83 scaffolds, including 19 chromosome-size scaffolds, with a scaffold N50 of 138.7 Mb. BUSCO completeness was 95.5%. Genome annotation and structure prediction identified 22,025 genes with protein-coding potential. The genome provides an opportunity to investigate genetic variation across multiple pig breeds and serves as a genetic resource to develop breeding programs for the conservation and improvement of the Kolbroek pig.
Subject terms: Genome, Agricultural genetics
Background and Summary
South Africa is a net importer of pork, importing mainly from other Southern African countries. Commercial breeders dominate the market with a total of 1,450,713 pigs compared to 893,262 for smallholder breeders (BFAP, 2020; DALRRD, 2021). Non-commercial farmers mainly use indigenous breeds, which are characterised by hardiness and tolerance to harsh local environmental conditions1. The Kolbroek is one of the indigenous pig breeds commonly used in South Africa (Fig. 1) and is classified under Sus scrofa domesticus (historically Sus indica)2. This breed is generally used in smallholder farming systems, where it is considered easy to maintain because it requires low inputs. As with most indigenous breeds, it is suited to the local environment and climate. The Kolbroek pig is characterised by small body size and small litters, and exotic breeds are therefore preferred in the commercial sector. As a result, there have been limited efforts to investigate the genomic architecture of the Kolbroek pig. Previous studies have demonstrated significant genetic diversity among communal indigenous pig populations found in South Africa and Zimbabwe, with higher levels of heterozygosity (0.61–0.75) than commercial breeds (0.35–0.6)1. To demonstrate some level of distinctness and conserved genetic structure of the Kolbroek, Hlongwane et al.3 reported levels of heterozygosity (He = 0.339) and fixation index (Fst = 0.468) between Warthog and Kolbroek. However, the authors further show that South African indigenous breeds such as the Kolbroek are more inbred and would therefore benefit from better resources to enable their conservation.
Fig. 1.
A picture of the Kolbroek Pig.
Economically important traits in pig breeding include fertility, growth, and meat quality. Exotic breeds tend to be preferred over indigenous breeds for commercial pig production purposes4. Regions under natural selection have been identified in the Kolbroek genome, for example on chromosome 5, which harbours genes associated with motor control, temperature control, control of inflammation, and cell growth4. Identification of other useful regions could contribute towards the improvement of these traits. Among other valuable traits, Kolbroek pigs have a proven ability to utilise fibrous feeds and tannin-rich red sorghum. They have high parasite tolerance and are suitable for sustainable and organic farmers seeking low input requirements1,5–7. Research focused on improving traits of economic importance at the genomic level in indigenous pig breeds will be crucial for improving the Kolbroek as a viable commercial option.
Attempts are usually made to harness the desirable qualities of both exotic and indigenous breeds through crossbreeding8. This practice unfortunately leads to the dilution of indigenous breeds and, if not monitored, can eventually lead to the extinction of the breed. To guard against this, there is a need to generate high-quality reference genomes that can be used to develop conservation programs and breeding objectives to improve indigenous breeds. Having a high-quality assembly will enable further investigation of traits such as disease resistance, heat tolerance, adaptability, and the ability to utilize poor-quality forage, as well as facilitate accurate genomic selection. The high-quality genome will assist in accurately identifying quantitative trait loci that are associated with economically important traits and assist in inferring sites of genetic variation4.
Obtaining reference genome assemblies allows geographically representative samples to be incorporated in pangenome graphs, which improves sequence mapping and thus enables single-nucleotide polymorphism (SNP) arrays that are better suited to a particular region. Previous studies have used Illumina Porcine SNP60K and the improved SNP80K, which were developed using commercial breeds and are therefore prone to ascertainment bias when used in indigenous pig breeds3. Currently, a major challenge is that genotyping services are still largely unaffordable for farmers who rely on the indigenous breeds that are resilient due to their resistance to local diseases and adaptation to local environmental conditions4. The currently available reference genome of Duroc (Sscrofa11.1) is not representative of South African indigenous pig breeds due to selection pressure and population structure9.
In this study, we generated a de novo assembly of a Kolbroek pig, a breed that is valued in South Africa by non-commercial farmers. This was done using a combination of 89 Gb of high-fidelity (HiFi) sequence reads, sequenced at 31 X coverage, and 85 Gb of proximity-ligated chromatin (Omni-C) sequence. A workflow developed and curated by the Vertebrate Genome Project, implemented on the Galaxy Bioinformatics platform (Europe), was used. We used a female animal to generate an assembly with a genome size of 2.6 Gb and a contig N50 of 48.5 Mb. After scaffolding, the assembly was mapped to the 18 autosomes and the X chromosome. The assembly has 18 telomeric regions identified on the chromosomal scaffolds. Repeat elements made up 38.07% of the assembly. The number of protein-coding genes was 22,025. This genome is the first of its kind for the Kolbroek breed and will serve as a resource for the production of specialised SNP panels and comparative genomic analysis against other breeds. This resource will help to identify genetic variants causing variation in traits of economic importance, which can improve the breed’s value to commercial and non-commercial farmers.
Methods
Ethics statement
The procedures for animal handling, sample collection, and all research-related activities were approved by the University of South Africa, Animal Research Ethics Committee (AREC-100818-024), and the Department of Agriculture, Land Reform, and Rural Development (DALRRD) under section 20 of the Animal Diseases Act 1984 (Act 35 of 84) (12/11/1/1/23 (6508 AC)).
Sample collection and sequencing
A mature pure-breed sow was selected from a Kolbroek pig stud farmer located in the North West Province of South Africa. Blood was collected by a veterinarian in an EDTA vacutainer tube, which was immediately placed on dry ice and transported to the laboratory, where it was stored at −80 °C until processing. Genomic DNA was extracted from 200 µL of whole blood using the Nanobind protocol for high molecular weight (HMW) DNA extraction. The HiFi sequencing library was prepared using the SMRTbell® prep kit 3.0 (Pacific Biosciences) following the manufacturer’s instructions and sequenced on the Pacific Biosciences (PacBio) Sequel IIe. The Omni-C library was prepared from the same sample using the Dovetail Omni-C Kit (Dovetail Genomics), following the manufacturer’s instructions. The resulting Omni-C library was sequenced on the NovaSeq 6000 instrument (Illumina). Initial sequence quality control was performed with SMRT Link software v11.0 (Pacific Biosciences) as well as FastQC v0.12.110.
Genome assembly
The genome assembly was carried out through a Vertebrate Genome Project workflow11 on the Galaxy platform11. These workflows involve a series of steps that incorporate a selection of bioinformatic tools listed in Supplementary Table 1. The first key step is to characterise the genome using GenomeScope2 v2.0.1 + galaxy012. In this step, HiFi reads13 are used to estimate factors such as genome size, heterozygosity, and homozygosity. This step is followed by genome assembly using Hifiasm v0.19.9 + galaxy014. Hifiasm was run in Hi-C mode to generate two haplotypes, as the Kolbroek pig is diploid. These two haplotypes possess structural variants from the paternal and maternal lines. Both assemblies underwent scaffolding with Omni-C reads13, using the aligner BWA-MEM2 v2.2.1 + galaxy115 and the scaffolder YaHS v1.2a.216. Traces of mitochondrial DNA and other foreign sequences, which may have been incorporated in the assembly, were removed through assembly decontamination using Kraken 2 v2.1.3 + galaxy117. The best phased assembly was retained for further processing. This was followed by manual curation using the PretextView software18. K-mer analysis estimated the genome size to be 2.6 Gbp (Fig. 2). The scaffold N50 was 138.7 Mb, and the contig N50 was 48.5 Mb. Both the copy-number spectrum (CN) plot and the assembly spectrum (ASM) plot in Supplementary Fig. 1 display a bimodal distribution with the two peaks at ~15 and ~31-fold coverage. Each of the assembled haplotypes comprised 83 scaffolds. A graphical representation of the assembly statistics for the primary assembly is presented in Supplementary Fig. 2.
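The contiguity metrics quoted above (N50 and the matching L50 reported in Table 1) can be illustrated with a minimal sketch. This is not the Gfastats implementation used in the workflow; the function name `n50_l50` and the toy lengths are assumptions chosen only for illustration.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy lengths in Mb (hypothetical, not the Kolbroek scaffolds).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50, toy_l50 = n50_l50(toy_lengths)
print(toy_n50, toy_l50)
```

For the toy input, the two longest sequences (100 + 80) already exceed half of the 300 Mb total, so the N50 is 80 and the L50 is 2.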
Fig. 2.

The GenomeScope profile generated from HiFi reads. The figure reports the estimated genome size (len), the heterozygosity (ab) and homozygosity (aa), the proportion of duplicates (dup), the user-specified k-mer size (k), the proportion of reads that contained errors (err), and the ploidy (p).
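The idea behind k-mer based genome size estimation can be sketched as follows. GenomeScope fits a statistical model to the k-mer histogram; the snippet below is a deliberately simplified stand-in that divides the total number of non-error k-mer occurrences by the depth of the homozygous peak. The function name, the `min_depth` error cutoff, and the toy histogram are all assumptions for illustration and are not taken from the Kolbroek data.

```python
def estimate_haploid_size(histogram, hom_peak_depth, min_depth=5):
    """Rough haploid genome size estimate from a k-mer histogram.

    histogram      : dict mapping k-mer depth -> number of distinct k-mers
    hom_peak_depth : depth of the homozygous (two-copy) peak in a diploid
    min_depth      : depths below this are treated as sequencing errors
                     (an assumed cutoff, normally chosen from the plot)
    """
    if hom_peak_depth <= 0:
        raise ValueError("hom_peak_depth must be positive")
    total_kmer_occurrences = sum(
        depth * count
        for depth, count in histogram.items()
        if depth >= min_depth
    )
    return total_kmer_occurrences / hom_peak_depth


# Toy histogram: error k-mers at depth 1, a heterozygous peak at depth 15,
# and a homozygous peak at depth 30 (hypothetical numbers).
toy_histogram = {1: 1000, 15: 20, 30: 100}
toy_size = estimate_haploid_size(toy_histogram, hom_peak_depth=30)
print(toy_size)
```

In the toy case, the 100 homozygous k-mers plus 20 heterozygous k-mers (10 per haplotype) give an estimate of 110 k-mer positions per haploid genome.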
Genome annotation
No transcriptome data were available to support annotation, and thus an ab initio approach was used. The genome assembly was soft-masked using RepeatMasker v4.1.5 + galaxy019 to mask the repetitive regions in the genome. The repeat elements were then characterised with the modelling software RepeatModeler v0.0920, which runs RepeatScout v1.0.621, RECON v1.5.021, and TRF v4.0922 to identify repeat elements; these were then annotated using the one_code_to_find_them_all Perl script23. The masked assembly was then used for de novo gene annotation using Tiberius v1.1.424, a deep-learning ab initio gene structure prediction tool. Structural annotations were determined by filtering the GTF files25 generated by Tiberius and counting the number of predicted elements.
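Counting predicted elements in a GTF file, as described above, can be sketched in a few lines. This is an illustrative sketch and not the authors' script; the function name `count_gtf_features` and the toy records (including their feature names and attributes) are assumptions, and the real annotation files may contain different feature types.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count GTF records per feature type (third tab-separated column)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GTF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Toy records (hypothetical, not taken from the Kolbroek annotation).
toy_gtf = [
    "# toy example",
    'scaffold_1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "g1";',
    'scaffold_1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";',
    'scaffold_1\ttoy\tCDS\t100\t300\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
    'scaffold_1\ttoy\tCDS\t600\t900\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
]
toy_counts = count_gtf_features(toy_gtf)
print(dict(toy_counts))
```

For a file on disk, the same function can be applied to an open file handle, since it only iterates over lines.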
Data Records
The completed genome assembly and raw data for the Kolbroek pig were submitted to the National Center for Biotechnology Information (NCBI). The GenBank assembly accession is GCA_055447695.1, available under BioProject PRJNA1227266. Raw reads are available as HiFi reads (SRR32967040) and Omni-C reads (SRR32967041), and the annotation results are available at Figshare (10.6084/m9.figshare.28754990).
Technical Validation
The genome assembly was evaluated for completeness, contiguity, and correctness, and was compared with related assemblies. Initially, the sequences were quality-checked using FastQC v0.12.110. The assemblies were screened using Merqury, which evaluates a de novo assembly by comparing its k-mers with those of the reads to assess base-level accuracy26. Merqury also generates a database file, which is used as an input for genome size estimation through GenomeScope2 v2.0.1 + galaxy012. The Merqury CN plots were generated to check the completeness of the assembly by counting the k-mers. At each stage of the assembly (generating contigs, purging duplicates, and generating scaffolds), BUSCO completeness was scored using BUSCO v5.8.0 + galaxy127. Here, we compared the assembly with the Cetartiodactyla lineage, which is included in the BUSCO v5 lineage datasets (Fig. 3). In addition, Gfastats v1.3.11 + galaxy028 was used to generate statistics at each point of the assembly process, and the final assembly quality statistics were visualized using a snail plot generated through BlobToolKit v4.0.7 + galaxy229 (Supplementary Fig. 2). The assembly statistics of the Kolbroek genome compared with other Sus scrofa genomes are provided in Table 1. The genome size and synteny of the Kolbroek are comparable to the NCBI reference genome, Sscrofa11.130. Collinearity was assessed against the reference (Fig. 4). We identified structural variants that are supported by chromatin conformation capture data but will need to be confirmed through further analysis. In terms of telomeric regions, 22 were identified (Supplementary Table 2) using seqtk v1.5 + galaxy031. Of these, 18 regions are part of the chromosomal scaffolds, and 5 were in the unassembled portion. Repeat elements were masked and characterised (Table 2). Genes were predicted through annotation, and a total of 22,025 genes with a combined coding DNA sequence (CDS) length of 34.23 Mb were identified.
CDS length ranged from 201 base pairs to 112,653 base pairs, with an average of 1,554.3 base pairs. The number of RNA elements, such as small RNA, was 1,235,085. The predicted proteins ranged from 66 to 37,550 amino acids in length, with an average of 517.1 amino acids; in total, the protein sequences translated from these CDS comprised 11,388,922 amino acids. Structural features extracted from the annotation file are presented and compared to the reference genome in Table 3. The BUSCO analysis of the predicted gene sequences indicated a high-quality annotation with 91.5% completeness, of which 91.0% were complete and single-copy and 0.5% were complete and duplicated. By comparison, the reference genome has 98.5% complete (single-copy and duplicated) BUSCOs.
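The relationship between the reported CDS and protein lengths can be checked with a short sketch. It assumes that each CDS length includes the stop codon, so that protein length equals CDS length divided by three, minus one; this assumption is not stated in the text but is consistent with the reported extremes. The function name is hypothetical.

```python
def protein_length_from_cds(cds_bp):
    """Protein length in amino acids for a CDS that includes its stop codon."""
    if cds_bp < 6 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3 and at least 6 bp")
    # One codon per amino acid, minus the terminal stop codon.
    return cds_bp // 3 - 1


shortest_protein = protein_length_from_cds(201)
longest_protein = protein_length_from_cds(112_653)

# Mean protein length from the reported totals (amino acids / genes).
mean_protein = round(11_388_922 / 22_025, 1)

print(shortest_protein, longest_protein, mean_protein)
```

Under that assumption, the shortest and longest CDS correspond to proteins of 66 and 37,550 amino acids, and the reported totals reproduce the mean of 517.1 amino acids.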
Fig. 3.

The BUSCO analysis of the primary assembly at the scaffold level. Dark and light blue represent complete single-copy and complete duplicated genes, respectively. Yellow represents fragmented genes, and red represents missing genes. The assembly was compared to the Cetartiodactyla lineage included in the BUSCO v5 lineage datasets.
Table 1.
Benchmarking the assembly statistics of the Kolbroek assembly against three public assemblies.
| Statistics | Reference Sscrofa11.130 | Chenghua33,34 | Ningxiang35,36 | Kolbroek (Current)32 |
|---|---|---|---|---|
| Genome size (Gb) | 2.5 | 2.7 | 2.4 | 2.6 |
| Total ungapped length (Gb) | 2.5 | 2.7 | 2.4 | 2.5 |
| Gaps between scaffolds | 93 | nd | nd | 77 |
| Number of chromosomes | 20 | 20 | 19* | 19* |
| Number of scaffolds | 705 | 98 | 120 | 77 |
| Scaffold N50 (Mb) | 88.2 | 141.8 | 139 | 138.65 |
| Scaffold L50 | 9 | 8 | 7 | 8 |
| Number of contigs | 1117 | 274 | 421 | 154 |
| Contig N50 (Mb) | 48.2 | 84.7 | 26.1 | 48.5 |
| Contig L50 | 15 | 11 | 29 | 17 |
| GC percent | 42 | 42.5 | 42 | 42 |
| Genome coverage | 65.0 X | 43.0 X | 42.7 X | 31.0 X |
nd – no data.
*The Y chromosome was not available since the sample was obtained from a female.
Fig. 4.
Collinearity plot, generated in D-GENIES37, showing the scaffolds of the Kolbroek assembly that match the reference genome, Sscrofa11.1.
Table 2.
Summary of repeat elements identified in the genome of the Kolbroek pig.
| Repeat element | Number of elements* | Length occupied (Mbp) | Percentage of sequence (%) |
|---|---|---|---|
| Retroelements | 2,681,508 | 822.9 | 32.43 |
| SINEs: | 1,387,359 | 333.6 | 13.15 |
| LINEs: | 934,163 | 408.2 | 16.09 |
| CRE/SLACS | 0 | 0.0 | 0.00 |
| L2/CR1/Rex | 103,292 | 15.3 | 0.60 |
| RTE/Bov-B | 1378 | 0.2 | 0.01 |
| L1/CIN4 | 829,493 | 392.8 | 15.48 |
| LTR elements: | 359,986 | 81.1 | 3.19 |
| Gypsy/DIRS1 | 0 | 0.0 | 0.00 |
| Retroviral | 343,932 | 79.3 | 3.13 |
| DNA transposons | 207,140 | 33.3 | 1.31 |
| hobo-Activator | 131,861 | 20.6 | 0.81 |
| Tc1-IS630-Pogo | 74,686 | 12.5 | 0.49 |
| MULE-MuDR | 340 | 0.1 | 0.00 |
| PiggyBac | 253 | 0.1 | 0.00 |
| Unclassified: | 34,114 | 69.9 | 2.75 |
| Total interspersed repeats: | | 926.1 | 36.49 |
| Small RNA: | 1,235,085 | 313.9 | 12.37 |
| Satellites: | 214 | 0.6 | 0.02 |
| Simple repeats: | 677,017 | 34.2 | 1.35 |
*Most repeats fragmented by insertions or deletions have been counted as one element. Runs of >=20 X/Ns in query were excluded in % calculations.
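As a consistency check on Table 2, the total for interspersed repeats should equal the sum of the three top-level classes (retroelements, DNA transposons, and unclassified elements). The sketch below uses only the values printed in the table; the variable names are illustrative.

```python
# Top-level interspersed repeat classes from Table 2.
length_mbp = {
    "Retroelements": 822.9,
    "DNA transposons": 33.3,
    "Unclassified": 69.9,
}
percent_of_sequence = {
    "Retroelements": 32.43,
    "DNA transposons": 1.31,
    "Unclassified": 2.75,
}

# Round to the precision used in the table to avoid floating-point noise.
total_mbp = round(sum(length_mbp.values()), 1)
total_pct = round(sum(percent_of_sequence.values()), 2)

print(total_mbp, total_pct)
```

Both sums match the "Total interspersed repeats" row (926.1 Mbp and 36.49%).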
Table 3.
Structural annotation features of the Kolbroek (KOLB) assembly compared with the reference genome.
| Metric | KOLB (Current)32 | Ref Sscrofa11.130 |
|---|---|---|
| Number of genes | 22,025 | 27,304 |
| Mean gene length (bp) | 31,404.9 | 50,892.6 |
| Number of exons per gene | 8.69371 | 35.5897 |
| Mean exon length (bp) | 178.781 | 331.245 |
| Total genome length (Mb) | 2,109.31 | 13,260.3 |
| Genes with introns (%) | 81.91 | 0 |
| Mean intron length (bp) | 3,879.84 | 0 |
| Total intron length (bp) | 657.459 | 0 |
| Average transcript length (bp) | 31,404.9 | 72,301 |
Acknowledgements
We acknowledge funding and support from the University of South Africa. In addition, we acknowledge the Vertebrate Genome Project for their support when executing the pipeline. We would also like to thank the staff at Inqaba Biotech for completing the library preparation of the collected samples as well as the HiFi sequencing. We would like to thank the staff at the University of Stellenbosch, where the Omni-C data were produced. We would like to acknowledge Prof Jasper Rees, Dr Sikhumbuzo Mbizeni, and Dr Thivhilaheli Richard Netshirovha for their assistance during sample collection and analyses. We would also like to thank Prof Cuthbert Banga for his assistance with critical reading. We would like to acknowledge Galaxy Europe for hosting the data and supplying computing resources. This article forms part of the objectives for the Africa Biogenome Project. A special thanks to the team in Mapholi Labs for the resources and for managing the project.
Author contributions
Conceptualization: R.M.S., L.T.N., N.O.M., S.M., A.D.; Data Curation: R.M.S., A.H.M.; Formal Analysis: R.M.S., L.T.N., T.T., S.M., A.H.M.; Funding Acquisition: N.O.M., T.M.; Investigation: R.M.S.; Methodology: R.M.S., L.T.N., S.M., A.H.; Project Administration: N.O.M., T.M.; Resources: L.T.N., S.M.; Software: R.M.S., A.H.M., T.T.; Validation: A.H.M., R.M.S., L.T.N., S.M., T.T.; Visualisation: R.M.S.; Writing of the original draft: R.M.S., A.H.M.; Reviewing and editing of the manuscript: All authors.
Code availability
Genetic analysis was performed on the Galaxy Europe platform (https://usegalaxy.eu) and the workflow that was used is on the Vertebrate Genome Project (VGP) pipeline (https://galaxyproject.org/projects/vgp/workflows/). The tools and their versions associated with the VGP assembly pipeline are listed in Supplementary Table 1. Moreover, additional analyses that were performed are also listed. This pipeline is under development; thus, the versions that were used are specified. Where incompatibility issues were encountered, alternate versions were used, which provided options for the required input data. The VGP group updates its pipelines to incorporate newer versions of software or resolve dependency issues.
Data availability
The BioProject is PRJNA1227266. The SRA data may be found via SRP57632513 and the GenBank accession for Sus scrofa is GCA_055447695.132 / JBLUWV000000000; BioSample SAMN46977218. The annotation files are available at Figshare25.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Rae Marvin Smith, Email: smithrm@unisa.ac.za.
Ntanganedzeni Olivia Mapholi, Email: maphon@unisa.ac.za.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-026-07002-y.
References
- 1. Halimani, T. E., Dzama, K., Chimonyo, M. & Muchadeyi, F. C. Some insights into the phenotypic and genetic diversity of indigenous pigs in southern Africa. South African Journal of Animal Science 42, 507–510 (2012).
- 2. Nicholas, G. Kolbroek – the unique local breed. Farmer’s Weekly (1999).
- 3. Hlongwane, N. L., Hadebe, K., Soma, P., Dzomba, E. F. & Muchadeyi, F. C. Genome Wide Assessment of Genetic Variation and Population Distinctiveness of the Pig Family in South Africa. Frontiers in genetics 11, 344 (2020).
- 4. Hlongwane, N. L. et al. Identification of Signatures of Positive Selection That Have Shaped the Genomic Landscape of South African Pig Populations. Animals (Basel) 14, 236 (2024).
- 5. Ramsay, K. R. Sustainable housing: Indicators and implications (2002).
- 6. Chimonyo, M. & Dzama, K. Estimation of genetic parameters for growth performance and carcass traits in Mukota pigs. Animal (Cambridge, England) 1, 317–323 (2007).
- 7. Halimani, T. E., Muchadeyi, F. C., Chimonyo, M. & Dzama, K. Pig genetic resource conservation: The Southern African perspective. Ecological economics 69, 944–951 (2010).
- 8. Mathobela, R. M., Molotsi, A. H., Marufu, M. C., Strydom, P. E. & Mapiye, C. Transitioning opportunities for sub-Saharan Africa’s small-scale urban pig farming towards a sustainable circular bioeconomy. International journal of agricultural sustainability 22 (2024).
- 9. Wy, S. et al. Chromosome-level genome assembly of the Korean minipig (Sus scrofa). Sci Data 11, 840–8 (2024).
- 10. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data.
- 11. Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature biotechnology 42, 367–370 (2024).
- 12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432–1432 (2020).
- 13. Smith, R. M. et al. Kolbroek HIFI and Omni-C. https://identifiers.org/ncbi/insdc.sra:SRP576325 (2025).
- 14. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
- 15. Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems (IEEE, May 2019).
- 16. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 (2023).
- 17. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biology 20, 1 (2019).
- 18. Harry, E. PretextView (Paired REad TEXTure Viewer): A desktop application for viewing pretext contact maps.
- 19. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, 351 (2005).
- 20. Brown, T. et al. Genome Annotation and Other Post-Assembly Workflows for the Tree of Life.
- 21. Bao, Z. & Eddy, S. R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Research 12, 1269–1276 (2002).
- 22. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
- 23. Bailly-Bechet, M., Haudry, A. & Lerat, E. “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files. Mobile DNA 5, 13–13 (2014).
- 24. Gabriel, L., Becker, F., Hoff, K. J. & Stanke, M. Tiberius: end-to-end deep learning with an HMM for gene prediction. Bioinformatics (Oxford, England) 40 (2024).
- 25. Kolbroek Annotation Files, 10.6084/m9.figshare.28754990 (2025).
- 26. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 1–27 (2020).
- 27. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
- 28. Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).
- 29. Challis, R., Richards, E., Rajan, J., Cochrane, G. & Blaxter, M. BlobToolKit – Interactive Quality Assessment of Genome Assemblies. G3: genes - genomes - genetics 10, 1361–1374 (2020).
- 30. Sscrofa11.1: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_000003025.6 (2017).
- 31. OHalloran, D. M. fastQ_brew: module for analysis, preprocessing, and reformatting of FASTQ sequence data. BMC Res Notes 10, 275–4 (2017).
- 32. Smith, R. M. et al. Kolbroek Assembly. GenBank http://identifiers.org/insdc.gca:GCA_055447695.1 (2026).
- 33. Wang, Y. et al. A chromosome-level genome of Chenghua pig provides new insights into the domestication and local adaptation of pigs. Int. J. Biol. Macromol. 270, 131796 (2024).
- 34. Chenghua: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_037447515.1 (2024).
- 35. Ma, H. et al. Long‐read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Molecular ecology resources 22, 1508–1520 (2022).
- 36. Ningxiang: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_020567905.1 (2021).
- 37. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958–e4958 (2018).


