A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus

Rae Marvin Smith; Annelin Henriehetta Molotsi; Lucky Tendani Nesengani; Thendo Stanley Tshilate; Sinebongo Mdyogolo; Nompilo Lucia Hlongwane; Tracy Madimabi Masebe; Appolinaire Djikeng; Ntanganedzeni Olivia Mapholi

doi:10.1038/s41597-026-07002-y

2026 Mar 9;13:635. doi: 10.1038/s41597-026-07002-y

Rae Marvin Smith ¹,✉, Annelin Henriehetta Molotsi ², Lucky Tendani Nesengani ², Thendo Stanley Tshilate ², Sinebongo Mdyogolo ¹, Nompilo Lucia Hlongwane ², Tracy Madimabi Masebe ¹, Appolinaire Djikeng ²,³, Ntanganedzeni Olivia Mapholi ¹,²,✉

PMCID: PMC13100151 PMID: 41803164

Abstract

The Kolbroek pig is indigenous to South Africa and a breed of choice for smallholder farmers. This is mainly due to characteristics such as disease resistance and adaptability to tropical agroecological environments. Despite these desirable traits, the genomic architecture of this breed has not been explored. In this study, we report a high-quality genome assembly of the South African Kolbroek pig, sequenced at 31 X coverage through a combination of PacBio Sequel IIe HiFi and Illumina NovaSeq 6000 Omni-C sequencing. The assembled genome is 2.6 Gb in length and comprises 83 scaffolds, including 19 chromosome-size scaffolds, with a scaffold N50 of 138.7 Mb. BUSCO completeness was 95.5%. Genome annotation and structure prediction identified 22,025 genes with protein-coding potential. The genome provides an opportunity to investigate genetic variation across multiple pig breeds and serves as a genetic resource to develop breeding programs for the conservation and improvement of the Kolbroek pig.

Subject terms: Genome, Agricultural genetics

Background and Summary

South Africa is a net importer of pork, importing mainly from other Southern African countries. Commercial breeders dominate the market with a total of 1,450,713 pigs compared to 893,262 for smallholder breeders (BFAP, 2020; DALRRD, 2021). Non-commercial farmers mainly use indigenous breeds, which are characterised by hardiness and tolerance to harsh local environmental conditions¹. The Kolbroek is one of the indigenous pig breeds commonly used in South Africa (Fig. 1), classified under Sus scrofa domesticus (historically Sus indica)². This breed is generally used in smallholder farming systems, where it is considered easy to maintain because it requires low inputs. As with most indigenous breeds, it is suited to the local environment and climate. The Kolbroek pig is characterised by small body size and small litters, and exotic breeds are therefore preferred in the commercial sector. As a result, there have been limited efforts to investigate the genomic architecture of the Kolbroek pig. Previous studies have demonstrated significant genetic diversity among communal indigenous pig populations in South Africa and Zimbabwe, with higher levels of heterozygosity (0.61–0.75) than commercial breeds (0.35–0.6)¹. To demonstrate some level of distinctness and conserved genetic structure of the Kolbroek, Hlongwane et al.³ reported levels of heterozygosity (He = 0.339) and a fixation index (Fst = 0.468) between Warthog and Kolbroek. However, the authors further show that South African indigenous breeds such as the Kolbroek are more inbred and would therefore benefit from better resources to enable their conservation.

Economically important traits in pig breeding include fertility, growth, and meat quality. Exotic breeds tend to be preferred over indigenous breeds for commercial pig production⁴. Regions under natural selection in the Kolbroek genome have been identified, such as on chromosome 5, where there are genes associated with motor control, temperature control, control of inflammation, and cell growth⁴. Identification of other useful regions could contribute towards the improvement of these traits. Among other valuable traits, Kolbroek pigs have a proven ability to utilise fibrous feeds and tannin-rich red sorghum. They have high parasite tolerance and are suitable for sustainable and organic farmers seeking low input requirements¹,⁵⁻⁷. Research focused on improving traits of economic importance at the genomic level in indigenous pig breeds will be crucial for improving the Kolbroek as a viable commercial option.

Attempts are usually made to harness the desirable qualities of both exotic and indigenous breeds through crossbreeding⁸. This practice unfortunately leads to the dilution of indigenous breeds and, if not monitored, can eventually lead to the extinction of the breed. To guard against this, there is a need to generate high-quality reference genomes that can be used to develop conservation programs and breeding objectives to improve indigenous breeds. Having a high-quality assembly will enable further investigation of traits such as disease resistance, heat tolerance, adaptability, and the ability to utilize poor-quality forage, as well as facilitate accurate genomic selection. The high-quality genome will assist in accurately identifying quantitative trait loci that are associated with economically important traits and assist in inferring sites of genetic variation⁴.

Obtaining reference genome assemblies allows geographically representative samples to be incorporated in pangenome graphs, which improves sequence mapping and thus enables single-nucleotide polymorphism (SNP) arrays that are better suited to a particular region. Previous studies have used Illumina Porcine SNP60K and the improved SNP80K, which were developed using commercial breeds and are therefore prone to ascertainment bias when used in indigenous pig breeds³. Currently, a major challenge is that genotyping services are still largely unaffordable for farmers who rely on the indigenous breeds that are resilient due to their resistance to local diseases and adaptation to local environmental conditions⁴. The currently available reference genome of Duroc (Sscrofa11.1) is not representative of South African indigenous pig breeds due to selection pressure and population structure⁹.

In this study, we generated a de novo genome assembly of a Kolbroek pig, a breed that is valued in South Africa by non-commercial farmers. This was done using a combination of 89 Gb of high-fidelity (HiFi) sequence reads, sequenced at 31 X coverage, and 85 Gb of proximity-ligated chromatin (Omni-C) sequence. A workflow developed and curated by the Vertebrate Genome Project, implemented on the Galaxy bioinformatics platform (Europe), was used. We used a female animal to generate an assembly with a genome size of 2.6 Gb and a contig N50 of 48.5 Mb. After scaffolding, the assembly was mapped to the 18 autosomes and the X chromosome. The assembly has 18 telomeric regions identified on the chromosomal scaffolds. Repeat elements made up 38.07% of the assembly. The number of protein-coding genes was 22,025. This genome is the first of its kind for the Kolbroek breed and will serve as a resource for the production of specialised SNP panels and comparative genomic analysis against other breeds. This resource will help to identify genetic variants causing variation in traits of economic importance, which can improve the breed’s value to commercial and non-commercial farmers.

Methods

Ethics statement

The procedures for animal handling, sample collection, and all research-related activities were approved by the University of South Africa, Animal Research Ethics Committee (AREC-100818-024), and the Department of Agriculture, Land Reform, and Rural Development (DALRRD) under section 20 of the Animal Diseases Act 1984 (Act 35 of 84) (12/11/1/1/23 (6508 AC)).

Sample collection and sequencing

A mature pure-breed sow was selected from a Kolbroek pig stud farmer located in the Northwest Province of South Africa. Blood was collected by a veterinarian using an EDTA vacutainer vial, which was immediately placed on dry ice and transported to the laboratory to be stored at −80 °C until processing. Genomic DNA was extracted from 200 µL of whole blood using the Nanobind protocol for high molecular weight (HMW) DNA extraction. The HiFi sequencing library was prepared using the SMRTbell® prep kit 3.0 (Pacific Biosciences) following the manufacturer’s instructions, and the library was sequenced on the Pacific Biosciences (PacBio) Sequel IIe. The Omni-C library was prepared from the same sample using the Dovetail Omni-C Kit (Dovetail Genomics), following the manufacturer’s instructions. The resulting Omni-C library was sequenced on the NovaSeq 6000 instrument (Illumina). Initial sequence quality control was performed with SMRT Link software v11.0 (Pacific Biosciences) as well as FastQC v0.12.1¹⁰.

Genome assembly

The genome assembly was carried out through a Vertebrate Genome Project workflow¹¹ on the Galaxy platform¹¹. These workflows involve a series of steps that incorporate a selection of bioinformatic tools listed in Supplementary Table 1. The first key step is to characterize the genome using GenomeScope2 v2.0.1 + galaxy0¹². In this step, HiFi reads¹³ are used to estimate factors such as genome size, heterozygosity, and homozygosity. This step is followed by genome assembly using Hifiasm v0.19.9 + galaxy0¹⁴. Hifiasm was run in Hi-C mode to generate two haplotypes, as the Kolbroek pig is diploid. These two haplotypes possess structural variants from the paternal and maternal lines. Both assemblies underwent scaffolding with Omni-C reads¹³, using the aligner BWA-MEM2 v2.2.1 + galaxy1¹⁵ and the scaffolder YaHS v1.2a.2¹⁶. Traces of mitochondrial DNA and other foreign sequences, which may have been incorporated in the assembly, were removed through assembly decontamination using Kraken 2 v2.1.3 + galaxy1¹⁷. The best phased assembly was retained for further processing. This was followed by manual curation using the PretextView software¹⁸. K-mer analysis estimated the genome size to be 2.6 Gbp (Fig. 2). The scaffold N50 was 138.7 Mb, and the contig N50 was 48.5 Mb. Both the clean spectra (CN) plot and the assembly spectrum (ASM) plot in Supplementary Fig. 1 display a bimodal distribution, with the two peaks at ~15 and ~31-fold coverage. Each of the assembled genomes comprised 83 scaffolds. A graphical representation of the assembly statistics for the primary assembly is presented in Supplementary Fig. 2.
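The scaffold and contig N50 values quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following minimal Python sketch illustrates that definition only; the function name and the toy lengths are hypothetical and are not taken from the Kolbroek assembly or from Gfastats.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running sum to half of the
        # total length defines N50; its rank is L50.
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy lengths in Mb (hypothetical values, not the Kolbroek scaffolds).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50, toy_l50 = n50_l50(toy_lengths)
print(f"N50 = {toy_n50} Mb, L50 = {toy_l50}")
```

With the toy input, the two longest sequences (100 and 80) already cover more than half of the 300 Mb total, so N50 is 80 and L50 is 2.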

Fig. 2 — The GenomeScope profile generated from HiFi reads. The figure describes the estimated genome size (len), the heterozygosity (ab) and homozygosity (aa), the number of duplicates (dup), the user-specified k-mer size (k), reads that contained errors (err), and the ploidy (p).

Genome annotation

A transcriptome was not available to support annotation, and thus an ab initio approach was used. The genome assembly was soft-masked using RepeatMasker v 4.1.5 + galaxy0¹⁹ to mask the repetitive regions in the genome. The repeat elements were then characterised using the modelling software RepeatModeler v0.09²⁰, through which RepeatScout v1.0.6²¹, RECON v1.5.0²¹, and TRF v4.09²² identified repeat elements that were then annotated using the one_code_to_find_them_all Perl script²³. The masked assembly was then used for de novo gene annotation using Tiberius v1.1.4²⁴, a deep-learning ab initio gene structure prediction tool. Structural annotations were determined by filtering the GTF files²⁵ generated by Tiberius and counting the number of predicted elements.
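Counting predicted elements in a GTF file can be sketched as below. This is a minimal illustration, not the script used in this study: it relies only on the standard GTF layout, in which the third tab-separated column holds the feature type, and the function name and the toy records (including their source and attribute fields) are hypothetical.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count GTF records per feature type (third tab-separated column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF record has nine tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical toy records, not taken from the Kolbroek annotation.
toy_gtf = [
    "# toy example\n",
    'scaffold_1\ttool\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'scaffold_1\ttool\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";\n',
    'scaffold_1\ttool\tCDS\t100\t300\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n',
    'scaffold_1\ttool\tCDS\t600\t900\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";\n',
]
toy_counts = count_gtf_features(toy_gtf)
print(dict(toy_counts))
```

For a real file, the same function can be applied to an open file handle, since it only iterates over lines.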

Data Records

The completed genome assembly and raw data for the Kolbroek pig were submitted to the National Center for Biotechnology Information (NCBI). The GenBank accession number is GCA_055447695.1, available under the BioProject PRJNA1227266. Raw reads are available as HiFi reads (SRR32967040) and Omni-C reads (SRR32967041), and the annotation results are available at Figshare (10.6084/m9.figshare.28754990).

Technical Validation

The genome assembly was evaluated for completeness, contiguity, and correctness, and was compared with related assemblies. Initially, the sequences were quality-checked using FastQC v0.12.1¹⁰. The assemblies were screened using Merqury, which compares k-mers between the reads and the de novo assembly to assess accuracy at the base level²⁶. Merqury also generates a database file, which is used as an input for genome size estimation through GenomeScope2 v2.0.1 + galaxy0¹²,²⁷. The Merqury CN plots were generated to check the completeness of the assembly by counting the k-mers. At each stage of the assembly (contig assembly, purging of duplicates, and scaffolding), BUSCO completeness was scored using BUSCO v5.8.0 + galaxy1²⁷. Here, we compared the assembly with the Cetartiodactyla lineage, which is included in the BUSCO v5 lineage datasets (Fig. 3). In addition, Gfastats v1.3.11 + galaxy0²⁸ was used to generate statistics at each point of the assembly process, and the final assembly quality statistics were visualized using a snail plot generated through BlobToolKit v 4.0.7 + galaxy2²⁹ (Supplementary Fig. 2). The assembly statistics of the Kolbroek genome as compared to other Sus scrofa genomes are provided in Table 1. The genome size and synteny of the Kolbroek are comparable to the NCBI reference genome, Sscrofa11.1³⁰. Collinearity was assessed against the reference (Fig. 4). We identified structural variants that are supported by chromatin conformation capture data but will need to be confirmed through further analysis. In terms of telomeric regions, 22 were identified (Supplementary Table 2) using seqtk v1.5 + galaxy0³¹. Of these, 18 regions are part of the chromosomal scaffolds, and 5 were in the unassembled portion. Repeat elements were masked and characterised (Table 2). Genes were predicted through annotation, and a total of 22,025 genes with a combined coding DNA sequence (CDS) length of 34.23 Mb were identified.
CDS length ranged from 201 base pairs to 112,653 base pairs, with an average of 1,554.3 base pairs. The number of RNA elements, such as small RNA, was 1,235,085. The predicted proteins ranged from 66 to 37,550 amino acids in length, with an average protein length of 517.1 amino acids; the protein sequences translated from these CDS totalled 11,388,922 amino acids. Structural features extracted from the annotation file are presented and compared to the reference genome in Table 3. The BUSCO analysis of the predicted gene sequences showed a high-quality annotated genome with 91.5% completeness, of which 91.0% were complete and single-copy and 0.5% were complete but duplicated. By comparison, the reference genome has 98.5% complete (single-copy and duplicated) BUSCOs.
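The reported CDS and protein lengths can be related by simple arithmetic. The sketch below assumes that each reported CDS length includes the stop codon, so that the protein length is the number of codons minus one; this assumption is ours and is inferred from the reported numbers, not stated by the annotation tool. The function name is hypothetical.

```python
def protein_length(cds_bp, includes_stop=True):
    """Convert a CDS length in base pairs to a protein length in amino acids."""
    codons, remainder = divmod(cds_bp, 3)
    if remainder:
        raise ValueError("CDS length is not a multiple of three")
    # Assumption: the stop codon is counted in the CDS but encodes no residue.
    return codons - 1 if includes_stop else codons


shortest_protein = protein_length(201)      # shortest reported CDS
longest_protein = protein_length(112_653)   # longest reported CDS
mean_protein = 1_554.3 / 3 - 1              # from the mean CDS length
total_cds_mb = 22_025 * 1_554.3 / 1e6       # gene count x mean CDS length

print(shortest_protein, longest_protein)
print(round(mean_protein, 1), round(total_cds_mb, 2))
```

Under this assumption the shortest and longest CDS give 66 and 37,550 amino acids, the mean CDS length gives a mean protein length of about 517.1 amino acids, and the gene count multiplied by the mean CDS length gives about 34.23 Mb, in line with the values reported in the text.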

Fig. 3 — The BUSCO analysis of the primary assembly at the scaffold level. Dark and light blue represent complete single-copy and duplicated genes, respectively. Yellow represents fragmented genes, and red represents missing genes. The assembly was compared to the Cetartiodactyla lineage included in the BUSCO v5 lineage datasets.

Table 1.

Benchmarking the assembly statistics of the Kolbroek assembly against three public assemblies.

Statistics	Reference Sscrofa11.1³⁰	Chenghua³³,³⁴	Ningxiang³⁵,³⁶	Kolbroek (Current)³²
Genome size (Gb)	2.5	2.7	2.4	2.6
Total ungapped length (Gb)	2.5	2.7	2.4	2.5
Gaps between scaffolds	93	nd	nd	77
Number of chromosomes	20	20	19*	19*
Number of scaffolds	705	98	120	77
Scaffold N50 (Mb)	88.2	141.8	139	138.65
Scaffold L50	9	8	7	8
Number of contigs	1117	274	421	154
Contig N50 (Mb)	48.2	84.7	26.1	48.5
Contig L50	15	11	29	17
GC percent	42	42.5	42	42
Genome coverage	65.0 X	43.0 X	42.7 X	31.0 X

nd – no data.

*The Y chromosome was not available since the sample was obtained from a female.

Fig. 4 — Collinearity plot between the reference genome, Sscrofa11.1, and the Kolbroek assembly, generated in D-GENIES³⁷.

Table 2.

Summary of repeat elements identified in the genome of the Kolbroek pig.

Repeat element	Number of elements*	Length occupied (Mbp)	Percentage of sequence (%)
Retroelements	2,681,508	822.9	32.43
SINEs:	1,387,359	333.6	13.15
LINEs:	934,163	408.2	16.09
CRE/SLACS	0	0.0	0.00
L2/CR1/Rex	103,292	15.3	0.60
RTE/Bov-B	1378	0.2	0.01
L1/CIN4	829,493	392.8	15.48
LTR elements:	359,986	81.1	3.19
Gypsy/DIRS1	0	0.0	0.00
Retroviral	343,932	79.3	3.13
DNA transposons	207,140	33.3	1.31
hobo-Activator	131,861	20.6	0.81
Tc1-IS630-Pogo	74,686	12.5	0.49
MULE-MuDR	340	0.1	0.00
PiggyBac	253	0.1	0.00
Unclassified:	34,114	69.9	2.75
Total interspersed repeats:		926.1	36.49
Small RNA:	1,235,085	313.9	12.37
Satellites:	214	0.6	0.02
Simple repeats:	677,017	34.2	1.35

*Most repeats fragmented by insertions or deletions have been counted as one element. Runs of >=20 X/Ns in query were excluded in % calculations.
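As a consistency check on Table 2, the class-level rows can be summed and compared with the reported totals. The sketch below assumes that SINEs, LINEs, and LTR elements are the components of the retroelement row, and that retroelements, DNA transposons, and unclassified repeats are the components of the total interspersed repeats; this reading of the table layout is our assumption. The values are copied from Table 2, and the variable names are hypothetical.

```python
import math

# (length in Mbp, percentage of sequence), copied from Table 2.
retro_parts = {"SINEs": (333.6, 13.15), "LINEs": (408.2, 16.09), "LTR elements": (81.1, 3.19)}
interspersed_parts = {
    "Retroelements": (822.9, 32.43),
    "DNA transposons": (33.3, 1.31),
    "Unclassified": (69.9, 2.75),
}


def column_sums(rows):
    """Sum the length and percentage columns of a group of table rows."""
    total_len = sum(length for length, _ in rows.values())
    total_pct = sum(pct for _, pct in rows.values())
    return total_len, total_pct


retro_len, retro_pct = column_sums(retro_parts)
inter_len, inter_pct = column_sums(interspersed_parts)

# Compare against the reported rows, allowing for one-decimal rounding.
print(math.isclose(retro_len, 822.9, abs_tol=0.05), math.isclose(retro_pct, 32.43, abs_tol=0.005))
print(math.isclose(inter_len, 926.1, abs_tol=0.05), math.isclose(inter_pct, 36.49, abs_tol=0.005))
```

Under the stated assumption, both groups of rows add up to the reported subtotal and total within rounding.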

Table 3.

Structural annotations observed in the KOLB assembly, compared to the reference genome.

Metric	KOLB (Current)³²	Ref Sscrofa11.1³⁰
Number of genes	22,025	27,304
Mean gene length (bp)	31,404.9	50,892.6
Number of exons per gene	8.69371	35.5897
Mean exon length (bp)	178.781	331.245
Total genome length (Mb)	2,109.31	13,260.3
Genes with introns (%)	81.91	0
Mean intron length (bp)	3,879.84	0
Total intron length (bp)	657.459	0
Average transcript length (bp)	31,404.9	72,301

Supplementary information

Supplementary information (2.3MB, docx)

Acknowledgements

We acknowledge funding and support from the University of South Africa. In addition, we acknowledge the Vertebrate Genome Project for their support when executing the pipeline. We would also like to thank the staff at Inqaba Biotech for completing the library preparation of the collected samples as well as the HiFi sequencing. We would like to thank the staff at the University of Stellenbosch, where the Omni-C data was produced. We would like to acknowledge Prof Jasper Rees, Dr Sikhumbuzo Mbizeni, and Dr Thivhilaheli Richard Netshirovha for their assistance during sample collection and analyses. We would also like to thank Prof Cuthbert Banga for his assistance with critical reading. We would like to acknowledge Galaxy Europe for hosting the data and supplying computing resources. This article forms part of the objectives for the Africa Biogenome Project. A special thanks to the team in Mapholi Labs for the resources and for managing the project.

Author contributions

Conceptualization: R.M.S., L.T.N., N.O.M., S.M., A.D.; Data Curation: R.M.S., A.H.M.; Formal Analysis: R.M.S., L.T.N., T.T., S.M., A.H.M.; Funding Acquisition: N.O.M., T.M.; Investigation: R.M.S.; Methodology: R.M.S., L.T.N., S.M., A.H.; Project Administration: N.O.M., T.M.; Resources: L.T.N., S.M.; Software: R.M.S., A.H.M., T.T.; Validation: A.H.M., R.M.S., L.T.N., S.M., T.T.; Visualisation: R.M.S.; Writing of the original draft: R.M.S., A.H.M.; Reviewing and editing of the manuscript: All authors.

Code availability

Genetic analysis was performed on the Galaxy Europe platform (https://usegalaxy.eu) and the workflow that was used is on the Vertebrate Genome Project (VGP) pipeline (https://galaxyproject.org/projects/vgp/workflows/). The tools and their versions associated with the VGP assembly pipeline are listed in Supplementary Table 1. Moreover, additional analyses that were performed are also listed. This pipeline is under development; thus, the versions that were used are specified. Where incompatibility issues were encountered, alternate versions were used, which provided options for the required input data. The VGP group updates its pipelines to incorporate newer versions of software or resolve dependency issues.

Data availability

The BioProject is PRJNA1227266. The SRA data may be found via SRP576325¹³ and the GenBank accession for Sus scrofa is GCA_055447695.1³² / JBLUWV000000000; BioSample SAMN46977218. The annotation files are available at Figshare²⁵.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Rae Marvin Smith, Email: smithrm@unisa.ac.za.

Ntanganedzeni Olivia Mapholi, Email: maphon@unisa.ac.za.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-026-07002-y.

References

1. Halimani, T. E., Dzama, K., Chimonyo, M. & Muchadeyi, F. C. Some insights into the phenotypic and genetic diversity of indigenous pigs in southern Africa. South African Journal of Animal Science 42, 507–510 (2012).
2. Nicholas, G. Kolbroek – the unique local breed. Farmer’s Weekly (1999).
3. Hlongwane, N. L., Hadebe, K., Soma, P., Dzomba, E. F. & Muchadeyi, F. C. Genome Wide Assessment of Genetic Variation and Population Distinctiveness of the Pig Family in South Africa. Frontiers in genetics 11, 344 (2020).
4. Hlongwane, N. L. et al. Identification of Signatures of Positive Selection That Have Shaped the Genomic Landscape of South African Pig Populations. Animals (Basel) 14, 236 (2024).
5. Ramsay, K. R. Sustainable housing: Indicators and implications (2002).
6. Chimonyo, M. & Dzama, K. Estimation of genetic parameters for growth performance and carcass traits in Mukota pigs. Animal (Cambridge, England) 1, 317–323 (2007).
7. Halimani, T. E., Muchadeyi, F. C., Chimonyo, M. & Dzama, K. Pig genetic resource conservation: The Southern African perspective. Ecological economics 69, 944–951 (2010).
8. Mathobela, R. M., Molotsi, A. H., Marufu, M. C., Strydom, P. E. & Mapiye, C. Transitioning opportunities for sub-Saharan Africa’s small-scale urban pig farming towards a sustainable circular bioeconomy. International journal of agricultural sustainability 22 (2024).
9. Wy, S. et al. Chromosome-level genome assembly of the Korean minipig (Sus scrofa). Sci Data 11, 840–8 (2024).
10. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data.
11. Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature biotechnology 42, 367–370 (2024).
12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).
13. Smith, R. M. et al. Kolbroek HIFI and Omni-C. https://identifiers.org/ncbi/insdc.sra:SRP576325 (2025).
14. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
15. Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems (IEEE, May 2019).
16. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 (2023).
17. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biology 20, 1 (2019).
18. Harry, E. PretextView (Paired REad TEXTure Viewer): A desktop application for viewing pretext contact maps.
19. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, 351 (2005).
20. Brown, T. et al. Genome Annotation and Other Post-Assembly Workflows for the Tree of Life.
21. Bao, Z. & Eddy, S. R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Research 12, 1269–1276 (2002).
22. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
23. Bailly-Bechet, M., Haudry, A. & Lerat, E. “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files. Mobile DNA 5, 13 (2014).
24. Gabriel, L., Becker, F., Hoff, K. J. & Stanke, M. Tiberius: end-to-end deep learning with an HMM for gene prediction. Bioinformatics (Oxford, England) 40 (2024).
25. Kolbroek Annotation Files, 10.6084/m9.figshare.28754990 (2025).
26. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 1–27 (2020).
27. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
28. Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).
29. Challis, R., Richards, E., Rajan, J., Cochrane, G. & Blaxter, M. BlobToolKit – Interactive Quality Assessment of Genome Assemblies. G3: genes - genomes - genetics 10, 1361–1374 (2020).
30. Sscrofa11.1: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_000003025.6 (2017).
31. OHalloran, D. M. fastQ_brew: module for analysis, preprocessing, and reformatting of FASTQ sequence data. BMC Res Notes 10, 275–4 (2017).
32. Smith, R. M. et al. Kolbroek Assembly. GenBank http://identifiers.org/insdc.gca:GCA_055447695.1 (2026).
33. Wang, Y. et al. A chromosome-level genome of Chenghua pig provides new insights into the domestication and local adaptation of pigs. Int. J. Biol. Macromol. 270, 131796 (2024).
34. Chenghua: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_037447515.1 (2024).
35. Ma, H. et al. Long‐read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Molecular ecology resources 22, 1508–1520 (2022).
36. Ningxiang: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_020567905.1 (2021).
37. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).

