PLOS Genetics. 2022 Mar 14;18(3):e1009776. doi: 10.1371/journal.pgen.1009776

HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities

Lajos Kalmar 1, Srishti Gupta 1, Iain R L Kean 1, Xiaoliang Ba 1, Nazreen Hadjirin 1, Elizabeth M Lay 1, Stefan P W de Vries 1,¤a, Michael Bateman 1,¤b, Harriet Bartlet 1, Juan Hernandez-Garcia 1, Alexander W Tucker 1, Olivier Restif 1, Mark P Stevens 2, James L N Wood 1, Duncan J Maskell 3, Andrew J Grant 1,#, Mark A Holmes 1,*,#
Editor: Xavier Didelot 4
PMCID: PMC8947609  PMID: 35286304

Abstract

Shotgun metagenomics is a powerful tool to identify antimicrobial resistance (AMR) genes in microbiomes but has the limitation that extrachromosomal DNA, such as plasmids, cannot be linked with the host bacterial chromosome. Here we present a comprehensive laboratory and bioinformatics pipeline HAM-ART (Hi-C Assisted Metagenomics for Antimicrobial Resistance Tracking) optimised for the generation of metagenome-assembled genomes including both chromosomal and extrachromosomal AMR genes. We demonstrate the performance of the pipeline in a study comparing 100 pig faecal microbiomes from low- and high-antimicrobial use pig farms (organic and conventional farms). We found significant differences in the distribution of AMR genes between low- and high-antimicrobial use farms including a plasmid-borne lincosamide resistance gene exclusive to high-antimicrobial use farms in three species of Lactobacilli. The bioinformatics pipeline code is available at https://github.com/lkalmar/HAM-ART.

Author summary

Antimicrobial resistance (AMR) is one of the biggest global health threats facing humanity. Understanding the emergence and spread of AMR between different bacterial species is crucial for the development of effective countermeasures. In this paper we describe a user-friendly, affordable and comprehensive (laboratory and bioinformatics) workflow that can identify, associate and track AMR genes in bacteria. We demonstrate the efficiency and reliability of the method by comparing 50 faecal microbiomes from pig farms with high antibiotic use (conventional farms) and 50 faecal microbiomes from pig farms with low antibiotic use (organic farms). Our method provides a novel approach to resistance gene tracking that also generates high-quality metagenome-assembled genomes which include genes on mobile genetic elements, such as plasmids, that would not otherwise be included in these assembled genomes.

Introduction

The emergence of resistance to antimicrobials in bacteria can occur by spontaneous mutation or by the acquisition of mobile genetic elements carrying antimicrobial resistance (AMR) genes [1] (for example, plasmids via natural transformation or conjugation, or bacteriophages via transduction [2]). Over the last decade, metagenomic studies have revealed that bacterial communities comprising gut flora or soil microbiota possess a diverse arsenal of AMR genes, termed the resistome [3], some of which can be transferred between related or unrelated species. A limitation of next-generation sequencing metagenomics is the difficulty of identifying the species harbouring a particular AMR gene when that gene is present on extra-chromosomal DNA. Alternative approaches based on traditional culture of bacteria have provided direct experimental evidence of plasmid-mediated AMR gene transfer from enteric pathogens to commensal Escherichia coli in rodents [4,5], chickens [6] and humans [7]. Salmonella-inflicted enteropathy has been shown to elicit parallel blooms of the pathogen and of resident commensal E. coli. These blooms boosted horizontal gene transfer (HGT) in general and, specifically, the transfer of a conjugative colicin plasmid, p2, from an introduced Salmonella enterica serovar Typhimurium to commensal E. coli [8]. It has been shown that the use of in-feed antimicrobials leads to a bloom of AMR genes in the bacteriophage metagenome recovered from treated pigs [9], although the sources and destinations of these genes remain unclear. These observations suggest that HGT between pathogenic and commensal bacteria is common in humans and animals and is likely to contribute to the persistence and spread of AMR. Moreover, many previous studies on the spread of AMR from animal sources have focused on AMR in pathogens, with less emphasis on genes within indigenous microbiota that may also pass from animals to humans (and vice versa) but are difficult to culture.

To overcome the inability of next-generation metagenomic sequencing to identify where extra-chromosomal genes of interest reside, several chromosome conformation capture technologies (such as 3C and Hi-C), originally designed for the study of three-dimensional genome structure in eukaryotes, have been used [10–12]. These techniques exploit the ability to create artificial connections between strands of co-localised DNA by cutting and re-ligating the strands. The techniques differ in their manner of detection and in the scope of interactions they can probe. Marbouty et al. describe the application of robust statistical methodology to 3C sequence data (meta3C) derived from a river sediment microbiome [12]. Hi-C, a technical improvement on the 3C method, has been shown to successfully disambiguate eukaryotes and prokaryotes [11] and to differentiate closely related E. coli strains from microbiomes [10]. Both techniques offer great potential to define the dynamics of an introduced AMR gene (both chromosomal and extra-chromosomal), in particular the nature and frequency of transfer events, including transfer into microbiota constituents that are not readily detectable by culture in the laboratory. We showcase the performance of an adapted laboratory method using a novel bioinformatics pipeline (HAM-ART), optimised for tracking AMR genes, in a study comparing 100 faecal microbiomes from UK conventional and organic pig farms.

Results

We adapted existing laboratory methodology and developed a bioinformatics pipeline (HAM-ART) that: (i) bins bacterial genomes with high reliability; (ii) associates mobile genetic elements with their host genome; and (iii) annotates and associates AMR genes with high specificity and sensitivity. As HAM-ART is built on traditional shotgun metagenomic sequencing methodology, combined with Hi-C sequencing from the same bacterial pellet, it could be applied to any complex microbial community. HAM-ART uses a widely available sequencing platform (Illumina paired-end sequencing), with standard library sizes and affordable amounts of sequencing per sample. The bioinformatics pipeline was designed to be user-friendly: in addition to generating a set of final metagenome-assembled genomes (MAGs), it outputs results tables reporting assembly quality, taxonomy and AMR gene association.

Proof-of-concept study undertaken to validate HAM-ART

The HAM-ART methodology was tested in a study comparing AMR in two groups of farms: 5 organic (OG1-5) pig farms, farming to organic certification standards with low antibiotic use, and 5 conventional (CV1-5) pig farms, with higher antibiotic use. Ten faecal samples were taken from each farm for metagenomic analysis as described in the methods section. The organic farms had lower population corrected use (PCU) of antibiotics (average 3.0 mg/PCU, range 0–9.8 mg/PCU) over the year prior to sampling compared to conventional farms (average 85.7 mg/PCU, range 3.9–170.1 mg/PCU). Similarly, the number of different classes of antibiotics used on each farm ranged from 0–4 for organic farms and 4–9 for conventional farms. The results of metagenomic analyses are described below.

Generation of MAGs using the HAM-ART pipeline

The pipeline produced de novo assemblies of approximately 500k contigs from each faecal sample, coupled with 0.2–3.4M informative binary connections from the Hi-C pairs. A Hi-C connection is informative if it connects two different contigs, as opposed to a connection within the same contig. The initial products of HAM-ART are the consensus clusters (CCs): collections of contigs that are clustered together during the network resolution step, solely on the basis of Hi-C contacts, and that approximate the genome of a constituent bacterial species. The total number of CCs per sample varied between 6k and 54k. Of these, we focussed on CCs comprising >250 kb (representing about one tenth of an average prokaryotic genome) for further pipeline processing. After the splitting and extension of the large CCs (as described in the Methods section), the number of MAGs varied between 5 and 131 (mean: 62, median: 60) per faecal sample.
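The informative-connection rule and the CC size filter can be sketched in a few lines of Python. This is an illustrative sketch, not the HAM-ART code: it assumes Hi-C read pairs have already been mapped to contigs and are supplied as hypothetical (contig, contig) tuples, and the function names are invented for this example.

```python
from collections import Counter


def count_informative_pairs(pairs):
    """Count Hi-C read pairs whose two ends map to different contigs.

    `pairs` is an iterable of (contig_of_read1, contig_of_read2) tuples.
    Pairs falling within a single contig are skipped: they carry no
    information about which contigs belong together.
    """
    links = Counter()
    for c1, c2 in pairs:
        if c1 != c2:
            # Canonical ordering so that A-B and B-A count as the same link
            links[tuple(sorted((c1, c2)))] += 1
    return links


def large_clusters(clusters, contig_lengths, min_bp=250_000):
    """Keep consensus clusters whose summed contig length exceeds min_bp."""
    return {
        cc_id: contigs
        for cc_id, contigs in clusters.items()
        if sum(contig_lengths[c] for c in contigs) > min_bp
    }
```

For example, the pairs ("c1", "c1"), ("c1", "c2"), ("c2", "c1") and ("c2", "c3") yield two c1-c2 links and one c2-c3 link; the intra-contig pair is discarded.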

A total of 6184 MAGs were identified from the 100 samples. Based on pairwise genetic distance, these were distributed into 1555 clades, i.e. groups of MAGs likely to represent the same species or genus. The number of members per clade varied between 1 and 79 (mean: 3.97, median: 2). All clades were subjected to clade refinement, which resulted in 553 clades with at least one MAG over 500 kb in size. After clade refinement, 6164 best-quality MAGs remained.
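Grouping MAGs into clades from pairwise distances can be illustrated with single-linkage clustering at a fixed distance cut-off. The distance metric, the cut-off value and the linkage rule used here are assumptions for illustration only; this section does not specify them, so the sketch should not be read as the HAM-ART procedure.

```python
def cluster_into_clades(names, distances, threshold):
    """Single-linkage grouping: MAGs within `threshold` of each other share a clade.

    names: list of MAG identifiers.
    distances: dict mapping (name_a, name_b) -> pairwise genetic distance.
    Returns a list of clades, each a list of MAG identifiers.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving to keep trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clades

    clades = {}
    for n in names:
        clades.setdefault(find(n), []).append(n)
    return list(clades.values())
```

With single linkage, a chain of close MAGs (m1-m2, m2-m3) ends up in one clade even if the chain's endpoints are further apart, which is one reason a refinement step after initial grouping is useful.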

Validation of a Hi-C MAG with the matching genome generated from culture of a single isolate

We noted that E. coli were relatively rarely assembled in our samples (4% of samples). One possible explanation was that E. coli were present, but in low abundance. We investigated this by determining the number of reads in the shotgun libraries from each sample that mapped to an E. coli reference genome (Fig 1).

Fig 1. Presence of E. coli DNA in all samples.


Metagenomic shotgun sequencing raw reads from each faecal sample were aligned to a reference E. coli genome (Escherichia coli O157:H7, GCF_000008865.2) using bowtie2 (--fast option), and the number of reads "aligned concordantly exactly 1 time" was extracted from the output log file. Results were plotted in rank order by the number of aligned reads. In samples plotted as red columns (n = 4), E. coli MAGs were successfully assembled using HAM-ART, while in samples plotted as blue columns they were not. Dotted horizontal lines represent the potential threshold range for successful assembly of a MAG (60,000–80,000 reads, representing ~0.2% of the total number of reads per sample). Repeated analysis using different E. coli reference genomes (including an E. coli cultured and sequenced from a farm included in this study) gave similar results.
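Extracting the concordant-unique count from each bowtie2 log and ranking the samples can be sketched as below. The sketch assumes the standard paired-end alignment summary that bowtie2 prints (a line of the form "N (x%) aligned concordantly exactly 1 time"); the function names and the dictionary of per-sample log texts are hypothetical.

```python
import re

# Matches e.g. "    8823 (88.23%) aligned concordantly exactly 1 time"
CONCORDANT_1 = re.compile(
    r"^\s*(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
    re.MULTILINE,
)


def concordant_unique_pairs(log_text):
    """Return the 'aligned concordantly exactly 1 time' count from a bowtie2 summary."""
    match = CONCORDANT_1.search(log_text)
    if match is None:
        raise ValueError("no 'aligned concordantly exactly 1 time' line found")
    return int(match.group(1))


def rank_samples(logs):
    """logs: dict sample_id -> bowtie2 summary text.

    Returns (sample_id, count) tuples sorted from most to fewest aligned reads,
    i.e. the rank order used for plotting.
    """
    counts = {sample: concordant_unique_pairs(text) for sample, text in logs.items()}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```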

This shows that, although E. coli reads were present in most shotgun libraries, a MAG could only be created when there were ≥50k E. coli reads. This observation suggests that 60–80k reads (representing about 9–12 Mbp), approximating 2x coverage of an E. coli genome, are required to generate a MAG. In this study, which generated approximately 35M reads (5.25 × 10⁹ bp) per sample, an individual species would need to represent ~0.2% of the total bacterial community in order to generate a MAG. We believe that the main reason for the observed sensitivity threshold is the depth of shotgun metagenomic sequencing: because we used pooled Hi-C libraries, each shared by several samples, to resolve binning, the available Hi-C contacts were not sufficient to assemble E. coli MAGs when the genome was not represented with sufficient shotgun coverage.
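The arithmetic behind the threshold estimate can be checked directly. The sketch uses the 150 bp read length given in the Methods; the ~5 Mbp E. coli genome size is an assumption for illustration (the cultured isolate assembled to 5.7 Mbp, which gives a similar ~2x figure).

```python
READ_LEN = 150  # bp per read (150 bp paired-end HiSeq 4000 sequencing)


def bases(reads, read_len=READ_LEN):
    """Total sequenced bases for a given number of reads."""
    return reads * read_len


def coverage(reads, genome_bp, read_len=READ_LEN):
    """Mean fold-coverage of a genome of genome_bp base pairs."""
    return bases(reads, read_len) / genome_bp


def required_fraction(reads_needed, total_reads):
    """Fraction of all reads a species must contribute to reach the threshold."""
    return reads_needed / total_reads


# 60k-80k reads -> 9-12 Mbp -> roughly 2x of an assumed ~5 Mbp genome
low, high = coverage(60_000, 5_000_000), coverage(80_000, 5_000_000)
# ~70k of ~35M reads per sample -> ~0.2% of the community (about 1 in 500)
fraction = required_fraction(70_000, 35_000_000)
```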

We examined the quality of a single Hi-C MAG by performing conventional bacterial culture and sequencing of an E. coli isolated from the same faecal sample (CV5_05) that generated the E. coli MAG. DNA was extracted, and the genome was obtained by sequencing and assembly (Illumina MiSeq and SPAdes). The MiSeq data yielded a 5.7 Mbp assembly (CV05-2_S2) that was identified as ST20 and harboured 8 AMR genes (aadA1, aadA2, blaCFE-1, cmlA1, dfrA12, mdf(A), sul3 and tet(34)). A BLAST comparison of the MiSeq genome with the Hi-C MAG, visualised using BRIG, is shown in Fig 2.

Fig 2. BLAST comparison of an E. coli MAG with a corresponding E. coli assembly obtained using culture, followed by Illumina MiSeq sequencing and assembly.


The innermost ring shows the MiSeq assembly, with contig boundaries indicated by alternating red and blue colouring. The positions of the MLST (Multi-locus Sequence Typing) genes (for both MLST schemes 1 and 2) are indicated in the second ring. The identity levels of the matching Hi-C MAG are shown in the third ring. The presence of a possible plasmid is illustrated in pale blue in the outermost ring, which contains the comparison results for the S. Typhimurium plasmid pSal8934a (NCBI accession number JF274993). This plasmid has 99.6% identity and 79% query coverage compared with one of the MiSeq assembly contigs (and with the matching MAG). It contains the aadA1, aadA2, cmlA1, dfrA12 and sul3 AMR genes.
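Query coverage, as quoted for pSal8934a, is the fraction of query positions covered by at least one alignment. The sketch below computes it from a list of high-scoring segment pairs (HSPs) by merging overlapping intervals. It assumes 1-based inclusive BLAST-style query coordinates, and the function name is hypothetical; it is meant to clarify the quantity, not to reproduce how the figure was generated.

```python
def query_coverage(hsps, query_len):
    """Fraction of query positions covered by at least one HSP.

    hsps: iterable of (qstart, qend) pairs, 1-based inclusive; either order accepted.
    Overlapping or adjacent HSPs are merged so no base is counted twice.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Gap before this HSP: close the current merged block
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_len
```

Two HSPs spanning query positions 1–50 and 40–79 on a 100 bp query cover 79 distinct bases, i.e. 79% query coverage.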

Taxa composition of the pig microbiomes from conventional and organic farms

The distribution of taxa between conventional and organic farms (Figs 3 and S1) was broadly similar, apart from OG3. On all farms the community included common intestinal bacterial orders, dominated by Bacteroidales, Lachnospirales, Lactobacillales and Oscillospirales, which is consistent with previous pig faecal microbiome studies [9,13–15]. The relative paucity of Enterobacteriaceae and the presence of a substantial number of treponemes appear to be characteristic of the pig faecal microbiome [16,17].

Fig 3. Order-level relative composition of the pig faecal microbiota in the study farms.


The average relative abundance of different orders identified by GTDB-tk in the final assemblies were calculated from 10 samples in each of the 10 farms included in the study. Plots of different taxonomic levels are shown in S1 Fig.
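Averaging relative abundances per farm involves normalising each sample first and then taking the mean across the farm's samples, counting an order absent from a sample as zero. The sketch below assumes a hypothetical per-sample abundance measure (for example, assembled bases assigned to each order); the caption does not state which quantity was used, so treat the input format as illustrative.

```python
def relative_abundance(order_counts):
    """Normalise one sample's order-level counts so they sum to 1."""
    total = sum(order_counts.values())
    return {order: n / total for order, n in order_counts.items()}


def farm_mean_abundance(samples):
    """Mean relative abundance of each order across one farm's samples.

    samples: list of dicts mapping order name -> abundance measure.
    Orders missing from a sample contribute zero for that sample.
    """
    per_sample = [relative_abundance(s) for s in samples]
    orders = set().union(*per_sample)
    return {
        order: sum(p.get(order, 0.0) for p in per_sample) / len(per_sample)
        for order in orders
    }
```

Normalising before averaging gives each sample equal weight, so a single deeply sequenced sample cannot dominate the farm-level profile.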

AMR gene distribution in faecal samples from CV and OG farms

We identified 66 different AMR genes (in 36 resistance gene groups, as described in Methods–analysis of assembly data) in our final 6164 MAGs using the ResFinder AMR gene database (S2 Table). A comparison of the distribution of AMR genes (Fig 4A) indicates that a greater diversity of AMR genes was found on conventional farms than on organic farms. AMR gene presence/absence was validated using a read-based approach (ARIBA) on the raw sequencing data, to ensure that no AMR genes were lost due to sensitivity issues during assembly and binning.

Fig 4. AMR gene distribution in faecal samples from CV and OG farms and the correlation with levels of antimicrobial used.


Panel A: The heatmap shows the number of samples from which MAGs were generated containing different AMR genes, with the intensity of shading ranging from 0/10 samples (white) to 10/10 samples (black). Conventional (CV1 to 5) and organic (OG1 to 5) are labelled using red and blue text respectively. Panel B: A scatter plot of the amounts of antimicrobial used (mg/population corrected unit (PCU)) in the year prior to sampling, against the number of different AMR genes detected for each farm. The orange line indicates the 2020 target set by the Responsible Use of Medicines in Agriculture Alliance for antibiotic use (99 mg/PCU), and the green line represents the average calculated from the 5 CV farms in our study (85.7 mg/PCU). Panel C: A scatter plot showing the number of different antimicrobial types used in the year prior to sampling, against the number of different AMR genes found. Spearman correlation coefficients and significance values were calculated by fitting a linear regression model to the data points in R (line of best fit and 95% confidence intervals are shaded in grey).

Genes encoding proteins potentially able to confer resistance to β-lactams and chloramphenicol were present in greater numbers in samples from conventional farms. A number of genes were present solely in samples from conventional farms (aac, aad, blaCMY, blaMIR, blaOXA, blaTEM, blaZEG, cml, dfrA, erm(T), lnu(A), mdf(A), mph(N), spc, sul, van(GXY)), whereas vat(E) was found solely on one organic farm. Comparing the number of different AMR genes found in the faecal microbiomes with antimicrobial use on each farm (measured as mg/PCU and as the number of different antimicrobials used), we observed statistically significant correlations (Fig 4B and 4C). The correlation of AMR genes with PCU had an R squared value of 71% (P = 0.0023); the correlation of AMR genes with the number of different antimicrobials used had an R squared value of 80% (P = 0.0005). Although the number of farms and the amount of metadata available were limited, we performed multiple regression analysis on the dataset using the formula: Number of AMR genes ~ Antimicrobial usage (mg/PCU) + Number of different types of antibiotics used + Farm size (number of pigs) + Organic/Conventional type. Including multiple predictors only slightly increased the explained variance compared with the simple linear regression models (adjusted R squared value of 88%, p = 0.0035).
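The regression statistics above were computed in R. As a hedged illustration of what the reported quantities mean, the Python sketch below fits an ordinary least-squares line and computes R², then applies the standard adjusted-R² correction for the number of predictors. No study data are used; the inputs in the tests are toy values.

```python
def simple_linear_r2(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = my - b * mx          # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

With n = 10 farms and p = 4 predictors, the unexplained variance is inflated by a factor of (n − 1)/(n − p − 1) = 9/5, so a multiple-regression model must explain substantially more raw variance than a one-predictor model to achieve a higher adjusted R².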

Association of lnu(A) gene harbouring plasmid to Lactobacilli species

An analysis of the distribution of resistance genes among their host MAGs revealed that the lincosamide resistance gene lnu(A) was found in three clades corresponding to Lactobacillus amylovorus, Lactobacillus johnsonii and Lactobacillus reuteri. All three clades were present in most samples from both organic and conventional farms; however, the lnu(A) gene was only present on conventional farms (Figs 4A and 5).

Fig 5. Association of different AMR genes to Lactobacilli species.


Panel A: A distance tree based on sequence comparisons of L. amylovorus, L. johnsonii and L. reuteri assemblies found in farms (coloured branches) together with genomes of the corresponding Lactobacilli species (grey branches inside clades) and other known Lactobacilli species (grey branches outside the clades) from the NCBI RefSeq collection (https://www.ncbi.nlm.nih.gov/refseq/). The blocks of colour adjacent to the branch-tips indicate the farm type and number (light to dark red: CV1 to 5; light to dark blue: OG1 to 5). The outer circle shows the presence of the lnu(A) gene within the MAG (green: present, none: absent). Panel B: A distance tree based on sequence comparisons of all L. reuteri MAGs found in the farm samples showing the presence/absence (filled/empty square) of all the AMR genes found in this clade (black branches: MAGs from farm samples, red branch: L. reuteri reference genome from NCBI RefSeq collection).

The lnu(A) gene was found on all the conventional farms and did not appear to be restricted to a single lineage or species on individual farms. The numbers of L. amylovorus, L. johnsonii and L. reuteri MAGs obtained from conventional farms were 47, 36 and 42 respectively, and from organic farms 36, 23 and 25. Examination of the contigs harbouring the lnu(A) gene indicated that an identical 5.6 kb sequence was present in 27/34 lnu(A)-positive L. amylovorus, 17/28 lnu(A)-positive L. johnsonii and 18/18 lnu(A)-positive L. reuteri MAGs. The sequence was often present as a single contig of approximately the same length but with different start positions, suggesting that it was present as a circular DNA molecule, potentially a plasmid. In the lnu(A)-positive lactobacilli MAGs that did not appear to contain the entire sequence, the majority (15/18) had a short contig that was identical to part of the putative plasmid sequence. The plasmid nature of the contig is further supported by a BLAST search against the NCBI plasmid database: this sequence had 99.8% identity with an 884 bp section of a plasmid from L. johnsonii (CP021704) and 83.2% identity with a 1399 bp section of a plasmid from L. amylovorus (CP002560).
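The circularity argument rests on a simple observation: contigs of the same length that start at different positions of one circular molecule are cyclic rotations of each other, possibly on the opposite strand. The sketch below tests this with exact string matching. It is a simplification under stated assumptions; real assemblies may carry sequencing errors or duplicated bases at circular contig ends, which this check does not tolerate, and the helper names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (upper- or lower-case ACGT)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def is_rotation(a, b):
    """True if b is a cyclic rotation of a on the same strand."""
    return len(a) == len(b) and b in a + a


def same_circular_molecule(a, b):
    """True if contigs a and b could be the same circle read from any start/strand."""
    return is_rotation(a, b) or is_rotation(a, reverse_complement(b))
```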

Discussion

Conventional shotgun metagenomic sequencing can generate lists of AMR genes and lists of species contained in a microbiome, but it cannot consistently identify which bacteria carry which plasmid. The application of Hi-C metagenomics in this study demonstrates that this technique can place AMR genes carried on plasmids with their host genomes. The HAM-ART pipeline was tested using a challenging experimental design involving 100 faecal samples from 10 different farms. The results of this study show that it is possible to obtain high-resolution, good-quality results by performing relatively modest amounts of sequencing on samples of varying quality. While other pipelines are capable of combined chromosome-capture-based assembly and AMR gene association [18–23], during the development of HAM-ART we aimed to design a tool capable of coping with large sample numbers, using the most common Illumina-based sequencing platform and delivering results from affordable sequencing depths.

Unsurprisingly, this study shows that farms with lower use of antimicrobials (typically organic farms, which are members of an assurance scheme that strongly regulates the amount of antimicrobials to which the animals are exposed) are associated with smaller numbers and lower diversity of AMR genes, as has been shown in previous studies [24,25]. The statistically significant correlation between the amount of antimicrobial used and the number of different AMR genes detected on each farm demonstrates this relationship and supports antimicrobial use as a driver of AMR. In future studies, collection of further metadata (e.g., main food component ratios, animal density on farms, hygiene rating) may contribute to a more accurate model for predicting the emergence of AMR genes and, consequently, help identify the main factors that increase or decrease it. The use of Hi-C metagenomics allows a deeper investigation of the relationship between the use of antimicrobials, AMR genes and the bacteria that harbour those genes.

Of note in this study is the demonstration of a particular AMR gene, lnu(A), that was only found in samples from conventional farms. Within conventional farms, we found this gene to be harboured by three different species of Lactobacillus. All three species (L. amylovorus, L. johnsonii and L. reuteri) were also found on organic farms, and the distance tree suggests that similar levels of diversity are present for each species, whether on an organic or a conventional farm. There is good evidence that the lnu(A) gene is carried on the same plasmid in all three Lactobacillus species, suggesting that any selection pressure selects for the mobile plasmid rather than for the host bacteria. The small number of farms and potential confounders, such as geographical bias, may have influenced the observed distribution, so this result needs to be confirmed. Nonetheless, the detection of AMR genes carried on a plasmid in multiple species without culture could only be performed using chromosome conformation metagenomics techniques such as Hi-C.

A direct assessment of the quality of a Hi-C MAG was afforded by the parallel culture and sequencing of an E. coli isolate from the same sample. The homologous Hi-C MAG contained the same MLST and AMR genes as the assembly obtained from conventional culture and sequencing, including AMR genes likely present on a plasmid.

Taxonomic identification of shotgun metagenome assemblies is widely recognised as problematic. We used GTDB-Tk [26], a method that uses a significantly larger reference genome set than previous algorithms, but were still unable to resolve the taxonomy of many large clades of interest beyond the class level. Chromosome conformation methodology has the potential to generate better-quality MAGs by creating links between contigs to improve binning, assembly or even scaffolding. Greater use of Hi-C metagenomics will enable the production of better-quality MAGs for rare or difficult-to-culture bacteria. Use of the HAM-ART pipeline should also lower the likelihood of generating mixed or contaminated MAGs.

We performed further investigations to confirm that the use of pooled Hi-C libraries (2 per farm) did not lead to artefactual assembly of MAGs in all 5 of the shotgun libraries that used the same pooled Hi-C library to identify connection pairs. The use of pooled Hi-C libraries reduced costs considerably in terms of staff time and reagents. An examination of the distribution of taxa among the samples shows numerous examples of clades/taxa that we found in only a single sample from a set of 5 sharing the same Hi-C library. Of our total of 6164 MAGs, we would have expected an equal distribution between organic and conventional farms, but only 2176 came from organic farms and 3988 from conventional farms. While this may have occurred due to lower species diversity on the organic farms, it is likely to be a consequence of the larger number of lower-yielding Hi-C libraries generated from the organic farms.
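The pooling check amounts to asking, for each shared Hi-C library, which clades were recovered from only one of the samples in that pool. If pooling produced artefactual MAGs, clades would tend to appear across all samples sharing a library. The sketch below assumes a hypothetical flat table of (pool, sample, clade) records; it illustrates the comparison and is not the analysis code used in the study.

```python
from collections import defaultdict


def singleton_clades_per_pool(mag_records):
    """Find clades recovered from exactly one sample within each Hi-C pool.

    mag_records: iterable of (pool_id, sample_id, clade_id) tuples, one per MAG.
    Returns a dict pool_id -> set of clade_ids seen in only one sample of that pool.
    """
    samples_by_clade = defaultdict(lambda: defaultdict(set))
    for pool, sample, clade in mag_records:
        samples_by_clade[pool][clade].add(sample)
    return {
        pool: {clade for clade, samples in clades.items() if len(samples) == 1}
        for pool, clades in samples_by_clade.items()
    }
```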

The sensitivity threshold for the creation of a MAG from a species contained within a microbiome using Hi-C metagenomic sequencing is affected by three factors: firstly, the size of the genome of the species of interest (likely a minor effect); secondly, the amount of sequencing undertaken; and thirdly, the relative abundance of the species of interest, which is probably the most significant influence. Low relative abundance may limit the power of the technique when a species of interest is present only in small numbers. It is likely that some species harbouring AMR genes of interest are present below a threshold of 1:500 (our estimated theoretical threshold from the E. coli content comparison). We used ARIBA to independently assemble AMR genes from our short-read sequencing data and did not find any significant discrepancy between the genes assembled with this method and those found in the MAGs, indicating that where a gene can be assembled, the Hi-C technique is able to place it in a MAG. We are also aware that the Louvain algorithm is not perfect and has limitations that may introduce slight bias into the HAM-ART pipeline. However, it is one of the most widely accepted network resolution methods in 3C and Hi-C metagenomics. As more raw data become available from Hi-C metagenomics studies, benchmarking and implementing other clustering algorithms is likely to become easier. We were unable to perform a direct comparison of different chromosome conformation metagenomics methodologies. It is reasonable to observe that our Hi-C libraries in this study do not appear to encode more 3D signal than a classical 3C library. The resource costs of performing 3C are lower, and for some research questions this approach will be preferable.
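The Louvain algorithm greedily moves nodes between communities, then aggregates communities, to increase the modularity Q of the partition. The sketch below only scores a given partition with Newman-Girvan modularity; it does not implement Louvain itself. Treating contigs as nodes and Hi-C link counts as edge weights is an assumption made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of a weighted undirected graph.

    edges: iterable of (u, v, weight), e.g. contig pairs weighted by Hi-C link counts.
    community: dict mapping node -> community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ], where
    L_c is the total weight of edges inside c, d_c the summed weighted degree
    of nodes in c, and m the total edge weight.
    """
    m = 0.0
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Two disconnected triangles split into their natural communities score Q = 0.5, whereas lumping all six nodes into one community scores Q = 0; a Louvain pass would prefer the former partition.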

In summary, we successfully adapted an existing laboratory method, implemented a bioinformatics Hi-C metagenomics pipeline, HAM-ART, and used it to address a research question using a set of 100 separate samples. We optimised HAM-ART to deal with mixed MAGs, to exploit reference MAGs from within the experimental data set, and to assign AMR genes to the correct MAGs with maximum sensitivity and specificity. While the pipeline focusses on AMR gene tracking in this study, it could be used with other dedicated gene sets (for example, a library of virulence-associated genes) to associate them with their host genomes. Moreover, it provides a cost-effective strategy to assess the dynamics of AMR transfer longitudinally, following treatment with specific antibiotics or doses, and following experimental infection. We validated our assembly quality and AMR gene associations by comparing a MAG with the genome of an E. coli cultured from the same faecal sample. We have also shown that the method is robust and affordable when processing large numbers of samples, and we provide data illustrating the operational characteristics of both the wet laboratory and bioinformatics protocols involved.

Methods

Study population, sampling and data collection

Ethical approval for sampling and data collection was obtained (CR295; University of Cambridge, Department of Veterinary Medicine). All pig farms sampled were located in southern England and were selected arbitrarily from a list of volunteering farms. The farm descriptors are shown in S1 Table. We sampled five conventional pig farms and five organic pig farms that were members of the Soil Association farm assurance scheme, which stipulates strict controls on the use of antimicrobials. Ten fresh faecal droppings per farm were collected from different groups of fattening pigs aged 4–20 weeks, transported on ice/cold packs and stored at –80°C within 6 h of collection. Information on antimicrobial use in the one-year period prior to sampling was collected by questionnaire informed by farm records. Annual antimicrobial use in mg per Population Correction Unit (mg/PCU) was calculated by dividing the total amount of each antibiotic used over the course of a year by the total average liveweight of the animals on the farm, taking into account the numbers of pigs and their ages.
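The mg/PCU calculation divides antibiotic mass by the farm's liveweight denominator. The sketch below assumes the denominator is built from animal counts per age category multiplied by a standard weight for that category. The category names and weights in the test are hypothetical placeholders, not values from this study or from any official scheme.

```python
def pcu_kg(animal_counts, standard_weight_kg):
    """Population correction unit (kg): animals per category x assumed weight."""
    return sum(n * standard_weight_kg[cat] for cat, n in animal_counts.items())


def mg_per_pcu(antibiotic_mg, animal_counts, standard_weight_kg):
    """Annual antimicrobial use: total mg of all antibiotics / PCU in kg.

    antibiotic_mg: dict antibiotic name -> mg used over the year.
    animal_counts: dict age category -> average number of pigs on farm.
    standard_weight_kg: dict age category -> assumed liveweight (kg).
    """
    return sum(antibiotic_mg.values()) / pcu_kg(animal_counts, standard_weight_kg)
```

For instance, 700 g of antibiotics on a hypothetical farm with a 7000 kg PCU gives 100 mg/PCU, just above the 99 mg/PCU target shown in Fig 4B.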

Enrichment of the microbial fraction from pig faeces

The microbial fraction from a faecal sample was enriched using an adaptation of a previously described method by Ikeda et al. [27]. Prior to the enrichment process, 0.5 g of faeces was re-suspended in 9 ml of saline and homogenised for 2 min in a Stomacher 80 (Seward) at high power. Debris was removed from the homogenised sample by centrifugation at 500 g for 1 min. The supernatant was then transferred on top of 3.5 ml of sterile 80% (w/v) Histodenz (Sigma) and centrifuged in a Beckman ultracentrifuge using a JLA 16.250 rotor at 10,000 g for 40 min at 4°C. After centrifugation, the layer on top of the insoluble debris was recovered into a new 15 ml tube (Falcon) and centrifuged at 500 g for 1 min to remove debris. The supernatant was moved to a new 15 ml tube (Falcon) and centrifuged at 10,000 g for 20 min at 4°C. The bacterial pellet was washed in 10 ml of TE buffer (Merck) and used for the generation of Hi-C libraries.

Fixation of bacterial cells with formaldehyde

The isolated bacterial fractions from faeces (described in the previous section) were mixed with 2.5% (v/v) formaldehyde (16% methanol-free formaldehyde, Sigma) and incubated at room temperature (RT) for 30 min, followed by 30 min at 4°C, to cross-link the DNA within each bacterial cell. Formaldehyde was quenched with 0.25 M glycine (Merck) for 5 min at RT followed by 15 min at 4°C. Fixed cells were collected by centrifugation (10 min, 10,000 rpm, 4°C) and stored at −80°C until further use. We pooled the bacterial pellets of five samples from the same farm to generate each Hi-C library, thereby obtaining two Hi-C libraries per farm.

Generation of Hi-C libraries

The method for the construction of bacterial Hi-C libraries was adapted from Burton et al. [11]. Briefly, DNA from the fixed cells was isolated by lysing bacterial pellets in lysozyme (Illumina), followed by mechanical disruption using a Precellys Evolution bead beater (Bertin Technologies, France). Isolated chromatin was split into four aliquots and digested for 3 h at 37°C using the HpyCH4IV restriction enzyme (New England Biolabs). Restriction fragment overhangs were filled in with biotinylated dCTP (Thermo Scientific) and Klenow (New England Biolabs) as described by van Berkum et al. [28]. Biotin-labelled digested chromatin was diluted in 8 ml of ligation buffer (New England Biolabs, T4 ligase kit), and proximity ligation was performed at 16°C for 4 h. De-cross-linking was performed at 65°C overnight (o/n) with 250 μg/ml proteinase K (QIAGEN). DNA was recovered by precipitation with 50% (v/v) isopropanol (Fisher Scientific) in the presence of 5% (v/v) 3M sodium acetate (pH 5.2) (Merck) and then treated with RNase A (QIAGEN). Finally, DNA from each sample was recovered in 50 μl TE buffer (Merck) after phenol-chloroform (Merck) extraction. For Hi-C libraries, biotin was removed from un-ligated DNA ends using T4 Polymerase (New England Biolabs). DNA was purified using the Monarch PCR and DNA Clean-up Kit (New England Biolabs).
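To illustrate the digestion step, the sketch below performs an in-silico HpyCH4IV digest of a linear sequence (top strand only). HpyCH4IV recognises the 4 bp palindrome ACGT and cuts after the first base (A^CGT). In a random sequence with equal base frequencies, such a site is expected about once every 4^4 = 256 bp, giving many restriction fragments, and hence many potential ligation contacts, per contig. This is a simplified illustration and not part of the published protocol or pipeline.

```python
import re

SITE = "ACGT"     # HpyCH4IV recognition sequence
CUT_OFFSET = 1    # top-strand cut falls after the first base: A^CGT


def digest(seq):
    """Return the top-strand fragments of a linear sequence cut at every A^CGT."""
    seq = seq.upper()
    # Zero-width lookahead finds every site position
    cuts = [m.start() + CUT_OFFSET for m in re.finditer("(?=" + SITE + ")", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```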

Generation of Hi-C Illumina sequencing libraries

Illumina sequencing libraries were constructed from the purified Hi-C DNA using the NEBNext Ultra II DNA library prep kit (New England Biolabs). Approximately 100 ng of Hi-C library DNA was sheared to 400 bp using a Covaris M220 (duty cycle 20%, 200 cycles per burst, peak incident power 50 W, treatment time 40 s; Covaris Ltd., UK). Fragment ends were repaired, adaptors ligated and samples indexed as described in the manufacturer's protocol. Before indexing, we performed semi-quantitative PCR to determine the optimal range of PCR cycles for indexing.

Metagenome sequencing

Metagenomic DNA was isolated from 0.25 g of faeces using the Precellys Soil DNA kit (Bertin Technologies, France). Libraries for shotgun metagenome Illumina sequencing were prepared using the NEBNext Ultra II DNA library prep kit (New England Biolabs) after shearing 250 ng of metagenomic DNA to 400 bp with a Covaris M220 (duty cycle 20%, 200 cycles per burst, peak incident power 50 W, treatment time 50 s; Covaris Ltd., UK).

Illumina sequencing of shotgun metagenomic and Hi-C libraries

Following DNA library preparation, library size was determined with a Bioanalyzer 2100 (Agilent), and the libraries were quantified using the Qubit dsDNA BR kit (Thermo Scientific), pooled appropriately, and analysed with the NEBNext library quant kit (New England Biolabs). The pooled libraries were subjected to 150 bp paired-end sequencing on the HiSeq 4000 platform at the Genomics Core Facility, Li Ka Shing Centre, University of Cambridge, with four shotgun libraries or one Hi-C library per HiSeq lane.

Bioinformatics pipeline–pre-processing and de novo assembly

Raw sequencing data were pre-processed differently depending on the type of library they were derived from. Shotgun metagenomic reads passed through two filtering and quality control steps: (i) removal of optical and PCR duplicates using the clumpify.sh script from the BBMap software package (https://sourceforge.net/projects/bbmap/); and (ii) removal of read pairs matching the host genome using Bowtie2 [29] and the pig reference genome (Sscrofa11.1: GCA_000003025.6). Because bacterial cells were enriched during Hi-C library preparation, Hi-C raw reads were filtered only for optical and PCR duplicates, using the same method. Both datasets then passed through a merging step in which read pairs overlapping by at least 30 nucleotides were merged into a single read using FLASH [30]. After merging, metagenomic reads were passed to the assembly step as paired-end (unmerged) or single-end (merged) reads. All Hi-C reads were further processed by a Perl script that detected the ligation junction formed from the restriction site (in our case, A|CGT becomes ACGCGT after fill-in and ligation) and re-fragmented the reads at that junction. This ensured that chimeric sequences spanning two ligated DNA fragments were not used in the assembly. The re-fragmented Hi-C reads were included in the de novo assembly as single-end reads to increase genome coverage. The pre-processed reads from both library types were assembled de novo into contigs using metaSPAdes [31]. To avoid introducing biases towards known species, we did not use any reference-based assembly method.
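A minimal Python sketch of the re-fragmentation idea follows; it is not the authors' Perl script. The junction string comes from the text above, while splitting inside the junction (keeping ACG on the upstream piece and CGT on the downstream piece, so each piece retains its own half-site) and the optional `min_len` filter are assumptions made for illustration.

```python
JUNCTION = "ACGCGT"  # HpyCH4IV junction after fill-in and ligation (from the text)
LEFT_KEEP = 3        # assumption: "ACG" stays with the upstream piece


def refragment(read: str, junction: str = JUNCTION,
               left_keep: int = LEFT_KEEP, min_len: int = 1) -> list[str]:
    """Split a read at every ligation junction.

    Each resulting piece then comes from a single restriction fragment.
    Pieces shorter than min_len (a hypothetical filter) are dropped.
    """
    pieces, start = [], 0
    pos = read.find(junction)
    while pos != -1:
        pieces.append(read[start:pos + left_keep])
        start = pos + left_keep
        pos = read.find(junction, start)
    pieces.append(read[start:])
    return [piece for piece in pieces if len(piece) >= min_len]


print(refragment("AAAAACGCGTTTTT"))  # ['AAAAACG', 'CGTTTTT']
```

In practice a realistic `min_len` would be chosen so that the surviving pieces are still long enough to assemble and align; the default of 1 here keeps every piece.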

Bioinformatics pipeline–post-processing

Re-fragmented and unmerged Hi-C reads were aligned back to the assembled contigs with Bowtie2 [29] to extract the binary contacts between DNA fragments. The complete list of binary contacts was then converted into a weighted list and fed into the Louvain algorithm (https://sourceforge.net/projects/louvain/) for 100 iterations of network resolution. Contigs that clustered together in all 100 iterations were placed in the same consensus cluster (CC).
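The step from contacts to consensus clusters can be sketched in Python as below. Two assumptions are made: edge weights are raw counts of read pairs linking two contigs (the text says only that the list was "weighted"), and each Louvain run is supplied as a contig-to-label mapping. Such runs could come, for example, from NetworkX's `louvain_communities` called with different `seed` values, although HAM-ART itself uses the Louvain implementation linked above. Grouping contigs by their tuple of labels across runs yields exactly the sets of contigs that share a cluster in every run.

```python
from collections import Counter, defaultdict


def weighted_edges(contacts):
    """Collapse per-read-pair contig contacts into an undirected weighted edge list.

    contacts: iterable of (contig_a, contig_b), one per aligned Hi-C read pair.
    Pairs on the same contig carry no linking information and are skipped.
    """
    weights = Counter()
    for a, b in contacts:
        if a != b:
            weights[tuple(sorted((a, b)))] += 1
    return dict(weights)


def consensus_clusters(runs):
    """Group contigs that share a cluster in every run.

    runs: list of dicts mapping contig -> cluster label, one per Louvain run,
    all covering the same contigs. Labels need not match across runs.
    """
    groups = defaultdict(list)
    for contig in sorted(runs[0]):
        signature = tuple(run[contig] for run in runs)  # labels across all runs
        groups[signature].append(contig)
    return sorted(groups.values())
```

Because only co-membership matters, the label values themselves are irrelevant, so runs from different random seeds can be compared without relabelling.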

Because this network resolution method assigns each contig to only one cluster, it can have two unwanted consequences. First, contigs from two or more closely related species may be assigned to the same CC due to sequence homology. Such mixed CCs are first separated by a coverage distribution-based algorithm that splits a CC if its distribution of sequencing coverage is clearly multimodal. This step aims to extract from mixed CCs the core contigs (the longest contig groups with the least variability in coverage), which potentially represent most of the genomic DNA. It can easily exclude shorter contigs with higher coverage (e.g., multi-copy plasmids) from the CC, but this does not affect the final assembly content, because the subsequent iterative extension step re-associates potentially connected accessory fragments with the CC. Second, shared contigs may not be assigned to all the CCs that should contain copies of them (e.g., a plasmid carried by two or more species as a result of horizontal gene transfer (HGT)). An iterative CC extension step was therefore built into the pipeline at this point to extend clusters based on Hi-C inter-contig contacts and to cautiously identify contigs that should be allocated to multiple CCs. The detailed workflow of the iterative extension step is summarised in S2 Fig.
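The paper does not specify how multimodality is detected, so the sketch below substitutes a deliberately simple rule and should not be read as HAM-ART's algorithm. Contigs are sorted by coverage, a new mode starts wherever coverage jumps by more than a hypothetical `max_log2_gap` (twofold by default), and the mode carrying the most sequence is kept as the core. The coverage-variability criterion mentioned above is not modelled.

```python
import math


def split_by_coverage(contigs, max_log2_gap=1.0):
    """Split a consensus cluster into coverage modes and return (core, rest).

    contigs: list of (name, length_bp, mean_coverage) with coverage > 0.
    max_log2_gap: hypothetical threshold; a jump larger than this between
    neighbouring contigs in coverage order starts a new mode (1.0 = twofold).
    """
    if not contigs:
        return [], []
    ordered = sorted(contigs, key=lambda c: c[2])
    modes, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if math.log2(cur[2] / prev[2]) > max_log2_gap:
            modes.append(current)
            current = []
        current.append(cur)
    modes.append(current)
    # Core = the mode carrying the most sequence; the rest (e.g. multi-copy
    # plasmids) is set aside for the later Hi-C-based extension step.
    core = max(modes, key=lambda mode: sum(length for _, length, _ in mode))
    rest = [c for mode in modes if mode is not core for c in mode]
    return core, rest
```

A gap rule like this is easy to reason about but sensitive to outlier contigs; a mixture-model fit on log coverage would be a more robust alternative if the same idea were reimplemented.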

Final MAGs were annotated for AMR genes using BLAST [32] against the ResFinder database [33] and taxonomically profiled with GTDB-Tk [26]. AMR genes were also identified directly from the raw metagenomic reads using ARIBA [34] and compared with the AMR gene associations of the MAGs, to identify any AMR genes missing from the final MAGs.

A further clade refinement step in the pipeline exploits the availability of data from multiple samples of the same type (e.g., the other faeces samples from the same study).

Bioinformatics pipeline–clade refinement

This part of the pipeline undertakes a new assembly iteration, using MAGs from the previous assembly attempt as reference genomes. The main steps of this process were: (i) performing pairwise sequence comparisons between all MAGs (from all samples) using MASH [35,36]; (ii) applying the UPGMA (unweighted pair group method with arithmetic mean) algorithm to the pairwise distances to form clades of closely related MAGs (UPGMA distance threshold for clade definition: 0.12); (iii) selecting an exemplar MAG in each clade to serve as a within-clade reference sequence; (iv) using the exemplar to extract highly similar contigs (by BLAST [32]) from the full de novo contig collection of each of the other samples; (v) using Hi-C contact data to refine the contigs extracted by reference search, excluding contigs with no Hi-C contact to other contigs within the MAG; (vi) using Hi-C contacts to extend MAGs with AMR gene-containing contigs; and (vii) performing a final extension of the MAGs (with the same method as used in post-processing).

We found that the most crucial step of the refinement was selecting as clade exemplar the MAG that potentially had the most complete genome with minimal contamination. In several attempts using physical parameters (e.g., the largest MAG, the MAG of median size, or the MAG with the most unimodal coverage distribution), mixed MAGs (mixtures of more than one closely related genome) were frequently selected as exemplars. Therefore, instead of using these parameters alone or in combination, we used the core single-copy gene set of GTDB-Tk [26], running the toolkit's "identify" module and looking for: (i) the MAG with the highest number of unique single-copy genes (maximum completeness); and (ii) the MAG with the highest ratio of unique to multiple single-copy genes (minimum contamination).
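As an illustration of step (ii), the Python sketch below runs UPGMA on a precomputed distance matrix (for example, Mash distances) and stops merging once the closest pair of clusters is further apart than the threshold, so each remaining cluster is a clade. The text does not state whether the 0.12 threshold applies to the average inter-cluster distance, as assumed here, or to the tree height (half that distance).

```python
def upgma_clades(names, matrix, threshold=0.12):
    """Group MAGs into clades by UPGMA, stopping once clusters are further apart than threshold.

    names:  MAG identifiers.
    matrix: symmetric pairwise distances (e.g. Mash distances), matrix[i][j].
    """
    clusters = {i: [name] for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while dist:
        pair, best = min(dist.items(), key=lambda item: item[1])
        if best > threshold:
            break  # UPGMA merge distances never decrease, so no later merge qualifies
        i, j = tuple(pair)
        size_i, size_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        del dist[pair]
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            # UPGMA: size-weighted average of the two old distances
            dist[frozenset((next_id, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())
```

This quadratic-memory version is fine for a few hundred MAGs; for larger sets, an optimised average-linkage routine (such as SciPy's hierarchical clustering) cut at the same distance would be the practical choice.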

Bioinformatics pipeline—analysis of MAG sequence data

A set of custom scripts was written to perform AMR gene searches, undertake taxonomic identification, identify closely related reference genomes, and generate pairwise distance trees for the clades. AMR gene searches were performed with a local installation of BLAST [32] against the ResFinder database [33]. An AMR gene was defined as present when >60% of the length of the target gene was aligned with >80% identity.

For the grouped AMR gene analysis: (i) aminoglycoside-modifying enzymes were grouped by the modifying group they attach (aminoglycoside nucleotidyltransferases were grouped together, as were phosphotransferases, acetyltransferases and adenylyltransferases); (ii) because of increasing interest in the role of extended-spectrum beta-lactamases (ESBLs) in disease, beta-lactamases were grouped by homology; (iii) gene families represented by different alleles were considered one gene type; (iv) the dihydrofolate reductase genes dfrA12 and dfrA14 were grouped as dfrA; (v) nitroimidazole resistance (nitroimidazole reductase) genes were grouped together as nim; (vi) the sulfonamide resistance genes sul1 to sul3 were grouped as sul; (vii) tetracycline resistance genes were grouped by function and sequence homology, with homologous genes combined; and (viii) the vancomycin resistance clusters vanGXY and vanG2XY were considered one group.

Taxonomic identification and the search for closely related reference genomes were performed using GTDB-Tk [26]. Pairwise distances between clade member MAGs and other genomes were determined using MASH [35,36], and distance trees were generated with the UPGMA algorithm. The Newick-formatted tree files were annotated using iTOL [37]. Summary text files with taxonomic identifications and AMR gene associations were created automatically for all clade members. A summarised output of all MAGs and their AMR gene associations was generated, together with a filtered version from which incomplete MAGs (as filtered by the default settings of GTDB-Tk [26]) were excluded.
For a detailed workflow of the bioinformatics pipeline, see S3 Fig.
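The presence criterion and a few of the grouping rules described above can be sketched in Python as follows. The BLAST+ column layout (`-outfmt "6 qseqid sseqid pident length slen sstart send"`, with ResFinder genes as subjects) is an assumption, as is the use of bare gene names; ResFinder identifiers carry allele and accession suffixes that would need to be stripped first. Only the sul, dfrA, nim and vanG rules are encoded.

```python
import re

MIN_GENE_COVERAGE = 0.60  # >60% of the target gene length aligned
MIN_IDENTITY = 80.0       # >80% identity


def amr_hit_passes(line: str) -> bool:
    """Apply the presence criterion to one BLAST+ tabular line.

    Assumed format: -outfmt "6 qseqid sseqid pident length slen sstart send",
    with ResFinder genes as the subject sequences.
    """
    _, _, pident, _, slen, sstart, send = line.rstrip("\n").split("\t")
    covered = abs(int(send) - int(sstart)) + 1  # gene bases spanned by the hit
    return covered / int(slen) > MIN_GENE_COVERAGE and float(pident) > MIN_IDENTITY


def gene_group(gene: str) -> str:
    """Map a bare gene name to its analysis group (a subset of the rules above)."""
    if re.fullmatch(r"sul[1-3]", gene):
        return "sul"
    if gene in ("dfrA12", "dfrA14"):
        return "dfrA"
    if re.fullmatch(r"nim[A-Z]", gene):
        return "nim"
    if gene in ("vanGXY", "vanG2XY"):
        return "vanGXY"
    return gene
```

Using `abs(send - sstart)` keeps the coverage calculation correct for hits on the reverse strand, where BLAST+ reports `sstart` greater than `send`.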

Bioinformatics pipeline—quality control

We created custom scripts to extract quality information from almost every step of the pipeline: (i) ratio of duplicated raw reads (detecting low-concentration libraries); (ii) ratio of merged raw read pairs (verifying library insert sizes); (iii) ratio of merged Hi-C reads without a detectable ligation site (pointing to problems with Hi-C library preparation); (iv) number of contigs in the de novo assembly; (v) Hi-C read alignment ratio; (vi) number and ratio of informative Hi-C read pairs (a Hi-C read pair is informative if it connects two different contigs); (vii) average modularity during the Louvain network resolution; (viii) single-copy gene ratios during clade refinement; and (ix) final CheckM-like MAG parameters analysed by GTDB-Tk [26] (MAG size, contig number, N50, average coverage, GC content, taxonomy, completeness). We also performed traditional metagenomic assembly on a set of 11 randomly selected samples using the MetaWRAP pipeline (default threshold settings, metaSPAdes assembler) [38] and compared the results with the HAM-ART output. HAM-ART generally produced more final MAGs (mean number of final MAGs: 29.5 with MetaWRAP vs 50.1 with HAM-ART) but, owing to a few potentially lower-quality Hi-C libraries, the HAM-ART final sets varied more (standard deviation of the mean: 13.9 with MetaWRAP vs 41.2 with HAM-ART). Quality control information for the 100 shotgun metagenomic and 20 Hi-C sequencing libraries is listed in S3 Table. An additional quality control step was performed on all Hi-C sequencing libraries using the qc3C tool (PMID: 34634030); results from 1 million observations per library are listed in S4 Table.
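Two of the ratios listed above, the Hi-C read alignment ratio (v) and the informative pair ratio (vi), can be computed from per-pair alignment targets as in the Python sketch below. The input shape and the use of all read pairs as the denominator are assumptions; the pipeline's own scripts may define these ratios differently.

```python
def hic_pair_metrics(pairs):
    """Compute alignment and informative-pair ratios for Hi-C read pairs.

    pairs: iterable of (contig_r1, contig_r2); None marks an unaligned mate.
    Assumptions: a pair counts as aligned only if both mates align, and both
    ratios use all read pairs as the denominator.
    """
    total = aligned = informative = 0
    for c1, c2 in pairs:
        total += 1
        if c1 is None or c2 is None:
            continue
        aligned += 1
        if c1 != c2:  # informative: the pair links two different contigs
            informative += 1
    if total == 0:
        return {"pairs": 0, "alignment_ratio": 0.0, "informative": 0, "informative_ratio": 0.0}
    return {
        "pairs": total,
        "alignment_ratio": aligned / total,
        "informative": informative,
        "informative_ratio": informative / total,
    }
```

A low informative ratio alongside a normal alignment ratio would point to weak proximity ligation rather than to a poor assembly.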

Statistical analysis

Simple linear regressions were performed in R (ggplot2 and ggpmisc packages). Spearman's rank correlation was used to determine the correlation coefficient and P value.
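For readers reproducing the correlation outside R, Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The Python sketch below computes only the coefficient. The P value reported in the paper would come from the R analysis (for example, `cor.test(x, y, method = "spearman")`), and the text does not say which function was used.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Computing rho via ranked Pearson correlation, rather than the shortcut formula based on squared rank differences, keeps the result exact when ties are present.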

Code availability

The code for the HAM-ART pipeline was implemented in the Perl programming language. The code and instructions are publicly available under the BSD-3-Clause License at https://github.com/lkalmar/HAM-ART.

Supporting information

S1 Table. Characteristics of the farms used in the study.

The conventional (high-antimicrobial use) farms are labelled CV_1 to CV_5 and the organic (low-antimicrobial use) farms are labelled OG_1 to OG_5.

(PDF)

S2 Table. Complete list of MAGs and their AMR gene associations from 100 pig faecal samples.

Columns in the tab-separated table are: farm and sample identification (e.g., CV3_2 stands for sample 2 from conventional farm 3); farm type (organic/conventional); clade identifier; assembly size (in kilobases); number of contigs in the MAG; N50 of the MAG; weighted mean coverage of the contigs; GC content of the MAG; GTDB-Tk taxonomy string; and the percentage of the GTDB-Tk multiple sequence alignment spanned by the genome. The remaining columns indicate the presence (1) or absence (0) of each AMR gene within the MAG.

(TSV)
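A minimal Python sketch for tallying S2 Table by farm type is shown below. It relies on the column order described above (ten descriptive columns followed by one presence/absence column per AMR gene); the presence of a header row carrying the gene names and the exact farm-type strings are assumptions about the file.

```python
import csv
from collections import defaultdict

N_META_COLUMNS = 10  # descriptive columns before the AMR gene columns


def amr_counts_by_farm_type(lines):
    """Count, per farm type, how many MAGs carry each AMR gene.

    lines: any iterable of tab-separated text lines (e.g. an open file).
    Assumes the first line is a header whose trailing fields are gene names.
    """
    reader = csv.reader(lines, delimiter="\t")
    genes = next(reader)[N_META_COLUMNS:]
    counts = defaultdict(lambda: defaultdict(int))
    for row in reader:
        farm_type = row[1]  # second column: organic / conventional
        for gene, flag in zip(genes, row[N_META_COLUMNS:]):
            if flag.strip() == "1":
                counts[farm_type][gene] += 1
    return {farm_type: dict(per_gene) for farm_type, per_gene in counts.items()}
```

With a local copy of the table, the function can be called on the open file handle (opened with `newline=""`, as the `csv` module recommends).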

S3 Table. Basic QC information on all the libraries used in this study.

Empty cells and repeated values indicate pooled Hi-C libraries.

(XLSX)

S4 Table. Quality control information on all Hi-C sequencing libraries generated by the qc3C software package.

(XLSX)

S1 Fig. Composition of the microbiota on the studied pig farms in domain, phylum, class, order and family levels.

(EPS)

S2 Fig. Detailed workflow of the iterative CC and Assembly extension step.

Parameter thresholds are continuously adjusted based on the contig composition of the CC / Assembly.

(TIFF)

S3 Fig. Detailed bioinformatics pipeline workflow, separated into pre-assembly, post-assembly and clade refinement stages.

Text is coloured black for descriptions and white for the used software / script background.

(PDF)

Data Availability

The authors confirm that all data underlying the findings are fully available without restriction. Chromosome conformation capture and metagenome sequencing data have been deposited in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) and are available via study accession number PRJEB48382. The complete collection of assembled MAGs is available from the open access Apollo data store at the University of Cambridge (https://doi.org/10.17863/CAM.80312). Underlying numerical data for all graphs and summary statistics are available in the repositories above or in Supporting Information. The code for the HAM-ART pipeline was implemented in the Perl programming language. The code and instructions are publicly available under the BSD-3-Clause License at https://github.com/lkalmar/HAM-ART.

Funding Statement

This work was funded by the UK Medical Research Council grant MR/N002660/1 to MAH, AJG, MPS, DJM, JLNW, OR. https://mrc.ukri.org The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Stokes HW, Gillings MR. Gene flow, mobile genetic elements and the recruitment of antibiotic resistance genes into Gram-negative pathogens. FEMS microbiology reviews. 2011;35(5):790–819. Epub 2011/04/27. doi: 10.1111/j.1574-6976.2011.00273.x.
2. Baharoglu Z, Garriss G, Mazel D. Multiple Pathways of Genome Plasticity Leading to Development of Antibiotic Resistance. Antibiotics (Basel). 2013;2(2):288–315. Epub 2013/01/01. doi: 10.3390/antibiotics2020288; PubMed Central PMCID: PMC4790341.
3. Wright GD. The antibiotic resistome: the nexus of chemical and genetic diversity. Nature reviews Microbiology. 2007;5(3):175–86. Epub 2007/02/06. doi: 10.1038/nrmicro1614.
4. Faure S, Perrin-Guyomard A, Delmas JM, Chatre P, Laurentie M. Transfer of plasmid-mediated CTX-M-9 from Salmonella enterica serotype Virchow to Enterobacteriaceae in human flora-associated rats treated with cefixime. Antimicrobial agents and chemotherapy. 2010;54(1):164–9. Epub 2009/11/11. doi: 10.1128/AAC.00310-09; PubMed Central PMCID: PMC2798505.
5. Guan J, Liu S, Lin Z, Li W, Liu X, Chen D. Severe sepsis facilitates intestinal colonization by extended-spectrum-beta-lactamase-producing Klebsiella pneumoniae and transfer of the SHV-18 resistance gene to Escherichia coli during antimicrobial treatment. Antimicrobial agents and chemotherapy. 2014;58(2):1039–46. Epub 2013/11/28. doi: 10.1128/AAC.01632-13; PubMed Central PMCID: PMC3910833.
6. Dheilly A, Le Devendec L, Mourand G, Bouder A, Jouy E, Kempf I. Resistance gene transfer during treatments for experimental avian colibacillosis. Antimicrobial agents and chemotherapy. 2012;56(1):189–96. Epub 2011/10/12. doi: 10.1128/AAC.05617-11; PubMed Central PMCID: PMC3256041.
7. Doi Y, Adams-Haduch JM, Peleg AY, D’Agata EM. The role of horizontal gene transfer in the dissemination of extended-spectrum beta-lactamase-producing Escherichia coli and Klebsiella pneumoniae isolates in an endemic setting. Diagnostic microbiology and infectious disease. 2012;74(1):34–8. Epub 2012/06/23. doi: 10.1016/j.diagmicrobio.2012.05.020; PubMed Central PMCID: PMC3427399.
8. Stecher B, Denzler R, Maier L, Bernet F, Sanders MJ, Pickard DJ, et al. Gut inflammation can boost horizontal gene transfer between pathogenic and commensal Enterobacteriaceae. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(4):1269–74. Epub 2012/01/11. doi: 10.1073/pnas.1113246109; PubMed Central PMCID: PMC3268327.
9. Allen HK, Looft T, Bayles DO, Humphrey S, Levine UY, Alt D, et al. Antibiotics in feed induce prophages in swine fecal microbiomes. mBio. 2011;2(6). Epub 2011/12/01. doi: 10.1128/mBio.00260-11; PubMed Central PMCID: PMC3225969.
10. Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ. 2014;2:e415. Epub 2014/06/12. doi: 10.7717/peerj.415; PubMed Central PMCID: PMC4045339.
11. Burton JN, Liachko I, Dunham MJ, Shendure J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3. 2014;4(7):1339–46. Epub 2014/05/24. doi: 10.1534/g3.114.011825.
12. Marbouty M, Cournac A, Flot JF, Marie-Nelly H, Mozziconacci J, Koszul R. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms. Elife. 2014;3:e03318. Epub 2014/12/18. doi: 10.7554/eLife.03318; PubMed Central PMCID: PMC4381813.
13. Lamendella R, Domingo JW, Ghosh S, Martinson J, Oerther DB. Comparative fecal metagenomics unveils unique functional capacity of the swine gut. BMC Microbiol. 2011;11:103. Epub 2011/05/18. doi: 10.1186/1471-2180-11-103; PubMed Central PMCID: PMC3123192.
14. Looft T, Johnson TA, Allen HK, Bayles DO, Alt DP, Stedtfeld RD, et al. In-feed antibiotic effects on the swine intestinal microbiome. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(5):1691–6. Epub 2012/02/07. doi: 10.1073/pnas.1120238109; PubMed Central PMCID: PMC3277147.
15. Xiao Y, Kong F, Xiang Y, Zhou W, Wang J, Yang H, et al. Comparative biogeography of the gut microbiome between Jinhua and Landrace pigs. Sci Rep. 2018;8(1):5985. Epub 2018/04/15. doi: 10.1038/s41598-018-24289-z; PubMed Central PMCID: PMC5899086.
16. Gresse R, Chaucheyras Durand F, Duniere L, Blanquet-Diot S, Forano E. Microbiota Composition and Functional Profiling Throughout the Gastrointestinal Tract of Commercial Weaning Piglets. Microorganisms. 2019;7(9). Epub 2019/09/25. doi: 10.3390/microorganisms7090343; PubMed Central PMCID: PMC6780805.
17. Le Sciellour M, Renaudeau D, Zemb O. Longitudinal Analysis of the Microbiota Composition and Enterotypes of Pigs from Post-Weaning to Finishing. Microorganisms. 2019;7(12). Epub 2019/12/05. doi: 10.3390/microorganisms7120622.
18. Baudry L, Foutel-Rodier T, Thierry A, Koszul R, Marbouty M. MetaTOR: A Computational Pipeline to Recover High-Quality Metagenomic Bins From Mammalian Gut Proximity-Ligation (meta3C) Libraries. Front Genet. 2019;10:753. Epub 2019/09/05. doi: 10.3389/fgene.2019.00753; PubMed Central PMCID: PMC6710406.
19. Bickhart DM, Watson M, Koren S, Panke-Buisse K, Cersosimo LM, Press MO, et al. Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation. Genome Biol. 2019;20(1):153. Epub 2019/08/04. doi: 10.1186/s13059-019-1760-x; PubMed Central PMCID: PMC6676630.
20. Kent AG, Vill AC, Shi Q, Satlin MJ, Brito IL. Widespread transfer of mobile antibiotic resistance genes within individual gut microbiomes revealed through bacterial Hi-C. Nat Commun. 2020;11(1):4379. Epub 2020/09/03. doi: 10.1038/s41467-020-18164-7; PubMed Central PMCID: PMC7463002.
21. Yaffe E, Relman DA. Tracking microbial evolution in the human gut using Hi-C reveals extensive horizontal gene transfer, persistence and adaptation. Nat Microbiol. 2020;5(2):343–53. Epub 2019/12/25. doi: 10.1038/s41564-019-0625-0; PubMed Central PMCID: PMC6992475.
22. Marbouty M, Baudry L, Cournac A, Koszul R. Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay. Sci Adv. 2017;3(2):e1602105. Epub 2017/02/25. doi: 10.1126/sciadv.1602105; PubMed Central PMCID: PMC5315449.
23. Stalder T, Press MO, Sullivan S, Liachko I, Top EM. Linking the resistome and plasmidome to the microbiome. ISME J. 2019;13(10):2437–46. Epub 2019/05/31. doi: 10.1038/s41396-019-0446-4; PubMed Central PMCID: PMC6776055.
24. Cameron A, McAllister TA. Antimicrobial usage and resistance in beef production. J Anim Sci Biotechnol. 2016;7:68. Epub 2016/12/22. doi: 10.1186/s40104-016-0127-3; PubMed Central PMCID: PMC5154118.
25. Schmidt VM, Pinchbeck G, McIntyre KM, Nuttall T, McEwan N, Dawson S, et al. Routine antibiotic therapy in dogs increases the detection of antimicrobial-resistant faecal Escherichia coli. J Antimicrob Chemother. 2018;73(12):3305–16. Epub 2018/09/15. doi: 10.1093/jac/dky352.
26. Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019. Epub 2019/11/16. doi: 10.1093/bioinformatics/btz848.
27. Ikeda S, Kaneko T, Okubo T, Rallos LE, Eda S, Mitsui H, et al. Development of a bacterial cell enrichment method and its application to the community analysis in soybean stems. Microb Ecol. 2009;58(4):703–14. Epub 2009/08/08. doi: 10.1007/s00248-009-9566-0.
28. van Berkum NL, Lieberman-Aiden E, Williams L, Imakaev M, Gnirke A, Mirny LA, et al. Hi-C: a method to study the three-dimensional architecture of genomes. Journal of visualized experiments: JoVE. 2010;(39). Epub 2010/05/13. doi: 10.3791/1869; PubMed Central PMCID: PMC3149993.
29. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature methods. 2012;9(4):357–9. Epub 2012/03/06. doi: 10.1038/nmeth.1923; PubMed Central PMCID: PMC3322381.
30. Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–63. Epub 2011/09/10. doi: 10.1093/bioinformatics/btr507; PubMed Central PMCID: PMC3198573.
31. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27(5):824–34. Epub 2017/03/17. doi: 10.1101/gr.213959.116; PubMed Central PMCID: PMC5411777.
32. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC bioinformatics. 2009;10:421. Epub 2009/12/17. doi: 10.1186/1471-2105-10-421; PubMed Central PMCID: PMC2803857.
33. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother. 2012;67(11):2640–4. Epub 2012/07/12. doi: 10.1093/jac/dks261; PubMed Central PMCID: PMC3468078.
34. Hunt M, Mather AE, Sanchez-Buso L, Page AJ, Parkhill J, Keane JA, et al. ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. Microb Genom. 2017;3(10):e000131. Epub 2017/11/28. doi: 10.1099/mgen.0.000131; PubMed Central PMCID: PMC5695208.
35. Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, et al. Mash Screen: high-throughput sequence containment estimation for genome discovery. Genome Biol. 2019;20(1):232. Epub 2019/11/07. doi: 10.1186/s13059-019-1841-x; PubMed Central PMCID: PMC6833257.
36. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17(1):132. Epub 2016/06/22. doi: 10.1186/s13059-016-0997-x; PubMed Central PMCID: PMC4915045.
37. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 2019;47(W1):W256–W9. Epub 2019/04/02. doi: 10.1093/nar/gkz239; PubMed Central PMCID: PMC6602468.
38. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6(1):158. Epub 2018/09/17. doi: 10.1186/s40168-018-0541-1; PubMed Central PMCID: PMC6138922.

Decision Letter 0

Xavier Didelot, Chengqi YI

13 Sep 2021

Dear Dr Holmes,

Thank you very much for submitting your Research Article entitled 'HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities.' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Chengqi YI

Section Editor: Methods

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Title: HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities.

Summary:

This is a particularly well prepared and easy to read manuscript. As a consequence, I have zero minor errors to report. As the authors have discussed, their application of Hi-C to generate MAGs and link AMR-containing sequences is not novel per se. However, their approach to Hi-C MAG refinement that leverages external information sources, such as GTDB-Tk-determined single-copy genes, is innovative. At the risk of sounding pompous, I would like to commend them for their determination in pursuing the best possible outcome from their data.

That said, there is sufficient new ground being trodden within the work that much more explicit methodological detail is required for each stage (from determining consensus clusters to splitting and refining). Further, and although I realise this is a tall order, this reviewer would recommend that a section comprising in-silico validation (possibly against a simulated ground truth) be added. Lastly, I did not find any mention of a code repository for this pipeline, which I would regard as mandatory even at the stage of initial submission. This would have been of great assistance, and access to the code would remove part of the burden of describing the method. Altogether, this leaves me with concerns for reproducibility.

Comment:

page 17 line 218: The authors may have found that a read-based approach, which targeted resistance genes, would have had greater sensitivity. This would then modify the following discussion contrasting OG/CV presence/absence. Granted, this would not enable any subsequent gene-context analysis. Note: I see this is briefly addressed at line 370.

page 23 line 313: Although this statement is not in hot dispute for this reviewer, I would have liked there to be a wider consideration of available exogenous variables -- that is to say a more complex regression model. I see that there are some words to this effect further down (line 327) mentioning limitations.

page 24 line 341: Although it provides a taxonomic hint as a side-effect of its validation analysis, the authors of CheckM would probably be the first to say that it was never intended for taxonomic assignment. I would consider revising this sentence.

page 31 line 498: Did the authors consider, or trial, the use of an existing Hi-C metagenomic binning tool? Modularity-based clustering algorithms (such as Louvain) suffer from a resolution limit, which may have impacted the authors' capacity to extract less abundant genomes.

page 31 line 507: This is potentially of interest to tool developers within the Hi-C metagenomics community. As such, more detail is required to describe this method. In particular, how do the authors reconcile the variation in coverage that will be seen between core and accessory fragments when splitting?

page 31 line 512: As with my previous comment, this step requires greater detail.

page 32 line 526: The detailed description found in this section, along with the brief descriptions of previous steps, highlights the novelty of the authors' approach to refining MAGs. Taken together, from splitting through to this step, there are a number of methodological points at which things could have gone awry. As such, algorithm validation against a ground truth would have been greatly appreciated by this reviewer.

Reviewer #2: The manuscript by Kalmar et al. entitled “HAM-ART: An optimized culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities” describes a laboratory and a computational pipeline using Hi-C data applied to metagenomic samples. The application of proximity ligation-based methods (3C, Hi-C) to environmental samples is a relatively new field with high potential, notably for studying mobile genetic elements in microbial communities. In recent years, several methods and pipelines have been published in this field, demonstrating the increasing interest in this technology. The present manuscript takes advantage of Hi-C data to study AMR genes and their associated MAGs in a longitudinal cohort of pigs. Not surprisingly, the authors found that higher antibiotic use is correlated with a greater presence of AMR genes. They also develop and propose a new pipeline to analyze Hi-C data derived from metagenomic samples.

Even if the field will benefit from new pipelines, the manuscript suffers from various problems that will need to be addressed before publication.

Major comments:

1 - First, the authors state in the abstract that their pipeline is also a laboratory one, but I cannot determine what is novel in the library preparation compared with previous protocols. Moreover, there are no data showing that their Hi-C data are of good quality in terms of 3D signal. Several tools and calculations exist to check the quality of Hi-C data (3D ratio, qc3C), and it would be of great importance to know the quality of their data and to compare them with previously published datasets. Moreover, the authors say that they have implemented several quality controls in their computational pipeline (line 581), but they do not provide them for their libraries.

2- The authors briefly describe a new computational pipeline, but its description and content will need modification, more detail, and comparison with available pipelines. First, the authors apply an iterative partitioning procedure using the Louvain algorithm, a procedure and software already used in previous publications and pipelines. They then keep groups of contigs that cluster together during 100 iterations of the algorithm and call them Consensus Clusters (CCs), a term very close to the Core Communities (CCs) used in a previous publication and in an available pipeline, MetaTOR (Marbouty et al., 2017; Baudry et al., 2019).

Line 507: “The separation of mixed CCs is first addressed using a coverage distribution-based separation algorithm for each CC which splits the CC if the distribution of sequencing coverage was clearly multimodal.” The authors must provide the algorithm used at this step.

Line 512: “An iterative CC extension step was built into the pipeline at this point to extend clusters based on the Hi-C inter-contig contacts and cautiously identify contigs that should be allocated to multiple CCs.” As above, details are needed here.

If the authors want to publish a new pipeline, as they claim [line 305: “HAM-ART is the first method that is designed to cope with large sample numbers, using the most common Illumina based sequencing platform and delivering results from affordable amounts of sequencing depth.”], they must provide their code to the community and benchmark their approach against already published pipelines (bin3C, MetaTOR, HiCzin …) and other datasets.
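The Consensus Cluster idea described in comment 2 (keeping contigs that fall in the same Louvain community in every one of 100 runs) can be made concrete with a minimal sketch. It assumes each run's output is available as a mapping from contig ID to community label; the function name `consensus_clusters` and this input format are hypothetical, the Louvain runs themselves are not performed here, and this is not the HAM-ART code (which the Data Availability Statement says is written in Perl).

```python
from collections import defaultdict


def consensus_clusters(partitions):
    """Return groups of contigs that were co-clustered in every partition.

    Hypothetical helper. partitions: list of dicts, one per clustering run
    (e.g. one per Louvain run), each mapping a contig ID to the community
    label it received in that run. Labels only need to be consistent
    within a single run.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    contigs = set(partitions[0])
    for run in partitions[1:]:
        if set(run) != contigs:
            raise ValueError("all partitions must label the same contigs")

    groups = defaultdict(list)
    for contig in sorted(contigs):
        # A contig's signature is its tuple of labels across all runs. Two
        # contigs share a signature only if they were in the same community
        # in every single run, i.e. they belong to one consensus cluster.
        signature = tuple(run[contig] for run in partitions)
        groups[signature].append(contig)
    return sorted(groups.values())


if __name__ == "__main__":
    runs = [
        {"c1": 0, "c2": 0, "c3": 1, "c4": 1},
        {"c1": 5, "c2": 5, "c3": 5, "c4": 7},
        {"c1": 2, "c2": 2, "c3": 3, "c4": 3},
    ]
    print(consensus_clusters(runs))  # [['c1', 'c2'], ['c3'], ['c4']]
```

In this toy input, contigs c3 and c4 share a community in two of the three runs but not in the second, so they end up in separate consensus clusters; requiring agreement in every run is what makes the resulting groups conservative.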

3- The paper will be easier to understand if the authors provide a summary table with the libraries constructed, assemblies generated, binning statistics … for each sample. Moreover, it would be of great interest for the community to have access to the different assemblies so that they can reproduce some of the outputs.

4- The analysis of the lnu(A) gene is interesting but lacks several controls and needs further analysis. Are the contigs detected as plasmids by state-of-the-art tools such as PlasFlow or PlasmidFinder? Can the authors check the topology of the corresponding contigs using their Hi-C data? If the contigs are circular, they should be able to detect the 3D signal in the corresponding contact matrices.

Same remark concerning the E. coli analysis and the possible plasmid.

Minor Comments:

- Line 102: I would replace “assembles” with “bins”, as the pipeline does not assemble the genomes but groups contigs together.

- In general, there are some problems with the terms “assembled”, “scaffolded” and “binned”, and it is difficult to understand, for instance in the E. coli example, the limits of the method. Is it a problem of assembly (shotgun sequencing)? Or a problem of binning (not enough Hi-C reads)?

- MLST is not defined.

- I have trouble understanding exactly how the different assemblies were performed. Were the Hi-C and shotgun libraries pooled to perform them, or were only the shotgun libraries used?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: No: the data are not yet accessible, and I would recommend also submitting the different assemblies generated. Code is also mandatory.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Matthew Z. DeMaere

Reviewer #2: Yes: Martial Marbouty

Decision Letter 1

Xavier Didelot, Chengqi YI

13 Jan 2022

Dear Dr Holmes,

Thank you very much for submitting your Research Article entitled 'HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities.' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated your efforts to account for their previous comments in this revised manuscript. They agreed that your manuscript has improved significantly. One of the reviewers requested a few more minor changes, which we would like you to address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Chengqi YI

Section Editor: Methods

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This reviewer agrees with the authors regarding the ongoing difficulty in properly validating new methods in Hi-C metagenomics, and I accept that they have tried to address this by sequencing targeted isolates.

Besides this frustration with validation, to which I assign ~no~ blame to the authors, I am satisfied with how the authors have addressed my remarks and appreciate their care in doing so.

Reviewer #2: The revised manuscript by Kalmar et al. is better than the previous version. Code and data are now available and will help scientists reproduce their results. Details of the computational pipeline have also been added, and the approaches appear promising. The manuscript still needs minor modifications and clarifications:

1- The authors have provided important statistics about their Hi-C libraries but never discuss them. This could be of great importance, as their Hi-C libraries do not seem to encode more 3D signal than a classical 3C library. Is there a real gain in performing Hi-C, which is more expensive than 3C?

2- I would really appreciate it if the authors submitted their assemblies to NCBI or other databases, as this will allow other scientists to reproduce their results without the time-consuming assembly step. It is really easy to do and will help the whole community.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Matthew Z. DeMaere

Reviewer #2: Yes: Martial Marbouty

Decision Letter 2

Xavier Didelot, Chengqi YI

7 Feb 2022

Dear Dr Holmes,

We are pleased to inform you that your manuscript entitled "HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities." has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Xavier Didelot

Associate Editor

PLOS Genetics

Chengqi YI

Section Editor: Methods

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-00617R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Xavier Didelot, Chengqi YI

9 Mar 2022

PGENETICS-D-21-00617R2

HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities.

Dear Dr Holmes,

We are pleased to inform you that your manuscript entitled "HAM-ART: An optimised culture-free Hi-C metagenomics pipeline for tracking antimicrobial resistance genes in complex microbial communities." has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Orsolya Voros

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data


    Supplementary Materials

    S1 Table. Characteristics of the farms used in the study.

    The conventional or high-antimicrobial use farms are labelled CV_1 to 5 and the organic, or low antimicrobial use farms are labelled OG_1 to 5.

    (PDF)

    S2 Table. Complete list of MAGs and their AMR gene associations from 100 pig faecal samples.

    Columns in the tab-separated table are: Farm and sample identification (e.g. CV3_2 stands for sample 2 from conventional farm 3); Type of the farm (organic / conventional); Clade identifier; Size of the assembly (in kilobases); Number of contigs in the MAG; N50 of the MAG; Weighted mean coverage of the contigs; GC content of the MAG; GTDB-tk taxonomy string; percentage of the multiple sequence alignment (by GTDB-tk) spanned by the genome; The rest of the columns indicate the presence (1) or absence (0) of the particular AMR gene within the MAG.

    (TSV)

    S3 Table. Basic QC information on all the libraries used in this study.

    Empty cells and repeated values indicate pooled Hi-C libraries.

    (XLSX)

    S4 Table. Quality control information on all Hi-C sequencing libraries generated by the qc3C software package.

    (XLSX)

    S1 Fig. Composition of the microbiota on the studied pig farms in domain, phylum, class, order and family levels.

    (EPS)

    S2 Fig. Detailed workflow of the iterative CC and Assembly extension step.

    Parameter thresholds are continuously adjusted based on the contig composition of the CC / Assembly.

    (TIFF)

    S3 Fig. Detailed bioinformatics pipeline workflow, separated into pre-assembly, post-assembly and clade refinement.

    Text is coloured black for descriptions and white for the used software / script background.

    (PDF)

    Attachment

    Submitted filename: ham_art_responses_final.docx

    Attachment

    Submitted filename: ham_art_responses_Jan22.docx

    Data Availability Statement

    The authors confirm that all data underlying the findings are fully available without restriction. Chromosome conformation capture and metagenome sequencing data have been deposited in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) and are available via study accession number PRJEB48382. The complete collection of assembled MAGs is available from the open access Apollo data store at the University of Cambridge (https://doi.org/10.17863/CAM.80312). Underlying numerical data for all graphs and summary statistics are available in the repositories above or in Supporting Information. The code for the HAM-ART pipeline was implemented in the Perl programming language. The code and instructions are publicly available under the BSD-3-Clause License at https://github.com/lkalmar/HAM-ART.

