Skip to main content
Scientific Reports logoLink to Scientific Reports
. 2024 Jul 30;14:17477. doi: 10.1038/s41598-024-67104-8

A novel integrated extraction protocol for multi-omic studies in heavily degraded samples

Byron Boggi 1, Jack D A Sharpen 1, George Taylor 2, Konstantina Drosou 1,3,
PMCID: PMC11289452  PMID: 39080329

Abstract

The combination of multi-omic techniques, such as genomics, transcriptomics, proteomics, metabolomics and epigenomics, has revolutionised studies in medical research. These techniques are employed to support biomarker discovery, better understand molecular pathways and identify novel drug targets. Despite concerted efforts in integrating omic datasets, there is an absence of protocols that integrate all four biomolecules in a single extraction process. Here, we demonstrate for the first time a minimally destructive integrated protocol for the simultaneous extraction of artificially degraded DNA, proteins, lipids and metabolites from pig brain samples. We used an MTBE-based approach to separate lipids and metabolites, followed by subsequent isolation of DNA and proteins. We have validated this protocol against standalone extraction protocols and show comparable or higher yields of all four biomolecules. This integrated protocol is key to facilitating the preservation of irreplaceable samples while promoting downstream analyses and successful data integration by removing bias from univariate dataset noise and varied distribution characteristics.

Keywords: Degraded DNA, Ancient DNA (aDNA), Integrated extraction protocol, Multi-omic protocol, Evolutionary biology

Subject terms: Biological techniques, Genetics, Environmental sciences


Over the last decade, multi-omic techniques have been in the forefront of biological sciences. The analytical power integrating multiple datasets has facilitated major advances in medicine, microbiology, modern human disease studies1,2 ecology and agriculture3. The purpose of multi-omic studies is to provide a deeper understanding of the interplay between multiple layers of biological regulation and in particular to support biomarker discovery, predict novel drug targets, increase diagnostic power for health and improve disease prognosis4. Despite the potential of machine learning technologies, multi-omic data integration faces many challenges such as data normalisation and compatibility, clustering, functional characterisation and visualisation which confound their analyses5,6. An added level of complexity is that multi-omics datasets are derived from blending data from individual workflows, different processing platforms, and intra-sample biomolecule preservation.

In heavily degraded remains, such as ancient, forensic, archival or even clinical Formalin-Fixed-Paraffin-Embedded (FFPE) samples, the situation is even more challenging. Sample heterogeneity, limited DNA quantity, fragmentation, degradation, contamination, post-mortem damage and chemical modifications can all pose challenges and introduce further bias in downstream analyses which complicates data integration7,8. For these reasons, multi-omic analyses and data integration in the ancient DNA (aDNA) field have been massively overlooked. To this day most research focusses either on genomics, or proteomics, and most often individually, rather than through a combined approach. A major drawback is that archaeological samples are rare and irreplaceable so there is a need to increase the amount of data that can be derived from a single biopsy. Integration on the workflow level, i.e., the extraction of two or more biomolecules from a single sample/biopsy, has been limited. The few examples where it has been attempted in the literature have largely concentrated on the combination of DNA and protein, for example from dental remains9 and dental calculus in particular10. Furthermore, in the former case, the efficacy of the combined protocol was not compared to similar, individual protocols, making it difficult to assess the effectiveness of the method. The latter study reported a nearly 50% decrease in endogenous DNA recovered when compared to a DNA-only protocol, likely due to the method requiring long incubation times (72 h) with no nuclease protection. Previous studies have shown that an optimised silica-based spin column can effectively co-extract DNA, RNA and protein11 and commercial kits have been available for a number of years (AllPrep DNA/RNA/Protein Mini Kit—Qiagen). Such kits could likely be optimised for aDNA and peptide recovery in a similar fashion to other proprietary chemistry12. One of the main practical challenges in co-extracting aDNA and peptides is the long incubation time during extraction and fragile nature of aDNA, whereby inhibition of nucleases is essential to preserve as much genomic content as possible. As with most DNA extraction protocols, proteinases are the gold standard. However, such treatments are not amenable to proteomic workflows for already degraded peptides in bottom-up workflows. Hence, inactivation of nucleases could be achieved with a strong denaturant, such as methanol, where loss of structure will effectively inhibit nuclease activity. Furthermore, such denaturants will lead to the precipitation of genomic and proteomic contents of the extracted sample, partially purify the sample and facilitate the recovery of polar and non-polar small molecules. As such, solvent mixtures (chlorofolm:methanol)13,14, have been adopted as the gold standard for lipids and metabolites for over 70 years and are widely used for the co-extraction of protein15.

Another popular biphasic organic extraction protocol for lipids and metabolites, adopts Methyl-tert-butyl-ether (MTBE) as a safer alternative to chloroform and allows faster and cleaner recovery of a broad range of lipid classes16. As with chloroform:methanol this protocol has been shown to effectively co-extract proteomic content and provide a comprehensive view of cell signalling mechanisms17.

However, a protocol that includes DNA, lipids, proteins, and metabolites does not currently exist in published literature for ancient, forensic or clinical samples. Here we demonstrate a novel, fully integrated extraction protocol implemented on artificially degraded porcine brains that is suitable for degraded samples. We designed our protocol the following rationale:

First, we consider that biomolecular analyses of archaeological remains involve destructive sampling and require a significant amount of initial template for separate extraction methods. In addition, such analyses are often complemented by radiocarbon dating and stable isotope analysis, which require additional sample material. Ancient soft tissues, such as hair, skin, muscle and brain, have rarely been considered for biomolecular and biomedical research because these tissues are often part of an intact, mummified individual, and are subject to different sampling considerations than skeletal remains. Therefore, by combining the extraction of aDNA, proteins, and lipids from a single biopsy, we significantly reduce the amount of template material required, thus preserving a finite and irreplaceable resource, and facilitating greater depth of analysis that would otherwise be prohibited due to the requirement for destructive sampling.

Second, of all the soft tissues preserved in the archaeological record, brain is one of the most overlooked tissues as its recovery was previously considered to be rare phenomenon. This tendency to refer to it as rare has caused a paucity of studies into ancient brain tissue analyses, as well as studies in brain preservation and degradation mechanisms18. However, to date there are over 200 independent published reports about preserved brain specimens from a wide range of depositional environments, and around a thousand more reports about brain specimens dating back to the seventeenth century, in which the brain appears in various states of preservation1924. Moreover, forensic studies in brain tissue have found that DNA is relatively resistant to putrefaction and in contrast to bone, the evidence suggests a certain degree of protection against DNA degradation25,26. Such studies have been supported by Serrulla et al.26 who recovered 45 well-preserved brains from a Spanish civil war mass grave. In this case, the wet conditions at the level of the burials have led to poor skeletal preservation but exceptional soft tissue preservation including brains27. Despite brain tissues having been recovered from both forensic and archaeological environments, biomolecular research from brain tissue is extremely limited with only a handful of examples28,29. Therefore, the necessity of having an optimised protocol in place to allow the extraction of biomolecules from soft tissues that are resistant to degradation is highlighted.

Third, in aDNA studies, mineralised matrices such as bones and teeth are preferred, largely because early studies showed that, in comparison to skeletal remains, DNA derived from soft tissues is significantly more fragmented, has a low endogenous DNA content, and only survives under certain climatic conditions in the archaeological record30. As a result, most methods upstream of high throughput DNA sequencing (HTS) have been optimised for mineralised samples. However, soft tissues can be more informative than skeletal tissues, especially in palaeopathological studies, as they carry different molecular information31. Brain samples can facilitate omics analyses with unprecedented resolution as neurodegenerative and psychiatric disorders are affected by a multitude of factors such as pathogen-driven selection, genetics and environmental influences the signatures of which can be detected with omics analyses, specifically through lipidomics and proteomics. In addition, brain samples can play a key role in elucidating evolutionary patterns, as the brain has been shaped over the course of time32. Other soft tissues such as muscle and skin can offer a more direct representation of an individual’s metabolic state, including disease-related markers or evidence of exposure to toxins. Therefore, developing an extraction protocol that is optimised for soft tissues and facilitates multi-omics analyses is of paramount importance.

Therefore, we have devised an integrated two-step protocol of extracting lipids, metabolites, proteins and DNA (Fig. S1). We combined a well-established solvent-based approach that facilitates co-extraction of lipids and metabolites through phase separation by centrifugation. This results into two layers: the non-polar lipids in the top phase and the polar metabolites in the lower phase, leaving denatured protein and DNA pelleted at the bottom of the tube. The second step involves the resuspension of DNA and protein, separating the DNA from the protein-tissue pellet with a final precipitation step, leaving the DNA soluble in the supernatant. The subsequent extracts are then ready to be prepared for downstream data acquisition.

Results

Artificial desiccation results

Four pig brain biopsies (SS1, SS2, SS3, SS4) from the cerebral cortex were obtained and each was split into two hemispheres. Because our goal for this paper was not to observe degradation in real-time, we followed a basic desiccation protocol that has been used before in similar studies33. Three of the samples were desiccated at one time point, and the fourth sample was desiccated for twice the time to confirm degradation occurrence and highlight discernible differences. A total of 2 mg of cerebral cortex was collected from the frontal lobe to facilitate extraction using a) standalone (SA) protocols of DNA12, protein, lipid and metabolite14 extractions and b) our integrated (INTG) extraction protocol (Table S1). Whilst mummification by desiccation can lead to good preservation of biomolecules, previous experimental mummification studies using a range of desiccation methods have illustrated that significant DNA degradation can be measured in a range of both human and pig tissues3335.

To assess degradation, we made morphological observations typically associated with artificial desiccation experiments including visible shrinkage of the cerebral cortex and decrease in percentage mass due to loss of water content (Figs. S2–S5, Table S2). On a molecular level we assessed degradation through observations within the genome and proteome, as the mechanisms underlying lipid degradation are not entirely understood, and the main product, oxidation can be a result of either a signalling process or degradation. We found approximately 15% of protein-containing modifications, of which 50% were deamidation. Upon further inspection, we identified deamidation in proteins not associated with these modifications in vivo, such as Syntaxin binding proteins (F1RS11) and tubulin (F1SHQ8), as well as N-acetylation at predicted residue sites in Serine- and Arginine-rich splicing factor 3 (F1RY92) within both protocols. This illustrates that modifications from both post-translational and post-mortem processes were recovered, and demonstrates that our artificial desiccation protocol successfully mimics spontaneous protein diagenesis. Despite the limitations in obtaining quantitative estimation of DNA degradation using Bayesian computation through mapDamage36 we employed a qualitative approach to assess DNA sequence characteristics indicative of degradation. Through visual inspection, we observed signs of deamination, fragmentation, low base quality scores and a decrease in mean sequence length, all of which indicate damaged sequences suggestive of degradation. After extraction of the desiccated tissues, DNA was quantified using a Qubit™ 4 fluorimeter (Figure S6).

DNA results

DNA recovery showed the largest deviation when compared to the SA protocol from equivalent masses of brain tissue. All four samples showed a significant increase in yield which is evident from the amount of raw and mapped reads per chromosome (Table 1) as well as the number of single nucleotide polymorphisms (SNPs) recovered (Table 2).The sequencing results for both the INTG and SA protocols can be found in Table 1. In line with increased yields of DNA, the number of raw reads following sequencing was significantly higher than the integrated protocol compared to that of the dedicated DNA extractions with an approximate average of 27 M reads compared to16M reads, respectively. However, the increased number of reads is to be expected given the higher DNA concentration recovered, but it should be noted that identical elution volumes were used to resuspend the DNA post-extraction. Furthermore, for DNA samples processed by the dedicated protocol, on average more than 60% of reads aligned to the Sus Scrofa genome were removed during the duplicate removal stage of data pre-processing. This is indicative of a less-than-optimal sample preparation where the PCR is dominated by a low yield of correctly formatted amplifiable library molecules.

Table 1.

Mapping Statistics per library and mapped reads per chromosome. 

SS1
SA
SS1
INTG
SS2
SA
SS2
INTG
SS3
SA
SS3
INTG
SS4
SA
SS4
INTG
Raw reads 7,003,531 36,467,461 21,619,544 13,386,446 21,761,393 40,849,208 15,540,794 52,678,198
Mapped reads 4,496,535 22,075,298 13,667,758 9,399,508 10,680,890 26,018,098 10,290,120 38,969,751
Unique reads 814,656 16,941,890 7,956,412 7,326,233 4,045,118 7,702,800 3,278,082 32,096,857
Redundancy 81.87% 23.25% 41.78% 22.05% 62.12% 70.39% 68.13% 17.63%
μ read length 60 bp 56 bp 55 bp 68 bp% 68 bp 62 bp 69 bp
End. DNA 11.6% 46.5% 36.8% 54.7% 18.6% 18.8% 21.0% 60.9%
Chr1 85,904 133,158 803,696 752,368 427,869 808,989 345,566 3,233,544
Chr2 49,126 1,662,529 501,769 471,091 245,168 479,456 207,966 2,155,310
Chr3 46,128 1,088,957 483,881 447,903 229,480 451,855 196,438 2,073,002
Chr4 43,105 1,059,710 414,411 384,060 212,955 408,760 175,313 1,703,578
Chr5 34,394 879,573 346,039 322,773 171,922 331,634 145,233 1,466,299
Chr6 59,542 745,787 627,085 573,367 294,955 583,271 253,859 2,713,065
Chr7 40,976 1,390,611 411,008 372,069 203,856 392,386 169,593 1,686,410
Chr8 43,164 878,092 399,120 378,586 216,363 406,545 173,414 1,604,841
Chr9 45,355 833,367 429,374 397,536 221,642 423,235 180,345 1,737,267
Chr10 23,742 899,091 235,310 213,352 119,332 225,803 97,399 949,637
Chr11 25,114 503,701 238,235 227,187 124,018 236,179 101,802 977,115
Chr12 23,523 509,816 261,811 240,418 112,750 229,708 102,836 1,197,572
Chr13 64,133 608,040 600,899 559,136 320,681 603,339 258,840 2,380,203
Chr14 47,232 1,232,094 475,958 431,745 235,702 457,671 196,264 1,963,887
Chr15 44,087 1,017,433 408,414 382,245 218,323 413,062 176,667 1,643,580
Chr16 25,458 845,872 254,141 229,567 125,801 237,558 103,267 963,285
Chr17 22,050 494,678 230,570 208,237 108,455 212,637 92,550 968,453
Chr18 19,182 506,110 191,335 176,804 95,352 184,820 78,909 801,665
ChrX 36,067 414,699 345,218 325,376 187,258 342,860 74,495 703,636
ChrY 268 724,315 2,323 2,455 1,196 2,532 10,224 89,596
MT 15,749 6,483 61,921 23,159 42,193 57,335 41,859 171,291

Each sample yielded between 7.1 and 52.7 million reads, with an average of 60% of the reads uniquely mapping to the Sus Scrofa genome. The samples consistently exhibited a substantial amount of endogenous content, ranging from 5.3% to 61.2%. The average fragment lengths varied between 55 and 69 bp. Unique mapped reads are in bold.

Table 2.

Variant calling per sample, per chromosome.

Chromosome RefSeq SS1
SA
SS1 INTG SS2
SA
SS2 INTG SS3
SA
SS3 INTG SS4
SA
SS4 INTG
Chr 1 NC_010443.5 0 3,506 87 274 10 73 15 22,595
Chr 2 NC_010444.4 0 4,784 126 431 5 73 14 24,844
Chr 3 NC_010445.4 2 5,298 134 314 21 117 12 28,906
Chr 4 NC_010446.5 4 3,104 66 239 27 69 22 16,470
Chr 5 NC_010447.5 6 3,230 224 400 43 152 100 19,317
Chr 6 NC_010448.4 1 5,391 149 441 17 106 16 19,648
Chr 7 NC_010449.5 2 2,372 75 179 9 51 11 19,899
Chr 8 NC_010450.4 2 1,793 150 268 24 98 45 14,880
Chr 9 NC_010451.4 1 2,392 172 353 30 107 25 17,676
Chr10 NC_010452.4 4 2,060 118 203 21 142 23 12,010
Chr 11 NC_010453.5 0 1,638 27 125 2 50 0 10,906
Chr 12 NC_010454.4 0 4,977 112 360 18 60 11 23,295
Chr 13 NC_010455.5 1 1,879 93 171 12 54 18 13,213
Chr 14 NC_010456.5 0 2,915 77 247 6 46 8 22,075
Chr 15 NC_010457.5 8 2,292 100 209 17 55 20 12,205
Chr16 NC_010458.4 12 1,280 121 202 78 125 62 7,703
Chr 17 NC_010459.5 1 2,501 78 186 6 40 19 13,331
Chr18 NC_010460.4 0 1,564 37 117 1 29 3 9,915
Chr X NC_010461.5 3 814 29 75 0 18 0 1,440
ChrY NC_010462.3 0 30 2 3 0 1 509 2,009
MtDNA NC_000845.1 173 182 177 174 19 20 20 1,440

SA: Standalone Protocol, INTG: Integrated Protocol.

The most notable difference between the two protocols was observed during variant calling. The integrated protocol yielded a much higher number of SNPs across all chromosomes. Manual inspection of regions that were mutually covered by aligned sequencing reads exhibited a greater depth of coverage favouring the integrated protocol (Table 2). We further analysed samples SS2 (SA)/SS2 (INTG) which have largely comparable numbers of aligned reads (in total and per chromosome) and found a significantly higher number of SNPs obtained from the integrated protocol. We compared the read-to-SNP ratio of the unfiltered VCF files and the percentage of SNPs passing filtering (MQ < 30, DP > 5, QUAL30, QD < 2.0, SQR > 3.0, FS > 60.0, MQRankSum < − 12.5, ReadPosRankSum < − 8.0) was much higher from the INTG protocol. For example, sample SS2 (SA) has a higher number of aligned reads overall and following read duplicate removal compared to its equivalent SS2 (INTG) (Table 1). Despite the higher number of aligned reads the number of SNPs passing quality filtering was more than double for the INTG protocol. Furthermore, the percentage of reads passing quality filtering was much higher (X% vs Y%) for the INTG approach. Due to the differences in the number of raw reads per library, the remaining samples could not be directly compared. However, we normalised the data and analysed SNP-to-aligned read ratio, again finding that it was significantly higher in favour of the integrated protocol. Our results suggest that, when considered holistically, DNA preparation using the integrated protocol was more successful compared to the standalone approach throughout all stages, from sample preparation to computational processing.

Protein results

The results are summarised in Fig. 1a,b. Following shotgun proteomics, we validated peptide sequence matches (PSMs) of raw files through MaxQuant37 search using MSStats38 for processing and summarisation, before comparing SA with INTG protocols for both normalised peak intensities of total proteins discovered (Fig. S7) and categorising their general functions (Fig. 1a). Both protocols show that most proteins are related to neuronal and metabolic functions, which is expected within brain samples. Furthermore, proteins and isoforms specific to the brain and nervous system were the dominant class in each protocol, showing a high degree of uniformity. While there were more PSMs in the SA protocol than in the INTG post-processing, we found that INTG SS2 had significantly fewer proteins (Fig. 1b) recovered and at lower peak intensities (Figs. S7, S8) than the other three INTG samples, indicating a loss or disruption of the sample in the process. In contrast, a greater overall recovery of individual protein abundance in SS1-SS3 samples (Fig. S8), such as with the ubiquitous tubulin, spectrin, alpha (axon actin-ring cytoskeleton) and syntaxin binding protein 1 (neurotransmitter release in synaptic vesicles), further validates the comparability between protocols. To further asses the recovery of peptides derived from regionally enriched protein expression39 in the cerebral cortex of Sus Scrofa (n = 109), we first filtered the identified proteins to remove contaminants, with any protein that did not have at least one unique peptide removed from the dataset. Data was filtered to retain proteins identified with high confidence according to an FDR below the 0.01 threshold. Only proteins identified with high confidence (FDR threshold = 0.01) were retained (Fig. 2). Venn diagrams were generated using jvenn40.

Figure 1.

Figure 1

(a): Allocated categories of peptide sequence matches for each protocol. Labelled as Neuronal are the matches where the protein or isoform is specifically located within the brain or central nervous system. (b): Protein Recovery between INTG and SA Protocols. 163 proteins and 192 proteins were identified in all INTG (blue) and SA (red) samples collectively.

Figure 2.

Figure 2

Comparison of enriched protein expression between protocols. Expected (green) represents protein where expression is regionally enhanced within the cerebral cortex of Sus Scrofa (n = 109). The list of expected proteins was acquired from the Human Protein Atlas39.Only proteins identified with high confidence across all samples were retained for comparison with a total of 215 and 184 proteins for SA and INTG respectively with a single expected protein recovered by both methods.

Lipid and metabolite results

The initial lipid spectral identifications showed that, in general, the INTG protocol recovered more peaks across both mass spectrometric modes than Bligh and Dyer14, with a significant increase in a positive mode (Fig. S9). When examining the lipid family composition (Figs. 3a, b), the protocols are highly comparable and consistent with current brain lipid literature. The predominance of glycerophospholipids in both modes is expected due to their being the main constituents of neuronal cell membranes41 while the significant abundance of sphingolipids is commonly found within the brain due to their roles in neurogenesis and synaptogenesis42. These general observations of tissue specificity and lipid recovery are similarly reflected when discriminating into lipid classes (Figs. 3c, d), where sphingomyelin and its biosynthetic ceramide precursors, both highly enriched in oligodendrocytes and myelin, consist of the majority of positive mode identifications. In contrast, the membrane-based phosphatidylethanolamine (PE) and its derivatives dominate the negative mode, while also elucidating cardiolipin, a hallmark of inner mitochondrial membrane that is predicted to be abundant within the brain43. When examining the lipid subclasses composition (Fig. 4), we observed approximately double the amount of lipids with the INTG protocol in positive mode compared to the SA protocol, where these lipids predominantly account for ceramides, monoacylglycerophosphocholines and sphingomyelins. In negative ionisation mode, diacylglycerophosphates and monoacylglycerophosphoglycerols account for most of the exclusive identifications with INTG. Conversely, phosphatidylethanolamines mainly account for compounds identified exclusively with SA in both ionisation modes.

Figure 3.

Figure 3

(a, b): Lipid families found in both MS and MS/MS. (c, d): Individual lipid classes of MS/MS matches.

Figure 4.

Figure 4

Comparison of MS2 acquired lipid identifications. In total, SA and INTG generated 276 and 445 MS2 acquired identifications in positive mode and 196 and 221 in negative mode respectively. INTG showed approximately double the amount of lipids that were acquired exclusively in both positive and negative modes when compared to the SA protocol.

The only sample which had significantly less recovery when using the INTG protocol was SS2 negative mode lipids (Figs. 3b, d), which further supports loss of sample during phase separation or handling. However, statistically, there was virtually no difference between protocols in terms of peak identifications (Fig. S9).

Statistical analysis of small molecules and metabolites showed no significant difference between the Integrated and traditional protocols used, with comparable recovered MS peak identifications (Figs. 5, S9). Due to the lack of acquired MS2 data for both protocols, we compared the total number of compounds with suggested identifications (Fig. 6). Therefore, it was not possible to focus our comparison on reference-matched identifications as was conducted with the lipids. Identifications were excluded if they did not display 100% coverage across all samples within each protocol (n = 4) to minimise any bias from differing metabolism between animals. We observed a comparable recovery of molecular classes in both protocols with the most abundant class of molecules shared by both protocols belonging to amino acids. However, we are unable to distinguish if this is a result of natural protein turnover in vivo or a result of protein degradation. Despite this, it is logical to assume that this is a combination of protein metabolism and degradation resulting from the treatment of the tissues prior to extraction. Differences between protocols are generally attributed to variation between a small number of amino acids and saturated fatty acids. The exact reasoning is unclear as both protocols rely on MeOH for the recovery of polar metabolites. However, the ratio of MeOH to non-polar solvent does differ, and hence the final concentration of MeOH differs between protocols and may explain the slight difference in the recovery of polar metabolites.

Figure 5.

Figure 5

Brain Lipids and Metabolites obtained from INTG (blue) and SA (red) protocols.

Figure 6.

Figure 6

Comparison of MS1 acquired metabolite identifications. SA and INTG protocols generated 328 and 322 identifications respectively with approximately 50% of the matches being concordant between methods. Due to lack of acquired MS2 data for both protocols, we compared the total number of compounds with suggested identifications.

Discussion

This proof-of-concept paper demonstrates for the first time the successful implementation of a novel multi-omic integrated protocol for the simultaneous extraction of DNA, proteins, metabolites and lipids from a single degraded sample. This approach increases the acquisition of omics data from a single sample while reducing time, associated costs, and more importantly, sampling requirements (which is of paramount importance, when finite or archival and clinical FFPE samples are processed).

Four porcine brain samples were artificially degraded for up to three months at 37 °C and subsequently sub-samples were processed using well-established standalone protocols for the individual extraction of DNA12, proteins, lipids and metabolites14. Equivalent sub-samples of the same four samples were also processed using our integrated protocol. Comparisons between the integrated and standalone protocols demonstrate that our multi-omic protocol is either comparable or superior to the single-omic protocols. Both protocols generated a substantial number of raw reads, with the SA protocol producing between 6 and 21 million reads, while the INTG protocol produced between 13 and 52 million reads. Discrepancies in the raw data could be attributed to differences in DNA concentration, pooling and sequencing bias. However, when we normalised the results to account for biases, we observed consistently better results in favour of the integrated protocol for all samples (except for SS1, which exhibited more than twice the proportion of mapped reads compared to raw reads (Table 1), but this did not translate into higher rates of SNP calling). Additionally, when mapping to the Sus Scrofa genome, the samples processed with the INTG protocol showed a significant percentage of high-quality SNP calls per sample compared to the dedicated protocol (Table 1), even when the number of aligned reads was lower than the equivalent SA sample (Tables 1 and 2). Aligned reads per chromosome generally showed comparable or better results between the two protocols, except for a few instances where the SA protocol exhibited more Y-chromosomal reads (Table 1). In addition, there was increased depth of coverage which is of great significance for applications in ancient and forensic fields, as it decreases the need for additional target enrichment. Similarly to the DNA results, protein recovery was also comparable or higher in all four samples apart from sample SS2 which is likely the result of a loss of sample during phase separation or handling. Organic molecule, metabolite and lipid recovery was consistently comparable among all samples, with the INTG protocol recovering more identifications in both positive and negative modes during lipidomic analysis.

So far, biomolecular analyses of degraded brain samples have been extremely rare, with only two successful examples documented in the literature27,28 following singe-omic approaches. Additionally, one study produced dubious results29 due to the lack of adherence to strict anti-contamination criteria. With rare exceptions, the standardised and comprehensive analysis of ancient, forensic, mummified and other heavily degraded brain tissues has been notably lacking on a large scale. This gap signifies a vast reservoir of unexploited tissue resources that could significantly contribute to the exploration of brain evolution. As a consequence, contemporary research on human brain evolution4447 has predominantly relied on modern samples, primarily focusing on the genomic level. Computational approaches, cross-species comparative genomics, as well as population genetic-level approaches, have been instrumental in exploring the diversity of the human genome. Notably, these studies have highlighted the role of Transposable Elements (TE) as influential drivers of human brain evolution32.

However, pathogenesis encompasses a complex interplay of various regulatory elements such as proteins, metabolites, lipids, genes and epigenetic features such as DNA methylation and histone modifications. Considering the brain is abundantly found in mummified remains and is resistant to degradation due to its high lipid content, multi-omics studies including both ancient and modern data are critical in providing insights into the intricate dynamics underlying brain evolution such as genetic, epigenetic and biochemical factors. Furthermore, it is possible to unravel the mechanisms behind well-documented brain preservation in the archaeological record.

By successfully characterising biomolecules from degraded brain samples, we systematically address the limitations posed by degraded soft tissues, thereby challenging preconceived notions regarding the applicability of such samples in biomolecular and biomedical research and re-assessing their value within the context of brain evolution and pathogenesis. In particular, the protocol’s ability to reduce processing time and material requirements while also minimising biases associated with sample preservation variations, which are responsible for deviations in results and a lack of authentication and replication, enhances its practical utility. Processing a single sample under the same protocol and storage conditions, has the potential to help with noise reduction and artefact reduction for downstream bioinformatic applications by facilitating top-down approaches to simultaneous integration and dimensionality reduction of both NGS and MS datasets, which will further improve robustness for subsequent machine learning methods. This quality is especially valuable in studies involving finite or archival samples, where obtaining meaningful data can be challenging.

Additionally, we underscore the brain’s significance as a model tissue and lay the groundwork for future investigations into various tissues. The initial focus on brain tissues serves as a strategic starting point acknowledging the inherent challenges associated with these samples. As the protocol proves its efficacy in the brain, the potential application to other tissues becomes evident, promising a versatile tool for multi-omic analyses in diverse biological contexts.

Methods

Integrated Protocol Overview: The integrated protocol was validated using four artificially desiccated samples (SS1, SS2, SS3, SS4) against dedicated well-established and widely used extraction protocols for DNA, protein, lipids and metabolites. Subsequent extracts were processed for individual downstream workflows, and data acquisition performed via high throughput sequencing (HTS) for DNA and liquid chromatography with tandem mass spectrometry (LC–MS/MS) for protein, lipids and metabolites. For both the INTG and SA protocol for DNA, vortexing and ultrasonication were avoided to prevent fragmentation of the DNA. This is important in this study as we want to ensure this protocol is suitable for forensic and archaeological studies and case work. A schematic representation of the workflow is presented in Fig. S1.

Step 1: Organic molecule separation

For our INTG protocol, we opted for a relatively large volume of extract as we did with the standalone protocol14. The initial ratio of methyl-tetr-butyl ether (MTBE) and methanol (MeOH) (10:3 v/v) based lysis remains true to the original description16. However, we included 0.01% butylated hydroxytoluene (BHT) in our MTBE:MeOH mixture, to minimise oxidation of lipids. Furthermore, we incorporated the use of 0.1% ammonium acetate (AA) to induce phase separation17 while aiding precipitation of insoluble material including DNA and protein. Lipids and metabolites were first extracted using 4 mL MTBE:MeOH in a 10:317 ratio with a 60 min incubation step at 4 °C with gentle agitation in a shaking incubator. Phase separation was induced with 770 μL 0.1% AA (10:3:2.5 (v/v/v)) to aid precipitation of insoluble material, including protein and DNA. The upper layer, containing the non-polar lipids, and the lower layer, containing the polar metabolites, were then decanted into fresh glass tubes for drying and resuspension. Extracts were then dried under a gentle flow of nitrogen at 40 °C and re-suspended in either 200 μL MTBE containing 0.01% BHT (lipids) or 200 μL of 80% aqueous acetonitrile (ACN) with 0.1% formic acid (FA) (metabolites) and submitted to the Bio-MS core facility at the University of Manchester for data acquisition.

Step 2: DNA and Protein separation

The resultant pellets from the original extraction tubes were left to air dry for 5–10 min. The pellets were then resuspended in 500 µl of 5% sodium dodecyl sulfate (SDS) based lysis buffer (pH8) and left to incubate at room temperature for 30 min with gentle agitation. Following centrifugation at 1500×g for 10 min to pellet any insoluble material, the supernatant was transferred to 2 mL DNA Lobind tubes (Eppendorf) and 250 μL ammonium acetate (AA) (7.5 M) was added for a final concentration of 2.5 M. Following the addition of AA, samples were left to incubate at room temperature for 15 min and the protein was pelleted by centrifugation at 16,000×g for 10 min. The resultant supernatant, containing DNA, was then suspended in five volumes of spiked Qiagen PB buffer12 and purified by MinElute (Qiagen) with the addition of reservoirs for larger sample volumes12, followed by two rounds of elution with 30 µL elution buffer (Qiagen EB). Extracts were stored at − 20 °C until library preparation and sequencing at the Genomics Facility of the University of Manchester for data acquisition.

The remaining protein pellet was resuspended in 500 µL of a protein lysis buffer consisting of 5% SDS, 50 mM Tetraethylammonium bromide (TEAB) and LC–MS grade H2O, (pH 7.5). All samples were reduced with dithiothreitol (DTT) at a final concentration of 5 mM, for 10 min at 60 °C and allowed to cool to room temperature. Alkylation was facilitated with iodoacetamide (IAM) at a final concentration of 15 mM with a 30 min incubation at room temperature in the dark, followed by quenching with DTT. Samples were then centrifuged at 14,000×g for 10 min to pellet insoluble debris and the resultant clarified lysate was transferred to a fresh protein LoBind tube (Eppendorf).

Lysates were quantified on a Qubit™ 4 fluorimeter, with the Protein Broad Range assay kit (Invitrogen). Trypsin digestion was performed using the S-Trap column digestion protocol48 (ProtiFi). All subsequent binding, washing and elution steps were followed by centrifugation at 4000×g for 2 min. Samples were diluted to 1 mg/mL and 50 µL (containing 50 µg) was acidified with 5 µL of 12% aqueous phosphoric acid. The total 55 µL of acidified sample was loaded onto the columns with 330 µL S-Trap binding buffer (90% MeOH, 100 mM TEAB, PH 7.1). One wash with MTBE, followed by four washes with S-Trap binding buffer, was performed. Lyophilised trypsin was dissolved in 50 mM TEAB and loaded onto the columns (final protease: protein ratio of 1:10) and incubated for 60 min at 47 °C. Three elution steps were performed with 65 µL of 50 mM TEAB, 65 µL of 0.1% aqueous Formic Acid (FA) and 30 µL of 30% aqueous acetonitrile (ACN) containing 0.1% formic acid (FA). Samples were further desalted using Oligo™ R3 reversed-phase resin (Thermo Scientific) and eluted twice in 50 µL of 30% aqueous ACN containing 0.1% FA. Samples were dried by SpeedVac and submitted to the Bio-MS core facility at the University of Manchester for data acquisition.

Supplementary Information

Acknowledgements

The authors extend their appreciation to the Genomic Technologies Core Facility, the BioMS Facility and the Computational Shared Facility (CSF) at the University of Manchester. For advice and additional training we thank Professor Anna Nicolaou and Dr Peter Freeman. We extend our special thanks to Emeritus Professor of Biomolecular Archaeology Terence A. Brown for his advice and support.

Author contributions

BB: All laboratory procedures excluding data acquisition and artificial degradation. Methodology, investigation, analysis, writing, review, editing. JDAS: experimental mummification, proteomics, lipidomics, metabolomics analysis, writing, review, editing. GT: LC–MS/MS, protocol development paper review. KD: conceptualization, DNA sequencing pre-processing including variant calling and further downstream data analysis and visualisation, data collection and curation. Paper writing, review, editing, formatting submission. Supervision, training, budget management, project administration.

Funding

This study was funded by a £1.1 m donation to the KNH Centre for Biomedical Egyptology.

Data availability

Genomic data are curated at the European Nucleotide Archive under project accession number PRJEB61253. Proteomics data are available via ProteomeXchange with identifier PXD047606. Protein, Lipid and Metabolite Datasets are provided as Additional Files.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-67104-8.

References

  • 1.Karczewski, K. J. & Snyder, M. P. Integrative omics for health and disease. Nat. Rev. Genet.19, 299–310 (2018). 10.1038/nrg.2018.4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Quinn, R. A. et al. From Sample to Multi-Omics Conclusions in under 48 Hours. mSystems1, e00038–16 (2016). [DOI] [PMC free article] [PubMed]
  • 3.Ichihashi, Y. et al. Multi-omics analysis on an agroecosystem reveals the significant role of organic nitrogen to increase agricultural crop yield. Proc. Natl. Acad. Sci.117, 14552–14560 (2020). 10.1073/pnas.1917259117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Subramanian, I., Verma, S., Kumar, S., Jere, A. & Anamika, K. Multi-omics data integration, interpretation, and its application. Bioinf. Biol. Insights14, 1177932219899051 (2020). 10.1177/1177932219899051 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Li, R., Li, L., Xu, Y. & Yang, J. Machine learning meets omics: applications and perspectives. Brief. Bioinform.23, bbab460 (2022). [DOI] [PubMed]
  • 6.Picard, M., Scott-Boyer, M.-P., Bodein, A., Périn, O. & Droit, A. Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J.19, 3735–3746 (2021). 10.1016/j.csbj.2021.06.030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Emmons, A. L., Davoren, J., DeBruyn, J. M. & Mundorff, A. Z. Inter and intra-individual variation in skeletal DNA preservation in buried remains. Forensic Sci. Int. Genet.44, 102193 (2020). 10.1016/j.fsigen.2019.102193 [DOI] [PubMed] [Google Scholar]
  • 8.Sedlackova, T., Repiska, G., Celec, P., Szemes, T. & Minarik, G. Fragmentation of DNA affects the accuracy of the DNA quantitation by the commonly used methods. Biol. Proced. Online15, 5 (2013). 10.1186/1480-9222-15-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Rusu, I. et al. Dual DNA-protein extraction from human archeological remains. Archaeol. Anthropol. Sci.11, 3299–3307 (2019). 10.1007/s12520-018-0760-1 [DOI] [Google Scholar]
  • 10.Fagernäs, Z. et al. A unified protocol for simultaneous extraction of DNA and proteins from archaeological dental calculus. J. Archaeol. Sci.118, 105135 (2020). 10.1016/j.jas.2020.105135 [DOI] [Google Scholar]
  • 11.Tolosa, J. M., Schjenken, J. E., Civiti, T. D., Clifton, V. L. & Smith, R. Column-based method to simultaneously extract DNA, RNA, and proteins from the same sample | BioTechniques. 43, 799–804 (2007). [DOI] [PubMed]
  • 12.Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl. Acad. Sci.110, 15758–15763 (2013). 10.1073/pnas.1314445110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Folch, Jordi., Ascoli, I., Lees, M., Meath, J. A. & LeBaron, F. N. Preparation of lipid extracts from brain tissue. J. Biol. Chem.191, 833–841 (1951). [PubMed]
  • 14.Bligh, E. G. & Dyer, W. J. A rapid method of total lipid extraction and purification. Can. J. Biochem. Physiol.37, 911–917 (1959). 10.1139/y59-099 [DOI] [PubMed] [Google Scholar]
  • 15.Breil, C., Abert Vian, M., Zemb, T., Kunz, W. & Chemat, F. “Bligh and Dyer” and folch methods for solid–liquid–liquid extraction of lipids from microorganisms. Comprehension of solvatation mechanisms and towards substitution with alternative solvents. Int. J. Mol. Sci.18, 708 (2017). 10.3390/ijms18040708 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Matyash, V., Liebisch, G., Kurzchalia, T. V., Shevchenko, A. & Schwudke, D. Lipid extraction by methyl-tert-butyl ether for high-throughput lipidomics. J. Lipid Res.49, 1137–1146 (2008). 10.1194/jlr.D700041-JLR200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Coman, C. et al. Simultaneous metabolite, protein, lipid extraction (SIMPLEX): A combinatorial multimolecular omics approach for systems biology. Mol. Cell. Proteom.15, 1435–1466 (2016). 10.1074/mcp.M115.053702 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Morton-Hayward, A. L. et al. A conscious rethink: Why is brain tissue commonly preserved in the archaeological record? Commentary on: Petrone P, Pucci P, Niola M, et al. Heat-induced brain vitrification from the Vesuvius eruption in C.E. 79. N. Engl. J. Med. 2020;382:383–4. 10.1056/NEJMc1909867. STAR Sci. Technol. Archaeol. Res.6, 87–95 (2020). [DOI] [PubMed]
  • 19.O’Connor, S., Edwards, H. G. M. & Ali, E. M. A. The preservation of archaeological brain remains in a human skeleton. Philos. Trans. R. Soc. Math. Phys. Eng. Sci.374, 20160208 (2016). [DOI] [PubMed] [Google Scholar]
  • 20.O’Connor, S. et al. Exceptional preservation of a prehistoric human brain from Heslington, Yorkshire, UK. J. Archaeol. Sci.38, 1641–1654 (2011). 10.1016/j.jas.2011.02.030 [DOI] [Google Scholar]
  • 21.Hayman, J. & Oxenham, M. Estimation of the time since death in decomposed bodies found in Australian conditions. Aust. J. Forensic Sci.49, 31–44 (2017). 10.1080/00450618.2015.1128972 [DOI] [Google Scholar]
  • 22.Graw, M., Weisser, H. J. & Lutz, S. DNA typing of human remains found in damp environments. Forensic Sci. Int.113, 91–95 (2000). 10.1016/S0379-0738(00)00221-8 [DOI] [PubMed] [Google Scholar]
  • 23.Prats-Muñoz, G. et al. A paleoneurohistological study of 3000-year-old mummified brain tissue from the Mediterranean bronze age. Pathobiology79, 239–246 (2012). 10.1159/000334353 [DOI] [PubMed] [Google Scholar]
  • 24.Thakar, M. K., Joshi, B., Shrivastava, P., Raina, A. & Lalwani, S. An assessment of preserved DNA in decomposed biological materials by using forensic DNA profiling. Egypt. J. Forensic Sci. 9, (2019).
  • 25.Bär, W., Kratzer, A., Mächler, M. & Schmid, W. Postmortem stability of DNA. Forensic Sci. Int.39, 59–70 (1988). 10.1016/0379-0738(88)90118-1 [DOI] [PubMed] [Google Scholar]
  • 26.Serrulla, F. et al. Preserved brains from the Spanish Civil War mass grave (1936) at La Pedraja1, Burgos. Spain. Sci. Justice56, 453–463 (2016). 10.1016/j.scijus.2016.08.001 [DOI] [PubMed] [Google Scholar]
  • 27.Oh, C. S. et al. Amplification of DNA remnants in mummified human brains from medieval Joseon tombs of Korea. Anthropol. Anz.70, 57–81 (2013). 10.1127/0003-5548/2012/0225 [DOI] [PubMed] [Google Scholar]
  • 28.Petzold, A. et al. Protein aggregate formation permits millennium-old brain preservation. J. R. Soc. Interface17, 20190775 (2020). 10.1098/rsif.2019.0775 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Pääbo, S., Gifford, J. A. & Wilson, A. C. Mitochondrial DNA sequences from a 7000-year old brain. Nucleic Acids Res.16, 9775–9787 (1988). 10.1093/nar/16.20.9775 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Higgins, D. & Austin, J. J. Teeth as a source of DNA for forensic identification of human remains: A review. Sci. Justice53, 433–441 (2013). 10.1016/j.scijus.2013.06.001 [DOI] [PubMed] [Google Scholar]
  • 31.Grove, C., Peschel, O. & Nerlich, A. G. A systematic approach to the application of soft tissue histopathology in paleopathology. BioMed Res. Int.2015, e631465 (2015). 10.1155/2015/631465 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wang, Y., Zhao, B., Choi, J. & Lee, E. A. Genomic approaches to trace the history of human brain evolution with an emerging opportunity for transposon profiling of ancient humans. Mob. DNA12, 22 (2021). 10.1186/s13100-021-00250-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Shved, N. et al. Post mortem DNA degradation of human tissue experimentally mummified in salt. PLoS ONE9, e110753 (2014). 10.1371/journal.pone.0110753 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Cockitt, J., Lamb, A. & Metcalfe, R. An ideal solution? Optimising pretreatment methods for artificially mummified ancient Egyptian tissues. Rapid Commun. Mass Spectrom.34, e8686 (2020). 10.1002/rcm.8686 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Öhrström, L. et al. Experimental mummification—In the tracks of the ancient Egyptians. Clin. Anat.33, 860–871 (2020). 10.1002/ca.23568 [DOI] [PubMed] [Google Scholar]
  • 36.Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics29, 1682–1684 (2013). [DOI] [PMC free article] [PubMed]
  • 37.Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol.26, 1367–1372 (2008). 10.1038/nbt.1511 [DOI] [PubMed] [Google Scholar]
  • 38.Choi, M. et al. MSstats: An R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics30, 2524–2526 (2014). 10.1093/bioinformatics/btu305 [DOI] [PubMed] [Google Scholar]
  • 39.Sjöstedt, E. et al. An atlas of the protein-coding genes in the human, pig, and mouse brain. Science367, eaay5947 (2020). [DOI] [PubMed]
  • 40.Bardou, P., Mariette, J., Escudié, F., Djemiel, C. & Klopp, C. jvenn: An interactive Venn diagram viewer. BMC Bioinf15, 293 (2014). 10.1186/1471-2105-15-293 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Farooqui, A. A., Horrocks, L. A. & Farooqui, T. Glycerophospholipids in brain: their metabolism, incorporation into membranes, functions, and involvement in neurological disorders. Chem. Phys. Lipids106, 1–29 (2000). 10.1016/S0009-3084(00)00128-6 [DOI] [PubMed] [Google Scholar]
  • 42.Yoon, J. H. et al. Brain lipidomics: From functional landscape to clinical significance. Sci. Adv.8, eadc9317 (2022). [DOI] [PMC free article] [PubMed]
  • 43.Ahmadpour, S. T., Mahéo, K., Servais, S., Brisson, L. & Dumas, J. F. Cardiolipin, the mitochondrial signature lipid: Implication in cancer. Int. J. Mol. Sci.21, 8031 (2020). 10.3390/ijms21218031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Sikela, J. M. The jewels of our genome: The search for the genomic changes underlying the evolutionarily unique capacities of the human brain. PLOS Genet.2, e80 (2006). 10.1371/journal.pgen.0020080 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Vallender, E. J., Mekel-Bobrov, N. & Lahn, B. T. Genetic basis of human brain evolution. Trends Neurosci.31, 637–644 (2008). 10.1016/j.tins.2008.08.010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Enard, W. The molecular basis of human brain evolution. Curr. Biol.26, R1109–R1117 (2016). 10.1016/j.cub.2016.09.030 [DOI] [PubMed] [Google Scholar]
  • 47.Hunter, P. Ancient rules of memory. EMBO Rep.9, 124–126 (2008). 10.1038/sj.embor.2008.5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Zougman, A., Selby, P. J. & Banks, R. E. Suspension trapping (STrap) sample preparation method for bottom-up proteomics analysis. PROTEOMICS14, 1006–1000 (2014). 10.1002/pmic.201300553 [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data Availability Statement

Genomic data are curated at the European Nucleotide Archive under project accession number PRJEB61253. Proteomics data are available via ProteomeXchange with identifier PXD047606. Protein, Lipid and Metabolite Datasets are provided as Additional Files.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group

RESOURCES