Communications Biology. 2020 Sep 28;3:535. doi: 10.1038/s42003-020-01262-z

Machine learning assistive rapid, label-free molecular phenotyping of blood with two-dimensional NMR correlational spectroscopy

Weng Kung Peng, Tian-Tsong Ng, Tze Ping Loh
PMCID: PMC7522972  PMID: 32985608

Abstract

Translation of findings in basic science and clinical research into routine practice is hampered by large variations in human phenotype. Developments in genotyping and phenotyping, such as proteomics and lipidomics, are beginning to address these limitations. In this work, we developed a new methodology for rapid, label-free molecular phenotyping of biological fluids (e.g., blood) by exploiting recent advances in fast and highly efficient multidimensional inverse Laplace decomposition techniques. We demonstrated that, using two-dimensional T1-T2 correlational spectroscopy on a single drop of blood (<5 μL), a highly time- and patient-specific ‘molecular fingerprint’ can be obtained in minutes. Machine learning techniques were introduced to transform the NMR correlational map into user-friendly information for point-of-care disease diagnosis and monitoring. The clinical utility of this technique was demonstrated through the direct analysis of human whole blood in various physiological (e.g., oxygenated/deoxygenated states) and pathological (e.g., blood oxidation, hemoglobinopathies) conditions.

Subject terms: Magnetic resonance imaging, Haematological diseases, Molecular medicine, Biomarkers


Weng Kung Peng et al. present a novel scheme using two-dimensional NMR spectroscopy for rapid and label-free testing of biological fluids (e.g., red blood cells) at point-of-care. They demonstrate its clinical utility (e.g., with haemoglobin disorders) and report a unique and specific ‘molecular fingerprint’ through direct analysis of a single drop of blood (<5 µl). Machine learning is used to facilitate simpler decision making.

Introduction

High-resolution nuclear magnetic resonance (NMR) spectroscopy is a powerful and attractive technique in biochemistry (e.g., for structural protein analysis1 and for characterizing metabolomic responses in biological samples2–4) and inorganic chemistry5. However, high-resolution NMR systems are large, expensive and incompatible with in situ or portable applications. There is an increasing demand for low-field portable NMR systems for use in food sciences6, oil-gas exploration7, and clinical diagnostics in point-of-care testing (POCT)8–11. In high-field NMR, biochemical information is typically detected and encoded in the frequency domain (“chemical shift”), in which the spectral resolution scales with the external magnetic field. This dependence on high field strength reduces portability and limits downstream application at large scale.

However, biochemical and biophysical information (e.g., molecular rotational and diffusional motion) can also be encoded in the relaxation time domain, namely in the longitudinal (T1) and transverse (T2) relaxation times, using NMR-based POCT. In addition, molecular information in the time domain can be decoded by inversion, given the availability of fast and reliable Laplace inversion algorithms7,12. This can provide parallel information that is not available in traditional frequency-domain NMR spectra.
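For reference, the relationship that a Laplace inversion undoes can be written as a two-dimensional integral. The form below is a standard textbook sketch, assuming ideal inversion-recovery encoding in t1 and a CPMG readout in t2 (the sequence types named in Fig. 1 and Methods). The symbols S, F and ε are notation introduced here, and the exact kernel of the modified sequence used in this work may differ.

```latex
S(t_1, t_2) \;=\; \iint F(T_1, T_2)\,
  \bigl(1 - 2\,e^{-t_1/T_1}\bigr)\, e^{-t_2/T_2}\,
  \mathrm{d}T_1\, \mathrm{d}T_2 \;+\; \varepsilon(t_1, t_2)
```

Here S is the measured signal, F(T1, T2) ≥ 0 is the sought distribution of relaxation times (the T1-T2 correlational map), and ε is noise. Recovering F from S is ill-posed, which is why regularized inversion algorithms are needed.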

In recent years, significant advances in NMR system miniaturization8,13–15 (e.g., electronic consoles13,14,16,17, radio-frequency probes9,10,18–20, microfluidic-based chips21,22) have enabled one-dimensional NMR relaxometry of water protons (e.g., T2 relaxation) with small-footprint permanent magnets (<1 Tesla), and such systems have been widely applied in point-of-care medical testing8,9,14. Applications include immuno-magnetic labeled detection (e.g., tumour cells8,22, tuberculosis23 and magneto-DNA detection of bacteria24) and label-free detection of various pathological states, such as the oxygenation20 and oxidation10 levels of blood, malaria screening9,25, and rapid phenotyping of oxidative stress in diabetes mellitus26,27.

We demonstrated (to the best of our knowledge) the first unique two-dimensional ‘molecular fingerprint’ of a single drop of blood (<5 µL), obtained in minutes using two-dimensional T1-T2 correlational spectroscopy with an inexpensive, benchtop-sized NMR spectrometer28,29. By exploiting the recent development of fast and highly efficient multidimensional inverse Laplace decomposition algorithms7,30, unique two-dimensional signatures of various hemoglobin (Hb) derivatives, with respect to their magnetic resonance relaxation reservoirs in the oxygenated (oxy-Hb), deoxygenated (deoxy-Hb) and oxidized (oxidized Hb) states, were observed for the first time (to the best of our knowledge), and their phenotypic expression in various pathological states (e.g., blood oxidation, hemoglobinopathies) is reported in this work. Machine-learning techniques (e.g., multidimensional scaling (MDS), t-SNE, Isomap) were introduced to transform the NMR correlational maps into user-friendly information for medical decision making. We report that the supervised models (e.g., neural network) performed at least on par with, or outperformed, the average trained human in the deep image analysis of the molecular fingerprint of red blood cells (RBCs).

Results

Water-protein interactions in blood microenvironment

Fresh whole blood samples containing predominantly oxy-Hb were collected from healthy donors ('wild-type'). Oxygenation and re-oxygenation were achieved by rigorous pipetting in ambient air. Using a microcapillary tube, whole blood was sampled and spun (6000 × g, 1 min) into a narrow band of RBCs for micro NMR measurements (Figs. 1 and 2).

Fig. 1. Two-dimensional NMR T1-T2 correlational spectroscopy for molecular phenotyping of blood.

a Schematic diagram of the bench-top sized NMR-based POCT system. The applied radio frequencies were centered at 21.57 MHz, which corresponds to the Larmor frequency of water protons in the 0.5 Tesla permanent magnet. The 90-degree pulse length was 10 μs. The whole system is lightweight (<2 kg) and portable, making it suitable for in situ measurements. Abbreviations: USB, Universal Serial Bus; trans, transmitter; rcv, receiver; amp, pre-amplifier; PA, power amplifier; rf, radio-frequency; PC, personal computer. b The pulse sequence used for the T1-T2 correlational spectroscopy is a modified inversion recovery with CPMG observation. The signal is encoded for a period of t1 and subsequently acquired over a train of n pulses spaced by a period of t2, entirely analogous to two-dimensional NMR spectroscopy in the frequency domain. The relaxation properties can be used as a highly sensitive and specific molecular probe and provide important information on molecular motion (e.g., correlational relaxation, diffusion properties) that is not readily available in frequency-domain NMR spectra. c A single drop of whole blood contained in a microcapillary tube was spun using a standard hematocrit centrifuge (6000 × g, 1 min) to separate and concentrate the RBCs from the plasma. The capillary tube was then loaded into the permanent magnet and adjusted so that the radio-frequency coil (inner diameter of 1.20 mm) focused on the packed RBCs (the enriched part). This is essential to obtain a ‘clean signal’ from the RBCs without (or with minimal) interference from blood plasma.

Fig. 2. A proposed scheme of human-machine interaction for rapid, label-free disease detection in clinical hemoglobinopathies.

a The NMR-based POCT is used with (or without) the assistance of artificial intelligence (AI). b The highly unique and detailed 2D magnetic resonance-based molecular fingerprint can be used directly (without AI) for rapid screening. c Clinical phenotyping (e.g., clinical representation) can be biased by subjective human judgment. With AI, deep image analysis (e.g., hierarchical clustering, dimension reduction) is performed to transform the highly complicated (e.g., high-dimensional) data into human-friendly information to assist in medical decision making (e.g., diagnosis, staging) in real time (Fig. 6). d Multi-omics analyses (e.g., proteomics, genomics) may be performed simultaneously to confirm the genetic variants and/or other anomalies. Time-consuming back-end laboratory tests (e.g., high-performance liquid chromatography (HPLC)) may be bypassed depending on the outcome of the molecular phenotyping.

Three peaks (R-peak, S-peak and T-peak), at (T2 = 141 ms, T1 = 562 ms), (T2 = 4.47 ms, T1 = 335 ms) and (T2 = 1.12 ms, T1 = 188 ms), respectively, were observed in the T1-T2 correlational spectroscopy performed on the water-proton nuclei (1H) of the RBCs (Fig. 3a). It appeared that the RBC microenvironment could be decomposed into two major relaxation reservoirs, consisting of one slow relaxation component (R-peak) and two fast relaxation components (S-peak, T-peak), attributed to the interaction of water molecules with their respective microenvironments, i.e., bulk water, the intermediate hydration layer, and macromolecular protein, respectively (Supplementary Fig. 1). Water molecules are subject to diverse dynamic processes as a result of their interaction with a variety of sites and functional groups.

Fig. 3. The T1-T2 correlational spectrum of blood microenvironment.

a The decomposed relaxation reservoirs (R-peak, S-peak, and T-peak) of the packed red blood cell microenvironment with the hemoglobin in the oxygenated state. Each coordinate is given as (T2 relaxation (in ms), T1 relaxation (in ms), A-ratio (unitless)). The A-ratio is the ratio T1/T2. b The multiple relaxation reservoirs of the blood microenvironment in the T1-T2 correlational spectrum, shown as a log-log plot, i.e., the bulk water (R-peak), hydration layer (S-peak), and direct macromolecular protein interaction (T-peak) for hemoglobin in the oxygenated state. In the oxidized state, the T-peak dropped substantially (T0-peak). The unbound molecule, Ro (e.g., free water), is located on the diagonal line (A-ratio approaches unity).

The large signal intensity (and slowest relaxation) of the R-peak is attributable to bulk water molecules, which make up more than 98% of the total mass-ratio of RBCs. The bulk water has minimal and indirect contact with macromolecular protein (through long-range dipolar couplings), and hence the weakest water-protein interactions. Its relaxation dephasing arises predominantly from homonuclear dipole-dipole coupling within the water-to-water network. On the other hand, the presence of two distinct peaks (i.e., S-peak, T-peak) suggested that the fast relaxation component can be further resolved into sub-regions31,32. The S-peak corresponds to water molecules in the intermediate hydration layer, and the T-peak to water molecules in direct contact with the surface of macromolecular protein. Dortch et al. and McDonald et al. proposed the idea of exchange peaks33 and surface relaxation34, respectively, but the observation in this work is consistent with the three-peak model proposed by Lores et al. and Thompson et al.35,36.

Interestingly, each peak (R-peak, S-peak, T-peak) possesses a consistent yet unique T1/T2 ratio of (3.99, 74.94, 167.86), respectively, which appeared to characterize the degree of water-protein interaction (Fig. 3b and Table 1). We define here the T1/T2 ratio as the A-ratio. With increased water-protein interaction, the motion of the water protons was drastically slower and more restricted (and hence T1 and T2 relaxation times were reduced). The spin-spin relaxation appeared to be much more efficient (shorter T2 relaxation) than its spin-lattice counterpart, and hence the A-ratio was large. In contrast, unbound free molecules in the extreme fast-motion regime possess long T1 and T2 relaxation times, with the A-ratio approaching unity (~1). Importantly, the relaxation profile forms a unique and specific two-dimensional ‘molecular fingerprint’ of each individual that is very sensitive to the molecular microenvironment measurable at the timescales of NMR relaxation.
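As a quick worked check of the A-ratio definition, the short sketch below recomputes T1/T2 for the three oxygenated-state peaks using the relaxation times listed in Table 1. The function and variable names are illustrative only and are not part of the authors' software.

```python
# Relaxation times (T1 ms, T2 ms) for oxygenated packed RBCs, copied from Table 1.
PEAKS_OXY = {
    "R-peak (bulk water)": (562.0, 141.0),
    "S-peak (hydration layer)": (335.0, 4.47),
    "T-peak (protein-bound water)": (188.0, 1.12),
}


def a_ratio(t1_ms: float, t2_ms: float) -> float:
    """Return the A-ratio, defined in the text as T1/T2 (unitless)."""
    if t1_ms <= 0 or t2_ms <= 0:
        raise ValueError("relaxation times must be positive")
    return t1_ms / t2_ms


if __name__ == "__main__":
    for name, (t1, t2) in PEAKS_OXY.items():
        print(f"{name}: A-ratio = {a_ratio(t1, t2):.2f}")
```

A free, unbound pool with T1 ≈ T2 would give a value close to 1, which is the diagonal line referred to in Fig. 3b.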

Table 1.

The decomposed relaxation reservoirs of packed red blood cells with the hemoglobin in (a) oxygenated, (b) oxidized, and (c) deoxygenated states.

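| Reservoir | State | T1 (ms) | T2 (ms) | A-ratio |
|---|---|---|---|---|
| Bulk water, R-peak | a. oxygenated state | 562 | 141 | 3.99 |
| Bulk water, R-peak | b. oxidized state | 217 | 120 | 1.81 |
| Bulk water, R-peak | c. deoxygenated state | 463 | 102 | 4.53 |
| Hydration layer, S-peak | a. oxygenated state | 335 | 4.47 | 74.94 |
| Hydration layer, S-peak | b. oxidized state | 120 | 4.18 | 28.71 |
| Hydration layer, S-peak | c. deoxygenated state | 242 | 2.71 | 89.30 |
| Direct bound macromolecules, T-peak | a. oxygenated state | 188 | 1.12 | 167.86 |
| Direct bound macromolecules, T-peak | b. oxidized state (Ti) | 50.3 | 1.34 | 37.54 |
| Direct bound macromolecules, T-peak | b. oxidized state (T0) | 2.43 | 0.78 | 3.12 |
| Direct bound macromolecules, T-peak | c. deoxygenated state | 175 | 0.565 | 309.73 |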

The A-ratio is the ratio T1/T2.

Oxidative degradation of hemoglobin in blood

A freshly collected whole blood sample, consisting predominantly of oxy-Hb, was oxidized to oxidized Hb in the presence of sodium nitrite and spun down for NMR measurements (see Methods). In the oxidized state, the relaxation times of the three major peaks, (R-peak: T2 = 120 ms, T1 = 217 ms), (S-peak: T2 = 4.18 ms, T1 = 120 ms), and (T-peak: T2 = 1.34 ms, T1 = 50.3 ms), were considerably shorter than in the baseline oxygenated state (non-oxidized, diamagnetic state) (Fig. 4a, b). The presence of excessive oxidized Hb in blood causes serious tissue hypoxia, a pathological state known clinically as methemoglobinemia37.

Fig. 4. The T1-T2 correlational spectrum of blood microenvironment of (wild type) packed red blood cells.

The cells were in various states: a oxygenated, b oxidized, and c deoxygenated. The zoomed-in details of the decomposed relaxation reservoirs for the fast relaxation components (S-peak and T-peak) and the slow relaxation component (bulk water molecules, R-peak) are not shown. The coordinate for the R-peak is indicated at the upper left of each spectrum. Each coordinate is given as (T2 relaxation (in ms), T1 relaxation (in ms), A-ratio). Freshly prepared oxy-Hb was subjected to oxidation with 10 mM sodium nitrite for 45 min, or to sodium dithionite (in excess) for 40 min to chemically lock the Hb in the deoxygenated state. All the samples were washed thrice and resuspended in 1× PBS for micro MR measurements. The experimental parameters used were echo time = 200 µs, T1-incremental steps = 32, and signal averaging = 4. The number of echoes used was 4000 (oxygenated Hb) or 2000 (oxidized Hb, deoxygenated Hb).

The marked relaxation enhancement observed was due to the presence of five unpaired electrons in the ferric iron (Fe3+), which acted as the paramagnetic relaxation center37,38. The magnetic moment of ferric iron is 1000-fold higher than that of a single proton37,39. Significantly, owing to the long-range nucleus-electron dipolar interaction, the paramagnetism of the unpaired electrons had a considerable effect on the bulk water molecules (R-peak). In contrast to the oxygenated (diamagnetic) state, the spin-lattice relaxation effect in the oxidized (paramagnetic) state appeared to be much more efficient relative to the spin-spin relaxation effect, and hence the A-ratio of the R-peak was reduced to 1.81 (Table 1).

A distinctively long stretch of T1-relaxation distribution, extending across two orders of magnitude (ca. 1 ms to 100 ms along the T1 dimension), was displayed by the protein-bound water protons (from T-peak to T0-peak). This ‘relaxation tail’, running from (T2 = 1.34 ms, T1 = 50.3 ms) to (T2 = 0.78 ms, T1 = 2.43 ms), became a distinctive feature of oxidized Hb. It is due to the distance (r)-dependent paramagnetic effect, in which the relaxation efficiency falls off as 1/r^6 with distance from the relaxation center37. As the proton nuclei approach the relaxation center (the unpaired electrons), the T1- and T2-relaxation components are reduced to comparable values (A-ratio approaching unity, T0 in Fig. 3b). The gradual process of Hb oxidation under exposure to a mild oxidant, captured in a well-controlled manner, confirmed the existence of transitional states in the formation of the ‘tail’ (Supplementary Fig. 2).
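The steepness of this distance dependence can be made concrete with a one-line worked relation. It is only a scaling sketch: the symbol R_para for the paramagnetic contribution to the relaxation rate is notation introduced here, and all prefactors (electron spin, correlation times) are assumed to be held fixed.

```latex
R_{\mathrm{para}}(r) \;\propto\; \frac{1}{r^{6}}
\qquad\Longrightarrow\qquad
\frac{R_{\mathrm{para}}(2r)}{R_{\mathrm{para}}(r)} \;=\; \frac{1}{2^{6}} \;=\; \frac{1}{64}
```

Doubling the distance between a water proton and the ferric center therefore weakens the paramagnetic contribution roughly 64-fold, which is one way to see why protons at different distances spread out into a long tail rather than a single point.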

On the other hand, the protein-bound water molecules (T-peak) in the deoxygenated state exhibited profound T2 shortening (0.565 ms) with relatively little T1 shortening (175 ms), attributed to the short electron relaxation time and the obscure protein configuration40. As a result, the A-ratio of deoxy-Hb (309.7) is distinctively larger than those of its oxy-Hb (167.9) and oxidized Hb (37.5) counterparts (Fig. 4c).

Rapid molecular phenotyping in clinical hemoglobinopathies

We demonstrated the clinical utility of molecular phenotyping in clinical hemoglobinopathies by mapping out the spectra of heterozygous HbE, HbD and heterozygous beta thalassemia (HBB:c.27_28insG) variants (Fig. 5 and Table 2). Six additional Hb variants (Supplementary Fig. 3b) were included for the machine learning and blind test studies (Table 3 and Fig. 6). A limitation of this study is that it involved only the heterozygous HbE phenotype. Given the low prevalence of the homozygous HbE variant phenotype (~0.1%) in our population41, we were unable to include such subjects during the study period. The Hb variants were first identified by a cation-exchange high-performance liquid chromatography method (Bio-Rad Variant II analyzer) and further confirmed by capillary electrophoresis (Sebia CAPILLARYS 2 analyzer) and genotyping. NMR measurements were carried out on the spun-down packed RBCs in their native state (without any chemical treatment).

Fig. 5. The T1-T2 correlational spectrum of blood microenvironment of (various hemoglobin variants) packed red blood cells.

The variants were (a) HbE variant, (b) HbD variant, and (c) rare beta thalassemia variant; other Hb variants are shown in Supplementary Fig. 3b. The zoomed-in details of the decomposed relaxation reservoirs for the fast relaxation components (S-peak and T-peak) and the slow relaxation component (bulk water molecules, R-peak) are not shown. The coordinate for the R-peak is indicated at the upper left of each spectrum. Each coordinate is given as (T2 relaxation (in ms), T1 relaxation (in ms), A-ratio). The experimental parameters used were echo time = 200 µs, number of echoes = 4000, T1-incremental steps = 32, and signal averaging = 4. Note that there is a possible artifact, denoted as (*).

Table 2.

The decomposed relaxation reservoirs of packed red blood cells with the hemoglobin variants in (a) wild-type (control), (b) HbE variant, (c) HbD variant, and (d) rare beta thalassemia variant.

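| Reservoir | Sample | T1 (ms) | T2 (ms) | A-ratio |
|---|---|---|---|---|
| Bulk water, R-peak | a. Hb wild-type (control) | 562 | 141 | 3.99 |
| Bulk water, R-peak | b. HbE variant | 631 | 158 | 3.99 |
| Bulk water, R-peak | c. HbD variant | 640 | 165 | 3.88 |
| Bulk water, R-peak | d. rare Hb variant | 640 | 165 | 3.88 |
| Hydration layer, S-peak | a. Hb wild-type (control) | 335 | 4.47 | 74.94 |
| Hydration layer, S-peak | b. HbE variant | 335 | 4.22 | 79.39 |
| Hydration layer, S-peak | c. HbD variant | 373 | 4.18 | 89.23 |
| Hydration layer, S-peak | d. rare Hb variant | 362 | 6.48 | 55.86 |
| Direct bound macromolecules, T-peak | a. Hb wild-type (control) | 188 | 1.12 | 167.9 |
| Direct bound macromolecules, T-peak | b. HbE variant | 106 | 1.06 | 100 |
| Direct bound macromolecules, T-peak | c. HbD variant | 96.3 | 1.20 | 80 |
| Direct bound macromolecules, T-peak | d. rare Hb variant | 172 | 1.40 | 122.9 |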

Table 3.

The performance of supervised machine learning models (e.g., neural network, k nearest neighbor (kNN), and logistic regression) in comparison to 5 technicians.

| Method | AUC | CA | Sensitivity | Specificity | Precision | F1 |
|---|---|---|---|---|---|---|
| Neural network | 0.92 | 0.906 | 0.906 | 1 | 0.938 | 0.913 |
| kNN | 0.912 | 0.844 | 0.844 | 0.83 | 0.881 | 0.855 |
| Logistic regression | 0.927 | 0.906 | 0.906 | 0.83 | 0.914 | 0.909 |
| Average (models) | 0.920 | 0.885 | 0.885 | 0.887 | 0.911 | 0.892 |
| Technician 1 | - | 0.781 | 0.778 | 0.800 | 0.955 | 0.857 |
| Technician 2 | - | 0.750 | 0.731 | 0.833 | 0.950 | 0.826 |
| Technician 3 | - | 0.813 | 0.846 | 0.667 | 0.917 | 0.880 |
| Technician 4 | - | 0.813 | 0.885 | 0.500 | 0.885 | 0.885 |
| Technician 5 | - | 0.813 | 0.815 | 0.800 | 0.957 | 0.880 |
| Average (technicians) | - | 0.794 | 0.811 | 0.720 | 0.932 | 0.866 |

k-fold cross-validation (e.g., k = 2, 3, 5) and the leave-one-out method were used to train and test the models. The performance of the naïve Bayes model was well below that of the average human (details in Supplementary Fig. 7). Abbreviations: AUC, area under the curve; CA, classification accuracy. The F1-score is the harmonic mean of precision and sensitivity.

Fig. 6. Machine-learning assisted NMR-based POCT in making medical decision.

a The machine learning workflow for processing the complicated data into user-friendly medical decisions (e.g., disease subtyping). The maps were converted into machine-readable features using image embedding (e.g., SqueezeNet). Dimensionality reduction was performed using various unsupervised models (e.g., MDS, t-SNE, Isomap). Supervised learning models (e.g., neural network, logistic regression, naïve Bayes) were trained and used for prediction. The performance of the supervised learning techniques was compared with human performance (Table 3). b The classification of three states (disease, non-disease, variants) and disease subtyping (sub-type 1: oxidized Hb, sub-type 2: partially oxidized Hb), and c heat map of 32 anonymized subjects processed using the multidimensional scaling technique (300 max iterations, PCA-Torgerson). The legend (red, white) indicates (longer, shorter) distance between subjects. Other unsupervised models (e.g., locally linear embedding, Isomap, t-SNE) were also evaluated for comparison (Supplementary Fig. 5). d The hierarchical clustering enabled disease staging, prognosis or risk factor prediction (high/low-risk subject) with respect to a standard reference. For simplicity, three referencing states (WT and oxidized Hb) were shown. The non-disease state consists of healthy wild-type samples, and the disease state consists of oxidized Hb and Hb variants. The short forms used were wild type (WT), oxidized Hb (Oxi), and Hb variants (Var). The clustering circles (dotted lines) were drawn as a visual guide. The NMR correlational map of each subject is shown in Supplementary Fig. 4.

Hb genotyping identified single nucleotide polymorphisms in the β-globin gene in the first and second samples, consistent with the HbE (Fig. 5a) and HbD (Fig. 5b) variants. A third, rare Hb variant sample was identified with a G insertion at codon 27 of the β-globin gene (Fig. 5c). These hemoglobin variants exhibit similar clinical phenotypes, such as mild hemolysis and susceptibility to oxidation42,43. The two-dimensional correlational mapping of the Hb variants (Fig. 5a–c) revealed unusual spectral characteristics compared with wild-type RBCs (Supplementary Fig. 3). The HbE variant (T2 = 1.06 ms, T1 = 106 ms), HbD variant (T2 = 1.20 ms, T1 = 96.3 ms), and the beta thalassemia variant (T2 = 1.40 ms, T1 = 172 ms) appear to have a large and distorted T-peak with a shorter T1 relaxation than wild-type Hb (T2 = 1.12 ms, T1 = 188 ms). The T-peak dispersion for the beta thalassemia variant with a mutated β-globin chain was particularly large, with a flat plateau, suggesting that the frameshift mutation causes a greater degree of hemoglobin instability42 (Fig. 5c).

In addition, the Hb variants appear to have a much higher concentration of oxidized Hb than the wild-type (Supplementary Fig. 3a). T1-relaxation stretching was observed for the HbE variant (T2 = 0.94 ms, T1 = 9.44 ms) and the beta thalassemia variant (T2 = 0.56 ms, T1 = 10 ms), in agreement with commonly observed clinical phenotypes such as mild hemolysis due to increased oxidative damage. Interaction of Hb variants with other forms of hemoglobinopathy can lead to complex thalassemia syndromes with varying clinical phenotypes (Fig. 2).

Machine learning assisted medical decision

The 32 anonymized subjects consisted of a mixture of non-disease samples (wild-type) and disease samples (details in Supplementary Fig. 4). The NMR correlational spectroscopy maps ('molecular fingerprints') were converted into machine-readable form for deep image analysis using statistical software (e.g., R, Orange 3.1.2). Structural abnormalities in hemoglobin variants also lead to clinical methemoglobinemia in the late stage. The oxidized Hb samples served as simulated examples of clinical methemoglobinemia.

Unsupervised learning techniques were used for dimension reduction (e.g., MDS) and classification (e.g., hierarchical clustering) to assist in making medical decisions (Fig. 6a). The 2D NMR correlational spectroscopy maps are complex 3D contour plots, and the MDS technique was used to reduce the higher-dimensional data to a two-dimensional scatter plot, which is more user-friendly to interpret (Fig. 6b). Each feature (the 'molecular fingerprint' of one subject) was classified based on its similarity to members of its own cluster as opposed to members of other clusters. Subjects were successfully classified into two clusters (disease (oxidized Hb, blue), non-disease (healthy wild type, red)) using the MDS technique (P < 0.05), apart from the mutated counterpart (Hb variants, orange). In addition, the disease subtypes (sub-type 1: oxidized Hb, sub-type 2: partially oxidized Hb) were also observed (Fig. 6b). Distances between subjects are shown in the heat map (Fig. 6c). Hierarchical clustering enabled disease staging, prognosis or risk factor prediction (high/low-risk factor) (Fig. 6c, d). Other techniques (e.g., Isomap, locally linear embedding, t-SNE) were evaluated, and similar results were reproduced qualitatively (Supplementary Fig. 5).
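To illustrate how a matrix of pairwise distances between subjects can be turned into a two-dimensional scatter plot, the sketch below implements classical (Torgerson) MDS with NumPy. This is the closed-form variant only, not the iterative MDS configured in this work (300 iterations with PCA-Torgerson initialization, Fig. 6). The feature vectors are random stand-ins for image-embedding features, and all function names are hypothetical.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points from a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering    # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in features: two groups of three "subjects" each.
    features = np.vstack([
        rng.normal(0.0, 0.1, size=(3, 5)),
        rng.normal(1.0, 0.1, size=(3, 5)),
    ])
    coords = classical_mds(pairwise_distances(features))
    print(coords)  # one (x, y) row per subject, ready for a scatter plot
```

The same distance matrix is what a heat map such as Fig. 6c displays, and it could equally be passed to a hierarchical clustering routine.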

Blinded test: machine vs human learning

The 32 anonymized subjects consisted of a mixture of non-disease samples (wild-type) and disease samples (details in Supplementary Fig. 4). Supervised learning models (e.g., logistic regression, neural network, k nearest neighbors (kNN) and naïve Bayes) were evaluated against human performance. K-fold cross-validation (e.g., k = 2, 3, 5) and the leave-one-out method were used for sampling. Five technicians were trained to differentiate between disease and non-disease spectra and were subsequently asked to classify each spectrum with a binary decision (disease, non-disease) in a blinded manner. At the end of the experiment, the results were cross-checked and classified as true positive, true negative, false positive or false negative (Supplementary Fig. 6). On average, the machine learning models (e.g., CA = 0.885, sensitivity = 0.885, specificity = 0.887) outperformed the technicians (e.g., CA = 0.794, sensitivity = 0.811, specificity = 0.720) in many aspects when k = 5 (Table 3). The performance of the supervised models generally improved with increasing k and reached its maximum when the leave-one-out method was used in training the datasets (details in Supplementary Fig. 7). Notably, the performance variation between individual technicians was larger than that between machine learning models, as a result of subjective human judgment. On average, the machine learning models (30 s) also took much less time than the technicians (about 10 min) to complete the given tasks.
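The quantities reported in Table 3 follow from the four confusion counts in the usual way, as the sketch below shows. The example counts are hypothetical: they were chosen here because, for 32 subjects, they agree with the Technician 1 row to within rounding of the third decimal, but the actual counts are in Supplementary Fig. 6 and are not reproduced in this text. Treating "disease" as the positive class is also an assumption.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute CA, sensitivity, specificity, precision and F1 from confusion counts."""
    if min(tp, tn, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("need actual positives, actual negatives and predicted positives")
    total = tp + tn + fp + fn
    return {
        "CA": (tp + tn) / total,             # classification accuracy
        "sensitivity": tp / (tp + fn),       # true positive rate (recall)
        "specificity": tn / (tn + fp),       # true negative rate
        "precision": tp / (tp + fp),         # positive predictive value
        # Harmonic mean of precision and sensitivity, written in count form.
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for one rater over 32 subjects (not taken from the paper).
    for name, value in binary_metrics(tp=21, tn=4, fp=1, fn=6).items():
        print(f"{name}: {value:.4f}")
```

Note that the F1 values in the model rows of Table 3 need not equal the harmonic mean of the tabulated precision and sensitivity if those columns are class-averaged, which the text does not specify.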

Discussion

In this work, we showed that the detailed and specific molecular microenvironment of water-proton interactions in blood can be mapped out using two-dimensional T1-T2 correlational spectroscopy (Supplementary Table 1 and Supplementary Table 2). Interestingly, as water is ubiquitous in living systems, water-protein interactions (e.g., protein hydration) have attracted considerable interest, with techniques ranging from terahertz spectroscopy44 to neutron scattering45, and provide an equivalent of ‘inverse proteomic’ information. This adds a new dimension to the existing traditional omics framework (e.g., genomic, proteomic), potentially revealing many biological pathways and improving the understanding of fundamental biological processes that have not been examined before.

We demonstrated that the proposed technique is capable of rapid, label-free phenotyping of biological fluids in various physiological conditions (e.g., oxygenation/deoxygenation level) and pathological states (e.g., blood oxidation, hemoglobinopathies) in a uniquely personalized manner. We showed that time-to-result could be reduced to minutes (Supplementary Fig. 1). With the recent availability of ultrafast signal acquisition methods12 and efficient inversion algorithms7, real-time characterization and monitoring is possible. Aided by machine learning techniques, complicated NMR correlational maps were immediately transformed into clinically meaningful and user-friendly information.

Secondly, encoding multidimensional biochemical and biophysical information at the molecular level using two-dimensional relaxation profiling (instead of chemical shifts) circumvents the limitation of conventional large-footprint NMR. Unlike high-field NMR spectroscopy, mass spectrometry and high-performance liquid chromatography, where the instrumentation is often bulky and expensive (Table 4), the NMR-based POCT proposed in this work offers an inexpensive assay and instrumentation (e.g., open-source software-defined radio46–48). Importantly, the unique and specific molecular fingerprint of a liquid biopsy is able to provide multiple global snapshots for dynamic disease monitoring in a minimally invasive manner49,50.

Table 4.

Comparison between the proposed NMR-based POCT with AI-aided technology and existing state-of-the-art technologies (e.g., high-field NMR52, electrophoresis53, HPLC54, and PCR-based assay45) which have been reported for clinical hemoglobinopathies.

| | NMR-based POCT | High-field NMR | Electrophoresis | HPLC cation exchange | PCR-based assay |
|---|---|---|---|---|---|
| Information | Phenomic | Proteomic | Proteomic | Proteomic | Genomic |
| Mode of action | Water-protein interactions (time domain) | Chemical shift (frequency domain) | Electric field (protein charge) | Mass transfer | DNA amplification |
| Multi-dimensional | Yes | Yes | Yes | No | No |
| AI-assisted | Yes | No | No | No | No |
| Equipment size | Bench-top | Large | Bench-top | Large | Large |
| Equipment price | Cheap | Expensive | Cheap | Expensive | Expensive |
| Price per assay | Ultra cheap | Medium | Medium | Expensive | Expensive |
| Sample processing | Easy | Easy | Difficult | Difficult | Difficult |
| Time to results | Minutes | Hours | Hours | Hours | Hours |
| POCT | Yes | No | Yes | No | No |
| Non-destructive | Yes | Yes | No | No | No |
| Functional test | Yes | Yes | No | No | No |
| References | This work | Levitt et al. | Kutlar et al. | Kehra et al. | Kutlar et al. |

The price per assay for NMR-based POCT refers to a single microcapillary tube (<$0.10). In addition, the proposed method is label-free and therefore no chemical treatment is required.

HPLC high-performance liquid chromatography.

In summary, a novel concept of a highly unique and specific ‘molecular fingerprint’ of blood was demonstrated using time-domain two-dimensional NMR-based POCT. The assessment of the multidimensional relaxation components of blood was shown to be highly time- and patient-specific, delivering personalized information that is critical for clinical diagnosis, monitoring and prognosis. Such a personalized and precise method lays a strong foundation for the next generation of personalized medicine. The rapid, high-throughput and label-free nature of the proposed method has major implications for in vitro disease diagnosis and monitoring, where a minimally invasive liquid biopsy read-out allows frequent testing. The use of machine learning algorithms improves the delivery of information (e.g., speed and accuracy), which may become a key factor in speeding up the translation of technological innovations into clinical routine and practice.

Methods

NMR setup and parameters

The 1H magnetic resonance measurements of bulk packed red blood cells were carried out at a resonance frequency of 21.57 MHz using a portable permanent magnet (Metrolab Instruments, Switzerland), B0 = 0.5 T, and a benchtop-type console (Kea Magritek, New Zealand). A temperature controller was set to maintain the measurement chamber at 24.5 °C. The T1-T2 correlational pulse sequence consisted of a standard inversion recovery followed by a Carr–Purcell–Meiboom–Gill (CPMG) pulse train (Fig. 1).
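The relationship between the quoted resonance frequency and field strength can be checked with the Larmor relation f = (γ/2π)·B0. The sketch below uses the proton value γ/2π ≈ 42.577 MHz/T; the function names are illustrative. It suggests, as an inference rather than a statement from the authors, that 0.5 T is a nominal figure, since 21.57 MHz corresponds to a slightly higher field.

```python
# Proton gyromagnetic ratio divided by 2*pi, in MHz per tesla.
GAMMA_BAR_1H_MHZ_PER_T = 42.577


def larmor_mhz(b0_tesla: float) -> float:
    """Proton Larmor frequency (MHz) at field strength b0_tesla."""
    return GAMMA_BAR_1H_MHZ_PER_T * b0_tesla


def field_tesla(frequency_mhz: float) -> float:
    """Field strength (T) at which protons resonate at frequency_mhz."""
    return frequency_mhz / GAMMA_BAR_1H_MHZ_PER_T


if __name__ == "__main__":
    print(f"Larmor frequency at 0.5 T: {larmor_mhz(0.5):.2f} MHz")
    print(f"Field for 21.57 MHz: {field_tesla(21.57):.4f} T")
```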

The experimental parameters were: echo time = 200 µs; number of echoes = 2000 (for the oxidized and deoxygenated states) and 4000 (for the oxygenated state); T1 increments = 32 logarithmically spaced steps; and signal averaging = 4. A recycle delay of 2 s was set between experiments to allow all the molecular spins to return to thermal equilibrium. The total acquisition time depends on a combination of factors (e.g., number of scans, number of T1 increments).
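The dependence of the acquisition time on these parameters can be illustrated with a rough estimate. The sketch below assumes that each scan costs one recycle delay plus one CPMG echo train, and that the inversion-recovery delays (whose values are not given here) are supplied separately if known; the function names and the `inversion_delays_s` argument are hypothetical.

```python
def cpmg_train_s(echo_time_s, n_echoes):
    """Duration of one CPMG echo train in seconds."""
    return echo_time_s * n_echoes


def acquisition_time_s(n_t1_steps, n_averages, recycle_delay_s,
                       echo_time_s, n_echoes, inversion_delays_s=None):
    """Rough total time: every T1 step is repeated n_averages times.

    inversion_delays_s is an optional list with one inversion-recovery
    delay per T1 step; if omitted, those delays are ignored (lower bound).
    """
    per_scan = recycle_delay_s + cpmg_train_s(echo_time_s, n_echoes)
    total = n_t1_steps * n_averages * per_scan
    if inversion_delays_s:
        total += n_averages * sum(inversion_delays_s)
    return total


# Parameters quoted in the text: 32 T1 steps, 4 averages, 2 s recycle delay,
# 200 us echo time, 2000 or 4000 echoes.
for label, n_echoes in (("oxidized/deoxygenated", 2000), ("oxygenated", 4000)):
    total = acquisition_time_s(32, 4, 2.0, 200e-6, n_echoes)
    print(f"{label}: {total / 60:.2f} min (inversion delays not included)")
```

With the quoted parameters this lower-bound estimate falls between roughly 5 and 6 min, the same order as the experimental time reported. Whether the 2 s recycle delay already includes part of the sequence time is not specified, so the numbers should be read as indicative only.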

We demonstrated that a total experimental time of less than 6 min is sufficient for high sensitivity and good spectral resolution without losing spectral integrity (details in Supplementary Fig. 1). The 2D correlation maps were processed using the built-in ILT algorithm (FISTA inversion)51 with 5000 iterations and a smoothing parameter of 1. The inversion typically completed in less than 2 min on a desktop computer (Intel Core Pentium i3 CPU @ 3.2 GHz, 1.74 GB RAM).
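To make the inversion step concrete, the following is a minimal sketch of a two-dimensional inverse Laplace transform solved with FISTA under a non-negativity constraint and an L1 penalty. It is not the built-in routine used in this work; the kernel form (inversion recovery along T1, exponential decay along T2), the grids, the synthetic single-peak data and the mapping of `alpha` to the "smoothing parameter" are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def t1t2_kernels(t1_delays, t2_times, T1_grid, T2_grid):
    """Kernels for S = K1 @ F @ K2.T (inversion recovery followed by CPMG)."""
    K1 = 1.0 - 2.0 * np.exp(-t1_delays[:, None] / T1_grid[None, :])
    K2 = np.exp(-t2_times[:, None] / T2_grid[None, :])
    return K1, K2


def fista_ilt_2d(S, K1, K2, alpha=1.0, n_iter=5000):
    """Minimize 0.5*||K1 F K2^T - S||^2 + alpha*sum(F) subject to F >= 0."""
    # Lipschitz constant of the gradient: product of squared spectral norms.
    lipschitz = (np.linalg.norm(K1, 2) * np.linalg.norm(K2, 2)) ** 2
    F = np.zeros((K1.shape[1], K2.shape[1]))
    Y = F.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = K1.T @ (K1 @ Y @ K2.T - S) @ K2
        # Gradient step followed by the prox of the L1 term on F >= 0.
        F_new = np.maximum(Y - (grad + alpha) / lipschitz, 0.0)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Y = F_new + ((t - 1.0) / t_new) * (F_new - F)  # momentum update
        F, t = F_new, t_new
    return F


# Synthetic example: one relaxation component placed on the grid (hypothetical).
t1_delays = np.logspace(-3, 1, 32)           # 32 logarithmic T1 increments (s)
t2_times = 200e-6 * np.arange(1, 201)        # 200 echoes at 200 us spacing (s)
T1_grid = np.logspace(-2, 1, 20)
T2_grid = np.logspace(-3, 0, 20)
K1, K2 = t1t2_kernels(t1_delays, t2_times, T1_grid, T2_grid)

F_true = np.zeros((20, 20))
F_true[12, 14] = 1.0
S = K1 @ F_true @ K2.T

F_est = fista_ilt_2d(S, K1, K2, alpha=0.01, n_iter=500)
rel_residual = np.linalg.norm(K1 @ F_est @ K2.T - S) / np.linalg.norm(S)
print(f"relative residual after 500 iterations: {rel_residual:.3e}")
```

Because the Laplace kernels are severely ill-conditioned, the recovered map depends strongly on the regularization weight and the iteration count, which is one reason a fixed setting (5000 iterations, smoothing parameter of 1) is reported for all samples.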

Clinical ethics and protocols

This study received ethics approval from the local Institutional Review Board of the National Healthcare Group. K2 EDTA-anticoagulated whole blood samples were washed and re-suspended in phosphate-buffered saline (PBS). Informed consent was obtained from all subjects involved in this study. All blood samples were either used immediately or kept at 4 °C and used for the micro MR analysis within three to four days of collection (unless mentioned otherwise). To induce the Hb into various derivative states, the blood samples were incubated with the desired chemical as mentioned in the main text (e.g., sodium nitrite) and then washed to remove any residual chemical. Heparinized microcapillary tubes (Fisher Scientific, PA) were used to transfer the processed blood, which was then spun down at 6000 × g for 1 min to obtain packed red blood cells for MR measurements.

Machine learning algorithm and workflow

The NMR-based POCT can be used with or without the assistance of AI (Fig. 2). Machine learning techniques were used to transform data that are complicated for humans to interpret (e.g., 2D NMR correlational maps) into user-friendly information for medical decision making, following the workflow developed (Fig. 6). The maps were converted into numerical feature vectors using image embedding (e.g., SqueezeNet). Dimension reduction was then performed using various techniques (e.g., MDS, t-SNE, Isomap) (Supplementary Fig. 5).
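The dimension-reduction step can be sketched with classical (Torgerson) multidimensional scaling, one simple member of the MDS family. This is an illustration only: the random vectors stand in for image embeddings of the 2D maps, their dimension is arbitrary, and the MDS variant used in the actual workflow may differ. NumPy is assumed to be available.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def classical_mds(X, n_components=2):
    """Classical MDS: double-center squared distances, keep top eigenpairs."""
    n = X.shape[0]
    d2 = pairwise_sq_dists(X)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ d2 @ J                    # Gram matrix of centered points
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale


# Hypothetical stand-in for embeddings: 32 maps, 64 features each.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(32, 64))
coords_2d = classical_mds(embeddings, n_components=2)
print(coords_2d.shape)
```

The resulting two-dimensional coordinates can be plotted so that maps with similar embeddings appear close together, which is the purpose of the projection step in the workflow.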

Blinded test

Supervised learning models (e.g., neural network, k-nearest neighbor, logistic regression, and naïve Bayes) were used to train on and predict the data. We first trained five human subjects to differentiate between the two classes (disease, non-disease) and asked them to classify 32 anonymized subjects that they had not seen before (Supplementary Fig. 4). They were allowed to backtrack (and change) their results as long as they stayed within the allocated time-frame (10 min). At the end of the experiment, the results were cross-checked and classified as true positive (TP), true negative (TN), false positive (FP) or false negative (FN) (Supplementary Fig. 6). Data mining software (e.g., Orange 3.1.2) was used to run the machine learning algorithms on a personal laptop (Intel Core Pentium i7 CPU @ 2.70 GHz, 8.00 GB RAM). Once the machine learning models were built, the test run took less than 30 s to complete all the tasks, while each of the human subjects took about 10 min on average.
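The cross-checking step above reduces each set of predictions to four counts, from which the usual summary metrics follow. The sketch below shows that bookkeeping; the label strings and the example predictions are invented for illustration and are not data from this study.

```python
def confusion_counts(y_true, y_pred, positive="disease"):
    """Return (TP, TN, FP, FN) for a binary classification."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def summary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy; None when undefined."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Invented example: six subjects, one reader (human or model).
truth = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]
guess = ["disease", "disease", "healthy", "healthy", "healthy", "disease"]
counts = confusion_counts(truth, guess)
print(counts, summary_metrics(*counts))
```

The same two functions apply unchanged to the output of a trained model or to the answers of a human reader, which makes the comparison between the two straightforward.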

Statistics and reproducibility

A two-tailed Student's t test was used to calculate P values.
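For reference, a two-tailed Student's t test can be sketched using only the Python standard library. The version below assumes two independent groups with equal variances (the text does not state whether the test was paired or unpaired), and obtains the P value by numerically integrating the t density; in practice a library routine such as `scipy.stats.ttest_ind` would normally be used instead. The sample values are invented.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, n_intervals=2000):
    """P(|T| >= |t_stat|) via Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / n_intervals  # n_intervals must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t_test(a, b):
    """Pooled-variance t test for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t_stat, df, two_tailed_p(t_stat, df)


# Invented example values, e.g. a relaxation time measured in two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat, df, p_value = student_t_test(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}, P = {p_value:.3f}")
```

The sketch does not handle the degenerate case in which both groups have zero variance, where the t statistic is undefined.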

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Reporting Summary (325.1KB, pdf)

Acknowledgements

This research was supported by the International Iberian Nanotechnology Laboratory (INL Start Up Grant (S4000040) and INL Seed Grant). T.T.N. acknowledges the support of A*STAR AGA, which provides access to a pool of very talented interns through its student attachment programme. W.K.P. would like to personally thank L. Daniel, an intern co-supervised by T.T.N., who conducted various preliminary MR measurements and modified the ILT code using MATLAB. We acknowledge Chin Hin Ng of the Department of Oncology, National University Hospital, Singapore, for assistance in obtaining approval from the National Healthcare Group (NHG) Institutional Review Board.

Author contributions

W.K.P. conceived the original idea, wrote the first draft of the paper, designed the experiments/protocols (e.g., NMR correlational experiments, machine learning coding), built the entire hardware setup and performed the micro MR measurements. T.T.N. contributed to modifications of the MATLAB code for ILT analysis. T.P.L. kindly provided the blood samples and engaged in various discussions on translational clinical aspects.

Data availability

The machine learning algorithms and 2D NMR raw maps, along with any remaining information, are available from the corresponding author upon reasonable request at weng.kung@inl.int.

Code availability

Machine learning calculations were made possible with Orange Data Mining. Full documentation, source code, and installation instructions are publicly available at https://orange.biolab.si.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Weng Kung Peng, Email: weng.kung@inl.int.

Tze Ping Loh, Email: tze_ping_loh@nuhs.edu.sg.

Supplementary information

Supplementary information is available for this paper at 10.1038/s42003-020-01262-z.

References

  • 1. Salzmann M, Pervushin K, Wider G, Senn H, Wuthrich K. TROSY in triple-resonance experiments: new perspectives for sequential NMR assignment of large proteins. Proc. Natl Acad. Sci. USA. 1998;95:13585–13590. doi: 10.1073/pnas.95.23.13585.
  • 2. Bollard ME, Stanley EG, Lindon JC, Nicholson JK, Holmes E. NMR-based metabonomic approaches for evaluating physiological influences on biofluid composition. NMR Biomed. 2005;18:143–162. doi: 10.1002/nbm.935.
  • 3. Duarte IF, Diaz SO, Gil AM. NMR metabolomics of human blood and urine in disease research. J. Pharm. Biomed. Anal. 2014;93:17–26. doi: 10.1016/j.jpba.2013.09.025.
  • 4. Viant MR, Lyeth BG, Miller MG, Berman RF. An NMR metabolomic investigation of early metabolic disturbances following traumatic brain injury in a mammalian model. NMR Biomed. 2005;18:507–516. doi: 10.1002/nbm.980.
  • 5. Ronconi L, Sadler PJ. Applications of heteronuclear NMR spectroscopy in biological and medicinal inorganic chemistry. Coord. Chem. Rev. 2008;252:2239–2277. doi: 10.1016/j.ccr.2008.01.016.
  • 6. Hills, B. P. Applications of low-field NMR to food science. in Annual reports on NMR spectroscopy vol. 58 177–230 (Elsevier, 2006).
  • 7. Song Y-Q, et al. T1–T2 correlation spectra obtained using a fast two-dimensional Laplace inversion. J. Magn. Reson. 2002;154:261–268. doi: 10.1006/jmre.2001.2474.
  • 8. Haun JB, et al. Micro-NMR for rapid molecular analysis of human tumor samples. Sci. Transl. Med. 2011;3:71ra16–71ra16. doi: 10.1126/scitranslmed.3002048.
  • 9. Peng WK, et al. Micromagnetic resonance relaxometry for rapid label-free malaria diagnosis. Nat. Med. 2014;20:1069–1073. doi: 10.1038/nm.3622.
  • 10. Peng WK, Chen L, Han J. Development of miniaturized, portable magnetic resonance relaxometry system for point-of-care medical diagnosis. Rev. Sci. Instrum. 2012;83:095115. doi: 10.1063/1.4754296.
  • 11. Veiga MI, Peng WK. Rapid phenotyping towards personalized malaria medicine. Malar. J. 2020;19:68. doi: 10.1186/s12936-020-3149-4.
  • 12. Ahola S, Telkki V-V. Ultrafast two-dimensional NMR relaxometry for investigating molecular processes in real time. ChemPhysChem. 2014;15:1687–1692. doi: 10.1002/cphc.201301117.
  • 13. Anders, J., SanGiorgio, P. & Boero, G. An integrated CMOS receiver chip for NMR-applications. In 2009 IEEE Custom Integrated Circuits Conference 471–474 10.1109/CICC.2009.5280786. (IEEE, 2009).
  • 14. Lee H, Sun E, Ham D, Weissleder R. Chip–NMR biosensor for detection and molecular analysis of cells. Nat. Med. 2008;14:869–874. doi: 10.1038/nm.1711.
  • 15. Dupré A, Lei K-M, Mak P-I, Martins RP, Peng WK. Micro- and nanofabrication NMR technologies for point-of-care medical applications – a review. Microelectron. Eng. 2019;209:66–74.
  • 16. Takeda K. OPENCORE NMR: open-source core modules for implementing an integrated FPGA-based NMR spectrometer. J. Magn. Reson. 2008;192:218–229. doi: 10.1016/j.jmr.2008.02.019.
  • 17. Sun N, Liu Y, Lee H, Weissleder R, Ham D. CMOS RF biosensor utilizing nuclear magnetic resonance. IEEE J. Solid-State Circuits. 2009;44:1629–1643.
  • 18. Ehrmann K, et al. Microfabricated solenoids and Helmholtz coils for NMR spectroscopy of mammalian cells. Lab Chip. 2007;7:373. doi: 10.1039/b614044k.
  • 19. Olson DL, Peck TL, Webb AG, Magin RL, Sweedler JV. High-resolution Microcoil 1H-NMR for mass-limited, nanoliter-volume samples. Science. 1995;270:1967–1970.
  • 20. Kong, T. F., Peng, W. K., Luong, T. D., Nguyen, N.-T. & Han, J. Adhesive-based liquid metal radio-frequency microcoil for magnetic resonance relaxometry measurement. Lab Chip 10.1039/C1LC20853E (2012).
  • 21. Kong, T. F. et al. Enhancing malaria diagnosis through microfluidic cell enrichment and magnetic resonance relaxometry detection. Sci. Rep. 11425 10.1038/srep11425 (2015).
  • 22. Castro CM, et al. Miniaturized nuclear magnetic resonance platform for detection and profiling of circulating tumor cells. Lab Chip. 2014;14:14–23. doi: 10.1039/c3lc50621e.
  • 23. Liong M, et al. Magnetic barcode assay for genetic detection of pathogens. Nat. Commun. 2013;4:55–65. doi: 10.1038/ncomms2745.
  • 24. Chung HJ, Castro CM, Im H, Lee H, Weissleder R. A magneto-DNA nanoparticle system for rapid detection and phenotyping of bacteria. Nat. Nanotechnol. 2013;8:369–375. doi: 10.1038/nnano.2013.70.
  • 25. Han J, Peng WK. Reply to ‘Considerations regarding the micromagnetic resonance relaxometry technique for rapid label-free malaria diagnosis’. Nat. Med. 2015;21:1387–1389. doi: 10.1038/nm.3959.
  • 26. Peng, W. K., Chen, L., Boehm, B. O., Han, J. & Loh, T. P. Molecular phenotyping of oxidative stress in diabetes mellitus with point-of-care NMR system. bioRxiv 565325 10.1101/565325 (2019).
  • 27. Peng, W. K., Han, J. & Loh, T. P. Micro magnetic resonance relaxometry. U.S. Patent Application No. 15/136,887 (2016).
  • 28. Peng, W. K., Ng, T.-T. & Loh, T. P. Machine learning assistive rapid, label-free molecular phenotyping of blood with two-dimensional NMR correlational spectroscopy. Preprint at https://www.biorxiv.org/content/10.1101/2020.06.20.162974v1 (2020).
  • 29. Peng, W. K. Clustering NMR: Machine learning assistive rapid (pseudo) two-dimensional relaxometry mapping. Preprint at https://www.biorxiv.org/content/10.1101/2020.04.29.069195v1 (2020).
  • 30. Song Y-Q, Ryu S, Sen PN. Determining multiple length scales in rocks. Nature. 2000;406:178–181. doi: 10.1038/35018057.
  • 31. Otting G. NMR studies of water bound to biological molecules. Prog. Nucl. Magn. Reson. Spectrosc. 1997;31:259–285.
  • 32. Mathur-De Vré R. The NMR studies of water in biological systems. Prog. Biophysics Mol. Biol. 1980;35:103–134. doi: 10.1016/0079-6107(80)90004-8.
  • 33. Dortch RD, Horch RA, Does MD. Development, simulation, and validation of NMR relaxation-based exchange measurements. J. Chem. Phys. 2009;131:164502. doi: 10.1063/1.3245866.
  • 34. McDonald PJ, Korb J-P, Mitchell J, Monteilhet L. Surface relaxation and chemical exchange in hydrating cement pastes: a two-dimensional NMR relaxation study. Phys. Rev. E. 2005;72:011409. doi: 10.1103/PhysRevE.72.011409.
  • 35. Lores Guevara, M. A., Naranjo, J. C. G. & Mirabal, C. A. C. MR relaxation studies of hemoglobin aggregation process in sickle cell disease: application for diagnostics and therapeutics. Appl. Magn. Reson. 10.1007/s00723-018-1104-0 (2018).
  • 36. Thompson BC, Waterman MR, Cottam GL. Evaluation of the water environments in deoxygenated sickle cells by longitudinal and transverse water proton relaxation rates. Arch. Biochem. Biophysics. 1975;166:193–200. doi: 10.1016/0003-9861(75)90380-x.
  • 37. Aime S, Fasano M, Paoletti S, Arnelli A, Ascenzi P. NMR relaxometric investigation on human methemoglobin and fluoromethemoglobin. an improved quantitative in vitro assay of human methemoglobin. Magn. Reson. Med. 1995;33:827–831. doi: 10.1002/mrm.1910330613.
  • 38. Peng, W. K., Samoson, A. & Kitagawa, M. Simultaneous adiabatic spin-locking cross polarization in solid-state NMR of paramagnetic complexes. Chem. Phys. Lett. 10.1016/j.cplett.2008.06.027 (2008).
  • 39. Moallempour M, et al. Methemoglobin effects on coagulation: a dose-response study with HBOC-200 (Oxyglobin) in a thrombelastogram model. J. Cardiothorac. Vasc. Anesthesia. 2009;23:41–47. doi: 10.1053/j.jvca.2008.06.006.
  • 40. Bradley WG. MR appearance of hemorrhage in the brain. Radiology. 1993;189:15–26. doi: 10.1148/radiology.189.1.8372185.
  • 41. Lee SY, et al. Evaluation of thalassaemia screening tests in the antenatal and non-antenatal populations in Singapore. Ann. Acad. Med. Singap. 2019;48:5–15.
  • 42. Yasmeen H, Toma S, Killeen N, Hasnain S, Foroni L. The molecular characterization of Beta globin gene in thalassemia patients reveals rare and a novel mutations in Pakistani population. Eur. J. Med. Genet. 2016;59:355–362. doi: 10.1016/j.ejmg.2016.05.016.
  • 43. Fucharoen S, Siritanaratkul N, Wasi P. Clinical manifestation of β-Thalassemia/Hemoglobin E Disease. J. Pediatr. Hematol. Oncol. 2000;22:6. doi: 10.1097/00043426-200011000-00022.
  • 44. Zhong D, Pal SK, Zewail AH. Biological water: a critique. Chem. Phys. Lett. 2011;503:1–11.
  • 45. Svergun DI, et al. Protein hydration in solution: experimental observation by x-ray and neutron scattering. Proc. Natl Acad. Sci. USA. 1998;95:2267–2272. doi: 10.1073/pnas.95.5.2267.
  • 46. Peng WK, Paesani D. Omics meeting onics: towards the next generation of spectroscopic-based technologies in personalized medicine. JPM. 2019;9:39. doi: 10.3390/jpm9030039.
  • 47. Asfour A, Raoof K, Yonnet J-P. Software defined radio (SDR) and direct digital synthesizer (DDS) for NMR/MRI instruments at low-field. Sensors. 2013;13:16245–16262. doi: 10.3390/s131216245.
  • 48. Hasselwander CJ, Cao Z, Grissom WA. gr-MRI: A software package for magnetic resonance imaging using software defined radios. J. Magn. Reson. 2016;270:47–55. doi: 10.1016/j.jmr.2016.06.023.
  • 49. Yip, S. et al. A Canadian guideline on the use of next-generation sequencing in oncology. Curr. Oncol. 26, e241–e254 (2019).
  • 50. Alix-Panabières C, Pantel K. Challenges in circulating tumour cell research. Nat. Rev. Cancer. 2014;14:623–631. doi: 10.1038/nrc3820.
  • 51. Zhou X, Su G, Wang L, Nie S, Ge X. The inversion of 2D NMR relaxometry data using L1 regularization. J. Magn. Reson. 2017;275:46–54. doi: 10.1016/j.jmr.2016.12.003.
  • 52. Arjmand M, et al. Nuclear magnetic resonance-based screening of thalassemia and quantification of some hematological parameters using chemometric methods. Talanta. 2010;81:1229–1236. doi: 10.1016/j.talanta.2010.02.014.
  • 53. Kutlar F. Diagnostic approach to hemoglobinopathies. Hemoglobin. 2007;31:243–250. doi: 10.1080/03630260701297071.
  • 54. Khera R, Singh T, Khuana N, Gupta N, Dubey AP. HPLC in characterization of hemoglobin profile in thalassemia syndromes and hemoglobinopathies: a clinicohematological correlation. Indian J. Hematol. Blood Transfus. 2015;31:110–115. doi: 10.1007/s12288-014-0409-x.

Articles from Communications Biology are provided here courtesy of Nature Publishing Group
