Author manuscript; available in PMC: 2026 Mar 26.
Published in final edited form as: Anal Chem. 2019 Jun 24;91(14):9221–9228. doi: 10.1021/acs.analchem.9b01988

Next-Generation Glycan Microarray Enabled by DNA-Coded Glycan Library and Next-Generation Sequencing Technology

Maomao Yan, Yuyang Zhu, Xueyun Liu, Yi Lasanajak, Jinglin Xiong, Jingqiao Lu, Xi Lin, David Ashline, Vernon Reinhold, David F Smith, Xuezheng Song*
PMCID: PMC13016201  NIHMSID: NIHMS2148954  PMID: 31187982

Abstract

Interactions of glycans with proteins, cells, and microorganisms play important roles in cell–cell adhesion and host–pathogen interaction. Glycan microarray technology, in which multiple glycan structures are immobilized on a single glass slide and interrogated with glycan-binding proteins (GBPs), has become an indispensable tool in the study of protein–glycan interactions. Despite its great success, the current format of the glycan microarray requires expensive, specialized instrumentation and labor-intensive assay and image processing procedures, which limit automation and possibilities for high-throughput analyses. Furthermore, the current microarray is not suitable for assaying interaction with intact cells due to their large size compared to the two-dimensional microarray surface. To address these limitations, we developed the next-generation glycan microarray (NGGM) based on artificial DNA coding of glycan structures. In this novel approach, a glycan library is presented as a mixture of glycans and glycoconjugates, each of which is coded with a unique oligonucleotide sequence (code). The glycan mixture is interrogated by GBPs followed by the separation of unbound coded glycans. The DNA sequences that identify individual bound glycans are quantitatively sequenced (decoded) by next-generation sequencing (NGS) technology, and the copy numbers of the DNA codes represent the relative binding specificities of the corresponding glycan structures to GBPs. We demonstrate that NGGM generates glycan–GBP binding data that are consistent with those generated in a slide-based glycan microarray. More importantly, the solution phase binding assay is directly applicable to identifying glycan binding to intact cells, which is often challenging using glass slide-based glycan microarrays.


Glycans play important roles in biological systems and many disease processes through specific interactions with other biomolecules such as glycan-binding proteins (GBPs).1–4 The glycan microarray has become a standard tool for the analysis of ligand specificities for GBPs and a discovery platform for potential functions of GBPs.5–8 In a current glycan microarray experiment, a library of glycans is immobilized onto solid surfaces such as a glass microscope slide. The microarray slide is interrogated by exposing the array to fluorescently tagged GBPs such as plant and animal lectins and anti-glycan antibodies. After washing and drying, the slide is scanned in a fluorescence scanner and an image of the protein bound to glycans is obtained and processed into a histogram showing a fluorescence signal for each bound glycan on the microarray. The intensity of the fluorescence signal is directly related to the strength of binding. The ability to analyze hundreds of glycans in a single experiment has transformed studies on protein–glycan interactions; however, there are several challenges facing this technology that need to be addressed. First, the number of glycans included in the current glycan microarray is limited. Even the most recent version of the glycan microarray produced by the Consortium for Functional Glycomics (CFG) presents only 600 glycans, which represent only a small fraction of the mammalian glycome.9 Currently, this is mostly due to the limited availability of glycans. The number of glycans available for glycan microarray preparation is growing quickly owing to recent advances in chemoenzymatic synthesis and novel methods to generate glycans from natural sources.10–14 However, with the current glycan microarray technology and existing format (~300 μm distance between each spot and six replicates), it would be very challenging to incorporate more than 1000 glycans onto a single microscope slide. 
Second, while the glycan microarray is considered a high-throughput platform due to the large number of glycans that can be analyzed simultaneously, the process, which involves applying GBP, washing away unbound protein, adding detection reagents, scanning, and image analysis, is often time-consuming. A major bottleneck in processing that is seldom discussed is the manual alignment of a grid over the fluorescent image that is required to quantify the fluorescence intensity at each individual spot. This alignment process is slow, which prevents microarray analysis from being a true high-throughput process. Third, despite the simple concept, glycan microarray technology is limited to a small number of very specialized laboratories due to the high cost of instrumentation, including a microarray printer and fluorescence scanner. Finally, although the microarray can theoretically be used to directly assay glycan-binding properties of intact cells, this has turned out to be technically difficult, presumably due to the limited adhesion that the two-dimensional surface of slides provides to large objects such as intact cells.

To address these challenges, we reasoned that the current glycan microarray is essentially a coded-glycan library where the code of each glycan is its physical location on a microarray slide. Current printed glycan microarray technology requires that glycans be printed on a surface where the positions of all glycans are recorded by the microarray printer and stored in a GAL file. One can, therefore, consider the glycan library to be a set of coded glycans where the codes are the physical locations of the spots on the slide that are decoded when the fluorescent image is computationally processed into a histogram that presents relative binding of the glycans in average relative fluorescence units (RFU) for each glycan ID number (Figure 1, upper panel). To address the limitations generated by the printing/scanning procedure, we reasoned that the physical codes (location) could be switched to individual oligonucleotide sequences, and that each glycan structure could be tagged with a unique, defined oligonucleotide sequence. A coded library of oligonucleotide-tagged glycans can be mixed together in a single vial and incubated with a biologically relevant GBP immobilized onto a microsphere, or with intact microorganisms or cells, to “pull down” specifically bound glycans by separating the GBP or intact microorganisms from unbound tagged glycans. The oligonucleotide codes associated with the GBP can be quantitatively sequenced by next-generation sequencing (NGS) technology,15–18 and the copy number of the sequence read becomes equivalent to the average RFU of the classic microarray and is converted to a histogram presenting copy number for each glycan ID number (Figure 1, lower panel). NGS technology allows for the addition of an indexing signal to identify individual experiments so that hundreds of GBPs can be analyzed in a single NGS analysis, making this an automatable, low-cost, and truly high-throughput process. 
In this report, we successfully demonstrate that this new technology, termed next-generation glycan microarray (NGGM), represents a new platform of glycan microarray technology that offers great potential for the study of interactions between glycans and proteins or glycans and intact bacteria.
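
The decoding idea described above, in which a sequence code replaces a slide position and a read count replaces RFU, can be sketched in a few lines. This is only an illustration under stated assumptions: the code sequences, index sequences, and the `decode_reads` helper are hypothetical, and reads are assumed to have already been trimmed to their index pair and coding region.

```python
from collections import Counter, defaultdict

# Hypothetical lookup tables; the real code and index sequences are not
# given in the text.
CODE_TO_GLYCAN = {
    "ACGTACGTACGTACGTACGT": 1,
    "TGCATGCATGCATGCATGCA": 2,
}
INDEX_TO_SAMPLE = {
    ("AAGG", "CCTT"): "ConA",
    ("GGAA", "TTCC"): "AAL",
}


def decode_reads(reads):
    """Count code occurrences per sample.

    `reads` is an iterable of (index1, index2, coding_region) tuples.
    Reads with an unknown index pair or an unknown code are skipped.
    """
    counts = defaultdict(Counter)
    for index1, index2, code in reads:
        sample = INDEX_TO_SAMPLE.get((index1, index2))
        glycan_id = CODE_TO_GLYCAN.get(code)
        if sample is None or glycan_id is None:
            continue
        counts[sample][glycan_id] += 1
    return counts


reads = [
    ("AAGG", "CCTT", "ACGTACGTACGTACGTACGT"),
    ("AAGG", "CCTT", "ACGTACGTACGTACGTACGT"),
    ("AAGG", "CCTT", "TGCATGCATGCATGCATGCA"),
    ("GGAA", "TTCC", "TGCATGCATGCATGCATGCA"),
    ("GGAA", "TTCC", "NNNNNNNNNNNNNNNNNNNN"),  # unreadable code, ignored
]
counts = decode_reads(reads)
for sample, histogram in counts.items():
    # One histogram of copy number per glycan ID for each pull-down sample.
    print(sample, dict(histogram))
```

The per-sample counter plays the role of the histogram in Figure 1: glycan ID on one axis, copy number on the other.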

Figure 1.

Comparison of NGGM with the current glycan microarray. (A) Traditional glycan microarray technology. (B) Next-generation glycan microarray (NGGM) technology.

RESULTS

Covalent Conjugation of DNA and Glycan.

To code an individual glycan with a unique DNA sequence, we designed the chemical scheme shown in Figure 2, which uses an amide coupling reaction19 and a copper-free click-chemistry reaction,20 both widely used for covalent bioconjugation. A glycan tagged with 2-amino-N-(2-aminoethyl)benzamide (AEAB)21 is reacted with the bifunctional linker dibenzocyclooctyne–PEG4–N-hydroxysuccinimidyl ester (DBCO–PEG4–NHS) to incorporate a DBCO group. Copper-free click chemistry between DBCO and the 5′-azido group on double-stranded DNA containing a coding sequence region yields a DNA-coded glycan. Efficient glycan–DNA conjugation was demonstrated by high-performance liquid chromatography (HPLC) analysis and matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) analysis using lacto-N-neotetraose (LNnT)–AEAB, DBCO–PEG4–NHS, and a 5′-azido-functionalized DNA oligo (code3s) (Figure S1).

Figure 2.

Chemistry strategy for glycan–DNA conjugation.

Identification of Interactions between Biotinylated Proteins and Glycans by NGGM.

To determine the feasibility and specificity of this technology, we first constructed a small library containing six DNA-coded glycans and interrogated this library with five lectins. The results are consistent with well-known specificities (Figure S2 and Table S1). Then, we constructed a larger library containing 48 DNA-coded glycans of defined structures and one control DNA code (defined array) (Figure S3 and Table S2). Before the analysis of lectin specificity using this NGGM, we checked the potential bias in polymerase chain reaction (PCR) amplification by monitoring a handmade mixture of 50 DNA codes at different quantities during PCR amplification by NGS sequencing. No significant bias in PCR amplification was visible during 10–40 cycles of amplification of these 50 DNA codes (Figure S4). Then, nine biotinylated lectins, including concanavalin A (ConA), Aleuria aurantia lectin (AAL), Ricinus communis agglutinin I (RCA-I), Sambucus nigra agglutinin (SNA), Maackia amurensis lectin I (MAL-I), Helix pomatia lectin (HPA), peanut agglutinin (PNA), Griffonia simplicifolia lectin I-B4 (GS-I-B4), and Bauhinia purpurea lectin (BPL), were incubated separately with the defined array in solution and pulled down using streptavidin-coated magnetic beads (Figure 3). The DNA codes immobilized on the magnetic beads with the corresponding glycans were amplified by PCR, which also added the index codes for each lectin sample and the appropriate NGS flow cell adapter code. All of the PCR-amplified samples were pooled together and subjected to NGS analysis. After NGS sequencing, the data were split according to the index pair (sample coding) because index codes identify specific lectin pull-down samples. 
To correct for variation in sequencing depth between independent NGS analyses, the “copy number per million reads” of each glycan code in a lectin pull-down sample was calculated by normalizing the total copy number of codes carrying the corresponding index pair to 1 million, so that standard deviations can be calculated when experiments are repeated for a given lectin, as shown for ConA and AAL in Figure 3. We observed that the copy number corresponding to ConA binding (Figure 3A) correlated well with the known relative binding strengths of this lectin to defined glycans: strongest to N-linked high-mannose glycans (glycans 39 and 47), intermediate to complex type biantennary N-glycans and hybrid N-glycans containing terminal α-Man residues (glycans 40, 41, and 46), and weakly to bisected and multiantennary complex type N-glycans (glycans 35–37 and 46). AAL binds to fucose-containing glycans and shows interesting discriminations among the different structures (Figure 3B). RCA-I binds to all the glycans with terminal LacNAc or α2,6-sialylated LacNAc (Figure 3C). SNA binds to all three glycans that contain terminal α2,6-linked sialic acid on galactose (Figure 3D). MAL-I binds to the only glycan containing terminal Neu5Acα2–3LacNAc (Figure 3E). HPA binds most strongly to Forssman structures with terminal GalNAcα1–3GalNAcβ1–3Gal and also to all blood group A glycans with terminal GalNAcα1–3(Fucα1–2)Gal (Figure 3F), but not to the glycan terminating in GalNAcβ1–3Gal (glycan 24). PNA binds to the only glycan with terminal Galβ1–3GalNAc (Figure 3G), and GS-I-B4 binds to every glycan that contains terminal α-Gal and discriminates among the different structures, suggesting differences in relative binding (Figure 3H). 
The glycan binding of BPL is much more complicated (Figure 3I): it binds to Lex antigen (glycan 16) with the highest affinity and also binds to isoglobopentaose (glycan 21), globotetraose P antigen (glycan 24), LNT (glycan 1), and some N-linked glycans (glycans 36–38). These lectin binding profiles are largely consistent with the glycan microarray data found in the Consortium for Functional Glycomics (CFG) database (http://functionalglycomics.org).
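
The normalization described above can be written out as a short sketch. It assumes that the raw counts for one index pair are held in a dictionary keyed by glycan ID; the function names and the example counts are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def copies_per_million(raw_counts):
    """Scale the raw code counts of one pull-down sample so they sum to 1e6."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {glycan: count * 1_000_000 / total
            for glycan, count in raw_counts.items()}


def summarize_replicates(replicates):
    """Return {glycan_id: (mean, sd)} over normalized replicate samples."""
    normalized = [copies_per_million(r) for r in replicates]
    glycans = sorted(set().union(*normalized))
    summary = {}
    for glycan in glycans:
        values = [sample.get(glycan, 0.0) for sample in normalized]
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[glycan] = (mean(values), sd)
    return summary


# Two hypothetical replicates sequenced at different depths (1000 vs 2000 reads).
replicates = [
    {1: 800, 2: 150, 3: 50},
    {1: 1500, 2: 400, 3: 100},
]
for glycan, (avg, sd) in summarize_replicates(replicates).items():
    print(f"glycan {glycan}: {avg:.0f} +/- {sd:.0f} copies per million reads")
```

Because each sample is rescaled to the same total, a replicate sequenced twice as deeply contributes on the same scale, which is what makes the mean and standard deviation across runs meaningful.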

Figure 3.

Identification of specificities of nine lectins (ConA, AAL, RCA-I, SNA, MAL-I, HPA, PNA, GS-I-B4, and BPL) by NGGM with the defined glycan library. Bars represent means ± SD from analyses of five independent experiments. The structures of positive glycans are presented on the tops of the graphs.

To compare these data from NGGM with the current slide glycan microarray technology, we printed the corresponding AEAB-tagged glycans onto NHS-coated slides and performed microarray analyses against the same lectins as in NGGM according to the protocol described previously.21 In general, results based on the printed microarray and NGGM showed similar lectin binding patterns (Figure 3, Figure S5, and Table S3). Nevertheless, there are also noteworthy differences, presumably due to the totally different presentations of glycans in these two technologies. Using NGGM, ConA showed clearly differentiated binding toward high-mannose glycans (glycans 39 and 47) versus other N-glycans as described above, but the conventional glycan microarray data at 10 μg/mL of ConA (Figure S5A) did not show this discrimination among the different glycans, most of which were bound with a saturation signal of approximately 50 000 RFU. To observe differential binding with the printed glycan microarray, a dilution series of the GBP is required, and that is shown for ConA in Figure S6A–D and Table S4. This is due to the striking difference in the dynamic range of the two methods. The dynamic range of the printed glycan microarray is limited by the saturation of the detecting instrumentation at approximately 50 000 RFU. Using NGGM we observed a 20-fold increase in dynamic range based on copy numbers at 10^6 reads, and this can be expanded if NGS analysis of the highest capacity (~8G reads on the HiSeq platform) is used (the systematic error of NGS sequencing is ~10 reads). Similar dynamic range increases were observed for AAL (Figure S6E–H), RCA-I (Figure S6I,J), and BPL (Figure S6K,L) binding. Interestingly, while both the printed glycan microarray and NGGM showed expected binding of SNA to Neu5Acα2–6LacNAc (glycans 46, 48), only NGGM showed binding of SNA to 6′-SL (glycan 45) at a significant level (Figure 3D and Figure S5D). 
This presumably reflects the differences in the presentation of glycans in the two different formats where the size of the glycan may impact binding on solid surfaces and not binding to a DNA-coded glycan in solution. In addition, it is worth noting that reductive amination of the reducing end of glycans may affect the binding to GBPs that recognize the glycosidic linkage at the reducing end.
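
One plausible reading of the 20-fold figure, offered here as an interpretation rather than the authors' stated calculation, is the ratio between the NGGM normalization ceiling (10^6 reads) and the scanner saturation level (about 50 000 RFU). It assumes both scales share a comparable lower limit; the variable names are illustrative.

```python
# Upper limits quoted in the text.
scanner_ceiling_rfu = 50_000      # approximate saturation of the slide scanner
nggm_ceiling_copies = 1_000_000   # normalization total for one NGGM sample

# Ratio of the two ceilings, assuming a comparable floor on both scales.
fold_increase = nggm_ceiling_copies / scanner_ceiling_rfu
print(f"{fold_increase:.0f}-fold")
```

Under the same assumption, raising the read depth per sample would raise the ceiling proportionally, which matches the remark that the range can be expanded with higher-capacity sequencing.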

Sensitivity of NGGM in Detection of Protein–Glycan Interaction.

The sensitivity to detect protein–glycan interaction is a critical parameter of glycan microarray analysis. To study the sensitivity of NGGM, we analyzed ConA and AAL binding to the defined NGGM at various concentrations of both the lectins and the DNA-coded glycan library (Figure S7). When the ConA and AAL concentrations were kept relatively high (10 μg/mL), as little as 0.0025 fmol (2.5 amol) of DNA-coded glycan library was needed to generate the expected binding profile, which is much lower than the normal usage of glycans on the printed microarray (133–400 fmol). On the other hand, when 25 fmol of DNA-coded glycan library was used, ConA and AAL at concentrations as low as 0.01 μg/mL provided a successful NGGM analysis. Since the volume of the NGGM experiment is only 10 μL, minimal quantities of GBP are required. While protein–glycan interactions with different affinity constants might give significantly different limits of detection for the corresponding GBP, the NGGM appears to be at least as sensitive in detecting binding by GBP as the current slide format microarray if a high enough concentration of DNA-coded glycan library is used. In addition, the extremely low amounts of coded glycan required suggest that thousands of glycans could be interrogated in a single NGGM assay.
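
To put the quantities above on a common scale, the following sketch converts them with plain unit arithmetic, taking the quoted amounts at face value. The helper names are hypothetical; only the numbers come from the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def fmol_to_molecules(fmol):
    """Convert an amount in femtomoles to a number of molecules."""
    return fmol * 1e-15 * AVOGADRO


def lectin_mass_ng(conc_ug_per_ml, volume_ul):
    """Mass of lectin in an assay; 1 ug/mL equals 1 ng/uL."""
    return conc_ug_per_ml * volume_ul


library_fmol = 0.0025             # lowest library amount quoted
printed_range_fmol = (133, 400)   # glycan usage quoted for printed arrays

molecules = fmol_to_molecules(library_fmol)
fold_low = printed_range_fmol[0] / library_fmol
fold_high = printed_range_fmol[1] / library_fmol

print(f"{library_fmol} fmol is about {molecules:.2e} molecules")
print(f"{fold_low:,.0f}- to {fold_high:,.0f}-fold less glycan than printing")
print(f"{lectin_mass_ng(0.01, 10):.1f} ng lectin at 0.01 ug/mL in 10 uL")
```

The comparison is between amounts only; it does not account for differences in how the two formats present the glycans.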

Analysis of Antibody Specificities by NGGM.

To evaluate the ability of NGGM to determine the specificity of anti-glycan antibodies, we analyzed several well-known antibodies. We attached four commercial blood group antibodies (anti-A, anti-B, anti-Lea, and anti-Leb) separately to streptavidin-coated magnetic beads through biotinylated secondary antibodies and tested their specificities using our defined NGGM array, which contained a variety of different configurations of human blood group antigens. Consistent with their known specificity based on hemagglutination tests performed by the vendor, all antibodies showed specific binding to their corresponding glycan epitopes without significant cross-reactivity to other structures (Figure 4). While the NGGM clearly discriminated the binding of anti-A and anti-B antibodies to different glycans, the printed array showed similar binding across these glycans (Figure S8, parts A and B) with little discrimination at this antibody concentration. With the printed array, we observed a relatively strict specificity of the anti-Lea antibody (glycan 18) with slight cross-reactivity to Lex antigen (glycan 16), and the anti-Leb antibody cross-reacted with Ley, Lea, and Lex antigens (Figure S8, parts C and D, Table S5). While the cross-reactivity of the anti-Lea antibody was greatly diminished using diluted antibody (Figure S8E, Table S5), the cross-reactivity of anti-Leb to Lea and Ley did not change significantly (Figure S8F, Table S5), suggesting a relatively strong cross-reactivity. These results suggest that the multivalent presentation of glycans on the printed glycan microarray greatly increases the binding avidity,22 which under certain circumstances might lead to the misleading identification of possibly irrelevant pseudoepitopes for GBPs whose functional specificity has been defined at proper dilutions. 
The NGGM platform, which detects GBP binding to monovalent glycan structures in solution, may be more useful in identifying biologically relevant protein–glycan interactions.

Figure 4.

Specificity validation of four blood group antibodies (anti-A, anti-B, anti-Lea, and anti-Leb) by NGGM with the defined glycan library. The structures of positive glycans are presented on the tops of the graphs.

NGGM Analysis of Intact Bacteria.

Despite the success of the slide-based glycan microarray in recent years for identifying the glycan binding specificities of GBPs and fluorescently labeled viruses,23 direct binding of intact mammalian cells and bacteria on glass slides has had limited success except for the Paulson group’s work on siglec-expressing human cells.24 This is presumably due to the large size of these particles, which makes it physically challenging for them to form adhesions on a two-dimensional surface that are strong enough to survive the washing procedures required for slide format microarray experiments. Since NGGM uses immobilized GBP and a solution of DNA-coded glycans in the assay, larger particles such as whole cells and intact bacteria with surface-expressed adhesins or GBPs can be easily processed, since they are simply separated by centrifugation or natural sedimentation. To demonstrate this unique advantage of the NGGM, we analyzed several Escherichia coli strains on the defined glycan library, including K12, K12 with FimH knocked out, Top 10, and BL21 (Figure S9). It is known that E. coli strains bind to high-mannose N-glycans through the FimH protein of type 1 fimbriae.25–27 In NGGM analysis, E. coli K12 clearly bound to high-mannose glycans, while the other three strains, in which FimH is knocked out or highly attenuated,28 did not show significant binding to high-mannose glycans, suggesting successful application in glycan-binding analysis of intact bacteria.

The utility of a glycan microarray experiment is fundamentally determined by the diversity and biological relevance of the glycan library. Thus, to expand our NGGM library, we prepared a shotgun, DNA-coded glycan library from pig kidney using our recently developed method of oxidative release of natural glycans (ORNG).13 N-Glycans are released by treatment with sodium hypochlorite, tagged with AEAB, and separated into fractions by two-dimensional HPLC. After quantification, 96 (48 unique DNA codes × 2) pig kidney glycan fractions were coded with DNA to produce two “48-glycan subarrays” comprising a tissue-specific shotgun DNA-coded library (see the Methods and Materials and Table S6 in the Supporting Information). We selected kidney as a major mammalian organ that is often subjected to inflammation and bacterial infection, and its glycans have important implications in xenotransplantation of kidneys in primates.29,30 Plant lectin binding to this shotgun NGGM showed distinct binding patterns for each lectin (Figure S10 and Table S7), providing important structural information on each fraction. We used this library to test the binding of several bacterial strains to the pig kidney shotgun NGGM. All four strains showed distinct binding patterns. E. coli K12 binds to a number of fractions, while the FimH-KO-K12 showed essentially no binding (Figure 5, parts A and B). The binding pattern of K12 to this array closely resembles that of ConA (Figure S10A), suggesting its binding to high-mannose glycans, and MALDI-TOF-MS analysis of these fractions confirmed the high-mannose structures (predicted structures shown in Figure 5). This result is also consistent with that from the defined NGGM (Figure S9, parts A and B) and previous reports,31 and further confirmed the binding specificity of K12 and FimH protein to high-mannose glycans. Acinetobacter baumannii is a multidrug-resistant pathogenic bacterium affecting the immune-compromised population.32 
On the shotgun NGGM, A. baumannii binds to several glycans, which were identified as small paucimannose and αGal-terminated neutral N-glycans by MALDI-MS (Figure 5C). Another bacterium, Staphylococcus aureus, on the other hand, showed specific binding to a tetraantennary N-glycan with an α6-Neu5Gc terminal moiety (Figures S11 and S12). To further elucidate the binding specificity of this bacterium, we also tested its binding to the defined NGGM. Among the 48 glycan structures, only FBS-1 (glycan 48), a sialylated triantennary N-glycan, was positive (Figure S13). These results suggest that the binding specificity of S. aureus is directed toward a multiantennary and α6-sialylated N-glycan. While it is premature to determine the biological implications of this specific binding, the ability of the NGGM to identify bacterial binding to glycans and its high-throughput potential indicate that the NGGM is an excellent system for screening the glycan binding specificity of pathological and commensal bacteria of the microbiome, leading to new directions in diagnosis and therapeutic treatment.

Figure 5.

Identification of glycan binding of four bacterial strains (E. coli K12, E. coli FimH-KO K12, A. baumannii, and S. aureus) by NGGM with the pig kidney shotgun DNA-coded glycan fraction library. The predicted structures of positive glycan fractions are presented on the tops of the graphs.

DISCUSSION

The glycan microarray has been highly successful over the past decades in the study of protein–glycan interactions. The printed microarray platform, however, has serious limitations as a high-throughput format, which is required for further advances in functional glycomics. The development of the printed glycan microarray followed in the footsteps of the DNA microarray, which has been gradually replaced by NGS technology. Through the NGGM, NGS may now be the technology that provides a high-throughput format for functional glycomics.

Current printed glycan microarray development requires specialized robotic printing and fluorescence scanning technologies using expensive and high-maintenance instruments that are generally only available in highly specialized laboratories. DNA codes are commercially available, and DNA-coding of glycans can be easily carried out in any laboratory. NGS services are widely available, and the capacity of this platform is constantly expanding. Therefore, NGGM has great potential to enable glycan microarray analysis to become a routine laboratory technique or a relatively inexpensive commercially available service.

Although DNA has been used as a powerful tool for molecular barcoding,33,34 DNA coding of glycans has not been widely adopted except for DNA-directed immobilization (DDI),35,36 which only utilizes specific base pairing as a printing method for microarray preparation, not for the detection of protein–glycan interactions. Here, we used DNA to code glycans by straightforward click chemistry and NGS sequencing to decode glycans for the detection of protein–glycan interactions. Since protein–glycan interactions are generally thought to be of low affinity with multivalent presentation of glycans providing the avidity that makes them physiologically relevant,37–39 it was surprising that this solution-based assay of monomeric glycans was effective in detecting these interactions. This observation is presumably due to the amplification provided by the PCR after the pull-down assay and during NGS sequencing.

Multiplexing of the glycan microarray is currently one of its greatest limitations. Glass slides can be separated into smaller areas with appropriate adapters, in which replicate subarrays can be printed and assayed with different GBP samples. However, this approach not only reduces the number of glycans in each subarray and risks leakage between subarrays, but also requires each image to be processed separately and manually. Furthermore, digitizing the fluorescent image by software often requires manual grid localization and adjustment for high-quality microarray analysis, making the process laborious and difficult to automate. A multiplexing strategy for the glycan microarray using Luminex colored beads has been reported recently.40 Although this method may significantly increase the throughput of sample analysis, the detection is still based on fluorescence of labeled GBPs and special instrumentation is still required. In contrast, each NGGM experiment is carried out in solution in a separate vial, which can be automated with liquid handling systems using a 96- or 384-well plate. After GBP pull-down, each sample can be indexed by PCR extension, all samples can be mixed for NGS decoding, and no image processing is involved. More importantly, the NGS sequencing and data analysis do not require any manual adjustment, making the whole NGGM procedure automatable in the future. Because of the high capacity of NGS, we estimate that screening hundreds of GBPs would be possible in a single NGS order.

The lectin and antibody binding data generated from NGGM are generally consistent with those generated from the printed glycan microarray, and the slight differences observed are easily explained by the differences in the affinity of monomeric glycans in solution versus avidity of printed glycans in a relatively uncontrolled multivalent format where density is unknown, making NGGM a credible platform for microarray analysis. The issue of multivalent presentation is a major point of discussion among investigators studying protein–glycan interactions. It is clear that the density of glycans immobilized is directly related to the strength of binding, and comparison of data from array formats at different densities has been interpreted as demonstrating different specificities or generating false-positive binding by detection of weak cross-reactions.41 The NGGM format presents all glycans as monovalent in solution with an immobilized GBP, and although this format eliminates avidity, the PCR amplification permits detection of even weak interactions. While arrayed glycans are spatially separated from each other on the slide surface, in the NGGM format all DNA-coded glycans are mixed in solution during the binding portion of the assay where glycans may compete for interaction with the GBP and serve as “internal control” for other glycans. This competition obviously does not cause a complete inhibition as the concentrations of DNA-coded glycans in the library are the same, but the competition of binding to GBP between different glycans may provide a better representation of physiologic conditions.

It is clear that the number and diversity of glycans in any format used for studying protein–glycan interaction are more important than the presentation of the glycans. Thus, the major challenge for NGGM will be to expand the number of glycans in the assay. It is noteworthy that, when the number of glycans in the library increased from 6 (Figure S2) to 48 (Figure 3), we observed greatly reduced background and noise in the NGGM results. In addition, the observation that only extremely low concentrations of the DNA-coded library are required to obtain useful results suggests that increasing the number of glycans in the library is unlikely to adversely affect the results. Once the maximum number of DNA-coded glycans per NGGM assay tube that generates valid results is identified, the library can be expanded simply by increasing the number of tubes or “subarrays”.

Infectious diseases mediated by bacterial pathogens remain a major cause of death in many parts of the world. A better understanding of the key events during pathogenesis and clearance is required for successful vaccination and therapy. Pathogen adhesion, the first key event in infection, has been reported to rely on the interactions between glycans and proteins for all classes of pathogens.42–44 To identify the key glycans that contribute to pathogen adhesion, screening the glycan binding of purified pathogen adhesins by the current slide-based glycan microarray has been the best strategy for bacteria and fungi, and it has succeeded in many cases.6,45,46 One major limitation of the printed glycan microarray is the detection of intact cell binding. While this is theoretically possible, in practice the large size of intact cells on a two-dimensional glass surface never provided clean binding data in our experience using printed glycan microarrays. The NGGM format solves this problem because the large bacteria with expressed adhesins simply pull down DNA-coded glycans from the solution. To our knowledge, this is the first real microarray analysis of intact bacterial cells, which will greatly facilitate the identification of bacterial adhesion specificities and the potential development of therapeutic agents based on specific protein–glycan interactions.

In summary, by coding glycans with DNA sequences and adopting NGS as the decoding method, we have developed a next-generation glycan microarray platform offering higher capacity and throughput, easier and more flexible operation, and lower cost. This format should also be useful for studying interactions between other biomolecules, such as protein–protein interactions and possibly glycan–glycan interactions, especially in nonspecialized laboratories that do not have access to printing and scanning equipment.

METHODS

Design of DNA Codes for the Large Glycan Library.

In this study, we used double-stranded DNAs with two uniform 18-bp primer binding regions and a 20-bp coding region. To minimize potential bias in the subsequent PCR amplifications, all DNA codes were equal in length, the G/C bases were distributed evenly in the coding region, and the total GC content of the coding region was fixed at 50%. In addition, the sequences were designed to ensure that any two DNA coding regions differed at a minimum of five bases.
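The design constraints above can be illustrated with a minimal sketch. It is not the procedure used to generate the actual codes (those are listed in the Supporting Information). The function names, the random rejection-sampling strategy, and the evenness criterion (two G/C bases in every consecutive 4-base block) are assumptions made only for illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_balanced_code(rng, length=20):
    """Build a candidate from 4-base blocks that each hold 2 G/C and 2 A/T.

    Assumption: this block rule is one possible reading of "distributed
    evenly"; it also fixes the overall GC content at 50%.
    """
    blocks = []
    for _ in range(length // 4):
        block = [rng.choice("GC"), rng.choice("GC"),
                 rng.choice("AT"), rng.choice("AT")]
        rng.shuffle(block)
        blocks.append("".join(block))
    return "".join(blocks)


def design_codes(n_codes, length=20, min_distance=5, seed=0, max_tries=200_000):
    """Collect codes that differ pairwise at >= min_distance positions."""
    rng = random.Random(seed)
    codes = []
    for _ in range(max_tries):
        if len(codes) == n_codes:
            break
        candidate = random_balanced_code(rng, length)
        # Reject candidates that are too close to any accepted code.
        if all(hamming(candidate, c) >= min_distance for c in codes):
            codes.append(candidate)
    if len(codes) < n_codes:
        raise RuntimeError("could not find enough codes; relax the constraints")
    return codes


codes = design_codes(50)
```

A real design would likely add further filters (for example, against homopolymer runs or similarity to the primer binding regions), which are omitted here.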

DNA-Coded Glycan Library Construction.

Azide-modified double-stranded DNA codes and AEAB-conjugated glycans were used for the preparation of DNA-coded glycans. To obtain azide-modified double-stranded DNA, equal amounts of 5′-azide-modified DNA oligo (IDT Technologies) and its complementary unmodified DNA oligo (IDT Technologies) were annealed by heating at 95 °C for 5 min and then cooling at 0.1 °C/s to 10 °C using a PCR instrument. Then, AEAB-conjugated glycans and azide-modified DNA codes were conjugated through the bifunctional cross-linker DBCO–PEG4–NHS (Conju-Probe).

For defined glycan library preparation, 1 equiv of AEAB-conjugated glycan was reacted with 2 equiv of DBCO–PEG4–NHS in phosphate buffer (50 mM Na2HPO4, pH = 8.5) at room temperature for 2 h. The intermediate product glycan–PEG4–DBCO was purified and quantified by reversed-phase HPLC. Then, 1 μL of 0.1 mM double-stranded DNA was incubated with 1 μL of 0.5 mM AEAB-conjugated glycan–PEG4–DBCO at room temperature. After 3 h of incubation, 2 μL of 0.1 M azidoethanol solution47 was added to quench the click-chemistry reaction. The final mixture was stored as DNA-coded glycans at −20 °C.

For the shotgun library, 1 nmol of AEAB-conjugated glycan was reacted with 0.2 nmol of DBCO–PEG4–NHS in 2 μL of phosphate buffer (50 mM Na2HPO4, pH = 8.5) at room temperature for 2 h. Then, 1 μL of 0.1 mM double-stranded DNA was added directly into the mixture. After 3 h of incubation, 1 μL of 0.2 M azidoethanol solution47 was added to quench the click-chemistry reaction. The 96 fractions were split into two subgroups for sublibrary preparation: fractions 1–48 were coded by codes 1–48 as sublibrary 1, and fractions 49–96 were coded by codes 1–48 as sublibrary 2. The final mixture was stored as DNA-coded glycans at −20 °C.
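Because the same 48 codes are reused in both sublibraries, a fraction is identified only by the combination of sublibrary and code. A minimal sketch of that bookkeeping follows; the function name `assign_code` is hypothetical and is not part of the published workflow.

```python
def assign_code(fraction, codes_per_sublibrary=48, n_sublibraries=2):
    """Map a shotgun fraction number to its (sublibrary, code) pair.

    Fractions 1-48 map to codes 1-48 in sublibrary 1; fractions 49-96
    map to codes 1-48 in sublibrary 2.
    """
    if not 1 <= fraction <= codes_per_sublibrary * n_sublibraries:
        raise ValueError(f"fraction {fraction} is out of range")
    sublibrary = (fraction - 1) // codes_per_sublibrary + 1
    code = (fraction - 1) % codes_per_sublibrary + 1
    return sublibrary, code
```

Under this mapping, fraction 49 is read out as code 1 in sublibrary 2, so the two sublibraries must be kept in separate wells and tracked by separate sample indexes.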

Pull-Down Assay.

For the lectin pull-down assay, 1 μg or the indicated amount of biotinylated lectin was incubated with the glycan–DNA mixture (1.25 fmol for each glycan–DNA conjugate) in 10 μL of 1× TSM buffer (20 mM Tris–HCl, pH 7.4, 150 mM NaCl, 2 mM CaCl2, 2 mM MgCl2) containing 1% BSA, 0.05% TritonX-100, and 100 ng/mL salmon sperm DNA (Thermo Fisher Scientific) for 2.5 h. Then, 1 μL of 1 mg/mL streptavidin-coated magnetic beads (Dynabeads MyOne Streptavidin C1, Thermo Fisher Scientific) was added to the mixture and incubated for 30 min. For the antibody pull-down assay, 1 μL of blood-type antibody was incubated with the glycan–DNA mixture (1.25 fmol for each glycan–DNA conjugate) in 10 μL of 1× TSM buffer containing 1% BSA, 0.05% TritonX-100, and 100 ng/mL salmon sperm DNA for 2.5 h. Then, 1 μL of 1 mg/mL streptavidin-coated magnetic beads, which had been preincubated with 10 μg/mL biotinylated antimouse IgM secondary antibody in 10 μL of 1× TSM buffer for 2 h, was added to the mixture and incubated for 30 min. After incubation, the beads were separated on a magnet plate (low elution magnet plate for 96-well PCR plates, E&K Scientific) and washed 3 times with 10 μL of 1× TSM buffer containing 1% BSA and 0.05% TritonX-100. Then, the beads were resuspended in 10 μL of ddH2O and heated at 95 °C for 10 min. The final mixture was stored at −20 °C.

For the bacteria pull-down assay, 400 μL of bacterial culture (OD ~ 1) was spun down at 3000g for 1 min. The bacterial pellet was resuspended and incubated with the glycan–DNA mixture (25 fmol for each glycan–DNA conjugate) in 100 μL of 1× TSM buffer containing 1% BSA, 0.05% TritonX-100, and 100 ng/mL salmon sperm DNA for 1 h. Then, the bacterial cells were separated by centrifugation and washed 3 times with 100 μL of 1× TSM buffer containing 1% BSA and 0.05% TritonX-100. Then, the cells were resuspended in 100 μL of 1× PBS buffer. An amount of 1 μL of suspension was used for PCR amplification immediately.
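The amounts quoted above can be converted into per-conjugate concentrations with a short calculation. This sketch assumes the stated buffer volume is the final incubation volume (it ignores the small volume added with beads or antibody), and the helper name is hypothetical.

```python
def conjugate_concentration_pM(amount_fmol, volume_uL):
    """Concentration of one glycan-DNA conjugate in picomolar.

    1 fmol per microliter equals 1 nmol per liter (1 nM), so the ratio is
    in nM and multiplying by 1000 converts it to pM.
    """
    return amount_fmol / volume_uL * 1000


# Lectin and antibody pull-down: 1.25 fmol per conjugate in 10 uL.
lectin_pM = conjugate_concentration_pM(1.25, 10)

# Bacteria pull-down: 25 fmol per conjugate in 100 uL.
bacteria_pM = conjugate_concentration_pM(25, 100)
```

Under these assumptions each conjugate is present at about 125 pM in the lectin and antibody assays and about 250 pM in the bacteria assay.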

Sample Preparation for NGS Sequencing.

An amount of 1 μL of supernatant (except for the bacteria pull-down assay) or suspension (bacteria pull-down assay) of each sample was subjected to 20–30 cycles of PCR amplification with a pair of primers that included two sample coding sequences as indexes for lectin/antibody/serum/bacteria. Comparable amounts of the different first-round PCR products were mixed together and purified with the PureLink Quick PCR purification kit (Thermo Fisher Scientific). After quantification, 10 μg of the DNA mixture was subjected to another 10-cycle round of amplification to add NGS adaptors. The second-round PCR product was purified again before submission for NGS sequencing (Admera Health, LLC).

NGS Data Analysis.

The resulting data file (.gz) was analyzed under the Ubuntu operating system. The target sequences were matched in the data file and counted using Python 3. The counts were output into an Excel file.
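The matching and counting step might look like the sketch below. It is not the authors' script: it assumes the .gz file is a gzip-compressed FASTQ file, that a read is assigned to a code when it contains the 20-bp coding region as an exact substring, and that counts are written to a CSV file that Excel can open. Function names and file paths are hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_codes(fastq_gz_path, codes):
    """Count reads containing each coding sequence.

    `codes` maps a code name to its DNA sequence. Only the sequence line
    of each 4-line FASTQ record is inspected. Reverse-complement reads
    and sequencing errors are not handled in this sketch.
    """
    counts = Counter({name: 0 for name in codes})
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # skip header, separator, and quality lines
                continue
            read = line.strip()
            for name, seq in codes.items():
                if seq in read:
                    counts[name] += 1
                    break  # assign each read to at most one code
    return counts


def write_counts(counts, csv_path):
    """Write code names and copy numbers to a CSV file."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["code", "copy_number"])
        for name, n in counts.items():
            writer.writerow([name, n])
```

In a pooled run, reads would first need to be separated by their sample index pair before counting; that step depends on the read layout and is not shown.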

For each pull-down assay coded by a unique index pair, the total DNA (50 codes) copy number was normalized to 1 million. After normalization, the copy number of each DNA code was graphed.
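The normalization described above is a simple rescaling, sketched here with a hypothetical helper; the input is assumed to be a mapping from code name to raw copy number for one index pair.

```python
def normalize_counts(counts, total=1_000_000):
    """Rescale copy numbers so that they sum to `total` (1 million here)."""
    raw_total = sum(counts.values())
    if raw_total == 0:
        raise ValueError("no reads were assigned to any code")
    return {code: n * total / raw_total for code, n in counts.items()}
```

Normalizing per assay makes copy numbers comparable between samples that were sequenced to different depths.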

For the shotgun glycan array, because only 50 codes were available for 96 fractions, the 96 fractions were divided into two subarrays; each subarray contained 48 DNA-coded fractions, one DNA-coded 2-FL, and a DNA control. The two subarrays were pulled down separately in different wells but simultaneously. After the pull-down assay, NGS sequencing, and data analysis, the copy numbers of the two subarrays were adjusted so that the total copy number of 2-FL (code 49) and the DNA control (code 50) was equal between the subarrays, and the adjusted copy numbers were then combined. Then, the total copy number of the 96 fractions, 2-FL (code 49), and the DNA control (code 50) was normalized to 1 million. After normalization, the copy number of each fraction was graphed.
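One way to carry out this adjustment is sketched below. It assumes each subarray is a dictionary keyed by code number (1–50), that subarray 2 is rescaled to match the summed reference counts of subarray 1, and that the two reference codes are reported as the mean of the adjusted values. These choices and the function name are illustrative assumptions; the text does not specify them.

```python
REFERENCE_CODES = (49, 50)  # 2-FL and the DNA control


def combine_subarrays(sub1, sub2, total=1_000_000):
    """Merge two 50-code subarrays into one 96-fraction profile."""
    ref1 = sum(sub1[c] for c in REFERENCE_CODES)
    ref2 = sum(sub2[c] for c in REFERENCE_CODES)
    if ref1 == 0 or ref2 == 0:
        raise ValueError("reference codes have no reads; cannot adjust")

    # Rescale subarray 2 so its summed reference counts equal subarray 1's.
    scale = ref1 / ref2
    adjusted2 = {code: n * scale for code, n in sub2.items()}

    combined = {}
    for code in range(1, 49):
        combined[f"fraction_{code}"] = sub1[code]
        combined[f"fraction_{code + 48}"] = adjusted2[code]
    # Assumption: report each reference as the mean of the two subarrays.
    combined["2-FL"] = (sub1[49] + adjusted2[49]) / 2
    combined["DNA_control"] = (sub1[50] + adjusted2[50]) / 2

    # Normalize the 96 fractions plus the two references to `total`.
    grand_total = sum(combined.values())
    return {name: n * total / grand_total for name, n in combined.items()}
```

Scaling by the shared references compensates for differences in sequencing depth or pull-down efficiency between the two wells before the fractions are compared on one axis.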

Supplementary Material

Supplemental figures and tables

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.9b01988.

Materials and methods of MS, HPLC, and AEAB conjugation, ORNG, PAGE analysis, printed glycan microarray, structure characterization, NGGM method validation, DNA code design, microarray raw data, and glycan structural analysis (PDF)

ACKNOWLEDGMENTS

We thank Dr. Sean R. Stowell, Dr. Bo Liang, and Dr. Marcin Grabowicz for providing the bacteria strains. This study was supported in part by the Emory Comprehensive Glycomics Core (ECGC), which is subsidized by the Emory University School of Medicine and is one of the Emory Integrated Core Facilities, and NIH Common Fund Glycoscience (R21GM122632 to X.S.) and partially by U01GM116254 (to X.S. and V.R.) and an STTR Grant (R41GM122139 to X.S.).

Footnotes

The authors declare no competing financial interest.

REFERENCES
