Symbiotic gut microbes modulate human metabolic phenotypes

Li et al. 10.1073/pnas.0712038105.

Supporting Information

Files in this Data Supplement:

SI Table 1
SI Figure 4
SI Figure 5
SI Figure 6
SI Text
SI Figure 7
SI Figure 8
SI Figure 9
SI Figure 10
SI Table 2
SI Table 3

SI Figure 4

Fig. 4. Collector's curves of observed and estimated OTU richness for each sample: (A) GM, (B) GF, (C) UC, (D) GG, (E) FA, (F) BB, (G) MO, and (H) the combination of the seven samples. The upper and lower 95% confidence interval values were plotted by using the dotted line with the same color as the average (bold solid line) plot. For each sample, two estimator's curves, Chao1 (red) and ACE (blue), and the observed curves (black) were calculated, and a small leap occurred on the curves when adding a new sample.

SI Figure 5

Fig. 5. Rarefaction curves of observed OTU richness for the combination of the seven samples at multiple OTU determination levels.

SI Figure 6

Fig. 6. Phylogenetic tree based on the combined 16S rRNA sequence datasets of the seven individual fecal samples. The total numbers of sequences, OTUs, and novel OTUs in each clade are shown in parentheses.

SI Figure 7

Fig. 7. PCA scores plot of DGGE profiles for predominant bacteria (V3 region of 16S rRNA gene) (A), Bacteroides spp. (B), C. leptum subgroup (C) from fecal samples, and ¹H NMR spectra of urine metabolites of the family members (D). Red circle, female; blue square, male.

SI Figure 8

Fig. 8. Sex-related differences in gut microbiome and urinary metabonome in the Chinese family. OPLS-DA scores plot (one predictive component vs. one orthogonal component) for V3 region of 16S rRNA gene DGGE profile (A), Bacteroides spp. DGGE profile (C), C. leptum subgroup DGGE profile (E), and ¹H NMR spectral profiles (G); and OPLS-DA coefficient plot color-coded for key bands that are the most discriminatory for sex in V3 DGGE gel (B), Bacteroides spp. (D), C. leptum DGGE (F), and key metabolite peaks which are the most discriminatory for sex (H). The identified metabolite and nearest neighbors of DGGE bands corresponding to the key discriminatory peaks are shown (detailed in SI Table 2).

SI Figure 9

Fig. 9. OPLS modeling between 16S rRNA V3 region DGGE data for predominant bacteria and NMR data. (A) OPLS prediction of bacterial bands from the NMR urinary profile data color-coded according to Q² (goodness of prediction). The OPLS models were built from the NMR profile and each DGGE variable. (B) DGGE gel for the predominant bacteria (V3 region of 16S rRNA gene diversity profiling); the bands with arrows were selected by OPLS prediction, and the band with the asterisk is the outlier that was statistically significant but appeared in only one sample. Mr, the marker for DGGE analysis. (C) Sequenced bands from the V3 DGGE gel.

SI Figure 10

Fig. 10. OPLS modeling between Bacteroides spp. group-specific DGGE data and NMR data. (A) DGGE gel for Bacteroides spp. Mr, the marker for DGGE analysis. (B) The OPLS prediction of Bacteroides spp. DGGE bands from the NMR urinary profile data color-coded according to Q² (goodness of prediction). The OPLS models were built from the NMR profile and each DGGE variable. The nearest neighbors of the arrowed bands that were associated with the NMR profile are as follows: band 16, Bacteroides thetaiotaomicron AY895191 (100% sequence similarity); band 31, Bacteroides uniformis AB215084 (100% sequence similarity).

Table 1. Numbers of sequences, observed OTUs, singletons (one sequence per OTU), and library coverage estimation of each individual library and combined sequence set

Subjects	GG	GM	GF	UC	FA	MO	BB	Total
No. of sequences	1,145	711	1,225	1,004	1,141	701	1,328	7,255
No. of OTUs	150	123	136	160	127	154	52	476
No. of singletons	39	62	53	72	44	65	18	176
Chao1 estimator of OTUs	160.1	249.6	213	242.4	174.7	216.4	203	772.2
Good's coverage, %	97.8	96.9	98.1	97.3	97.4	93.9	99.7	97.6

Table 2. Sequences identified as the most discriminatory for sex by OPLS-DA modeling of the predominant bacteria 16S rRNA V3 DGGE profile and the Bacteroides spp. and C. leptum subgroup-specific DGGE profiles

DGGE variables	Corresponding bands	Closest relatives (Genbank accession no.)	Similarity, %
V3-76	band21	Eubacterium eligens (T) (L34420)	100
V3-121	band31	Ruminococcus obeum (X85101)	100
V3-140	band35	Klebsiella pneumoniae (AY918490)	98.5
V3-146	band37	Sutterella stercoricanis (T) (AJ566849)	96.9
BAC-96	band16	Bacteroides thetaiotaomicron (AY895191)	100
CL-77	band8	Ruminococcus bromii (X85099)	100

SI Text

SI Materials and Methods

Subject Selection. A seven-member, four-generation family [aged 1.5 to 95 years; adult body mass index (BMI) 16.5-25.6 kg/m²] from Hangzhou, China, was selected as the study cohort (Fig. 1A). No subject had a medical history of gastrointestinal or metabolic disease, and none reported ingesting special diets, herbal supplements, probiotics, or antibiotics within at least 3 months before sampling. No subject had any active gastrointestinal symptoms such as abdominal pain, constipation, or diarrhea except MO (mother), who had constipation 15 days before the second sampling but received no medical treatment. The baby (BB) was breast-fed for 3 months, then bottle-fed. GG (great-grandmother), GM (grandmother), and GF (grandfather) lived in the same house; FA (father), MO, and BB lived in another house in the same city. UC (uncle) had lived with GG, GM, and GF, but had spent 2 years in the United Kingdom before sampling in China. A simple dietary questionnaire revealed that the adults living in China had a typical Chinese diet with rice as the staple food, whereas UC had adopted a local European diet that was rich in red meat. All participants in this study provided their informed consent.

Sample Collection and Preparation. Fresh fecal and early-morning urine samples from the same day were collected twice, one month apart. Fecal sampling was performed under anaerobic conditions (GENbox anaer, Biomérieux). Collected samples were put in sterile tubes (50 ml), transferred to the laboratory immediately in an ice-box, and stored at -70°C after preparation within 15 min.

Fecal sample preparation. One gram (wet weight) of fresh sample was suspended in 30 ml of sterile ice-cold sodium phosphate buffer (0.1 M SPB: 1 liter contained 1 M Na₂HPO₄ 57.7 ml, 1 M NaH₂PO₄ 42.3 ml, pH 7.4) followed by vortexing for 30 min in a 50-ml tube. The suspension was centrifuged at 200 × g for 5 min three times to remove coarse particles. The cells in the supernatant were collected and washed three times by centrifuging at 9,000 × g for 5 min followed by resuspension in 30 ml of fresh SPB. The washed cell pellets were resuspended in 10 ml of sterile SPB, divided into 1-ml aliquots, and stored at -70°C for further DNA extraction.

Urine preparation. The urine was agitated and then centrifuged at 10,000 rpm for 5 min to remove insoluble material. SPB (160 µl, 0.2 M, pH 7.4) was added to 320 µl of urine sample to minimize pH variations, and 120 µl of D₂O containing 0.05% (wt/vol) 3-trimethylsilyl-1-[2,2,3,3-²H₄] propionate (TSP) was added as the NMR reference standard. The mixture was transferred to 5-mm NMR tubes (Wilmad, D = 5 mm) for further analysis.

Extraction and Purification of DNA from Fecal Samples. A frozen fecal sample aliquot of 1 ml was thawed on ice and centrifuged at 9,000 × g for 5 min. Cell pellets were suspended in 1 ml of buffer Z (10 mM Tris-HCl, 150 mM NaCl, pH 8.0), then 0.3 g of zirconium beads (0.1 mm) and 150 µl of phenol (pH 8.0) were added. The mixtures were agitated in a mini bead beater (Biospec Products) three times, 80 s each time, and placed on ice for 1 min after each agitation. The suspension was gently mixed with 110 µl of 10% SDS and kept on ice for 10 min, then 150 µl of chloroform-isoamyl alcohol (vol/vol, 24:1) was added and the mixture was centrifuged at 15,000 × g for 10 min. The supernatant was sequentially extracted with equal volumes of phenol, phenol/chloroform-isoamyl alcohol (vol/vol/vol 25:24:1), and chloroform-isoamyl alcohol (vol/vol, 24:1). DNA was precipitated with two volumes of ethanol and 1/10 volume of sodium acetate (3 M, pH 5.2), then collected by centrifugation (15,000 × g, 15 min), air-dried, and dissolved in 100 µl of sterile ddH₂O. RNA was digested by RNase (1 mg/ml) at 37°C for 30 min. The concentration of extracted DNA was determined by using a DyNA Quant 200 fluorometer (Amersham Pharmacia Biotech); its integrity and size were checked by electrophoresis on a 0.8% agarose gel containing 0.5 µg/ml ethidium bromide. All DNA samples were stored at -20°C before further analysis.

Near Full-Length 16S rRNA Genes Clone Library Construction and Sequences Analysis

16S rRNA gene amplification and purification. The extracted fecal DNA of one sampling point from each subject was used as template to amplify near full-length 16S rRNA genes with universal bacterial primers P0 (5'-GAGAGTTTGATCCTGGCTCAG-3') and 1492r (5'-CGGC/TTACCTTGTTACGACTT-3') (1, 2). The 25-µl reaction mixture contained 0.75 U of TaKaRa rTaq polymerase (Takara), 1× PCR buffer (Mg²⁺ free), 2 mM MgCl₂, 10 pmol of each primer, 200 µM each deoxynucleoside triphosphate (dNTP), and 10 ng of template DNA. A 20-cycle PCR program (3) was run on a thermocycler PCR system (PCR Sprint, Thermo Electron Corp.). PCR products were purified with a DNA Gel Extraction Kit (V-gene) according to the manufacturer's instructions.

Clone library construction and sequencing. The purified amplicons were ligated into pGEM-T Easy Vector (Promega), then electro-transformed into ElectroTen-Blue Electroporation Competent Cells (Stratagene). Transformants were selected by plating onto SOB/amp agar plates and incubating at 37°C overnight. An average of 1,000 white clones per subject was randomly picked for sequencing.

The plasmid DNA of clones was prepared by an alkaline lysis method. The inserts were sequenced bidirectionally by using BigDye Terminator (Applied Biosystems) and 3.2 µmol/liter of T7 (5'-TAATACGACTCACTATAGGG-3') and SP6 (5'-ATTTAGGTGACACTATAG-3') sequencing primers on ABI 3730xl sequencers (Applied Biosystems). Manually checked high-quality reads with an average length of 750 bp were assembled into consensus sequences by the program CodonCode Aligner (CodonCode Corp.). The sequences, trimmed of vector sequence, were used for phylogenetic analysis.

Sequence alignment and phylogenetic analysis of clone library. The sequences were checked for chimeras by using CCODE (4) and Chimera Check v2.7 on the Ribosomal Database Project II (RDP II) website (http://rdp.cme.msu.edu/cgis/chimera.cgi?su=SSU). The sequences were then aligned to the RDP II database (release 9, update 37) by using the ARB package (5). Each sequence was manually checked. In total, 1,336 unambiguous filter positions were left for further analysis. Nearest neighbors of our sequences were found by RDPquery (6), which used the RDP II online database. The phylogenetic tree of all of the sequences was constructed by a neighbor-joining algorithm from an Olsen-corrected distance matrix. Sequences were grouped into OTUs at a threshold of 99% minimum similarity, based on the Olsen-corrected distance matrix, by DOTUR with the furthest neighbor algorithm and 0.001 precision (7). Novel OTUs were defined as those with <99% sequence similarity to their nearest neighbors in the GenBank database.
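The furthest neighbor (complete linkage) grouping described above can be illustrated with a minimal sketch. This is not DOTUR itself: the function name, the toy distance matrix, and the choice to merge while the linkage distance is less than or equal to the cutoff are assumptions made for illustration, and the 0.001 precision rounding is omitted. A 99% similarity threshold is taken here to correspond to a distance cutoff of 0.01.

```python
def furthest_neighbor_otus(dist, cutoff):
    """Group sequences into OTUs by complete (furthest neighbor) linkage.

    dist is a symmetric matrix of pairwise distances; two clusters are merged
    only if their two most distant members are within the cutoff.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Furthest neighbor: distance between clusters is the largest pairwise distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if d > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical distances among four sequences (illustrative values only).
toy_dist = [
    [0.000, 0.005, 0.020, 0.300],
    [0.005, 0.000, 0.008, 0.300],
    [0.020, 0.008, 0.000, 0.300],
    [0.300, 0.300, 0.300, 0.000],
]
otus = furthest_neighbor_otus(toy_dist, cutoff=0.01)
print(otus)
```

In the toy matrix, sequence 2 is within 0.01 of sequence 1 but not of sequence 0, so furthest neighbor linkage keeps it in its own OTU, whereas a nearest neighbor rule would have merged all three.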

The sequences obtained in this study are available in the GenBank database under accession nos. EF398274-EF405528.

Diversity and richness estimation of the clone library. Collector's curves of richness for each subject and for the combination of the seven subjects, and rarefaction curves of observed OTU richness for the combined sequences at multiple OTU determination levels, were calculated with the DOTUR program (SI Figs. 4 and 5). From the OTU richness and abundance data, collector's and estimator's curves can be calculated to evaluate how well the constructed clone libraries represent the original community. The conservative estimate of species richness is the observed number of OTUs. The most stable nonparametric richness estimators are Chao1 and ACE. The Chao1 estimator uses only the singletons and doubletons to account for the undetected rare OTUs in the source community. For the ACE estimator, all OTUs containing <10 sequences are included in the calculation. Sampling effort can be evaluated by plotting the accumulated number of OTUs against the number of clones sampled, where the observed OTU numbers give the collector's curves and the estimated OTU numbers (Chao1 and ACE) give the estimator's curves. Rarefaction curves were computed by resampling subsets of the original clone library data without replacement. As the number of clones sampled increased, rarefaction richness values were determined together with the lower and upper 95% confidence interval values, which reflect the randomness of the sampling effort and provide an objective means of comparing richness between communities. How well the clone library represents the source community can be assessed from the shape of the curves: strongly asymptotic plots may indicate exhaustive sampling, whereas weakly curvilinear plots indicate that richness is underestimated.
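As a rough illustration of the estimators described above, the sketch below computes Chao1 from per-OTU sequence counts and a rarefied richness value by resampling without replacement. It is not the DOTUR implementation: the function names and the example counts are hypothetical, and the text does not state whether the classic or the bias-corrected Chao1 form was used, so both are shown.

```python
import random


def chao1(otu_sizes, bias_corrected=True):
    """Chao1 richness estimate from a list of per-OTU sequence counts."""
    s_obs = len(otu_sizes)
    f1 = sum(1 for n in otu_sizes if n == 1)  # singletons
    f2 = sum(1 for n in otu_sizes if n == 2)  # doubletons
    if bias_corrected:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic Chao1 is undefined without doubletons")
    return s_obs + f1 ** 2 / (2 * f2)


def rarefied_richness(clone_labels, depth, n_iter=200, seed=0):
    """Mean number of OTUs seen in random subsamples drawn without replacement."""
    rng = random.Random(seed)
    draws = [len(set(rng.sample(clone_labels, depth))) for _ in range(n_iter)]
    return sum(draws) / n_iter


# Hypothetical library: 9 OTUs, 26 clones, 4 singletons, 2 doubletons.
otu_sizes = [10, 5, 3, 2, 2, 1, 1, 1, 1]
clone_labels = [otu for otu, n in enumerate(otu_sizes) for _ in range(n)]

print(chao1(otu_sizes))                        # bias-corrected form
print(chao1(otu_sizes, bias_corrected=False))  # classic form
print(rarefied_richness(clone_labels, depth=13))
```

Evaluating `rarefied_richness` over a range of depths gives the points of a rarefaction curve; at full depth it returns the observed OTU count.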

Good's coverage was calculated as [1 - (n/N)] × 100 to estimate the coverage of the clone libraries, where n is the number of singletons and N is the total number of sequences (8).
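The coverage formula can be checked with a short worked example. The function name is an assumption for illustration; the inputs are the singleton and sequence counts given in the Total column of SI Table 1 (176 singletons, 7,255 sequences).

```python
def goods_coverage(n_singletons, n_sequences):
    """Good's coverage in percent: [1 - (n/N)] x 100."""
    if n_sequences <= 0:
        raise ValueError("the total number of sequences must be positive")
    return (1 - n_singletons / n_sequences) * 100


# Combined library, values taken from the Total column of SI Table 1.
combined = goods_coverage(176, 7255)
print(round(combined, 1))
```

Rounded to one decimal place, this reproduces the 97.6% reported for the combined sequence set.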

Statistical Comparison of 16S rRNA Gene Clone Libraries

∫-LIBSHUFF. To determine whether the differences between clone libraries were significant or were caused by artifacts of sampling, ∫-LIBSHUFF analysis (9, 10) based on an Olsen-corrected dissimilarity matrix was performed for each pairwise combination of the Chinese and American subjects, and the P value was estimated from 10,000 random permutations of sequences between libraries.

UniFrac. The UniFrac metric was applied to investigate the differences between the Chinese and American gut microbial communities. UniFrac is a recent phylogenetic method for comparing microbial communities that takes into account the degree of divergence between different sequences. It was applied in this study because it is a powerful way to compare 16S rRNA gene clone libraries from different studies (11, 12). Based on the 7,255 near full-length 16S rRNA sequences from our study and 4,401 16S rRNA sequences from two other studies (3, 13), a phylogenetic tree was constructed, exported from ARB, and analyzed with UniFrac. Principal coordinates analysis based on the UniFrac metric was performed to give an overview of the differences among subjects (Fig. 1D).

Denaturing Gradient Gel Electrophoresis (DGGE) Analysis of Fecal Samples

PCR amplification. The V3 regions of 16S rRNA genes and two specific DNA fragments of Bacteroides spp. and the C. leptum subgroup from fecal samples of two time points were amplified by universal bacterial primers (P2 5'-ATTACCGCGGCTGCTGG-3', P3 5'-CGCCCGCCGCGCGCGGCGGGCGGGGCGGGGGCACGGGGGGCCTACGGGAGGCAGCAG-3') (14) and group-specific primers, respectively. For group-specific DGGE analysis, a 40-bp GC clamp (5'-CGC CCG CCG CGC GCG GCG GGC GGG GCG GGG GCA CGG GGG G) was attached to the 5' end of each reverse primer (14). The 16S rRNA-V3 PCR amplification used the hot-start touchdown protocol described by Muyzer et al. (14), and the reaction mixture contained 1 unit of TaKaRa rTaq polymerase (Takara), 1× PCR buffer (Mg²⁺ free), 2 mM MgCl₂, 10 pmol of each primer, 200 µM each deoxynucleoside triphosphate (dNTP), and 20 ng of extracted fecal DNA in a total volume of 25 µl.

For the group-specific PCR amplification, 230-bp specific fragments of Bacteroides spp. were amplified with primers Bfr-F (5'-CTGAACCAGCCAAGTAGCG-3') and Bfr-R (5'-CCGCAAACTTTCACAACTGACTTA-3') following the procedure described by Pang et al. (15). The 239-bp products of the C. leptum subgroup were amplified with primers Clept-F (5'-GCACAAGCAGTGGAGT-3') and Clept-R (5'-CTTCCTCCGTTTTGTCAA-3') following the procedure described by Shen et al. (16).

To minimize heteroduplex formation and single-stranded DNA (ssDNA) contamination during PCR amplification that might cause sequence heterogeneity in a single DGGE band, an additional 5 cycles of reconditioning PCR (17) and purification of PCR products by polyacrylamide gel electrophoresis (d-PAGE) were performed before DGGE analysis according to our previous work (18). All PCRs were performed in a thermocycler PCR system (PCR Sprint, Thermo Electron). The PCR products were checked by electrophoresis on 1% (wt/vol) agarose gel containing 0.5 µg/ml ethidium bromide, and the concentrations were measured by using a DyNA Quant 200 fluorometer (Amersham Pharmacia Biotech); the products were then stored at -20°C before DGGE analysis.

DGGE gel analysis. Parallel DGGE was performed with a Dcode System apparatus (Bio-Rad) according to the manufacturer's protocol. The 16S rRNA gene V3 regions and group-specific PCR products were electrophoresed in 8% (wt/vol) polyacrylamide gels. The 16S rRNA-V3 DGGE gel contained a linear 27% to 50% denaturant gradient (100% denaturant corresponds to 7 M urea and 40% deionized formamide). For the Bacteroides spp.-specific analysis, the DGGE gel contained a denaturant gradient from 22.5% to 45%; the C. leptum DGGE gel contained a denaturant gradient from 30% to 52.5%. The same amount of DNA from each subject was loaded in each lane of the DGGE gel. Electrophoresis was performed in 1× Tris-acetate-EDTA (TAE) buffer at a constant voltage of 200 V and a temperature of 60°C for 240 min. The DNA bands were stained with SYBR green I (Amresco) and photographed with a UVI gel documentation system (UVItec).
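Because the gradients are given as percentages of a 100% denaturant stock (7 M urea and 40% deionized formamide), the end-point compositions follow by simple proportion. The sketch below works this out for the three gels; the function and dictionary names are illustrative and not part of the original protocol.

```python
UREA_M_AT_100 = 7.0          # molar urea in 100% denaturant
FORMAMIDE_PCT_AT_100 = 40.0  # percent formamide in 100% denaturant


def denaturant_composition(percent):
    """Return (urea in M, formamide in %) for a given denaturant percentage."""
    fraction = percent / 100.0
    return UREA_M_AT_100 * fraction, FORMAMIDE_PCT_AT_100 * fraction


# Gradient end points stated in the text for each gel.
gradients = {
    "V3": (27.0, 50.0),
    "Bacteroides spp.": (22.5, 45.0),
    "C. leptum": (30.0, 52.5),
}

for gel, (low, high) in gradients.items():
    urea_lo, form_lo = denaturant_composition(low)
    urea_hi, form_hi = denaturant_composition(high)
    print(f"{gel}: {urea_lo:.2f}-{urea_hi:.2f} M urea, "
          f"{form_lo:.1f}-{form_hi:.1f}% formamide")
```

For the V3 gel, for example, the 27% to 50% gradient corresponds to roughly 1.89 to 3.50 M urea and 10.8% to 20.0% formamide.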

Identification of DGGE bands. DGGE bands were excised from the original gels and eluted in 100 µl of sterile distilled water at 4°C overnight. A 2-µl aliquot of the eluate was used as template to reamplify the DNA fragments in the excised bands with the corresponding primers. PCR products were excised from a 1.0% agarose gel, purified with a DNA Gel Extraction Kit (V-gene), ligated into pGEM-T Easy Vector (Promega), and transformed into competent Escherichia coli DH5α cells. The inserted DNA of positive clones was amplified with the corresponding primers and electrophoresed by DGGE to verify its position relative to the original band. The clones migrating to the same position as the original band were sequenced (Invitrogen).

The sequences of excised DGGE bands were submitted to the RDP II release 9 database to determine their closest isolate relatives with length >1,200 bp. The phylogenetic tree based on sequences of the closest neighbors of DGGE bands was constructed by a neighbor-joining algorithm from a Felsenstein-corrected distance matrix by using the ARB software package.

The sequences of excised DGGE bands are available in the GenBank database under accession nos. EF395822-EF395843.

¹H NMR spectroscopic analysis of urine. Analysis of urine was carried out at 25°C on a Varian INOVA-600 spectrometer, operating at 600.13 MHz for ¹H resonance frequency, equipped with a 5 mm ID cryo-probe. A standard one-dimensional pulse sequence (RD - 90° - t₁ - 90° - t_m - 90° - ACQ) was used. Water suppression was achieved with a saturation pulse during the recycle delay (2 s) and mixing time (100 ms). A total of 128 transients were collected into 32,768 data points for each spectrum with a spectral width of 20 ppm; total repetition time was 3 s.

Two-dimensional NMR spectroscopy. ¹H-¹H COSY, ¹H-¹H TOCSY, ¹H-¹³C HSQC, and ¹H-¹³C HMBC 2D NMR spectra were recorded at 298 K on a Varian INOVA-600 spectrometer equipped with a 5-mm inverse probe for selected samples to aid metabolite identification (SI Table 3). Water suppression was achieved with a weak irradiation on the water peak during the recycle delay (RD) of 1.5 s in these experiments. ¹H-¹H COSY 2D NMR spectra were acquired by using a standard COSY-90 sequence with gradient selection. The 90° pulse length was 9.1 µs and 2,048 data points were collected with 64 transients. A total of 128 increments were collected for the evolution. ¹H-¹H TOCSY 2D NMR spectra were recorded by using the MLEV-17 spin-lock scheme with STATES phase increments. Sixty-four transients were collected into 2,048 data points for 200 increments with a spectral width of 6,786 Hz. The mixing time was 80 ms and the spin-lock power was adjusted to be equivalent to 7 kHz. ¹H-¹³C heteronuclear single-quantum correlation (HSQC) 2D NMR spectra were collected in the phase-sensitive mode by using gradient selection. Composite pulse broadband ¹³C decoupling (globally alternating optimized rectangular pulses, GARP) was used during the acquisition period. Typically, 2,048 data points with 256 scans per increment and 200 increments were acquired with spectral widths of 6,300 Hz in the ¹H and 25,641 Hz in the ¹³C dimensions. ¹H-¹³C HMBC 2D NMR spectra were acquired by using the gradient-selected sequence with 400 transients per increment for 100 increments collected into 2,048 data points. A spectral width of 6,786 Hz in the ¹H dimension and 31,496 Hz in the ¹³C dimension was used. For all 2D NMR spectra, the data were zero-filled to 2,048 in both dimensions before Fourier transformation (FT).

Data Preprocessing

DGGE data preprocessing. Each DGGE fingerprint lane was read by the software ImageJ (http://rsb.info.nih.gov/ij/) as a spectrum containing 249 variables, representing the band positions and band intensities of a sample. The acquired spectra were then linearly aligned in two steps to correct for band mobility shifts. First, predominant bands shared by the majority of members, at the same position or at the same distance from one specific band of the standard marker, were determined to be the same band, which was also confirmed by the band identification described previously. Then the peaks of these predominant bands were used as internal standards to correct the band mobility shifts by using in-house programs in the Matlab environment.

¹H NMR data preprocessing. The acquired original ¹H NMR spectra were read by XWIN-NMR3.5 as Bruker files and transferred to the Matlab environment for further pretreatment and analysis by using in-house routines. The spectra were interpolated on a common chemical shift scale by using cubic spline interpolation and calibrated according to TSP. The water/urea regions (4.5-6.0 ppm) were removed from all spectra. Each spectrum was then normalized by its total intensity.
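The last two preprocessing steps (removing the water/urea region and normalizing to total intensity) can be sketched as follows. This is not the authors' in-house Matlab routine: the function name and the toy spectrum are hypothetical, the interpolation and TSP calibration steps are omitted, and treating the 4.5 and 6.0 ppm boundaries as inclusive is an assumption.

```python
def preprocess_spectrum(ppm, intensity, exclude=(4.5, 6.0)):
    """Drop points inside the excluded ppm window, then scale to unit total intensity."""
    low, high = exclude
    kept = [(p, y) for p, y in zip(ppm, intensity) if not (low <= p <= high)]
    total = sum(y for _, y in kept)
    if total == 0:
        raise ValueError("cannot normalize a spectrum with zero total intensity")
    kept_ppm = [p for p, _ in kept]
    normalized = [y / total for _, y in kept]
    return kept_ppm, normalized


# Toy spectrum: two points fall in the water/urea window and are removed.
ppm = [0.5, 1.0, 4.7, 5.5, 7.0]
intensity = [1.0, 2.0, 50.0, 30.0, 1.0]
kept_ppm, normalized = preprocess_spectrum(ppm, intensity)
print(kept_ppm, normalized)
```

Normalizing after the water/urea region is removed, as the text orders the steps, keeps the large and variable water signal from dominating the total intensity used for scaling.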

Multivariate Statistical Analysis

Multivariate statistical analysis was performed in the Matlab environment by using an in-house program. Unless otherwise stated, mean-centered DGGE and NMR data were used.

In OPLS-DA and OPLS analysis, the two time points of the baby were excluded because her profile differed too much from the others in the clone library analysis as well as in PCA of the DGGE profiles. To keep the analysis consistent, one time point of GM (GM2) was also excluded because the corresponding urine sample was contaminated during sample collection.

Principal components analysis (PCA). PCA (19), an unsupervised method, was always used first to examine the structure of the acquired data (DGGE and ¹H NMR data), such as clustering and outliers.

Orthogonal projections to latent structure-discriminant analysis (OPLS-DA). Once clustering tendency was observed in PCA, a supervised method, OPLS-DA (20, 21), was used to investigate the potential association between the separation and the suspected factors, for instance, sex. In OPLS-DA, a data matrix was regressed against a dummy matrix of ones and zeros indicating class. The model was evaluated by assessment of the cross-validated scores from the model based on fivefold cross-validation (data were mean-centered and scaled to unit variance before modeling). Suspected factor-related variables were determined by interpretation of individual OPLS-DA model.
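The data preparation that surrounds the OPLS-DA model (class dummy matrix, mean-centering with unit-variance scaling, and a fivefold split) can be sketched as below. The OPLS-DA fit itself is not shown, and this is not the authors' in-house Matlab program: all function names are hypothetical, the use of the sample standard deviation for scaling is an assumption, and the interleaved fold assignment is only one of several possible ways to form five folds.

```python
import statistics


def dummy_matrix(labels):
    """Encode class labels as rows of ones and zeros, one column per class."""
    classes = sorted(set(labels))
    rows = [[1 if label == cls else 0 for cls in classes] for label in labels]
    return classes, rows


def autoscale(column):
    """Mean-center a variable and scale it to unit variance."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)  # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("cannot scale a constant variable")
    return [(value - mean) / sd for value in column]


def kfold_indices(n_samples, k=5):
    """Assign sample indices to k interleaved cross-validation folds."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


classes, y_dummy = dummy_matrix(["F", "M", "F"])
print(classes, y_dummy)
print(autoscale([1.0, 2.0, 3.0]))
print(kfold_indices(7, k=5))
```

In a cross-validated analysis, the centering and scaling parameters would normally be estimated on the training folds only and then applied to the held-out fold.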

Orthogonal projection to latent structure (OPLS). OPLS modeling (20-22) was used to investigate the association between the NMR profile and each DGGE variable. In OPLS, the whole ¹H NMR data were regressed on individual DGGE variables by using one OPLS predictive component and one DGGE-orthogonal component. The predictive performance was evaluated by fivefold cross-validation, where the Q² (>0.4) value (23) (goodness of prediction) was used to select DGGE bands of interest. Bands that were statistically significant but not representative [for instance, the asterisk-marked band in the DGGE gel appeared only in one sample (SI Fig. 9B)] were not considered. ¹H NMR peaks (urine metabolites) related to the DGGE bands were determined by interpretation of each respective OPLS model.
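The Q² screening step can be illustrated with the commonly used definition Q² = 1 - PRESS/TSS, where PRESS is the sum of squared cross-validated prediction errors and TSS is the total sum of squares about the mean. The text does not spell out the exact formula used by the in-house program, so this definition, the function names, and the example values are assumptions for illustration only.

```python
def q2(observed, cv_predicted):
    """Goodness of prediction from cross-validated predictions: 1 - PRESS/TSS."""
    if len(observed) != len(cv_predicted):
        raise ValueError("observed and predicted values must have equal length")
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    if tss == 0:
        raise ValueError("Q2 is undefined for a constant response")
    return 1 - press / tss


def select_bands(q2_by_band, threshold=0.4):
    """Keep DGGE variables whose Q2 exceeds the threshold used in the text."""
    return sorted(band for band, value in q2_by_band.items() if value > threshold)


# Hypothetical band intensities and their cross-validated predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
cv_predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
print(q2(observed, cv_predicted))

# Hypothetical Q2 values for three DGGE variables.
print(select_bands({"V3-10": 0.12, "V3-76": 0.55, "V3-121": 0.41}))
```

A Q² near 1 means the held-out values are predicted well, and a Q² at or below 0 means the model predicts no better than the mean.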

SI Results

Phylogenetic Analysis of 16S rRNA Gene Clone Library

A total of 7,255 nonchimeric near full-length 16S rRNA gene sequences were subjected to phylogenetic analysis. The phylogenetic tree in SI Fig. 6 showed that the fecal bacteria community was affiliated with six major phyla: Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, Verrucomicrobia, and Fusobacteria. The phylum distribution of combined community and each subject was illustrated in Fig. 1C, showing the predominance of phyla Firmicutes and Bacteroidetes in the human gut bacterial community.

Based on 99% sequence similarity, the total clones were assigned to 476 OTUs with 203 novel OTUs that had <99% sequence similarity to phylogenetic relatives aligned with the public databases (GenBank and RDP II). Among these OTUs, 364 (65% of the total sequences) were classified into Firmicutes in which 159 were novel. The majority of sequences (4,695 of 7,255) belonged to Clostridia class with 360 OTUs, and 159 of them were novel. The phylogenetic information of phyla Bacteroidetes (2,334 sequences, 89 OTUs, 36 novel OTUs) and Proteobacteria (167, 16, 7), Actinobacteria (18, 4, 0), Verrucomicrobia (8, 2, 1), and Fusobacteria (4, 1, 0) were shown in the phylogenetic tree (SI Fig. 6). The detailed distribution of OTUs in each subject is shown in SI Table 1.

Good's coverage of each individual library and of the combined library was >93%, indicating that the 16S rRNA gene sequences from these samples represented the majority of the human intestinal bacterial community in this study (SI Table 1).

At the division level, the distribution of predominant phyla in the Chinese fecal samples was similar to that of the American subjects (Fig. 1C and SI Fig. 6). The percentage of sequences in Firmicutes was slightly lower in the Chinese samples than in the American samples (65.1% vs. 67.5%), whereas the percentage in Bacteroidetes was higher (32.2% vs. 30.9%). The nearly identical division-level proportions in the human gut microbial community reflect the evolutionary selection pressure of the host on gut microbial colonization (24).

However, ∫-LIBSHUFF analysis showed that every pairwise comparison of fecal clone libraries from Chinese and American subjects was significantly different (P ≤ 0.025 for every pairwise comparison; data not shown), which indicated that each subject has a unique gut microbial community.

The result of the UniFrac method based on phylogenetic distances of sequences, which is suitable for large-scale comparisons of libraries from many diverse studies (12), revealed the distinct differences between the Chinese and American gut microbial community, indicating the population difference in human gut flora likely due to genetic specificities, dietary difference, or other factors.

DGGE Analysis of Gut Microbiota

The dominant gut bacterial community structures of the Chinese family members at the two sampling times are illustrated by the DGGE patterns of the 16S rRNA gene V3 region (SI Fig. 9B) and of two specific groups, Bacteroides spp. (SI Fig. 10A) and the C. leptum subgroup (Fig. 2A). Most of the V3 DGGE bands were identified by sequencing; these are listed in SI Fig. 9C. Identification of bands at the same position from different samples showed that most of them were the same amplicons. The identification of V3 DGGE bands provided phylogenetic information at the species level for each band and also reflected the bacterial distribution at the division level in the gel from top to bottom as follows: phyla Bacteroidetes, Firmicutes, Proteobacteria, and Actinobacteria (SI Fig. 9C). This indicates that the DGGE technique is robust not only for assessing gut microbial community structure and monitoring community dynamics, but also for providing detailed information on community composition from the division level to the species level (25).

In PCA, the PC1-PC2 plot of the 16S rRNA gene V3 region DGGE profile (SI Fig. 7A) shows a separation in predominant gut microbiota between females and males. PCA of the Bacteroides spp. and C. leptum subgroup-specific DGGE profiles also showed separation by sex (SI Fig. 7 B and C). To identify the variables related to the sex-associated separations, OPLS-DA was subsequently performed on each DGGE profile. OPLS-DA scores plots of V3 DGGE (SI Fig. 8A) and the two group-specific DGGE profiles (SI Fig. 8 C and E) all showed sex-associated separation; the key bands contributing to the separation were selected by OPLS-DA coefficients (SI Fig. 8 B, D, and F) and further identified by sequencing (for detailed information, see SI Table 2).

Although effects of sex on gut microbiota (for the group Bacteroidetes-Prevotella including Bacteroides) have also been reported in a cohort study with 230 subjects from four European countries by using a fluorescence in situ hybridization (FISH) approach, this method is limited to discrimination of bacteria at the superfamily taxonomic level and no species identifications were made (26). Our studies indicate a sex-specific microbiome that may arise as a result of host physiological selection pressures in the gut environment.

In addition to the separation by sex in PCA of the DGGE profiles, mother-daughter clustering was also found; for instance, GG and GM always clustered in PCA of V3 DGGE and C. leptum DGGE, and the baby was also nearer to her mother (MO) in PCA of V3 DGGE (SI Fig. 7), indicating that genetic relatedness might also have an impact on gut bacteria. However, this requires further study.

¹H NMR Data Analysis

PCA (PC1 vs. PC2) of ¹H NMR spectral profiles also showed the suspected sex-related difference in urine metabolites of family members (SI Fig. 7D). The sex-related separation was observed by OPLS-DA, and the key metabolites 3-aminoisobutyrate (BAIB) and creatine were identified as the most discriminatory for sex in human urine (SI Fig. 8 G and H): BAIB was higher and creatine was lower in the male family members.

1. DiCello F, et al. (1997) Biodiversity of a Burkholderia cepacia population isolated from the maize rhizosphere at different plant growth stages. Appl Environ Microbiol 63:4485-4493.

2. Hayashi H, Sakamoto M, Benno Y (2002) Phylogenetic analysis of the human gut microbiota using 16S rDNA clone libraries and strictly anaerobic culture-based methods. Microbiol Immunol 46:535-548.

3. Eckburg PB, et al. (2005) Diversity of the human intestinal microbial flora. Science 308:1635-1638.

4. Gonzalez JM, Zimmermann J, Saiz-Jimenez C (2004) Evaluating putative chimeric sequences from PCR-amplified products. Bioinformatics 21:333-375.

5. Ludwig W, et al. (2004) ARB: A software environment for sequence data. Nucleic Acids Res 32:1363-1371.

6. Dyszynski G, Sheldon WM (2006). RDPquery: A Java program from the Sapelo Program Microbial Observatory for automatic classification of bacterial 16S rRNA sequences based on Ribosomal Database Project taxonomy and Smith-Waterman alignment. Available at: http://simo.marsci.uga.edu/public_db/rdp_query.htm.

7. Schloss PD, Handelsman J (2005) Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 71:1501-1506.

8. Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40:237-264.

9. Schloss PD, Larget BR, Handelsman J (2004) Integration of microbial ecology and statistics: a test to compare gene libraries. Appl Environ Microbiol 70:5485-5492.

10. Singleton DR, Furlong MA, Rathbun SL, Whitman WB (2001) Quantitative comparisons of 16S rRNA gene sequence libraries from environmental samples. Appl Environ Microbiol 67:4374-4376.

11. Lozupone C, Hamady M, Knight R (2006) UniFrac-an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7:371.

12. Lozupone C, Knight R (2005) UniFrac: A new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228-8235.

13. Gill SR, et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science 312:1355-1359.

14. Muyzer G, de Waal EC, Uitterlinden AG (1993) Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol 59:695-700.

15. Pang X, et al. (2005) Molecular profiling of Bacteroides spp. in human feces by PCR-temperature gradient gel electrophoresis. J Microbiol Methods 61:413-417.

16. Shen J, et al. (2006) Molecular profiling of the Clostridium leptum subgroup in human fecal microflora by PCR-denaturing gradient gel electrophoresis and clone library analysis. Appl Environ Microbiol 72:5232-5238.

17. Thompson JR, Marcelino LA, Polz MF (2002) Heteroduplexes in mixed-template amplifications: formation, consequence and elimination by 'reconditioning PCR'. Nucleic Acids Res 30:2083-2088.

18. Zhang X, et al. (2005) Optimized sequence retrieval from single bands of temperature gradient gel electrophoresis profiles of the amplified 16S rDNA fragments from an activated sludge system. J Microbiol Methods 60:1-11.

19. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Phil Mag 2:559-572.

20. Cloarec O, et al. (2005) Statistical total correlation spectroscopy: An exploratory approach for latent biomarker identification from metabolic H-1 NMR datasets. Anal Chem 77:1282-1289.

21. Trygg J, Wold S (2002) Orthogonal projections to latent structures (O-PLS). J Chemometrics 16:119-128.

22. Rantalainen M, et al. (2006) Statistically integrated metabonomic-proteomic studies on a human prostate cancer xenograft model in mice. J Proteome Res 5:2642-2655.

23. Trygg J, Wold S (2003) O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J Chemometrics 17:53-64.

24. Ley RE, Peterson DA, Gordon JI (2006) Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 124:837-848.

25. Ercolini D (2004) PCR-DGGE fingerprinting: novel strategies for detection of microbes in food. J Microbiol Methods 56:297-314.

26. Mueller S, et al. (2006) Differences in fecal microbiota in different European study populations in relation to age, gender, and country: a cross-sectional study. Appl Environ Microbiol 72:1027-1033.