PLOS One. 2022 Aug 1;17(8):e0272354. doi: 10.1371/journal.pone.0272354

MiCloud: A unified web platform for comprehensive microbiome data analysis

Won Gu 1,#, Jeongsup Moon 2,#, Crispen Chisina 1, Byungkon Kang 3, Taesung Park 2,4,‡,*, Hyunwook Koh 1,‡,*
Editor: Jean-François Humbert 5
PMCID: PMC9342768  PMID: 35913976

Abstract

Recent advances in massively parallel sequencing have enabled accurate microbiome profiling at a dramatically lower cost. As a result, the human microbiome has become the subject of intensive investigation in public health and medicine. Meanwhile, researchers have developed many microbiome data analysis methods, protocols, and tools. Among these, web platforms stand out for their user-friendly interfaces and streamlined protocols for long sequences of analytic procedures. However, existing web platforms can handle only a categorical trait of interest, a cross-sectional study design, and analysis with no covariate adjustment. We therefore introduce a unified web platform, named MiCloud, for a binary or continuous trait of interest, a cross-sectional or longitudinal/family-based study design, and analysis with or without covariate adjustment. MiCloud handles all such types of analyses for both ecological measures (i.e., alpha and beta diversity indices) and microbial taxa in relative abundance on different taxonomic levels (i.e., phylum, class, order, family, genus and species). Importantly, MiCloud also provides a unified analytic protocol that streamlines data inputs, quality controls, data transformations, statistical methods and visualizations, with vastly extended utility and flexibility suited to microbiome data analysis. We illustrate the use of MiCloud through the United Kingdom twin study on the association between gut microbiome and body mass index, adjusting for age. MiCloud can be run on either the web server (http://micloud.kr) or the user’s computer (https://github.com/wg99526/micloudgit).

Introduction

The human microbiome is the entire set of microbes that live in and on the human body. Recent advances in massively parallel sequencing have enabled accurate microbiome profiling at a dramatically lower cost. As a result, the human microbiome has become the subject of intensive investigation in public health and medicine. Researchers have, for example, found numerous microbiome-associated disorders (e.g., obesity [1, 2], diabetes [3, 4], inflammatory bowel disease [5], cancers [6–11]), behavioral/environmental factors (e.g., diet [12], residence [13], smoking [14]), medical interventions (e.g., antibiotics [3], non-antibiotic drugs [15]), and so forth.

Meanwhile, researchers have also developed many microbiome data analysis methods, protocols and tools. For example, microbiome profiling has been streamlined by recent bioinformatic pipelines, such as QIIME [16], MG-RAST [17], Mothur [18], MEGAN [19] and MetaPhlAn [20]. Researchers can thereby easily process raw sequence data from either 16S rRNA amplicon sequencing [16, 21] or shotgun metagenomics [22], and acquire precise metagenomic information on microbial abundance, taxonomic annotation, gene/functional attributes and the phylogenetic tree [23]. A variety of downstream data analysis methods have also been developed for ecological (e.g., PERMANOVA [24–26], MiRKAT [27, 28], aMiAD [29]), taxonomical (e.g., metagenomeSeq [30], ANCOM [31]) and functional (e.g., STAMP [32]) analysis, and their software packages are widely available.

We especially note here that recent web platforms offer user-friendly and interactive operation compared with earlier command-line analytic tools. In addition, the web platforms provide standardized protocols for a long sequence of analytic procedures in data filtering, quality control, data transformation and analysis. Hence, even non-professional programmers such as clinicians and biologists can easily work with microbiome data, and it is straightforward to reproduce the results. However, existing web platforms for downstream microbiome data analysis, including MicrobiomeAnalyst [33], METAGENassist [34] and EzBioCloud [35], can handle only a categorical trait of interest (e.g., diseased vs. healthy, treatment vs. placebo), a cross-sectional study design, and analysis with no covariate adjustment. Yet, in microbiome studies, researchers often employ family-based or longitudinal study designs [2, 3] to survey different types of traits. In observational studies in particular, covariate-adjusted analyses are necessary to properly control for potential confounders (e.g., age, gender).

We introduce here a unified web platform, named MiCloud, for comprehensive microbiome data analysis. MiCloud performs microbiome data analysis for a binary (e.g., diseased vs. healthy, treatment vs. placebo) or continuous (e.g., body mass index, immune/metabolic activity level, brain quotient) trait of interest, cross-sectional or longitudinal/family-based study design, and with or without covariate adjustment with respect to both ecological measures (i.e., alpha and beta diversity indices) and microbial taxa in relative abundance on different taxonomic levels (i.e., phylum, class, order, family, genus and species). Moreover, MiCloud provides a unified analytic protocol that streamlines data inputs (individual and integrated data forms), quality controls (with respect to kingdom, library size, mean proportion and taxonomic name), data transformations (various alpha and beta diversity indices, and taxonomic abundance forms of count, rarefied count [36], proportion and centered log-ratio (CLR) [37]), statistical methods (various methods for different study designs, data forms and analytic schemes) and visualizations (various plots for data summary and ecological/taxonomical analyses) that are suited to microbiome data analysis. Therefore, users can enjoy comprehensive microbiome data analysis on user-friendly web environments with vastly extended utility and flexibility. MiCloud can be implemented on the web server or locally on the user’s computer when the web server is busy.

The rest of the paper is organized as follows. In Materials and Methods, we describe the machinery of MiCloud in detail and compare it with the other existing web platforms, MicrobiomeAnalyst [33], METAGENassist [34] and EzBioCloud [35], that focus on downstream data analysis rather than raw sequence data processing and microbiome profiling. In Results, we illustrate the use of MiCloud through the reanalysis of the United Kingdom (UK) twin data on the association between gut microbiome and body mass index (BMI) adjusting for age [2]. In Discussion, we discuss potential extensions and implementations of MiCloud.

Materials and methods

MiCloud consists of three main components, named Data Processing, Ecological Analysis and Taxonomical Analysis, and many sub-components as in Fig 1. First, in Data Processing, users can upload microbiome data in different formats through Data Input, and then perform data filtering and quality controls through Quality Control. Then, users can move to either Ecological Analysis or Taxonomical Analysis. In Ecological Analysis, users can calculate ecological measures (i.e., alpha diversity and beta diversity indices) through Diversity Calculation, and then perform comparative/association analyses through Alpha Diversity and Beta Diversity. In Taxonomical Analysis, users can normalize taxonomic abundances in different ways through Data Transformation, and then perform comparative/association analyses for microbial taxa in relative abundance through Comparison/Association. MiCloud can handle all types of comparative/association analyses for a binary or continuous trait of interest, cross-sectional or longitudinal/family-based study design, and with or without covariate adjustment while the other existing web platforms cannot handle a continuous trait, longitudinal/family-based study design and covariate-adjusted analysis (Table 1).

Fig 1. Overview of MiCloud.

MiCloud consists of three main components (Data Processing, Ecological Analysis and Taxonomical Analysis) and many sub-components.

Table 1. The characteristics of MiCloud distinguished from the other existing web platforms, MicrobiomeAnalyst [33], METAGENassist [34] and EzBioCloud [35].

                        MiCloud                       MicrobiomeAnalyst  METAGENassist    EzBioCloud
Data Processing
    Data input          Individual & Integrated data  Individual data    Individual data  Raw sequence data
Ecological Analysis
  Alpha Diversity
    Longitudinal?       O                             X                  X                X
    Continuous trait?   O                             X                  X                X
    Covariate(s)?       O                             X                  X                X
  Beta Diversity
    Longitudinal?       O                             X                  X                X
    Continuous trait?   O                             X                  X                X
    Covariate(s)?       O                             X                  X                X
Taxonomical Analysis
    Longitudinal?       O                             X                  X                X
    Continuous trait?   O                             X                  X                X
    Covariate(s)?       O                             X                  X                X
Implementation Facility
    Local?              O                             X                  X                X

Many other statistical methods could be considered for downstream microbiome data analysis; we selected the statistical methods in Fig 1 for their popularity, statistical validity and easy interpretation/presentation of the results, as follows.

First, for cross-sectional studies, statistical methods based on the independence assumption have been widely used. The Welch t-test and Wilcoxon rank-sum test [38] can be used for comparative analysis without covariate adjustment, with a clear graphical presentation using box plots and summary statistics such as the mean, minimum, Q1, median, Q3 and maximum values. The linear regression, logistic regression, negative binomial regression and beta regression models can be used for continuous, binary, count and proportional response variables, respectively, with or without covariate adjustment; the estimated regression coefficients, standard errors, confidence intervals and P-values support inference on the direction and size of an effect, its variability and its significance. The forest plot, line graph and/or dendrogram can also efficiently summarize the results. Lastly, the microbiome regression-based kernel association test (MiRKAT) [27, 28] has recently been highlighted for beta diversity analysis with or without covariate adjustment, where the principal coordinate analysis (PCoA) plot [39] can summarize the results.
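To make the Welch t-test concrete, the following is a minimal sketch in plain Python of the test statistic and the Welch-Satterthwaite degrees of freedom for two independent groups. The function name `welch_t` is ours for illustration and is not part of MiCloud; a P-value would additionally require the t distribution, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    x and y are the values of one measure (e.g., an alpha diversity index)
    in the two groups of a binary trait. Unequal variances are allowed.
    """
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

When the two groups have equal sizes and equal sample variances, the degrees of freedom reduce to the familiar pooled value, n1 + n2 - 2.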

Second, in longitudinal/family-based microbiome data, repeated measurements from the same subject, or subjects from the same family, tend to be correlated with each other because of shared genetic components and environmental factors (e.g., diet, residence, etc.). Hence, for longitudinal/family-based studies, the statistical methods based on the independence assumption described above are not statistically valid and lead to inflated type I error rates. We therefore selected a series of statistical methods that are based on random effects models (i.e., the linear mixed model (LMM) [40] and generalized linear mixed model (GLMM) [41]) or generalized estimating equations (GEE) [42] for both ecological and taxonomical analyses because of their well-known statistical validity (i.e., robust control of the type I error rate) for correlated data analysis. The results can also be presented using a range of inferential statistics, summary statistics and visualizations.
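As an illustration of how a random effects model accounts for this correlation, a random-intercept LMM can be written as follows. The notation is ours and is only a sketch of the general model class; it is not taken from MiCloud's implementation.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top} \mathbf{z}_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here $i$ indexes the cluster (a subject with repeated measurements, or a family), $j$ indexes the observation within the cluster, $y_{ij}$ is the response (e.g., an alpha diversity index), $x_{ij}$ is the primary variable, and $\mathbf{z}_{ij}$ holds the covariates. The shared random intercept $b_i$ induces a within-cluster correlation of $\sigma_b^2 / (\sigma_b^2 + \sigma^2)$, which a model based on the independence assumption implicitly sets to zero.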

More details on each sub-component are given in the following sections.

Data processing: Data input

MiCloud requires four data components: feature table, taxonomic table, metadata, and phylogenetic tree. Users can upload them individually or in a single integrated format, called phyloseq [43]. The feature table is the count table where rows are OTUs or ASVs and columns are subjects. Users can upload it in a tab-delimited text (.txt), comma-delimited text (.csv) or biological observation matrix (BIOM) format [44]. In particular, the BIOM format is the most widely used output format in many popular microbiome profiling pipelines, such as QIIME [16], MG-RAST [17], Mothur [18], MEGAN [19] and MetaPhlAn [20]; hence, users can upload it directly without conversion. The taxonomic table should contain taxonomic names for microbial features (OTUs or ASVs) on seven taxonomic ranks: kingdom/domain, phylum, class, order, family, genus and species. Users can upload it in a tab-delimited text (.txt) or comma-delimited text (.csv) format. The metadata should contain variables for the subjects, for example, host phenotypes, medical interventions, health/disease status, demographics, and so forth. Users can upload it in a tab-delimited text (.txt) or comma-delimited text (.csv) format. The phylogenetic tree represents evolutionary relationships across microbial features (OTUs or ASVs). Users can upload it in a Newick (.tre or .nwk) format. phyloseq is a well-organized microbiome data format that integrates all four data components in a single R object, and it can be uploaded using a .rdata or .rds file. Once the data are uploaded, MiCloud verifies them before advancing to the next steps. By default, MiCloud matches feature IDs and subject IDs across the four data components, and roots the phylogenetic tree (if it is not rooted) through midpoint rooting [45].
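The ID-matching step can be sketched as follows in plain Python. This is an illustration under our own assumptions, not MiCloud's code: the function name `match_ids` and the nested-dictionary layout are hypothetical, and pruning the phylogenetic tree to the retained features is omitted.

```python
def match_ids(feature_table, taxonomy, metadata):
    """Keep only the features and subjects shared across components.

    feature_table: {feature_id: {subject_id: count}}
    taxonomy:      {feature_id: [taxonomic names by rank]}
    metadata:      {subject_id: {variable: value}}
    """
    # Features must appear in both the feature table and the taxonomic table.
    features = sorted(set(feature_table) & set(taxonomy))

    # Subjects must appear in both the feature table and the metadata.
    subjects_in_table = set()
    for counts in feature_table.values():
        subjects_in_table |= set(counts)
    subjects = sorted(subjects_in_table & set(metadata))

    table = {f: {s: feature_table[f].get(s, 0) for s in subjects}
             for f in features}
    tax = {f: taxonomy[f] for f in features}
    meta = {s: metadata[s] for s in subjects}
    return table, tax, meta
```

Taking intersections rather than raising an error on mismatched IDs is a design choice of this sketch; it mirrors the described default of matching IDs across components.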

Unlike the other existing web platforms, MiCloud can take the integrated data format, phyloseq (Table 1). EzBioCloud takes raw sequence data as inputs and performs microbiome profiling, while MiCloud, MicrobiomeAnalyst [33] and METAGENassist [34] do not (Table 1). Nephele [46], Qiita [47] and PUMAA [48] also take raw sequence data as inputs and perform comprehensive microbiome profiling for 16S rRNA amplicon sequencing and/or shotgun metagenomics, yet they conduct only a few exploratory downstream data analyses. For raw sequence data processing and microbiome profiling, we recommend other popular and well-established bioinformatic pipelines, such as Nephele [46], QIIME2 (q2studio) [49], Qiita [47] and PUMAA [48] for web platforms, or QIIME [16], QIIME2 (q2cli) [49], MG-RAST [17], Mothur [18], MEGAN [19] and MetaPhlAn [20] for command line interfaces.

Data processing: Quality control

MiCloud performs data filtering and quality controls on four criteria: kingdom, library size (i.e., total read count), mean proportion and taxonomic names. Users first type a kingdom of interest, which is, for example, Bacteria (default) for 16S data, Fungi for Internal Transcribed Spacer (ITS) [50] data or any other kingdom of interest for shotgun metagenomics. Then, using a slide bar, users can remove subjects that have low library sizes (e.g., < 3,000 total read count (default)) and features (OTUs or ASVs) that have low mean proportions (e.g., < 0.002% (default)). By default, MiCloud removes monotone and singleton features, as they are likely to be sequencing errors and have almost no variation to be handled in downstream data analysis. Users can also remove erroneous taxonomic names in the taxonomic table that completely or partially match the specified character strings, such as “uncultured”, “incertae”, “Incertae”, “unidentified”, “unclassified”, “unknown”, “metagenome”, “gut metagenome”, “mouse gut metagenome”.
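The library-size and mean-proportion filters can be sketched as below. This is a simplified illustration, not MiCloud's implementation: the function name `quality_control`, the data layout, and the order of the two steps (subjects first, then features) are our assumptions, and the kingdom, singleton and taxonomic-name filters are left out. The default thresholds follow the text, with 0.002% written as the fraction 0.00002.

```python
def quality_control(counts, min_library_size=3000, min_mean_proportion=0.00002):
    """Filter subjects by library size, then features by mean proportion.

    counts: {feature_id: {subject_id: read count}}
    Returns a filtered copy with the same layout.
    """
    subjects = sorted({s for row in counts.values() for s in row})
    # Library size = total read count per subject.
    library = {s: sum(row.get(s, 0) for row in counts.values())
               for s in subjects}
    kept_subjects = [s for s in subjects
                     if library[s] >= min_library_size and library[s] > 0]

    kept = {}
    for feature, row in counts.items():
        # Proportion of this feature within each retained subject.
        props = [row.get(s, 0) / library[s] for s in kept_subjects]
        if props and sum(props) / len(props) >= min_mean_proportion:
            kept[feature] = {s: row.get(s, 0) for s in kept_subjects}
    return kept
```

Note that the proportions here are computed from the library sizes before feature filtering; recomputing them afterwards would be an equally defensible choice.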

MiCloud visualizes microbiome data using summary boxes, histograms and box plots. The sample size and the numbers of features (OTUs or ASVs), phyla, classes, orders, families, genera and species of the microbiome data are displayed in summary boxes. Library sizes across subjects and mean proportions across features are displayed in adjustable histograms and box plots. The graphs are updated in real time in response to any change in data filtering and quality controls, so users can perform data filtering and quality controls interactively. For additional reference, MiCloud rarefies the count data to control for varying library sizes [36]. The graphs and data after quality controls can be downloaded; the graphs are in high resolution and of a size appropriate for publication.
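Rarefaction amounts to subsampling each subject's reads without replacement down to a common depth. A minimal sketch for one subject follows; the function name `rarefy` and the fixed seed are our choices, and the depth is left to the caller (the minimum library size across retained subjects is a common choice, which we assume rather than take from the source).

```python
import random


def rarefy(sample_counts, depth, seed=0):
    """Subsample one subject's reads without replacement down to `depth`.

    sample_counts: {feature_id: read count} for a single subject.
    """
    total = sum(sample_counts.values())
    if total < depth:
        raise ValueError("library size is smaller than the rarefaction depth")
    rng = random.Random(seed)
    # Expand the counts into one label per read, then draw `depth` reads.
    reads = [f for f, n in sample_counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    rarefied = {f: 0 for f in sample_counts}
    for f in drawn:
        rarefied[f] += 1
    return rarefied
```

After rarefying, every subject has the same total read count, so count-based summaries are no longer driven by sequencing depth.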

Ecological analysis: Diversity calculation

MiCloud performs ecological analyses in alpha diversity (a.k.a. within-sample diversity) and beta diversity (a.k.a. between-sample diversity). MiCloud calculates nine alpha diversity indices (i.e., Observed, Shannon [51], Simpson [52], Inverse Simpson [52], Fisher [53], Chao1 [54], abundance-based coverage estimator (ACE) [55], incidence-based coverage estimator (ICE) [56] and phylogenetic diversity (PD) [57]) and five beta diversity indices (i.e., Jaccard dissimilarity [58], Bray-Curtis dissimilarity [59], Unweighted UniFrac distance [60], Generalized UniFrac distance [61] and Weighted UniFrac distance [62]) (Fig 1). These indices are a proper mixture of richness and evenness, count and proportion with or without phylogenetic tree incorporation. MiCloud uses rarefied count data to calculate alpha diversity indices and count-based beta diversity indices (i.e., Jaccard dissimilarity [58] and Bray-Curtis dissimilarity [59]) because varying library sizes can heavily affect these indices [63]. For reference, the calculated diversity indices can be downloaded.
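For intuition, a few of the non-phylogenetic indices can be computed directly from (rarefied) counts, as in the sketch below. The definitions used are the common textbook ones and are stated as assumptions: Simpson is taken as 1 minus the sum of squared proportions, and Jaccard is computed on presence/absence. Conventions differ between software packages, so these functions are illustrative rather than a reproduction of MiCloud's calculations.

```python
import math


def observed(counts):
    """Richness: number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index, -sum(p * ln p), over features with p > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def jaccard(x, y):
    """Jaccard dissimilarity on presence/absence."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    union = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return 1.0 - shared / union
```

The alpha diversity functions take one subject's counts; the beta diversity functions take two subjects' counts over the same ordered set of features.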

Ecological analysis: Alpha diversity

MiCloud performs comparative/association analyses in alpha diversity. Users first need to click a tab for cross-sectional or longitudinal/family-based data analysis. More details on each are as follows.

Cross-sectional (Fig 1)

Users first need to choose a primary variable that is a major trait of interest, such as host phenotypes, medical interventions and health/disease status, using a drop-down list. MiCloud automatically detects if it is binary or continuous. Then, MiCloud gives a chance to rename the categories (if it is binary) or the variable name (if it is continuous) to be appropriately displayed in later graphs. Then, users can choose covariates, such as age and gender, or not. Then, MiCloud lists statistical methods as follows. For a binary trait with no covariates, the Welch t-test and Wilcoxon rank-sum test [38] are listed. For a binary trait with covariates, the linear regression (with each alpha diversity index as a response, and the primary variable as a predictor) and the logistic regression (with the primary variable as a response, and each alpha diversity index as a predictor) are listed. For a continuous trait with or without covariates, the linear regression (with each alpha diversity index as a response, and the primary variable as a predictor) is listed. Lastly, users can address the multiplicity issue or not. For the multiple testing adjustment, the Benjamini-Hochberg (BH) procedures [64] can be employed to control false discovery rate (FDR).
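The BH adjustment itself is short enough to sketch in plain Python. The function name `benjamini_hochberg` is ours; the logic follows the standard step-up procedure, returning adjusted P-values (Q-values) in the input order.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values (Q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that the adjusted values
    # are monotone non-decreasing in the raw P-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A result is then declared significant when its adjusted value is at or below the chosen FDR level, for example 0.05.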

Longitudinal (Fig 1)

All the widgets for the cross-sectional data analysis (i.e., primary variable, rename categories/variable, covariate(s), method and multiple testing adjustment) are retained for the longitudinal/family-based data analysis, yet there are some additional widgets for the longitudinal/family-based data analysis as follows. First, users need to choose a cluster variable that contains, for example, subject IDs for repeated measurements or family IDs for family-based studies. Second, MiCloud lists statistical methods as follows. For a binary trait with or without covariates, LMM [40] (with each alpha diversity index as a response, and the primary variable as a predictor), GEE (Binomial) [42] and GLMM (Binomial) [41] (with the primary variable as a response, and each alpha diversity index as a predictor) are listed. For a continuous trait with or without covariates, LMM (with each alpha diversity index as a response, and the primary variable as a predictor) is listed.
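The method lists described above for alpha diversity analysis can be summarized as a small lookup. The sketch below only encodes the rules stated in the text; the function name and the string labels are hypothetical and do not correspond to identifiers in MiCloud.

```python
def alpha_diversity_methods(design, trait, has_covariates=False):
    """Return the methods listed for an alpha diversity analysis.

    design: "cross-sectional" or "longitudinal" (longitudinal/family-based)
    trait:  "binary" or "continuous"
    """
    if trait not in ("binary", "continuous"):
        raise ValueError("trait must be 'binary' or 'continuous'")
    if design == "cross-sectional":
        if trait == "continuous":
            return ["Linear regression"]
        if has_covariates:
            return ["Linear regression", "Logistic regression"]
        return ["Welch t-test", "Wilcoxon rank-sum test"]
    if design == "longitudinal":
        # Covariates do not change the listed methods for clustered data.
        if trait == "continuous":
            return ["LMM"]
        return ["LMM", "GEE (Binomial)", "GLMM (Binomial)"]
    raise ValueError("design must be 'cross-sectional' or 'longitudinal'")
```

For example, `alpha_diversity_methods("longitudinal", "continuous", True)` corresponds to the age-adjusted BMI analysis of twin data and returns only the LMM.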

MiCloud visualizes the results using box plots, line graphs or forest plots, calculates summary statistics, and organizes them in output tables. The graphs (right-click the plot and choose “Save Image as”) and output tables can be downloaded, and the graphs are in high resolution and of a size appropriate for publication.

Ecological analysis: Beta diversity

MiCloud performs comparative/association analyses in beta diversity. As in alpha diversity analysis, users first need to click a tab for cross-sectional or longitudinal/family-based data analysis. More details on each are as follows.

Cross-sectional (Fig 1)

All the widgets for the cross-sectional data analysis in alpha diversity (i.e., primary variable, rename categories/variable, covariate(s) and method) are retained for the cross-sectional data analysis in beta diversity. Yet, in methodology, MiRKAT [27, 28] is listed.

Longitudinal (Fig 1)

The widgets in the longitudinal/family-based data analysis in alpha diversity (i.e., primary variable, rename categories/variable, cluster variable, covariate(s) and method) are retained in the longitudinal/family-based data analysis in beta diversity. Yet in methodology, generalized linear mixed model—microbiome regression-based kernel association test (GLMM-MiRKAT) [28, 65] is listed.

The results are visualized using PCoA plots [39]. Again, the graphs (right-click the plot and choose “Save Image as”) and output tables can be downloaded, and the graphs are in high resolution and of a size appropriate for publication.
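Both PCoA and kernel-based association tests start from a double-centered (Gower-centered) version of the squared distance matrix. The sketch below shows that transformation only, in plain Python, under the assumption of a symmetric distance matrix given as nested lists; the eigendecomposition that yields the PCoA axes, and any handling of negative eigenvalues, are omitted. The function name `gower_center` is ours.

```python
def gower_center(dist):
    """Return K = -1/2 * J * D^2 * J, where J = I - (1/n) * 1 1^T.

    dist: symmetric n x n distance matrix as a list of lists.
    """
    n = len(dist)
    # Element-wise -1/2 * d^2.
    sq = [[-0.5 * d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    # Subtract row and column means, add back the grand mean.
    return [[sq[i][j] - row_mean[i] - col_mean[j] + grand
             for j in range(n)]
            for i in range(n)]
```

For Euclidean distances, the result is the matrix of inner products of the centered coordinates, which is why its leading eigenvectors give the principal coordinates.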

Taxonomical analysis: Data transformation

MiCloud considers four commonly used taxonomic abundance forms of count, rarefied count [36], proportion and CLR [37]. For the CLR transformation, MiCloud replaces zeros with non-zero values using the Bayesian multiplicative replacement [66]. For reference, users can download all the four data forms.
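The CLR transformation can be sketched as follows. Note the swap: MiCloud replaces zeros by Bayesian multiplicative replacement [66], whereas this sketch uses a simple additive pseudocount purely to keep the example short, so its output will not match MiCloud's values when zeros are present. The function name `clr` and the default pseudocount of 0.5 are our assumptions.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one subject's taxon counts.

    A pseudocount is added to every count so that zeros can be logged;
    this is a simplification, not Bayesian multiplicative replacement.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [v - center for v in logs]
```

By construction, the CLR values of a subject sum to zero, so each value expresses a taxon's abundance relative to the geometric mean of all taxa in that subject.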

Taxonomical analysis: Comparison/association

MiCloud performs comparative/association analysis for microbial taxa in relative abundance on different taxonomic levels (i.e., phylum, class, order, family, genus and species). As in ecological analysis, users first need to click a tab for cross-sectional or longitudinal/family-based data analysis. More details on each are as follows.

Cross-sectional (Fig 1)

The widgets for the cross-sectional data analysis in alpha diversity (i.e., primary variable, rename categories/variable, covariate(s) and method) are retained in the cross-sectional data analysis for microbial taxa in relative abundance, yet there are some additional widgets as follows. First, users need to choose a data form among CLR (default), count and proportion. Second, MiCloud lists statistical methods that are suited to the chosen data form as follows.

  1. CLR. For a binary trait without covariates, the Welch t-test, Wilcoxon rank-sum test [38], linear regression (with each taxon as a response, and the primary variable as a predictor) and logistic regression (with the primary variable as a response, and each taxon as a predictor) are listed. For a binary trait with covariates, the linear regression (with each taxon as a response, and the primary variable as a predictor) and the logistic regression (with the primary variable as a response, and each taxon as a predictor) are listed. For a continuous trait with or without covariates, the linear regression (with the primary variable as a response, and each taxon as a predictor) is listed.

  2. Count. For a binary trait without covariates, the Welch t-test, Wilcoxon rank-sum test [38] and logistic regression (with the primary variable as a response, and each taxon as a predictor) using rarefied count data, and the negative binomial regression (with each taxon as a response, and the primary variable as a predictor) using original count data with the library size (total read count) as an offset variable are listed. For a binary trait with covariates, the logistic regression (with the primary variable as a response, and each taxon as a predictor) using rarefied count data, and the negative binomial regression (with each taxon as a response, and the primary variable as a predictor) using original count data with the library size (total read count) as an offset variable are listed. For a continuous trait with or without covariates, the negative binomial regression (with each taxon as a response, and the primary variable as a predictor) using original count data with the library size (total read count) as an offset variable is listed.

  3. Proportion. For a binary trait without covariates, the Welch t-test, Wilcoxon rank-sum test [38], logistic regression (with the primary variable as a response, and each taxon as a predictor) and beta regression (with each taxon as a response, and the primary variable as a predictor) are listed. For a binary trait with covariates, the logistic regression (with the primary variable as a response, and each taxon as a predictor), and beta regression (with each taxon as a response, and the primary variable as a predictor) are listed. For a continuous trait with or without covariates, the beta regression (with each taxon as a response, and the primary variable as a predictor) is listed.
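To illustrate the role of the offset in the negative binomial regression on original count data, the mean model for taxon $k$ can be written as below. The notation is ours and is a sketch of the standard offset formulation, not a transcription of MiCloud's code.

```latex
\log \mathbb{E}[Y_{ik}] = \log N_i + \beta_{0k} + \beta_{1k} x_i + \boldsymbol{\gamma}_k^{\top} \mathbf{z}_i
\quad \Longleftrightarrow \quad
\log \frac{\mathbb{E}[Y_{ik}]}{N_i} = \beta_{0k} + \beta_{1k} x_i + \boldsymbol{\gamma}_k^{\top} \mathbf{z}_i
```

Here $Y_{ik}$ is the count of taxon $k$ in subject $i$, $N_i$ is the library size (total read count), $x_i$ is the primary variable, and $\mathbf{z}_i$ holds the covariates. Because $\log N_i$ enters with a fixed coefficient of one, $\beta_{1k}$ describes the change in the log expected relative abundance per unit change in $x_i$, rather than a change in raw counts driven by sequencing depth.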

Longitudinal (Fig 1)

The widgets in the cross-sectional data analysis for microbial taxa (i.e., primary variable, rename categories/variable, covariate(s) and method) are retained in the longitudinal/family-based data analysis for microbial taxa, yet there are some additional widgets as follows. First, users need to choose a cluster variable that contains, for example, subject IDs for repeated measures or family IDs for family-based studies. Second, MiCloud lists different statistical methods as follows.

  1. CLR. For a binary trait with or without covariates, LMM [40] (with each taxon as a response, and the primary variable as a predictor), GLMM (Binomial) [41] (with the primary variable as a response, and each taxon as a predictor) and GEE (Binomial) [42] (with the primary variable as a response, and each taxon as a predictor) are listed. For a continuous trait with or without covariates, LMM [40] (with each taxon as a response, and the primary variable as a predictor) is listed.

  2. Count. For a binary trait with or without covariates, GLMM (Binomial) [41] (with the primary variable as a response, and each taxon as a predictor) and GEE (Binomial) [42] (with the primary variable as a response, and each taxon as a predictor) using rarefied count data, and GLMM (Negative Binomial) [41] (with each taxon as a response, and the primary variable as a predictor) using original count data with the library size (total read count) as an offset variable are listed. For a continuous trait with or without covariates, GLMM (Negative Binomial) [41] (with each taxon as a response, and the primary variable as a predictor) using original count data with the library size (total read count) as an offset variable is listed.

  3. Proportion. For a binary trait with or without covariates, GLMM (Binomial) [41] (with the primary variable as a response, and each taxon as a predictor), GEE (Binomial) [42] (with the primary variable as a response, and each taxon as a predictor) and GLMM (Beta) [41] (with each taxon as a response, and the primary variable as a predictor) are listed. For a continuous trait with or without covariates, the GLMM (Beta) [41] (with each taxon as a response, and the primary variable as a predictor) is listed.

We note that the use of the rarefied count data or the original count data with the library size (total read count) as an offset variable is to account for varying library sizes (total read counts) due to uneven sequencing depths across subjects when the count data form is employed. Users can perform taxonomical analyses from phylum to genus (e.g., for 16S data) or from phylum to species (e.g., for shotgun metagenomics). For the multiple testing adjustment, MiCloud applies the BH procedures [64] to control FDR per taxonomic level. MiCloud visualizes the results using box plots, forest plots, and dendrograms. Especially, the dendrogram presents the hierarchical discovery status using colors (red: positive association, blue: negative association, gray: non-significance). Again, the graphs and output tables can be downloaded, and the graphs are in high resolution and appropriate size to be published.

Web server and local implementation

We wrote MiCloud in the R language using the R package Shiny (https://shiny.rstudio.com/), and deployed the web application using ShinyProxy (https://www.shinyproxy.io/). Currently, the web server runs on an Intel Core i7 processor (8 cores, 2.90–4.80 GHz) with 36 GB of DDR4 memory, and supports up to ten concurrent users. We are committed to periodically monitoring the usage, performance and availability of the web server to keep it stable. However, in case the web server is too busy, we created a GitHub repository that enables local implementation on the user’s computer, whereas the other existing web platforms can be run only on their web servers (Table 1). The URLs are http://micloud.kr (web application) and https://github.com/wg99526/micloudgit (GitHub).

Results

Here, we illustrate the use of MiCloud through the reanalysis of the UK twin study data [2] on the association between gut microbiome and BMI adjusting for age. This example illustration is for a continuous trait of interest (BMI), family-based study design (twin study) and covariate-adjusted (age-adjusted) analysis, which cannot be handled by the other existing web platforms.

Goodrich et al. (2014) collected fecal samples from the UK twin population, and then profiled their microbiomes targeting the V4 region of the 16S rRNA gene [2]. The raw sequence data are publicly available in the European Bioinformatics Institute (EMBL-EBI) database (accession numbers: ERP006339 and ERP006342) [2]. We processed the raw sequence data using QIIME [16], and acquired the feature table and taxonomic table using open-reference OTU picking with 97% sequence similarity, and the phylogenetic tree using FastTree [67]. The original microbiome data we used consist of 7349 OTUs, 17 phyla, 31 classes, 60 orders, 105 families, 232 genera and 173 species for 370 monozygotic twins. We stored them as example 16S data in the phyloseq format [43] on MiCloud. The rest of the data processing and analytic procedures are as follows.

We uploaded the data in the phyloseq format [43], and then performed data filtering and quality controls using default settings. Then, 1622 OTUs, 10 phyla, 18 classes, 25 orders, 41 families, 77 genera and 55 species for 370 monozygotic twins were retained. The library sizes across subjects and the mean proportions across OTUs are visualized in histograms and box plots (S1 and S2 Figs). There, we can observe varying library sizes (S1 Fig) and highly skewed mean proportions (S2 Fig).

We performed family-based data analyses for ecological measures (i.e., alpha and beta diversity indices) and microbial taxa from phyla to genera, setting BMI as the primary variable, family ID as the cluster variable and age as a covariate. We fitted LMM [40] for alpha diversity analysis (Fig 2) and GLMM-MiRKAT [28, 65] for beta diversity analysis (Fig 3). We found negative associations between BMI and seven alpha diversity indices (Observed, Shannon [51], Fisher [53], Chao1 [54], ACE [55], ICE [56] and PD [57]) at the significance level of 5%, yet the results for the Simpson [52] and Inverse Simpson [52] indices are not statistically significant (Fig 2). We also found significant associations between BMI and four beta diversity indices (i.e., Jaccard dissimilarity [58], Bray-Curtis dissimilarity [59], Unweighted UniFrac distance [60] and Generalized UniFrac distance [61]) at the significance level of 5%, yet the result for the Weighted UniFrac distance [62] is not statistically significant (Fig 3). aGLMM-MiRKAT, the significance test that combines the results from all five beta diversity indices, shows a significant association between BMI and beta diversity (Fig 3). Lastly, for taxonomical analysis, we fitted LMM using CLR-transformed data. We found 1) positive associations between BMI and two phyla (Firmicutes and Actinobacteria), three classes (Bacilli, Clostridia and Actinobacteria), two orders (Lactobacillales and Actinomycetales), three families (Streptococcaceae, Lactobacillaceae and Actinomycetaceae) and three genera (Streptococcus, Lactobacillus and Acidaminococcus), and 2) negative associations between BMI and one phylum (Tenericutes), two classes (Mollicutes and RF3), two orders (ML615J-28 and RF39) and two families (Christensenellaceae and S24-7) at the significance level of 5% after addressing the multiplicity issue using the BH procedures [64] (Figs 4 and 5).

Fig 2. The results for alpha diversity analysis.

Est represents the estimated coefficient, and SE represents the standard error.

Fig 3. The results for beta diversity analysis.

The two-dimensional PCoA plots visualize each beta diversity index stratified by two BMI categories (i.e., BMI < 25.57 and BMI ≥ 25.57, where 25.57 is the median BMI). *p represents the P-values estimated by GLMM-MiRKAT [28, 65]. Jaccard: Jaccard dissimilarity [58]. BC: Bray-Curtis dissimilarity [59]. U.UniFrac: Unweighted UniFrac distance [60]. G.UniFrac: Generalized UniFrac distance [61]. W.UniFrac: Weighted UniFrac distance [62].
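As a rough illustration of the two non-phylogenetic indices named in this caption, the Python sketch below computes the Jaccard and Bray-Curtis dissimilarities between two samples using their textbook definitions. The sample vectors and function names are hypothetical, and the sketch is not MiCloud's implementation; in practice such indices are usually computed on rarefied counts or proportions.

```python
def jaccard(x, y):
    """Jaccard dissimilarity on presence/absence: 1 - |shared| / |union|."""
    present_x = {i for i, v in enumerate(x) if v > 0}
    present_y = {i for i, v in enumerate(y) if v > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1.0 - len(present_x & present_y) / len(union)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total

# Two hypothetical samples with equal library size (20 reads each).
a = [10, 0, 5, 5]
b = [5, 5, 0, 10]
d_jaccard = jaccard(a, b)
d_bc = bray_curtis(a, b)
print(d_jaccard, d_bc)
```

Jaccard uses only which taxa are present, whereas Bray-Curtis also uses their abundances, so the two indices can rank pairs of samples differently.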

Fig 4. The results for taxonomical analysis as a forest plot.

Est represents the estimated coefficient, and Q-value represents the FDR-adjusted P-value.
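The Q-values referred to here come from the BH procedure [64]. A minimal Python sketch of that adjustment is given below; the P-values are made up for illustration, the function name `bh_adjust` is hypothetical, and how MiCloud groups the tests before adjustment (for example, per taxonomic level) is not shown.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (Q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the Q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical P-values
qvals = bh_adjust(pvals)
significant = [q < 0.05 for q in qvals]
print([round(q, 5) for q in qvals])
```

In this toy example four of the five raw P-values are below 0.05, but only two Q-values are, which shows how the adjustment guards against false discoveries when many taxa are tested.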

Fig 5. The results for taxonomical analysis as a dendrogram.

The numbers in circles are the IDs in the forest plot (Fig 4). Red: positive association. Blue: negative association. Gray: non-significance.

Discussion

In this paper, we introduced MiCloud for comprehensive microbiome data analysis in a user-friendly web environment. MiCloud enables comparative/association analysis for a binary or continuous trait of interest, for cross-sectional or longitudinal/family-based study designs, and with or without covariate adjustment, whereas other existing web platforms cannot handle a continuous trait, a longitudinal/family-based study design or covariate-adjusted analysis. In longitudinal/family-based microbiome data in particular, repeated measurements from the same subject, or subjects from the same family, tend to be correlated with each other because of shared genetic components and environmental factors. Hence, statistical methods based on the independence assumption, as used in other existing web platforms or for cross-sectional studies in MiCloud, are not statistically valid for such data and lead to inflated type I error rates. MiCloud therefore additionally employs a series of statistical methods based on random effects models [40] or GEE [42] (Table 1) for both ecological and taxonomical analyses, so that users can easily handle correlated data from longitudinal/family-based microbiome studies. We demonstrated the use of MiCloud through the reanalysis of the UK twin study data [2] for a continuous trait of interest (i.e., BMI), a family-based study design and covariate-adjusted (i.e., age-adjusted) analysis, a combination that cannot be handled by other existing web platforms.
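The role of within-family correlation can be illustrated with a random-intercept linear mixed model, one common form of the random effects models cited above. The notation below is introduced here only for illustration, and the choice of response and random-effects structure is an assumption; MiCloud's exact model specification may differ. Let y_{ij} be the outcome (e.g., an alpha diversity index or a CLR-transformed abundance) for twin j in family i:

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{BMI}_{ij} + \beta_2\,\mathrm{Age}_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),

\mathrm{Corr}\left(y_{ij},\, y_{ij'} \mid \mathrm{BMI}, \mathrm{Age}\right)
  = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}, \qquad j \neq j'.
```

Under this sketch, two members of the same family share the random intercept b_i, so their outcomes are correlated whenever \sigma_b^2 > 0. A method that assumes independence effectively sets \sigma_b^2 = 0 and can therefore misstate the standard error of the estimate of \beta_1.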

We used R Shiny to develop MiCloud. Many current statistical methods and visualization approaches are written in R and are freely available as R packages; hence, we could easily transfer them to MiCloud. Galaxy [68] is another popular platform for developing web applications in computational biology, but many current Galaxy applications are written in other programming languages and focus more on raw sequence data processing and genome/microbiome profiling than on downstream data analysis. It is beyond the scope of this research to compare R Shiny with Galaxy, but we suggest that R Shiny may be better suited to downstream data analysis, while Galaxy may be better suited to upstream data processing.

We also elaborated on many other facilities, such as data inputs (individual and integrated data forms), quality controls (with respect to kingdom, library size, mean proportion and taxonomic name), data transformations (various alpha and beta diversity indices, and taxonomic abundance forms of count, rarefied count [36], proportion and CLR [37]), statistical methods (various methods for different study designs, data forms and analytic schemes), visualizations (various plots for data summary and ecological/taxonomical analyses) and implementations (on the web server or the user’s computer). Hence, users in various disciplines, even those who are not professional programmers, such as clinicians and biologists, can flexibly perform microbiome data analysis. All the normalized data, output tables and graphs generated by MiCloud are downloadable and/or publishable; hence, it is straightforward to present or reanalyze the results.

However, we note that researchers in microbiome studies perform many more types of data analysis with different aims and data forms, such as prediction analysis, gene-level/functional analysis, multivariate analysis, survival analysis and time-series analysis. MiCloud extends web-based microbiome data analytics to covariate-adjusted analysis and longitudinal/family-based data analysis, yet it does not handle upstream data processing (raw sequence data processing and microbiome profiling) or all possible types of downstream data analysis. Further extensions of MiCloud are therefore needed for more comprehensive microbiome data analysis.

Supporting information

S1 Fig

The histogram (A) and box plot (B) for library sizes across subjects after quality controls.

(TIF)

S2 Fig

The histogram (A) and box plot (B) for mean proportions across OTUs after quality controls.

(TIF)

Data Availability

The raw sequence data for the UK twin study are publicly available in the European Bioinformatics Institute (EMBL-EBI) database (accession numbers: ERP006339 and ERP006342) (Goodrich et al., 2014). The feature table, taxonomic table, phylogenetic tree and metadata for the UK twin study are publicly available as example 16S data on MiCloud. The URLs for MiCloud are http://micloud.kr (web application) and https://github.com/wg99526/micloudgit (GitHub).

Funding Statement

HK was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2021R1C1C1013861) and by Incheon Technopark. TP was supported by the Bio & Medical Technology Development Program of the National Research Foundation of Korea (NRF) grant (2013M3A9C4078158).

References

  • 1. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444(7122):1027–31. doi: 10.1038/nature05414
  • 2. Goodrich JK, Waters JL, Poole AC, Sutter JL, Koren O, Blekhman R, et al. Human genetics shape the gut microbiome. Cell. 2014;159(4):789–99. doi: 10.1016/j.cell.2014.09.053
  • 3. Zhang XS, Li J, Krautkramer KA, Badri M, Battaglia T, Borbet TC, et al. Antibiotic-induced acceleration of type 1 diabetes alters maturation of innate intestinal immunity. Elife. 2018;7:e37816. doi: 10.7554/eLife.37816
  • 4. Sharma S, Tripathi P. Gut microbiome and type 2 diabetes: where we are and where to go? J Nutr Biochem. 2019;63:101–8. doi: 10.1016/j.jnutbio.2018.10.003
  • 5. Glassner KL, Abraham BP, Quigley EM. The microbiome and inflammatory bowel disease. J Allergy Clin Immunol. 2020;145(1):16–27. doi: 10.1016/j.jaci.2019.11.003
  • 6. Frankel AE, Coughlin LA, Kim J, Froehlich TW, Xie Y, Frenkel EP, et al. Metagenomic shotgun sequencing and unbiased metabolomic profiling identify specific human gut microbiota and metabolites associated with immune checkpoint therapy efficacy in melanoma patients. Neoplasia. 2017;19(10):848–55. doi: 10.1016/j.neo.2017.08.004
  • 7. Gopalakrishnan V, Spencer CN, Nezi L, Reuben A, Andrews MC, Karpinets TV, et al. Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients. Science. 2018;359(6371):97–103. doi: 10.1126/science.aan4236
  • 8. Matson V, Fessler J, Bao R, Chongsuwat T, Zha Y, Alegre ML, et al. The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients. Science. 2018;359(6371):104–8. doi: 10.1126/science.aao3290
  • 9. Peters BA, Wilson M, Moran U, Pavlick A, Izsak A, Wechter T, et al. Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients. Genome Med. 2019;11(1):61. doi: 10.1186/s13073-019-0672-4
  • 10. Limeta A, Ji B, Levin M, Gatto F, Nielsen J. Meta-analysis of the gut microbiota in predicting response to cancer immunotherapy in metastatic melanoma. JCI Insight. 2020;5(23):e140940. doi: 10.1172/jci.insight.140940
  • 11. Cullin N, Azevedo Antunes C, Straussman R, Stein-Thoeringer CK, Elinav E. Microbiome and cancer. Cancer Cell. 2021;39(10):1317–41. doi: 10.1016/j.ccell.2021.08.006
  • 12. Singh RK, Chang HW, Yan D, Lee KM, Ucmak D, Wong K, et al. Influence of diet on the gut microbiome and implications for human health. J Transl Med. 2017;15(1):73. doi: 10.1186/s12967-017-1175-y
  • 13. Liu M, Koh H, Kurtz ZD, Battaglia T, PeBenito A, Li H, et al. Oxalobacter formigenes-associated host features and microbial community structures examined using the American Gut Project. Microbiome. 2017;5(1):108. doi: 10.1186/s40168-017-0316-0
  • 14. Gui X, Yang Z, Li MD. Effect of cigarette smoke on gut microbiota: state of knowledge. Front Physiol. 2021;12:816. doi: 10.3389/fphys.2021.673341
  • 15. Vich Vila A, Collij V, Sanna S, Sinha T, Imhann F, Bourgonje AR, et al. Impact of commonly used drugs on the composition and metabolic function of the gut microbiota. Nat Commun. 2020;11(1):1–11. doi: 10.1038/s41467-019-14177-z
  • 16. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–6. doi: 10.1038/nmeth.f.303
  • 17. Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, et al. The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9(1):1–8. doi: 10.1186/1471-2105-9-386
  • 18. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–41. doi: 10.1128/AEM.01541-09
  • 19. Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res. 2007;17(3):377–86. doi: 10.1101/gr.5969107
  • 20. Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods. 2012;9(8):811–4. doi: 10.1038/nmeth.2066
  • 21. Hamady M, Knight R. Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome Res. 2009;19(7):1141–52. doi: 10.1101/gr.085464.108
  • 22. Thomas T, Gilbert J, Meyer F. Metagenomics-a guide from sampling to data analysis. Microb Inform Exp. 2012;2(1):1–12. doi: 10.1186/2042-5783-2-3
  • 23. Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, et al. Characterization of the gut microbiome using 16S or shotgun metagenomics. Front Microbiol. 2016;7:459. doi: 10.3389/fmicb.2016.00459
  • 24. Anderson MJ. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 2001;26(1):32–46.
  • 25. McArdle BH, Anderson MJ. Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology. 2001;82(1):290–7.
  • 26. Tang Z-Z, Chen G, Alekseyenko AV. PERMANOVA-S: association test for microbial community composition that accommodates confounders and multiple distances. Bioinformatics. 2016;32(17):2618–25. doi: 10.1093/bioinformatics/btw311
  • 27. Zhao N, Chen J, Carroll IM, Ringel-Kulka T, Epstein MP, Zhou H, et al. Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test. Am J Hum Genet. 2015;96(5):797–807. doi: 10.1016/j.ajhg.2015.04.003
  • 28. Wilson N, Zhao N, Zhan X, Koh H, Fu W, Chen J, et al. MiRKAT: kernel machine regression-based global association tests for the microbiome. Bioinformatics. 2021;37(11):1595–7. doi: 10.1093/bioinformatics/btaa951
  • 29. Koh H. An adaptive microbiome α-diversity-based association analysis method. Sci Rep. 2018;8(1):1–12. doi: 10.1038/s41598-018-36355-7
  • 30. Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nat Methods. 2013;10(12):1200–2. doi: 10.1038/nmeth.2658
  • 31. Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015;26:27663. doi: 10.3402/mehd.v26.27663
  • 32. Parks DH, Tyson GW, Hugenholtz P, Beiko RG. STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics. 2014;30(21):3123–4. doi: 10.1093/bioinformatics/btu494
  • 33. Dhariwal A, Chong J, Habib S, King IL, Agellon LB, Xia J. MicrobiomeAnalyst: a web-based tool for comprehensive statistical, visual and meta-analysis of microbiome data. Nucleic Acids Res. 2017;45(W1):W180–W188. doi: 10.1093/nar/gkx295
  • 34. Arndt D, Xia J, Liu Y, Zhou Y, Guo AC, Cruz JA, et al. METAGENassist: a comprehensive web server for comparative metagenomics. Nucleic Acids Res. 2012;40(Web Server issue):W88–W95. doi: 10.1093/nar/gks497
  • 35. Yoon SH, Ha SM, Kwon S, Lim J, Kim Y, Seo H, et al. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol. 2017;67(5):1613–1617. doi: 10.1099/ijsem.0.001755
  • 36. Sanders HL. Marine benthic diversity: a comparative study. Am Nat. 1968;102(925):243–82.
  • 37. Aitchison J. The statistical analysis of compositional data. J R Stat Soc Series B Stat Methodol. 1982;44(2):139–60.
  • 38. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics. 1947;18(1):50–60.
  • 39. Torgerson WS. Multidimensional scaling: I. Theory and method. Psychometrika. 1952;17(4):401–19.
  • 40. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38(4):963–74.
  • 41. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88(421):9–25.
  • 42. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
  • 43. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013;8(4):e61217. doi: 10.1371/journal.pone.0061217
  • 44. McDonald D, Clemente JC, Kuczynski J, Rideout JR, Stombaugh J, Wendel D, et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience. 2012;1(1):7. doi: 10.1186/2047-217X-1-7
  • 45. Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27(4):592–3. doi: 10.1093/bioinformatics/btq706
  • 46. Weber N, Liou D, Dommer J, MacMenamin M, Quiñones M, Misner I, et al. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics. 2017;34(8):1411–1413. doi: 10.1093/bioinformatics/btx617
  • 47. Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, Vázquez-Baeza Y, Ackermann G, et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat Methods. 2018;15:796–798. doi: 10.1038/s41592-018-0141-9
  • 48. Mitchell K, Ronas J, Dao C, Freise AC, Mangul S, Shapiro C, et al. PUMAA: a platform for accessible microbiome analysis in the undergraduate classroom. Front Microbiol. 2020;11:584699. doi: 10.3389/fmicb.2020.584699
  • 49. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME2. Nat Biotechnol. 2019;37(8):852–857. doi: 10.1038/s41587-019-0209-9
  • 50. Baldwin BG, Sanderson MJ, Porter JM, Wojciechowski MF, Campbell CS, Donoghue MJ. The ITS region of nuclear ribosomal DNA: a valuable source of evidence on angiosperm phylogeny. Ann Mo Bot Gard. 1995;82(2):247.
  • 51. Shannon CE. A mathematical theory of communication. The Bell System Technical Journal. 1948;27(3):379–423.
  • 52. Simpson EH. Measurement of diversity. Nature. 1949;163(4148):688.
  • 53. Fisher RA, Corbet AS, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. J Anim Ecol. 1943;12(1):42–58.
  • 54. Chao A. Non-parametric estimation of the number of classes in a population. Scandinavian Journal of statistics. 1984;11:265–70.
  • 55. Chao A, Lee SM. Estimating the number of classes via sample coverage. J Am Stat Assoc. 1992;87(417):210–7.
  • 56. Lee SM, Chao A. Estimating Population Size Via Sample Coverage for Closed Capture-Recapture Models. Biometrics. 1994;50(1):88–97.
  • 57. Faith DP. Conservation evaluation and phylogenetic diversity. Biol Conserv. 1992;61(1):1–10.
  • 58. Jaccard P. The distribution of the flora in the alpine zone. New Phytol. 1912;11(2):37–50.
  • 59. Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin. Ecol Monogr. 1957;27(4):325–49.
  • 60. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71(12):8228–35. doi: 10.1128/AEM.71.12.8228-8235.2005
  • 61. Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics. 2012;28(16):2106–13. doi: 10.1093/bioinformatics/bts342
  • 62. Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol. 2007;73(5):1576–85. doi: 10.1128/AEM.01996-06
  • 63. Weiss S, Xu ZZ, Peddada S, Amir A, Bittinger K, Gonzalez A, et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome. 2017;5(1):1–18. doi: 10.1186/s40168-017-0237-y
  • 64. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995;57(1):289–300.
  • 65. Koh H, Li Y, Zhan X, Chen J, Zhao N. A distance-based kernel association test based on the generalized linear mixed model for correlated microbiome studies. Front Genet. 2019;10:458. doi: 10.3389/fgene.2019.00458
  • 66. Martín-Fernández JA, Hron K, Templ M, Filzmoser P, Palarea-Albaladejo J. Bayesian-multiplicative treatment of count zeros in compositional data sets. Stat Modelling. 2015;15(2):134–58.
  • 67. Price MN, Dehal PS, Arkin AP. FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009;26(7):1641–50. doi: 10.1093/molbev/msp077
  • 68. Afgan E, Baker D, Beek MVD, Blankenberg D, Bouvier D, Čech M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 2016;44(W1):W3–W10. doi: 10.1093/nar/gkw343

Decision Letter 0

Jean-François Humbert

18 May 2022

PONE-D-22-09539

MiCloud: A unified web platform for comprehensive microbiome data analysis

PLOS ONE

Dear Dr. Koh,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by the reviewer and by myself.

My first comment concerns the validation of the findings provided by MiCloud. You have used a dataset of a twin study to show what your pipeline is able to do and you presented the obtained data. But, there is no validation of this data for example by comparing these findings with those obtained on the same dataset by using another analysis tool.

My second comment concerns the discussion, which is very short. Given that MiCloud seems to perform a restricted number of (very useful) analyses that cannot be handled by other existing web platforms, I wonder, for example, why you chose to develop a new platform rather than improve one of these existing platforms. More generally, I ask you to discuss in more depth the advantages and the limits of your tool.

Please submit your revised manuscript by Jul 02 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jean-François Humbert

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper describes MiCloud, an online computer application that aims to expand beyond the functionality of existing web platforms that focus on categorical traits, cross-sectional study design, and no covariate adjustment. The additional functionality provided by the web-service is useful for the community to fill in these gaps. The system enables users to perform processing and statistical analyses on microbiome data, especially providing ways to analyze longitudinal data, continuous and data with covariates. The authors implemented the statistical analyses tools using R Shiny, making it easy for users to run on a server or local compute.

Overall, the paper is sound and seems to address a need for more advanced statistical analysis (at least for cross-sectional and longitudinal). It mentions in discussion several other areas (e.g. survival, etc.) that it can’t currently handle but would like to address in future.

Comments

(1) The GitHub Repo https://github.com/wg99526/micloud doesn’t appear to be available but the web server hosting MiCloud was available http://223.194.200.160:3838/ .

The closest I found of the Repo was https://github.com/wg99526/MiCloudGit The final repo should have information/instruction in their README on how to run this in local computer.

This sentence makes me wonder how supported the system is and whether I can rely on it: "In case that the web server is too busy, we created the GitHub repository that enables local implementation on the user’s computer, while the other existing web platforms can be implemented only on the web server (Table 1)."

(2) The authors describe the data input and data processing methods in good detail. They then proceed to describe the statistical methods available for ecological analyses and finally discuss the available tools for taxonomical analysis. The last sentence before Taxonomical Analysis, says “The results are visualized using principal coordinate analysis (PCoA) plots (Torgerson, 1952). Again, the graphs and output tables can be downloaded, and the graphs are in high resolution and appropriate size to be published.” This is not entirely accurate because in the tool, while a table of beta diversity results could be exported, the images were not available for download directly from the tool.

(3) Authors listed and referenced the statistical methods used for analysis as well as appropriate data transformation methods applicable to specific tools. It was nice to see references in the actual R Shiny as well.

(4) In this sentence: “All the normalized data, output tables and graphs generated by MiCloud are adjustable, downloadable and/or publishable; hence, it is straightforward to present or reanalyze the results.” – the use of “adjustable” doesn’t seem always the case as only the quality control plots are interactive with the use of plotly.

(5) While the choice of methods for the cross-sectional and longitudinal methods appears to be sound, the rationale for the choice is not elaborated in the methods and could benefit from some additional references and description of why the methodology is most suitable. Moreover, the discussion section could include a comparison of the selected methods and the benefits/disadvantages to other alternatives for these use cases such as cross-sectional versus longitudinal modeling.

(6) The manuscript introduction could benefit from a more thorough referencing of available web-based services relating to microbiome analysis as a few are excluded. For example, in the last sentence of “Data Processing: Data Input” consider citing Nephele as a cloud platform for raw sequence data processing and microbiome profiling.

Weber N., et al. (2018) Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics, 34(8): 1411–1413. https://doi.org/10.1093/bioinformatics/btx617

See also: Gonzalez et al. [Qiita], Mitchell et al. [PUMAA]).

In fact, it would be helpful for MiCloud’s users if they add “external resource” information on their site for those who wonder how they prepare the input files for MiCloud, e.g. "For the raw sequence data processing and microbiome profiling, we recommend other popular and well-established bioinformatic pipelines, such as online platform Nephele, Galaxy, QIIME2, etc..."

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 1;17(8):e0272354. doi: 10.1371/journal.pone.0272354.r002

Author response to Decision Letter 0


27 Jun 2022

Response Letter

Reviewer’s Comments

The paper describes MiCloud, an online computer application that aims to expand beyond the functionality of existing web platforms that focus on categorical traits, cross-sectional study design, and no covariate adjustment. The additional functionality provided by the web-service is useful for the community to fill in these gaps. The system enables users to perform processing and statistical analyses on microbiome data, especially providing ways to analyze longitudinal data, continuous and data with covariates. The authors implemented the statistical analyses tools using R Shiny, making it easy for users to run on a server or local compute. Overall, the paper is sound and seems to address a need for more advanced statistical analysis (at least for cross-sectional and longitudinal). It mentions in discussion several other areas (e.g. survival, etc.) that it can’t currently handle but would like to address in future.

(Response) Thank you very much for your careful observations and insightful comments. They have greatly improved the manuscript and web platform. We have responded to your comments below, and updated the app and manuscript accordingly.

Comments:

1. The GitHub Repo https://github.com/wg99526/micloud doesn’t appear to be available but the web server hosting MiCloud was available at http://223.194.200.160:3838/. The closest match I found for the Repo was https://github.com/wg99526/micloudgit. The final repo should have information/instructions in its README on how to run this on a local computer. This sentence makes me wonder how supported the system is and whether I can rely on it: "In case that the web server is too busy, we created the GitHub repository that enables local implementation on the user’s computer, while the other existing web platforms can be implemented only on the web server (Table 1)."

(Response) We also found that the URL for the local implementation (GitHub) was wrong, and the information/instruction was confusing. We apologize for the inconvenience caused. We have first corrected the URL for the GitHub repository to https://github.com/wg99526/micloudgit in the revised manuscript, and updated the online manual (readme) including 1) description, 2) URLs, 3) references, 4) prerequisites, 5) launch app, 6) troubleshooting tips, and 7) external resources. Users should be able to run it by simply following the step-by-step instructions. We also described the current specification and capacity of the web server in the revised manuscript as follows. “Currently, the web server has the specification of Intel Core i7 processor (8 cores, 2.90-4.80 GHz) and 36 GB DDR4 memory, and supports up to ten concurrent users. We are committed to monitoring the usage, performance and availability of the web server periodically to maintain it stable.” In addition, we bought a domain name, http://micloud.kr, for the web application URL.

2. The authors describe the data input and data processing methods in good detail. They then proceed to describe the statistical methods available for ecological analyses and finally discuss the available tools for taxonomical analysis. The last sentence before Taxonomical Analysis, says “The results are visualized using principal coordinate analysis (PCoA) plots (Torgerson, 1952). Again, the graphs and output tables can be downloaded, and the graphs are in high resolution and appropriate size to be published.” This is not entirely accurate because in the tool, while a table of beta diversity results could be exported, the images were not available for download directly from the tool.

(Response) The PCoA plot can be downloaded by clicking the right mouse button on the plot and then through “Save Image As”. We revised the manuscript as follows.

“The results are visualized using PCoA plots (Torgerson, 1952). Again, the graphs (by clicking the right mouse button on the plot then through “Save Image as”) and output tables can be downloaded, and the graphs are in high resolution and appropriate size to be published.”

3. Authors listed and referenced the statistical methods used for analysis as well as appropriate data transformation methods applicable to specific tools. It was nice to see references in the actual R Shiny as well.

(Response) Thank you very much for your positive comments.

4. In this sentence: “All the normalized data, output tables and graphs generated by MiCloud are adjustable, downloadable and/or publishable; hence, it is straightforward to present or reanalyze the results.” - the use of “adjustable” doesn’t seem always the case as only the quality control plots are interactive with the use of plotly.

(Response) We deleted “adjustable” in the manuscript as follows.

“All the normalized data, output tables and graphs generated by MiCloud are downloadable and/or publishable; hence, it is straightforward to present or reanalyze the results.”

5. While the choice of methods for the cross-sectional and longitudinal methods appears to be sound, the rationale for the choice is not elaborated in the methods and could benefit from some additional references and description of why the methodology is most suitable. Moreover, the discussion section could include a comparison of the selected methods and the benefits/disadvantages to other alternatives for these use cases such as cross-sectional versus longitudinal modeling.

(Response) We do not propose any new statistical methods in this paper. MiCloud is a web platform that makes user-friendly implementation of existing methods that have been only available through command line interfaces. Thus, we referenced the original articles for more details. However, we described the rationale for the selected statistical methods in the Materials and Methods as follows.

“There are many other statistical methods that can be considered for microbiome downstream data analysis, but the rationale for the selected statistical methods (Fig 1) is in their popularity, statistical validity and easy interpretation/presentation of the results as follows.

First, for the cross-sectional studies, statistical methods based on the independence assumption have been widely used. The Welch t-test and Wilcoxon rank-sum test (Mann and Whitney, 1947) can be used for non-covariate adjusted comparative analysis with a nice graphical presentation using box plots and summary statistics such as the mean, minimum, Q1, median, Q3 and maximum values. The linear regression, logistic regression, negative binomial regression, and beta regression models can be used for the continuous, binary, count and proportional response variables, respectively, with or without covariate adjustment, where the estimated regression coefficients, standard errors, confidence intervals, and P-values serve as a breadth of statistical inference facilities for the effect direction and size, variability and significance. The forest plot, line graph, and/or dendrogram can also efficiently summarize the results. Lastly, microbiome regression-based kernel association test (MiRKAT) (Zhao et al., 2015; Wilson et al., 2021) has recently been highlighted for the beta-diversity analysis with or without covariate adjustment, where the principal coordinate analysis (PCoA) plot (Torgerson, 1952) can nicely summarize the results.

Second, in the longitudinal/family-based microbiome data, the repeated measurements from the same subject or the subjects from the same family tend to be correlated with each other because of the shared genetic components and environmental factors (e.g., diet, residence, etc). Hence, the statistical methods based on the independence assumption described above are not statistically valid, leading to inflated type I error rates, for longitudinal/family-based studies. Hence, we selected a series of statistical methods that are based on the random effects models (i.e., the linear mixed model (LMM) (Laird and Ware, 1982) and generalized linear mixed model (GLMM) (Breslow and Clayton, 1993)) or generalized estimating equations (GEE) (Liang and Zeger, 1986) for both ecological and taxonomical analyses because of their well-known statistical validity (i.e., robust controls of type I error rate) for correlated data analysis. The results can also be presented using a breadth of statistical inference facilities, summary statistics and visualizations.

More details on each sub-component are addressed in following sections.”
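As a rough illustration of the non-covariate-adjusted comparison named in the rationale above, the Welch t statistic and its Welch-Satterthwaite degrees of freedom can be computed as in the following Python sketch. This is not MiCloud's code (MiCloud is built with R Shiny); the function name `welch_t` and the two sample vectors are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each group (sample variance / n).
    vx = variance(x) / nx
    vy = variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical alpha-diversity values for two groups (illustration only).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

A P-value would then come from the t distribution with `df` degrees of freedom; that step is omitted here because the Python standard library has no t-distribution function.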

We also included some description on cross-sectional vs. longitudinal methods in the Discussion as follows.

“Especially, in the longitudinal/family-based microbiome data, the repeated measurements from the same subject or the subjects from the same family tend to be correlated with each other due to the shared genetic components and environmental factors. Hence, the statistical methods based on the independence assumption, used in other existing web platforms or used for cross-sectional studies in MiCloud, are not statistically valid, leading to inflated type I error rates. However, MiCloud employs, in addition, a series of statistical methods that are based on the random effects models (Laird and Ware, 1982) or GEE (Liang and Zeger, 1986) (Table 1) for both ecological and taxonomical analyses; as such, users can easily handle correlated data from longitudinal/family-based microbiome studies on our user-friendly web environments.”
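The inflation of type I error described above can be seen in a small simulation. The Python sketch below is a simplified illustration under stated assumptions, not the authors' method: twin pairs share a family-level random effect, the binary trait is assigned per family (an assumption made to keep the example short), there is no true group difference, and a large-sample z approximation stands in for the t-test. Collapsing each family to its mean is used as a simple stand-in for the LMM/GEE approaches that MiCloud employs. All function and parameter names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_p(x, y):
    """Two-sided P-value from a large-sample z approximation to Welch's test."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, families_per_group=25, family_sd=1.0,
             resid_sd=1.0, alpha=0.05, seed=1):
    """Return (naive, family_mean) rejection rates under the null hypothesis."""
    rng = random.Random(seed)
    naive_hits = 0
    family_hits = 0
    for _ in range(n_sims):
        individuals, family_means = [], []
        for _group in range(2):
            values, means = [], []
            for _family in range(families_per_group):
                shared = rng.gauss(0, family_sd)  # effect shared by both twins
                twins = [shared + rng.gauss(0, resid_sd) for _ in range(2)]
                values.extend(twins)
                means.append(sum(twins) / 2)
            individuals.append(values)
            family_means.append(means)
        # Naive analysis: treats all individuals as independent observations.
        if two_sample_p(individuals[0], individuals[1]) < alpha:
            naive_hits += 1
        # Family-level analysis: one independent observation per family.
        if two_sample_p(family_means[0], family_means[1]) < alpha:
            family_hits += 1
    return naive_hits / n_sims, family_hits / n_sims


naive_rate, family_rate = simulate()
print(f"naive: {naive_rate:.3f}, family means: {family_rate:.3f}")
```

Under these assumptions the naive rejection rate should land well above the nominal 0.05, while the family-level analysis stays close to it. Increasing `family_sd` strengthens the within-family correlation and widens the gap.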

6. The manuscript introduction could benefit from a more thorough referencing of available web-based services relating to microbiome analysis as a few are excluded. For example, in the last sentence of “Data Processing: Data Input” consider citing Nephele as a cloud platform for raw sequence data processing and microbiome profiling. Weber N., et al. (2018) Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics, 34(8): 1411–1413.

https://doi.org/10.1093/bioinformatics/btx617 See also: Gonzalez et al. [Qiita], Mitchell et al. [PUMAA].

In fact, it would be helpful for MiCloud’s users if they add “external resource” information on their site for those who wonder how they prepare the input files for MiCloud, e.g. "For the raw sequence data processing and microbiome profiling, we recommend other popular and well-established bioinformatic pipelines, such as online platform Nephele, Galaxy, QIIME2, etc..."

(Response) MiCloud handles downstream data analysis. Thus, we compared MiCloud with MicrobiomeAnalyst (Dhariwal et al., 2017), METAGENassist (Arndt et al., 2012) and EzBioCloud (Yoon et al., 2017) that intensely handle downstream data analysis rather than raw sequence data processing and microbiome profiling. We also surveyed Nephele, Qiita and PUMAA, and found that they are mostly for raw sequence data processing and microbiome profiling (they conduct only some exploratory data analysis in different contexts). Thus, it was difficult to directly compare them with MiCloud.

We revised the Introduction section to better clarify it as follows.

“However, existing web platforms for downstream microbiome data analysis, including MicrobiomeAnalyst (Dhariwal et al., 2017), METAGENassist (Arndt et al., 2012) and EzBioCloud (Yoon et al., 2017), can handle only a categorical trait of interest (e.g., diseased vs. healthy, treatment vs. placebo), cross-sectional study, and the analysis with no covariate adjustment.”

“In Materials and Methods, all the details on the machinery of MiCloud are dissected, compared with the other existing web platforms, MicrobiomeAnalyst (Dhariwal et al., 2017), METAGENassist (Arndt et al., 2012) and EzBioCloud (Yoon et al., 2017), that intensely handle downstream data analysis rather than raw sequence data processing and microbiome profiling.”

We also revised the Data Processing: Data Input section as follows.

“Nephele (Weber et al., 2018), Qiita (Gonzalez et al., 2018) and PUMAA (Mitchell et al., 2020) also take raw sequence data as inputs and perform comprehensive microbiome profiling for the 16S rRNA amplicon sequencing and/or shotgun metagenomics, yet they conduct only some exploratory downstream data analysis. For the raw sequence data processing and microbiome profiling, we recommend other popular and well-established bioinformatic pipelines, such as Nephele (Weber et al., 2018), QIIME2 (q2studio) (Bolyen et al., 2019), Qiita (Gonzalez et al., 2018) and PUMAA (Mitchell et al., 2020) for web platforms, or QIIME (Caporaso et al., 2010), QIIME2 (q2cli) (Bolyen et al., 2019), MG-RAST (Meyer et al., 2008), Mothur (Schloss et al., 2009), MEGAN (Huson et al., 2007) and MetaPhlAn (Segata et al., 2012) for command line interfaces.”

We also added the “external resources” on the app as follows.

“External resources: MiCloud does not take raw sequence data. For the raw sequence data processing and microbiome profiling, we recommend other popular and well-established bioinformatic pipelines, such as

Nephele (https://nephele.niaid.nih.gov), Qiita (https://qiita.ucsd.edu), QIIME2 (q2studio) (https://qiime2.org) and PUMAA (https://sites.google.com/g.ucla.edu/pumaa) for web platforms, or

QIIME (http://qiime.org), QIIME2 (q2cli) (https://qiime2.org), MG-RAST (https://www.mg-rast.org), Mothur (https://mothur.org), MEGAN (http://ab.inf.uni-tuebingen.de/software/megan6) and MetaPhlAn (https://huttenhower.sph.harvard.edu/metaphlan) for command line interfaces.”

Editor's Comments:

My first comment concerns the validation of the findings provided by MiCloud. You have used a dataset of a twin study to show what your pipeline is able to do and you presented the obtained data. But, there is no validation of this data for example by comparing these findings with those obtained on the same dataset by using another analysis tool.

(Response) We do not propose any new statistical methods in this paper. MiCloud is a web platform that makes user-friendly implementation of existing methods that have been only available through command line interfaces. Of course, different methods provide different results, but they do not rigorously tell which method is valid or not. We referenced the original articles for more details (e.g., theories, methodologies and validity issues). However, as you and the reviewer suggested, we described the rationale for the selected statistical methods in the Materials and Methods as follows.

“There are many other statistical methods that can be considered for microbiome downstream data analysis, but the rationale for the selected statistical methods (Fig 1) is in their popularity, statistical validity and easy interpretation/presentation of the results as follows.

First, for the cross-sectional studies, statistical methods based on the independence assumption have been widely used. The Welch t-test and Wilcoxon rank-sum test (Mann and Whitney, 1947) can be used for non-covariate adjusted comparative analysis with a nice graphical presentation using box plots and summary statistics such as the mean, minimum, Q1, median, Q3 and maximum values. The linear regression, logistic regression, negative binomial regression, and beta regression models can be used for the continuous, binary, count and proportional response variables, respectively, with or without covariate adjustment, where the estimated regression coefficients, standard errors, confidence intervals, and P-values serve as a breadth of statistical inference facilities for the effect direction and size, variability and significance. The forest plot, line graph, and/or dendrogram can also efficiently summarize the results. Lastly, microbiome regression-based kernel association test (MiRKAT) (Zhao et al., 2015; Wilson et al., 2021) has recently been highlighted for the beta-diversity analysis with or without covariate adjustment, where the principal coordinate analysis (PCoA) plot (Torgerson, 1952) can nicely summarize the results.

Second, in the longitudinal/family-based microbiome data, the repeated measurements from the same subject or the subjects from the same family tend to be correlated with each other because of the shared genetic components and environmental factors (e.g., diet, residence, etc). Hence, the statistical methods based on the independence assumption described above are not statistically valid, leading to inflated type I error rates, for longitudinal/family-based studies. Hence, we selected a series of statistical methods that are based on the random effects models (i.e., the linear mixed model (LMM) (Laird and Ware, 1982) and generalized linear mixed model (GLMM) (Breslow and Clayton, 1993)) or generalized estimating equations (GEE) (Liang and Zeger, 1986) for both ecological and taxonomical analyses because of their well-known statistical validity (i.e., robust controls of type I error rate) for correlated data analysis. The results can also be presented using a breadth of statistical inference facilities, summary statistics and visualizations.

More details on each sub-component are addressed in following sections.”

My second comment concerns the discussion, which is very short. Given that MiCloud performs a restricted number of (very useful) analyses that cannot be handled by other existing web platforms, I wonder, for example, why you chose to develop a new platform rather than improve one of the existing platforms. More generally, please discuss in more depth the advantages and limits of your tool.

(Response) MiCloud handles more types of microbiome data analysis (e.g., covariate-adjusted analysis, longitudinal/family-based data analysis) than other existing web platforms, yet MiCloud cannot handle all possible types of microbiome data analysis. We revised the Discussion as follows.

“MiCloud extends web-based microbiome data analytics to covariate-adjusted analysis and longitudinal/family-based data analysis, yet MiCloud does not handle upstream data processing (raw sequence data processing and microbiome profiling) and all possible types of downstream data analysis.”

We also included some description on cross-sectional vs. longitudinal methods in Discussion as follows.

“Especially, in the longitudinal/family-based microbiome data, the repeated measurements from the same subject or the subjects from the same family tend to be correlated with each other due to the shared genetic components and environmental factors. Hence, the statistical methods based on the independence assumption, used in other existing web platforms or used for cross-sectional studies in MiCloud, are not statistically valid, leading to inflated type I error rates, for longitudinal/family-based studies. However, MiCloud additionally employs a series of statistical methods that are based on the random effects models (Laird and Ware, 1982) and GEE (Liang and Zeger, 1986) (Table 1) for both ecological and taxonomical analyses; as such, users can easily handle correlated data from longitudinal/family-based microbiome studies on our user-friendly web environments.”

We do not have access to other existing web platforms and thus cannot directly improve them. We used R Shiny to develop MiCloud because many of the current statistical methods and visualization approaches are freely available through R libraries. We included some description in Discussion as follows.

“We used R Shiny to develop MiCloud. Many of the current statistical methods and visualization approaches are written in R language, and they are freely available through R libraries; hence, we could easily transfer them to MiCloud. Galaxy (Afgan et al., 2016) is another popular platform to develop web applications in computational biology, but many of the current Galaxy applications are written in different programming languages, and focus more on raw sequence data processing and genome/microbiome profiling, rather than downstream data analysis. It is beyond the scope of this research to compare R Shiny to Galaxy, but we would say that R Shiny can be better for downstream data analysis, while Galaxy can be better for upstream data processing.”

Attachment

Submitted filename: Response letter.docx

Decision Letter 1

Jean-François Humbert

19 Jul 2022

MiCloud: A unified web platform for comprehensive microbiome data analysis

PONE-D-22-09539R1

Dear Dr. Koh,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jean-François Humbert

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Jean-François Humbert

22 Jul 2022

PONE-D-22-09539R1

MiCloud: A unified web platform for comprehensive microbiome data analysis

Dear Dr. Koh:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jean-François Humbert

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    The histogram (A) and box plot (B) for library sizes across subjects after quality controls.

    (TIF)

    S2 Fig

    The histogram (A) and box plot (B) for mean proportions across OTUs after quality controls.

    (TIF)

    Data Availability Statement

    The raw sequence data for the UK twin study are publicly available in the European Bioinformatics Institute (EMBL-EBI) database (access number: ERP006339 and ERP006342) (Goodrich et al., 2014). The feature table, taxonomic table, phylogenetic tree and metadata for the UK twin study are publicly available as example 16S data on MiCloud. The URLs for MiCloud are http://micloud.kr (web application) and https://github.com/wg99526/micloudgit (GitHub).


    Articles from PLoS ONE are provided here courtesy of PLOS