Figure 3. Expression-based identification of pancreatic regulators.
(A) Schematic of the approach used to identify regulators of pancreas development, their targets, and their predicted biological functions using the module network algorithm of Genomica. To identify regulators, two lists are loaded into the program: 1) a list of potential regulators and 2) normalized expression values of the samples. Genes with similar expression patterns are grouped into a module. The regulators most predictive of each module's expression pattern are then learned. The output includes a list of regulators and their potential targets. Functional enrichment analysis is used to predict the biological function of each regulator (see Methods for details). An example of module-network analysis nominating Neurog3 as a candidate regulator of endocrine development is shown along with its potential targets. (B) The optimal number of modules and iterations was determined by calculating the percentage of known regulators of pancreas development for each combination of module number and iteration number. (C) Gene set enrichment analysis (GSEA) for 100 iterations of 75 modules yielded an enrichment score greater than 0.5 when known regulators were used. The distribution of known regulators by rank is displayed in the top panel. (D) Ranking of candidate regulators based on their frequency. The most reproducible candidates included known pancreas regulators such as Pdx1 and Neurog3 (red font) and candidate regulators validated in subsequent analysis (red font). (E) GSEA plot for the distribution of diabetes risk factors among the list of predicted regulators. (F) Ranking of diabetes risk factors based on their frequency score. Validated GWAS genes include Bcl11a (red). (D and F) A frequency of 1.0 means that the candidate regulator appeared in 100% of the iterations performed.
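
The frequency score used in panels D and F can be illustrated with a minimal sketch. It assumes that each iteration of the module network analysis yields a set of nominated regulators and that the score is the fraction of iterations in which a regulator appears, as the legend states. The function name `frequency_scores`, the regulator names `RegA`, `RegB`, and `RegC`, and the four toy iterations are hypothetical and are not taken from the study or from Genomica's output format.

```python
from collections import Counter


def frequency_scores(iterations):
    """Rank regulators by the fraction of iterations in which they appear.

    `iterations` is a list of collections of regulator names, one per run.
    Returns (regulator, frequency) pairs sorted by descending frequency,
    with ties broken alphabetically. This layout is an assumption made for
    illustration only.
    """
    n = len(iterations)
    if n == 0:
        return []
    counts = Counter()
    for regulators in iterations:
        # Count each regulator at most once per iteration.
        counts.update(set(regulators))
    scores = [(name, count / n) for name, count in counts.items()]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Toy, made-up iterations with hypothetical regulator names.
runs = [
    {"RegA", "RegB", "RegC"},
    {"RegA", "RegB"},
    {"RegA", "RegC"},
    {"RegA"},
]
ranking = frequency_scores(runs)
for name, freq in ranking:
    print(f"{name}\t{freq:.2f}")
```

In this toy input, `RegA` appears in all four iterations and therefore receives a frequency of 1.0, which corresponds to a regulator nominated in 100% of the iterations performed.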