Figure 2.
Schematic overview of GPSEA workflow
(A) Overview. GPSEA is a Python package designed to work well in Jupyter notebooks. It takes a collection of GA4GH phenopackets as input, performs quality assessment, and visualizes the salient characteristics of the cohort. Genotype classes are then defined (Figure 3), and one of four classes of statistical test is performed for each hypothesis the user decides to test.
(B) Visualizing data and formulating hypotheses. GPSEA displays tables with the distribution of phenotypic abnormalities, disease diagnoses, variants, and other information, and presents a cartoon with the distribution of variants across the protein. This information is intended to help users formulate hypotheses about genotype-phenotype correlations.
(C) Statistical testing. GPSEA offers four main ways of testing phenotypes (see text for details and Figure 5 for examples).
