Abstract
The complex methodology used in many large-scale quantitative proteomics experiments often dictates an experimental design with small sample sizes and limited replication. Variation in a complex dataset ideally arises from changes caused by the experimental perturbation, but it can also arise from technical noise (poor sample preparation, run-to-run variation) and biological noise (normal differences between samples, which are especially pronounced in clinical samples). Here, we apply principal component analysis (PCA) and unsupervised hierarchical clustering (HC) to data generated from multivariable difference gel electrophoresis (DIGE) experiments. This global perspective on multivariable datasets assesses whether the variation in the system reflects the biological signal rather than technical or biological noise, in which “significant” changes may arise stochastically. Although we use DIGE datasets as examples (because of their low technical noise), these issues are germane to all proteomics experimental platforms.
Examples will be shown from experiments on microorganisms, tissue culture, and clinical samples, where these tools were instrumental in revealing sample outliers, fouled samples, and variation in sample preparation that overrode the variation from biological treatment (despite the samples having passed standard biological tests for sample validity). The experiments comprised high-resolution datasets from multiple variables, with hundreds to thousands of protein forms monitored within each sample. Even for simple, two-condition experiments (e.g., WT vs. KO), these tests provide essential quality assurance and quality control.
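As an illustration of this kind of check, the following is a minimal sketch on synthetic data, not the analysis pipeline used in these experiments. It assumes a samples-by-proteins matrix of log abundance ratios (for example, relative to an internal standard) and uses NumPy and SciPy; the function names `pca_scores` and `cluster_samples`, the sample counts, and the effect sizes are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components.

    Returns the sample scores and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)  # center each protein feature across samples
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


def cluster_samples(X, n_clusters=2):
    """Unsupervised average-linkage clustering of samples (Euclidean distance)."""
    Z = linkage(X, method="average", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Synthetic two-condition experiment (assumed values, for illustration only):
# 4 WT and 4 KO samples, 200 protein features, 40 of which respond to the KO.
rng = np.random.default_rng(0)
n_proteins = 200
effect = np.zeros(n_proteins)
effect[:40] = 2.0  # assumed log-ratio shift in the responding proteins

wt = rng.normal(0.0, 0.2, size=(4, n_proteins))
ko = effect + rng.normal(0.0, 0.2, size=(4, n_proteins))
X = np.vstack([wt, ko])
labels = ["WT"] * 4 + ["KO"] * 4

scores, explained = pca_scores(X)
clusters = cluster_samples(X)

for label, (pc1, pc2), cluster in zip(labels, scores, clusters):
    print(f"{label}: PC1={pc1:+.2f} PC2={pc2:+.2f} cluster={cluster}")
print("Variance explained by PC1, PC2:", np.round(explained, 2))
```

If the experimental perturbation dominates the variation, samples separate by condition along the first principal component and fall into condition-specific clusters. A sample that groups with the wrong condition, or that sits alone on its own branch, would be a candidate outlier or a sign of a sample-preparation problem worth investigating before any per-protein statistics are interpreted.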
One set of experiments focuses on familial pulmonary arterial hypertension associated with a mutation in the bone morphogenetic protein receptor 2. Affected individuals carrying the mutation, familial obligates (carrying the mutation but asymptomatic), and a control set (individuals who married into the family) are all compared with one another. PCA and HC were used to assess the efficacy of drug treatment and the effect of inter-personal variation among the normal samples.
