Setup and required data files
For this tutorial we will process single cell proteomics data prepared by nPOP. We will showcase data prepared for prioritized or data-dependent acquisition using TMT labels, and for plexDIA using mTRAQ labels with or without a carrier. For brevity, the former will be referred to as DDA and the latter as DIA throughout the document.
The files needed are search results from MaxQuant or DIANN, a metadata file that links the MS raw files to the wells of the 384-well plate, and the _isolated files from the cellenONE, which provide key metadata about the single cells.
We also support other sample preps, which require a more comprehensive metadata file. An example can be found here.
The QuantQC package can be installed from GitHub: https://github.com/Andrew-Leduc/QuantQC
# devtools::install_github("https://github.com/Andrew-Leduc/QuantQC")
library(QuantQC)
# This can be a single file or a folder; if a folder is given, all files inside are read
data_path <- '/Users/andrewleduc/Library/CloudStorage/GoogleDrive-research@slavovlab.net/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/aleduc/TMT29/Protocol_final_data/All_1p_FDR/evidence.txt'
# Check the example linker to see the expected format; it is simple, but the column names must match
link_path <- '/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/linker.csv'
## Read in cell isolation files from CellenONE
Mel <- "/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/Melanoma_isolated.xls"
Mon <- "/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/Monocyte_isolated.xls"
PDAC_f <- "/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/PDAC_isolated.xls"
Cell isolation files will be compiled into a list with the name specifying the sample.
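A minimal sketch of how such a list might be assembled. The list names (`Melanoma`, `Monocyte`, `PDAC`) are assumed sample labels, `all_cells` is the variable name the later calls in this tutorial expect, and the short fallback paths are placeholders that only apply if the path variables have not already been defined.

```r
# Fallback placeholder paths so the snippet runs on its own;
# in the tutorial, Mel, Mon and PDAC_f are already defined as full paths.
if (!exists("Mel")) Mel <- "Melanoma_isolated.xls"
if (!exists("Mon")) Mon <- "Monocyte_isolated.xls"
if (!exists("PDAC_f")) PDAC_f <- "PDAC_isolated.xls"

# Named list: each name is the sample label (assumed labels), each value is a file path
all_cells <- list(Melanoma = Mel, Monocyte = Mon, PDAC = PDAC_f)

names(all_cells)
```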
Quick report generation
QuantQC allows you to quickly generate QC reports for DIA or DDA experiments. Some aspects will vary, as some analyses are more applicable to either DIA or DDA style experiments. Workflows are mostly the same, but deviations will be explained as we go through the functions step by step.
Example reports can be found here: DDA DIA
output_path <- "/Users/andrewleduc/Desktop/round2_Data/DDA.html"
# DDA instant report generation
Gen_QQC_report_DDA(data_path = data_path,
                   linker_path = link_path,
                   isolation = all_cells,
                   output_path = output_path,
                   plex_exp = 29,
                   CV_thresh = 0.4)
# DIA instant report generation
# Note: prot_vis_umap must be defined before this call (not shown here)
Gen_QQC_report_DIA(data_path = data_path,
                   linker_path = link_path,
                   isolation = all_cells,
                   prot_vis_umap = prot_vis_umap,
                   plex_exp = 3,
                   carrier_used = FALSE,
                   ChQ = 0.3,
                   output_path = output_path)
Creating QQC object
The first step of the workflow is to read in the raw data and generate our QQC object.
DDA and DIA params:
- data_path: path to the MaxQuant or DIANN search results, either a specific file or a folder containing the files
- link_path: path to the file linking MS runs to the wells of the 384-well plate they were injected from
- plex: amount of multiplexing
DIA params:
- carrier: whether a carrier channel was used
DDA params:
- PEP: remove precursors identified with low confidence (high posterior error probability)
- PIF: remove precursors with high coisolation (low precursor ion fraction)
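To illustrate what the PEP and PIF filters do, here is a minimal base R sketch on a toy table. It does not call QuantQC; the data values and the cutoffs `pep_max` and `pif_min` are hypothetical, and only the `PEP` and `PIF` column names follow the MaxQuant evidence file.

```r
# Toy stand-in for a MaxQuant evidence table (values invented for illustration)
evidence <- data.frame(
  Sequence = c("PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"),
  PEP      = c(0.001, 0.20, 0.01, 0.03),  # posterior error probability
  PIF      = c(0.95, 0.90, 0.50, 0.85),   # precursor ion fraction
  stringsAsFactors = FALSE
)

pep_max <- 0.04  # hypothetical cutoff: drop low-confidence identifications
pif_min <- 0.75  # hypothetical cutoff: drop heavily coisolated precursors

# Keep precursors that pass both filters
keep <- evidence$PEP < pep_max & evidence$PIF > pif_min
filtered <- evidence[keep, ]

filtered$Sequence
```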
Create peptide X single cell table
Next we transform the raw data, which exists in a long table format, into a matrix of peptide expression by single cell.
If the experiment used a carrier channel, as in the SCoPE2 design or in Thielert et al., the first step is to normalize the single cell intensities by that of the reference channel.
For DDA experiments, the reference channel is added in addition to the carrier at an amount roughly reflecting 5 single cells. More information on experimental design can be found in the SCoPE2 paper.
DDA and DIA params:
- PCT_car: an adjustable filter that removes precursors whose intensity is more than X percent of the reference channel, since we do not expect intensities from single cells to be significantly larger than those from the larger reference channel
DIA params:
- ChQ: remove precursors with a low confidence channel score
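The reshaping and reference normalization described above can be sketched in base R on toy data. This is an illustration of the idea rather than QuantQC's implementation; the column names, the `ref` channel label, and the `pct_car` value are assumptions.

```r
# Toy long-format table: one row per peptide per channel (values invented)
long <- data.frame(
  peptide   = c("pepA", "pepA", "pepA", "pepB", "pepB", "pepB"),
  channel   = c("ref", "cell1", "cell2", "ref", "cell1", "cell2"),
  intensity = c(1000, 200, 150, 500, 100, 900),
  stringsAsFactors = FALSE
)

# Pivot to a matrix: rows are peptides, columns are channels
wide <- tapply(long$intensity, list(long$peptide, long$channel), sum)

# Divide every single cell channel by the reference channel, peptide by peptide
cells <- setdiff(colnames(wide), "ref")
ratio <- wide[, cells, drop = FALSE] / wide[, "ref"]

# PCT_car-style filter: mask values above a fraction of the reference
pct_car <- 0.5  # hypothetical threshold
ratio[ratio > pct_car] <- NA

ratio
```

Here `pepB` in `cell2` is larger than the reference itself, so it is masked, while the remaining ratios are kept.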
Linking Raw data to meta data
The cellenONE records characteristics of cells sorted for nPOP experiments, including features such as diameter and elongation, as well as images of the cells. This metadata, along with the identity/condition of the sorted cell, is then linked to the raw MS data.
DDA and DIA params:
- QQC object
- named list of cell isolation files
## mapping cellenONE meta data to raw data
Protocol_DDA <- link_cellenONE_Raw(Protocol_DDA, all_cells)
## Target points completed:
## 100...200...300...400...500...
## 600...700...800...900...1000...
## 1100...1200...1300...1400...1500...
## 1600...1700...1800...
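Conceptually, this linking step is a join between the linker file and the cell isolation records. The sketch below shows that idea in base R on toy tables; every column name (`Run`, `Well`, `Label`, `Sample`, `Diameter`) is hypothetical and need not match the files QuantQC actually reads.

```r
# Toy linker: which MS run was injected from which plate well (hypothetical columns)
linker <- data.frame(
  Run  = c("run01", "run02"),
  Well = c("A1", "A2"),
  stringsAsFactors = FALSE
)

# Toy isolation records: one row per sorted cell, keyed by well and label
isolated <- data.frame(
  Well     = c("A1", "A1", "A2"),
  Label    = c("126", "127N", "126"),
  Sample   = c("Melanoma", "Monocyte", "PDAC"),
  Diameter = c(18.2, 15.1, 21.4),
  stringsAsFactors = FALSE
)

# Join on the well, then build a per-cell ID from run and label
cell_meta <- merge(linker, isolated, by = "Well")
cell_meta$ID <- paste0(cell_meta$Run, "_", cell_meta$Label)

cell_meta
```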
The experimental design over the nPOP slide can be plotted for easy visualization, for example if the experimenter wishes to inject samples of given cell types in a specific order.
Run order statistics
Calculating how LC/MS performance changes over time is crucial to understanding potential batch effects in single cell proteomics experiments.
Efficiency of peptide delivery (through Ms1 intensities) and efficiency of fragmentation and ion isolation (through Ms2 intensities) are visualized over time to identify unwanted trends. Chromatographic separation is assessed through the average retention time drift of peptides as well as the deviation of retention times.
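As a rough illustration of an intensity drift check, the sketch below simulates Ms1 intensities that decline over the run order, summarizes them per run, and fits a linear trend. The data are simulated and the variable names are hypothetical; QuantQC's own run order statistics may be computed differently.

```r
set.seed(1)

# Simulated precursor-level log10 Ms1 intensities for 10 runs, 20 precursors each,
# with a small downward drift over the run order
runs <- data.frame(order = rep(1:10, each = 20))
runs$log10_ms1 <- 8 - 0.05 * runs$order + rnorm(nrow(runs), sd = 0.1)

# Median intensity per run
per_run <- aggregate(log10_ms1 ~ order, data = runs, FUN = median)

# Linear trend of the per-run median against run order
fit <- lm(log10_ms1 ~ order, data = per_run)
slope <- coef(fit)[["order"]]

slope  # a negative slope indicates declining signal over the acquisition
```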
Evaluating negative controls and filtering failed cells
Negative controls are important for evaluating any errors in sample preparation. These controls correspond to droplets that receive all the reagents used to process a single cell, but without a cell. Their locations within MS runs are mapped automatically by QuantQC.
DDA and DIA params:
- QQC object
- min_pep: minimum number of peptides identified that you are willing to accept
DDA params:
- CV_thresh: due to increased background signal from coisolation in DDA experiments, consistency of quantitation (CV) of peptides mapping to the same protein is a useful feature for determining failed cells.
# Test negative controls, i.e. samples with no cell
Protocol_DDA <- EvaluateNegativeControls(Protocol_DDA)
The plots compare single cell peptide identifications to those of negative controls, and show the CVs of peptide agreement.
Based on this visualization, filtering is then performed by removing columns from the peptide X single cell matrix.
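The filtering logic can be sketched in base R on a toy peptide X single cell matrix: compute, for each cell, the median CV across proteins of the peptides mapping to the same protein, count identified peptides, and drop cells failing either criterion. The data and the `min_pep` value are invented for illustration, and this is not QuantQC's code.

```r
# Toy peptide x cell matrix of relative peptide levels (values invented)
pep_mat <- matrix(
  c(1.0, 1.1, 0.9, 2.0, 2.1, 1.9,   # cell1: consistent peptides
    1.0, 3.0, 0.2, 2.0, 0.5, 4.0,   # cell2: noisy peptides
    1.2, 1.0, 1.1, NA,  NA,  NA),   # cell3: few peptides identified
  nrow = 6,
  dimnames = list(paste0("pep", 1:6), c("cell1", "cell2", "cell3"))
)
protein <- c("P1", "P1", "P1", "P2", "P2", "P2")  # peptide-to-protein map

# Coefficient of variation of the peptides of one protein in one cell
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Median CV across proteins, per cell
cell_cv <- apply(pep_mat, 2, function(col) {
  median(tapply(col, protein, cv), na.rm = TRUE)
})
n_pep <- colSums(!is.na(pep_mat))  # peptides identified per cell

cv_thresh <- 0.4  # same role as CV_thresh above
min_pep   <- 4    # hypothetical minimum peptide count

keep <- cell_cv < cv_thresh & n_pep >= min_pep
filtered <- pep_mat[, keep, drop = FALSE]

colnames(filtered)
```

In this toy example only `cell1` survives: `cell2` fails the CV threshold and `cell3` has too few peptides.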
Collapse to protein level
Different options are available for collapsing to protein level:
(1) Median relative peptide level
(2) DirectLFQ
(3) DirectLFQ_with_carrier_norm
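As an illustration of option (1), the sketch below collapses a toy peptide matrix to protein level by taking the median of relative peptide levels. Normalizing each peptide to its mean across cells and working in log2 space are assumptions made for this example; QuantQC's exact normalization may differ.

```r
# Toy peptide x cell intensity matrix (values invented)
pep_mat <- matrix(
  c(100, 200,  50, 400,   # cell1
    300, 600, 150, 800),  # cell2
  nrow = 4,
  dimnames = list(paste0("pep", 1:4), c("cell1", "cell2"))
)
protein <- c("P1", "P1", "P2", "P2")  # peptide-to-protein map

# Relative peptide level: each peptide divided by its mean across cells, then log2
log_rel <- log2(pep_mat / rowMeans(pep_mat, na.rm = TRUE))

# Protein level: median of the relative levels of its peptides, per cell
prot_mat <- apply(log_rel, 2, function(col) {
  tapply(col, protein, median, na.rm = TRUE)
})

prot_mat
```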
Quantitation checks
Data completeness
Here we check the coverage of proteins and peptides across single cells, and examine the structure of data completeness.
## Warning: Removed 2 rows containing missing values (`geom_bar()`).
## Removed 2 rows containing missing values (`geom_bar()`).
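The completeness summaries themselves are simple to compute. A minimal base R sketch on a toy protein X single cell matrix (values invented, with `NA` marking a missing measurement):

```r
# Toy protein x cell matrix; NA means the protein was not quantified in that cell
prot_mat <- matrix(
  c(1, NA, 2,    # cell1
    3, NA, NA,   # cell2
    4, 5,  6),   # cell3
  nrow = 3,
  dimnames = list(c("P1", "P2", "P3"), c("cell1", "cell2", "cell3"))
)

observed <- !is.na(prot_mat)

proteins_per_cell  <- colSums(observed)   # coverage per single cell
fraction_of_cells  <- rowMeans(observed)  # completeness per protein
overall_completeness <- mean(observed)    # fraction of the matrix that is filled

proteins_per_cell
fraction_of_cells
overall_completeness
```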
Sample prep quality
To evaluate recovery of our single cell sample prep, QuantQC evaluates two metrics.
- For experiments that use a carrier of known diluted amount, we can evaluate single cell intensities relative to the carrier amount.
- Secondly, we evaluate the total protein intensity relative to the reported diameter of the cell.
Params:
- type - Color codes plot by different factors (options):
  - “Sample” - by different sort condition
  - “Run Order” - by the order samples were run in
Degree of quantitative signal
- QuantQC plots the correlation between peptides mapping to the same protein. The correlations are faceted by protein fold change: a larger fold change means more biological variance, which diminishes the influence of noise relative to signal, so higher correlations are expected.
- For DIA experiments, abundance is quantified at two levels, precursor (Ms1) and fragment (Ms2). This allows us to compare data points for peptides between measures that have different interference biases.
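The expectation that peptide correlations rise with protein fold change can be illustrated with a small simulation. Everything here is simulated under assumed noise levels; it is meant to show the reasoning, not to reproduce QuantQC's plot.

```r
set.seed(42)
n_cells <- 200
noise_sd <- 0.3  # assumed measurement noise, identical for all peptides

# Protein with a large fold change across cells: strong shared signal
signal_large <- rnorm(n_cells, sd = 1)
pepA <- signal_large + rnorm(n_cells, sd = noise_sd)
pepB <- signal_large + rnorm(n_cells, sd = noise_sd)

# Protein with a small fold change across cells: weak shared signal
signal_small <- rnorm(n_cells, sd = 0.2)
pepC <- signal_small + rnorm(n_cells, sd = noise_sd)
pepD <- signal_small + rnorm(n_cells, sd = noise_sd)

# Correlation between the two peptides of each protein
high_fc_cor <- cor(pepA, pepB)
low_fc_cor  <- cor(pepC, pepD)

c(high_fold_change = high_fc_cor, low_fold_change = low_fc_cor)
```

With the same noise on every peptide, the protein that varies more across cells yields the higher peptide-to-peptide correlation.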
Impute and batch correct
Quantitative batch effect visualization
PCA allows us to visualize different orthogonal sources of variance. Coloring the PCA by potential sources of batch effects lets us see whether the data is properly normalized and unwanted variance has been regressed out.
Params:
- type - Color codes plot by different factors (options):
  - “Sample” - by different sort condition
  - “Run Order” - by the order samples were run in
  - “Total protein” - total cell intensity (sometimes this correlates with cell type)
  - “Label” - Mass tag used
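To show what a run order batch effect looks like in PCA, the sketch below simulates a protein X single cell matrix with a drift injected along the run order and checks how strongly the first principal component tracks it. It uses base R `prcomp` on simulated data; the matrix sizes and drift strength are arbitrary choices for illustration.

```r
set.seed(7)
n_cells <- 60
n_prot  <- 40
run_order <- seq_len(n_cells)

# Simulated protein x cell matrix of noise
prot_mat <- matrix(rnorm(n_prot * n_cells), nrow = n_prot)

# Inject a drift along the run order into the first 20 proteins
drift <- as.numeric(scale(run_order))
prot_mat[1:20, ] <- prot_mat[1:20, ] +
  2 * matrix(drift, nrow = 20, ncol = n_cells, byrow = TRUE)

# PCA with cells as observations
pca <- prcomp(t(prot_mat), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# A strong association between PC1 and run order flags an uncorrected batch effect
pc1_vs_order <- abs(cor(pca$x[, 1], run_order))

round(var_explained[1:3], 2)
pc1_vs_order
```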
Examine key protein expression
Now that we have the sources of variance sorted out, let's search for our biological signal.
Protein expression plotted on UMAP to identify cell clusters.
# plots by cluster
PlotUMAP(Protocol_DDA)
# Color code umap by proteins
FeatureUMAP(Protocol_DDA, prot = 'P40936', imputed = FALSE)
Now we can double check that expression by examining the consistency of quantitation between multiple peptides derived from the protein of interest.