Setup and reqired data files

For this tutorial we will be processing single cell proteomics data prepared by nPOP. We will showcase data prepared for Prioritized or data dependant data acquisition utilizing TMT labels and plexDIA utilizing mTRAQ labels with a carrier or without a carrier. For brevity, the former will be reffered to as DDA and the later DIA throught the document.

Files needed are search results from MaxQuant or DIANN, a meta data file that links the MS raw files to the wells of the 384 well plate, and the _isolated files from the cellenONE that provide key metadata about the single cells.

However we also support other sample preps that require a more comprehensive meta data file. Example can be found here.

The QuantQC package can be installed from github: https://github.com/Andrew-Leduc/QuantQC

# devtools::install_github("https://github.com/Andrew-Leduc/QuantQC")


library(QuantQC)

# Can make this an individual file or a path it reads all files inside
data_path <- '/Users/andrewleduc/Library/CloudStorage/GoogleDrive-research@slavovlab.net/.shortcut-targets-by-id/1uQ4exoKlaZAGnOG1iCJPzYN3ooYYZB7g/MS/Users/aleduc/TMT29/Protocol_final_data/All_1p_FDR/evidence.txt'

# check out the linker to see what it should look like, super simple, column names need to be same
link_path <- '/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/linker.csv'


## Read in cell isolation files from CellenONE 
Mel <-"/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/Melanoma_isolated.xls"
Mon <- "/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/Monocyte_isolated.xls"
PDAC_f <- "/Users/andrewleduc/Desktop/Github/QuantQC/AnalysisFromPaper/pSCoPE/PDAC_isolated.xls"

Cell isolation files will be compiled into a list with the name specifying the sample.

all_cells <- list(Melanoma = Mel,
                  Monocyte = Mon,
                  PDAC = PDAC_f)

Quick report generation

QuantQC allows you to quickly generate QC reports for DIA or DDA experiments. Some aspects will vary as some analysis are more applicable for DIA or DDA style experiments. Work flows are mostly the same by deviations will be explained as we go through functions step by step.

Example reports can be found here: DDA DIA

output_path  <- "/Users/andrewleduc/Desktop/round2_Data/DDA.html"

  # DDA instant report generation


Gen_QQC_report_DDA(data_path = data_path,
                  linker_path = link_path,
                  isolation = all_cells,
                  output_path = output_path,
                  plex_exp = 29,
                  CV_thresh = .4)


  # DIA instant report generation

Gen_QQC_report_DIA(data_path = data_path,
                  linker_path = link_path,
                  isolation = all_cells,
                  prot_vis_umap = prot_vis_umap,
                  plex_exp = 3,
                  carrier_used = F,
                  ChQ = .3,
                  output_path = output_path)

Creating QQC object

The first step of the workflow is to read in the raw data and generate our QQC object.

DDA and DIA params:

data_path: path to maxquant or DIANN raw data, specific file or folder containing raw files
link_path: path to file linking ms runs to well of 384 plate from injection
plex: amount of multiplexing

DIA params:

carrier: use of carrier channel

DDA params:

PEP: remove precursors with a low confidence
PIF: remove precursors with high coisolaiton

  # DDA

Protocol_DDA <- MQ_to_QQC(data_path,link_path,plex = 29, PIF = .75, PEP = .07)


  # DIA

# Protocol_DIA <- DIANN_to_QQC(data_path,linker_path, plex = 2, carrier = T)

Create peptide X single cell table

Next we transform the raw data in that exists in a long table format to a matrix of peptide expression by single cell.

If the experiment used a carrier channel as in the SCoPE2 design or in Thielert et al., the first step is to normalize the single cell intensities by that of the reference channel.

For DDA experiments, the reference channel is added in addition to the carrier at an amount rougly reflecting 5 single cells. More information on experimental design can be found in the SCoPE2 paper.

DDA and DIA params:

PCT_car: an adjustable filtering step to remove precursor that are more than X percent of reference channel since we do not expect intensities from single cells to be significantly larger than those from the larger reference channel

DIA params:

ChQ: remove precursors with a low confidence channel score

# Normalize single cell runs to reference channel,
# filter out data points over twice reference

  # DDA

Protocol_DDA <- cellXpeptide(Protocol_DDA)


  # DIA

# Protocol_DIA <- cellXpeptide(Protocol_DIA, ChQ = .01)

Linking Raw data to meta data

The cellenONE records characteristics of cells sorted for nPOP experiments including an features such as diameter, elongation and images of the cells. To link this meta data, along with the identity/condition of the sorted cell, to the raw MS data.

DDA and DIA params:

QQC object
named list of cell isolation files

## mapping cellenONE meta data to raw data

Protocol_DDA <- link_cellenONE_Raw(Protocol_DDA,all_cells)

## Target points completed: 
##      100...200...300...400...500...
##      600...700...800...900...1000...
##      1100...1200...1300...1400...1500...
##      1600...1700...1800...

Experimental design over the nPOP slide can be plotted for easy visualization if experimenter wishes to inject certain samples pertaining to given cell types in a specific order.

  PlotSlideLayout_celltype(Protocol_DDA)

  PlotSlideLayout_label(Protocol_DDA)

Run order statistics

Calculating how LC/MS performance changes over time is crutial to understanding potential batch effects in single cell proteomics experiments.

Efficiency of peptide delivery through Ms1 intensities and efficiency of fragmentation and ion isolation through Ms2 intensities visualize over time to identify unwanted trends. Chromatographic seperation is calculated through average retention time drift of peptides as well as deviation of retention times

# Calculate statistics to look for decrease in LC/MS performance
# over time
Protocol_DDA <- Calculate_run_order_statistics(Protocol_DDA)


# Plot MS Intensity Statistics
PlotIntensityDrift(Protocol_DDA)

#Plot Retention time Statistics
PlotRTDrift(Protocol_DDA)

Evaluating negative controls and filtering failed cells

Negative controls are important to evaluate any errors in sample preperation. These controls correspond to drops that recieve all the reagents use to proccess a single cell but without any single cell. Their locations within MS runs are mapped automatically by QuantQC.

DDA and DIA params:

QQC object
min_pep: minimum number of peptides identified that you are willing to accept

DDA params:

CV_thresh: due to increased background signal from coisolation in DDA experiments, consistency of quantitation (CV) of peptides mapping to the same protein is a useful feature for determining failed cells.

# Test negative controls, i.e. samples with no cell
Protocol_DDA <- EvaluateNegativeControls(Protocol_DDA)

Plots comparing single cell peptide identification to negative controls and CVs of peptide agreement.

  # DDA

PlotNegCtrl(Protocol_DDA, CV_thresh = .38)

  # DIA

#PlotNegCtrl(Protocol_DIA)

Based off visualization, filtering is then performed, removing columns from the peptide X single cell matrix.

# filter bad cells based off above plot with CV and min pep
Protocol_DDA <- FilterBadCells(Protocol_DDA, CV_thresh = .38)

Collapse to protein level

Different options are available for collapsing to protein level:

(1) Median relative peptide level

(2) DirectLFQ

(3) DirectLFQ_with_carrier_norm

Protocol_DDA <- CollapseToProtein(Protocol_DDA,1)

Quantitation checks

Data completness

Here we check the coverage of proteins and peptides across single cells, and examine the structure of data completness.

PlotProtAndPep(Protocol_DDA)

## Warning: Removed 2 rows containing missing values (`geom_bar()`).
## Removed 2 rows containing missing values (`geom_bar()`).

PlotDataComplete(Protocol_DDA)

Sample prep quality

To evaluate recovery of our single cell sample prep, QuantQC evaluates two metrics.

For experiments that use a carrier of known diluted amount, we can evaluate single cell intensities relative to the carrier amount

# Compute the size relative to the carrier of each cell
PlotSCtoCarrierRatio(Protocol_DDA)

Secondly, we evaluate the total protein intensity relative to the reported diameter of the cell. Params:
- type - Color codes plot by different factors (options):
  - “Sample” - by different sort condition
  - “Run Order” - by the order samples were run in

# Plot cell size vs intensity in MS, options to color code by" "Run order" or "sample"
PlotCellSizeVsIntensity(Protocol_DDA, type = 'sample')

Degree of quantitative signal

QuantQC plots the correlation between proteins mapping to a peptide. The correlations are faceted by protein fold change, since fold change indicates more variance which diminishes influence of noise relative to signal, resulting in an expectation for higher correlations.

Protocol_DDA <- SharedPeptideCor(Protocol_DDA)

PlotPepCor(Protocol_DDA)

For DIA experiments, abundance at two levels, precursor (Ms1) and fragment (Ms2). This allows us to compare data points for peptides between measures that have different interference biases.

  # DIA Only

PlotMS1vMS2(Protocol_DIA)

Impute and batch correct

# Impute, still old imputation very slow, soon will update with C++ implementation
Protocol_DDA <- KNN_impute(Protocol_DDA)


# currently does label and LC/MS run but options comming soon


# no batch correcting used for this data set

#Protocol_DDA <- BatchCorrect(Protocol_DDA,run = T,labels = F)

Quantative batch effect visualiztion

PCA allows us to visualize different orthogonal sources of variance. Exploring PCA variance by sources of batch effects can let us see if data is properly normalized and unwanted variance regressed out.

Params: * type - Color codes plot by different factors (options): * “Sample” - by different sort condition * “Run Order” - by the order samples were run in * “Total protein” - total cell intensity (sometimes this corelates with cell type) * “Label” - Mass tag used

Protocol_DDA <- ComputePCA(Protocol_DDA)

PlotPCA(Protocol_DDA, by = "Run order")

Examine key protein expression

Now that we have the sources variance sorted out, lets search for our biological signal.

Protocol_DDA <- ComputeUMAP(Protocol_DDA)

Protein expression plotted on UMAP to identify cell clusters.

# plots by cluster
PlotUMAP(Protocol_DDA)



# Color code umap by proteins
FeatureUMAP(Protocol_DDA, prot = 'P40936', imputed = F)

Now we can double check that expression by examining the consistency of quantitation between multiple peptides derived from the protein of interest.

# Cool plot that does some cool stuff, try it! ;)
ProteinClustConsistency(Protocol_DDA, prot = 'P40936', type = 'line')

QuantQC Tutorial

Andre Leduc, Saad Khan, Nikolai Slavov

8/29/2023

Setup and reqired data files

Quick report generation

Creating QQC object

Create peptide X single cell table

Linking Raw data to meta data

Run order statistics

Evaluating negative controls and filtering failed cells

Collapse to protein level

Quantitation checks

Data completness

Sample prep quality

Degree of quantitative signal

Impute and batch correct

Quantative batch effect visualiztion

Examine key protein expression