Reanalysis of ProteomicsDB Using an Accurate, Sensitive, and Scalable False Discovery Rate Estimation Approach for Protein Groups

2022 Nov 1;21(12):100437. doi: 10.1016/j.mcpro.2022.100437

Pseudocode for generation of simulated datasets

Input:
- n_exp: number of experiments
- n_prot_mean: mean number of proteins present per experiment
- n_prot_stdev: stdev number of proteins present per experiment
- tp_score_mean: mean of score distribution for true positives
- tp_score_stdev: stdev of score distribution for true positives
- fp_score_mean: mean of score distribution for false positives
- fp_score_stdev: stdev of score distribution for false positives
- incorrect_ratio: proportion of incorrect peptides without FDR threshold
- peptide_fdr: peptide FDR threshold, should be lower than protein_fdr
- protein_probs: probability for each protein to be present
- peptide_probs: probability for each peptide (including shared peptides!) to be present given that the protein is present
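
As an illustration only, the scalar inputs listed above could be bundled into a small parameter container. The class name `SimulationParams` and the validation rules are assumptions made for this sketch, not part of the original pseudocode; `protein_probs` and `peptide_probs` are left out because their shape depends on how the databases are represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParams:
    """Hypothetical container for the scalar simulation inputs."""

    n_exp: int
    n_prot_mean: float
    n_prot_stdev: float
    tp_score_mean: float
    tp_score_stdev: float
    fp_score_mean: float
    fp_score_stdev: float
    incorrect_ratio: float
    peptide_fdr: float

    def __post_init__(self):
        # Assumed sanity checks: ratios are proper fractions, spreads are valid.
        for name in ("incorrect_ratio", "peptide_fdr"):
            value = getattr(self, name)
            if not 0.0 < value < 1.0:
                raise ValueError(f"{name} must lie strictly between 0 and 1, got {value}")
        for name in ("tp_score_stdev", "fp_score_stdev"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.n_exp < 1 or self.n_prot_stdev < 0:
            raise ValueError("n_exp must be >= 1 and n_prot_stdev must be >= 0")
```

Rejecting invalid values at construction time keeps later steps, such as the inverse CDF lookup for the minimum score, from failing with less informative errors.
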
Algorithm:

1. For each experiment:
   - a. Draw the number of proteins present: n_prot ∼ N(n_prot_mean, n_prot_stdev)
   - b. Draw n_prot true positive target proteins from the target database
   - c. Calculate the minimum peptide score corresponding to the given peptide_fdr and incorrect_ratio: min_score = Φ^-1(1 - peptide_fdr ∗ (1 - incorrect_ratio) / incorrect_ratio; fp_score_mean, fp_score_stdev), where Φ^-1(p; μ, σ) is the inverse CDF of the normal distribution with mean μ and standard deviation σ
   - d. For each true positive target protein:
     - i. Draw true positive peptides based on their peptide_probs
     - ii. Draw a score for each true positive peptide from a truncated normal distribution: trunc_norm(min_score, ∞; tp_score_mean, tp_score_stdev)
   - e. Randomly draw 2 ∗ tp_peptides ∗ peptide_fdr / (1 - peptide_fdr) false positive peptides from all peptides in the target and decoy databases, where tp_peptides is the number of true positive peptides drawn in step d
   - f. Draw a score for each false positive peptide from a truncated normal distribution: trunc_norm(min_score, ∞; fp_score_mean, fp_score_stdev)
2. Perform protein grouping based on the observed peptides
3. Calculate protein group FDRs
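
Step 1 of the pseudocode can be sketched in Python with the standard library alone. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary-based database layout (`target_db` mapping each protein to its peptide list), uniform protein sampling (ignoring `protein_probs`), the rounding and clipping of n_prot, and the handling of shared peptides are all choices made here for illustration. Protein grouping and protein group FDR calculation (steps 2 and 3) are not covered.

```python
import random
from statistics import NormalDist


def min_peptide_score(peptide_fdr, incorrect_ratio, fp_score_mean, fp_score_stdev):
    """Step 1c: score cutoff as a quantile of the false positive score distribution."""
    quantile = 1 - peptide_fdr * (1 - incorrect_ratio) / incorrect_ratio
    if not 0 < quantile < 1:
        raise ValueError("peptide_fdr and incorrect_ratio give a quantile outside (0, 1)")
    return NormalDist(fp_score_mean, fp_score_stdev).inv_cdf(quantile)


def trunc_norm(rng, lower, mean, stdev):
    """Sample N(mean, stdev) truncated to [lower, inf) by inverse CDF sampling."""
    dist = NormalDist(mean, stdev)
    u = rng.uniform(dist.cdf(lower), 1.0)
    # inv_cdf is undefined at 0 and 1, so clamp; max() guards against rounding
    # error (deep in the tail this can return the cutoff itself).
    u = min(max(u, 1e-12), 1 - 1e-12)
    return max(lower, dist.inv_cdf(u))


def n_false_positives(n_tp_peptides, peptide_fdr):
    """Step 1e: number of false positive peptides to draw from targets plus decoys."""
    return round(2 * n_tp_peptides * peptide_fdr / (1 - peptide_fdr))


def simulate_experiment(rng, target_db, decoy_peptides, peptide_probs, n_prot_mean,
                        n_prot_stdev, tp_score, fp_score, incorrect_ratio, peptide_fdr):
    """Simulate one experiment; tp_score and fp_score are (mean, stdev) tuples."""
    proteins = sorted(target_db)
    # 1a: assumption, round the normal draw and clip it to the database size.
    n_prot = round(rng.gauss(n_prot_mean, n_prot_stdev))
    n_prot = max(0, min(len(proteins), n_prot))
    # 1b: assumption, uniform sampling without replacement.
    present = rng.sample(proteins, n_prot)
    # 1c: minimum score shared by true and false positives.
    min_score = min_peptide_score(peptide_fdr, incorrect_ratio, *fp_score)
    # 1d: assumption, a shared peptide is scored once, the first time it is drawn.
    true_positives = {}
    for protein in present:
        for peptide in target_db[protein]:
            if peptide not in true_positives and rng.random() < peptide_probs[peptide]:
                true_positives[peptide] = trunc_norm(rng, min_score, *tp_score)
    # 1e and 1f: false positives come from all target and decoy peptides.
    pool = sorted({p for peps in target_db.values() for p in peps} | set(decoy_peptides))
    n_fp = min(len(pool), n_false_positives(len(true_positives), peptide_fdr))
    false_positives = [(p, trunc_norm(rng, min_score, *fp_score))
                       for p in rng.sample(pool, n_fp)]
    return min_score, true_positives, false_positives
```

Passing a seeded `random.Random` instance as `rng` makes a simulated dataset reproducible, and calling `simulate_experiment` n_exp times yields the peptide observations that protein grouping would then consume.
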