2022 Nov 1;21(12):100437. doi: 10.1016/j.mcpro.2022.100437

Pseudocode for generation of simulated datasets

Input:
- n_exp: number of experiments
- n_prot_mean: mean number of proteins present per experiment
- n_prot_stdev: standard deviation of the number of proteins present per experiment
- tp_score_mean: mean of the score distribution for true positives
- tp_score_stdev: standard deviation of the score distribution for true positives
- fp_score_mean: mean of the score distribution for false positives
- fp_score_stdev: standard deviation of the score distribution for false positives
- incorrect_ratio: proportion of incorrect peptides before any FDR threshold is applied
- peptide_fdr: peptide FDR threshold, should be lower than protein_fdr
- protein_probs: probability for each protein to be present
- peptide_probs: probability for each peptide (including shared peptides!) to be present given that the protein is present
Algorithm:
  • 1.
    For each experiment
    • a.
      Pick n_prot ~ N(n_prot_mean, n_prot_stdev) proteins to be present
    • b.
      Draw n_prot true positive target proteins from the target database
    • c.
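      Calculate the minimum peptide score corresponding to the given peptide_fdr and incorrect_ratio: min_score = Φ⁻¹(1 - peptide_fdr ∗ (1 - incorrect_ratio) / incorrect_ratio; fp_score_mean, fp_score_stdev), where Φ⁻¹(p; μ, σ) is the inverse cumulative distribution function of a normal distribution with mean μ and standard deviation σ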
    • d.
      For each true positive target protein
      • i.
        Draw true positive peptides based on their peptide_probs
      • ii.
        Draw score for each true positive peptide from a truncated normal distribution: trunc_norm(min_score, ∞; tp_score_mean, tp_score_stdev)
    • e.
      Randomly draw 2 ∗ tp_peptides ∗ peptide_fdr / (1 - peptide_fdr) false positive peptides from all peptides in the target and decoy databases, where tp_peptides is the number of true positive peptides drawn in step d
    • f.
      Draw score for each false positive peptide from a truncated normal distribution: trunc_norm(min_score, ∞; fp_score_mean, fp_score_stdev)
  • 2.
    Do protein grouping based on the observed peptides
  • 3.
    Calculate protein group FDRs
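
Step 1 of the pseudocode above can be sketched in Python for a single experiment. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the params dictionary, and the data layout (target_db maps each target protein to its peptides, all_peptides lists target and decoy peptides) are hypothetical. Two details are not specified in the pseudocode and are assumed here: protein_probs is used as a sampling weight when drawing proteins without replacement, and a shared peptide is scored only once. Protein grouping and protein group FDR estimation (steps 2 and 3) are not covered.

```python
import random
from statistics import NormalDist


def min_peptide_score(peptide_fdr, incorrect_ratio, fp_score_mean, fp_score_stdev):
    """Step 1c: score cutoff implied by the peptide FDR threshold."""
    tail = peptide_fdr * (1 - incorrect_ratio) / incorrect_ratio
    return NormalDist(fp_score_mean, fp_score_stdev).inv_cdf(1 - tail)


def trunc_norm(rng, lower, mean, stdev):
    """Draw from N(mean, stdev) truncated to [lower, inf) via inverse-CDF sampling."""
    dist = NormalDist(mean, stdev)
    f_lower = dist.cdf(lower)
    u = f_lower + rng.random() * (1 - f_lower)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf's argument strictly inside (0, 1)
    return max(lower, dist.inv_cdf(u))


def simulate_experiment(rng, params, target_db, all_peptides, protein_probs, peptide_probs):
    """Steps 1a-1f for one experiment (hypothetical data layout, see lead-in)."""
    # a. number of proteins present, clipped to a valid range
    n_prot = round(rng.gauss(params["n_prot_mean"], params["n_prot_stdev"]))
    n_prot = max(0, min(n_prot, len(target_db)))

    # b. ASSUMPTION: weighted sampling without replacement, using protein_probs
    #    as positive weights (random keys u ** (1 / w), keep the largest n_prot)
    keyed = sorted(target_db,
                   key=lambda p: rng.random() ** (1 / protein_probs[p]),
                   reverse=True)
    present = keyed[:n_prot]

    # c. minimum peptide score
    min_score = min_peptide_score(params["peptide_fdr"], params["incorrect_ratio"],
                                  params["fp_score_mean"], params["fp_score_stdev"])

    # d. true positive peptides and their scores
    #    ASSUMPTION: a shared peptide is drawn and scored at most once
    tp_scores = {}
    for protein in present:
        for peptide in target_db[protein]:
            if peptide not in tp_scores and rng.random() < peptide_probs[peptide]:
                tp_scores[peptide] = trunc_norm(
                    rng, min_score, params["tp_score_mean"], params["tp_score_stdev"])

    # e. false positive peptides from the combined target and decoy peptides
    fdr = params["peptide_fdr"]
    n_fp = round(2 * len(tp_scores) * fdr / (1 - fdr))
    fp_peptides = rng.sample(all_peptides, min(n_fp, len(all_peptides)))

    # f. false positive scores
    fp_scores = [(peptide, trunc_norm(rng, min_score,
                                      params["fp_score_mean"], params["fp_score_stdev"]))
                 for peptide in fp_peptides]
    return present, tp_scores, fp_scores, min_score
```

Calling simulate_experiment once per experiment with a shared random.Random instance would give the n_exp simulated peptide lists. The inverse-CDF approach to the truncated normal is one of several options and was chosen here only because it needs nothing beyond the standard library.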