Author manuscript; available in PMC: 2015 Nov 1.
Published in final edited form as: Proteomics. 2014 Sep 23;14(0):2389–2399. doi: 10.1002/pmic.201400080

Table 1.

New CV terms for reporting protein set (group) relationships and global statistics about the protein identification results. The semantic validation software for mzIdentML (v.1.2) reports an error (MUST), a warning (SHOULD) or an informational message (MAY) if these terms are not reported within the file.

mzIdentML context | CV term | Values | Requirement level | Description
ProteinDetectionList | count of identified proteins | xsd:integer | MUST | The value reported MUST equal the number of PAGs with “protein group passes threshold” value = “true”.
ProteinDetectionList | count of identified clusters | xsd:integer | MAY | If protein clusters have been reported in the file, the exporter may choose to annotate the ProteinDetectionList with the number identified above threshold.
ProteinAmbiguityGroup | number of distinct protein sequences | xsd:integer | MAY | The number of distinct protein sequences among the PDHs in the group. For example, if there are two PDHs with different identifiers that have identical full-length sequences, the value would be 1.
ProteinAmbiguityGroup | cluster identifier | xsd:integer | MAY | An identifier applied to protein groups to indicate that they are linked by shared peptides.
ProteinDetectionHypothesis | leading protein OR non-leading protein | - | MUST OR MUST | Every PDH in each PAG MUST be flagged as a leading protein or a non-leading protein, and each PAG MUST contain at least one leading protein, but MAY contain more than one. A “leading protein” is defined as a protein that has the strongest or near-strongest (further explained in Table 2) set of evidence for being present in the sample studied, amongst the grouped protein accessions. A “non-leading protein” is defined as a protein that has (substantially) less evidence than other proteins within the same group, and is thus less likely to have been present in the sample studied.
ProteinDetectionHypothesis | group representative | - | MAY | Each PAG MAY contain zero or one PDH flagged as the group representative, if the software wishes to flag a preference (often arbitrary or, for example, based on alphabetical ordering) amongst the leading proteins. The group representative term can thus be viewed as a “tiebreaker” if the export software wishes to make this distinction.
ProteinDetectionHypothesis | Sequence Same-Set Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that are same-set | MAY | A protein that is indistinguishable or equivalent to another protein in the group, having matches to an identical set of peptide sequences.
ProteinDetectionHypothesis | Spectrum Same-Set Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that are same-set | MAY | A protein that is indistinguishable or equivalent to another protein in the group, having PSMs derived from the same set of spectra.
ProteinDetectionHypothesis | Sequence Subset Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that are super-set | MAY | A protein for which the matched peptide sequences are a subset of the matched peptide sequences for another protein in the group.
ProteinDetectionHypothesis | Spectrum Subset Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that are super-set | MAY | A protein for which the matched spectra are a subset of the matched spectra for another protein in the group.
ProteinDetectionHypothesis | Sequence Multiply Subsumable Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that subsume this PDH | MAY | A protein for which the matched peptide sequences are the same as, or a subset of, the matched peptide sequences for two or more other proteins combined. These other proteins need not all be in the same group.
ProteinDetectionHypothesis | Spectrum Multiply Subsumable Protein | xsd: “list_of_strings”, space-separated list of PDH IDs that subsume this PDH | MAY | A protein for which the matched spectra are the same as, or a subset of, the matched spectra for two or more other proteins combined. These other proteins need not all be in the same group.
ProteinDetectionHypothesis | Marginally distinguished protein | - | MAY | Assigned to a non-leading PDH that has some independent evidence to support its presence relative to the leading protein(s), e.g. the PDH may have a unique peptide, but not sufficient evidence to be promoted to, for example, a leading protein of another PAG.
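
The MUST-level rules in Table 1 lend themselves to a simple programmatic check. The following Python sketch is illustrative only and is not the mzIdentML semantic validator: the `PDH` and `PAG` classes, the `check_groups` and `sequence_relationship` functions, and all field names are hypothetical stand-ins for data that would in practice be parsed from an mzIdentML file. It also assumes that a PDH carrying both the leading and the non-leading flag should be reported as a problem, which the table does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Set

LEADING = "leading protein"
NON_LEADING = "non-leading protein"
REPRESENTATIVE = "group representative"


@dataclass
class PDH:
    """Hypothetical stand-in for a ProteinDetectionHypothesis."""
    id: str
    cv_terms: Set[str] = field(default_factory=set)
    peptides: Set[str] = field(default_factory=set)


@dataclass
class PAG:
    """Hypothetical stand-in for a ProteinAmbiguityGroup."""
    id: str
    passes_threshold: bool
    hypotheses: List[PDH] = field(default_factory=list)


def check_groups(groups: List[PAG], reported_count: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []

    # "count of identified proteins" MUST equal the number of PAGs passing threshold.
    passing = sum(1 for g in groups if g.passes_threshold)
    if reported_count != passing:
        problems.append(
            f"count of identified proteins is {reported_count}, expected {passing}"
        )

    for g in groups:
        for h in g.hypotheses:
            # Every PDH MUST be flagged leading or non-leading.
            # Assumption: carrying both flags at once is also treated as a problem.
            if (LEADING in h.cv_terms) == (NON_LEADING in h.cv_terms):
                problems.append(
                    f"{g.id}/{h.id}: needs exactly one of leading / non-leading protein"
                )

        # Each PAG MUST contain at least one leading protein.
        if not any(LEADING in h.cv_terms for h in g.hypotheses):
            problems.append(f"{g.id}: no leading protein")

        # Each PAG MAY contain zero or one group representative.
        n_rep = sum(1 for h in g.hypotheses if REPRESENTATIVE in h.cv_terms)
        if n_rep > 1:
            problems.append(f"{g.id}: {n_rep} group representatives")

    return problems


def sequence_relationship(a: PDH, b: PDH) -> str:
    """Classify PDH `a` relative to PDH `b` by matched peptide sequences only."""
    if a.peptides == b.peptides:
        return "Sequence Same-Set Protein"
    if a.peptides < b.peptides:  # proper subset
        return "Sequence Subset Protein"
    return "none"


if __name__ == "__main__":
    a = PDH("PDH_A", {LEADING, REPRESENTATIVE}, {"PEPTIDEK", "SAMPLER"})
    b = PDH("PDH_B", {NON_LEADING}, {"PEPTIDEK"})
    group = PAG("PAG_1", True, [a, b])
    print(check_groups([group], reported_count=1))
    print(sequence_relationship(b, a))
```

The spectrum-level terms could be handled the same way by comparing sets of spectrum identifiers instead of peptide sequences; the multiply subsumable terms would additionally require comparing against the union of two or more other PDHs.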