2022 Oct 7;9:1000205. doi: 10.3389/fmolb.2022.1000205

TABLE 5.

Strategies for the construction of a negative dataset for RNA-protein interaction prediction.

Strategy	Assumption	Description
Random pairing	The likelihood of interaction between a randomly paired RNA and protein is low	Starting from the known interacting pairs, an equal number of non-interacting pairs is generated by randomly pairing RNAs and proteins from the positive set; pairs that are similar to interactions already present in the positive set are then discarded
FIRE method	Given a known RNA-protein interacting pair (p1, r) and a second protein p2, the lower the sequence similarity between p1 and p2, the lower the likelihood that r interacts with p2	For each positive RNA-protein interaction (p1, r), the protein p2 that is most dissimilar to p1 is selected to form the negative pair (p2, r). Similarity between each pair of proteins is computed from functional annotations and protein domain information in addition to sequence similarity
Subcellular localization method	RNAs and proteins that are not in the same subcellular compartment do not interact with each other	This method requires subcellular localization data
Least atom distance criterion	Only applicable to interactions derived from known-structure complexes	Given a multimolecular RNA-protein complex, for each pairwise combination of its constituent RNA and protein molecules, the pair is considered interacting if at least one RNA atom lies closer than a distance threshold to at least one protein atom; otherwise, it is included in the negative dataset
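
The random pairing strategy in the table can be illustrated with a minimal Python sketch. It is not taken from the article: the function name `random_negative_pairs`, the `is_similar` callback, and the `seed` and `max_attempts` parameters are hypothetical. Because the table does not specify how similarity to known interactions is measured, the sketch assumes by default that a candidate is rejected only when it is an exact known positive; a real pipeline would supply its own similarity test.

```python
import random


def random_negative_pairs(positives, is_similar=None, seed=0, max_attempts=100_000):
    """Sample up to len(positives) negative (rna, protein) pairs by random pairing.

    positives: iterable of (rna_id, protein_id) tuples for known interactions.
    is_similar: optional callable taking a candidate pair and returning True if
        it should be discarded as too similar to a known interaction
        (hypothetical hook; the similarity measure is not defined in the table).
    """
    positive_set = set(positives)
    # RNAs and proteins are drawn only from the positive set, as in the table.
    rnas = sorted({rna for rna, _ in positive_set})
    proteins = sorted({protein for _, protein in positive_set})

    if is_similar is None:
        # Assumed default: only exact known positives count as "similar".
        def is_similar(pair):
            return pair in positive_set

    rng = random.Random(seed)
    negatives = set()
    attempts = 0
    # Stop at a balanced set, or give up after max_attempts random draws
    # (fewer negatives are returned if not enough valid pairs exist).
    while len(negatives) < len(positive_set) and attempts < max_attempts:
        attempts += 1
        candidate = (rng.choice(rnas), rng.choice(proteins))
        if candidate in positive_set or is_similar(candidate):
            continue
        negatives.add(candidate)
    return sorted(negatives)
```

Passing a custom `is_similar` (for example, one based on sequence identity to the partners of known interactions) is where the discarding step from the table would be implemented.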
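
The least atom distance criterion can likewise be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in any cited work: the function names, the dictionary-of-coordinates input layout, and the `threshold` parameter are hypothetical, and the table gives no threshold value, so the caller must supply one in the same units as the coordinates.

```python
import math
from itertools import product


def min_atom_distance(rna_atoms, protein_atoms):
    """Smallest Euclidean distance between any RNA atom and any protein atom."""
    return min(math.dist(a, b) for a, b in product(rna_atoms, protein_atoms))


def split_pairs_by_atom_distance(rna_chains, protein_chains, threshold):
    """Label every RNA-protein chain pair of one complex.

    rna_chains, protein_chains: dicts mapping a chain id to a list of
        (x, y, z) atom coordinates (assumed input layout).
    threshold: distance cutoff chosen by the caller.
    Returns (positives, negatives) as lists of (rna_id, protein_id) tuples.
    """
    positives, negatives = [], []
    for (rna_id, rna_atoms), (prot_id, prot_atoms) in product(
        rna_chains.items(), protein_chains.items()
    ):
        # Interacting if at least one atom pair is closer than the threshold.
        if min_atom_distance(rna_atoms, prot_atoms) < threshold:
            positives.append((rna_id, prot_id))
        else:
            negatives.append((rna_id, prot_id))
    return positives, negatives
```

The all-against-all distance computation is quadratic in the number of atoms; for large complexes a spatial index would be a natural replacement, but the brute-force form keeps the criterion itself easy to read.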