Algorithm 2. Data stream preprocessing: duplicate instance rule.
Input: An input stream S = s1s2…sn.
Output: Preprocessed stream.
Apply the Data Quality Rule (DQR) to the input data // Duplicate Instance Rule
• Find Duplicate: Given an input stream S = s1s2…sn, where si ∈ [m] and n > m, find an a ∈ [m] that appears more than once.
Begin
For each si ∈ S do
Get the next K records si+1 … si+K
If any of these K records is similar to si then
Flag = “Y”
Skip si (exclude it from the preprocessed stream).
Else
Flag = “N”
Keep si in the preprocessed stream.
End If
End For
End
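
The loop in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the look-ahead window consists of the K records immediately following si, that the stream is small enough to buffer in a list, and that "similar" means exact equality unless another predicate is supplied. The names `flag_duplicates`, `preprocess`, and `similar` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def flag_duplicates(
    stream: Iterable[T],
    k: int,
    similar: Callable[[T, T], bool] = lambda a, b: a == b,  # assumed similarity test
) -> List[Tuple[T, str]]:
    """Pair each record with "Y" if one of the next k records is similar to it, else "N"."""
    records = list(stream)  # assumption: the stream fits in memory
    flagged = []
    for i, rec in enumerate(records):
        # Look-ahead window: up to k records that follow the current one.
        window = records[i + 1 : i + 1 + k]
        is_duplicate = any(similar(rec, other) for other in window)
        flagged.append((rec, "Y" if is_duplicate else "N"))
    return flagged


def preprocess(
    stream: Iterable[T],
    k: int,
    similar: Callable[[T, T], bool] = lambda a, b: a == b,
) -> List[T]:
    """Return the preprocessed stream: only the records flagged "N" are kept."""
    return [rec for rec, flag in flag_duplicates(stream, k, similar) if flag == "N"]


if __name__ == "__main__":
    sample = [1, 2, 1, 3, 2, 4]
    print(flag_duplicates(sample, k=3))
    print(preprocess(sample, k=3))
```

Under this reading, a record is dropped when a similar record appears later within the window, so the last occurrence inside each window is the one that survives. A duplicate that lies more than K positions ahead is not detected, which makes the choice of K a trade-off between buffering cost and detection coverage.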