Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

Author manuscript; available in PMC: 2016 Mar 11.

Published in final edited form as: Proc Int Conf Data Eng. 2015 Apr;2015:1035–1046. doi: 10.1109/ICDE.2015.7113354


Algorithm 3 PFS² Algorithm

Input:
	Original database D; Threshold θ; Maximal length constraint upper bound l₁;
	Percentage η; Privacy budgets ε₁, …, ε₅.
Output:
	Frequent Sequences FS;
1:	/** Pre-Mining Phase **/
2:	|D| ← get the noisy number of total input sequences using ε₁;
3:	for l₂ = 1; p < η; l₂ ++ do
4:	α_l₂ = get the noisy number of input sequences with length l₂ using ε₂;
5:	$p = (\sum_{j=1}^{l_{2}} \alpha_{j}) / |D|$;
6:	end for
7:	l_max ← min{l₁, l₂};
8:	β ← get noisy maximal support of sequences of length from 1 to l_max using ε₃;
9:	L_f ← estimate_max_frequent_sequence_length (θ, β);
10:	/** Mining Phase **/
11:	FS ← ∅;
12:	dbSet ← randomly_partition_database (D, L_f);
13:	ε′ ← ε₅/L_f;
14:	for k from 1 to L_f do
15:	if k == 1 then
16:	Candidate Set C_k ← all items in the alphabet;
17:	else
18:	Candidate Set C_k ← generate_candidates (FS_{k−1});
19:	end if
20:	C′_k ← Sampling_based_Candidate_Pruning(C_k, db_k, ε₄, θ, l_max); /** See Algorithm 1 **/
21:	FS_k ← discover_frequent_sequences (C′_k, D, ε′, θ);
22:	FS += FS_k;
23:	end for
24:	return FS;
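
The length-estimation part of the pre-mining phase (steps 2 to 7) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the listing does not state: counts are perturbed with Laplace noise of scale 1/ε (sensitivity 1), every per-length count uses the full ε₂ (each sequence has exactly one length, so the length histogram is treated as having sensitivity 1), l₂ is read as the smallest length whose noisy cumulative fraction reaches η, and the loop is capped at l₁ so that it always terminates. The names `estimate_l_max` and `laplace_noise` are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def estimate_l_max(db, l1, eta, eps1, eps2, rng=None, noise=laplace_noise):
    """Sketch of steps 2-7: pick l_max from noisy sequence-length counts.

    db   : list of sequences (anything supporting len())
    l1   : upper bound on the maximal length constraint
    eta  : target fraction of sequences whose length is covered
    """
    rng = rng or random.Random()

    # Step 2: noisy total number of sequences (assumed sensitivity 1).
    # Clamp to 1 so the ratio below stays well defined for tiny inputs.
    noisy_total = max(len(db) + noise(1.0 / eps1, rng), 1.0)

    length_counts = Counter(len(seq) for seq in db)

    # Steps 3-6: accumulate noisy per-length counts until the covered
    # fraction p reaches eta (or the cap l1 is hit; the cap is an assumption).
    cumulative, p, l2 = 0.0, 0.0, 0
    while p < eta and l2 < l1:
        l2 += 1
        cumulative += length_counts.get(l2, 0) + noise(1.0 / eps2, rng)
        p = cumulative / noisy_total

    # Step 7: l_max is the smaller of the user bound and the estimate.
    return min(l1, l2)


if __name__ == "__main__":
    toy_db = ["a", "b", "ab", "abc", "abcde"]
    print(estimate_l_max(toy_db, l1=10, eta=0.7, eps1=0.5, eps2=0.5,
                         rng=random.Random(0)))
```

Passing a `noise` callable that returns 0 makes the sketch deterministic, which is convenient for checking the loop logic separately from the privacy mechanism.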