2021 Oct 23;21(21):7035. doi: 10.3390/s21217035
Algorithm 1: STDF IoT-based Data Features Manager.
  Input: Arraylist ‘L’ of acquired data units
  Output: HashMap ‘HC’ with key: source ID and value: an Arraylist of Arraylists (the sampled data units and the attributes mean vector MV)
Begin
  Initialize DL as an empty Arraylist//initialize the list of candidate data units
  For each data unit ‘D’ in L//loop over all received data units
    If (IoT Data Source Validator(‘D’) = True and IoT Data Quality and Freshness Handler(‘D’) = True) then//check that the source ID is valid and the data unit is fresh
      Add ‘D’ to DL//add this data unit to the list as a candidate
    End if
  End for
  Initialize HC as an empty HashMap
  For each data unit ‘D’ in DL//loop over all candidates to group them by source ID
    If (source ID of ‘D’ is in HC keys) then//check whether HC already has a key equal to the source ID of the current data unit
      Add ‘D’ to HC[source ID of ‘D’]//append the current data unit to this key
    Else//create a new key in HC for this source ID
      Set the source ID of ‘D’ as a new key
      Add ‘D’ to HC[source ID of ‘D’]
    End if
  End for
  For each key in HC//loop over all source IDs in HC to calculate the mean value ‘M’ of each attribute
    Initialize data_units_List with the current source ID’s data units
    Initialize attributes_mean_List as an empty Arraylist
    For each attribute ‘AR’ in data_units_List//loop over all attributes
      Initialize attribute_sum = 0//the sum of the attribute’s values
      Initialize data_units_count = 0//the count of data units
      Initialize M = 0//the attribute’s mean value
      For each data unit ‘D’ in data_units_List//loop over the attribute’s value in each data unit
        If (low–high pass filter(AR of ‘D’) = True) then//check that the value is neither an outlier nor missing
          attribute_sum = attribute_sum + AR of ‘D’
          data_units_count = data_units_count + 1
        End if
      End for
      If (data_units_count > 0) then//avoid dividing by zero when no valid value exists
        M = attribute_sum/data_units_count//divide the sum of the attribute’s values by the count of data units
      End if
      Add M to attributes_mean_List
    End for
    Store attributes_mean_List for the current source ID
  End for
  For each key in HC//loop over all source IDs to clean missing and outlier values in each group
    Initialize data_units_List with the current source ID’s data units
    For each data unit ‘D’ in data_units_List//loop over all candidate data units
      For each attribute ‘AR’ in ‘D’//loop over all attributes of each data unit
        Get the attribute’s mean value ‘M’ from the current source ID’s attributes_mean_List
        If (low–high pass filter(AR of ‘D’) = False) then//check whether the value is an outlier or missing
          Replace AR of ‘D’ with M//replace the outlier or missing value in the data unit with the attribute mean
        End if
      End for
    End for
    Update the current source ID’s data units with data_units_List//replace the source ID’s corrupted data units with the cleansed ones
  End for
  Initialize population_counter = 0
  Initialize sample_size = X//X is the chosen sample size
  For each key in HC//loop over all source IDs in HC to compute population_counter
    Add the count of the current source ID’s data units to population_counter
  End for
  For each key in HC//loop over all source IDs in HC to sample and build the attributes mean vector as the state estimator
    Initialize data_units_List with the current source ID’s data units
    Initialize MV as an empty Arraylist//the attributes mean vector
    For each attribute ‘AR’ in data_units_List
      Calculate AM using all data units of the current source ID//obtain the attribute’s mean
      Add AM to MV//add the attribute mean to the attributes mean vector
    End for
    Calculate P1 using population_counter and the data units of the current source ID
    Calculate P2 using sample_size and the data units of the current source ID
    Calculate the sample weight using P1 and P2 of the current source ID
    Apply PPS (probability proportional to size) sampling using the current sample weight and the data units of the current source ID//reduce the data units of the current source ID
  End for
  Return HC mapping each source ID to (sampled data units, MV)//return a HashMap with key: source ID and value: the list of sampled data units and the attributes mean vector
End
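The validation and grouping steps at the start of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `DataUnit` record and the checks `is_valid_source` (a lookup in a set of registered source IDs) and `is_fresh` (a maximum-age test) are hypothetical stand-ins for the IoT Data Source Validator and the IoT Data Quality and Freshness Handler, whose internals the algorithm does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataUnit:
    # Hypothetical record layout: the algorithm only says a unit has a source ID and attributes.
    source_id: str
    timestamp: float   # acquisition time in seconds (assumed)
    attributes: list   # one value per attribute; None marks a missing value


def is_valid_source(unit, known_sources):
    # Assumed stand-in for the IoT Data Source Validator: accept registered source IDs only.
    return unit.source_id in known_sources


def is_fresh(unit, now, max_age_s):
    # Assumed stand-in for the freshness handler: reject units older than max_age_s.
    return now - unit.timestamp <= max_age_s


def group_by_source(units, known_sources, now, max_age_s):
    """First loop: keep valid, fresh units (DL). Second loop: group them by source ID (HC)."""
    candidates = [u for u in units
                  if is_valid_source(u, known_sources) and is_fresh(u, now, max_age_s)]
    groups = defaultdict(list)
    for unit in candidates:
        # defaultdict creates the key on first access, covering the If/Else on HC keys.
        groups[unit.source_id].append(unit)
    return dict(groups)
```

Using `defaultdict(list)` folds the pseudocode's "key exists / create new key" branches into a single append.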
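The per-source mean and replacement loops of Algorithm 1 amount to mean imputation that ignores out-of-range and missing values. The sketch below processes one source ID at a time, so each source keeps its own attribute means. Treating the "low–high pass filter" as a presence check plus a per-attribute `(low, high)` range check is an assumption, and the bounds, like the names `in_band` and `clean_source`, are hypothetical.

```python
import math


def in_band(value, low, high):
    # Assumed reading of the "low-high pass filter": the value is present,
    # not NaN, and inside the accepted [low, high] range.
    return value is not None and not math.isnan(value) and low <= value <= high


def clean_source(rows, bounds):
    """rows: attribute-value lists for ONE source ID.
    bounds: one hypothetical (low, high) pair per attribute.
    Returns (cleaned_rows, attribute_means)."""
    means = []
    for j, (low, high) in enumerate(bounds):
        good = [row[j] for row in rows if in_band(row[j], low, high)]
        # No in-band value: the mean is undefined, so record None instead of dividing by zero.
        means.append(sum(good) / len(good) if good else None)
    # Replace every missing or out-of-range value with its attribute mean.
    cleaned = [
        [value if in_band(value, *bounds[j]) else means[j] for j, value in enumerate(row)]
        for row in rows
    ]
    return cleaned, means
```

For example, with rows `[[20.0, 50.0], [22.0, None], [999.0, 54.0]]` and bounds `[(-40, 60), (0, 100)]`, the means are `[21.0, 52.0]`, so the outlier 999.0 becomes 21.0 and the missing value becomes 52.0.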
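Algorithm 1 does not give formulas for P1, P2 or the sample weight. The sketch below assumes one common reading of the final loop: P1 is the source's share of the population (n_h / N), each source receives a sample quota proportional to its size, P2 is the within-source inclusion probability (k_h / n_h), and the weight is 1 / P2, the number of original units each sampled unit stands for. Under this reading P1 acts only through the quota. The function `pps_sample` and its parameters are hypothetical, and the rows are assumed to be already cleaned (all numeric).

```python
import random


def pps_sample(groups, sample_size, seed=None):
    """groups: dict of source_id -> list of cleaned attribute rows.
    Returns dict of source_id -> (sampled_rows, mean_vector, weight)."""
    rng = random.Random(seed)
    population = sum(len(rows) for rows in groups.values())  # population_counter
    result = {}
    for source_id, rows in groups.items():
        n_h = len(rows)
        if n_h == 0:
            continue
        p1 = n_h / population                               # source's share of the population
        k_h = min(n_h, max(1, round(sample_size * p1)))     # quota proportional to size
        p2 = k_h / n_h                                      # within-source inclusion probability
        weight = 1.0 / p2                                   # units represented by each sampled unit
        sampled = rng.sample(rows, k_h)                     # simple random sample without replacement
        # Mean vector MV (state estimator) over ALL units of the source, before sampling.
        mean_vector = [sum(column) / n_h for column in zip(*rows)]
        result[source_id] = (sampled, mean_vector, weight)
    return result
```

Because quotas are proportional to source size, the weights come out equal or nearly equal across sources (the self-weighting property of proportional allocation); only the rounding of the quotas makes them differ.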