The Spatiotemporal Data Fusion (STDF) Approach: IoT-Based Data Fusion Using Big Data Analytics

2021 Oct 23;21(21):7035. doi: 10.3390/s21217035

Algorithm 1: STDF IoT-based Data Features Manager.
Input: ArrayList 'L' of acquired data units
Output: HashMap 'HC' with key: source ID and value: ArrayList of ArrayLists

Begin
    Initialize DL as an empty ArrayList  // the list of candidate data units
    For each data unit 'D' in L  // loop over all received data units
        If (IoT Data Source Validator('D') = True and IoT Data Quality and Freshness Handler('D') = True) then  // the source ID is valid and the data unit is fresh
            Add 'D' to DL  // keep this data unit as a candidate
        End if
    End for
    Initialize HC as an empty HashMap
    For each data unit 'D' in DL  // group the candidates by source ID
        If (source ID of 'D' is a key of HC) then  // a group for this source ID already exists
            Add 'D' to HC[source ID of 'D']
        Else  // create a new key in HC for this source ID
            Set source ID of 'D' as a new key of HC
            Add 'D' to HC[source ID of 'D']
        End if
    End for
    For each key in HC  // compute the mean value 'M' of each attribute per source ID
        Initialize data_units_List with the current source ID's data units
        Initialize attributes_mean_List as an empty ArrayList
        For each attribute 'AR' in data_units_List  // loop over all attributes
            Initialize attribute_sum = 0  // sum of the attribute's values
            Initialize data_units_count = 0  // number of data units counted
            Initialize M = 0  // the attribute's mean value
            For each data unit 'D' in AR  // loop over the attribute's value in each data unit
                If (low–high pass filter = True) then  // the value is neither an outlier nor missing
                    attribute_sum = attribute_sum + D
                    data_units_count = data_units_count + 1
                End if
            End for
            M = attribute_sum / data_units_count  // divide the sum of the accepted values by their count
            Add M to attributes_mean_List
        End for
    End for
    For each key in HC  // clean missing and outlier values in each group
        Initialize data_units_List with the current source ID's data units
        For each data unit 'D' in data_units_List  // loop over all candidate data units
            For each attribute 'AR' in D  // loop over all attributes of the data unit
                Get the attribute's mean value 'M' from attributes_mean_List
                If (low–high pass filter = False) then  // the value is an outlier or missing
                    Replace AR of D with M  // substitute the attribute's mean value
                End if
            End for
        End for
        Update the current source ID's data units with data_units_List  // replace the corrupted data units with the cleansed ones
    End for
    Initialize population_counter = 0
    Initialize sample_size = X
    For each key in HC  // count the data units over all source IDs
        Add the count of data units of the current source ID to population_counter
    End for
    For each key in HC  // sample each group and build the attributes mean vector (the state estimator)
        Initialize data_units_List with the current source ID's data units
        Initialize MV as an empty ArrayList  // the attributes mean vector
        For each attribute 'AR' in data_units_List
            Calculate AM using all data units of the current source ID  // the attribute's mean
            Add AM to MV
        End for
        Calculate P1 using population_counter and the data units of the current source ID
        Calculate P2 using sample_size and the data units of the current source ID
        Calculate the sample weight using P1 and P2 of the current source ID
        Apply sampling with PPS using the current sample weight and the data units of the current source ID  // reduce the data units of the current source ID
    End for
    Return HC as a list of tuples (source ID, sampled data units, MV)  // key: source ID; values: sampled data units and the attributes mean vector
End
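The first four stages of Algorithm 1 (validate and filter, group by source ID, compute per-attribute means, replace bad values with those means) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `DataUnit`, `passes_filter`, `manage_features`, and the `bounds` parameter are hypothetical names; the source validator and freshness handler are passed in as caller-supplied predicates (`is_valid_source`, `is_fresh`) because the algorithm treats them as black boxes; and the "low–high pass filter" is assumed to be a fixed `[low, high]` range per attribute. The guard against a zero count in the mean is an addition of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DataUnit:
    # Hypothetical data-unit shape: a source ID plus attribute values (None = missing).
    source_id: str
    values: Dict[str, Optional[float]]


def passes_filter(value: Optional[float], bounds: Tuple[float, float]) -> bool:
    """Assumed low-high pass filter: True if the value is present and inside [low, high]."""
    low, high = bounds
    return value is not None and low <= value <= high


def manage_features(
    units: List[DataUnit],
    is_valid_source: Callable[[DataUnit], bool],  # stand-in for the IoT Data Source Validator
    is_fresh: Callable[[DataUnit], bool],  # stand-in for the Quality and Freshness Handler
    bounds: Dict[str, Tuple[float, float]],  # assumed filter range for every attribute
) -> Dict[str, List[DataUnit]]:
    # Keep only validated, fresh units and group them by source ID (the HashMap HC).
    groups: Dict[str, List[DataUnit]] = defaultdict(list)
    for d in units:
        if is_valid_source(d) and is_fresh(d):
            groups[d.source_id].append(d)

    cleaned: Dict[str, List[DataUnit]] = {}
    for source_id, group in groups.items():
        # Per-attribute mean over the values that pass the filter.
        means: Dict[str, Optional[float]] = {}
        for attr, attr_bounds in bounds.items():
            accepted = [d.values.get(attr) for d in group
                        if passes_filter(d.values.get(attr), attr_bounds)]
            # If no value passes, the mean is undefined and left as None.
            means[attr] = sum(accepted) / len(accepted) if accepted else None

        # Replace missing or outlier values with the attribute's mean.
        new_group = []
        for d in group:
            fixed = {}
            for attr, attr_bounds in bounds.items():
                v = d.values.get(attr)
                fixed[attr] = v if passes_filter(v, attr_bounds) else means[attr]
            new_group.append(DataUnit(d.source_id, fixed))
        cleaned[source_id] = new_group
    return cleaned
```

For example, with `bounds={"temp": (-40.0, 60.0)}`, a source whose readings are `20.0`, `999.0` (outlier), and `None` (missing) ends up with three readings of `20.0`, because only `20.0` passes the filter and becomes that source's mean.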
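The final stage (population count, attributes mean vector MV, and size-proportional sampling per source ID) does not specify how P1, P2, and the sample weight are computed, so the sketch below fills them in with one plausible reading, clearly an assumption: P1 is the source's share of the population (`n_i / population_counter`), the source is allocated `round(sample_size * P1)` units (at least one, at most `n_i`), P2 is the resulting within-source inclusion probability (`allocated / n_i`), and the sample weight is its inverse. This is proportional allocation across source IDs followed by simple random sampling within each source, which is a stand-in for the paper's PPS step rather than a reproduction of it. The function name `sample_sources` and the `seed` parameter are hypothetical; cleaned data units are represented as plain attribute dictionaries.

```python
import random
from typing import Dict, List, Tuple

Unit = Dict[str, float]  # one cleaned data unit: attribute name -> value


def sample_sources(
    hc: Dict[str, List[Unit]], sample_size: int, seed: int = 0
) -> Dict[str, Tuple[List[Unit], Dict[str, float], float]]:
    rng = random.Random(seed)  # seeded for reproducible samples
    # population_counter: total number of data units over all source IDs.
    population_counter = sum(len(units) for units in hc.values())

    result: Dict[str, Tuple[List[Unit], Dict[str, float], float]] = {}
    for source_id, units in hc.items():
        n_i = len(units)
        if n_i == 0:
            continue  # nothing to sample for an empty group
        # MV: mean of every attribute over all (cleaned) units of this source.
        mv = {a: sum(u[a] for u in units) / n_i for a in units[0]}
        # Assumed P1: this source's share of the population.
        p1 = n_i / population_counter
        # Size-proportional allocation, clamped to [1, n_i] (an assumption).
        allocated = min(n_i, max(1, round(sample_size * p1)))
        # Assumed P2: within-source inclusion probability; weight is its inverse.
        p2 = allocated / n_i
        weight = 1.0 / p2
        sampled = rng.sample(units, allocated)  # without replacement
        result[source_id] = (sampled, mv, weight)
    return result
```

With 80 units from one source, 20 from another, and `sample_size=10`, the allocation is 8 and 2 units, and every sampled unit carries a weight of 10.0 (each stands for ten units of the population). Under this reading the weights are equal whenever rounding is exact; a strict PPS design that selects sources with probability proportional to size would give different weights.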