2026 Mar 25;26(7):2055. doi: 10.3390/s26072055
Algorithm 1: Dataset splitting based on geographic region hashing (hash bucketing)
Input: region: string   // Geographic region name
Output: subset: {train, val, test}
 
// Step 1: Generate normalized hash value
hash_str ← MD5(region)                  // Hexadecimal MD5 digest of the region name
hash_hex ← Substring(hash_str, 0, 8)    // First 8 hexadecimal characters (32 bits)
hash_int ← HexToInt(hash_hex)           // Convert hexadecimal substring to integer
hash_mod ← hash_int MOD 10000           // Reduce to an integer in [0, 9999]
hash_ratio ← hash_mod / 10000.0         // Normalize to [0, 1)
 
// Step 2: Assign subset based on ratio
if hash_ratio < 0.8 then
    subset ← "train"                    // About 80% of regions: training set
else if hash_ratio < 0.9 then
    subset ← "val"                      // About 10% of regions: validation set
else
    subset ← "test"                     // About 10% of regions: test set
end if
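
Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names region_hash_ratio and assign_subset are hypothetical, and the UTF-8 encoding of the region name is an assumption, since the algorithm does not state how the string is converted to bytes before hashing.

```python
import hashlib


def region_hash_ratio(region: str) -> float:
    """Map a region name to a deterministic value in [0, 1)."""
    # Assumption: the region name is UTF-8 encoded before hashing.
    hash_str = hashlib.md5(region.encode("utf-8")).hexdigest()
    hash_hex = hash_str[:8]            # first 8 hexadecimal characters
    hash_int = int(hash_hex, 16)       # hexadecimal substring -> integer
    hash_mod = hash_int % 10_000       # integer in [0, 9999]
    return hash_mod / 10_000.0         # normalized to [0, 1)


def assign_subset(region: str) -> str:
    """Return 'train', 'val', or 'test' for a region (hash bucketing)."""
    ratio = region_hash_ratio(region)
    if ratio < 0.8:
        return "train"
    if ratio < 0.9:
        return "val"
    return "test"


if __name__ == "__main__":
    # Hypothetical region names, used only to show the call pattern.
    for name in ["Region-A", "Region-B", "Region-C"]:
        print(name, assign_subset(name))
```

Because the subset depends only on the region name, every sample from a given region lands in the same subset, and the assignment is reproducible across runs and machines. The 80/10/10 proportions hold only approximately: they apply to regions rather than samples, so a small number of regions or uneven sample counts per region can shift the realized split.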