Journal of Computer Science and Technology. 2022 Sep 30;37(5):1026–1048. doi: 10.1007/s11390-022-2409-x

Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing

She Sun¹, Shuai Ma¹, Jing-He Song¹, Wen-Hai Yue¹, Xue-Lian Lin¹, Tiejun Ma²
PMCID: PMC9581755  PMID: 36281257

Abstract

With the advance of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, the volume of trajectory data is growing continuously. While large-scale trajectories create opportunities for many applications, the locations they contain pose a threat to individual privacy. Recently, there has been an interesting debate in the journal Science on the reidentifiability of individuals. The main finding of Sánchez et al. is exactly the opposite of that of De Montjoye et al., which raises the first question: “what is the true situation of the privacy preservation for trajectories in terms of reidentification?” Furthermore, anonymization is known to typically cause a decline in data utility, so anonymization mechanisms must consider the trade-off between privacy and utility. This raises the second question: “what is the true situation of the utility of anonymized trajectories?” To answer these two questions, we conduct a systematic experimental study using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluations and analyses of anonymized trajectories, both on individual privacy in terms of unicity and on utility in terms of practical applications.
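As a rough illustration of the unicity measure mentioned above, the following Python sketch follows the random-point formulation popularized by De Montjoye et al.: draw p points from a user's own trajectory and count the user as unique if no other trajectory contains all of them. The study's exact experimental protocol (how points are sampled, the spatial and temporal resolution) may differ. The names `unicity` and `grid_anonymize`, the integer (x, y, t) point representation, and the three-user toy dataset are all hypothetical; `grid_anonymize` is only a simplified stand-in for grid-based anonymization that snaps coordinates to square cells and timestamps to fixed slots.

```python
import random
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int, int]        # hypothetical (x, y, t) with integer coordinates and seconds
Dataset = Dict[str, List[Point]]    # user id -> trajectory


def unicity(trajs: Dataset, p: int, seed: int = 0) -> float:
    """Fraction of users singled out by p of their own points drawn at random.

    A user counts as unique if no other trajectory contains all p probe points.
    Only users with at least p distinct points are evaluated.
    """
    rng = random.Random(seed)
    point_sets: Dict[str, Set[Point]] = {u: set(t) for u, t in trajs.items()}
    eligible = [u for u, s in point_sets.items() if len(s) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for u in eligible:
        # Sort before sampling so the result is reproducible for a fixed seed.
        probe = set(rng.sample(sorted(point_sets[u]), p))
        matches = sum(1 for s in point_sets.values() if probe <= s)
        if matches == 1:  # only the user's own trajectory contains the probe
            unique += 1
    return unique / len(eligible)


def grid_anonymize(trajs: Dataset, cell: int, slot: int) -> Dataset:
    """Hypothetical grid coarsening: map x, y to `cell`-sized cells and t to `slot`-long slots."""
    return {
        u: [(x // cell, y // cell, t // slot) for x, y, t in points]
        for u, points in trajs.items()
    }


trajs: Dataset = {
    "alice": [(0, 0, 0), (10, 10, 60), (20, 20, 120)],
    "bob":   [(1, 1, 0), (11, 11, 60), (40, 40, 120)],
    "carol": [(30, 30, 0), (35, 35, 60), (39, 39, 120)],
}

print(unicity(trajs, p=1))                                      # 1.0
print(unicity(grid_anonymize(trajs, cell=50, slot=3600), p=1))  # 0.0
```

On the toy data every user is unique from a single raw point, while coarsening to 50-unit cells and one-hour slots maps every point to the same cell, so no one is singled out. The same coarsening also blurs the locations that applications such as travel time estimation or window range queries rely on, which is the privacy-utility trade-off the study examines.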

Supplementary Information

The online version contains supplementary material available at 10.1007/s11390-022-2409-x.

Keywords: anonymization, privacy, reidentification, trajectory, utility


Footnotes

Shuai Ma contributed the key ideas and Xue-Lian Lin was in charge of the experiments.

Contributor Information

She Sun, Email: sunshe@buaa.edu.cn.

Shuai Ma, Email: mashuai@buaa.edu.cn.

Jing-He Song, Email: songjh@buaa.edu.cn.

Wen-Hai Yue, Email: yuewh@buaa.edu.cn.

Xue-Lian Lin, Email: linxl@buaa.edu.cn.

Tiejun Ma, Email: tiejun.ma@ed.ac.uk.

References

  • [1] De Montjoye Y A, Hidalgo C A, Verleysen M, Blondel V D. Unique in the crowd: The privacy bounds of human mobility. Scientific Reports, 2013, 3(6): Article No. 1376. 10.1038/srep01376.
  • [2] De Montjoye Y A, Radaelli L, Singh V K, Pentland A S. Unique in the shopping mall: On the reidentifiability of credit card metadata. Science, 2015, 347(6221): 536-539. 10.1126/science.12562.
  • [3] De Montjoye Y A, Pentland A S. Response to comment on “unique in the shopping mall: On the reidentifiability of credit card metadata”. Science, 2016, 351(6279): 1274. 10.1126/science.aaf15.
  • [4] Rocher L, Hendrickx J M, De Montjoye Y A. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 2019, 10(1): Article No. 3069. 10.1038/s41467-019-10933-3.
  • [5] Lin X, Ma S, Zhang H, Wo T, Huai J. One-pass error bounded trajectory simplification. Proceedings of the VLDB Endowment, 2017, 10(7): 841-852. 10.14778/3067421.3067432.
  • [6] Lin X, Jiang J, Ma S, Zuo Y, Hu C. One-pass trajectory simplification using the synchronous Euclidean distance. The VLDB Journal, 2019, 28(6): 897-921. 10.1007/s00778-019-00575-8.
  • [7] Lin X, Ma S, Jiang J, Hou Y, Wo T. Error bounded line simplification algorithms for trajectory compression: An experimental evaluation. ACM Trans. Database Syst., 2021, 46(3): Article No. 11. 10.1145/3474373.
  • [8] Zaeem R N, Barber K S. The effect of the GDPR on privacy policies: Recent progress and future promise. ACM Trans. Manag. Inf. Syst., 2021, 12(1): Article No. 2. 10.1145/3389685.
  • [9] Wicker S B. The loss of location privacy in the cellular age. Communications of the ACM, 2012, 55(8): 60-68. 10.1145/2240236.2240255.
  • [10] Abul O, Bonchi F, Nanni M. Never walk alone: Uncertainty for anonymity in moving objects databases. In Proc. the 24th IEEE International Conference on Data Engineering, April 2008, pp.376-385. 10.1109/ICDE.2008.4497446.
  • [11] Fung B C M, Wang K, Chen R, Yu P S. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv., 2010, 42(4): Article No. 14. 10.1145/1749603.1749605.
  • [12] Chow C, Mokbel M F. Privacy of spatial trajectories. In Computing with Spatial Trajectories, Zheng Y, Zhou X (eds.), Springer, 2011, pp.109-141. 10.1007/978-1-4614-1629-6_4.
  • [13] Schwartz P M, Solove D J. Reconciling personal information in the United States and European Union. California Law Review, 2014, 102(4): 877-916. 10.2139/ssrn.2271442.
  • [14] Gidófalvi G, Huang X, Pedersen T B. Privacy-preserving data mining on moving object trajectories. In Proc. the 2007 International Conference on Mobile Data Management, May 2007, pp.60-68. 10.1109/MDM.2007.18.
  • [15] Kido H, Yanagisawa Y, Satoh T. An anonymous communication technique using dummies for location-based services. In Proc. the 2005 International Conference on Pervasive Services, July 2005, pp.88-97. 10.1109/PERSER.2005.1506394.
  • [16] Sweeney L. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002, 10(5): 557-570. 10.1142/S0218488502001648.
  • [17] Zhao K, Tu Z, Xu F, Li Y, Zhang P, Pei D, Su L, Jin D. Walking without friends: Publishing anonymized trajectory dataset without leaking social relationships. IEEE Transactions on Network and Service Management, 2019, 16(3): 1212-1225. 10.1109/TNSM.2019.2907542.
  • [18] Gursoy M E, Liu L, Truex S, Yu L. Differentially private and utility preserving publication of trajectory data. IEEE Transactions on Mobile Computing, 2019, 18(10): 2315-2329. 10.1109/TMC.2018.2874008.
  • [19] He X, Cormode G, Machanavajjhala A, Procopiuc C M, Srivastava D. DPT: Differentially private trajectory synthesis using hierarchical reference systems. Proceedings of the VLDB Endowment, 2015, 8(11): 1154-1165. 10.14778/2809974.2809978.
  • [20] Andrés M E, Bordenabe N E, Chatzikokolakis K, Palamidessi C. Geo-indistinguishability: Differential privacy for location-based systems. In Proc. the 2013 ACM SIGSAC Conference on Computer and Communications Security, Nov. 2013, pp.901-914. 10.1145/2508859.2516735.
  • [21] Sánchez D, Martínez S, Domingo-Ferrer J. Comment on “Unique in the shopping mall: On the reidentifiability of credit card metadata”. Science, 2016, 351(6279): 1274. 10.1126/science.aad9295.
  • [22] Xiao Z, Wang C, Han W, Jiang C. Unique on the road: Reidentification of vehicular location-based metadata. In Proc. the 12th International Conference on Security and Privacy in Communication Networks, Oct. 2016, pp.496-513. 10.1007/978-3-319-59608-2_28.
  • [23] Chatzikokolakis K, ElSalamouny E, Palamidessi C, Pazii A. Methods for location privacy: A comparative overview. Found. Trends Priv. Secur., 2017, 1(4): 199-257. 10.1561/3300000017.
  • [24] Henriksen-Bulmer J, Jeary S. Re-identification attacks—A systematic literature review. Int. J. Inf. Manag., 2016, 36(6): 1184-1192. 10.1016/j.ijinfomgt.2016.08.002.
  • [25] Wagner I, Eckhoff D. Technical privacy metrics: A systematic survey. ACM Comput. Surv., 2018, 51(3): Article No. 57. 10.1145/3168389.
  • [26] Primault V, Boutet A, Mokhtar S B, Brunie L. The long road to computational location privacy: A survey. IEEE Commun. Surv. Tutorials, 2019, 21(3): 2772-2793. 10.1109/COMST.2018.2873950.
  • [27] Peters F, Menzies T, Gong L, Zhang H. Balancing privacy and utility in cross-company defect prediction. IEEE Trans. Software Eng., 2013, 39(8): 1054-1068. 10.1109/TSE.2013.6.
  • [28] Xu J, Wang W, Pei J, Wang X, Shi B, Fu A W. Utility-based anonymization using local recoding. In Proc. the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2006, pp.785-790. 10.1145/1150402.1150504.
  • [29] Bayardo R J, Agrawal R. Data privacy through optimal k-anonymization. In Proc. the 21st International Conference on Data Engineering, April 2005, pp.217-228. 10.1109/ICDE.2005.42.
  • [30] Peters F, Menzies T. Privacy and utility for defect prediction: Experiments with MORPH. In Proc. the 34th International Conference on Software Engineering, June 2012, pp.189-199. 10.1109/ICSE.2012.6227194.
  • [31] Hua J, Gao Y, Zhong S. Differentially private publication of general time-serial trajectory data. In Proc. the 2015 IEEE Conference on Computer Communications, April 26-May 1, 2015, pp.549-557. 10.1109/INFOCOM.2015.7218422.
  • [32] Cunha M, Mendes R, Vilela J P. A survey of privacy-preserving mechanisms for heterogeneous data types. Computer Science Review, 2021, 41: Article No. 100403. 10.1016/j.cosrev.2021.100403.
  • [33] Casas-Roma J. DUEF-GA: Data utility and privacy evaluation framework for graph anonymization. International Journal of Information Security, 2020, 19(4): 465-478. 10.1007/s10207-019-00469-4.
  • [34] Ni C, Cang L S, Gope P, Min G. Data anonymization evaluation for big data and IoT environment. Information Sciences, 2022, 605: 381-392. 10.1016/j.ins.2022.05.040.
  • [35] You T, Peng W, Lee W. Protecting moving trajectories with dummies. In Proc. the 2007 International Conference on Mobile Data Management, May 2007, pp.278-282. 10.1109/MDM.2007.58.
  • [36] Wang Y, Zheng Y, Xue Y. Travel time estimation of a path using sparse trajectories. In Proc. the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2014, pp.25-34. 10.1145/2623330.2623656.
  • [37] Wang H, Tang X, Kuo Y, Kifer D, Li Z. A simple baseline for travel time estimation using large-scale trip data. ACM Trans. Intell. Syst. Technol., 2019, 10(2): Article No. 19. 10.1145/3293317.
  • [38] Eldawy A, Alarabi L, Mokbel M F. Spatial partitioning techniques in spatial Hadoop. Proceedings of the VLDB Endowment, 2015, 8(12): 1602-1605. 10.14778/2824032.2824057.
  • [39] Yuan J, Zheng Y, Xie X. Discovering regions of different functions in a city using human mobility and POIs. In Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2012, pp.186-194. 10.1145/2339530.2339561.
  • [40] Jiang K, Shao D, Bressan S, Kister T, Tan K. Publishing trajectories with differential privacy guarantees. In Proc. the 25th International Conference on Scientific and Statistical Database Management, July 2013, Article No. 12. 10.1145/2484838.2484846.
  • [41] Nergiz M E, Atzori M, Saygin Y, Güç B. Towards trajectory anonymization: A generalization-based approach. Trans. Data Privacy, 2009, 2(1): 47-75.
  • [42] Zhang C, Han J, Shou L, Lu J, La Porta T. Splitter: Mining fine-grained sequential patterns in semantic trajectories. Proceedings of the VLDB Endowment, 2014, 7(9): 769-780. 10.14778/2732939.2732949.
  • [43] Li N, Li T, Venkatasubramanian S. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proc. the 23rd IEEE International Conference on Data Engineering, April 2007. 10.1109/ICDE.2007.367856.
  • [44] Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data, 2007, 1(1): Article No. 3. 10.1145/1217299.1217302.
  • [45] Abul O, Bonchi F, Nanni M. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010, 35(8): 884-910. 10.1016/j.is.2010.05.003.
  • [46] Trujillo-Rasua R, Domingo-Ferrer J. On the privacy offered by (k, δ)-anonymity. Information Systems, 2013, 38(4): 491-494. 10.1016/j.is.2012.12.003.
  • [47] Dwork C, McSherry F, Nissim K, Smith A D. Calibrating noise to sensitivity in private data analysis. In Proc. the 3rd Theory of Cryptography Conference, March 2006, pp.265-284. 10.1007/11681878_14.
  • [48] Dwork C, Roth A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 2014, 9(3/4): 211-407. 10.1561/0400000042.
  • [49] McSherry F, Talwar K. Mechanism design via differential privacy. In Proc. the 48th Annual IEEE Symposium on Foundations of Computer Science, Oct. 2007, pp.94-103. 10.1109/FOCS.2007.66.
  • [50] McSherry F. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proc. the ACM SIGMOD International Conference on Management of Data, June 29-July 2, 2009, pp.19-30. 10.1145/1559845.1559850.
  • [51] Chen R, Fung B C M, Desai B C. Differentially private trajectory data publication. arXiv:1112.2020, 2011. https://arxiv.org/abs/1112.2020, July 2022.
  • [52] Yao L, Chen Z, Hu H, Wu G, Wu B. Privacy preservation for trajectory publication based on differential privacy. ACM Trans. Intell. Syst. Technol., 2022, 13(3): Article No. 42. 10.1145/3474839.
  • [53] Yuan N J, Zheng Y, Zhang L, Xie X. T-finder: A recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(10): 2390-2403. 10.1109/TKDE.2012.153.
  • [54] Yuan J, Zheng Y, Xie X, Sun G. T-drive: Enhancing driving directions with taxi drivers’ intelligence. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(1): 220-232. 10.1109/TKDE.2011.200.
  • [55] Zhang D, Ding M, Yang D, Liu Y, Fan J, Shen H T. Trajectory simplification: An experimental study and quality analysis. Proceedings of the VLDB Endowment, 2018, 11(9): 934-946. 10.14778/3213880.3213885.
  • [56] Ali M E, Eusuf S S, Abdullah K, Choudhury F M, Culpepper J S, Sellis T. The maximum trajectory coverage query in spatial databases. Proceedings of the VLDB Endowment, 2018, 12(3): 197-209. 10.14778/3291264.3291266.
  • [57] Yuan H, Li G, Bao Z, Feng L. Effective travel time estimation: When historical trajectories over road networks matter. In Proc. the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, pp.2135-2149. 10.1145/3318464.3389771.
  • [58] Shah D, Kumaran A, Sen R, Kumaraguru P. Travel time estimation accuracy in developing regions: An empirical case study with Uber data in Delhi-NCR. In Proc. Companion of the 2019 World Wide Web Conference, May 2019, pp.130-136. 10.1145/3308560.3317057.
  • [59] Ma S, Zheng Y, Wolfson O. T-share: A large-scale dynamic taxi ridesharing service. In Proc. the 29th IEEE International Conference on Data Engineering, April 2013, pp.410-421. 10.1109/ICDE.2013.6544843.
  • [60] Wang Y, Lin X, Wei H, Wo T, Huang Z, Zhang Y, Xu J. A unified framework with multi-source data for predicting passenger demands of ride services. ACM Transactions on Knowledge Discovery from Data, 2019, 13(6): Article No. 56. 10.1145/3355563.
  • [61] Li Y, Fu K, Wang Z, Shahabi C, Ye J, Liu Y. Multi-task representation learning for travel time estimation. In Proc. the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Aug. 2018, pp.1695-1704. 10.1145/3219819.3220033.
  • [62] Fang X, Huang J, Wang F, Zeng L, Liang H, Wang H. ConSTGAT: Contextual spatial-temporal graph attention network for travel time estimation at Baidu maps. In Proc. the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Aug. 2020, pp.2697-2705. 10.1145/3394486.3403320.
  • [63] Wang L, Ma W, Fan Y, Zuo Z. Trip chain extraction using smartphone-collected trajectory data. Transportmetrica B: Transport Dynamics, 2019, 7(1): 255-274. 10.1080/21680566.2017.1386599.
  • [64] Newson P, Krumm J. Hidden Markov map matching through noise and sparseness. In Proc. the 17th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, Nov. 2009, pp.336-343. 10.1145/1653771.1653818.
  • [65] Cao H, Wolfson O, Trajcevski G. Spatiotemporal data reduction with deterministic error bounds. The VLDB Journal, 2006, 15(3): 211-228. 10.1007/s00778-005-0163-7.
  • [66] Gao Z, Zhai R, Wang P, Yan X, Qin H, Tang Y, Ramesh B. Synergizing appearance and motion with low rank representation for vehicle counting and traffic flow analysis. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(8): 2675-2685. 10.1109/TITS.2017.2757040.
  • [67] Zang H, Bolot J. Anonymization of location data does not work: A large-scale measurement study. In Proc. the 17th Annual International Conference on Mobile Computing and Networking, Sept. 2011, pp.145-156. 10.1145/2030613.2030630.
  • [68] Shokoohyar S, Sobhani A, Sobhani A. Impacts of trip characteristics and weather condition on ride-sourcing network: Evidence from Uber and Lyft. Research in Transportation Economics, 2020, 80: Article No. 100820. 10.1016/j.retrec.2020.100820.


Articles from Journal of Computer Science and Technology are provided here courtesy of Nature Publishing Group
