Abstract
Social networks such as Twitter and Facebook have recently become the most widely used communication platforms for rapidly propagating information. This fast diffusion of information creates accuracy and scalability challenges for topic detection. Most existing approaches can detect the most popular topics at large scale; however, they are not effective when topics must be detected quickly. This article proposes a novel topic detection approach, the Node Significance based Label Propagation Community Detection (NSLPCD) algorithm, which detects topics faster without compromising accuracy. The proposed method analyzes the frequency distribution of keywords in a collection of tweets and identifies two types of keywords that play an important role in topic detection: topic-identifying keywords and topic-describing keywords. A keyword co-occurrence graph is built from these keywords, and the NSLPCD algorithm is then applied to it to obtain topic clusters in the form of communities. Experimental results on real Twitter data show that the proposed method is effective in terms of both quality and run-time performance compared with other existing methods.
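The abstract outlines a two-stage pipeline: build a keyword co-occurrence graph from tweets, then run label-propagation community detection on it so that each community is a topic. The sketch below illustrates that pipeline under stated assumptions and is not the authors' NSLPCD implementation. It takes the keyword set as given (the topic-identifying versus topic-describing split is not modelled), tokenizes by plain whitespace splitting, and approximates "node significance" by weighted degree. The function names `build_cooccurrence_graph` and `significance_label_propagation`, and the example tweets, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(tweets, keywords):
    """Hypothetical helper: weighted, undirected keyword co-occurrence graph.

    Edge weight = number of tweets in which both keywords appear.
    Tokenization is naive whitespace splitting (an assumption).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        present = sorted({tok for tok in tweet.lower().split() if tok in keywords})
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def significance_label_propagation(graph, max_iter=100):
    """Hypothetical label-propagation variant, not the authors' NSLPCD code.

    Nodes are visited in descending order of an ASSUMED significance score
    (weighted degree). Returns a list of keyword sets (one per topic).
    """
    significance = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    # Most significant nodes first; ties broken alphabetically for determinism.
    order = sorted(graph, key=lambda n: (-significance[n], n))
    labels = {n: n for n in graph}  # every keyword starts in its own community

    for _ in range(max_iter):
        changed = False
        for node in order:
            # Sum incident edge weights per neighbouring label.
            scores = defaultdict(int)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            candidates = [lab for lab, s in scores.items() if s == best]
            if labels[node] in candidates:
                continue  # keep the current label when it is already a best choice
            # Tie-break toward the label that originated at the most significant node.
            labels[node] = min(candidates, key=lambda lab: (-significance[lab], lab))
            changed = True
        if not changed:
            break  # no label changed in a full pass: converged

    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return list(communities.values())


if __name__ == "__main__":
    tweets = [
        "earthquake tokyo magnitude",
        "tokyo earthquake tsunami",
        "earthquake tsunami warning",
        "election vote results",
        "vote election candidate",
        "election results candidate",
    ]
    keywords = {"earthquake", "tokyo", "magnitude", "tsunami", "warning",
                "election", "vote", "results", "candidate"}
    topics = significance_label_propagation(build_cooccurrence_graph(tweets, keywords))
    print(sorted(sorted(c) for c in topics))
    # → [['candidate', 'election', 'results', 'vote'], ['earthquake', 'magnitude', 'tokyo', 'tsunami', 'warning']]
```

Using a fixed, significance-ordered update sequence makes the output deterministic, whereas standard label propagation typically shuffles the update order on every pass. Keeping a node's current label whenever it is already among the best-scoring labels helps avoid oscillation between equally good labels.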
Keywords: Tweet clustering, Supervised and unsupervised techniques, Label propagation, Keyword co-occurrence, Topic modeling
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jagrati Singh, Email: singh.jagriti5@gmail.com.
Anil Kumar Singh, Email: ak@mnnit.ac.in.
