A Fake Profile Detection Model Using Multistage Stacked Ensemble Classification

Swetha Chikkasabbenahalli Venkatesh; Sibi Shaji; Balasubramanian Meenakshi Sundaram

doi:10.46604/peti.2024.13200

Authors

Swetha Chikkasabbenahalli Venkatesh, School of Computational Sciences & IT, Garden City University, Bangalore, India
Sibi Shaji, School of Computational Sciences & IT, Garden City University, Bangalore, India
Balasubramanian Meenakshi Sundaram, Department of Computer Science & Engineering, New Horizon College of Engineering, Bangalore, India

Keywords

fake profile, online social networks, stacked ensemble, imbalanced dataset, cost-sensitive learning

Abstract

Fake profile identification on social media platforms is essential for preserving a reliable online community. Previous studies have primarily used conventional classifiers to identify fake accounts on social networking sites, largely neglecting feature selection and class balancing as means of improving performance. This study introduces a novel multistage stacked ensemble classification model to improve fake profile detection accuracy, especially on imbalanced datasets. The model comprises three phases: feature selection, base learning, and meta-learning for classification. The novelty of the work lies in combining chi-squared feature-class association-based feature selection with stacked ensemble and cost-sensitive learning. The findings indicate that the proposed model significantly improves fake profile detection. With cost-sensitive learning, the model reaches 95%, 98.20%, and 81% precision on the Facebook, Instagram, and Twitter spam datasets, respectively, outperforming conventional and advanced classifiers. These results suggest that, compared with existing models, the proposed model has the potential to enhance the security and reliability of online social networks.
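The three phases named in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the abstract does not name the base learners, the meta-learner, the number of selected features, or the misclassification costs, so every such choice below is an assumption made for illustration. In particular, scikit-learn's plain `chi2` scoring stands in for the chi-squared feature-class association model, class weights stand in for the cost-sensitive component, and the data are synthetic rather than any of the Facebook, Instagram, or Twitter datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def build_model(k_features=8, fake_cost=5.0):
    """Assemble a three-phase pipeline; all hyperparameters are hypothetical."""
    # Cost-sensitive learning approximated with class weights:
    # errors on the minority "fake" class (label 1) are penalized more heavily.
    class_weight = {0: 1.0, 1: fake_cost}

    # Phase 2: base learners (assumed choices, not taken from the paper).
    base_learners = [
        ("tree", DecisionTreeClassifier(max_depth=5, class_weight=class_weight,
                                        random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=50, class_weight=class_weight,
                                          random_state=0)),
    ]

    # Phase 3: the meta-learner is trained on out-of-fold base predictions.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(class_weight=class_weight, max_iter=1000),
        cv=5,
    )

    return Pipeline([
        ("scale", MinMaxScaler()),                    # chi2 needs non-negative inputs
        ("select", SelectKBest(chi2, k=k_features)),  # Phase 1: feature selection
        ("stack", stack),
    ])


# Synthetic imbalanced data: roughly 10% of the samples carry label 1 ("fake").
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

model = build_model()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("precision (fake class):", precision_score(y_test, y_pred, zero_division=0))
```

Keeping feature selection inside the pipeline means the chi-squared scores are computed from the training split only, so the held-out precision is not influenced by test labels. Raising `fake_cost` would typically trade precision for recall on the fake class; the value 5.0 is arbitrary.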
