Statistical Analysis of Automatic Seed Word Acquisition to Improve Harmful Expression Extraction in Cyberbullying Detection


  • Suzuha Hatakeyama
  • Fumito Masui
  • Michal Ptaszynski
  • Kazuhide Yamamoto


cyberbullying, information extraction, text mining, seed word, SO-PMI-IR


We study the social problem of cyberbullying, a new form of bullying that takes place on the Internet. This paper proposes a method for automatically acquiring seed words to improve the performance of the cyberbullying detection method originally proposed by Nitta et al. [1]. Repeating the original experiment under exactly the same settings, we find that the method, which is based on a Web mining technique, has lost over 30 percentage points of performance since it was proposed in 2013. We therefore hypothesize about the reasons for this decrease and propose a number of improvements, from which we experimentally select the best one. Furthermore, we collect several seed word sets using different approaches and evaluate their precision. We find that the influential factor in the extraction of harmful expressions is not the number of seed words, but the way the seed words are collected and filtered.


How to Cite

S. Hatakeyama, F. Masui, M. Ptaszynski, and K. Yamamoto, “Statistical Analysis of Automatic Seed Word Acquisition to Improve Harmful Expression Extraction in Cyberbullying Detection,” Int. J. Eng. Technol. Innov., vol. 6, no. 2, pp. 165–172, Apr. 2016.