Abstract:
Social media has become an integral part of people's daily lives. It is an
effective way of sharing life experiences, special occasions, achievements and
other events with friends and family. Although it is a fruitful way to
communicate with groups, some people find themselves insulted or offended by
others involved in certain posts or conversations. These insults can be based
on racism, profanity or other vulgar or lewd language. Such cyberbullying needs
to be monitored and controlled by social media site owners, since it strongly
affects the size and safety of the active site membership.
Currently, social network sites have no automated process for identifying
offensive comments. Such comments can only be identified by humans who read
them and then flag or report them to the site owner, or block the offender.
Considering the massive volume of data generated on social media daily,
automatic detection of offensive statements is required to reduce insults
effectively. For this purpose, a text classification approach can be applied,
in which a given text is categorized as insulting or not by a previously
trained model.
In order to develop the model, data was collected from the popular data
repository www.kaggle.com. The dataset consists of comments posted on Facebook
and Twitter. First, the dataset was divided into a training set and a test set.
The collected data was then preprocessed by removing unwanted strings,
correcting misspelled words and eliminating duplicate records. Next, features
(keywords) capable of distinguishing a statement as ‘insulting’ were extracted
using an N-gram model and count-based methods. Feature selection was performed
using the Chi-squared test, and classification algorithms were finally applied
to separate insulting comments from non-insulting comments in a given dataset.
Machine learning algorithms such as Support Vector Machines (SVM), Naïve Bayes,
Logistic Regression and Random Forest were used for this purpose.
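To make the pipeline concrete, the following is a minimal sketch (not the
implementation developed in this work) of how the N-gram counting, Chi-squared
feature selection and SVM classification steps could be chained together with
scikit-learn; the inline comments and labels are illustrative assumptions
rather than the actual Kaggle data.

    # Minimal sketch of the described pipeline (illustrative, not the thesis code).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # Tiny illustrative comments; the real data comes from the Kaggle dataset.
    comments = ["you are an idiot", "have a great day",
                "what a stupid post", "nice work on the project"]
    labels = [1, 0, 1, 0]  # 1 = insulting, 0 = neutral

    model = Pipeline([
        # Count word unigrams and bigrams (the N-gram counting step).
        ("ngrams", CountVectorizer(ngram_range=(1, 2))),
        # Keep the features most associated with the label (Chi-squared test).
        ("chi2", SelectKBest(chi2, k=10)),
        # Linear two-class SVM classifier.
        ("svm", LinearSVC()),
    ])

    model.fit(comments, labels)
    print(model.predict(["what an idiot", "great post"]))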
Among these classifiers, SVM is expected to perform better than the others,
since this is a two-class classification problem in which each comment is
assigned to one of only two classes, ‘insulting’ or ‘neutral’. With an accurate
separation of comments into the ‘insulting’ and ‘neutral’ categories,
cyberbullying carried out through offensive comments posted on social media
sites can be detected.
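As a further illustration, the four classifiers named above could be compared
on the held-out test split in roughly the following way; this is a hedged
sketch, and the file name "train.csv" and the column names "Comment" and
"Insult" are assumptions about the Kaggle data rather than confirmed details
of this work.

    # Sketch of comparing the four classifiers on the held-out test set.
    # File name and column names below are assumptions about the Kaggle CSV.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    df = pd.read_csv("train.csv")  # assumed file containing the comments and labels
    X_train, X_test, y_train, y_test = train_test_split(
        df["Comment"], df["Insult"], test_size=0.2, random_state=42)

    classifiers = {
        "SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100),
    }

    for name, clf in classifiers.items():
        model = Pipeline([
            ("ngrams", CountVectorizer(ngram_range=(1, 2))),
            ("chi2", SelectKBest(chi2, k=5000)),
            ("clf", clf),
        ])
        model.fit(X_train, y_train)
        print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))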