Linguistics Colloquium: Detecting and Analyzing Cybercrime in Text-based Communication of Cybercriminal Networks Through Computational Linguistic and Psycholinguistic Feature Modeling

A talk by Alex V Mbaziira, a PhD candidate in Information Technology in the Volgenau School of Engineering at George Mason University.

Thursday, April 6, 2017 7:30 PM to 9:00 PM EDT
Research Hall, #163

Cybercriminals are increasingly using Internet-based text messaging applications to exploit their victims. Incidents of deceptive cybercrime in text-based communication are on the rise and include fraud, scams, and both favorable and unfavorable fake online reviews from e-commerce websites. I use a text-based deception detection approach to train models for detecting deceptive cybercrime in native and non-native English-speaking cybercriminal networks. The models use both computational linguistic and psycholinguistic features to study four types of deceptive text-based cybercrime: fraud, scams, and favorable and unfavorable fake online reviews. The data is obtained from three web genres: email, websites, and social media.

I build 1-dataset non-hybrid models as well as two types of hybrid models, 2-dataset and 3-dataset, for native and non-native English-speaking cybercriminal networks. All models are trained and tested with Naïve Bayes, Support Vector Machines, and k-Nearest Neighbors. Each 1-dataset non-hybrid model is trained on data from one web genre and then used to detect and analyze other types of cybercrime in web genres that are not part of the training set. Each 2-dataset hybrid model is trained on data combined from two web genres and then used to detect cybercrime in web genres that are not part of the training set. Finally, the 3-dataset hybrid models are trained on every triplet of datasets from the three web genres and used to detect and analyze cybercrime in the web genre that was not part of the training set.
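
The train-on-some, test-on-the-rest design described above can be sketched roughly as follows. This is only an illustration of how the 1-, 2-, and 3-dataset training combinations might be enumerated, not the speaker's actual code. It assumes one dataset per cybercrime type, which the abstract does not state explicitly; the dataset labels and the `train_test_plans` helper are hypothetical, and the classifiers themselves are left out.

```python
from itertools import combinations

# Hypothetical dataset labels (assumption: one dataset per cybercrime type).
DATASETS = ["fraud", "scams", "favorable_reviews", "unfavorable_reviews"]


def train_test_plans(datasets, k):
    """Pair every k-dataset training combination with its held-out test datasets."""
    plans = []
    for train in combinations(datasets, k):
        # Everything not used for training is held out for testing.
        test = tuple(d for d in datasets if d not in train)
        plans.append((train, test))
    return plans


# k = 1 gives the non-hybrid models; k = 2 and k = 3 give the hybrid models.
for k in (1, 2, 3):
    for train, test in train_test_plans(DATASETS, k):
        print(f"{k}-dataset model: train on {train}, test on {test}")
```

In a full experiment, each plan would be run once per classifier (Naïve Bayes, Support Vector Machines, k-Nearest Neighbors), with accuracy measured only on the held-out data.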

Performance of the models on test datasets ranges from 60% to 80% accuracy, with the best performance on detection of fraud and unfavorable online reviews. There were notable differences among the models in detecting and analyzing scams in both native and non-native English-speaking cybercriminal networks. This work can be applied in provider- or user-based filtering tools that identify cybercriminal actors and block or label messages before they reach their intended audience.
