Lexicon Enriched Hybrid Hate Speech Detection with Human-Centered Explanations

Abstract

Hate messages on the web are, unfortunately, continuously expanding and evolving. Although the large companies that offer social network services explicitly prohibit hate messages in their terms of service, such messages are still produced at a very high rate. Moderators are therefore often employed to monitor these platforms and to use their critical judgment to decide whether content is offensive. This moderation process is complex and costly in terms of human resources. In this work we propose a system that supports moderators by providing them with a set of candidate items to censor, each accompanied by an explanation in natural language. It is then up to the human operator to decide whether to proceed with the censorship and, optionally, to give feedback on the classification result, which extends the algorithm's set of examples and improves its future performance. The proposed system is designed to merge information coming from the data, syntactic tags, and a manually annotated lexicon. The messages are then processed by deep learning approaches based on both transformer and deep neural network architectures. Each output is accompanied by an explanation in a human-like form. The model has been evaluated on three state-of-the-art datasets, where it proved highly effective and produced clear, understandable explanations.

Publication
Adjunct Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization