TY - GEN
T1 - ProperBERT - Proactive Recognition of Offensive Phrasing for Effective Regulation
AU - Diesenreiter, Clara
AU - Krauss, Oliver
AU - Sandler, Simone
AU - Stöckl, Andreas
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - This work discusses and contains content that may be offensive or unsettling. Hateful communication has always been part of human interaction, even before the advent of social media. Nowadays, offensive content spreads faster and wider through digital communication channels. To help improve the regulation of hate speech, we introduce ProperBERT, a fine-tuned BERT model for detecting hate speech and offensive language in English. To ensure the portability of our model, five data sets from the literature were combined to train ProperBERT. The pooled data set contains racist, homophobic, misogynistic, and generally offensive statements. Because these statements vary widely, differing mainly in the target of the hate and in how overtly it is expressed, a sufficiently robust model could be trained. ProperBERT remains stable on data sets that were not used for training, while staying efficient to use thanks to its compact size. Portability tests on data sets not used for fine-tuning show that fine-tuning on large-scale and varied data increases model portability.
AB - This work discusses and contains content that may be offensive or unsettling. Hateful communication has always been part of human interaction, even before the advent of social media. Nowadays, offensive content spreads faster and wider through digital communication channels. To help improve the regulation of hate speech, we introduce ProperBERT, a fine-tuned BERT model for detecting hate speech and offensive language in English. To ensure the portability of our model, five data sets from the literature were combined to train ProperBERT. The pooled data set contains racist, homophobic, misogynistic, and generally offensive statements. Because these statements vary widely, differing mainly in the target of the hate and in how overtly it is expressed, a sufficiently robust model could be trained. ProperBERT remains stable on data sets that were not used for training, while staying efficient to use thanks to its compact size. Portability tests on data sets not used for fine-tuning show that fine-tuning on large-scale and varied data increases model portability.
KW - Training
KW - Mechatronics
KW - Social networking (online)
KW - Computational modeling
KW - Hate speech
KW - Speech recognition
KW - Digital communication
KW - BERT
KW - hate speech detection
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85146440229&partnerID=8YFLogxK
U2 - 10.1109/ICECCME55909.2022.9987933
DO - 10.1109/ICECCME55909.2022.9987933
M3 - Conference contribution
SN - 978-1-6654-7096-4
T3 - International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2022
SP - 1
EP - 6
BT - International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2022
Y2 - 16 November 2022 through 18 November 2022
ER -