Hate Speech Classification Using a 5-Fold Ensemble with Weighted Probability Averaging on the Twitter-RoBERTa Architecture


  • Ibra Sahrian Alsa (*) Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Surya Agustian Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Fitra Kurnia Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Pizaini Pizaini Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • Siska Kurnia Gusti Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
  • (*) Corresponding Author
Keywords: Hate Speech Detection; Twitter-RoBERTa; 5-Fold Ensemble; Weighted Probability Averaging; Bayesian Optimization

Abstract

Social media platforms have become a major channel for spreading hate speech at scale: more than 66.8 million user reports of hateful conduct were recorded on platform X in the first half of 2024 alone. This study proposes an end-to-end NLP pipeline for automated hate speech classification based on the domain-adapted Twitter-RoBERTa architecture, evaluated on the English HASOC (Hate Speech and Offensive Content Identification) datasets from 2020 and 2021. The core challenge addressed is the instability of Transformer fine-tuning on relatively small annotated corpora, which stems from strong sensitivity to the random seed used for initialization and to suboptimal hyperparameter configurations. The pipeline integrates three methodological components: (1) Bayesian optimization with the Optuna framework, running 15 trials of automated, adaptive hyperparameter search; (2) stratified 5-fold cross-validation for robust, reproducible data partitioning; and (3) Weighted Probability Averaging (WPA) as the strategy for aggregating the fold models into an ensemble. The proposed architecture achieves a Macro F1-score of 80.99% on Subtask 1A and 64.70% on Subtask 1B, placing it competitively among the 65 international research teams on the official HASOC 2021 leaderboard.
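The abstract does not spell out how the five fold models are combined, so the following is only a minimal sketch, assuming NumPy, of one plausible reading of the pipeline's ensembling stage. It assumes that each fold model's weight is proportional to its validation Macro F1-score; the paper's actual weighting rule may differ. The function names (`stratified_kfold_indices`, `macro_f1`, `weighted_probability_average`) are hypothetical, and the Twitter-RoBERTa fine-tuning and Optuna search are omitted: `fold_probs` stands in for the softmax outputs of the fine-tuned fold models on the test set.

```python
import numpy as np


def stratified_kfold_indices(labels, n_splits=5, seed=42):
    """Split sample indices into n_splits folds that preserve class ratios.

    Samples of each class are shuffled and dealt round-robin across folds,
    so every validation fold gets roughly the same share of every class.
    Returns a list of (train_idx, val_idx) pairs, one per fold.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return [(np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k))
            for k in range(n_splits)]


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 (a class with no true or predicted samples scores 0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def weighted_probability_average(fold_probs, fold_scores):
    """Average per-fold class probabilities with weights proportional to fold scores.

    fold_probs: shape (n_folds, n_samples, n_classes), softmax outputs per fold model.
    fold_scores: shape (n_folds,), e.g. each fold model's validation Macro F1 (assumed).
    Returns the weighted probabilities and the argmax class per sample.
    """
    probs = np.asarray(fold_probs, dtype=float)
    weights = np.asarray(fold_scores, dtype=float)
    weights = weights / weights.sum()           # normalize so the weights sum to 1
    avg = np.tensordot(weights, probs, axes=1)  # sum_k w_k * p_k -> (n_samples, n_classes)
    return avg, avg.argmax(axis=1)


# Toy, hand-made probabilities: 2 fold models x 3 samples x 2 classes (illustration only).
toy_probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]],
    [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]],
]
avg, pred = weighted_probability_average(toy_probs, fold_scores=[0.75, 0.25])
print(pred)  # → [0 1 1]
```

In this toy case the weighting matters: with equal weights the second sample would average to 0.55 for class 0 and flip to class 0, whereas the better-scoring first model pulls it to class 1 (0.525). Weighting by validation score lets stronger folds dominate without discarding the weaker ones, which is the intuition behind using WPA rather than a plain mean to damp seed-to-seed variance.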

Article History
Published: 2026-06-22
How to Cite
Alsa, I., Agustian, S., Kurnia, F., Pizaini, P., & Gusti, S. (2026). Klasifikasi Ujaran Kebencian Menggunakan 5-Fold Ensemble dengan Weighted Probability Averaging pada Arsitektur Twitter-RoBERTa [Hate Speech Classification Using a 5-Fold Ensemble with Weighted Probability Averaging on the Twitter-RoBERTa Architecture]. Bulletin of Data Science, 5(3), 199-208. https://doi.org/10.47065/bulletinds.v5i3.10145
Issue: Vol. 5 No. 3 (2026)
Section: Articles