Hate Speech Classification Using a 5-Fold Ensemble with Weighted Probability Averaging on the Twitter-RoBERTa Architecture
Abstract
Social media platforms have become a major channel for spreading hate speech at scale: more than 66.8 million user reports of hateful conduct were recorded on platform X during the first half of 2024 alone. This study proposes an end-to-end NLP pipeline for automated hate speech classification using the domain-adapted Twitter-RoBERTa architecture, evaluated on the HASOC (Hate Speech and Offensive Content Identification) English datasets from 2020 and 2021. The core challenge addressed is the instability of Transformer fine-tuning on relatively small annotated corpora, which stems from high sensitivity to random seed initialization and from suboptimal hyperparameter configurations. The pipeline integrates three techniques: (1) Bayesian Optimization via the Optuna framework for automated, adaptive hyperparameter search over 15 trials; (2) Stratified 5-Fold Cross-Validation for robust, reproducible data partitioning; and (3) Weighted Probability Averaging (WPA) as the ensemble aggregation strategy. The proposed architecture achieves a Macro F1-Score of 80.99% on Subtask 1A and 64.70% on Subtask 1B, which places it competitively among the 65 international research teams on the official HASOC 2021 leaderboard.
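As an illustration of the ensemble aggregation step, the following is a minimal sketch of Weighted Probability Averaging over the class probabilities produced by five fold models. The abstract does not state how the weights are chosen, so the sketch assumes each fold is weighted by a validation score such as its Macro F1; the function names `weighted_probability_average` and `predict_labels` and all numbers in the example are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_probability_average(
    fold_probs: Sequence[Sequence[Sequence[float]]],
    fold_weights: Sequence[float],
) -> List[List[float]]:
    """Combine per-fold class probabilities into one distribution per sample.

    fold_probs[k][i][c] is the probability of class c for sample i from
    the model trained on fold k. fold_weights[k] is that model's weight
    (ASSUMPTION: e.g. its validation Macro F1; the source does not say).
    """
    if len(fold_probs) != len(fold_weights):
        raise ValueError("need exactly one weight per fold model")
    total = sum(fold_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize the weights so the result is still a probability distribution.
    norm = [w / total for w in fold_weights]

    n_samples = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    averaged = [[0.0] * n_classes for _ in range(n_samples)]

    for weight, probs in zip(norm, fold_probs):
        for i, row in enumerate(probs):
            for c, p in enumerate(row):
                averaged[i][c] += weight * p
    return averaged


def predict_labels(averaged: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class with the highest averaged probability for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


if __name__ == "__main__":
    # Hypothetical probabilities for 2 samples, 2 classes, 5 fold models.
    probs = [
        [[0.9, 0.1], [0.4, 0.6]],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.7, 0.3], [0.6, 0.4]],
        [[0.6, 0.4], [0.2, 0.8]],
        [[0.9, 0.1], [0.5, 0.5]],
    ]
    weights = [0.81, 0.79, 0.80, 0.78, 0.82]  # hypothetical validation scores
    print(predict_labels(weighted_probability_average(probs, weights)))
```

Compared with majority voting over hard labels, averaging probabilities keeps each model's confidence, and the weights let folds that validated better contribute more to the final decision.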
Pages: 199-208
Copyright (c) 2026 Ibra Sahrian Alsa, Surya Agustian, Fitra Kurnia, Pizaini Pizaini, Siska Kurnia Gusti

This work is licensed under a Creative Commons Attribution 4.0 International License.


