People Entity Recognition in Indonesian Alquran Translation using Roberta


  • Aufa Mutia (*), Telkom University, Bandung, Indonesia
  • Moch Arif Bijaksana, Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: People Entity; Quran; Indonesian; NER; RoBERTa

Abstract

Understanding and interpreting the Quran is a primary goal for Muslims, yet the Quran was revealed in Arabic, whose complex linguistic structure, distinctive writing system, and intricate grammar make it challenging to understand. Comprehending its teachings also requires knowing the people (human entities) mentioned in it. However, labeling these people entities manually is complex and error-prone. This research aims to support that labeling process by building an automatic model that performs well on Quranic text. RoBERTa is not an NER model in itself but a pretrained Transformer language model that extends BERT with a robustly optimized pretraining procedure (longer training on more data, dynamic masking, and no next-sentence prediction objective); it can be fine-tuned for Named Entity Recognition (NER). This study applies RoBERTa to identify people entities in the Indonesian (Bahasa Indonesia) translation of the Quran. The system takes translated Quranic sentences as input and outputs a predicted entity label for the words in each sentence. The dataset is drawn from the Tanzil Quran corpus and covers chapters (surahs) 1 to 6. Data preprocessing consists of punctuation removal, tokenization, and case folding. The dataset is split into 80% training data and 20% testing data. During training, hyperparameters including the number of epochs, the learning rate, and the batch size are configured. The model is evaluated on the testing data using precision, recall, and F-score. The resulting RoBERTa model achieves an F-score of 52%. This score does not surpass that of the BERT model, suggesting that, in this setting, RoBERTa is less effective than BERT at identifying people entities in the Indonesian translation of the Quran.
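The preprocessing, data-splitting, and evaluation steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the BIO tag set (`B-PER`, `I-PER`, `O`), the hypothetical helper names (`preprocess`, `split_80_20`, `extract_spans`, `entity_prf`), sentence-level splitting with a fixed seed, ASCII punctuation removal, and exact-match entity-level scoring are all assumptions, since the abstract does not specify them. Tokenization is assumed to be word splitting done beforehand. The example sentence is illustrative and is not taken from the Tanzil dataset.

```python
import random
import string

# Assumption: punctuation is removed token by token using Python's ASCII
# punctuation set; the article does not say which characters were removed.
PUNCT = set(string.punctuation)


def preprocess(tokens, labels):
    """Hypothetical helper: drop punctuation-only tokens (with their labels)
    and lowercase the remaining tokens. Assumes word tokens as input."""
    kept_tokens, kept_labels = [], []
    for tok, lab in zip(tokens, labels):
        if all(ch in PUNCT for ch in tok):
            continue  # punctuation removal: skip the token and its label together
        kept_tokens.append(tok.lower())  # case folding
        kept_labels.append(lab)
    return kept_tokens, kept_labels


def split_80_20(sentences, test_ratio=0.2, seed=42):
    """Hypothetical helper: shuffle sentences and hold out test_ratio of them.
    Returns (train, test)."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def extract_spans(labels):
    """Collect (start, end, type) entity spans from a BIO label sequence."""
    spans = set()
    start, etype = None, None
    for i, lab in enumerate(list(labels) + ["O"]):  # sentinel closes a final span
        prefix, _, typ = lab.partition("-")
        # An open entity ends at O, at a new B-, or when the entity type changes.
        if start is not None and (prefix != "I" or typ != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        # An entity starts at B-, or at an I- that does not continue an entity.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, typ
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Exact-match entity-level precision, recall, and F1 over all sentences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)  # predicted spans that match a gold span exactly
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative sentence in the style of the Indonesian translation
    # (not from the study's dataset), with hypothetical BIO labels.
    tokens = ["Dan", "(", "ingatlah", ")", "ketika", "Musa", "berkata", "."]
    labels = ["O", "O", "O", "O", "O", "B-PER", "O", "O"]
    print(preprocess(tokens, labels))
    # (['dan', 'ingatlah', 'ketika', 'musa', 'berkata'], ['O', 'O', 'O', 'B-PER', 'O'])

    gold = [["O", "B-PER", "O", "B-PER", "I-PER"]]
    pred = [["O", "B-PER", "O", "B-PER", "O"]]
    print(entity_prf(gold, pred))
    # (0.5, 0.5, 0.5)
```

Punctuation tokens are dropped together with their labels so that the token and label sequences stay aligned. Note that case folding also removes capitalization, which in Indonesian text is often a cue that a word is a name.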

Article History
Submitted: 2024-01-16
Published: 2024-01-27
How to Cite
Mutia, A., & Bijaksana, M. A. (2024). People Entity Recognition in Indonesian Alquran Translation using Roberta. Journal of Information System Research (JOSH), 5(2), 648-656. https://doi.org/10.47065/josh.v5i2.4838
Section
Articles