Difficulty Level Identification of Indonesian and Mathematics Multiple Choice Questions using Machine Learning Approach

Shabrina Retno Ningsih; Ade Romadhony

doi:10.47065/bits.v5i1.3649

Shabrina Retno Ningsih * Telkom University, Bandung, Indonesia
Ade Romadhony Telkom University, Bandung, Indonesia

(*) Corresponding Author


Keywords: Text Classification; Prediction; Machine Learning; Deep Learning

Abstract

Examination question design is an important factor in improving education, because well-designed questions help teachers analyze student understanding. Question design should take the difficulty level into account, which is commonly classified into three types: easy, medium, and difficult. Predicting the difficulty level of questions helps teachers compose questions and gauge students' ability. In this study, we treat question difficulty level identification as a classification problem. We use a dataset of Indonesian and mathematics questions drawn from elementary and junior high school exercise question sets and apply several machine learning methods to the classification task. Our experiments use Random Forest, Logistic Regression, SVM, Gaussian, and Dense NN models with embedding, lexical, and syntactic features. The evaluation results show that the best method for identifying question difficulty level in the Indonesian subject is Random Forest with 83% accuracy, and the best method in the mathematics subject is also Random Forest with 83% accuracy. Analysis of the results shows that the embedding features affect model accuracy.
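
As a rough illustration of the setup described in the abstract, the sketch below computes a few lexical features from a question text and scores predictions by accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which lexical features were used, so the three features here (token count, average token length, type-token ratio) and the function names `lexical_features` and `accuracy` are hypothetical choices made for illustration.

```python
from typing import Dict, List

# The three difficulty levels named in the abstract.
LABELS = ("easy", "medium", "difficult")


def lexical_features(question: str) -> Dict[str, float]:
    """Compute simple lexical features of a question.

    Assumption: these features are illustrative only; the abstract does not
    list the lexical features used in the study.
    """
    tokens = question.lower().split()  # naive whitespace tokenization
    n_tokens = len(tokens)
    if n_tokens == 0:
        return {"n_tokens": 0.0, "avg_token_len": 0.0, "type_token_ratio": 0.0}
    return {
        "n_tokens": float(n_tokens),
        "avg_token_len": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens,
    }


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of questions whose predicted level matches the labelled level."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

In a full pipeline, feature vectors of this kind would be combined with embedding and syntactic features and passed to a classifier such as Random Forest; the accuracy function corresponds to the metric reported in the abstract.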


