Comparative Analysis of Ahmad-Yusoff and Jaro-Winkler Approaches for Javanese Language Stemming
Abstract
This research compares the performance of two approaches for identifying the base form of affixed Javanese words: the Ahmad Yusoff Sembok (AYS) rule-based stemming algorithm and the Jaro-Winkler (JW) string similarity approach. Javanese was chosen because its complex morphology poses challenges for natural language processing. That morphology includes prefixes, suffixes, infixes, and confixes, and the language also shows substantial speech-level and dialectal variation. The dataset comprises 720 manually annotated word-lemma pairs. Evaluation used precision, recall, F1-score, accuracy, and Cohen's Kappa, complemented by an error analysis of over-stemming and under-stemming cases. Results indicate that JW achieves higher overall performance (83.19% accuracy, 83% F1-score) than AYS (73.19% accuracy, 73% F1-score). AYS produces more over-stemming errors (88 cases), while JW shows more under-stemming errors (47 cases). These results suggest that similarity-based approaches handle Javanese morphological complexity more effectively. The study also contributes three resources: a benchmark dataset of manually annotated Javanese word-lemma pairs, a framework for comparing rule-based and similarity-based approaches, and practical insights for developing stemming tools for regional languages that currently lack NLP resources.
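The abstract does not spell out how JW produces a base form. The sketch below therefore assumes one plausible setup: each affixed word is compared against a dictionary of root words, and the root with the highest Jaro-Winkler score is returned. It implements standard Jaro-Winkler similarity with scaling factor p = 0.1 and the common prefix capped at four characters. The `stem_by_similarity` helper, its 0.7 threshold, the toy lexicon, and the gold pairs are all hypothetical illustrations. They are not the authors' code or data.

```python
"""Minimal Jaro-Winkler lemma lookup (illustrative sketch, not the authors' implementation)."""


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters only count as a match if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(i + window + 1, len2)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, counted in halves.
    s1_seq = [c for c, m in zip(s1, s1_matched) if m]
    s2_seq = [c for c, m in zip(s2, s2_matched) if m]
    transpositions = sum(a != b for a, b in zip(s1_seq, s2_seq)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (at most max_prefix chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def stem_by_similarity(word: str, lemmas: list[str], threshold: float = 0.7) -> str:
    """Hypothetical helper: return the most similar lemma, or the word unchanged
    if no lemma reaches the (assumed) threshold."""
    best = max(lemmas, key=lambda lemma: jaro_winkler(word, lemma))
    return best if jaro_winkler(word, best) >= threshold else word


# Hypothetical toy lexicon and gold word-lemma pairs (not taken from the study's dataset).
LEMMAS = ["tuku", "turu", "omah", "buku"]
GOLD = [("dituku", "tuku"), ("omahe", "omah"), ("bukune", "buku")]

if __name__ == "__main__":
    correct = 0
    for word, lemma in GOLD:
        predicted = stem_by_similarity(word, LEMMAS)
        correct += predicted == lemma
        print(f"{word} -> {predicted} (gold: {lemma})")
    print(f"accuracy = {correct / len(GOLD):.2%}")
```

With this toy lexicon, "dituku" (bought), "omahe" (the house), and "bukune" (the book) each map to their intended roots. This holds even though the prefix "di-" gives JW no shared prefix to reward. A lookup of this kind can only return roots that are already in the dictionary, so lexicon coverage directly bounds accuracy. In this sketch, the assumed threshold is the point at which a word is left unstemmed.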
Pages: 249-259
Copyright (c) 2026 Aysza Belia Auly Andira, Fadhli Almu'iini Ahda, Danang Arbian Sulistyo

This work is licensed under a Creative Commons Attribution 4.0 International License.