Question Entailment on Developing Indonesian Covid-19 Question Answering System


  • Muhammad Zaky Aonillah * Telkom University, Bandung, Indonesia
  • Hasmawati Hasmawati Telkom University, Bandung, Indonesia
  • Ade Romadhony Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Covid-19; Indonesia; Question Answering System; Question Entailment; Supervised Learning

Abstract

Although the severe impact of COVID-19 on people has decreased, the public still needs up-to-date information about the disease. A continually updated Frequently Asked Questions (FAQ) system could help the public obtain valid and relevant information. Maintaining a FAQ system manually requires considerable effort, so an approach for building the system automatically is needed. A Question Answering System (QAS) accepts input in the form of a question and returns a quick, concise, and relevant answer, and could be used to provide COVID-19 information to the public. One method for developing a QAS is Recognizing Question Entailment (RQE). Entailment is a directional relationship between two pieces of text, called the text (T) and the hypothesis (H), in which H can be inferred from T; RQE applies this relationship to pairs of questions. We present a study on developing a COVID-19 QAS in Bahasa Indonesia using RQE. The dataset was collected from reputable sources and consists of 725 question-answer pairs. In our experiments, the best performance was obtained with the Logistic Regression model on training set 1, which contains 54.2% positive question pairs and 45.8% negative question pairs, with an F-measure of 83.65%. These results indicate that the RQE method can identify entailment between new questions and questions in the dataset well.
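As a rough illustration of the idea summarized above, RQE can be framed as binary classification over question pairs, scored with the F-measure. The sketch below is not the implementation used in this study: the two features (word overlap and length ratio), the tiny hand-written logistic regression, and the English example questions are all assumptions made for illustration, since the abstract does not describe the feature set or the data in detail.

```python
import math

def jaccard(q1, q2):
    """Word-overlap (Jaccard) similarity between two questions."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def features(q1, q2):
    # Hypothetical feature vector: bias term, word overlap, length ratio.
    n1, n2 = len(q1.split()), len(q2.split())
    return [1.0, jaccard(q1, q2), min(n1, n2) / max(n1, n2, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Estimated probability that the pair is an entailment pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w

def f_measure(y_true, y_pred):
    """F-measure: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented toy pairs (not from the paper's dataset): label 1 = entailment, 0 = none.
pairs = [
    ("what are the symptoms of covid-19", "what are covid-19 symptoms"),
    ("how does covid-19 spread", "how does the covid-19 virus spread"),
    ("what are the symptoms of covid-19", "where can i get vaccinated"),
    ("how does covid-19 spread", "is a mask required outdoors"),
]
labels = [1, 1, 0, 0]

X = [features(q1, q2) for q1, q2 in pairs]
w = train(X, labels)
preds = [1 if score(w, x) >= 0.5 else 0 for x in X]
print("F-measure on the toy pairs:", f_measure(labels, preds))
```

In a QAS built this way, a new user question would be paired with each stored FAQ question, and the answer attached to the highest-scoring entailed question would be returned. Evaluating on the training pairs, as done here, only checks that the sketch runs; a real evaluation needs held-out data.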

Article History
Submitted: 2022-08-04
Published: 2022-09-01