Optimizing Answer Accuracy of a Customer Service Chatbot Application with the RAG (Retrieval-Augmented Generation) Method


  • Dhaman Dhaman*, Universitas Pamulang, Tangerang Selatan, Indonesia
  • Sajarwo Anggai, Universitas Pamulang, Tangerang Selatan, Indonesia
  • Arya Adhyaksa Waskita, Universitas Pamulang, Tangerang Selatan, Indonesia
  • (*) Corresponding Author
Keywords: Large Language Model; LLaMA; Retrieval-Augmented Generation; ROUGE; Evaluation

Abstract

This research addresses the issue of low answer accuracy in chatbot systems based on Large Language Models (LLMs) when responding to questions derived from customer service documents. To overcome this problem, the Retrieval-Augmented Generation (RAG) method is applied to improve response quality by adding relevant context from external documents. The three LLMs used in this study are LLaMA 3.1 8B, LLaMA 3.2 1B, and LLaMA 3.2 3B from Meta AI. Evaluation is conducted using automatic ROUGE metrics (ROUGE-1, ROUGE-2, and ROUGE-L) and manual human evaluation assessing accuracy, relevance, and hallucination. This research contributes to the development of more reliable LLM-based question-answering systems enhanced with external contextual documents related to customer service information. The results show a significant improvement across all models after applying the RAG method. ROUGE F1-scores increased consistently, with LLaMA 3.1 8B showing the highest gain (from 0.12 to 0.58 on ROUGE-1). Human evaluation also confirmed improvements in accuracy (up to +2.73 points) and reductions in hallucination (up to −2.63 points). These improvements were evident not only in larger models but also in smaller ones, indicating that the benefits of RAG do not depend on model size. In conclusion, RAG is highly effective in enhancing the accuracy and reliability of chatbot responses, especially in document-based question-answering scenarios. By leveraging contextual information from external documents, the system produces responses that are more factual, relevant, and free of hallucination. RAG has thus proven to be an effective approach for improving the response quality of LLMs, including those with smaller parameter sizes.
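The ROUGE-1 F1-score cited above (e.g. the gain from 0.12 to 0.58) measures unigram overlap between a generated answer and a reference answer. A minimal sketch of how it is computed is shown below; this is an illustrative implementation assuming simple whitespace tokenization and lowercasing, not the exact tooling used in the paper (which may rely on a standard ROUGE package):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Tokenization here is naive whitespace splitting after lowercasing;
    production ROUGE implementations also apply stemming and
    punctuation handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("the cat sat on the mat", "the cat is on the mat"))  # ≈ 0.83
```

ROUGE-2 follows the same pattern over bigrams, and ROUGE-L replaces the overlap count with the length of the longest common subsequence.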




Article History
Submitted: 2025-07-18
Published: 2025-07-31
How to Cite
Dhaman, D., Anggai, S., & Waskita, A. (2025). Optimasi Akurasi Jawaban Aplikasi Chatbot Layanan Pelanggan dengan Metode RAG (Retrieval-Augmented Generation). Journal of Information System Research (JOSH), 6(4), 2120–2128. https://doi.org/10.47065/josh.v6i4.8048