A Meta-Synthesis of Factual Accuracy and Citation Hallucination in LLM Academic Assistants

Rizki Anantama; Mohammad Iqbal Bachtiar; Zeinor Rahman

doi:10.47065/bulletinds.v5i3.10443

Rizki Anantama*, University of KH. Bahaudin Mudhary Madura, Sumenep, Indonesia
Mohammad Iqbal Bachtiar, University of KH. Bahaudin Mudhary Madura, Sumenep, Indonesia
Zeinor Rahman, University of KH. Bahaudin Mudhary Madura, Sumenep, Indonesia

(*) Corresponding Author


Keywords: Large Language Model; Hallucination; Academic Assistant; Referential Integrity; Dual-Layer Evaluation

Abstract

The integration of Large Language Models (LLMs) into higher education presents a paradox: gains in learning efficiency come with a risk of misinformation caused by hallucination. This study evaluates the factual accuracy and referential integrity of LLMs acting as academic assistants. It uses a comparative quantitative design based on a secondary-data synthesis of three main empirical studies retrieved from global databases. The independent variables are LLM model type, academic discipline, and prompt complexity; the dependent variables are concordance rate, citation fabrication rate, and Levenshtein distance deviation on Digital Object Identifiers (DOIs). The results indicate that LLMs achieve factual accuracy above 90% on structured analytical tasks but are highly vulnerable in referential integrity, with citation fabrication rates reaching 55% for GPT-3.5 and DOI hallucination reaching 89.4% in the humanities domain. These findings indicate that students should not place absolute trust in LLM outputs. The novelty of this research lies in the formulation of the "Dual-Layer Evaluation Framework", which separates conceptual validity from referential validity and provides an empirical foundation for educational institutions to formulate stricter digital literacy policies and to develop mitigation systems based on retrieval-augmented generation (RAG).
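
As an illustration of two of the dependent variables named in the abstract, the following is a minimal sketch of how a Levenshtein distance between a generated DOI and its reference DOI, and a citation fabrication rate, might be computed. It is not the authors' procedure: the function names, the example DOI strings, and the boolean flag list are all hypothetical, and the abstract does not specify how the deviation was normalized or aggregated.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fabrication_rate(flags: list[bool]) -> float:
    """Share of citations flagged as fabricated (True = fabricated)."""
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical example data (not taken from the study): one digit differs.
generated_doi = "10.1000/example.12345"
reference_doi = "10.1000/example.12845"
print(levenshtein(generated_doi, reference_doi))
print(fabrication_rate([True, False, True, False]))
```

A small distance signals a near-miss DOI (for example, a single wrong digit), while a large distance suggests an identifier with no close real counterpart; the two measures are kept separate here because a citation can be real yet carry a corrupted DOI.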


