An Automatic Translation Approach for Indonesian Medical Notes for cTAKES–UMLS-Based Information Extraction and Medical Mapping


  • Iwan Kasan (*), Universitas Cendekia Abditama, Banten, Indonesia
  • Lukman Heryawan, Universitas Gadjah Mada, Yogyakarta, Indonesia
  • Ellya Qolina, Universitas Cendekia Abditama, Banten, Indonesia
  • Aliyah Aliyah, Universitas Cendekia Abditama, Banten, Indonesia

(*) Corresponding Author
Keywords: cTAKES; UMLS; Medical Notes; SOAP; NLP; Google Translation API

Abstract

Unstructured medical notes in SOAP (Subjective, Objective, Assessment, Plan) format are valuable assets for clinical analysis, but processing them automatically in Indonesian remains difficult because mainstream NLP tools offer limited support for the language. This study evaluates the integration of Apache cTAKES and the Unified Medical Language System (UMLS) for extracting medical information from Indonesian electronic health records. The main obstacle is that the cTAKES architecture is optimized for English: applied directly to Indonesian text, it reaches a detection rate (Recall) of only 17.9%. As a pragmatic way to bridge this language barrier, the study proposes a preprocessing pipeline that translates each note with the Google Translate API before cTAKES extraction. The evaluation used a dataset of 50 SOAP-format medical records in which 840 medical entities were identified. With automatic translation, entity detection improves substantially, reaching a Recall of 90.2% and an F1-Score of 93.4%. Some information is still lost through local medical abbreviations and translation ambiguities, but the results indicate that automatic translation is an effective transitional strategy in resource-limited environments. Beyond clinical information extraction, the approach also enables automatic mapping of medical terminology to international standards such as ICD-10, SNOMED-CT, and RxNorm, which supports national health data interoperability.
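The translate-then-extract idea and its evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_translate` and `toy_extract` are hypothetical stand-ins for the Google Translate API and cTAKES, the three-term dictionary and the sample notes are invented for illustration, and micro-averaged scoring over exact entity matches is an assumption about how Recall and F1 were computed. The helper `implied_precision` simply solves F1 = 2PR/(P+R) for P; if the reported Recall (90.2%) and F1-Score (93.4%) come from the same precision and recall pair, the implied precision is about 96.8%.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy resources, NOT the study's data or the real services.
TOY_ID_TO_EN = {"nyeri kepala": "headache", "demam": "fever", "batuk": "cough"}
TOY_LEXICON = {"headache", "fever", "cough"}


def toy_translate(text: str) -> str:
    """Stand-in for a machine-translation call (Indonesian to English)."""
    out = text.lower()
    for source, target in TOY_ID_TO_EN.items():
        out = out.replace(source, target)
    return out


def toy_extract(text: str) -> Set[str]:
    """Stand-in for an English-only dictionary annotator such as cTAKES."""
    lowered = text.lower()
    return {term for term in TOY_LEXICON if term in lowered}


def run_pipeline(
    notes: List[str],
    extract: Callable[[str], Set[str]],
    translate: Optional[Callable[[str], str]] = None,
) -> List[Set[str]]:
    """Extract entities per note, optionally translating each note first."""
    results = []
    for note in notes:
        english = translate(note) if translate else note
        results.append(set(extract(english)))
    return results


def micro_scores(predicted: List[Set[str]], gold: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over exact entity matches."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    return f1 * recall / (2 * recall - f1)


if __name__ == "__main__":
    notes = ["S: pasien demam dan batuk", "O: nyeri kepala"]  # invented examples
    gold = [{"fever", "cough"}, {"headache"}]
    print("direct:    ", micro_scores(run_pipeline(notes, toy_extract), gold))
    print("translated:", micro_scores(run_pipeline(notes, toy_extract, toy_translate), gold))
    print("implied precision:", round(implied_precision(0.902, 0.934), 3))
```

Passing the translator and the extractor as plain functions keeps the sketch runnable offline; in a real setup they would wrap calls to a translation service and to a cTAKES pipeline, whose exact interfaces are not described in the abstract.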

Article History
Published: 2026-03-31
Section: Articles