An Automatic Translation Approach to Indonesian Medical Notes for cTAKES–UMLS-Based Information Extraction and Medical Mapping
Abstract
Unstructured medical notes in SOAP format are valuable for clinical analysis, but processing them automatically in Indonesian remains difficult because mainstream NLP tools offer limited support for the language. This study evaluates the integration of Apache cTAKES and the Unified Medical Language System (UMLS) for extracting medical information from Indonesian electronic health records. The primary obstacle is that the cTAKES architecture is optimized for English: applied directly to Indonesian text, it achieves a recall of only 17.9%. To bridge this linguistic barrier, the study proposes a preprocessing pipeline that translates each note automatically with the Google Translate API before cTAKES extraction. The evaluation used a dataset of 50 SOAP-format medical records in which 840 medical entities were identified. With automatic translation, entity detection improved substantially, reaching a recall of 90.2% and an F1-score of 93.4%. Despite remaining challenges, such as information loss from local medical abbreviations and translation ambiguity, the results indicate that automatic translation is an effective transitional strategy in resource-limited environments. The approach supports clinical information extraction and also enables automatic mapping of medical terminology to international standards such as ICD-10, SNOMED-CT, and RxNorm, which in turn supports national health data interoperability.
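The translate-then-extract idea and its evaluation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_translate` stands in for the Google Translate API, `toy_extract` stands in for cTAKES with a UMLS dictionary, and the three-term lexicon and sample note are invented. It does not reproduce the study's pipeline or its figures; it only shows why an English-only extractor finds nothing in untranslated text and how recall, precision, and F1 are computed from predicted and reference entity sets.

```python
from typing import Callable, Dict, Set

Translator = Callable[[str], str]      # Indonesian text -> English text
Extractor = Callable[[str], Set[str]]  # English text -> set of entity labels


def translate_then_extract(note: str, translate: Translator, extract: Extractor) -> Set[str]:
    """Run the extractor on the translated note rather than on the raw note."""
    return extract(translate(note))


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Set-based precision, recall and F1 against a reference entity set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical stand-ins (assumptions, not the real services).
TOY_DICTIONARY = {"demam": "fever", "batuk": "cough", "sesak napas": "shortness of breath"}
ENGLISH_LEXICON = {"fever", "cough", "shortness of breath"}


def toy_translate(text: str) -> str:
    """Word-for-word replacement; a real system would call a translation API."""
    translated = text.lower()
    for source_term, target_term in TOY_DICTIONARY.items():
        translated = translated.replace(source_term, target_term)
    return translated


def toy_extract(text: str) -> Set[str]:
    """English-only dictionary lookup; a real system would call cTAKES."""
    lowered = text.lower()
    return {term for term in ENGLISH_LEXICON if term in lowered}


note = "Pasien mengeluh demam dan batuk sejak 3 hari, disertai sesak napas."
gold = {"fever", "cough", "shortness of breath"}

direct = precision_recall_f1(toy_extract(note), gold)
translated = precision_recall_f1(translate_then_extract(note, toy_translate, toy_extract), gold)

print("direct:    ", direct)
print("translated:", translated)
```

Passing the translator and extractor in as plain callables keeps the sketch independent of any particular service, so either stand-in could be swapped for a real client without changing the evaluation code.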
Pages: 1885-1893
Copyright (c) 2026 Iwan Kasan, Lukman Heryawan, Ellya Qolina, Aliyah Aliyah

This work is licensed under a Creative Commons Attribution 4.0 International License.