Application of Logits Processing in Transformer Technology for Generating Melodies in ABC Notation for Indie Game Development

Muhammad Faishal Ali Dhiaulhaq; Arif Akbarul Huda; Arifiyanto Hadinegoro

doi:10.47065/bits.v6i4.6642

Muhammad Faishal Ali Dhiaulhaq * Universitas Amikom Yogyakarta, Sleman, Indonesia
Arif Akbarul Huda Universitas Amikom Yogyakarta, Sleman, Indonesia
Arifiyanto Hadinegoro Universitas Amikom Yogyakarta, Sleman, Indonesia

(*) Corresponding Author

Keywords: Deep Learning; Transformers; ABC Notation; Logits Processing; Generative AI

Abstract

Generative Artificial Intelligence (Gen AI) is increasingly used by creative professionals, including musicians and game developers. Many game developers rely on free or paid music assets, but the variety on offer is usually limited. This research aims to help game developers generate music assets in ABC notation. The method comprises four stages: collecting data in ABC notation, processing the data, developing the model, and evaluating it with metrics. Data were collected by extracting ABC notation together with the characteristic musical components of each item. Data processing covered missing-value handling and feature selection, while data preparation covered labeling and tokenization. The model is GPT-2, a Transformer-based architecture pretrained on a general dataset. Logits Processing was applied when integrating the model with the ABC notation data in order to improve control over the output. The evaluation shows that the Transformer can generate pitch patterns consistent with the validation data: Earth Mover's Distance (EMD) values are concentrated in the range 1.0–1.5, with an average of 1.60. There are some outliers, and the pitch distributions of the validation data and the generated results still differ. Among the tested combinations, the Horror genre with a Joyful mood and Excitement emotion achieved the highest combined fitness score, 0.528. The model requires further refinement to produce more consistent pitch distributions. The research demonstrates the potential of Transformer technology for generating game music assets, but further studies are needed to improve the accuracy and consistency of the results.
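To illustrate what logits processing means in this setting, the following is a minimal sketch, not the authors' implementation, which the abstract does not describe. It masks the scores of tokens that are not valid ABC symbols before the next token is chosen. The toy vocabulary, the allowed set, and the score values are all made up for illustration; with a real GPT-2 model the same idea would typically be applied to the model's full vocabulary at every generation step.

```python
import math

# Hypothetical toy vocabulary; the tokenizer used in the paper is not shown in the abstract.
VOCAB = ["A", "B", "C", "D", "E", "F", "G", "|", "2", "hello", "the", "<eos>"]
# Assumption: the tokens treated as valid ABC-notation symbols in this sketch.
ALLOWED = {"A", "B", "C", "D", "E", "F", "G", "|", "2", "<eos>"}


def mask_logits(logits, vocab, allowed):
    """Set the logit of every token outside `allowed` to -inf."""
    return [x if tok in allowed else -math.inf for x, tok in zip(logits, vocab)]


def softmax(logits):
    """Numerically stable softmax; tokens with -inf logits get probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores in which a non-ABC token ("hello") scores highest.
raw_logits = [1.0, 0.5, 2.0, 0.1, 0.3, 0.2, 1.5, 0.8, 0.4, 3.0, 2.5, 0.0]

probs = softmax(mask_logits(raw_logits, VOCAB, ALLOWED))
# Greedy choice: the most probable token after masking.
best_token = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
print(best_token)  # prints "C"
```

Without the mask, greedy decoding would pick "hello"; with it, the non-ABC tokens receive zero probability and the highest-scoring note, "C", is chosen instead. The sketch assumes at least one allowed token has a finite score.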
