Classification of Key and Time Signature in Western Musical Notation by using CRNN Algorithm with Bounding Box
Abstract
This research applies the Convolutional Recurrent Neural Network (CRNN) algorithm to classify key and time signatures in sheet music images. The dataset comprises 285 sheet music images covering 15 types of key signatures and 19 types of time signatures. The methodology consists of annotation using the bounding box technique, image preprocessing, and classification with a CRNN model, trained and validated using K-Fold Cross Validation because the dataset is small. The model is then evaluated using a multi-class confusion matrix and performance metrics. With bounding boxes, the model achieves 96% accuracy in key signature classification and 95% in time signature classification. Without bounding boxes, accuracy drops to 58% for key signatures and to just 19% for time signatures. These results show that bounding box annotation substantially improves accuracy. We therefore anticipate that this research will help singers, especially those in choirs, to understand and express music better using existing technologies, while improving the accuracy of optical music recognition with the CRNN model.
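The evaluation steps named in the abstract (cropping to a bounding box, K-Fold Cross Validation, and a multi-class confusion matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the box format `(x_min, y_min, x_max, y_max)`, and the choice of 5 folds are assumptions made for illustration, since the abstract does not state them, and the CRNN model itself is omitted.

```python
import random


def crop_to_box(image, box):
    """Crop a 2-D image (list of pixel rows) to a bounding box.

    box = (x_min, y_min, x_max, y_max), end-exclusive (assumed format).
    """
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # 285 images as in the abstract; k=5 is an assumed value.
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(285, 5)):
        print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

In a full pipeline, each fold would train a fresh model on the cropped training images and fill the confusion matrix from its predictions on the validation images. With only 285 images, rotating the validation fold lets every image contribute to both training and evaluation.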
Pages: 2193-2203
Copyright (c) 2025 Dennis Adiwinata Irwan Soeroso, Sri Winarno, Ardytha Luthfiarta, Firda Ayu Dwi Aryanti

This work is licensed under a Creative Commons Attribution 4.0 International License.