Method comparison of Naïve Bayes, Logistic Regression, and SVM for Analyzing Movie Reviews
Abstract
A film's success is often judged by its reviews, which range from professional critics' assessments to opinions from the general audience. Because the number of reviews and opinions on a film is large, this study builds a sentiment analysis model and compares methods for analyzing a movie review dataset. Sentiment analysis is a method for studying opinions and classifying them into several classes. This research uses Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM) to analyze film review data. The dataset is a collection of film reviews taken from the Rotten Tomatoes website, which is pre-processed before the three methods are applied. The SVM classifier with an 80:20 data split performs best, with an accuracy of 99.4% and an F1 score of 93.5%.
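The evaluation protocol named in the abstract (an 80:20 data split, scored by accuracy and F1) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_80_20` and `accuracy_and_f1`, the fixed shuffle seed, the toy labels, and the choice of a binary F1 computed on a single positive class are all assumptions made for illustration. The abstract does not say how F1 was averaged.

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle a copy of `samples` and return (train, test) in an 80:20 ratio.

    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy labels (1 = positive review, 0 = negative review); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct prediction, while F1 depends only on true positives, false positives, and false negatives for the chosen class, so the two scores can differ noticeably for the same classifier, as they do in the reported results.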
Pages: 1714–1720
Copyright (c) 2023 Muhammad Maulidan Aziz, Mahendra Dwifebri Purbalaksono, Adiwijaya Adiwijaya

This work is licensed under a Creative Commons Attribution 4.0 International License.