Method comparison of Naïve Bayes, Logistic Regression, and SVM for Analyzing Movie Reviews


  • Muhammad Maulidan Aziz *, Telkom University, Bandung, Indonesia
  • Mahendra Dwifebri Purbalaksono, Telkom University, Bandung, Indonesia
  • Adiwijaya Adiwijaya, Telkom University, Bandung, Indonesia
  • (*) Corresponding Author
Keywords: Movie Reviews; Sentiment Analysis; Naive Bayes; Logistic Regression; Support Vector Machine

Abstract

A film's success is often judged by the reviews it receives, which range from professional critics' assessments to public reviews from general audiences. Because the volume of reviews and opinions on any one film is large, this study builds sentiment analysis models and compares the methods used to analyze a movie review dataset. Sentiment analysis is a method for studying and analyzing opinions and then classifying them into several classes. This research uses the Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM) methods to analyze film review data. The dataset is a collection of film reviews taken from the Rotten Tomatoes website, which is pre-processed before the three methods are applied. The SVM classifier with an 80:20 data split performs best, with an accuracy of 99.4% and an F1 score of 93.5%.
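
As a minimal sketch of the evaluation protocol the abstract describes, the following Python shows an 80:20 split and how accuracy and F1 can be computed from a classifier's predictions. The function names, the toy labels, and the use of a single positive class for F1 are assumptions made for illustration; the abstract does not state the averaging scheme or the tooling the authors used.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) in an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive="pos"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not the paper's data): 3 positive, 7 negative.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg"]
# Hypothetical predictions from any one of the three classifiers.
y_pred = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg", "pos"]
```

On these toy labels, `accuracy(y_true, y_pred)` is 0.8 while `f1_score(y_true, y_pred)` is about 0.67. This shows how the two scores can differ for the same predictions, as the reported 99.4% accuracy and 93.5% F1 do.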

Author Biographies

Mahendra Dwifebri Purbalaksono, Telkom University, Bandung

Thesis supervisor

Mahendra Dwifebri P., S.Kom., M.Kom

Adiwijaya Adiwijaya, Telkom University, Bandung

Thesis supervisor

Prof. Dr. Adiwijaya, S.Si., M.Si.



Article History
Submitted: 2022-12-07
Published: 2023-03-29
How to Cite
Aziz, M., Purbalaksono, M., & Adiwijaya, A. (2023). Method comparison of Naïve Bayes, Logistic Regression, and SVM for Analyzing Movie Reviews. Building of Informatics, Technology and Science (BITS), 4(4), 1714–1720. https://doi.org/10.47065/bits.v4i4.2644
Section
Articles