Toxicity Score and Sentiment Classification of Backpacker Content Reviews using SVM enhanced by SMOTE


  • Yerik Afrianto Singgalen (*), Atma Jaya Catholic University of Indonesia, Jakarta, Indonesia
  • (*) Corresponding Author
Keywords: Content Reviews; Sentiment Analysis; Toxicity Detection; Backpacker Tourism

Abstract

This research explores the dynamics of backpacker tourism in Indonesia by analyzing online content about several destinations: Bandung, Dieng, Borobudur, Ijen, Bromo, Tumpak Sewu, Malang, Banyuwangi, and Bali. Using the Digital Content Reviews and Analysis Framework, the study systematically processed user-generated content to assess sentiment and toxicity levels. Most interactions were non-toxic, but harmful language spiked occasionally, particularly in the profanity and identity-attack categories. For example, toxicity scores for the Malang, Banyuwangi, and Bali content averaged 0.06995, with peaks reaching 0.78207, underscoring the need for ongoing content moderation. The study also employed a Support Vector Machine (SVM) model enhanced by the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance. The model achieved an accuracy of 82.64% and a recall of 97.39%, indicating that it identifies positive cases with few false negatives. The AUC scores, ranging from 0.970 to 0.979, indicated strong discriminatory power. These findings highlight the potential of machine learning models for analyzing large-scale, imbalanced datasets in tourism-related research. Overall, the study provides insight into traveler perceptions of Indonesia’s backpacker destinations and emphasizes the importance of context in understanding online discourse. The integration of toxicity analysis and SVM modeling has practical implications for tourism management, content moderation, and the promotion of sustainable tourism practices.
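
To make the abstract's two technical ideas concrete, the following is a minimal sketch, not the author's implementation. It assumes SMOTE in its basic form (interpolating between a minority sample and one of its nearest minority neighbours) and shows how accuracy and recall are computed from predictions. The function names `smote` and `accuracy_and_recall`, the parameter defaults, and the toy data are all illustrative assumptions; none of the figures below come from the study.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 2, seed: int = 0) -> List[Vector]:
    """Return n_new synthetic minority samples (basic SMOTE idea).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float]:
    """Accuracy = correct / total; recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


if __name__ == "__main__":
    # Toy minority class with three 2-D feature vectors (illustrative only).
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(toy_minority, n_new=4))

    # One false positive and no false negatives.
    print(accuracy_and_recall([1, 1, 1, 0, 0], [1, 1, 1, 1, 0]))  # (0.8, 1.0)
```

The toy evaluation shows recall exceeding accuracy when the errors are false positives rather than false negatives, which is the same qualitative pattern as the reported 97.39% recall against 82.64% accuracy. In practice, oversampling would be applied to the training split only, so that synthetic samples do not leak into evaluation.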




Article History
Submitted: 2024-09-24
Published: 2024-10-14
How to Cite
Singgalen, Y. (2024). Toxicity Score and Sentiment Classification of Backpacker Content Reviews using SVM enhanced by SMOTE. Journal of Information System Research (JOSH), 6(1), 85-99. https://doi.org/10.47065/josh.v6i1.5961
Section
Articles
