Clickbait Classification Model on Online News with Semantic Similarity Calculation Between News Title and Content


  • Hero Akbar Ahmadi (*) Universitas Bina Nusantara, Jakarta, Indonesia
  • Andry Chowanda Universitas Bina Nusantara, Jakarta, Indonesia
  • (*) Corresponding Author
Keywords: Clickbait; Text Summarization; Semantic Similarity; Text-to-text Transfer Transformer; IndoBERT

Abstract

Clickbait is a sensational title that entices readers to click a link to an article, image, or video. Online content providers use clickbait to gain user traffic, which increases income from the ads placed on their pages. To attract ever more traffic, they write sensational and hyperbolic titles, some of which are misleading or do not tell the whole story. This can leave internet consumers with a wrong perspective and half-truths. Nowadays, clickbait titles are worse than ever: modern clickbait titles are often not hyperbolic or ambiguous enough to stand out, and are sometimes very hard to identify. This paper aims to classify clickbait titles, to help people identify clickbait and stop sharing online content with clickbait and misleading titles. The model classifies clickbait by calculating the semantic similarity between the article title and a summary of the article content. The article content is summarized by a T5 (Text-to-Text Transfer Transformer) model. IndoBERT is then used to calculate a semantic similarity score between the generated summary and the article title. The article title, content, summary, and semantic similarity score are used for clickbait classification with various algorithms. The results show that adding the article content alongside the article title in the classification process improves the F1-score by 7% when classified with IndoBERT. In future research, this model could be integrated with other applications, such as a Twitter or Telegram bot, to send a warning every time a user consumes online content with a clickbait title. This could prevent online communities from sharing misleading information caused by clickbait.
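
The title-versus-summary comparison described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the similarity measure is cosine similarity, the `embed` function is a toy bag-of-words stand-in for an IndoBERT sentence embedding, and the summary is passed in as a plain string rather than generated by T5. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption: NOT IndoBERT).

    Returns a sparse bag-of-words count vector keyed by lowercase token.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def title_summary_similarity(title, summary):
    """Score how closely a title matches a summary of the article content."""
    return cosine_similarity(embed(title), embed(summary))


def build_features(title, content, summary):
    """Collect the four inputs the abstract lists for the classifier."""
    return {
        "title": title,
        "content": content,
        "summary": summary,
        "similarity": title_summary_similarity(title, summary),
    }
```

In this sketch, a low similarity score suggests that the title says little about what the article actually covers. A real pipeline would replace `embed` with a contextual encoder and would produce `summary` with a summarization model before passing the features to a classifier.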



Article History
Submitted: 2023-01-26
Published: 2023-03-31
How to Cite
Ahmadi, H., & Chowanda, A. (2023). Clickbait Classification Model on Online News with Semantic Similarity Calculation Between News Title and Content. Building of Informatics, Technology and Science (BITS), 4(4), 1986−1994. https://doi.org/10.47065/bits.v4i4.3030