Enhanced Detection of Indonesian Online Gambling Advertisements Using Multimodal Ensemble Deep Learning
DOI:
https://doi.org/10.63158/IJAIS.v2i2.49
Keywords:
online gambling, NLP, Multimodal Deep Learning, Content Moderation, Ensemble Learning
Abstract
The rapid growth of online gambling promotion on Indonesian social media creates significant challenges for automated moderation systems, particularly because the content often appears in multimodal forms, uses slang expressions, and disguises promotional intent. The purpose of this study is to improve the accuracy and robustness of gambling advertisement detection by proposing a multimodal ensemble deep learning framework that integrates information from text, images, and audio. The method combines three independent feature streams, namely native text, OCR-extracted text from images, and ASR-generated speech transcripts. These inputs are processed using three classifiers, namely CNN, BiLSTM, and IndoBERT, which are then fused using a weighted soft-voting ensemble strategy. A dataset consisting of 12,000 multimodal samples collected from Facebook, Instagram, TikTok, and YouTube was used for evaluation. The results show that the ensemble model achieves an accuracy of 95.42 percent, outperforming each individual classifier, with substantial improvements in handling noisy OCR and ASR outputs as well as implicit gambling slang. Compared with single-model baselines, the proposed approach reduces false positives by 18.6 percent and false negatives by 22.3 percent. The novelty of this study lies in the integration of multimodal feature streams with an optimized ensemble mechanism, enabling more reliable detection of concealed gambling promotional patterns. The findings provide a strong foundation for future research on adaptive moderation systems and real-time harmful content detection in Indonesian social media.
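The weighted soft-voting fusion named in the abstract can be illustrated with a minimal sketch. The abstract does not report the ensemble weights or the per-model outputs, so the weights, the probability values, and the function names below are hypothetical and serve only to show the mechanism: each classifier emits a class-probability vector, the vectors are averaged with normalized weights, and the class with the highest fused probability is chosen.

```python
from typing import Dict, List, Sequence


def weighted_soft_vote(
    probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted average of per-model class-probability vectors.

    `probs` maps a model name to its probability vector over the classes.
    `weights` maps the same model names to non-negative weights; they are
    normalized here, so they do not need to sum to 1.
    """
    if not probs:
        raise ValueError("at least one model is required")
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for name, p in probs.items():
        if len(p) != n_classes:
            raise ValueError(f"model {name!r} has a different number of classes")
        w = weights[name] / total
        for k, p_k in enumerate(p):
            fused[k] += w * p_k
    return fused


def predict_label(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Hypothetical outputs for one sample; class 0 = non-gambling, class 1 = gambling.
example_probs = {
    "cnn": [0.60, 0.40],
    "bilstm": [0.45, 0.55],
    "indobert": [0.20, 0.80],
}
# Hypothetical weights (assumed for illustration, not taken from the study).
example_weights = {"cnn": 0.2, "bilstm": 0.3, "indobert": 0.5}

fused = weighted_soft_vote(example_probs, example_weights)
label = predict_label(fused)
print(fused, label)
```

In this made-up sample the CNN alone would predict the non-gambling class, but the fused probabilities favor the gambling class because the two more heavily weighted models agree. In practice such weights would typically be tuned on a validation set; how the study tuned them is not described in the abstract.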
License
Copyright (c) 2025 M Ihksan Alfiansyah, Ari Muzakir (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.


