The Evolving Threat of Unstructured Spam Messages
As digital communication continues to grow rapidly, malicious actors constantly change their tactics. Modern spam messages no longer rely on simple keywords; instead, they use complex linguistic variation, structural manipulation of text, and content obfuscation to mimic legitimate human conversation. Traditional rule-based security systems fail to adapt to these unstructured and dynamic text patterns, making automated machine learning solutions necessary.
For years, IT systems deployed traditional models like Logistic Regression combined with statistical features like TF-IDF. While computationally efficient and stable, these systems treat words independently and ignore word order and sentence structure. Sequential deep learning models, such as Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU), were designed to address this limitation by capturing dependencies between words across a text. However, prior to this study by Harliana, Hartatik, and Yudanuari, the field lacked a fair basis for comparison, as previous experiments used different datasets, inconsistent preprocessing techniques, and varying evaluation metrics.
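The order-blindness of TF-IDF features can be illustrated with a minimal sketch. The function `tfidf_vectors`, the example messages, and the smoothed IDF formula (one common variant) are assumptions made for illustration and are not taken from the study.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors

# Illustrative messages: the first two contain the same words in a different order.
docs = [
    "call now to claim your free prize",
    "claim your free prize now to call",
    "are we still meeting for lunch today",
]
vectors = tfidf_vectors(docs)
print(vectors[0] == vectors[1])  # prints True: word order leaves the features unchanged
```

Because the two reordered messages map to identical feature dictionaries, any classifier built on these features cannot tell them apart, which is the gap sequential models are meant to close.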
Standardizing the Experimental Framework
To resolve the inconsistencies in existing literature, the research team from Universitas Nahdlatul Ulama Blitar and Universitas AMIKOM Yogyakarta built a controlled empirical evaluation framework. They used a publicly available SMS spam dataset consisting of 5,572 messages. The dataset exhibited a stark class imbalance typical of real-world scenarios, containing 4,825 legitimate messages (ham) and only 747 spam messages.
The data was split in an 80:20 ratio for training and testing. The team addressed the imbalance by applying an oversampling technique exclusively to the training subset. This balanced the class distribution for training while keeping resampled data out of the testing subset, which prevented data leakage and preserved experimental integrity. The text preprocessing pipeline was standardized across all four tested configurations:
- Logistic Regression (with TF-IDF): Served as the statistical, non-sequential baseline model.
- Recurrent Neural Network (RNN): A basic sequential deep learning model.
- Long Short-Term Memory (LSTM): A recurrent network with gating mechanisms designed to retain information across long sequences.
- Gated Recurrent Unit (GRU): A streamlined, computationally efficient alternative to LSTM.
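The split-then-oversample procedure described above can be sketched as follows. The article does not name the oversampling technique, so plain random oversampling is assumed here. The function `split_then_oversample`, the seed, and the synthetic placeholder messages are hypothetical; only the class counts (4,825 ham, 747 spam) and the 80:20 ratio come from the article.

```python
import random
from collections import Counter

def split_then_oversample(messages, labels, test_ratio=0.2, seed=42):
    """Split first, then randomly oversample minority classes in the training part only."""
    rng = random.Random(seed)
    indices = list(range(len(messages)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train = [(messages[i], labels[i]) for i in train_idx]
    test = [(messages[i], labels[i]) for i in test_idx]

    # Duplicate randomly chosen training examples until every class
    # matches the size of the largest class. The test set is never touched.
    counts = Counter(label for _, label in train)
    majority_size = max(counts.values())
    balanced = list(train)
    for label, count in counts.items():
        pool = [pair for pair in train if pair[1] == label]
        balanced.extend(rng.choices(pool, k=majority_size - count))
    rng.shuffle(balanced)
    return balanced, test

# Synthetic stand-ins for the real messages, using the class counts from the article.
messages = [f"ham message {i}" for i in range(4825)] + [f"spam message {i}" for i in range(747)]
labels = ["ham"] * 4825 + ["spam"] * 747

train_balanced, test = split_then_oversample(messages, labels)
train_counts = Counter(label for _, label in train_balanced)
test_counts = Counter(label for _, label in test)
print(train_counts, test_counts)
```

Splitting before resampling is the key design choice: if duplicates were created first, copies of the same spam message could land in both subsets and inflate the test scores.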
Key Performance Findings and Trade-Offs
The controlled experiments revealed that all four models achieved exceptionally high classification capabilities, with overall accuracy scores ranging from 0.98 to 0.99. The traditional Logistic Regression baseline excelled at catching actual spam, securing a strong recall rate of 0.95. However, it registered the lowest precision score (0.93), generating 11 false positives. This indicates that keyword-dependent traditional machine learning often misclassifies legitimate messages as spam when they share vocabulary with spam.
Conversely, the basic RNN proved to be highly conservative. It achieved a near-perfect precision of 0.99 by triggering only 1 false alarm, but it missed a significant amount of actual spam, with recall dropping to 0.92, a result attributed to the vanishing gradient problem.
The clear winners of the comparative study were the LSTM and GRU neural networks, both achieving peak F1-scores of 0.97. By using gating structures to retain contextual information over text sequences, they maintained a precision of 0.99 while raising the recall rate to 0.94. Analysis of the training curves indicated that the GRU model offered slightly better stability and faster training convergence than the LSTM, making it a highly reliable architecture for text classification.
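The precision and recall trade-off discussed above follows directly from confusion-matrix counts, as this minimal sketch shows. The article reports only the false-positive counts (11 and 1); every other count below is a hypothetical value chosen for illustration, and `classification_metrics` is an assumed helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics for the spam (positive) class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: many spam messages caught, but 11 ham messages flagged.
aggressive = classification_metrics(tp=142, fp=11, fn=7, tn=955)

# Hypothetical counts: only 1 ham message flagged, but more spam slips through.
conservative = classification_metrics(tp=137, fp=1, fn=12, tn=965)

for name, metrics in [("aggressive", aggressive), ("conservative", conservative)]:
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

A filter that flags more messages raises recall at the cost of precision, while a cautious filter does the opposite; the F1-score summarizes the balance between the two.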
Real-World Impact and Industrial Applications
The research by Harliana, Hartatik, and Yudanuari is relevant to digital enterprises, software vendors, and cybersecurity policymakers. In professional email systems and messaging applications, a false positive, where an essential business message or notification is incorrectly flagged as spam, severely damages the user experience. By showing that LSTM and GRU models reduce false positives to a single occurrence, this study provides a clear architectural roadmap for communication platforms aiming to maximize filtering reliability. The study also confirms that traditional statistical models remain highly competitive. For startups, small businesses, or edge-computing environments with restricted processing power, Logistic Regression remains a practical, low-cost solution.
Research Profiles
Harliana, S.Kom., M.Kom. is a faculty member and leading computer science researcher at Universitas Nahdlatul Ulama Blitar, specializing in Natural Language Processing (NLP) and machine learning applications.
Hartatik, S.Si., M.Cs. is an academician and data science expert affiliated with Universitas AMIKOM Yogyakarta, focusing on computational intelligence and advanced algorithms.
Achmad Alvi Yudanuari, S.Kom., M.M. is a researcher at Universitas Nahdlatul Ulama Blitar whose professional work centers on information systems and data analytics.
Source
Harliana, Hartatik, Achmad Alvi Yudanuari (2026). Comparative Analysis of Traditional Machine Learning and Sequential Deep Learning Models for Spam Email Classification. Formosa Journal of Computer and Information Science (FJCIS), Vol. 5, No. 1.
DOI: https://doi.org/10.55927/fjcis.v5i1.16502
URL: https://journal.formosapublisher.org/index.php/fjcis