A Stable and Adaptable Machine Learning Framework for Phishing Detection
Sheela S Maharajpet
Department of MCA, Acharya Institute of Technology, Bangalore – 560107, India.
Shivi Dixit
Department of Computer Applications, Acharya Institute of Graduate Studies, Bangalore – 560107, India.
Hrishikesh Sharma *
Department of MCA, Acharya Institute of Technology, Bangalore – 560107, India.
*Author to whom correspondence should be addressed.
Abstract
Background: Phishing contributes to over one-third of security incidents globally, highlighting the urgent need for robust detection systems.
Aims: This study aims to design and validate a phishing detection system that is accurate, adaptable, and suitable for real-time deployment. The system targets institutional and enterprise-level use and is intended to overcome the shortcomings of traditional rule-based and blacklist approaches.
Study Design: An experimental research study was conducted to evaluate multiple machine learning algorithms for phishing detection. The study adopted a comparative design to identify the most stable and efficient model.
Place and Duration of Study: The work was carried out at the Department of MCA, Acharya Institute of Technology, Bangalore, India, between July 2025 and September 2025.
Methodology: A dataset of 5,559 SMS messages, each labelled as phishing or legitimate, was pre-processed using tokenisation, stop-word removal, and vectorisation (TF-IDF and bag-of-words, BoW). Lexical, structural, statistical, and semantic features were engineered. Six classifiers were trained and evaluated using Accuracy, Precision, Recall, and F1-score: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbours (kNN), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). Cross-validation was applied for stability testing. A Django-based web interface was implemented for real-time predictions.
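The pre-processing steps named in the methodology (tokenisation, stop-word removal, and TF-IDF vectorisation) can be illustrated with a minimal standard-library sketch. It is not the study's implementation: the stop-word list, the smoothed IDF formula, the function names `tokenise` and `tfidf_vectors`, and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the study does not state which list was used.
STOP_WORDS = {"a", "an", "the", "to", "your", "is", "you", "for", "and", "of", "at"}


def tokenise(message):
    """Lowercase the text, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(messages):
    """Return one sparse {term: weight} dict per message.

    Weight = relative term frequency * smoothed inverse document frequency,
    where idf = ln((1 + N) / (1 + df)) + 1 (an assumed, commonly used variant).
    """
    docs = [tokenise(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: in how many messages each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against messages with no tokens left
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


# Made-up sample messages, not taken from the study's dataset.
messages = [
    "Your account is locked, verify at http://example.test/login",
    "Are you free for lunch at noon?",
    "Verify your prize now",
]
vectors = tfidf_vectors(messages)
```

In the third message, "prize" receives a higher weight than "verify" because "verify" also occurs in the first message; this down-weighting of common terms is what TF-IDF adds over raw bag-of-words counts.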
Results: The proposed method combines several machine learning algorithms with feature engineering to detect phishing messages. SVM was the most stable model, with 99.99% Accuracy, 98.99% Precision, 99.12% Recall, and 99.05% F1-score. MNB, kNN, and LSTM achieved near-perfect results, while CNN scored lower (Accuracy 91.02%). Real-time system testing showed an average response time of 0.05 seconds per message.
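For reference, the four reported metrics follow from the counts of true and false positives and negatives. The sketch below shows the standard definitions; the function name `classification_metrics`, the label strings, and the toy predictions are illustrative assumptions and do not reproduce the study's data or results.

```python
def classification_metrics(y_true, y_pred, positive="phishing"):
    """Compute Accuracy, Precision, Recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (3 phishing, 5 legitimate messages).
y_true = ["phishing"] * 3 + ["legitimate"] * 5
y_pred = ["phishing", "phishing", "legitimate",
          "phishing", "legitimate", "legitimate", "legitimate", "legitimate"]
metrics = classification_metrics(y_true, y_pred)
```

Because accuracy also counts correctly classified legitimate messages, it can sit above precision and recall when legitimate messages dominate the data, which is why all four metrics are worth reporting together.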
Conclusion: The proposed phishing detection system demonstrates strong accuracy, efficiency, and adaptability. Its lightweight design and real-time performance make it suitable for deployment on institutional servers, in email systems, and across organisational networks, providing an effective defence against evolving phishing attacks.
Keywords: Phishing detection, machine learning, support vector machine, cyber threats, URL analysis