COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR MALICIOUS URL DETECTION BASED ON MULTIPLE FEATURES

Allan Desi Alexander; Joni Warta; Hendarman Lubis; Asep Ramdhani Mahbub; Rasim Rasim

doi:10.52362/jmijayakarta.v5i3.2101

Allan Desi Alexander, Universitas Bhayangkara Jakarta Raya
Joni Warta, Universitas Bhayangkara Jakarta Raya
Hendarman Lubis, Universitas Bhayangkara Jakarta Raya
Asep Ramdhani Mahbub, Universitas Bhayangkara Jakarta Raya
Rasim Rasim, Universitas Bhayangkara Jakarta Raya, Indonesia

Abstract

Malicious URL Detection (MUD) is an essential component of cyber defense, given the global financial losses caused by phishing, malware distribution, and IoT botnet attacks. Traditional approaches such as blacklisting have proven ineffective against newly created or polymorphic URLs. This study presents an extensive comparative analysis of three main classes of algorithms for classifying potentially malicious URLs: Ensemble Learning (Random Forest/RF), Kernel Methods (Support Vector Machine/SVM), and Deep Learning (DL). The data are drawn from the public URLhaus repository, which focuses heavily on malware-download URLs, particularly those of the Mozi and Mirai botnet campaigns. The methodology emphasizes multi-modal feature engineering that combines lexical, host/domain-based, and metadata-based (malware tag) features. Model performance is evaluated with security-sensitive metrics, namely Precision, Recall, and F1-Score, with the aim of minimizing False Negatives. The results show that although the DL model achieves the highest accuracy, Random Forest offers the best balance between strong detection performance and computational efficiency, making it well suited for real-time deployment in threat detection systems.
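The multi-modal feature engineering and the evaluation metrics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the specific features, the tag vocabulary (`SUSPICIOUS_TAGS`), and the helper names (`extract_features`, `precision_recall_f1`, `shannon_entropy`, `is_ip_host`) are hypothetical, chosen only to show how lexical, host-based, and metadata (tag) features could be combined into one vector per URL, and how Precision, Recall, and F1-Score follow from true positives, false positives, and false negatives.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical tag vocabulary; URLhaus-style tags are assumed to arrive as strings.
SUSPICIOUS_TAGS = ("mozi", "mirai", "elf")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (a common lexical feature)."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def is_ip_host(host: str) -> bool:
    """True if the host part is a literal IPv4/IPv6 address rather than a domain."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str, tags: list[str]) -> dict[str, float]:
    """Build one multi-modal feature vector (lexical + host + metadata) for a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    ip_host = is_ip_host(host)
    features = {
        # Lexical features computed from the raw URL string.
        "url_length": float(len(url)),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "num_special": float(sum(ch in "-_@?=&%" for ch in url)),
        "path_depth": float(parts.path.count("/")),
        "url_entropy": shannon_entropy(url),
        # Host/domain features derived from the parsed network location.
        "host_is_ip": float(ip_host),
        "has_explicit_port": float(parts.port is not None),
        # Naive subdomain count: assumes a single-label TLD (e.g. ".com").
        "num_subdomains": 0.0 if ip_host else float(max(host.count(".") - 1, 0)),
    }
    # Metadata features: one binary indicator per known malware tag.
    lowered = {t.lower() for t in tags}
    for tag in SUSPICIOUS_TAGS:
        features[f"tag_{tag}"] = float(tag in lowered)
    return features


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (malicious) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed threats
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(extract_features("http://182.121.45.10:37815/Mozi.m", ["Mozi", "elf"]))
    print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Recall equals TP / (TP + FN), so it is the metric that most directly reflects missed malicious URLs. In a full experiment, dictionaries like these would be assembled into a feature matrix and passed to each of the compared classifiers (RF, SVM, DL).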
