Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance

Rafrastara, Fauzi Adi and Ghozi, Wildanil and Sani, Ramadhan Rakhmat and Handoko, Lekso Budi and Abdussalam, Abdussalam and Pramudya, Elkaf Rahmawan and M. Abdollah, Faizal (2025) Integrating Information Gain and Chi-Square for Enhanced Malware Detection Performance. Journal of Information and Communication Technology (JICT), 24 (1). pp. 79-101. ISSN 1675-414X

PDF - Published Version
Available under License Attribution 4.0 International (CC BY 4.0).

Abstract

Malware represents a serious and continuously evolving threat in the modern digital environment. Detecting malware is essential to safeguard devices and systems from risks such as data corruption, data theft, account compromise, and unauthorized access that could result in total system takeover. As malware has progressed from simpler, monomorphic variants to more sophisticated oligomorphic, polymorphic, and metamorphic forms, detection now requires machine learning-based systems that overcome the limitations of traditional signature-based methods. Recent studies have shown that this challenge can be addressed by employing machine learning algorithms for detection, and some have also applied various feature selection methods to improve detection efficiency. However, these approaches continue to struggle with false positives and false negatives and have yet to reach the zero-tolerance goal in malware detection. This study introduces the IGCS method, a combined feature selection approach that integrates Information Gain with Chi-Square (χ²) to enhance both the effectiveness and efficiency of machine learning classifiers. Using IGCS, six classifiers (Random Forest, XGBoost, kNN, Decision Tree, Logistic Regression, and Naïve Bayes) achieved higher performance scores than in the other scenarios, in which the classifiers were combined with Information Gain alone, Chi-Square alone, or PCA, or were used without any feature selection. Random Forest with 30 features selected by IGCS proved superior to every other combination of classifier and feature selection method in malware detection, achieving 99.0% accuracy, recall, precision, and F1-Score. This combination was also more efficient, with a 52.5% decrease in training time and a 56.9% decrease in testing time.
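The abstract does not state how IGCS merges the Information Gain and Chi-Square rankings, so the following is only a minimal sketch under one assumed scheme: each feature is scored by both criteria, the two rankings are averaged, and the top k features are kept. The function names (`information_gain`, `chi_square`, `combined_top_k`), the rank-averaging rule, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the conditional entropy of labels given a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for x, fx in feature_counts.items():
        for y, fy in label_counts.items():
            expected = fx * fy / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def ranks(scores):
    """Rank positions (0 = best) for scores where higher is better."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def combined_top_k(columns, labels, k):
    """ASSUMED combination rule: average the two rank positions, keep the best k."""
    ig_ranks = ranks([information_gain(col, labels) for col in columns])
    chi_ranks = ranks([chi_square(col, labels) for col in columns])
    mean_rank = [(a + b) / 2 for a, b in zip(ig_ranks, chi_ranks)]
    return sorted(range(len(columns)), key=lambda i: mean_rank[i])[:k]


# Toy data (hypothetical): 1 = malware, 0 = benign; three binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    [1, 1, 1, 0, 0, 0, 0, 1],  # agrees with the label on 6 of 8 samples
    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
]
selected = combined_top_k(columns, labels, k=2)
```

On this toy data both criteria order the features the same way, so `selected` holds the indices of the first two columns. On real malware features the two rankings can disagree, which is where a combination rule matters; other rules, such as intersecting the two top-k sets, would be equally plausible readings of the abstract.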

Item Type: Article
Uncontrolled Keywords: Malware detection, IGCS, feature selection, Information Gain, Chi-Square
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: School of Computing
Depositing User: Mdm. Rozana Zakaria
Date Deposited: 12 Aug 2025 13:25
Last Modified: 12 Aug 2025 13:25
URI: https://repo.uum.edu.my/id/eprint/32390
