Feature selection methods: case of filter and wrapper approaches for maximising classification accuracy
Wah, Yap Bee¹; Nurain Ibrahim²; Hamzah Abdul Hamid³; Shuzlina Abdul-Rahman⁴; Fong, Simon⁵
Feature selection has been widely applied in many areas, such as the classification of spam emails, cancer cells, fraudulent claims and credit risk, text categorisation, and DNA microarray analysis. Classification involves building a model that predicts a target variable from several input variables (features). This study compares filter and wrapper feature selection methods with the aim of maximising classifier accuracy. Logistic regression was used as the classifier, and the feature selection methods were evaluated on classification accuracy, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the area under the receiver operating characteristic curve (AUC), and the sensitivity and specificity of the classifier. The simulation study generated data with continuous features and one binary dependent variable for different sample sizes. The filter methods were correlation-based feature selection and information gain; the wrapper methods were sequential forward selection and sequential backward elimination. The simulation was carried out in R, an open-source programming language. The simulation results showed that the wrapper methods (sequential forward selection and sequential backward elimination) were better than the filter methods at selecting the correct features.
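To make the wrapper idea in the abstract concrete, the following is a minimal sketch of sequential forward selection around a logistic regression classifier on simulated data. It is not the authors' code: the study used R, whereas this sketch uses Python's standard library only, and the sample size, number of features, true coefficients, random seed and the AIC-based stopping rule are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, cols, iters=15):
    # Newton-Raphson fit on an intercept plus the chosen feature columns.
    # Returns the coefficients and the maximised log-likelihood.
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    d = len(cols) + 1
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        # Tiny ridge on the diagonal keeps the linear solve well conditioned.
        hess = [[1e-8 if a == b else 0.0 for b in range(d)] for a in range(d)]
        for r, yi in zip(rows, y):
            p = sigmoid(sum(bj * rj for bj, rj in zip(beta, r)))
            w = p * (1 - p)
            for a in range(d):
                grad[a] += (yi - p) * r[a]
                for b in range(d):
                    hess[a][b] += w * r[a] * r[b]
        step = solve(hess, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
    ll = 0.0
    for r, yi in zip(rows, y):
        p = sigmoid(sum(bj * rj for bj, rj in zip(beta, r)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return beta, ll

def aic(X, y, cols):
    # AIC = 2k - 2 log L, where k counts the intercept and selected features.
    _, ll = fit_logistic(X, y, cols)
    return 2 * (len(cols) + 1) - 2 * ll

def forward_select(X, y):
    # Greedy wrapper: repeatedly add the feature that lowers AIC the most,
    # and stop when no remaining feature improves it (assumed stopping rule).
    selected, remaining = [], list(range(len(X[0])))
    best = aic(X, y, selected)
    while remaining:
        score, j = min((aic(X, y, selected + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return sorted(selected), best

# Simulated data (all values are illustrative assumptions): five continuous
# features, of which only the first two influence the binary outcome.
random.seed(0)
n, p = 300, 5
true_beta = [1.5, -1.5, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(sum(b * v for b, v in zip(true_beta, x))) else 0
     for x in X]

selected, score = forward_select(X, y)
print("selected feature indices:", selected, "AIC:", round(score, 2))
```

A filter method would instead rank each feature by a score computed without fitting the classifier (for example, information gain), which is cheaper but does not evaluate features jointly. Sequential backward elimination follows the same pattern as `forward_select` but starts from all features and removes one at a time.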
Affiliation:
- Universiti Teknologi MARA, Malaysia
- Universiti Teknologi MARA, Malaysia
- Universiti Malaysia Perlis, Malaysia
- Universiti Teknologi MARA, Malaysia
- University of Macau, Macau