CSE-DT Features Selection Technique for Diabetes Classification
Matthew T. Ogedengbe, Charity O. Egbunu
Diabetes has become one of the world's deadliest diseases. It occurs as a result of an increase in the blood sugar level in the body. Most people living with it develop complications in various body organs if it remains undetected and untreated at an early stage. Most studies in the literature treat all features of a diabetes dataset as risk factors when diagnosing diabetes, and this has resulted in low classification accuracy and longer execution time, since every feature in the dataset is involved in the classification process. Selecting the most relevant features as risk factors improves the performance of classifiers in terms of classification accuracy and other performance measures. This paper presents a feature selection technique called Classifier Subset Evaluator (CSE), which selects the most relevant risk factors for the prevalence of diabetes in the body. The selected features (risk factors) were passed to a J48 decision tree (DT) classifier for training and testing, and the DT classified all instances of the dataset based on these selected features. The CSE and DT were hybridized into the proposed Classifier Subset Evaluator Decision Tree (CSE-DT). The CSE-DT was evaluated on the Pima Indian Diabetes dataset (PIDD) acquired from the UCI data repository and implemented in the Waikato Environment for Knowledge Analysis (WEKA). The CSE-DT was compared with Naïve Bayes, Support Vector Machine (SVM) and Decision Tree classifiers in terms of F-Measure, Precision, ROC, Recall and Accuracy. The results show that the CSE-DT attained the best classification accuracy among them, at 81.64%.
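The general idea behind classifier-based subset evaluation can be illustrated with a small sketch: candidate feature subsets are scored by the accuracy of a classifier trained on only those features, and the best-scoring subset is kept. The abstract does not state the search strategy or evaluator settings used in the paper, so the code below rests on assumptions: it uses greedy forward search, a leave-one-out 1-nearest-neighbour classifier as a simple stand-in for J48, and a made-up six-row toy dataset rather than PIDD. The names `loo_accuracy` and `forward_select` are hypothetical and are not WEKA functions.

```python
from typing import Callable, List, Sequence, Tuple


def loo_accuracy(X: Sequence[Sequence[float]], y: List[int],
                 features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the columns listed in `features`.
    Assumption: this is a stand-in scorer, not the J48 tree of the paper."""
    if not features:
        # With no features, fall back to predicting the majority class.
        return max(y.count(c) for c in set(y)) / len(y)
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_j = None, None
        for j, xj in enumerate(X):
            if j == i:
                continue  # leave the query instance out
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_d, best_j = d, j
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_select(n_features: int,
                   score: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Greedy forward search: repeatedly add the single feature that most
    improves the score, and stop when no addition helps."""
    selected: List[int] = []
    best = score(selected)
    while True:
        candidate = None
        for f in range(n_features):
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, candidate = s, f
        if candidate is None:
            return selected, best
        selected.append(candidate)


# Toy data (invented for illustration): column 0 separates the classes,
# column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0],
     [3.0, 4.9], [3.2, 1.1], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]

selected, best = forward_select(2, lambda feats: loo_accuracy(X, y, feats))
print(selected, best)
```

On this toy data the search keeps only column 0, because adding the noisy column does not raise the score. In a setup like the one the abstract describes, the retained columns would then be the only inputs given to the final classifier.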
Affiliation:
- Federal University of Agriculture, Makurdi, Nigeria