Abstract:
Dimensionality reduction, for example by feature selection, is an essential technique in data science for handling high-dimensional data such as cancer microarray samples. Cancer microarray experiments typically measure a very large number of features, called genes, on only a small number of samples. Many of these genes are redundant or irrelevant and can therefore be removed without much loss of information. This combination of few samples and many genes is the central difficulty in microarray data analysis. In this study, a new machine learning method, hybrid wrapper-filter feature selection, is proposed for gene selection. The approach combines the genes selected by a filter method with those selected by a wrapper method. It then applies a least-priority feature elimination procedure, in which the genes with the lowest priority are removed. The proposed technique is validated and evaluated on two microarray data sets: leukemia and colon cancer. Using the genes selected by the proposed method, the leukemia samples are classified perfectly (100% accuracy), and the colon cancer samples are classified with only two misclassifications (90.5% accuracy). The results also show that the proposed technique is highly efficient in computation time.
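The abstract does not specify which filter score, wrapper classifier, combination rule, or priority measure the method uses. The sketch below therefore only illustrates the general shape of a hybrid wrapper-filter pipeline under stated assumptions. It uses a signal-to-noise filter score, a greedy forward-selection wrapper scored by leave-one-out accuracy of a nearest-centroid classifier, and the union of the two gene sets. Priority is taken to be the filter score, and the lowest-priority gene is dropped while accuracy does not fall. All function names (`snr_scores`, `loo_accuracy`, `forward_select`, `hybrid_select`) and the toy data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence

Matrix = List[List[float]]

def snr_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """Assumed filter score per gene: |mean0 - mean1| / (sd0 + sd1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (spread + 1e-9))
    return scores

def loo_accuracy(X: Matrix, y: Sequence[int], genes: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes` (assumed wrapper criterion)."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        centroids: Dict[int, List[float]] = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.mean(r[g] for r in rows) for g in genes]
        dist = {label: sum((X[i][g] - c) ** 2 for g, c in zip(genes, cen))
                for label, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def forward_select(X: Matrix, y: Sequence[int], max_genes: int) -> List[int]:
    """Greedy forward selection: add the gene that most improves LOO accuracy."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best = 0.0
    while remaining and len(selected) < max_genes:
        scored = [(loo_accuracy(X, y, selected + [g]), g) for g in remaining]
        acc, g = max(scored, key=lambda t: (t[0], -t[1]))  # ties -> lowest index
        if acc <= best:
            break
        selected.append(g)
        remaining.remove(g)
        best = acc
    return selected

def hybrid_select(X: Matrix, y: Sequence[int], k_filter: int = 2, max_wrapper: int = 2) -> List[int]:
    """Union of filter and wrapper genes, then least-priority elimination (assumed rules)."""
    scores = snr_scores(X, y)
    filter_set = set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k_filter])
    wrapper_set = set(forward_select(X, y, max_wrapper))
    # Order the combined genes by priority (here: filter score), highest first.
    genes = sorted(filter_set | wrapper_set, key=lambda j: -scores[j])
    base = loo_accuracy(X, y, genes)
    # Repeatedly drop the lowest-priority gene as long as accuracy does not decrease.
    while len(genes) > 1:
        trial = genes[:-1]
        acc = loo_accuracy(X, y, trial)
        if acc < base:
            break
        genes, base = trial, acc
    return genes

# Hypothetical toy data: 6 samples x 4 genes, two classes.
X = [
    [1.0, 3.0, 2.0, 0.0],
    [1.1, 1.0, 2.5, 5.0],
    [0.9, 2.0, 1.5, 2.5],
    [5.0, 2.0, 3.0, 2.5],
    [5.1, 3.0, 3.5, 0.0],
    [4.9, 1.0, 2.5, 5.0],
]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(hybrid_select(X, y))
```

On this toy data, gene 0 separates the classes on its own, so the elimination step removes the weaker filter-selected gene and keeps only gene 0. The actual method may define priority and the stopping rule differently.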