AAAI Publications, The Twenty-Sixth International FLAIRS Conference

Classification Performance of Rank Aggregation Techniques for Ensemble Gene Selection
David J. Dittman, Taghi M. Khoshgoftaar, Randall Wald, Amri Napolitano

Last modified: 2013-05-19


Ensemble gene (feature) selection is a very promising tool for data mining and bioinformatics. It consists of performing multiple runs of feature selection and then aggregating the results into a final ranked list. A central question, however, is how to aggregate the individual results into a single ranked feature list. A number of techniques are available, ranging from simple to complex, and it is unclear which one to choose. This paper is a comprehensive study of nine rank aggregation techniques for building classification models that use gene microarray data to distinguish between cancerous and non-cancerous cells (or between patients who did or did not respond well to cancer treatment). The techniques are tested using an ensemble with twenty-five feature selection techniques and fifty iterations, along with eleven bioinformatics datasets and five learners. Our results show that Lowest Rank is the worst-performing aggregation technique by a clear margin. The other techniques perform similarly well, so a simple technique (e.g., Mean aggregation) is preferable given its lower computation time and the limited possible benefit of a more complex technique. To our knowledge, there has never been a study this intensive on the classification abilities of rank aggregation techniques in the field of bioinformatics.
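
To make the aggregation step concrete, the following is a minimal sketch of how several ranked gene lists might be combined into one. It is not the authors' implementation. The function name `aggregate_ranks`, the gene labels, and the toy rankings are all hypothetical, and the abstract does not define Lowest Rank; the sketch assumes it means each feature is scored by its worst (numerically largest) position across runs, while Mean aggregation scores each feature by its average position.

```python
from statistics import mean


def aggregate_ranks(rankings, combine):
    """Combine several ranked feature lists into one ranked list.

    rankings: list of lists, each ordering the same features best-first.
    combine:  function mapping a feature's positions (1 = best) to a score;
              lower scores rank higher in the aggregated list.
    """
    positions = {feature: [] for feature in rankings[0]}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            positions[feature].append(position)
    scores = {f: combine(p) for f, p in positions.items()}
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda f: (scores[f], f))


# Hypothetical output of three feature selection runs over four genes.
runs = [
    ["g1", "g2", "g3", "g4"],
    ["g1", "g2", "g3", "g4"],
    ["g4", "g2", "g3", "g1"],
]

mean_order = aggregate_ranks(runs, mean)   # Mean aggregation
worst_order = aggregate_ranks(runs, max)   # assumed reading of Lowest Rank
```

In this toy case the two rules disagree: under the assumed Lowest Rank rule, a single poor placement of `g1` in the third run pushes it below `g2` and `g3`, whereas Mean aggregation keeps it at the top. This only illustrates how the choice of aggregation rule can change the final list; it says nothing about the classification results reported in the paper.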
