Choh Man Teng, University of West Florida, USA
Imperfections in data can arise from many sources. The quality of the data is of prime concern to any task that involves data analysis. It is crucial that we have a good understanding of data imperfections and the effects of various noise handling techniques. We study here a number of noise handling approaches, namely, robust algorithms that are tolerant of some amount of noise in the data, filtering that eliminates the noisy instances from the input, and polishing which corrects the noisy instances rather than removing them. We evaluated the performance of these approaches experimentally. The results indicated that in addition to the traditional approach of avoiding overfitting, both filtering and polishing can be viable mechanisms for reducing the negative effects of noise. Polishing in particular showed significant improvement over the other two approaches in many cases, suggesting that even though noise correction adds considerable complexity to the task, it also recovers information not available with the other two approaches.