Xiaohua Hu and Nick Cercone, University of Regina, Canada
Many data mining algorithms developed recently are based on inductive learning methods; very few are based on similarity-based learning. Yet similarity-based learning offers several advantages, such as simple representations for concept descriptions, low incremental learning costs, and small storage requirements. We present a method for similarity-based learning from databases in the context of rough set theory. Previous similarity-based learning methods consider only the syntactic distance between instances and treat all attributes as equally important in the similarity measure. Our method instead uses rough set theory to analyse the attributes in the database and identify those relevant to the task attribute. We also eliminate attributes that are superfluous for the task attribute and weight each relevant attribute according to its significance to the task attribute. Our similarity measure therefore takes into account the semantic information embedded in the database.
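The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea. It assumes the standard rough-set dependency degree γ(C, D), the fraction of tuples in the positive region of the task attribute D. It also assumes that an attribute's significance is the drop in γ when that attribute is removed from the condition set C. Normalising significances into weights and scoring similarity as the summed weight of matching nominal attribute values are further illustrative assumptions. The helper names (`partition`, `dependency`, `significance`, `attribute_weights`, `similarity`) and the toy table are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def dependency(rows, cond, task):
    """Dependency degree gamma(cond, task): share of rows whose
    equivalence class (w.r.t. cond) has a single task value."""
    if not rows:
        return 0.0
    pos = sum(len(block) for block in partition(rows, cond)
              if len({rows[i][task] for i in block}) == 1)
    return pos / len(rows)

def significance(rows, cond, task, a):
    """Assumed significance: drop in gamma when attribute a is removed."""
    rest = [c for c in cond if c != a]
    return dependency(rows, cond, task) - dependency(rows, rest, task)

def attribute_weights(rows, cond, task):
    """Normalise significances to weights; in this sketch an attribute
    whose removal does not lower gamma (dispensable) gets weight 0."""
    sig = {a: significance(rows, cond, task, a) for a in cond}
    total = sum(sig.values())
    if total == 0:
        return {a: 0.0 for a in cond}
    return {a: s / total for a, s in sig.items()}

def similarity(x, y, weights):
    """Weighted match score for nominal attributes (assumed form)."""
    return sum(w for a, w in weights.items() if x[a] == y[a])

# Hypothetical toy table: class c is the XOR of a and b; n is irrelevant.
rows = [
    {"a": 0, "b": 0, "n": "x", "c": 0},
    {"a": 0, "b": 1, "n": "x", "c": 1},
    {"a": 1, "b": 0, "n": "x", "c": 1},
    {"a": 1, "b": 1, "n": "x", "c": 0},
]
weights = attribute_weights(rows, ["a", "b", "n"], "c")
print(weights)                                # a and b share the weight, n gets 0
print(similarity(rows[0], rows[1], weights))  # only attribute a matches
```

On this toy table, removing `a` or `b` destroys the positive region, so each receives half the weight. The constant attribute `n` is dispensable and contributes nothing to similarity, even though it matches on every pair. A purely syntactic measure that weights all attributes equally would count that match.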