Amir Globerson and Naftali Tishby, The Hebrew University
Finding effective low-dimensional features from empirical co-occurrence data is one of the most fundamental problems in machine learning and complex data analysis. One principled approach to this problem is to represent the data in low dimension with minimal loss of the information contained in the original data. In this paper we present a novel information-theoretic principle and algorithm for extracting low-dimensional representations, or feature vectors, that capture as much of the mutual information between the variables as possible. Unlike previous work in this direction, we do not cluster or quantize the variables, but rather extract continuous feature functions directly from the co-occurrence matrix, using a convergent iterative projection algorithm. The obtained features serve, in a well-defined way, as approximate sufficient statistics that capture the information in a joint sample of the variables. Our approach is both simpler and more general than clustering or mixture models and is applicable to a wide range of problems, from document categorization to bioinformatics and the analysis of neural codes.
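To make concrete the quantity that the extracted features are meant to preserve, the following minimal sketch computes the mutual information of the empirical joint distribution obtained by normalizing a co-occurrence count matrix. It is an illustration only and not the projection algorithm of this paper; the function name `mutual_information` and the two toy matrices are hypothetical and chosen for the example.

```python
import math


def mutual_information(counts):
    """Mutual information, in bits, of the empirical joint distribution
    obtained by normalizing a co-occurrence count matrix.

    `counts` is a list of rows; counts[i][j] is the number of times
    value i of the first variable co-occurred with value j of the second.
    (Hypothetical helper for illustration, not the paper's algorithm.)
    """
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("co-occurrence matrix must contain positive counts")

    row_sums = [sum(row) for row in counts]          # marginal counts of X
    col_sums = [sum(col) for col in zip(*counts)]    # marginal counts of Y

    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # zero cells contribute nothing (0 * log 0 := 0)
                # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in counts
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


# Toy co-occurrence matrices (assumed data, for illustration only).
independent = [[10, 20], [20, 40]]   # rows are proportional: no dependence
dependent = [[30, 0], [0, 30]]       # each row determines the column

print(mutual_information(independent))
print(mutual_information(dependent))
```

In the first toy matrix the rows are proportional, so the two variables are independent and carry no mutual information; in the second, each variable determines the other. A low-dimensional representation in the sense described above would aim to retain as much of this quantity as possible.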