Learning Adaptive Random Features
Random Fourier features are a powerful framework to approximate shift invariant kernels with Monte Carlo integration, which has drawn considerable interest in scaling up kernel-based learning, dimensionality reduction, and information retrieval. In the literature, many sampling schemes have been proposed to improve the approximation performance. However, an interesting theoretic and algorithmic challenge still remains, i.e., how to optimize the design of random Fourier features to achieve good kernel approximation on any input data using a low spectral sampling rate? In this paper, we propose to compute more adaptive random Fourier features with optimized spectral samples (wj’s) and feature weights (pj’s). The learning scheme not only significantly reduces the spectral sampling rate needed for accurate kernel approximation, but also allows joint optimization with any supervised learning framework. We establish generalization bounds using Rademacher complexity, and demonstrate advantages over previous methods. Moreover, our experiments show that the empirical kernel approximation provides effective regularization for supervised learning.