Random Feature Maps for the Itemset Kernel
Although kernel methods efficiently use feature combinations without computing them directly, they do not scale well with the size of the training dataset. Factorization machines (FMs) and related models, on the other hand, enable feature combinations efficiently, but their optimization generally requires solving a non-convex problem. We present random feature maps for the itemset kernel, which uses feature combinations, and includes the ANOVA kernel, the all-subsets kernel, and the standard dot product. Linear models using one of our proposed maps can be used as an alternative to kernel methods and FMs, resulting in better scalability during both training and evaluation. We also present theoretical results for a proposed map, discuss the relationship between factorization machines and linear models using a proposed map for the ANOVA kernel, and relate the proposed feature maps to prior work. Furthermore, we show that the maps can be calculated more efficiently by using a signed circulant matrix projection technique. Finally, we demonstrate the effectiveness of using the proposed maps for real-world datasets..