*Nathan D. Ratliff, J. Andrew Bagnell*

We propose a novel variant of the conjugate gradient algorithm, Kernel Conjugate Gradient (KCG), designed to speed up learning for kernel machines with differentiable loss functions. This approach leads to a better-conditioned optimization problem during learning. We establish an upper bound on the number of iterations for KCG indicating that it should require fewer than the *square root* of the number of iterations that standard conjugate gradient requires. In practice, for various differentiable kernel learning problems, we find that KCG consistently and significantly outperforms existing techniques. The algorithm is simple to implement, requires no more computation per iteration than standard approaches, and is well motivated by Reproducing Kernel Hilbert Space (RKHS) theory. We further show that data-structure techniques recently used to speed up kernel machine approaches are well matched to the algorithm, reducing the dominant costs of training: function evaluation and RKHS inner product computation.
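The abstract does not give the update rules, but the core idea it describes, running nonlinear conjugate gradient on the kernel-machine coefficients while measuring gradients and conjugacy with the RKHS inner product ⟨u, v⟩ = uᵀKv, can be sketched roughly as below. This is an illustrative Python sketch under assumed details (a regularized training objective, Polak-Ribière direction updates, a backtracking line search, and an RBF-kernel toy example), not the authors' exact algorithm.

```python
import numpy as np

def kernel_conjugate_gradient(K, y, loss, loss_grad, lam=1e-2,
                              max_iters=200, tol=1e-8):
    """Fit coefficients a of f(x) = sum_i a_i k(x_i, x) by minimizing
    R(a) = sum_i loss(y_i, (K a)_i) + (lam / 2) * a^T K a
    with nonlinear conjugate gradient, where gradients and conjugacy are
    measured with the RKHS inner product <u, v> = u^T K v (assumed setup)."""
    n = K.shape[0]
    a = np.zeros(n)

    def objective(a):
        return loss(y, K @ a).sum() + 0.5 * lam * a @ (K @ a)

    def func_grad(a):
        # coefficient vector of the functional (RKHS) gradient of R
        return loss_grad(y, K @ a) + lam * a

    g = func_grad(a)
    d = -g
    for _ in range(max_iters):
        if g @ (K @ g) < tol:            # squared RKHS norm of the gradient
            break
        # backtracking (Armijo) line search along the direction d
        obj = objective(a)
        slope = g @ (K @ d)              # directional derivative of R along d
        step = 1.0
        while objective(a + step * d) > obj + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        a_new = a + step * d
        g_new = func_grad(a_new)
        # Polak-Ribiere coefficient, computed in the RKHS inner product
        beta = max(0.0, (g_new @ (K @ (g_new - g))) / (g @ (K @ g)))
        d = -g_new + beta * d
        a, g = a_new, g_new
    return a


# Toy usage: squared loss (kernel ridge regression) with an RBF kernel.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq_dists)
    alpha = kernel_conjugate_gradient(
        K, y,
        loss=lambda y, f: 0.5 * (f - y) ** 2,
        loss_grad=lambda y, f: f - y,
    )
    print("training RMSE:", np.sqrt(np.mean((K @ alpha - y) ** 2)))
```

Note that the only changes relative to a standard nonlinear conjugate gradient loop are the use of the functional-gradient coefficients and the K-weighted inner products in the line search, restart test, and Polak-Ribière update; the per-iteration cost is dominated by kernel-matrix products, matching the abstract's point about function evaluation and RKHS inner product computation.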

*Subjects:* 12. Machine Learning and Discovery; 9.2 Computational Complexity

*Submitted:* Sep 28, 2006
