Deep Kernel Methods

Since the introduction of Support Vector Machines (SVMs), a considerable amount of work has been devoted to kernel methods. These methods implicitly map the data into a feature space, where the inner products needed by the learning algorithm are computed by means of a kernel function, without ever constructing the mapping explicitly. They are supported by a solid mathematical theory, and training them typically amounts to solving a convex optimization problem. Working in a reproducing kernel Hilbert space (RKHS) offers many advantages in machine learning, such as the possibility of defining powerful and flexible models and of generalizing many results and algorithms originally developed for linear models in Euclidean spaces. The resulting learning algorithms are largely independent of the choice of similarity measure (the kernel), which allows the user to adapt it to the specific problem at hand without reformulating the learning algorithm itself.
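To make the idea of an implicit feature space concrete, the following minimal Python sketch compares the homogeneous degree-2 polynomial kernel k(x, y) = (x · y)² with an explicit feature map φ for two-dimensional inputs. The kernel and the function names are illustrative choices, not taken from the original text; the point is only that the kernel returns the feature-space inner product φ(x) · φ(y) without ever building φ.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2 (illustrative choice)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2

def poly2_features(x):
    """Explicit feature map for 2-D inputs, built so that phi(x) . phi(y) = (x . y)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, y = (1.0, 2.0), (3.0, 4.0)

# Implicit: one dot product in input space, then squaring.
implicit = poly2_kernel(x, y)
# Explicit: map both points to feature space, then take the dot product there.
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))

print(implicit, explicit)
```

For x = (1, 2) and y = (3, 4), both quantities equal 121 up to floating-point rounding. For kernels such as the Gaussian, the corresponding feature space is infinite-dimensional, so the implicit computation is the only practical option.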


However, traditional kernel methods suffer from several problems, most notably their computational complexity, which grows at least quadratically with the sample size. This is due to the need to compute the kernel matrix: for n training samples it contains n × n kernel evaluations, one for every pair of samples, all of which must be computed and usually stored. Thus, kernel methods are very successful on small datasets but, on their own, do not scale well to large ones. This is one of the reasons why deep learning architectures have recently replaced kernel methods in many research areas.
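The quadratic cost can be seen directly by building a kernel matrix. The sketch below assumes a Gaussian (RBF) kernel and NumPy; the function name `rbf_kernel_matrix` and the parameter values are hypothetical. Doubling the number of samples quadruples both the number of kernel evaluations and the memory taken by the matrix.

```python
import numpy as np

def rbf_kernel_matrix(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    # Squared Euclidean distances for all pairs via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
for n in (500, 1_000, 2_000):
    X = rng.standard_normal((n, 10))
    K = rbf_kernel_matrix(X, X, gamma=0.1)
    # K has n * n entries, so its memory footprint grows quadratically with n
    print(n, K.shape, f"{K.nbytes / 1e6:.0f} MB")
```

With 64-bit floats the matrix takes 2 MB, 8 MB and 32 MB for 500, 1,000 and 2,000 samples; extrapolating, 100,000 samples would already need about 80 GB, before any training algorithm touches it.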

Since the end of the last decade, efforts have been made to overcome these shortcomings. One can limit the complexity of the algorithm's decision function, or apply reduction methods that keep only the necessary and relevant information. Two of the best-known and most widely used approaches to the scalability problem are the Nyström method, which computes a low-rank approximation of the original kernel matrix, and the Random Fourier Features method, which finds an explicit, low-dimensional approximation of the feature mapping.
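Both approximations can be sketched in a few lines of NumPy. The code below is a minimal illustration under assumptions that are not part of the original text: a Gaussian kernel k(x, y) = exp(−γ‖x − y‖²), Nyström landmarks chosen uniformly at random, and hypothetical function names and parameter values. The Nyström map uses m landmark points and the eigendecomposition of their m × m kernel matrix, so that Φ Φᵀ = K_nm K_mm⁺ K_mn; Random Fourier Features draw frequencies from the spectral density of the Gaussian kernel, N(0, 2γI), following Bochner's theorem.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Exact Gaussian kernel matrix, exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, m, gamma, rng):
    """Nystrom feature map Phi (n x k, k <= m) such that Phi @ Phi.T ~= K.

    Landmarks are sampled uniformly at random (one common choice among several).
    """
    idx = rng.choice(X.shape[0], size=m, replace=False)
    landmarks = X[idx]
    K_mm = rbf_kernel(landmarks, landmarks, gamma)   # m x m
    K_nm = rbf_kernel(X, landmarks, gamma)           # n x m
    # Pseudo-inverse square root of K_mm via its eigendecomposition,
    # discarding numerically negligible eigenvalues
    eigvals, eigvecs = np.linalg.eigh(K_mm)
    keep = eigvals > 1e-8 * eigvals.max()
    return K_nm @ eigvecs[:, keep] / np.sqrt(eigvals[keep])

def rff_features(X, D, gamma, rng):
    """Random Fourier Features Z (n x D) such that Z @ Z.T ~= K for the RBF kernel.

    Frequencies come from the kernel's spectral density N(0, 2 * gamma * I).
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
gamma = 0.1
K = rbf_kernel(X, X, gamma)  # exact matrix, built here only to measure the error

Phi = nystrom_features(X, m=100, gamma=gamma, rng=rng)
Z = rff_features(X, D=2000, gamma=gamma, rng=rng)

def rel_err(A):
    """Relative Frobenius-norm error of an approximation A of K."""
    return np.linalg.norm(K - A) / np.linalg.norm(K)

print("Nystrom relative error:", rel_err(Phi @ Phi.T))
print("RFF relative error:    ", rel_err(Z @ Z.T))
```

The exact matrix K is built here only to measure the approximation error. In a real large-scale setting, a linear model would be trained directly on Φ (n × k) or Z (n × D), so that the cost grows linearly with n instead of quadratically.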


Deep learning has already proven useful in mitigating the drawbacks of kernel methods. Hierarchical architectures can be designed to obtain optimal kernel compositions, to preprocess the inputs of a kernel machine, or to approximate the kernel computation in order to lighten its computational cost. Our work investigates which interactions and strategies between deep learning and kernel methods are best suited to obtaining effective and efficient kernel methods that compete on par with deep learning.
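One of the interactions mentioned above, using a network to preprocess the inputs of a kernel machine, can be sketched as a kernel evaluated on learned features, k(x, y) = exp(−γ‖g(x) − g(y)‖²), where g is a neural network. In the sketch below g is a hypothetical small tanh MLP with random, untrained weights; in an actual model its weights would be fitted together with the kernel machine. Because composing a valid kernel with any feature map yields a valid kernel, the resulting Gram matrix stays symmetric positive semi-definite, which the last line checks numerically.

```python
import numpy as np

def mlp_features(X, weights):
    """Hypothetical feature extractor g: a small MLP with tanh activations.

    The weights are random here, purely for illustration; they are not trained.
    """
    H = X
    for W, b in weights:
        H = np.tanh(H @ W + b)
    return H

def deep_rbf_kernel(X, Y, weights, gamma=1.0):
    """k(x, y) = exp(-gamma * ||g(x) - g(y)||^2), with g the MLP above."""
    FX, FY = mlp_features(X, weights), mlp_features(Y, weights)
    sq = np.sum(FX**2, 1)[:, None] + np.sum(FY**2, 1)[None, :] - 2.0 * FX @ FY.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
layer_sizes = [4, 16, 8]  # input dim 4 -> hidden 16 -> feature dim 8
weights = [
    (rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
    for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
]

X = rng.standard_normal((200, 4))
K = deep_rbf_kernel(X, X, weights, gamma=0.5)
# A valid kernel composed with any feature map is still a valid kernel, so K
# should be symmetric positive semi-definite up to round-off.
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())
```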


Iván Yesid Castellanos Martínez

Santiago Toledo Cortes


Toledo-Cortés, S., Castellanos-Martinez, I. Y., & Gonzalez, F. A. (2018). Large Scale Learning Techniques for Least Squares Support Vector Machines. In Iberoamerican Congress on Pattern Recognition (pp. 3–11).