Classical online learning techniques induce model sparsity by imposing a prior distribution on the objective being optimized. Such priors are typically chosen for mathematical convenience rather than because they are the best priors for the problem. The Minimum Description Length (MDL) principle, in turn, is usually applied with two-pass strategies: one pass for feature selection, and a second for optimization with the selected features.
An approach inspired by the Minimum Description Length principle is proposed for adaptively selecting and regularizing features during online learning, based on their usefulness in improving the objective. The approach eliminates noisy or useless features from the optimization process, reducing the loss. By utilizing the MDL principle, it enables an optimizer to reduce the problem dimensionality to the subspace of the feature space for which the smallest loss is obtained. The approach can be tuned to trade off between model sparsity and accuracy. Empirical results on large-scale real-world systems demonstrate how it improves such tradeoffs: huge model size reductions can be achieved with no loss in performance relative to standard techniques, while moderate loss improvements (which can translate to large regret improvements) are achieved with moderate size reductions. The results also demonstrate that overfitting is mitigated by this approach. Analysis shows that the approach can achieve the loss of optimizing with the best feature subset.
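As a rough, hypothetical illustration of this kind of mechanism (not the algorithm proposed here), the Python sketch below tracks, for each feature of an online logistic-regression model, the cumulative sequential log-loss saved by including it, and keeps a feature active only while that saving exceeds a tunable per-feature coding penalty. The class name, the logistic-loss setup, and the codelength_penalty parameter are assumptions made purely for illustration.

```python
import math


class MDLOnlineLogistic:
    """Illustrative sketch: online logistic regression in which a feature is
    used only while the codelength it saves exceeds the cost of describing it.

    Per-feature savings are measured as the cumulative difference in sequential
    log-loss (in nats) between predicting with the feature's learned weight and
    predicting without it.  codelength_penalty tunes sparsity vs. accuracy.
    """

    def __init__(self, dim, lr=0.1, codelength_penalty=2.0):
        self.w = [0.0] * dim               # per-feature weights (learned even while inactive)
        self.saving = [0.0] * dim          # cumulative codelength saved by including each feature
        self.lr = lr
        self.penalty = codelength_penalty  # cost of describing one active feature

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    @staticmethod
    def _logloss(p, y):
        eps = 1e-12
        return -math.log(max(p if y == 1 else 1.0 - p, eps))

    def _active(self, i):
        # A feature is used only once its accumulated saving pays its coding cost.
        return self.saving[i] > self.penalty

    def predict(self, x):
        """x is a sparse dict {feature_index: value}."""
        z = sum(self.w[i] * v for i, v in x.items() if self._active(i))
        return self._sigmoid(z)

    def update(self, x, y):
        """One online step on example (x, y) with label y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            # Score of the currently active features, excluding feature i.
            z_base = sum(self.w[j] * u for j, u in x.items()
                         if j != i and self._active(j))
            loss_excl = self._logloss(self._sigmoid(z_base), y)
            loss_incl = self._logloss(self._sigmoid(z_base + self.w[i] * v), y)
            # Positive saving: including the feature would have shortened the code.
            self.saving[i] += loss_excl - loss_incl
            # Plain SGD step on the weight, so an inactive feature can still
            # learn and later earn its way into the model.
            self.w[i] -= self.lr * (p - y) * v
        return self._logloss(p, y)
```

In this sketch, raising codelength_penalty yields sparser models (fewer features ever become active) at some cost in loss, while lowering it recovers a standard online learner; this is only meant to convey the single-pass, per-feature selection idea, not the actual regularization used in the proposed approach.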
Gil Shamir received the B.Sc. (Cum Laude) and M.Sc. degrees from the Technion - Israel Institute of Technology, Haifa, Israel, in 1990 and 1997, respectively, and the Ph.D. degree from the University of Notre Dame, Notre Dame, IN, U.S.A., in 2000, all in electrical engineering.
His main research interests include information theory, machine learning, and coding and communication theory.
Dr. Shamir received an NSF CAREER award in 2003.