Vowpal Wabbit for Fast Learning
This blog post is authored by John Langford, Principal Researcher at Microsoft Research, New York City. Vowpal Wabbit is an open source machine learning (ML) system sponsored by Microsoft. VW is the essence of speed in machine learning, able to learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine network interface when doing linear learning, a first amongst learning algorithms.
The name has three references: the vorpal blade of Jabberwocky, the rabbit of Monty Python, and Elmer Fudd, who hunted the wascally wabbit throughout my childhood.
VW sees use inside Microsoft for ad relevance and other natural-language-related tasks. Its external footprint is quite large, with known applications across a broad spectrum of companies including Amazon, American Express, AOL, Baidu, eHarmony, Facebook, FTI Consulting, GraphLab, IBM, Twitter, Yahoo! and Yandex.
Why? Several tricks started with or are unique to VW:
VW supports online learning and optimization by default. Online learning is an old approach which is becoming much more common. Various alterations to standard stochastic gradient descent make the default rule more robust across many datasets, and progressive validation allows debugging learning applications in sub-linear time.
VW does feature hashing, which allows learning from dramatically rawer representations, reducing the need to preprocess data, speeding execution, and sometimes even improving accuracy.
The conjunction of online learning and feature hashing implies the ability to learn from any amount of information via network streaming. This makes the system a reliable baseline tool.
VW has also been parallelized to be the most scalable public ML algorithm, as measured by the quantity of data effectively learned from.
VW has a reduction stack which allows the basic core learning algorithm to address many advanced problem types, such as cost-sensitive multiclass classification. Some of these advanced problem types, such as those for interactive learning, exist only in VW.
There is more, of course, but the above gives you the general idea – VW has several advanced designs and technologies which make it particularly compelling for some applications.
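To make the first two tricks concrete, here is a minimal sketch in plain Python of how feature hashing, online learning, and progressive validation fit together. It is not VW's code: the `OnlineLogistic` class, the 18-bit table size, the learning rate, and the use of `zlib.crc32` as the hash are all assumptions chosen for illustration, and the update is plain stochastic gradient descent without any of the alterations mentioned above.

```python
import math
import zlib

NUM_BITS = 18                 # assumed table size: 2**18 weights
MASK = (1 << NUM_BITS) - 1


def hash_features(tokens):
    """Map raw string tokens straight to indices in a fixed-size weight table.

    crc32 is a deterministic stand-in here, not the hash VW uses.
    """
    return [zlib.crc32(t.encode("utf-8")) & MASK for t in tokens]


class OnlineLogistic:
    """Hypothetical online logistic regression over hashed features."""

    def __init__(self, learning_rate=0.5):
        self.weights = [0.0] * (1 << NUM_BITS)
        self.learning_rate = learning_rate
        self.examples_seen = 0
        self.progressive_loss = 0.0

    def predict(self, indices):
        score = sum(self.weights[i] for i in indices)
        score = max(-30.0, min(30.0, score))   # keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, tokens, label):
        """Process one example; label is 0 or 1."""
        indices = hash_features(tokens)
        p = self.predict(indices)

        # Progressive validation: score the example BEFORE learning from it,
        # so the running average acts like a held-out estimate.
        eps = 1e-12
        self.progressive_loss -= (
            label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
        )
        self.examples_seen += 1

        # Plain SGD step on the logistic loss.
        gradient = p - label
        for i in indices:
            self.weights[i] -= self.learning_rate * gradient
        return p

    def average_progressive_loss(self):
        return self.progressive_loss / max(1, self.examples_seen)


# Toy stream: each example is seen once, in order, and never stored.
stream = [
    ("cheap pills buy now".split(), 1),
    ("meeting agenda for monday".split(), 0),
] * 50

model = OnlineLogistic()
for tokens, label in stream:
    model.learn(tokens, label)

print("average progressive loss:", model.average_progressive_loss())
```

Memory is fixed by the table size no matter how many distinct tokens arrive, and no dictionary of feature names is ever built. That is what lets this style of learner consume a stream of unbounded length.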
In terms of deployment, VW runs as a library or a standalone daemon, but Microsoft Azure ML creates the possibility of cloud deployment. Imagine operationalizing a learned model for traffic from all over the world in seconds. Azure ML presently exposes the feature hashing capability inside of VW via a module of the same name.
What of the future? We hunt the Wascally Wabbit. Technology-wise, I intend to use VW to experiment with other advanced ML ideas:
Can ML be made radically easier to program? https://arxiv.org/pdf/1406.1837
Is efficient exploration possible? https://arxiv.org/pdf/1402.0555
Can we predict one of k things efficiently in logarithmic time? https://arxiv.org/pdf/1406.1822
Good answers to these questions can change the scope of future ML applications wadically.
John
Follow my personal blog here.
Comments
- Anonymous
August 14, 2014
Arshak of Argyle Data is giving a workshop on Vowpal Wabbit in SF: www.eventbrite.com/e/terafeature-machine-learning-no-coding-required-tickets-11753600335
- Anonymous
August 16, 2014
Aha I always thought it was also a reference to the entity Rabbit in Vernor Vinge's Rainbows End. (Who it is strongly hinted is an AI)
- Anonymous
August 18, 2014
Compelling for what applications please. Very exciting!
- Anonymous
October 02, 2014
This post is authored by Sudarshan Raghunathan, Principal Development Lead for modules in the Microsoft
- Anonymous
October 14, 2014
This post is authored by Dhruv Mahajan, Sundararajan Sellamanickam and Keerthi Selvaraj, Researchers
- Anonymous
December 30, 2014
We launched this blog in June 2014 with the intent of sharing important advances and practical knowledge