Powerset team resumes HBase contributions
by Bryan Kirschner on October 14, 2008 06:57pm
It has been just two months since Microsoft finalized the acquisition of Powerset, a San Francisco-based search and natural language company. Powerset's goal is to "change the way humans interact with computers through language", improving search by indexing Web pages based on the meaning expressed in them rather than just the literal words.
Collaboration between the Powerset team and their new colleagues in Live Search has already resulted in some integration projects: Freebase Answers, improved captions for Wikipedia results, and new related searches using the Factz engine.
Applying Powerset's technology to Live Search will help it surface the most relevant information more quickly, improving the end-user experience. The Powerset acquisition is an important part of Live Search's strategy. HBase is key to Powerset's ongoing success, and it will open more opportunities for other Live Search projects and for the broader community to move the whole web forward.
But what's especially notable is that the Powerset team has resumed contributions to HBase, an open-source, column-oriented, distributed database written in Java. The contributions relate to infrastructural storage technology that enables large-scale data processing.
HBase, an important component of Powerset's development, is developed as part of the Apache Software Foundation's Hadoop project. It runs on top of the Hadoop Distributed File System and provides BigTable-type capabilities. (HBase started as a contribution to Hadoop before becoming a full sub-project of Hadoop in January 2008.)
For the past year and a half, Powerset has sponsored two full-time developers to work on HBase: Michael Stack and Jim Kellerman, who are also on the Hadoop Project Management Committee. Through the continued work of these developers, Microsoft will help improve HBase, which receives significant lift from the active community that supports the project.
Technology companies and communities have always collaborated (see this great research overview). There are some great examples in the past of Microsoft being a creative, agile leader - one of my favorites being the Most Valuable Professional (MVP) Program, which had its origin in organic, outside-in cooperation:
"Way back in the dark ages, Microsoft provided a great deal of technical support on CompuServe. The CompuServe FoxPro forum was extremely busy and Calvin Hsia, then an independent developer, now Developer Lead on the Fox team, created what we called "Calvin's List." It was a listing of the number of postings by person, including info on both messages sent and received. ...As the story goes, some of the Microsoft people jumped on Calvin's List as a way to identify high contributors, and thus was born the MVP program."
But if you look at how open source in particular has changed the industry from 1998 onward, as other vendors figured out ways to interact with open source, we simply haven't been the first, the fastest, or the most creative.
That history is a fact of life. But so are the lessons from studying what happens when firms and communities find ways to work together (I have a small quibble with the choice of title, but not with the main point, of A man on the inside: Unlocking communities as complementary assets; a "woman on the inside" would be just as effective...).
The conclusion is unambiguous: there are mutual opportunities that come from openness to working together. We're just scratching the surface on the range of opportunities for Microsoft to participate in and contribute to open source communities in ways that are good for customers, good for communities - and good for business.
The next ten years of software will also be a time of growth and change, in which open source and Microsoft communities will grow together. So it is exciting to see the contribution to HBase join contributions to ADOdb, a popular data access layer for PHP used by many applications (this was Microsoft's first code contribution to PHP projects, but not the last), and OpenPegasus, an important part of System Center's new cross-platform approach.
But it is not unexpected, and others will follow.