Using LucidWorks on Windows Azure (Part 1 of a multi-part MS Open Tech series)
LucidWorks Search on Windows Azure delivers a high-performance search service based on Apache Lucene/Solr open source indexing and search technology. This service enables quick and easy provisioning of Lucene/Solr search functionality on Windows Azure without any need to manage and operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.
In June, we shared an overview of the LucidWorks Search service for Windows Azure. For this post, the first in a series, we’ll cover a few of the concepts you need to know to get the most out of the LucidWorks search service on Windows Azure. In future posts we’ll show you how to set up a LucidWorks service on Windows Azure and demonstrate how to integrate search with Web sites, unstructured data and structured data.
Options for Developers
Developers can add search to their existing Web Sites, or create a new Windows Azure Web site with search as a central function. For example, in future posts in this series, we’ll create a simple Windows Azure web site that will use the LucidWorks search service to index and search the contents of other Web sites. Then we’ll enable search from the same demo Web site against a set of unstructured data and MySQL structured data in other locations.
Overview: Documents, Fields, and Collections
LucidWorks creates an index of unstructured and structured data. Any individual item that is indexed and/or searched is called a Document. Documents can be a row in a structured data source or a file in an unstructured data source, or anything else that Solr/Lucene understands.
An individual item in a Document is called a Field. Same concept – fields can be columns of data in a structured source or a word in an unstructured source, or anything in between. Fields are generally atomic, in other words they cannot be broken down into smaller items.
LucidWorks calls groups of Documents that can be managed and searched independently of each other Collections. Searching, by default is on one collection at a time, but of course programmatically a developer can create search functionality that returns results for more than one Collection.
Security via Collections and Filters
Collections are a great way to restrict a group of users, controlled by access to Windows Azure Web sites and by LucidWorks. In addition, LucidWorks Admins can create Filters inside a Collection. User identity can be integrated with an existing LDAP directory, or managed programmatically via API.
LucidWorks additional Features
LucidWorks adds value to Solr/Lucene with some very useful UI enhancements that can be enabled without programming.
Persistent Queries and Alerts, Auto-complete, spellcheck and similar terms.
Users can create their own persistent queries. Search terms are automatically monitored and Alerts are delivered to a specified email address using the Name of the alert as the subject line. You can also specify how often the persistent query should check for new data and how often alerts are generated.
Search term Typeahead can be enabled via LucidWorks’ auto-complete functionality. Auto-complete tracks the characters the user has already entered and displays terms that start with those characters.
When results re displayed, LucidWorks can spell-check queries and offer alternative terms based on similar spellings of words and synonyms in the query.
Stopwords
Search engines use Stopwords to remove common words from queries and query indexes like “a”, “and”, or “for” that add no value to searches. LucidWorks has an editable list of Stopwords that is a great start to increase search relevance.
Increasing Relevance with Click Scoring
Click scoring tracks common queries and query results and tracks which results are most often selected against query terms and scores relevance based on the comparison results. Results with a higher relevance are placed higher in search result rankings, based on user activity.
LucidWorks on Windows Azure – Easy Deployment
The best part of LucidWorks is how easily Enterprise Search can be added as a service. In our next LucidWorks blog post we’ll cover how to quickly get up and running with Enterprise search by adding a LucidWorks service to an existing Windows Azure Web site.