FAST Search Content SSA Architecture Demystified

I was recently working with a large enterprise customer trying to understand how to design their FAST Search Content SSA architecture to handle their indexing throughput, indexing priority, and high-availability requirements for index freshness. Prior to this engagement, this element of the FAST Search architecture was largely a black box to me. From what I understood, all you needed to do was create a content SSA, define your data sources, and let the system do its thing. Even in the large-content scenario for FAST Search, the primary reason for multiple crawler components was to accommodate the throughput limitations of a single network interface; it wasn't as though we would scale the "content SSA" itself to accommodate our content throughput requirements. After doing some surface-level research, I decided to dig deeper.

The FAST Search Content SSA is really just a logical, or abstract, component that encompasses all of the crawler components and the search service administration component. So, in order to scale and create a highly available content SSA, we should look at those underlying components.
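
If you want to see those pieces for yourself, the SharePoint 2010 management shell will enumerate them. A quick sketch (the SSA name here is a placeholder for whatever you called yours):

    # Get the FAST Search Content SSA by name.
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"

    # The single search administration component behind the SSA.
    Get-SPEnterpriseSearchAdministrationComponent -SearchApplication $ssa

    # The crawl components in the currently active crawl topology.
    $topology = Get-SPEnterpriseSearchCrawlTopology -SearchApplication $ssa -Active
    Get-SPEnterpriseSearchCrawlComponent -CrawlTopology $topology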

The search service administration component is primarily just an abstraction on top of the search service configuration database. Unfortunately, the search service admin component does not provide any automatic failover mechanisms. If the server hosting the service goes down, so does your feeding (but not your search). However, it's just a few quick lines of PowerShell to bring it back up... and a backup server to bring it back up on, of course. 
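
For reference, those few quick lines look something like the following. This is a sketch that assumes a standby application server (here called APP02, a placeholder) that is already running the SharePoint Server Search service instance:

    # Grab the Content SSA (use whatever name you gave it at creation).
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"

    # Grab the search service instance on the standby server.
    $instance = Get-SPEnterpriseSearchServiceInstance -Identity "APP02"

    # Re-home the administration component onto the standby server.
    Set-SPEnterpriseSearchAdministrationComponent -SearchApplication $ssa -SearchServiceInstance $instance

    # Wait until the component reports itself initialized on the new server.
    do {
        Start-Sleep -Seconds 10
        $admin = Get-SPEnterpriseSearchAdministrationComponent -SearchApplication $ssa
    } until ($admin.Initialized)

Once the component reports itself initialized, crawling (and therefore feeding) can resume; the index itself is untouched throughout.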

For the search service configuration database itself, that's not my area of expertise. Design your SQL infrastructure according to SQL Server high-availability best practices so that the database is not your single point of failure.

For the crawler components, their state is likewise stored in the crawl databases rather than on the crawl servers themselves. If a server hosting a crawler component fails, the crawler can easily be restored by creating a new crawler component on another server and pointing it at your existing crawl database. Again, high availability for that database is outside the scope of my expertise, but I'm sure the rest of the interwebs are filled with details.
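
As a sketch of that recovery (again, the server and SSA names are placeholders, and this assumes a single crawl database): clone the active crawl topology, add a replacement crawl component on a surviving server pointed at the existing crawl database, drop the dead component, and activate the clone.

    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"

    # Clone the active crawl topology so it can be modified offline.
    $topology = New-SPEnterpriseSearchCrawlTopology -SearchApplication $ssa -Clone

    # The existing crawl database still holds the failed crawler's state.
    $crawlDb = Get-SPEnterpriseSearchCrawlDatabase -SearchApplication $ssa | Select-Object -First 1

    # Create a replacement crawl component on a surviving server,
    # pointed at that same crawl database.
    $instance = Get-SPEnterpriseSearchServiceInstance -Identity "APP03"
    New-SPEnterpriseSearchCrawlComponent -SearchApplication $ssa -CrawlTopology $topology -CrawlDatabase $crawlDb -SearchServiceInstance $instance

    # Remove the component that lived on the failed server, then activate.
    $dead = Get-SPEnterpriseSearchCrawlComponent -CrawlTopology $topology |
        Where-Object { $_.ServerName -eq "FAILEDSERVER" }
    Remove-SPEnterpriseSearchCrawlComponent -Identity $dead -CrawlTopology $topology -Confirm:$false
    Set-SPEnterpriseSearchCrawlTopology -Identity $topology -Active

Because the replacement component reads its state from the same crawl database, the crawler picks up more or less where the failed one left off rather than starting from scratch.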