February 2015
Volume 30 Number 2
Microsoft Azure - The Rise of Event Stream-Oriented Systems
By Christopher Bennage | February 2015
It’s all about data these days. Data helps us make informed decisions. Big Data helps us make informed and insightful decisions. Big streams of data help us make informed, insightful and timely decisions. These continuously flowing streams of data are often called event streams. It’s increasingly common to build software systems whose primary purpose is to process event streams.
Even across different industries and domains, there’s a discernable common architectural pattern around these event stream-oriented systems. This pattern for modern event stream-oriented systems plays the same fundamental role that the classic n-tier architecture held for traditional on-premises enterprise systems. I’ll start off by exploring a thumbnail sketch of this nascent pattern.
Identify the Pattern
First, I should clarify what is meant by the term event. Here, it means merely a bit of data signifying that something happened in a system. Events tend to be small in size, in the byte or kilobyte range. You’ll also hear terms like message, telemetry or even just data in place of event.
Next, there are event producers. These producers could be almost anything—connected cars, smart thermostats, game consoles, personal fitness devices or even a software system generating self-diagnostic events. It’s important to recognize, though, that in most of these systems, you’re dealing with numerous event producers.
Many systems anticipate numbers of event producers in the tens of thousands and ranging into tens of millions or more. This means these systems tend to have both high volume and high velocity. High volume means there’s a lot of data overall and high velocity means the data is generated frequently.
There are also event consumers. Consumers are the real heart of these types of systems. They’re responsible for analyzing, interpreting and responding to events. The number of consumers in a typical system might range from a couple to a couple dozen. Events aren’t routed to specific consumers. Each consumer is looking at the same set of events. In the context of Microsoft Azure, consumers are most likely cloud services.
Consider this example. There’s an event stream representing financial transactions. The event producers in this scenario are point-of-sale systems in retail stores. One consumer owns the responsibility to analyze the stream for fraudulent activity and raise alerts. Another consumer analyzes the same stream to make just-in-time supply chain optimizations. Finally, a third consumer is responsible for translating the events into long-term cold storage for later analytics.
When combined with the reality of high-volume and high-velocity events, this pattern of producers and consumers presents a few interesting problems:
- How do you prevent event production surges from overwhelming consumers? That is, how should the system respond when the rate of event production starts to exceed the rate of consumption?
- Because event velocity is high, how can you scale an individual event consumer?
The key to the problem is to use an event broker (see Figure 1). This is precisely the role performed by the recently released Azure Event Hubs.
Figure 1 The Azure Event Hub Architecture
So how, exactly, does using a broker such as Event Hubs solve the problems I’ve outlined so far?
Understand Event Hubs
Event Hubs provides the elasticity needed to absorb and persist events until downstream consumers can catch up. Event Hubs can effectively level out variability in the event stream rate so consumers don’t have to worry about it. Without this leveling, a receiving consumer might become overwhelmed and begin to fail.
Using a broker isolates the event producers and event consumers from each other. This isolation is especially important in more sophisticated versions of the architectural pattern where additional intermediaries are necessary between the producers and the consumers. Event Hubs is a point of composition, a seam or boundary in the architecture. All components that interact through an Event Hub don’t require specific knowledge of each other.
At this point, it might be easy to confuse Event Hubs with traditional enterprise messaging services that offer the same type of isolation. However, Event Hubs is different in several key ways that make it ideal for this architectural pattern.
Independent Consumers
Event Hubs uses a publish and subscribe model; however, each consumer has an independent view of the same event stream. In some traditional messaging systems with multiple consumers, messages are copied for each interested consumer. This can be inefficient in terms of speed and space, but the benefit is that each consumer has its own “inbox.” As a consumer processes messages, it removes them from its inbox. There’s no affect on other consumers because they have their own copies in their own inboxes.
With Event Hubs, there’s one set of immutable events and, because they’re immutable, there only needs to be one copy of each event. Likewise, consumers never remove events from the system. All consumers are looking at the same set of events. Because of this, consumers own the responsibility of keeping track of where they are in the event stream. They do this by tracking their offset in the event stream. There’s actually an API for this built into the SDK.
Time-Based Retention
In traditional messaging systems, the consumer is responsible for telling the system when it’s done with the message. The system can then get rid of the message. Because an Event Hubs consumer is responsible for tracking his own position within the event stream, how does an Event Hub know when the consumer is done with the events? In short, it doesn’t. With Event Hubs, you configure a retention period and events are stored for that amount of time. This means events expire on their own, independent of any consumer action.
The implication of time-based retention is the consumer needs to examine and process events before they expire. With time-based retention, each consumer has pressure to keep up. Fortunately, the underlying design of Event Hubs lets individual consumers scale as necessary.
Event Hubs supports this by is physically partitioning the event stream. You set the number of partitions when provisioning an Event Hub. See the official documentation at bit.ly/11QAxOY for more details.
As events are published to an Event Hub, they’re placed in partitions. A given event resides in only one partition. Events are evenly distributed by default across partitions in a round-robin fashion. There are mechanisms for providing partition affinity. The most common lets you set a partition key property on an event, and all events with the same key will be delivered to the same partition.
How does a partitioned event stream help consumers with time-based retention? In the context of Event Hubs, the correct term is actually consumer group. The reason for calling it a group is each consumer really consists of multiple instances. Each group has one instance per partition. From this point, consumer group refers to the consumer as a whole and consumer instance refers to the member of the group interested in a particular partition.
This means a consumer group can process stream events in parallel. Each consumer instance in the group can process a partition independent of other instances. These consumer instances can all reside in one machine, with each consumer instance running in isolation from one another. They could all be distributed across multiple machines, even to the point of each consumer instance running on a dedicated box. This way, Event Hubs circumvents some of the typical problems associated with the classic pattern of competing consumers.
Isolation is a key concept here. First, you’re isolating event producers and event consumers from each other, thus enabling flexible architecture composition, as well as load leveling. Second, consumer groups are isolated from each other, reducing the opportunity for cascading failures across consumer groups. Third, consumer instances in a given consumer group are isolated from each other to enable horizontal scaling for individual consumer groups.
Use Event Hubs
There are several good tutorials for getting started with Event Hubs. Check out the official documentation at bit.ly/11QAxOY and follow the tutorial that uses the platform of your choice.
You’ll need to provision an Event Hub first. The process is straightforward. You can easily try it out with a trial Azure account. In the Azure Management Portal, navigate to the Service Bus section. You’ll need to create a Service Bus namespace if you don’t already have one. After that, you’ll see a tab called Event Hubs that has instructions for creating an Event Hub (see Figure 2).
Figure 2 Create an Event Hub
You also need to set up a shared access policy for the Event Hub before you can begin. These policies manage security for an Event Hub. In the portal, navigate to the Event Hub you just created and select the Configure tab.
Choose Manage for the permissions and give the policy a name such as “super” or “do-not-use-in-production.” After that, switch back to the Dashboard tab and click the Connection Information button at the bottom. You’ll want to make note of the connection string there, as well as the name you gave your Event Hub.
Produce Events
The code I’ll show here uses the .NET SDK, but you can use any platform that supports HTTP or AMQP. You’ll need to reference the Microsoft Azure Service Bus NuGet package. The classes you need are in the Microsoft.ServiceBus.Messaging namespace. All you need to do is create a client, create an event and send:
var client = EventHubClient.CreateFromConnectionString (
connectionString,
eventHubName);
var body = Encoding.UTF8.GetBytes("My first event");
var eventData = new EventData (body);
await client.SendAsync (eventData);
Despite the simplicity, there are a few interesting items to point out. The body of the event is just a byte array. Any consumer groups processing this event will need to know how to interpret those bytes. It’s likely the consumer groups will need some sort of hint to determine how to deserialize the body. Before the event is sent, metadata can be attached:
eventData.Properties.Add ("event-type", "utf8string");
This means using keys and values that are well known by both producers and consumer groups. If you want to ensure a set of events is delivered to the same partition, you can set a partition key:
eventData.PartitionKey = "something-meaningful-to-your-domain";
You’ll get better performance if events don’t have affinity with partitions. In some cases, though, you’ll want a set of related events routed to a single consumer instance for processing. Events in a given partition are guaranteed to be in the order they were received. Likewise, there’s no easy way to guarantee the order of events across different partitions in an Event Hub. This is often the motivation for wanting events to have affinity to a particular partition.
For example, if you’re enabling smart cars, you want all events for a given car to be in the same partition. You might choose the Vehicle Identification Number (VIN) for the partition key. Or your system might focus on smart buildings, with hundreds of devices in each building producing events. In that case, you might use the identity of the building itself as the partition key so all events from all devices in the same building land in the same partition.
Overall, partition affinity is a dangerous practice and you should only use it carefully. A poor choice of partition key can result in an uneven event distribution across partitions. This could ultimately mean consumer groups would have trouble scaling. The good news is that many times you can change the system design to avoid the need for partition affinity.
Consume Events
You may be concerned about how you’ll manage all this. Your consumer groups need to keep track of their offset in the event stream. Each group needs to have an instance for each partition. Fortunately, there’s an API for that.
Reference the NuGet package Microsoft Azure Service Bus Event Hub-EventProcessorHost. The classes you need are in the Microsoft.ServiceBus.Messaging namespace. Getting started is as simple as implementing a single interface: IEventProcessor.
Once you’ve implemented your event processor, you’ll create an instance of EventProcessorHost to register your event processor. The host will handle all the grunt work for you. When it starts up, it will examine your Event Hub to see how many partitions it has. It will then create one instance of your event processor for each available partition.
There are three methods you need to implement. The first two are OpenAsync and CloseAsync. The host calls OpenAsync when the event processor instance is first granted a partition lease. This means the event processor instance has exclusive access to the partition for the consumer group in question. Likewise, the host calls CloseAsync when its lease is lost or when it’s shutting down. While you’re getting started, you can use a very simple implementation:
public Task OpenAsync(PartitionContext context)
{
return Task.FromResult(true);
}
public Task CloseAsync(PartitionContext context, CloseReason reason)
{
return Task.FromResult(true);
}
Both of these methods receive a PartitionContext argument. The remaining method receives it, as well. You can examine this argument if you want to view details about the specific partition leased to the event processor. The final method is where you actually receive the events (see Figure 3).
Figure 3 The Final Method that Delivers the Events
public async Task ProcessEventsAsync (PartitionContext context,
IEnumerable<EventData> messages)
{
foreach (var message in messages)
{
var eventType = message.Properties["event-type"];
var bytes = message.GetBytes();
if (eventType.ToString() == "utf8string") {
var body = System.Text.Encoding.UTF8.GetString (bytes);
// Do something interesting with the body
} else {
// Record that you don't know what to do with this event
}
}
await context.CheckpointAsync();
// This is not production-ready code
}
As you can see, this is straightforward. You receive an enumerable set of events you can iterate over and do whatever work is needed. You also have this invocation of context.CheckpointAsync at the end of the method. This tells the host you’ve successfully processed this set of events and you’d like to record a checkpoint. The checkpoint is the offset of the last event in the batch.
That’s how your consumer group can keep track of which events have been processed for each partition. After a host is started, it tries to acquire a lease for any available partition. When it starts processing for a partition, it will examine the checkpoint information for that partition. Only events more recent than the last checkpointed offset are sent to their respective processors.
The host also provides automatic load leveling across machines. For example, let’s say you have an Event Hub with 16 partitions. This means there will be 16 instances of your event processor—one for each partition. If you’re running the host on a single machine, it creates all 16 instances on the same machine. If you start another host on a second machine and it’s part of the same consumer group, the two hosts will begin to level the distribution of event processor instances across the two machines. There will ultimately be eight event processor instances per machine. Likewise, if you take down the second machine, then the first host takes back over the orphaned partitions.
Assume your implementation of IEventProcessor is MyEventProcessor. Then instantiating the host can be as simple as this:
var host = new EventProcessorHost(
hostName,
eventHubName,
consumerGroupName,
eventHubConnectionString,
checkpointConnectionString);
await host.RegisterEventProcessorAsync<MyEventProcessor>();
The eventHubConnectionString and eventHubName are the same values used when sending events in the previous example. It’s best to use connection strings with shared access policies that restrict usage to just what’s needed.
The hostName identifies the instance of the EventProcessorHost. When running the host in a cluster (meaning multiple machines), it’s recommended you provide a name that reflects the identity of the machine on which it’s running.
The consumerGroupName argument identifies the logical consumer group this host represents. There’s a default consumer group you can reference using the constant EventHubConsumerGroup.DefaultGroupName. Any other name requires you first provision the consumer group. Do this by creating an instance of Microsoft.ServiceBus.NamespaceManager and using methods such as CreateConsumerGroupAsync.
Finally, you need to provide a connection string to an Azure Storage account using checkpointConnectionString. This storage account is where the host tracks all state regarding partitions and event offsets. This state is stored in blobs in plain text you can readily examine.
There are other Azure services that are integrated with Event Hubs out-of-the-box. Azure Stream Analytics (currently in Preview) provides a declarative SQL-like syntax for transforming and analyzing event streams originating in Event Hubs. Likewise, Event Hubs offers a spout for the very popular Apache Storm, now available as a Preview on Azure through HDInsight.
Wrapping Up
The architectural pattern outlined here is just the beginning. When implementing a real-world system, there are numerous other concerns you’ll need to take into consideration. These concerns involve advanced security, provisioning and management of event producers, protocol translation, outbound communication, and more. Nevertheless, you’re now equipped with the foundational concepts necessary to build a system using an event broker such as Event Hubs.
Christopher Bennage is a member of the Microsoft patterns & practices team. He likes to make things with computers.
Thanks to the following Microsoft technical experts for reviewing this article: Mostafa Elhemali and Dan Rosanova