

How NOT to configure a Hardware Load Balancer for Exchange 2010

 

As we know, a Hardware Load Balancer (HLB) offers many advantages over a Windows Network Load Balancer (WNLB), the technology predominantly used with past versions of Exchange. The internet has several articles listing the pros & cons of using an HLB to distribute Exchange workload, so I will not repeat them here. Instead, this article focuses on the nitty-gritty of HLB configuration, and the target audience is primarily HLB experts.

The lessons are based on a scenario I came across recently, where an organization faced serious reliability issues & poor performance when they switched from a WNLB to an HLB for Exchange 2010. It took me longer than I would have expected to realize what specifically was causing the issues. This blog post documents my experience & is meant as a reference to help others avoid a similar scenario.

But first, a quick recap of how you would configure an HLB for Exchange, without going into a deep dive.

HLB Configuration Recap:

For virtually all the cases I have come across since Exchange 2010 was released, I gave the below table to the HLB folks. They got going & all went well.

Service | Ports Required | Type of LB | Affinity | Unicast/Multicast
SMTP | TCP 25 | Client IP | |
OWA | TCP 443 | Cookie | Required | Single NIC: Multicast
OA | TCP 443 | Client IP | | Multiple NICs: Unicast
MAPI/RPC | TCP 135, TCP 1024-65535 | Client IP | | Note: Unicast works with a broader set of switches and won't broadcast to all ports on a VLAN
ActiveSync | TCP 443 | Client IP | |
EWS | TCP 443 | Client IP | Required |

 Table credit: Ex-Peers from the PFE team.

The Issues we faced (summarized):

1.     Constant disconnects & re-connects of Outlook

After investigating, we found that this was caused by:

a.       No Affinity / persistence configured

Because no affinity was set, the HLB happily kept load balancing the Outlook traffic across all available nodes (exactly what it is meant to do in a vanilla, non-Exchange scenario). As a result, user sessions kept being thrown across the Exchange CAS nodes, producing the connect-disconnect-connect-disconnect experience.

Solution: refer to the table above to configure affinity.
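To make the difference concrete, here is a minimal Python sketch that compares plain round-robin distribution with client IP affinity. It is only an illustration: the node names are hypothetical, and a real HLB typically keeps a persistence table rather than hashing the source IP as this sketch does.

```python
import hashlib
from itertools import cycle

CAS_NODES = ["cas1", "cas2", "cas3"]  # hypothetical CAS server names


def round_robin(requests, nodes=CAS_NODES):
    """No affinity: each request goes to the next node in turn."""
    turn = cycle(nodes)
    return [(client_ip, next(turn)) for client_ip in requests]


def client_ip_affinity(requests, nodes=CAS_NODES):
    """Client IP affinity: a stable hash of the source IP picks the node."""
    def pick(client_ip):
        digest = hashlib.sha256(client_ip.encode("ascii")).digest()
        return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
    return [(client_ip, pick(client_ip)) for client_ip in requests]


def nodes_seen(assignments, client_ip):
    """Return the set of nodes that served the given client."""
    return {node for ip, node in assignments if ip == client_ip}
```

Feeding six requests from one client through `round_robin` spreads them over all three nodes, while `client_ip_affinity` keeps them on a single node, which is what Outlook needs to stay connected.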

b.      Single rule for all traffic.

While technically this is not a problem in itself, it is related to the affinity configuration discussed above. With a single rule, the HLB treats all Exchange traffic the same way. That should not be the case, at least for affinity, because different client protocols (e.g. OWA) work differently. Discussing how each protocol differs is beyond the scope of this article and would need a separate article.
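As a rough illustration of why one rule is not enough, the recap table can be written down as data, one entry per service. The sketch below is Python rather than any vendor's configuration syntax, and the dictionary layout and function names are my own assumptions.

```python
# Persistence type per Exchange 2010 service, taken from the recap table.
RULES = {
    "SMTP": {"ports": ["25"], "persistence": "client-ip"},
    "OWA": {"ports": ["443"], "persistence": "cookie"},
    "OA": {"ports": ["443"], "persistence": "client-ip"},
    "MAPI/RPC": {"ports": ["135", "1024-65535"], "persistence": "client-ip"},
    "ActiveSync": {"ports": ["443"], "persistence": "client-ip"},
    "EWS": {"ports": ["443"], "persistence": "client-ip"},
}


def persistence_for(service):
    """Look up the persistence type the table recommends for a service."""
    return RULES[service]["persistence"]


def distinct_persistence_types(rules=RULES):
    """List every persistence type that appears in the rule set."""
    return sorted({rule["persistence"] for rule in rules.values()})
```

Because `distinct_persistence_types()` returns more than one value, a single catch-all rule cannot satisfy every service in the table.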

2.     Applications sending internal e-mail were no longer able to send e-mail with the HLB in place.

The customer had several applications sending e-mail to users in the organization. Exchange Server was configured to "allow" e-mail from a list of known IP addresses belonging to the application servers, scanners & printers. Even though we replaced the WNLB with the HLB on the same IP address (the one the applications were sending e-mail to), none of the applications were able to send e-mail.

A peek at the IIS & SMTP logs on the Exchange server made the root cause crystal clear. The HLB was replacing the source IP address with its own IP address, so the Exchange server saw the HLB's IP address, which was not on the allow list, and denied the SMTP relay.

a.       Source address logging / Source address translation / transparent mode / Source IP persistence.

Enabling a feature that lets the HLB use the originating client's IP address when communicating with the Exchange server solves this problem.

HLB vendors call this feature by varying names. Citrix NetScaler & Juniper devices call it "Source address logging", while some other vendors may call it source address translation or transparent mode.

Source address logging, or its equivalent, is an important feature that is worth enabling for other reasons as well:

· Application access: Access to application features based on the source (client's) IP address, e.g. SMTP access for applications sending out e-mail.
· Auditing of mailbox access: A critical compliance-related function that needs the source IP address to establish where a given mailbox was accessed from. This is especially important for forensic analysis of mailboxes whose passwords have been compromised.
· Geographical load balancing: The source (client's) IP address can be used to determine the client's region / country of origin or originating ISP. This traffic can then be geo-balanced.
· IP-based filtering: Some devices (e.g. a firewall) placed between the server and the HLB may need the client's IP address to filter traffic.
· Billing: Some applications use the source IP address for billing or usage analysis.

With this feature disabled, Exchange sees only one IP address (that of the HLB) as the source of ALL client traffic and assumes that all of it comes from a single client. This breaks many features of Exchange Server, as well as applications that use Exchange Server.
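The relay failure described above can be reproduced on paper with a few lines of Python. This is only a sketch of the logic, not of how Exchange implements its receive connector checks, and all addresses in it are made up for illustration.

```python
import ipaddress

# Hypothetical allow list: an application subnet and a single scanner.
ALLOWED_RELAY = [
    ipaddress.ip_network("10.1.20.0/24"),
    ipaddress.ip_network("10.1.30.15/32"),
]
HLB_IP = "10.1.1.10"  # hypothetical address the HLB uses towards Exchange


def relay_allowed(source_ip, allowed=ALLOWED_RELAY):
    """Return True if the source IP falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in allowed)


def source_seen_by_exchange(client_ip, preserve_source, hlb_ip=HLB_IP):
    """Address Exchange sees: the client's own, or the HLB's if it is replaced."""
    return client_ip if preserve_source else hlb_ip
```

With `preserve_source=True` an application server at 10.1.20.7 passes the check. With `preserve_source=False` Exchange only ever sees 10.1.1.10, which is not on the list, so the relay is denied.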

3.     Exchange Outage if one redundant host was rebooted / taken offline

The customer's deployment consisted of redundant Exchange hosts. This allowed us to take any one server offline for maintenance with absolutely no impact on end users, even with a considerable number of incoming client connections (after draining active connections & sessions, of course).

Once the WNLB was replaced by the HLB, taking any server offline (reboot or shutdown) left the clients that were connected to that server unable to access Exchange services. The HLB administrator had to log in to the HLB & stop traffic from being forwarded to the offline server. Only then could all clients access Exchange services again.

Once I saw how the HLB administrator resolved the outage each time an Exchange server was taken offline, the root cause was clear. The HLB was monitoring neither the Exchange host nor the Exchange application, and such monitoring is essential functionality in an HLB.

a.       Host / Application Monitoring

A WNLB can monitor a host &, if the host is unavailable, it will not send any traffic to that host. An HLB offers far more monitoring capabilities than a WNLB, which can only tell whether a host is UP or DOWN.

Examples of monitoring that various HLBs may be capable of:

· Host monitoring: This is the most basic check, where the HLB pings the host. Unfortunately it is not very reliable, as the host can be UP while the Exchange services are DOWN. This is the only kind of monitoring a WNLB can do.
· TCP connect: The HLB opens a connection to a specific port. If the connection succeeds, the HLB considers the application UP. If the application has hung but is still accepting incoming connections, the HLB will not detect the failure.
· Application connect: The HLB opens a connection to a specific port used by the application & checks for a pre-defined / expected response. If the response is as expected, the HLB considers the server UP. The exact capabilities of this kind of application monitoring vary by HLB. Some HLBs can even log in to the Exchange server with valid Exchange credentials to verify that the server is UP.
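The difference between the TCP connect check and the application check can be sketched in Python. This is an illustration of the idea and not how any particular HLB implements its monitors. The application check assumes SMTP, where a healthy server greets new connections with a line starting with 220, and the function names and timeouts are my own.

```python
import socket


def tcp_connect_check(host, port, timeout=3.0):
    """TCP connect check: UP if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def smtp_banner_check(host, port=25, timeout=3.0):
    """Application check: UP only if the server sends an SMTP 220 greeting."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError:
        # Covers refused connections as well as a hung service that
        # accepts the connection but never sends a greeting (timeout).
        return False
    return banner.startswith(b"220")
```

A hung service that still accepts connections passes `tcp_connect_check` but fails `smtp_banner_check`, which is why application-level monitoring is the more trustworthy of the two.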

How to verify root causes:

1.     Misconfigured affinity / no affinity configured.

The best way to verify whether affinity is configured is to check the IIS logs on the Exchange servers. If requests from the same client are spread across several servers within a short time, affinity is not working. Another quick & dirty way is to observe whether any OWA sessions time out sooner than expected (the OWA login page is presented).

Note: There is a difference between the login page & the logout page. Observe carefully which of the two is presented.
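The IIS log check can be scripted. The Python sketch below assumes the logs are in the W3C extended format, where a `#Fields:` header names the columns and `c-ip` holds the client address. The function names are my own, and how you collect the log lines from each server is left out.

```python
from collections import defaultdict


def client_ips(log_lines):
    """Yield the c-ip value from each entry of a W3C-format IIS log."""
    fields = []
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the entries below
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        elif "c-ip" in fields:
            values = line.split()
            if len(values) == len(fields):
                yield values[fields.index("c-ip")]


def clients_on_multiple_servers(logs_by_server):
    """Map each client IP seen on more than one server to those servers."""
    seen = defaultdict(set)
    for server, lines in logs_by_server.items():
        for ip in client_ips(lines):
            seen[ip].add(server)
    return {ip: sorted(s) for ip, s in seen.items() if len(s) > 1}
```

Feed it log lines covering a short time window only. Over a long period a client can legitimately land on different servers, for example after an affinity timeout.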

2.     Source address logging / Source address translation / transparent mode / Source IP persistence.

Method one: On the Exchange servers, check the IIS log files for source IP addresses. If all requests come from a single source IP address, that is bad news: source address logging is not configured.

Note: IIS may not have committed the latest log entries to disk yet. Either perform an IIS reset to force this, or wait until it commits the logs.

Method two: Telnet to port 25 through the HLB & type EHLO. The response includes the IP address that the Exchange server sees as the source of the traffic.
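Method two can also be run from a script instead of Telnet. The Python sketch below uses the standard `smtplib` module and assumes the server answers EHLO the way Exchange does, with a first line of the form `servername Hello [ip]`. The function names are my own, and other SMTP servers may format the reply differently.

```python
import re
import smtplib

HELLO_RE = re.compile(rb"Hello \[([^\]]+)\]")


def ip_from_ehlo_reply(reply):
    """Extract the bracketed source IP from an EHLO reply, or None."""
    match = HELLO_RE.search(reply)
    return match.group(1).decode("ascii") if match else None


def ip_seen_by_server(host, port=25, timeout=10):
    """Connect to host, send EHLO and return the IP the server reports."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        code, reply = smtp.ehlo()
    return ip_from_ehlo_reply(reply) if code == 250 else None
```

Call `ip_seen_by_server` with the address of the HLB. If it returns the HLB's own address rather than the address of the machine you ran it from, the source IP is being replaced.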

3.     Monitoring.

Host monitoring: Disconnect one host from the network, or simply shut it down. If traffic is still being forwarded to the IP address of that host, host monitoring is not working.

Application monitoring: Stop Exchange services such as System Attendant & observe whether traffic is still being forwarded to that host. If incoming requests are still forwarded to the host, application monitoring is not working.

To summarize the lessons we learned:

1. You MUST configure affinity when using Exchange 2010, as described in the table above.
2. Consider separate rules for each port / protocol. This allows different types of affinity to be used for different kinds of traffic.
3. You MUST enable source address logging (the feature name may vary by HLB brand).
4. You MUST configure monitoring. Application-level monitoring is preferred over host-based monitoring. If application monitoring is not a capability of the HLB, at least configure host-based monitoring.

A note on the traffic flow from a client to the exchange server.

When a client communicates with an Exchange server, it is always the client that opens the connection. Exchange responds & transfers data to the client only over a connection that is already open. Exchange Server will NEVER open a connection to a client, not even for push mail on a mobile device or the new e-mail notification toast in Outlook.

1. Client -> HLB -> Exchange, Exchange -> HLB -> Client
2. Client -> HLB -> Exchange, Exchange -> Client (Direct Server Return)
3. Client -> HLB -> Exchange

Applies to

Note: Exchange 2013 has gone through some major architecture changes. Some of the guidance provided in this blog (e.g. affinity) will not apply to Exchange 2013.