Modeling servers for resource consumption

In the past few years, I have helped various groups deal with server problems related to resource usage. A typical scenario goes like this: “We have a server we are about to release. It does great on our internal stress tests (generally a server put under an artificial load by multiple client machines), and when we put it online, it runs great for a while and then it crashes or stops working because it runs out of resource X” (memory, database connections, whatever finite resource you can imagine). In general these folks have been very good at making their server CPU efficient: at the specified maximum load, the CPU usage on the machine stays under, say, 80%. Indeed, when things start to go bad, the CPU is far from pegged.

Let’s take the specific example of a CLR-based POP3 mail server. The team did a great job of making it efficient and simple. On a 4-CPU machine, all CPUs showed the same 75% load at the specified maximum artificial load. The memory usage was about 100MB (the machine had 512MB of memory and no swap file). When deployed online, this server would last anywhere between a few days and two weeks before crashing with an out-of-memory exception. Looking at the full memory dump collected at the time of the exception, we could see that the heap usage was very high, with a lot of very large byte arrays. We traced them back to the actual contents of the mail messages retrieved from the email store and on their way to the user’s mail client. These messages were a lot larger than the average mail payload (less than 20KB), and there were a surprising number of them, enough to drive the server process out of memory. Actual request measurements told us that very few mails would be large, yet when the server crashed, we found a lot of them.

This was the clue to understanding the problem: small mail messages were sent quickly, and all of the memory allocated by the request was quickly reclaimed by the garbage collector. In-house stress tests showed this worked well. The requests for large pieces of mail were the problem: a few of them got sent quickly, but some of them went to dial-up users, so the time for the whole request to complete was very large compared to the average request. Soon these lingering requests accumulated and the memory got exhausted. I asked the server team to show me their load specifications: they counted an average request size (this is for all of the objects in use during a request) of 20KB and an average of 4000 requests in flight. The heap size should therefore be around 80MB. They told me that the maximum payload size was 4MB (I think I remember that it was limited by the server, which wouldn’t accept single email pieces larger than that). A quick calculation shows that if the server was unlucky enough to serve only the largest email pieces, the heap size would have to be 16GB to accommodate 4000 requests in flight. Clearly that’s not possible on the CLR platform for Win32. Therefore it was possible to crash the server by requesting enough large pieces of mail. By accepting and starting to process all of the incoming requests, the server had overcommitted and died. This actual workload is hard to replicate in house because labs are generally not equipped with slow communication lines such as dial-up.
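
For concreteness, here is that back-of-envelope arithmetic as a tiny C# sketch. The constants are the numbers quoted above; the snippet itself is illustrative, not the team’s actual specification code:

    using System;

    class LoadSpecCheck
    {
        static void Main()
        {
            const long requestsInFlight = 4000;
            const long averageRequestBytes = 20 * 1024;      // 20KB average request size
            const long maxPayloadBytes = 4L * 1024 * 1024;   // 4MB maximum payload

            // Expected heap under the average profile vs. the worst case where
            // every in-flight request carries the maximum payload.
            long expectedHeap = requestsInFlight * averageRequestBytes;  // ~80MB
            long worstCaseHeap = requestsInFlight * maxPayloadBytes;     // ~16GB

            Console.WriteLine("Expected heap:   {0} MB", expectedHeap / (1024 * 1024));
            Console.WriteLine("Worst-case heap: {0} MB", worstCaseHeap / (1024 * 1024));
        }
    }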

Writing a CLR-based simulator for servers

It turns out we can model these resource usage variations and simulate them fairly simply:

At heart, a server takes requests, allocates some memory and ties up some resources, sends a response back, and flushes the request. The lifetime of the requests, the resource usage, and the incoming rate are what make the server go into an overcommitted situation.

We can model this simply by allocating and storing an object simulating each request into a queue at every “tick” of the simulator. This simulates the incoming requests, and the size of the queue represents the number of requests in flight in the server. Now the requests need to be processed and a response needs to be written back. To simulate the processing, you allocate some unrooted (temporary) objects and some objects rooted in the request object; they represent the memory consumption of the request. You can randomly vary their sizes according to the profile of the server load you are simulating. To start simply, you can do this allocation in the same step as when you queue up the request. You can also allocate whatever other resources you need to model (database connections, etc.). A more sophisticated simulation would assign steps or states to the request objects and have them go through a series of steps or state transitions, each with some processing and resource consumption, to reach the final state: the response has been written out.

To simulate the final state, you can remove the requests in semi-random order. To simulate the POP3 server, the requests with the smallest simulated responses would be removed first, and the ones with the largest responses would stay in the queue for more ticks. This super simple model will reproduce the instability of servers that lack a throttling mechanism, and it is helpful for evaluating real throttling mechanisms in a very simple environment. In another post I will talk about some server throttling mechanisms.
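
Here is a minimal sketch of such a simulator in C#. The class name, the request-size distribution, the client-speed constants, and the per-tick rates are illustrative assumptions of mine rather than actual production numbers; the point is to show the tick loop, the rooted versus temporary allocations, and the fact that large responses to slow clients stay in the queue for many ticks:

    using System;
    using System.Collections.Generic;

    // A minimal sketch of the simulator described above. The class name, the
    // request-size distribution, and the client-speed constants are
    // illustrative assumptions, not actual production code.
    class SimulatedRequest
    {
        public byte[] ResponsePayload;   // memory rooted for the whole request lifetime
        public int TicksRemaining;       // ticks until the (fast or slow) client has drained the response
    }

    class ServerSimulator
    {
        const int RequestsPerTick = 40;                   // incoming request rate
        const int BytesPerTickFastClient = 512 * 1024;    // a well-connected client
        const int BytesPerTickSlowClient = 4 * 1024;      // a dial-up style client

        static void Main()
        {
            var rng = new Random();
            var inFlight = new List<SimulatedRequest>();

            for (int tick = 0; ; tick++)
            {
                // 1. Incoming requests: allocate the rooted "response" payload.
                for (int i = 0; i < RequestsPerTick; i++)
                {
                    // Mostly small mails, occasionally a multi-megabyte one.
                    int size = rng.Next(100) < 95
                        ? rng.Next(1024, 20 * 1024)
                        : rng.Next(1024 * 1024, 4 * 1024 * 1024);

                    // 2. Processing: temporary, unrooted allocations that the GC
                    //    can reclaim as soon as this iteration ends.
                    var scratch = new byte[rng.Next(1024, 8 * 1024)];
                    scratch[0] = 1;

                    // A fraction of clients are slow, so their large responses
                    // linger in the queue for many ticks.
                    int clientRate = rng.Next(100) < 10
                        ? BytesPerTickSlowClient
                        : BytesPerTickFastClient;

                    inFlight.Add(new SimulatedRequest
                    {
                        ResponsePayload = new byte[size],
                        TicksRemaining = 1 + size / clientRate
                    });
                }

                // 3. Completion: small responses and fast clients finish almost
                //    immediately; large responses to slow clients stay around.
                inFlight.ForEach(r => r.TicksRemaining--);
                inFlight.RemoveAll(r => r.TicksRemaining <= 0);

                if (tick % 50 == 0)
                {
                    Console.WriteLine("tick {0}: {1} requests in flight, {2} MB on the GC heap",
                        tick, inFlight.Count, GC.GetTotalMemory(false) / (1024 * 1024));
                }
            }
        }
    }

With numbers like these, the in-flight count and the heap size should climb steadily over time, because nothing in the loop ever pushes back on the incoming rate; that unbounded growth is exactly the overcommit behavior a throttling mechanism is meant to prevent.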

Comments

  • Anonymous
    December 08, 2006
    Could you please change the font, it's almost unreadable :(

  • Anonymous
    January 03, 2007
    Second vote on the font, white on black is really hard to read.

    We have also seen many issues with resource usage in the past with our app. DataSets are a really common way to get these issues: you fill up a DataSet with "select * from table", which in Dev is really small but in production has millions of rows, and boom. These issues are really tricky to debug; in general we use windbg + sos to debug this stuff. SciTech's ".NET Memory Profiler" is another option, but in general windbg is faster. In some rare instances we have seen heap fragmentation, but this is not the general case. The hardest issues we have had to debug are the unmanaged leaks in our app.

    Regarding slow communication lines, many of the more expensive hardware routers can be configured to simulate a slow link. There are also a few software options out there. Perhaps the underlying libraries should also provide the ability to run in a test mode where they slow down comms for all the .NET-provided comms stuff.

    An idea that I think is probably not that good but may be worth exploring would be for the GC to swap Gen2 objects to disk if it's nearing fatal memory usage (swapping the least-used objects first).

  • Anonymous
    February 07, 2007
    This is a very good comment. I am preparing a blog entry to give techniques to deal with the scenario you are talking about, within certain limits. Persisting objects to disk under the covers is hard: we don't know which objects are used the least, and it is hard to be totally transparent. Obviously this is possible (a number of in-memory object databases do just this), but it always entails a non-trivial performance loss for the common case. Application-specific measures are generally cheaper and can be targeted to just the specific part of the object population where the tradeoff between performance and memory footprint can be shifted correctly.

  • Anonymous
    May 08, 2007
    In my last blog entry I described the problem of servers over committing resources. There are several

  • Anonymous
    June 18, 2008
    Could you comment on garbage collection mechanisms on network performance monitoring agents and their effect on management of agent lifetime and application throughput?
