ADO.NET new for 2.0, Clearing the pool on Server recycle

This blog talks about behavior in the SqlClient and the Oracle managed provider exclusively.

One of the biggest problems with pooling before ado.net 2.0 has to be the fact that we did not know how to handle the pool when the server went down. Let me describe just how ugly this behavior was using the following scenario:

PRE-ADO.NET 2.0

Open 3 connections, close 3 connections, we now have 3 connections in the pool. Now stop Server and start Server.

All 3 connections in the pool are no longer valid since they hold references to Server connections that no longer exist, the pool is corrupted.

So far this is by design pooling behavior, the ugly part comes next.

You open a connection, the pooler checks to see if we have connections waiting in the pool. We have 3 connections so it just hands you back one of these. No exception

You execute a command. Exception, when we tried to do a roundtrip to the Server it told us that it had no idea what the connection we were trying to use was.

The connection becomes closed and we throw it away.

You open a connection, the pooler checks to see if we have connections waiting in the pool. We have 2 connections so it just hands you back one of these. No exception.

You execute a command. Exception, when we tried to do a roundtrip to the Server it told us that it had no idea what the connection we were trying to use was.

The connection becomes closed and we throw it away.

You open a connection, the pooler checks to see if we have connections waiting in the pool. We have 1 connections so it just hands you back one of these. No exception

You execute a command. Exception, when we tried to do a roundtrip to the Server it told us that it had no idea what the connection we were trying to use was.

The connection becomes closed and we throw it away.

You open a connection, the pooler checks to see if we have connections waiting in the pool. We have no connections in the pool so we do a roundtrip to the server to get a new connection.

You execute a command. Everything works as expected.

As you can see we did a terrible job of handling something as simple as a server stopping and starting again. Developers that had applications that dealt with server reboots (like a failover cluster) definitely felt the pain. Some people tried to automate the process above to flush the pool and some people tried changing connection strings on exception (this creates a new pool), the result is quite ugly code and it is directly our fault.

In ADO.NET 2.0 we have improved this scenario (somewhat) by realizing that after certain types of exceptions our entire pool must be corrupted, so instead of throwing away an individual connection we will mark the entire pool as doomed and the next time that you attempt to open a connection you will always attempt to establish a fresh connection.

ADO.NET 2.0

Open 30 connections, close 30 connections, we now have 30 connections in the pool. Now stop Server and start Server.

All 30 connections in the pool are no longer valid since they hold references to Server connections that no longer exist, the pool is corrupted.

You open a connection, the pooler checks to see if we have connections waiting in the pool. We have 30 connections so it just hands you back one of these. No exception

You execute a command. Exception, when we tried to do a roundtrip to the Server it told us that it had no idea what the connection we were trying to use was.

The connection becomes closed and we realize that this exception means that the pool is corrupted, so we mark all the connections in the pool as doomed.

You open a connection, the pooler checks to see if we have connections waiting in the pool. All of the connections in the pool are marked as doomed so we do a roundtrip to the server to get a new connection.

You execute a command. Everything works as expected.

We still throw an exception in ado.net 2.0, but when we realize that the exception means that the pool is corrupted we walk the pool and mark each connection as doomed. A doomed connection is a connection that cannot be taken out of the pool, if the connection is currently out of the pool and in use marking it as doomed only means that the next time the connection comes into the pool it will not come out again. This is to protect against scenarios where I open 2 connections, start and stop the server and then try to execute on one connection. This will throw an exception and mark the pool as doomed. If I go to the second connection now and call Close() I will not get an exception and the connection will go back in the pool, but since it was marked as doomed when the exception happened it will not come out again.

Q: What do you mean by “realize that an exception means that the pool is corrupted”? Are you going to be throwing away pools that are perfectly fine just because I get an exception?

A: We determine that a pool is corrupted _only_ when we get a fatal exception from the network layer on a previously opened connection. Exceptions from the server, exceptions that happen during log in and of course client side exceptions are ignored.

Q: So when you get an exception from server X you will clear all the pools that connect to server X right?

A: No, we will only clear the pool where this exception occurred.

Q: It is really bad that you still throw the first exception, couldn’t you have caught the exception, cleared the pool, opened a new connection and continued?

A: Yes we could have, this was too hard to get right at this time but I certainly hope it makes it into a future version.

Q: What if I know that the server has stopped and started again, is there anything I can do?

A: Yes, you can use the new static SqlConnection and OracleConnection methods ClearPool(connection) and ClearAllPools(). These methods expose the pool clear functionality described here. They will walk the pool (or all the pools) and mark all of the connections as doomed.

Q: So I can use ClearPool and ClearAllPools() to synchronously close all of my connections?

A: No, actually not at all. The only thing that these methods do is walk the pool and mark the connections as doomed. The only thing that marking a connection as doomed does is to stop it from being taken out of the pool. The connections are still cleaned in the background using the same 4-8 minute algorithm that we have for current idle connections in the pool.

Rambling out, The information posted here is provided “AS IS” and confers no rights.

Comments

  • Anonymous
    October 05, 2004
    A: Yes we could have, this was too hard to get right at this time but I certainly hope it makes it into a future version.

    Future version means .NET 2.0 beta2 or we have to wait until Longhorn? Is it really too hard to implement such a behaviour?
  • Anonymous
    October 05, 2004
    Great post! More! More! :-)
  • Anonymous
    October 05, 2004
    The comment has been removed
  • Anonymous
    October 05, 2004
    When we got stomped by this over a year and a half ago I just added a custom .Open to my wrapper around the Oracle DAL. In the .Open I take the hit of doing a sure-win query... like Select SysDate From Dual; If the open succeeded but the query failed with a ORA-03114 I'd just close the connection, parse the connection string, and add POOLING=FALSE so that I wouldn't get a dead connection, and do the open again. Sure this makes for somewhat inefficient use of the pool, but desparate times call for desparate measures.

    So what your saying now is that with all the great improvements in ADO.Net 2.0 I still have to take the hit, trap the exception close the connection, and try again, but this time I won't have to forcibly bypass the pool. Wow... now THAT's progress ;)
  • Anonymous
    October 05, 2004
    The comment has been removed
  • Anonymous
    October 05, 2004
    The comment has been removed
  • Anonymous
    October 05, 2004
    The comment has been removed
  • Anonymous
    October 06, 2004
    The comment has been removed
  • Anonymous
    July 29, 2005
    About a year and a half ago, a large project I was working on started having intermitten problems with...
  • Anonymous
    June 02, 2006
    We are wellocme to it's configuration.
  • Anonymous
    June 12, 2009
    PingBack from http://cellulitecreamsite.info/story.php?id=8862
  • Anonymous
    June 17, 2009
    PingBack from http://pooltoysite.info/story.php?id=2471