Performance Testing the Microsoft OLE DB Provider for DB2 (Part 3)
In Part 1 of this series I covered testing SQL queries through a linked server using the Microsoft OLE DB Provider for DB2. In Part 2 I showed how you can test performance with SSIS. However, in many cases customers are using C# or VB.NET, writing web services or stand-alone applications that query DB2 for data. So, let’s begin.
First I wanted to again test different settings for the Rowset Cache Size connection string option. But, since these queries were going to be smaller, I was wondering if using Connection Pooling would help. The Microsoft OLE DB Provider that comes with Host Integration Server (and the SQL Feature Pack) implements its own connection pooling internally.
When Connection Pooling is enabled, the data provider does not immediately issue a close to the DB2 database when it receives a close from the consumer (application). It waits before closing the connection (this is controlled by the “Connection Timeout” property, which defaults to 30 seconds). When a new Open() is issued for the SAME connection string, reconnecting is faster because some frames (security, for instance) do not have to be repeated to the host system. This is most beneficial when you are making many small queries to the host, one behind the other. It may not be as beneficial when queries take many seconds or minutes to complete.
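To make that concrete, here is a minimal sketch of the pattern that benefits from the provider’s pooling. The connection string here is a placeholder (build the real one with the Data Access Tool), and the queries are omitted; the point is simply that both Open() calls use the identical string:

```csharp
using System;
using System.Data.OleDb;

class PoolingSketch
{
    // Placeholder connection string; create the real one with the Data Access Tool.
    const string cnStr = "Provider=DB2OLEDB;...;Connection Pooling=True;";

    static void Main()
    {
        // The first Open() pays the full cost of connecting to the host,
        // including the security exchange.
        using (OleDbConnection cn = new OleDbConnection(cnStr))
        {
            cn.Open();
            // ... run a small query ...
        } // Close() hands the connection back to the provider's internal pool

        // Issued with the IDENTICAL connection string, this Open() can reuse
        // the pooled connection, so those frames are not repeated to the host.
        using (OleDbConnection cn = new OleDbConnection(cnStr))
        {
            cn.Open();
            // ... run another small query ...
        }
    }
}
```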
So, to begin with, I decided to limit my query to 5,000 rows. I also wanted to make the DB2 server do a bit of work, so I decided to use the following query (without the FETCH FIRST clause it would return over 3.5 million rows):
const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";
I created my initial connection string with the Data Access Tool, then copied it into my C# application after stripping out the “Connection Pooling” and “Rowset Cache Size” options, and I ended up with this:
const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";
At this point I was almost ready, but I needed to decide how many times I wanted the query to run, and how many different Rowset Cache Size settings I wanted to use. I decided to run the query 10 times for each connection, and increment the Rowset Cache Size by 30 each time, starting at 0 and finishing at 300.
// number of times to run each query
const int iQueryIteration = 10;
// rowset cache size increments from 0 to 300 using this value
const int iCacheInc = 30;
From various other tests, I know that simulating work on the incoming data makes a difference, so I set up a variable, adjustable as necessary, that causes the data to actually be read into a variable later. For my purposes, I left it at true.
// set bReadValues to true to simulate working with the data coming in
bool bReadValues = true;
Now, at this point, everything is set up to begin. The Main procedure controls building the connection string, and then calls another procedure to make the queries.
The first thing the Main procedure does is call a procedure to create a table in SQL Server for logging the results. Then it builds the connection string as follows:
// multiple tests at this point
// test with pooling off, then on
// each test to be done with different settings of rowset
// to gauge performance
int iParamTest = 1;
while (iParamTest < 3)
{
    string poolString = "";
    switch (iParamTest)
    {
        case 1:
            poolString = "Connection Pooling=False;";
            break;
        case 2:
            poolString = "Connection Pooling=True;";
            break;
    }
    iRowSet = 0;
    while (iRowSet < 301)
    {
        myconstr = "";
        rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
        // build final connection string here
        myconstr = cnStringDB2OLEDB + poolString + rowString;
After the connection string is built, it runs the query 10 times, as follows:
for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
{
    testno++;
    // start timer watch
    watch.Start();
    rows = RunDbConnectionTest(myconstr, DB2QueryString);
    watch.Stop();
    Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
    UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
    watch.Reset();
    if (rows == -1) // error happened, exit
    {
        Environment.Exit(rows);
    }
}
iRowSet = iRowSet + iCacheInc;
The UpdateSQLPerfTable() call logs the time for each query into SQL Server, and then the stopwatch is reset. If an error occurs (rows comes back as -1), the program exits. The loops then repeat until all tests have run.
The RunDbConnectionTest() procedure builds the OleDbConnection object, creates a command object and a data reader, and executes the command. The data is then read in (by assigning each column to a string value). After all columns and rows have been read, the objects are closed and disposed of accordingly. Any errors are caught in try/catch blocks, which return -1 to the main procedure.
Once the data is in SQL Server, you can query against the table to see the differences at each setting. I ran a pivot query against the table, with the following results (times in seconds):
| rowset | Connection Pooling=False; | Connection Pooling=True; |
| --- | --- | --- |
| 0 | 2.05654372 | 1.28185158 |
| 30 | 1.41009238 | 0.87079308 |
| 60 | 1.44409547 | 0.87172373 |
| 90 | 1.46562513 | 1.10562903 |
| 120 | 1.3914555 | 1.05005861 |
| 150 | 1.38745682 | 0.88520811 |
| 180 | 1.44412404 | 0.84566784 |
| 210 | 1.41111907 | 0.87019778 |
| 240 | 1.41200047 | 0.83981306 |
| 270 | 1.42906141 | 0.82953586 |
| 300 | 1.44249608 | 1.1492816 |
So, just looking at the averages, you can see there is a difference when using connection pooling. But looking at the raw numbers gives a better idea. My first query was to look at rowset 0:
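That query is the same shape as the rowset = 270 query shown later in the article; a sketch of the rowset 0 version, against the CSPerfTrials table created by the code below:

```sql
SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 0)
ORDER BY TestNumber, Pooling
```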
| TestNumber | Pooling | rowset | Seconds |
| --- | --- | --- | --- |
| 1 | Connection Pooling=False; | 0 | 2.65 |
| 2 | Connection Pooling=False; | 0 | 2.44 |
| 3 | Connection Pooling=False; | 0 | 1.78 |
| 4 | Connection Pooling=False; | 0 | 2.50 |
| 5 | Connection Pooling=False; | 0 | 1.94 |
| 6 | Connection Pooling=False; | 0 | 1.91 |
| 7 | Connection Pooling=False; | 0 | 1.83 |
| 8 | Connection Pooling=False; | 0 | 1.83 |
| 9 | Connection Pooling=False; | 0 | 1.84 |
| 10 | Connection Pooling=False; | 0 | 1.84 |
| 111 | Connection Pooling=True; | 0 | 1.89 |
| 112 | Connection Pooling=True; | 0 | 1.31 |
| 113 | Connection Pooling=True; | 0 | 1.21 |
| 114 | Connection Pooling=True; | 0 | 1.20 |
| 115 | Connection Pooling=True; | 0 | 1.19 |
| 116 | Connection Pooling=True; | 0 | 1.20 |
| 117 | Connection Pooling=True; | 0 | 1.20 |
| 118 | Connection Pooling=True; | 0 | 1.20 |
| 119 | Connection Pooling=True; | 0 | 1.20 |
| 120 | Connection Pooling=True; | 0 | 1.21 |
Now, looking at the above, you can see that times without connection pooling were higher than when using connection pooling.
Next, I wanted to see if setting the Rowset Cache Size made any difference with and without connection pooling. Using a query to get the minimum duration across all the results, I determined that in my test the fastest run was test #208, with a rowset of 270. I then issued a query to pull out the test results using “rowset = 270”:
SELECT TestNumber, ParamString AS Pooling, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
WHERE (rowset = 270)
ORDER BY TestNumber, Pooling
This query gives me the following results:
| TestNumber | Pooling | rowset | Seconds |
| --- | --- | --- | --- |
| 91 | Connection Pooling=False; | 270 | 1.39 |
| 92 | Connection Pooling=False; | 270 | 1.41 |
| 93 | Connection Pooling=False; | 270 | 1.41 |
| 94 | Connection Pooling=False; | 270 | 1.40 |
| 95 | Connection Pooling=False; | 270 | 1.40 |
| 96 | Connection Pooling=False; | 270 | 1.41 |
| 97 | Connection Pooling=False; | 270 | 1.46 |
| 98 | Connection Pooling=False; | 270 | 1.52 |
| 99 | Connection Pooling=False; | 270 | 1.41 |
| 100 | Connection Pooling=False; | 270 | 1.48 |
| 201 | Connection Pooling=True; | 270 | 1.30 |
| 202 | Connection Pooling=True; | 270 | 0.87 |
| 203 | Connection Pooling=True; | 270 | 0.78 |
| 204 | Connection Pooling=True; | 270 | 0.77 |
| 205 | Connection Pooling=True; | 270 | 0.76 |
| 206 | Connection Pooling=True; | 270 | 0.77 |
| 207 | Connection Pooling=True; | 270 | 0.76 |
| 208 | Connection Pooling=True; | 270 | 0.75 |
| 209 | Connection Pooling=True; | 270 | 0.76 |
| 210 | Connection Pooling=True; | 270 | 0.78 |
In the above, with connection pooling turned off, the times for each query were between 1.39 and 1.52 seconds. With connection pooling, however, the initial query (TestNumber 201) took 1.30 seconds, and subsequent queries were significantly faster, in most cases below 0.8 seconds. The savings (about 0.4 to 0.5 seconds) add up as more and more queries are run; with this particular query, you can run 2 queries with connection pooling (after the initial connection) almost as fast as 1 query without it.
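As an aside, the minimum-duration lookup mentioned earlier (the one that pointed me at test #208) can be as simple as the following sketch. This is not necessarily the exact query I ran, but it works against the CSPerfTrials table created by the code below:

```sql
SELECT TOP 1 TestNumber, ParamString, rowset, CAST(durationMs / 1000 AS decimal(6, 2)) AS Seconds
FROM CSPerfTrials
ORDER BY durationMs ASC
```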
I ran another pivot query which summed all the times over 30 iterations of the query, and came back with this chart (in seconds), which shows the time savings a bit better:
| rowset | Connection Pooling=False; | Connection Pooling=True; |
| --- | --- | --- |
| 0 | 55.7799701 | 36.5997885 |
| 30 | 42.7464447 | 23.7765142 |
| 60 | 42.1101308 | 24.9684609 |
| 90 | 42.5139897 | 23.7936823 |
| 120 | 41.9289347 | 24.7713371 |
| 150 | 42.5943495 | 24.1104076 |
| 180 | 42.2610654 | 23.8675952 |
| 210 | 42.2377875 | 24.3038118 |
| 240 | 43.31585 | 24.105081 |
| 270 | 42.7937661 | 25.1228602 |
| 300 | 42.2332467 | 23.6615868 |
And again, with 100 iterations of the query at each setting, using this SQL pivot query:
USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255) --
declare @tempstr nvarchar(1500)
declare param_cursor CURSOR FOR
select distinct ParamString
from [dbo].CSPerfTrials
set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
set @tempstr = @tempstr + '[' + @params + ']'
FETCH NEXT FROM param_cursor into @params
if @@FETCH_STATUS = 0
begin
set @tempstr = @tempstr + ', '
end
END
CLOSE param_cursor
DEALLOCATE param_cursor
set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( sum(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'
exec @success=sp_executesql @comstr
| rowset | Connection Pooling=True; | Connection Pooling=False; |
| --- | --- | --- |
| 0 | 121.6649042 | 190.0007958 |
| 30 | 77.8703857 | 142.5785544 |
| 60 | 78.2401951 | 141.6710446 |
| 90 | 78.0573694 | 141.4086889 |
| 120 | 77.8508053 | 145.7663853 |
| 150 | 78.565167 | 144.2463663 |
| 180 | 78.3692842 | 142.6525498 |
| 210 | 78.1296505 | 141.0669099 |
| 240 | 77.4381023 | 146.7418084 |
| 270 | 78.3527013 | 142.036522 |
| 300 | 78.8497313 | 141.7899574 |
Executing 100 queries with connection pooling takes almost half the time it takes without it, when used in conjunction with Rowset Cache Size, with this particular query.
For my final test, I wanted to see what impact this would have on a large table, where the results were not limited. I used this query (select count(*) from schema.table where amount BETWEEN 100.00 AND 110.00), which takes 51 seconds from SQL Server to return a count of 1,296 rows. That made a good test: pull in those 1,296 rows and read the data, 100 times for each connection string (100 * 11 * 2), for 2,200 reads against the table. I thought this would more accurately demonstrate pulling a small amount of data from a large table.
After the tests were run, I ran a query to sum up the times for each setting, with these results (times are in seconds):
| rowset | Connection Pooling=True; | Connection Pooling=False; |
| --- | --- | --- |
| 0 | 3721.2146623 | 4052.1730781 |
| 30 | 3724.0935948 | 4026.1043348 |
| 60 | 3689.1961373 | 3973.2904402 |
| 90 | 3718.429541 | 3970.0594041 |
| 120 | 3689.3583401 | 4035.6269915 |
| 150 | 3720.9639454 | 3976.7106773 |
| 180 | 3685.1553973 | 3944.8446177 |
| 210 | 3712.2296273 | 4068.358878 |
| 240 | 3703.0019921 | 4007.4356928 |
| 270 | 3675.8706398 | 4084.5275822 |
| 300 | 3785.033983 | 4010.1079362 |
So, I was thinking: wow, some savings, 200 – 300 seconds total. But before you get excited, remember this was across 100 queries, so in reality the savings are only 2 to 3 seconds for each query. Changing the query to show the average confirms this:
| rowset | Connection Pooling=True; | Connection Pooling=False; |
| --- | --- | --- |
| 0 | 37.212146623 | 40.521730781 |
| 30 | 37.240935948 | 40.261043348 |
| 60 | 36.891961373 | 39.732904402 |
| 90 | 37.18429541 | 39.700594041 |
| 120 | 36.893583401 | 40.356269915 |
| 150 | 37.209639454 | 39.767106773 |
| 180 | 36.851553973 | 39.448446177 |
| 210 | 37.122296273 | 40.68358878 |
| 240 | 37.030019921 | 40.074356928 |
| 270 | 36.758706398 | 40.845275822 |
| 300 | 37.85033983 | 40.101079362 |
So, the savings, on average, is only 2 – 3 seconds for each query when accessing a large table, in my particular case.
As the above charts show, the Rowset Cache Size by itself helps, but combining it with connection pooling on small queries can yield significant time savings. Since the numbers are fairly close once any Rowset Cache Size is set, a setting of at least 30 should give you most of the improvement. When querying a large table, implementing pooling can still save a few seconds, but you may not see as much of a performance impact as you would with ‘smaller’ queries where data can be retrieved much faster.
One thing to note here – all the times above were generated on a system with little overhead, against a DB2 server with minimal resources in use. Your results will vary depending on hardware, network speed and utilization, and the loads on DB2 and on the servers running the tests. But this test and the previous two should give an indication of which settings may be most beneficial in your environment.
The test hardware used in all 3 tests was identical: Windows 2003 SP2 + SQL Server 2005 (with all service packs and fixes applied for both), with 2GB of memory, on a 100Mb network. The queries were run against DB2/400 V5R4, which was a couple of hops away from the test servers.
I hope this series of articles will help in your performance testing. If you have any questions, please do not hesitate to ask.
The code for this particular test is below.
using System;
using System.Collections;
using System.Text;
using System.Data;
using System.Data.OleDb;
using System.Data.Common;
using System.Data.SqlClient;
using System.Diagnostics;

namespace DB2PerfTest
{
    class Program
    {
        // set bReadValues to true to simulate working with the data coming in
        static bool bReadValues = true;

        // connection string is missing rowset cache size and connection pooling options
        const string cnStringDB2OLEDB = @"Provider=DB2OLEDB;User ID=USERNAME;Password=password;Initial Catalog=DB2400;Network Transport Library=TCP;Host CCSID=37;PC Code Page=1252;Network Address=DB2400;Network Port=446;Package Collection=SCHEMA;Default Schema=SCHEMA;Process Binary as Character=False;Units of Work=RUW;Default Qualifier=SCHEMA;DBMS Platform=DB2/AS400;Persist Security Info=True;Defer Prepare=False;Derive Parameters=False;";

        // number of times to run each query
        const int iQueryIteration = 10;
        // rowset cache size increments from 0 to 300 using this value
        const int iCacheInc = 30;

        // below query returns 3,581,029 rows without the fetch first statement
        // limiting to 5000 as it is not realistic that many C# applications will be handling
        // millions of rows at one time
        const string DB2QueryString = "select * from schema.table where amount BETWEEN 5.00 AND 10.00 FETCH FIRST 5000 ROWS ONLY";

        static void Main(string[] args)
        {
            int rows = 0;
            string rowString = "";
            string myconstr = "";
            int iRowSet = 0;
            CreateSQLPerfTable();
            Stopwatch watch = new Stopwatch();
            int testno = 0;
            // multiple tests at this point
            // test with pooling off, then on
            // each test to be done with different settings of rowset
            // to gauge performance
            int iParamTest = 1;
            while (iParamTest < 3)
            {
                string poolString = "";
                switch (iParamTest)
                {
                    case 1:
                        poolString = "Connection Pooling=False;";
                        break;
                    case 2:
                        poolString = "Connection Pooling=True;";
                        break;
                }
                iRowSet = 0;
                while (iRowSet < 301)
                {
                    myconstr = "";
                    rowString = "Rowset Cache Size=" + iRowSet.ToString() + ";";
                    // build final connection string here
                    myconstr = cnStringDB2OLEDB + poolString + rowString;
                    for (int iquery = 1; iquery < iQueryIteration + 1; iquery++)
                    {
                        testno++;
                        // start timer watch
                        watch.Start();
                        rows = RunDbConnectionTest(myconstr, DB2QueryString);
                        watch.Stop();
                        Console.WriteLine("Test # {0}, ParamString: {1}, Rowset: {2}, Elapsed Time {3}", testno, poolString, iRowSet, (int)watch.Elapsed.TotalMilliseconds);
                        UpdateSQLPerfTable(testno, poolString, iRowSet, watch.Elapsed.TotalMilliseconds);
                        watch.Reset();
                        if (rows == -1) // error happened, exit
                        {
                            Environment.Exit(rows);
                        }
                    }
                    iRowSet = iRowSet + iCacheInc;
                }
                iParamTest++;
            } // iparamtest loop
        } // end main

        static int RunDbConnectionTest(string cnString, string cmString)
        {
            int result = 0;
            try // outer try
            {
                OleDbConnection cn = new OleDbConnection(cnString);
                cn.Open();
                try
                { // inner try
                    OleDbCommand cm = cn.CreateCommand();
                    cm.Connection = cn;
                    cm.CommandText = cmString;
                    if (bReadValues)
                    {
                        OleDbDataReader dr = cm.ExecuteReader();
                        try
                        {
                            while (dr.Read())
                            {
                                int cnt = dr.FieldCount;
                                for (int i = 0; i < cnt; i++)
                                {
                                    // simulate actually doing something with the data so it is 'read'
                                    string msg = dr.GetValue(i).ToString();
                                }
                            }
                        }
                        finally
                        {
                            dr.Close();
                            cm.Dispose();
                        }
                    } // bReadValues
                    else // not reading data, just execute
                    {
                        try
                        {
                            result = cm.ExecuteNonQuery();
                        }
                        catch (Exception ex)
                        {
                            System.Console.WriteLine("");
                            System.Console.WriteLine(ex.Message);
                            result = -1;
                        }
                        finally
                        {
                            cm.Dispose();
                        }
                    } // !bReadValues
                } // inner try
                finally
                {
                    cn.Close();
                    cn.Dispose();
                }
            } // outer try
            catch (Exception e)
            {
                System.Console.WriteLine("");
                System.Console.WriteLine(e.Message);
                result = -1;
            }
            return result;
        }

        static void CreateSQLPerfTable()
        {
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            // drop perf table
            Console.WriteLine("Dropping CSPerfTrials table");
            mySqlCommand.CommandText =
                "IF EXISTS (SELECT * FROM sys.objects " +
                "WHERE object_id = OBJECT_ID(N'[dbo].[CSPerfTrials]') " +
                "AND type in (N'U')) " +
                "DROP TABLE [dbo].[CSPerfTrials]";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            // create new table
            mySqlCommand.CommandText =
                "CREATE TABLE dbo.CSPerfTrials" +
                "( TestNumber int NULL" +
                ", ParamString Nvarchar(2000) NULL" +
                ", rowset int NULL" +
                ", durationMs float NULL" +
                ")";
            Console.WriteLine("Creating dbo.CSPerfTrials table");
            result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
        }

        static void UpdateSQLPerfTable(int testno, string ParamString, int rowset, double durationMS)
        {
            // added pooling=false to the sql connection string to resolve an error condition
            // may not be needed in all cases, but on 'fast' reads this appears to be needed
            SqlConnection mySqlConnection = new SqlConnection("Persist Security Info=True;Integrated Security=SSPI;database=DB2TestDatabases;server=(local);pooling=false");
            SqlCommand mySqlCommand = mySqlConnection.CreateCommand();
            mySqlConnection.Open();
            Console.WriteLine("Connection state is " +
                mySqlConnection.State.ToString());
            Console.WriteLine("inserting data into SQL table dbo.CSPerfTrials");
            mySqlCommand.CommandText = @"insert into dbo.CSPerfTrials " +
                "(TestNumber" +
                ",ParamString" +
                ",rowset" +
                ",durationMS" +
                ") values (" +
                testno.ToString() + ",'" +
                ParamString + "'," +
                rowset.ToString() + "," +
                durationMS.ToString() + ")";
            int result = mySqlCommand.ExecuteNonQuery();
            Console.WriteLine("mySqlCommand.ExecuteNonQuery() = " + result);
            mySqlConnection.Close();
            Console.WriteLine("Connection state is " +
                mySqlConnection.State.ToString());
        }
    }
}
The SQL query I ran to give me the pivot charts is below. Modifying “avg(durationms” to “sum(durationms” will give you the total times for each setting:
-- SSISPerf Pivot report
-- ***********************
-- Builds a Pivot report
USE [DB2TestDatabases]
go
declare @comstr nvarchar(4000) -- command string
declare @success int -- variable for executing SQL commands
declare @params nvarchar(255) --
declare @tempstr nvarchar(1500)
declare param_cursor CURSOR FOR
select distinct ParamString
from [dbo].CSPerfTrials
set @tempstr = ''
OPEN param_cursor
FETCH NEXT FROM param_cursor into @params
WHILE @@FETCH_STATUS = 0
BEGIN
set @tempstr = @tempstr + '[' + @params + ']'
FETCH NEXT FROM param_cursor into @params
if @@FETCH_STATUS = 0
begin
set @tempstr = @tempstr + ', '
end
END
CLOSE param_cursor
DEALLOCATE param_cursor
set @comstr = 'select rowset, ' + @tempstr + ' from (select rowset, ParamString, durationms/1000 as durationms '
set @comstr = @comstr + 'from dbo.CSPerfTrials) AS rf PIVOT ( avg(durationms) FOR ParamString IN ('
set @comstr = @comstr + @tempstr + ')) AS p'
exec @success=sp_executesql @comstr