APS / PDW Best Practices: Linked Server: OpenQuery VS EXEC
Many users utilize Linked Server functionality in order to issue queries to PDW using SSMS. This is completely supported and a practical way of processing. However it is important to pay attention to how you are issuing the queries through PDW. There are two popular methods, either using OPENQUERY or EXEC. The latter is the preferred method and will result in much more efficient processing, lower resource utilization on the APS installation, and simplification in the DMV's mentoring query execution. Open query is supported, however is mainly present for backwards compatibility. The behavior described below also occurs if issuing queries to a SQL instance. For this reason SQL also discourges users form using this method and ar in favor of the EXEC execution method.
See below for a detailed write up of what happens when a query is issued through OPENQUERY and EXEC and what impact it can have on an APS appliance.
Issue
Large number of queries/Sessions issued to the appliance during a short time period. Some of these include SP_PREPARE which the customer is not running.
Observations
- During the peak times, the CPU on the CTL node had very high utilization.
- There were a large number of queries running, many beyond the 32 concurrency limit
- There were a large number of prepared statement/parametrized query executions
- Also noticed explicit transactions and rollback
Findings
- The queries are being executed using SQL server linked server openquery syntax, such as:
select * from openquery(cssc8a ,' select top 1 * from dbo.col_test')
When running this syntax internally, I had the following findings:
- Two distinct sessions are being created to execute this query.
select * from sys.dm_pdw_exec_sessions
where session_id in ('SID148612' , 'SID148613')
order by login_time desc
session_id | status | request_id | security_id | login_name | login_time | query_count | is_transactional | client_id | app_name | sql_spid |
SID148613 | Closed | NULL | NULL | sa | 1/21/16 7:29 | 6 | 0 | 172.18.177.109:1090 | Microsoft SQL Server | 183 |
SID148612 | Closed | NULL | NULL | sa | 1/21/16 7:29 | 9 | 0 | 172.18.177.109:1089 | Microsoft SQL Server | 183 |
These two sessions are executing a total of 15 queries, even though only one was submitted.
I can see these two sessions are running the prepared statements as well as an explicit transaction.
select * from sys.dm_pdw_exec_requests
where session_id = 'SID148612'
order by submit_time desc
select * from sys.dm_pdw_exec_requests
where session_id = 'SID148613'
order by submit_time desc
request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
QID1912364 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 16 | NULL | NULL | 31 | SET NO_BROWSETABLE OFF | NULL |
QID1912362 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | exec [sp_unprepare] @P1 | NULL |
QID1912363 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | exec [sp_unprepare] @P1 | NULL |
QID1912359 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 | NULL |
QID1912360 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | exec [sp_prepare] @P1 OUT, @P2, @P3, @P4 | NULL |
QID1912361 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 31 | NULL | NULL | 31 | ;select top 1 * from dbo.col_test | NULL |
QID1912358 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SET NO_BROWSETABLE ON | NULL |
QID1912357 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SELECT @@SPID | NULL |
QID1912356 | SID148612 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |
request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
QID1912370 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 125 | NULL | NULL | 31 | rollback | NULL |
QID1912369 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 78 | NULL | NULL | 31 | ;select top 1 * from dbo.col_test | smallrc |
QID1912368 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | begin tran | NULL |
QID1912367 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | SET XACT_ABORT OFF | NULL |
QID1912366 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 16 | NULL | NULL | 31 | SELECT @@SPID | NULL |
QID1912365 | SID148613 | Completed | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 1/21/16 7:29 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |
We believe this is causing unnecessary overhead. We know (in AU4 or earlier releases) that during times of high concurrency and a large amount of sessions are connecting, the CPU utilization will be increased on the control node and at some point new connections will begin to timeout. This is currently a limitation of the PDW Engine process. The limit this occurs varies with workload. With the above symptoms, we are effectively doubling the amount of new session requests which will cause this limit to be hit with less intentional sessions than anticipated. We are also doing a large amount of unnecessary work. The total duration for both sessions is 328ms.
As a comparison, I executed the same query using the following syntax:
exec ('select top 1 * from dbo.col_test') at cssc8a
This appears to create a single session which issues 3 queries.
session_id | status | request_id | security_id | login_name | login_time | query_count | is_transactional | client_id | app_name | sql_spid |
SID148614 | Closed | NULL | NULL | sa | 1/21/2016 7:36:38.38 | 3 | 0 | 172.18.177.109:1380 | Microsoft SQL Server | 114 |
These three queries seem much more reasonable:
request_id | session_id | status | submit_time | start_time | end_compile_time | end_time | total_elapsed_time | label | error_id | database_id | command | resource_class |
QID1912401 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 31 | NULL | NULL | 31 | select top 1 * from dbo.col_test | smallrc |
QID1912400 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 16 | NULL | NULL | 31 | SELECT @@SPID | NULL |
QID1912399 | SID148614 | Completed | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 1/21/2016 7:36:38 | 0 | NULL | NULL | 31 | USE [tim_sandbox] | NULL |
Total execution time using openquery: 328ms
Total Execution time using exec: 47ms
Conclusion
Using exec in lieu of openquery is far more efficient. We see a reduction in the number of sessions created, queries issued to PDW, and total elapsed time when using exec; about a 700% improvement with a single query.
Comments
- Anonymous
July 16, 2016
Great post Tim, any idea when 4 part naming will be supported on APS?