排查 Operations Manager 中的 SQL Server Service Broker 问题

Microsoft System Center Operations Manager 依赖于 SQL Server Service Broker 来实现所有任务操作。 如果禁用 SQL Server Service Broker,则所有任务操作都会受到影响。 生成的行为可能因启动的任务而异。 例如,可能会遇到以下问题:

  • 尽管后台的任务已完成,但发现向导似乎正在无休止地运行。
  • 即使后台的任务已完成,重置监视器的运行状况也从未完成。

本文提供了 SQL Server Service Broker 问题的常见故障排除步骤。

注意

本文中的 SQL 查询使用操作数据库的默认名称 OperationsManager 。 如果使用其他数据库名称,请替换为 OperationsManager 操作数据库的名称。

检查 SQL Server Service Broker 是否已启用

  1. 运行以下 SQL 查询:

    SELECT is_broker_enabled FROM sys.databases WHERE name = 'OperationsManager'
    
  2. 如果字段的返回值为 is_broker_enabled 1(一),则启用 SQL Server Service Broker。 否则,请运行以下 SQL 查询以启用它:

    ALTER DATABASE OperationsManager SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    ALTER DATABASE OperationsManager SET ENABLE_BROKER
    ALTER DATABASE OperationsManager SET MULTI_USER
    

重启 System Center Data Access Service

  1. 启用 SQL Server Service Broker 后,重启 System Center Data Access Service (OMSDK)。

  2. 在 SQL Server Management Studio 中,转到 Databases>OperationsManager>Service Broker

  3. 展开队列和服务

  4. 验证是否有一个队列和服务的名称包含以下值:

    • 创建队列和服务的管理服务器的 IP 地址。
    • 在该管理服务器上运行的 OMSDK 服务(Microsoft.Mom.Sdk.ServiceHost.exe)的进程 ID。

    验证在该管理服务器上运行的 OMSDK 服务的进程 ID。

    在此示例中,管理服务器的 IP 地址为 192.168.10.10。 OMSDK 服务的 PID 为 3092。

    OMSDK 服务的 PID 的屏幕截图。

    如果有多个管理服务器,则每个管理服务器都将具有单独的 Service Broker 队列和服务。

  5. 如果找不到相应的队列和服务,请再次重启 OMSDK 服务。

如果仍然找不到队列和服务,则当前 Service Broker 可能已损坏。 转到下一步以重新创建 SQL Server Service Broker。

重新创建 SQL Server Service Broker

  1. 按顺序运行以下 SQL 查询:

    declare @i int 
    set @i=0
    DECLARE @handle UNIQUEIDENTIFIER 
    declare @service_id int
    declare @service_name nvarchar (100)
    declare @far_service nvarchar (70)
    DECLARE conv_cur CURSOR FAST_FORWARD FOR 
    SELECT CONVERSATION_HANDLE, service_id, far_service 
    FROM SYS.CONVERSATION_ENDPOINTS 
    OPEN conv_cur; 
    FETCH NEXT FROM conv_cur INTO @handle, @service_id, @far_service; 
    while (@@FETCH_STATUS = 0 and (@i<500000))
    BEGIN
    select top 1 @service_name=name from sys.services where service_id=@service_id
    begin
    END CONVERSATION @handle WITH CLEANUP 
    end
    FETCH NEXT FROM conv_cur INTO @handle, @service_id, @far_service;
    set @i=@i+1
    END 
    CLOSE conv_cur
    DEALLOCATE conv_cur
    go
    
    declare @servicename sysname
    declare @queuename sysname
    declare @cmd nvarchar(200)
    declare @serverqid nvarchar(25)
    set @serverqid='%Queue_mid%'
    while ((select count(*) from sys.service_queues WHERE name like @serverqid )>1)
    begin
    set @servicename= (select top 1 s.name from sys.service_queues as q join sys.services as s on q.object_id=s.service_queue_id WHERE q.name like @serverqid order by q.create_date asc)
    set @cmd= 'DROP SERVICE '+@servicename
    exec sp_sqlexec @cmd
    set @queuename= (select top 1 Object_name(object_id) from sys.service_queues WHERE name like @serverqid order by create_date asc)
    set @cmd= 'DROP QUEUE '+@queuename
    exec sp_sqlexec @cmd
    end
    go
    
    ALTER DATABASE OperationsManager SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    ALTER DATABASE OperationsManager SET NEW_BROKER WITH ROLLBACK IMMEDIATE
    ALTER DATABASE OperationsManager SET MULTI_USER
    go
    
    ALTER DATABASE OperationsManager SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    ALTER DATABASE OperationsManager SET ENABLE_BROKER
    ALTER DATABASE OperationsManager SET MULTI_USER
    go
    
  2. 在管理服务器上重启 OMSDK 服务以重新创建 Service Broker 以及队列和服务。 此过程可能需要两次重启:

    • 第一次重启会重新创建 Service Broker
    • 第二次重启会重新创建服务队列
  3. 验证 SQL Server Service Broker 是否仍在启用。 如果已禁用,请启用它。

  4. 验证是否已生成 Service Broker 队列和服务,如重启 System Center Data Access Service 中的步骤 4 中所述。

高级故障排除

如果前面的步骤无法解决问题,请收集包含 Service Broker 事件的 SQL Server Profiler 跟踪。

收集 SQL Server Profiler 跟踪的屏幕截图。

  • OMSDK 服务在重新启动时创建服务和队列时的示例跟踪:

    创建队列和服务时的示例跟踪。

  • 重置监视器运行状况时的示例跟踪:

    重置监视器运行状况时的示例跟踪。

  • 禁用 Service Broker 时的示例跟踪:

    禁用 Service Broker 时的示例跟踪。

此外,针对操作数据库运行以下 SQL 脚本以收集诊断日志。

USE master
go

SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON;
DECLARE @StartTime datetime
select @@version as 'Version'
select GETDATE() as 'RunDateTime', GETUTCDATE() as 'RunUTCDateTime', SYSDATETIMEOFFSET() as 'SysDateTimeOffset'

select @@servername as 'ServerName'
PRINT '-- sys.databases --' 
select * from master.sys.databases where is_broker_enabled = 1 and name not in('tempdb', 'model', 'AdventureWorks', 'AdventureWorksDW')
PRINT ''

PRINT '-- sys.dm_broker_activated_tasks --' 
select * from sys.dm_broker_activated_tasks
PRINT ''

PRINT '-- sys.dm_broker_connections --' 
select * from sys.dm_broker_connections
PRINT ''
PRINT '-- COUNT Broker Connections --'
SELECT count(*) as Cnt, state_desc, login_state_desc from sys.dm_broker_connections GROUP BY state_desc, login_state_desc ORDER BY state_desc 
PRINT ''

PRINT '-- sys.dm_broker_forwarded_messages --' 
select * from sys.dm_broker_forwarded_messages
PRINT ''

PRINT '-- sys.service_broker_endpoints --' 
select * from sys.service_broker_endpoints
PRINT ''

PRINT '-- sys.tcp_endpoints --' 
select * from sys.tcp_endpoints
PRINT ''

PRINT '-- sys.certificates --' 
select * from sys.certificates
PRINT ''

PRINT '-- sys.database_mirroring --' 
select * from sys.database_mirroring where mirroring_guid is not null
PRINT ''

PRINT '-- sys.dm_db_mirroring_connections --' 
select * from sys.dm_db_mirroring_connections
PRINT ''

PRINT '-- sys.dm_os_memory_clerks (broker) --'  
select * from sys.dm_os_memory_clerks where type like '%BROKER%' order by type desc

-- Loop Through DBs and Gather SSB information specific to each DB
DECLARE tnames_cursor CURSOR
FOR SELECT name 
   FROM master.sys.databases 
   WHERE is_broker_enabled = 1 
   and state = 0 
   and name not in('tempdb', 'model', 'AdventureWorks', 'AdventureWorksDW')
   ORDER BY [name]
OPEN tnames_cursor;
DECLARE @dbname sysname;
DECLARE @SCI int; -- Checking for Broker activity
DECLARE @cmd3 nvarchar(1024); -- New Command
FETCH NEXT FROM tnames_cursor INTO @dbname;
WHILE (@@FETCH_STATUS = 0)
BEGIN
   SELECT @SCI = 0; -- service_contract_id
   select @dbname = RTRIM(@dbname);
   EXEC ('USE [' + @dbname + ']');
   SELECT @cmd3 = N'SELECT @SCI_OUT = MAX(service_contract_id) FROM ' + @dbname + '.sys.service_contracts';
   EXEC sp_executesql @cmd3, N'@SCI_OUT INT OUTPUT', @SCI_OUT = @SCI OUTPUT; 
   IF @SCI > 7
      BEGIN
      PRINT ''
      PRINT '====================================================================================='
      PRINT 'Begin Database: ' + @dbname
      SELECT @StartTime = GETDATE()
      PRINT 'Start Time : ' + CONVERT(Varchar(50), @StartTime)
    
      PRINT ''
      PRINT '-- sys.service_message_types --'
      EXEC ('SELECT  * FROM ' + @dbname + '.sys.service_message_types');
      
      -- PRINT ''
      -- PRINT '-- sys.service_contract_message_usages --' 
      -- EXEC ('SELECT * FROM ' + @dbname + '.sys.service_contract_message_usages');
      
      PRINT ''
      PRINT '-- sys.service_contracts --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.service_contracts');
      
      -- PRINT ''
      -- print '-- sys.service_contract_usages --' 
      -- EXEC ('SELECT * FROM ' + @dbname + '.sys.service_contract_usages');
      
      PRINT ''
      PRINT '-- sys.service_queues --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.service_queues');
      
      -- PRINT ''
      -- PRINT '-- sys.service_queue_usages --' 
      -- EXEC ('SELECT * FROM ' + @dbname + '.sys.service_queue_usages');
      
      PRINT ''
      PRINT '-- sys.services --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.services');
      
      PRINT ''
      PRINT '-- sys.routes --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.routes');
      
      PRINT ''
      PRINT '-- sys.remote_service_bindings --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.remote_service_bindings');
      
      PRINT ''
      PRINT '-- sys.certificates --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.certificates');
      
      PRINT ''
      PRINT '-- sys.dm_qn_subscriptions --' 
      EXEC ('SELECT * FROM ' + @dbname + '.sys.dm_qn_subscriptions');
      
      PRINT '-- sys.dm_broker_queue_monitors, current state, last activation, current backlog in transmission queue --' 
      EXEC ('USE ' + @dbname + ';SELECT t1.name AS [Service_Name],  t3.name AS [Schema_Name],  t2.name AS [Queue_Name],  
      CASE WHEN t4.state IS NULL THEN ''Not available'' 
      ELSE t4.state 
      END AS [Queue_State],  
      CASE WHEN t4.tasks_waiting IS NULL THEN ''--'' 
      ELSE CONVERT(VARCHAR, t4.tasks_waiting) 
      END AS tasks_waiting, 
      CASE WHEN t4.last_activated_time IS NULL THEN ''--'' 
      ELSE CONVERT(varchar, t4.last_activated_time) 
      END AS last_activated_time ,  
      CASE WHEN t4.last_empty_rowset_time IS NULL THEN ''--'' 
      ELSE CONVERT(varchar,t4.last_empty_rowset_time) 
      END AS last_empty_rowset_time, 
      ( 
         SELECT COUNT(*) 
         FROM sys.transmission_queue t6 WITH (NOLOCK)
         WHERE (t6.from_service_name = t1.name) 
      ) AS [Tran_Message_Count],
      DB_NAME() AS DB_NAME 
      FROM sys.services t1 WITH (NOLOCK) INNER JOIN sys.service_queues t2 WITH (NOLOCK)
      ON ( t1.service_queue_id = t2.object_id )   
      INNER JOIN sys.schemas t3 WITH (NOLOCK) ON ( t2.schema_id = t3.schema_id )  
      LEFT OUTER JOIN sys.dm_broker_queue_monitors t4 WITH (NOLOCK)
      ON ( t2.object_id = t4.queue_id  AND t4.database_id = DB_ID() )  
      INNER JOIN sys.databases t5 WITH (NOLOCK) ON ( t5.database_id = DB_ID() );')
      PRINT ''
      

      PRINT ''
      PRINT 'sys.transmission_queue (toal count, group count, and top 500)'

      -- Using count against MetaData columns rather than COUNT(*) because it is faster, and we don't need exact counts
      PRINT '-- TOTAL COUNT sys.transmission_queue --' 
      EXEC ('SELECT p.rows as TQ_Count FROM ' + @dbname + '.sys.objects as o join ' + @dbname + '.sys.partitions as p on p.object_id = o.object_id where o.name = ''sysxmitqueue''')
      -- EXEC ('SELECT count(*) as TQ_Count FROM ' + @dbname + '.sys.transmission_queue with (nolock)');  -- more accurate count

      PRINT ''
      PRINT '-- GROUP COUNT sys.transmission_queue --'
      SELECT COUNT(*) as TQ_GroupCnt, transmission_status FROM sys.transmission_queue GROUP BY transmission_status
      
      PRINT ''
      PRINT 'TOP 500'
      print '-- sys.transmission_queue --' 
      EXEC ('USE ' + @dbname + ';SELECT top 500 conversation_handle, to_service_name, to_broker_instance, from_service_name, 
         service_contract_name, enqueue_time, message_sequence_number, message_type_name, is_conversation_error, 
         is_end_of_dialog, priority, transmission_status, DB_NAME() as DB_Name FROM ' + @dbname + '.sys.transmission_queue with (nolock) order by enqueue_time, message_sequence_number');
      
      PRINT ''
      print 'sys.conversation_endpoints (total count, group count, and top 500)'
      -- Using count against MetaData columns rather than COUNT(*) becuase it is faster, and we don't need exact counts
      PRINT '-- TOTAL COUNT sys.conversation_endpoints --'
      EXEC ('SELECT p.rows as CE_Count FROM ' + @dbname + '.sys.objects as o join ' + @dbname + '.sys.partitions as p on p.object_id = o.object_id  where o.name = ''sysdesend''')
      -- EXEC ('SELECT count(*) as count FROM ' + @dbname + '.sys.conversation_endpoints with (nolock)');
      
      PRINT ''
      PRINT '-- GROUP COUNT sys.conversation_endpoints --'
      EXEC  ('SELECT COUNT(*) as CE_GroupCnt, state_desc FROM ' + @dbname + '. sys.conversation_endpoints GROUP BY state_desc')
      
      PRINT ''
      PRINT 'TOP 500'
      PRINT '-- sys.conversation_endpoints --'
      EXEC ('USE ' + @dbname + ';SELECT top 500 *, DB_NAME() as DB_Name FROM ' + @dbname + '.sys.conversation_endpoints with (nolock)');
      
    -- Gather Activation Proc Code
    /*
      SET QUOTED_IDENTIFIER OFF;
      DECLARE @cmd nvarchar(1024)
      DECLARE @cmd2 nvarchar(1024)
      select @cmd = 'DECLARE tproc_cursor CURSOR FOR select activation_procedure from ' + @dbname + '.sys.service_queues where activation_procedure is not null'
      EXEC (@cmd)
      OPEN tproc_cursor;
      DECLARE @proc sysname;
      DECLARE @len int
      FETCH NEXT FROM tproc_cursor INTO @proc;
      WHILE (@@FETCH_STATUS = 0)
      BEGIN
         select @proc = rtrim(@proc)
         select @len = len(@proc) - 8;
         select @proc = substring(@proc, 8, @len)
         select @proc
         EXEC ("select definition from " + @dbname + ".sys.sql_modules where definition like '%" + @proc + "%'")
         FETCH NEXT FROM tproc_cursor INTO @proc;
      END;
      CLOSE tproc_cursor;
      DEALLOCATE tproc_cursor;
      SET QUOTED_IDENTIFIER ON;
   */
   
   PRINT ''
   PRINT 'End of Database: ' + @dbname 
   PRINT 'END Time : ' + CONVERT(Varchar(50), GetDate())
   PRINT 'Data Collection Duration in milliseconds for ' + @dbname
   PRINT ''
   SELECT DATEDIFF(millisecond, @StartTime, GETDATE()) as Duration_ms

   PRINT '====================================================================================='
   PRINT '====================================================================================='
   PRINT '' 
   END;
   FETCH NEXT FROM tnames_cursor INTO @dbname;
END;
CLOSE tnames_cursor;
DEALLOCATE tnames_cursor;

参考