对 SQL Server 大数据群集 Active Directory 集成进行故障排除

适用于: SQL Server 2019 (15.x)

本文介绍了如何对在 Active Directory 模式下部署 SQL Server 大数据群集进行故障排除。

重要

Microsoft SQL Server 2019 大数据群集附加产品将停用。 对 SQL Server 2019 大数据群集的支持将于 2025 年 2 月 28 日结束。 具有软件保障的 SQL Server 2019 的所有现有用户都将在平台上获得完全支持,在此之前,该软件将继续通过 SQL Server 累积更新进行维护。 有关详细信息,请参阅公告博客文章Microsoft SQL Server 平台上的大数据选项

症状

你已开始使用 Active Directory 模式部署 SQL Server 大数据群集。 部署停滞不前。

以下示例展示了 bash shell 中的部署结果。

The privacy statement can be viewed at:
https://go.microsoft.com/fwlink/?LinkId=853010
 
The license terms for SQL Server Big Data Cluster can be viewed at:
Enterprise: https://go.microsoft.com/fwlink/?linkid=2104292
Standard: https://go.microsoft.com/fwlink/?linkid=2104294
Developer: https://go.microsoft.com/fwlink/?linkid=2104079
 
Cluster deployment documentation can be viewed at:
https://aka.ms/bdc-deploy
 
NOTE: Cluster creation can take a significant amount of time depending on
configuration, network speed, and the number of nodes in the cluster.
 
Starting cluster deployment.
Cluster controller endpoint is available at bdc-control.contoso.com:30080, 193.168.5.14:30080.
Waiting for control plane to be ready after 5 minutes.
Waiting for control plane to be ready after 10 minutes.
Waiting for control plane to be ready after 15 minutes.
Waiting for control plane to be ready after 20 minutes.
Waiting for control plane to be ready after 25 minutes.

检查当前已部署的 Pod。

kubectl get pods -n mssql-cluster

以下列表仅显示已部署且包含在控制器内的 Pod。 没有创建任何计算、数据或存储池 Pod。

NAME              READY   STATUS    RESTARTS   AGE
appproxy-6q4rm    2/2     Running   0          32m
compute-0-0       3/3     Running   0          32m
control-n8jqh     3/3     Running   0          35m
controldb-0       2/2     Running   0          35m
controlwd-fgpj8   1/1     Running   0          34m
data-0-0          3/3     Running   0          32m
data-0-1          3/3     Running   0          32m
dns-fjp7n         2/2     Running   0          34m
gateway-0         2/2     Running   0          32m
logsdb-0          1/1     Running   0          34m
logsui-d26c5      1/1     Running   0          34m
master-0          3/4     Running   0          32m
master-1          3/4     Running   0          32m
master-2          3/4     Running   0          32m
metricsdb-0       1/1     Running   0          34m
metricsdc-c2kbh   1/1     Running   0          34m
metricsdc-lmqzx   1/1     Running   0          34m
metricsdc-r6499   1/1     Running   0          34m
metricsdc-tj99w   1/1     Running   0          34m
metricsui-dg8rz   1/1     Running   0          34m
mgmtproxy-dvzpc   2/2     Running   0          34m
nmnode-0-0        2/2     Running   0          32m
nmnode-0-1        2/2     Running   0          32m
operator-27gt9    1/1     Running   0          32m
sparkhead-0       4/4     Running   0          31m
sparkhead-1       4/4     Running   0          31m
storage-0-0       4/4     Running   0          31m
storage-0-1       4/4     Running   0          31m
storage-0-2       4/4     Running   0          31m
zookeeper-0       2/2     Running   0          32m
zookeeper-1       2/2     Running   0          32m
zookeeper-2       2/2     Running   0          32m

检查日志

若要确定部署退出而没有创建计算、数据或存储 pod 的原因,请查看以下日志:

  • 检查 controller.log (<folderOfDebugCopyLog>\debuglogs-mssql-cluster-20200219-093941\mssql-cluster\control-<suffix>\controller\controller\<date>\controller.log)。 查找以下条目:

    WARN | StatefulSet master is not ready with 0 ready pods and 3 unready pods

  • 检查 master-0 provisioner.log (<folderOfDebugCopyLog>\debuglogs-mssql-cluster-20200219-093941\mssql-cluster\master-0\mssql-server\provisioner\provisioner.log)

    ERROR | Failed to create sql login for domain user [<domain>.<top-level-domain>\<domain-group>]
      Traceback (most recent call last):
        File "/opt/provisioner/bin/scripts/provisioningpool.py", line 214, in executeNonQueries
          connection.execute_non_query(command)
        File "src/_mssql.pyx", line 1033, in _mssql.MSSQLConnection.execute_non_query
        File "src/_mssql.pyx", line 1061, in _mssql.MSSQLConnection.execute_non_query
        File "src/_mssql.pyx", line 1634, in _mssql.check_and_raise
        File "src/_mssql.pyx", line 1683, in _mssql.maybe_raise_MSSQLDatabaseException
      _mssql.MSSQLDatabaseException: (15401, b"Windows NT user or group '<domain>.<top-level-domain>\\<domain-group>' not found. Check the name again.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")
    WARNING | [3/3] Provisioning exception occurred during provisioning step: ProvisioningMasterPool.
    WARNING | Failed to create sql login for domain user [<domain>.<top-level-domain>\<domain-group>]
    WARNING | Retrying.
    

原因

在上面的示例中,由于域组的范围设置为本地域,因此部署无法为域用户创建登录名。 使用全局或通用范围内的组。 若要了解 Active Directory 组范围要求,请参阅在 Active Directory 模式下部署 SQL Server 大数据群集

验证

检查域组 (<domain-group>) 的范围。 使用 get-adgroup

如果 <domain-group> 组范围是本地域 (DomainLocal),则部署失败。

下面的 PowerShell 脚本检查名为 bdcadminsbdcusers 的两个 Active Directory 组的范围。 将名称替换为你的组的名称。

#Administrators and users Active Directory groups
$Cluster_admins_group='bdcadmins'
$Cluster_users_group='bdcusers'

#Performing Active Directory Group Checks...

#Active Directory admin group Check
$ClusterAdminGroupScope_Result = New-Object System.Collections.ArrayList
try {
    $GroupScope = Get-ADgroup -Identity $Cluster_admins_group | Select-Object -ExpandProperty GroupScope
    
    if ($GroupScope -eq 'DomainLocal') {
        [void]$ClusterAdminGroupScope_Result.Add("Misconfiguration - $Cluster_admins_group Group scope is $GroupScope, this scope is not supported, Please change group scope to either Global or Universal") 
    }
    else {
        [void]$ClusterAdminGroupScope_Result.Add("OK - $Cluster_admins_group Group scope is $GroupScope")
    }
}
catch {
    [void]$ClusterAdminGroupScope_Result.Add("Error - " + $_.exception.message)
}
#Ad users group check
$ClusterUsersGroupScope_Result = New-Object System.Collections.ArrayList
$GroupScope = ''
try {
    $GroupScope = Get-ADgroup -Identity $Cluster_users_group | Select-Object -ExpandProperty GroupScope
    
    if ($GroupScope -eq 'DomainLocal') {
        [void]$ClusterUsersGroupScope_Result.Add("Misconfiguration - $Cluster_users_group Group scope is $GroupScope, this scope is not supported, Please change group scope to either Global or Universal")
    } 
    else 
    { [void]$ClusterUsersGroupScope_Result.Add("OK - $Cluster_users_group Group scope is $GroupScope") }
}
catch {
    [void]$ClusterUsersGroupScope_Result.Add("Error - " + $_.exception.message)
}

#Display the results
$ClusterUsersGroupScope_Result

解决方法

若要解决此问题,请创建具有通用范围或全局范围的 Active Directory 组,然后重新运行部署。