Troubleshoot Azure extension for SQL Server

Applies to: SQL Server

Use Azure Resource Graph to identify the state of the Azure extension for SQL Server on your Azure Arc-enabled servers. This article provides queries that identify unhealthy extensions.

Tip

If you aren't already familiar with it, learn more about Azure Resource Graph before you run these queries.

Identify unhealthy extensions

You can create a dashboard in the Azure portal to show the health of all deployed Azure extensions for SQL Server.

Tip

Create your own dashboard with this file from the GitHub sql-server-samples repository: SQL Server enabled by Arc Health.json.

Query for unhealthy extensions

This query returns SQL Server instances on servers where the extension is installed but not healthy.

resources
| where type == "microsoft.hybridcompute/machines/extensions" 
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer") 
| extend targetMachineName = tolower(tostring(split(id, '/')[8])) // Extract the machine name from the extension's id
| join kind=leftouter (
    resources
    | where type == "microsoft.hybridcompute/machines"
    | project machineId = id, MachineName = name, subscriptionId, LowerMachineName = tolower(name), resourceGroup , MachineStatus= properties.status , MachineProvisioningStatus= properties.provisioningState, MachineErrors = properties.errorDetails //Project relevant machine health information.
) on $left.targetMachineName == $right.LowerMachineName and $left.resourceGroup == $right.resourceGroup and $left.subscriptionId == $right.subscriptionId // Join Based on MachineName in the id and the machine's name, the resource group, and the subscription. This join allows us to present the data of the machine as well as the extension in the final output.
| extend statusExpirationLengthRange = 3d // Change this value to change the acceptable range for the last time an extension should have reported its status.
| extend startDate = startofday(now() - statusExpirationLengthRange), endDate = startofday(now()) // Get the start and end position for the given range.
| extend extractedDateString = extract("timestampUTC : (\\d{4}\\W\\d{2}\\W\\d{2})", 1, tostring(properties.instanceView.status.message)) // Extracting the date string for the LastUploadTimestamp. Is empty if none is found.
| extend extractedDateStringYear = split(extractedDateString, '/')[0], extractedDateStringMonth = split(extractedDateString, '/')[1], extractedDateStringDay = split(extractedDateString, '/')[2] // Identifying each of the parts of the date that was extracted from the message.
| extend extractedDate = todatetime(strcat(extractedDateStringYear,"-",extractedDateStringMonth,"-",extractedDateStringDay,"T00:00:00Z")) // Converting to a datetime object and rewriting string into ISO format because todatetime() does not work using the previous format.
| extend isNotInDateRange = not(extractedDate >= startDate and extractedDate <= endDate) // Created bool which is true if the date we extracted from the message is not within the specified range. This bool will also be true if the date was not found in the message.
| where properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy" // Begin searching for unhealthy extensions using the following checks: 1. Does the extension report being healthy? 2. Is the last upload within the given range? 3. Is the upload status OK? 4. Is the provisioning state Succeeded? 5. Is the machine connected?
    or isNotInDateRange
    or properties.instanceView.status.message !contains "uploadStatus : OK"
    or properties.provisioningState != "Succeeded"
    or MachineStatus != "Connected"
| extend FailureReasons = strcat( // Makes a String to list all the reason that this resource got flagged for
        iif(MachineStatus != "Connected",strcat("- Machine's status is ", MachineStatus," -"),"") ,
        iif(MachineErrors != "[]","- Machine reports errors -", ""),
        iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy","- Extension reported unhealthy -",""), 
        iif(isNotInDateRange,"- Last upload outside acceptable range -",""),
        iif(properties.instanceView.status.message !contains "uploadStatus : OK","- Upload status is not reported OK -",""), 
        iif(properties.provisioningState != "Succeeded",strcat("- Extension provisioning state is ", properties.provisioningState," -"),"")
    )
| extend RecommendedAction = //Attempt to Identify RootCause based on information gathered, and point customer to what they should investigate first.
    iif(MachineStatus == "Disconnected", "Machine is disconnected. Please reconnect the machine.",
        iif(MachineStatus == "Expired", "Machine cert is expired. Go to the machine on the Azure portal for more information on how to resolve this issue.",
            iif(MachineStatus != "Connected", strcat("Machine status is ", MachineStatus,". Investigate and resolve this issue."),
                iif(MachineProvisioningStatus != "Succeeded", strcat("Machine provisioning status is ", MachineProvisioningStatus, ". Investigate and resolve machine provisioning status"),
                    iff(MachineErrors != "[]", "Machine is reporting errors. Investigate and resolve machine errors",
                        iif(properties.provisioningState != "Succeeded", strcat("Extension provisioning status is ", properties.provisioningState,". Investigate and resolve extension provisioning state."),
                            iff(properties.instanceView.status.message !contains "SQL Server Extension Agent:" and properties.instanceView.status.message contains "SQL Server Extension Agent Deployer", "SQL Server extension deployer ran. However, SQL Server extension seems to not be running. Verify that the extension is currently running.",
                                iff(properties.instanceView.status.message !contains "uploadStatus : OK" or isNotInDateRange or properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Extension reported as unhealthy. View FailureReasons and LastExtensionStatusMessage for more information as to the cause of the failure.",
                                    "Unable to recommend actions. Please view FailureReasons."
                                )
                            )
                        )
                    )
                )
            )
        )
    )
| project ID = id, MachineName, ResourceGroup = resourceGroup, SubscriptionID = subscriptionId, Location = location, RecommendedAction, FailureReasons, LicenseType = properties.settings.LicenseType, 
    LastReportedExtensionHealth = iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Unhealthy", "Healthy"),
    LastExtensionUploadTimestamp = iif(indexof(properties.instanceView.status.message, "timestampUTC : ") >= 0, // indexof() returns -1 when the marker is missing and 0 when it starts the message
        substring(properties.instanceView.status.message, indexof(properties.instanceView.status.message, "timestampUTC : ") + 15, 10),
        "no timestamp"),
    LastExtensionUploadStatus = iif(indexof(properties.instanceView.status.message, "uploadStatus : OK") >= 0, "OK", "Unhealthy"),
    ExtensionProvisioningState = properties.provisioningState,
    MachineStatus, MachineErrors, MachineProvisioningStatus,MachineId = machineId,
    LastExtensionStatusMessage = properties.instanceView.status.message

To identify possible problems, review the value in the RecommendedAction or FailureReasons column. The RecommendedAction column contains possible first steps to resolve the problem, or clues about what to check first. The FailureReasons column lists the reasons the resource was judged unhealthy. Finally, check LastExtensionStatusMessage to see the last message that the agent reported.
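
Before working through individual rows, it can help to see how the extensions are distributed across the health signals that the query checks. The following is a minimal sketch rather than part of the original query set. It reuses only the resource type, the extension type filter, and the message markers that appear in the query above; the column names reportsHealthy, uploadOk, provisioningState, and extensionCount are illustrative.

```kusto
// Sketch: count SQL Server extensions by the health signals used in the full query.
resources
| where type == "microsoft.hybridcompute/machines/extensions"
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer")
| extend message = tostring(properties.instanceView.status.message)
| extend reportsHealthy = message contains "SQL Server Extension Agent: Healthy", // true when the agent reports itself healthy
         uploadOk = message contains "uploadStatus : OK",                         // true when the last upload succeeded
         provisioningState = tostring(properties.provisioningState)
| summarize extensionCount = count() by reportsHealthy, uploadOk, provisioningState
| order by extensionCount desc
```

A row where reportsHealthy and uploadOk are both true and provisioningState is Succeeded can still be flagged by the full query, because this sketch leaves out the upload date range and the machine status checks.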

Troubleshooting guide

Each entry pairs a recommended action returned by the query with the action details for resolving it.

Recommended action: Machine cert is expired. Go to the machine on the Azure portal for more information on how to resolve this issue.

Action details: The Arc-enabled machine must be onboarded to Arc again because the certificate used to authenticate to Azure has expired. The Arc machine status is Expired in the Azure portal. Uninstall the agent by following the agent documentation, and then onboard the machine again. You don't need to delete the Arc-enabled SQL Server resources in the portal if you onboard again. The SQL extension is reinstalled automatically as long as automatic onboarding is enabled (the default).

Recommended action: Machine is disconnected. Please reconnect the machine.

Action details: The Arc machine is in the Disconnected state. This state can have two kinds of cause: either the Arc connected machine agent is stopped, disabled, or constantly crashing, or connectivity between the agent and Azure is blocked. Check the status of the Arc connected machine services or daemons to confirm that they're enabled and running. Check connectivity. Troubleshoot the agent by using the verbose log.

Recommended action: Extension reported as unhealthy. View FailureReasons and LastExtensionStatusMessage for more information as to the cause of the failure. FailureReasons shows that the last upload is outside the acceptable range (within the last three days).

Action details: Check the LastExtensionUploadTimestamp column. If it shows no timestamp, the extension never reported inventory or usage data to Azure. Troubleshoot connectivity from the SQL extension to Azure.

If the last upload is outside the acceptable range (within the last three days) and everything else looks OK, such as LastExtensionUploadStatus, ExtensionProvisioningState, and MachineStatus, the Arc SQL extension service or daemon might be stopped. Find out why it stopped and start it again. Check LastExtensionStatusMessage for other clues about the problem.

Recommended action: Extension provisioning status is Failed. Investigate and resolve extension provisioning state.

Action details: Either the initial installation of the SQL extension or an update failed. Check the deployment logs and the extension logs. Check the value in LastExtensionStatusMessage.

Recommended action: Upload status is not reported OK.

Action details: Check the LastExtensionStatusMessage column in the dashboard and look at the uploadStatus value and the uploadMessage value (if present, depending on the version). The uploadStatus value is typically an HTTP error code. Review and troubleshoot the error code. The uploadMessage value might have more specific information. Troubleshoot general connectivity for the Arc SQL extension.

Recommended action: Extension provisioning state is Updating, or Extension provisioning state is Creating, or Extension provisioning state is Failed, or Extension provisioning state is Deleting.

Action details: If an extension stays in one of these states for more than 30 minutes, there's likely a provisioning problem. Uninstall the extension and reinstall it by using the CLI or the portal. If the problem persists, check the deployment logs and the extension logs.

If deleting the extension fails, try uninstalling the agent, delete the Arc machine resource in the portal if necessary, and then redeploy.

Uninstall the agent by following the agent documentation, and then onboard the machine again.
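
The provisioning-state entries can also be checked on their own. As a sketch that reuses the resource type and extension type filter from the main query, the following lists SQL Server extensions whose provisioning state is anything other than Succeeded. The column names extensionProvisioningState and lastStatusMessage are illustrative. The query can't tell how long an extension has been in a given state, so applying the 30-minute guidance still means rerunning it later or checking the portal.

```kusto
// Sketch: list SQL Server extensions that are not in the Succeeded provisioning state.
resources
| where type == "microsoft.hybridcompute/machines/extensions"
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer")
| extend extensionProvisioningState = tostring(properties.provisioningState)
| where extensionProvisioningState != "Succeeded"
| project id, resourceGroup, subscriptionId, extensionProvisioningState,
    lastStatusMessage = tostring(properties.instanceView.status.message)
| order by extensionProvisioningState asc
```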

Identify unhealthy extensions (PowerShell)

This example runs in PowerShell. It returns the same result as the previous query, but through a PowerShell script.

# PowerShell script that runs an Azure Resource Graph query with the Az.ResourceGraph module.
# It returns extensions whose status is unhealthy or whose last upload falls outside the acceptable range (three days by default).

# Requires the Az.ResourceGraph PowerShell module

# Sign in to Azure if needed
#Connect-AzAccount

# Define the Azure Resource Graph query
$query = @"
resources
| where type == "microsoft.hybridcompute/machines/extensions" 
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer") 
| extend targetMachineName = tolower(tostring(split(id, '/')[8])) // Extract the machine name from the extension's id
| join kind=leftouter (
    resources
    | where type == "microsoft.hybridcompute/machines"
    | project machineId = id, MachineName = name, subscriptionId, LowerMachineName = tolower(name), resourceGroup , MachineStatus= properties.status , MachineProvisioningStatus= properties.provisioningState, MachineErrors = properties.errorDetails //Project relevant machine health information.
) on `$left.targetMachineName == `$right.LowerMachineName and `$left.resourceGroup == `$right.resourceGroup and `$left.subscriptionId == `$right.subscriptionId // Join based on the machine name in the id and the machine's name, the resource group, and the subscription. This join allows us to present the data of the machine as well as the extension in the final output. The backticks keep PowerShell from expanding the left and right join references inside this double-quoted here-string.
| extend statusExpirationLengthRange = 3d // Change this value to change the acceptable range for the last time an extension should have reported its status.
| extend startDate = startofday(now() - statusExpirationLengthRange), endDate = startofday(now()) // Get the start and end position for the given range.
| extend extractedDateString = extract("timestampUTC : (\\d{4}\\W\\d{2}\\W\\d{2})", 1, tostring(properties.instanceView.status.message)) // Extracting the date string for the LastUploadTimestamp. Is empty if none is found.
| extend extractedDateStringYear = split(extractedDateString, '/')[0], extractedDateStringMonth = split(extractedDateString, '/')[1], extractedDateStringDay = split(extractedDateString, '/')[2] // Identifying each of the parts of the date that was extracted from the message.
| extend extractedDate = todatetime(strcat(extractedDateStringYear,"-",extractedDateStringMonth,"-",extractedDateStringDay,"T00:00:00Z")) // Converting to a datetime object and rewriting string into ISO format because todatetime() does not work using the previous format.
| extend isNotInDateRange = not(extractedDate >= startDate and extractedDate <= endDate) // Created bool which is true if the date we extracted from the message is not within the specified range. This bool will also be true if the date was not found in the message.
| where properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy" // Begin searching for unhealthy extensions using the following checks: 1. Does the extension report being healthy? 2. Is the last upload within the given range? 3. Is the upload status OK? 4. Is the provisioning state Succeeded? 5. Is the machine connected?
    or isNotInDateRange
    or properties.instanceView.status.message !contains "uploadStatus : OK"
    or properties.provisioningState != "Succeeded"
    or MachineStatus != "Connected"
| extend FailureReasons = strcat( // Makes a String to list all the reason that this resource got flagged for
        iif(MachineStatus != "Connected",strcat("- Machine's status is ", MachineStatus," -"),"") ,
        iif(MachineErrors != "[]","- Machine reports errors -", ""),
        iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy","- Extension reported unhealthy -",""), 
        iif(isNotInDateRange,"- Last upload outside acceptable range -",""),
        iif(properties.instanceView.status.message !contains "uploadStatus : OK","- Upload status is not reported OK -",""), 
        iif(properties.provisioningState != "Succeeded",strcat("- Extension provisioning state is ", properties.provisioningState," -"),"")
    )
| extend RecommendedAction = //Attempt to Identify RootCause based on information gathered, and point customer to what they should investigate first.
    iif(MachineStatus == "Disconnected", "Machine is disconnected. Please reconnect the machine.",
        iif(MachineStatus == "Expired", "Machine cert is expired. Go to the machine on the Azure portal for more information on how to resolve this issue.",
            iif(MachineStatus != "Connected", strcat("Machine status is ", MachineStatus,". Investigate and resolve this issue."),
                iif(MachineProvisioningStatus != "Succeeded", strcat("Machine provisioning status is ", MachineProvisioningStatus, ". Investigate and resolve machine provisioning status"),
                    iff(MachineErrors != "[]", "Machine is reporting errors. Investigate and resolve machine errors",
                        iif(properties.provisioningState != "Succeeded", strcat("Extension provisioning status is ", properties.provisioningState,". Investigate and resolve extension provisioning state."),
                            iff(properties.instanceView.status.message !contains "SQL Server Extension Agent:" and properties.instanceView.status.message contains "SQL Server Extension Agent Deployer", "SQL Server extension deployer ran. However, SQL Server extension seems to not be running. Verify that the extension is currently running.",
                                iff(properties.instanceView.status.message !contains "uploadStatus : OK" or isNotInDateRange or properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Extension reported as unhealthy. View FailureReasons and LastExtensionStatusMessage for more information as to the cause of the failure.",
                                    "Unable to recommend actions. Please view FailureReasons."
                                )
                            )
                        )
                    )
                )
            )
        )
    )
| project ID = id, MachineName, ResourceGroup = resourceGroup, SubscriptionID = subscriptionId, Location = location, RecommendedAction, FailureReasons, LicenseType = properties.settings.LicenseType, 
    LastReportedExtensionHealth = iif(properties.instanceView.status.message !contains "SQL Server Extension Agent: Healthy", "Unhealthy", "Healthy"),
    LastExtensionUploadTimestamp = iif(indexof(properties.instanceView.status.message, "timestampUTC : ") >= 0, // indexof() returns -1 when the marker is missing and 0 when it starts the message
        substring(properties.instanceView.status.message, indexof(properties.instanceView.status.message, "timestampUTC : ") + 15, 10),
        "no timestamp"),
    LastExtensionUploadStatus = iif(indexof(properties.instanceView.status.message, "uploadStatus : OK") >= 0, "OK", "Unhealthy"),
    ExtensionProvisioningState = properties.provisioningState,
    MachineStatus, MachineErrors, MachineProvisioningStatus,MachineId = machineId,
    LastExtensionStatusMessage = properties.instanceView.status.message
"@

# Execute the Azure Resource Graph query
$result = Search-AzGraph -Query $query

# Output the results
$result | Format-Table -Property MachineName, LastReportedExtensionHealth, LastExtensionUploadTimestamp, LastExtensionUploadStatus, LastExtensionStatusMessage

To identify possible problems, review the value in the RecommendedAction or FailureReasons column. The RecommendedAction column contains possible first steps to resolve the problem, or clues about what to check first. The FailureReasons column lists the reasons the resource was judged unhealthy. Finally, check LastExtensionStatusMessage to see the last message that the agent reported.

Identify extensions missing updates

Identify extensions that haven't updated their status recently. This query returns a list of Azure extensions for SQL Server, ordered by the number of days since each extension last updated its status. A value of -1 indicates that the extension crashed and that its status contains a call stack.

// Show the timestamp extracted
// If an extension has crashed (i.e. no heartbeat), fill timestamp with "1900/01/01, 00:00:00.000"
//
resources
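| where type =~ 'microsoft.hybridcompute/machines/extensions'
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer") // Limit the results to the Azure extension for SQL Server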
| extend extensionStatus = parse_json(properties).instanceView.status.message
| extend timestampExtracted = extract(@"timestampUTC\s*:\s*(\d{4}/\d{2}/\d{2}, \d{2}:\d{2}:\d{2}\.\d{3})", 1, tostring(extensionStatus))
| extend timestampNullFilled = iff(isnull(timestampExtracted) or timestampExtracted == "", "1900/01/01, 00:00:00.000", timestampExtracted)
| extend timestampKustoFormattedString = strcat(replace(",", "", replace("/", "-", replace("/", "-", timestampNullFilled))), "Z")
| extend agentHeartbeatUtcTimestamp = todatetime(timestampKustoFormattedString)
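| extend agentHeartbeatLagInDays = iff(agentHeartbeatUtcTimestamp == todatetime("1900-01-01 00:00:00.000Z"), -1, datetime_diff('day', now(), agentHeartbeatUtcTimestamp)) // -1 marks a crashed extension (placeholder timestamp)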
| project id, extensionStatus, agentHeartbeatUtcTimestamp, agentHeartbeatLagInDays
| order by agentHeartbeatLagInDays asc // Sort first so that the limit keeps the first 100 rows of the ordered result
| limit 100
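
To look only at the extensions that the query treats as crashed, that is, those whose status message contains no timestampUTC value, a narrower query can be easier to read. This is an illustrative sketch. It assumes, as the query above does, that a missing timestamp means the status holds a call stack instead of a heartbeat, and it borrows the properties.type filter from the unhealthy-extension query.

```kusto
// Sketch: list SQL Server extensions whose status message has no heartbeat timestamp.
resources
| where type =~ 'microsoft.hybridcompute/machines/extensions'
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer")
| extend extensionStatus = tostring(properties.instanceView.status.message)
| where extensionStatus !contains "timestampUTC" // no heartbeat marker in the message
| project id, resourceGroup, subscriptionId, extensionStatus
```

The extensionStatus column then shows the full message, which is where the call stack would appear.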

This query returns the number of extensions, grouped by the number of days since each extension last updated its status. A value of -1 indicates that the extension crashed and that its status contains a call stack.

// Aggregate by timestamp
//
// -1: Crashed extension with no heartbeat, we got a stacktrace instead
//  0: Healthy
// >=1: Stale/Offline
//
resources
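| where type =~ 'microsoft.hybridcompute/machines/extensions'
| where properties.type in ("WindowsAgent.SqlServer", "LinuxAgent.SqlServer") // Limit the results to the Azure extension for SQL Server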
| extend extensionStatus = parse_json(properties).instanceView.status.message
| extend timestampExtracted = extract(@"timestampUTC\s*:\s*(\d{4}/\d{2}/\d{2}, \d{2}:\d{2}:\d{2}\.\d{3})", 1, tostring(extensionStatus))
| extend timestampNullFilled = iff(isnull(timestampExtracted) or timestampExtracted == "", "1900/01/01, 00:00:00.000", timestampExtracted)
| extend timestampKustoFormattedString = strcat(replace(",", "", replace("/", "-", replace("/", "-", timestampNullFilled))), "Z")
| extend agentHeartbeatUtcTimestamp = todatetime(timestampKustoFormattedString)
| extend agentHeartbeatLagInDays = iff(agentHeartbeatUtcTimestamp == todatetime("1900-01-01 00:00:00.000Z"), -1, datetime_diff('day', now(), agentHeartbeatUtcTimestamp)) // Compare against the placeholder in the same rewritten format that todatetime() parsed above
| summarize numExtensions = count() by agentHeartbeatLagInDays
| order by numExtensions desc
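
The numeric buckets can be turned into the labels from the comment block (crashed, healthy, stale or offline) so that the summary reads without a legend. The following is a sketch built from the same extraction steps as the query above. The heartbeatState column and its label text are illustrative names, and the isempty() test is swapped in for the 1900 placeholder comparison as a simpler way to detect a missing timestamp.

```kusto
// Sketch: summarize extensions by a readable heartbeat state instead of a day count.
resources
| where type =~ 'microsoft.hybridcompute/machines/extensions'
| extend extensionStatus = tostring(properties.instanceView.status.message)
| extend timestampExtracted = extract(@"timestampUTC\s*:\s*(\d{4}/\d{2}/\d{2}, \d{2}:\d{2}:\d{2}\.\d{3})", 1, extensionStatus)
| extend timestampKustoFormattedString = strcat(replace(",", "", replace("/", "-", replace("/", "-", timestampExtracted))), "Z")
| extend agentHeartbeatLagInDays = datetime_diff('day', now(), todatetime(timestampKustoFormattedString))
| extend heartbeatState = iff(isempty(timestampExtracted), "Crashed",   // no timestamp found in the message
    iff(agentHeartbeatLagInDays == 0, "Healthy", "Stale/Offline"))      // 0 days of lag means a heartbeat today
| summarize numExtensions = count() by heartbeatState
| order by numExtensions desc
```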