Query's uitvoeren op logboeken van Container Insights

Artikel
10/15/2024

Container insights verzamelt prestatiegegevens, inventarisgegevens en statusgegevens van containerhosts en containers. De gegevens worden om de drie minuten verzameld en doorgestuurd naar de Log Analytics-werkruimte in Azure Monitor, waar deze beschikbaar zijn voor logboekquery's met behulp van Log Analytics in Azure Monitor.

U kunt deze gegevens toepassen op scenario's met migratieplanning, capaciteitsanalyse, detectie en prestatieproblemen op aanvraag. Azure Monitor-logboeken kunnen u helpen bij het zoeken naar trends, het diagnosticeren van knelpunten, prognoses of het correleren van gegevens waarmee u kunt bepalen of de huidige clusterconfiguratie optimaal presteert.

Zie Query's gebruiken in Azure Monitor Log Analytics voor meer informatie over het gebruik van deze query's. Zie de log analytics-zelfstudie voor een volledige zelfstudie over het gebruik van Log Analytics om query's uit te voeren en met hun resultaten te werken.

Belangrijk

De query's in dit artikel zijn afhankelijk van gegevens die zijn verzameld door Container Insights en zijn opgeslagen in een Log Analytics-werkruimte. Als u de standaardinstellingen voor gegevensverzameling hebt gewijzigd, retourneren de query's mogelijk niet de verwachte resultaten. Met name als u het verzamelen van prestatiegegevens hebt uitgeschakeld omdat u metrische Prometheus-gegevens voor het cluster hebt ingeschakeld, worden er geen resultaten geretourneerd door query's die gebruikmaken van de Perf tabel.

Zie Gegevensverzameling configureren in Container Insights met behulp van een regel voor het verzamelen van gegevens voor vooraf ingestelde configuraties, waaronder het uitschakelen van het verzamelen van prestatiegegevens. Zie Gegevensverzameling configureren in Container Insights met behulp van ConfigMap voor verdere opties voor gegevensverzameling.

Log Analytics openen

Er zijn meerdere opties voor het starten van Log Analytics. Elke optie begint met een ander bereik. Selecteer Logboeken in het menu Bewaking voor toegang tot alle gegevens in de werkruimte. Als u de gegevens wilt beperken tot één Kubernetes-cluster, selecteert u Logboeken in het menu van dat cluster.

Bestaande logboekquery's

U hoeft niet per se te begrijpen hoe u een logboekquery schrijft om Log Analytics te gebruiken. U kunt kiezen uit meerdere vooraf gemaakte query's. U kunt de query's uitvoeren zonder deze te wijzigen of gebruiken als een begin met een aangepaste query. Selecteer query's boven aan het scherm Log Analytics en bekijk query's met een resourcetype Kubernetes Services.

Containertabellen

Zie de azure Monitor-tabelreferentie voor een lijst met tabellen en de bijbehorende gedetailleerde beschrijvingen die door Container Insights worden gebruikt. Al deze tabellen zijn beschikbaar voor logboekquery's.

Voorbeeld van logboekquery's

Het is vaak handig om query's te bouwen die beginnen met een voorbeeld of twee en deze vervolgens aanpassen aan uw vereisten. Als u meer geavanceerde query's wilt bouwen, kunt u experimenteren met de volgende voorbeeldquery's.

Alle levenscyclusgegevens van een container weergeven

ContainerInventory
| project Computer, Name, Image, ImageTag, ContainerState, CreatedTime, StartedTime, FinishedTime
| render table

Kubernetes-gebeurtenissen

Notitie

Standaard worden normale gebeurtenistypen niet verzameld, dus u ziet deze niet wanneer u een query uitvoert op de KubeEvents-tabel, tenzij de collect_all_kube_events ConfigMap-instelling is ingeschakeld. Als u normale gebeurtenissen wilt verzamelen, schakelt u collect_all_kube_events instelling in de container-azm-ms-agentconfig ConfigMap in. Zie Agentgegevensverzameling configureren voor Container Insights voor informatie over het configureren van de ConfigMap.

KubeEvents
| where not(isempty(Namespace))
| sort by TimeGenerated desc
| render table

Container CPU

Perf
| where ObjectName == "K8SContainer" and CounterName == "cpuUsageNanoCores" 
| summarize AvgCPUUsageNanoCores = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

Containergeheugen

Deze query gebruikt memoryRssBytes die alleen beschikbaar is voor Linux-knooppunten.

Perf
| where ObjectName == "K8SContainer" and CounterName == "memoryRssBytes"
| summarize AvgUsedRssMemoryBytes = avg(CounterValue) by bin(TimeGenerated, 30m), InstanceName

Aanvragen per minuut met aangepaste metrische gegevens

InsightsMetrics
| where Name == "requests_count"
| summarize Val=any(Val) by TimeGenerated=bin(TimeGenerated, 1m)
| sort by TimeGenerated asc
| project RequestsPerMinute = Val - prev(Val), TimeGenerated
| render barchart

Pods op naam en naamruimte

let startTimestamp = ago(1h);
KubePodInventory
| where TimeGenerated > startTimestamp
| project ContainerID, PodName=Name, Namespace
| where PodName contains "name" and Namespace startswith "namespace"
| distinct ContainerID, PodName
| join
(
    ContainerLog
    | where TimeGenerated > startTimestamp
)
on ContainerID
// at this point before the next pipe, columns from both tables are available to be "projected". Due to both
// tables having a "Name" column, we assign an alias as PodName to one column which we actually want
| project TimeGenerated, PodName, LogEntry, LogEntrySource
| summarize by TimeGenerated, LogEntry
| order by TimeGenerated desc

Uitschalen van pods (HPA)

Deze query retourneert het aantal uitgeschaalde replica's in elke implementatie. Hiermee wordt het uitschaalpercentage berekend met het maximum aantal replica's dat is geconfigureerd in HPA.

let _minthreshold = 70; // minimum threshold goes here if you want to setup as an alert
let _maxthreshold = 90; // maximum threshold goes here if you want to setup as an alert
let startDateTime = ago(60m);
KubePodInventory
| where TimeGenerated >= startDateTime 
| where Namespace !in('default', 'kube-system') // List of non system namespace filter goes here.
| extend labels = todynamic(PodLabel)
| extend deployment_hpa = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| distinct tostring(deployment_hpa)
| join kind=inner (InsightsMetrics 
    | where TimeGenerated > startDateTime 
    | where Name == 'kube_hpa_status_current_replicas'
    | extend pTags = todynamic(Tags) //parse the tags for values
    | extend ns = todynamic(pTags.k8sNamespace) //parse namespace value from tags
    | extend deployment_hpa = todynamic(pTags.targetName) //parse HPA target name from tags
    | extend max_reps = todynamic(pTags.spec_max_replicas) // Parse maximum replica settings from HPA deployment
    | extend desired_reps = todynamic(pTags.status_desired_replicas) // Parse desired replica settings from HPA deployment
    | summarize arg_max(TimeGenerated, *) by tostring(ns), tostring(deployment_hpa), Cluster=toupper(tostring(split(_ResourceId, '/')[8])), toint(desired_reps), toint(max_reps), scale_out_percentage=(desired_reps * 100 / max_reps)
    //| where scale_out_percentage > _minthreshold and scale_out_percentage <= _maxthreshold
    )
    on deployment_hpa

Scale-outs van Nodepool

Deze query retourneert het aantal actieve knooppunten in elke knooppuntgroep. Hiermee wordt het aantal beschikbare actieve knooppunten en de maximale knooppuntconfiguratie in de instellingen voor automatische schaalaanpassing berekend om het uitschaalpercentage te bepalen. Zie opmerkingen in de query om deze te gebruiken voor een aantal waarschuwingsregels voor resultaten .

let nodepoolMaxnodeCount = 10; // the maximum number of nodes in your auto scale setting goes here.
let _minthreshold = 20;
let _maxthreshold = 90;
let startDateTime = 60m;
KubeNodeInventory
| where TimeGenerated >= ago(startDateTime)
| extend nodepoolType = todynamic(Labels) //Parse the labels to get the list of node pool types
| extend nodepoolName = todynamic(nodepoolType[0].agentpool) // parse the label to get the nodepool name or set the specific nodepool name (like nodepoolName = 'agentpool)'
| summarize nodeCount = count(Computer) by ClusterName, tostring(nodepoolName), TimeGenerated
//(Uncomment the below two lines to set this as a log search alert)
//| extend scaledpercent = iff(((nodeCount * 100 / nodepoolMaxnodeCount) >= _minthreshold and (nodeCount * 100 / nodepoolMaxnodeCount) < _maxthreshold), "warn", "normal")
//| where scaledpercent == 'warn'
| summarize arg_max(TimeGenerated, *) by nodeCount, ClusterName, tostring(nodepoolName)
| project ClusterName, 
    TotalNodeCount= strcat("Total Node Count: ", nodeCount),
    ScaledOutPercentage = (nodeCount * 100 / nodepoolMaxnodeCount),  
    TimeGenerated, 
    nodepoolName

Beschikbaarheid van systeemcontainers (replicaset)

Deze query retourneert de systeemcontainers (replicasets) en rapporteert het percentage niet beschikbaar. Zie opmerkingen in de query om deze te gebruiken voor een aantal waarschuwingsregels voor resultaten .

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'ReplicaSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

Beschikbaarheid van systeemcontainers (daemonsets)

Deze query retourneert de systeemcontainers (daemonsets) en rapporteert het niet-beschikbare percentage. Zie opmerkingen in de query om deze te gebruiken voor een aantal waarschuwingsregels voor resultaten .

let startDateTime = 5m; // the minimum time interval goes here
let _minalertThreshold = 50; //Threshold for minimum and maximum unavailable or not running containers
let _maxalertThreshold = 70;
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated
| summarize Clustersnapshot = count() by ClusterName
| join kind=inner (
    KubePodInventory
    | where TimeGenerated >= ago(startDateTime)
    | where Namespace in('default', 'kube-system') and ControllerKind == 'DaemonSet' // the system namespace filter goes here
    | distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, ServiceName, PodLabel, Namespace, ContainerStatus
    | summarize arg_max(TimeGenerated, *), TotalPODCount = count(), podCount = sumif(1, PodStatus == 'Running' or PodStatus != 'Running'), containerNotrunning = sumif(1, ContainerStatus != 'running')
        by ClusterName, TimeGenerated, ServiceName, PodLabel, Namespace
    )
    on ClusterName
| project ClusterName, ServiceName, podCount, containerNotrunning, containerNotrunningPercent = (containerNotrunning * 100 / podCount), TimeGenerated, PodStatus, PodLabel, Namespace, Environment = tostring(split(ClusterName, '-')[3]), Location = tostring(split(ClusterName, '-')[4]), ContainerStatus
//Uncomment the below line to set for automated alert
//| where PodStatus == "Running" and containerNotrunningPercent > _minalertThreshold and containerNotrunningPercent < _maxalertThreshold
| summarize arg_max(TimeGenerated, *), c_entry=count() by PodLabel, ServiceName, ClusterName
//Below lines are to parse the labels to identify the impacted service/component name
| extend parseLabel = replace(@'k8s-app', @'k8sapp', PodLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/component', @'appkubernetesiocomponent', parseLabel)
| extend parseLabel = replace(@'app.kubernetes.io\\/instance', @'appkubernetesioinstance', parseLabel)
| extend tags = todynamic(parseLabel)
| extend tag01 = todynamic(tags[0].app)
| extend tag02 = todynamic(tags[0].k8sapp)
| extend tag03 = todynamic(tags[0].appkubernetesiocomponent)
| extend tag04 = todynamic(tags[0].aadpodidbinding)
| extend tag05 = todynamic(tags[0].appkubernetesioinstance)
| extend tag06 = todynamic(tags[0].component)
| project ClusterName, TimeGenerated,
    ServiceName = strcat( ServiceName, tag01, tag02, tag03, tag04, tag05, tag06),
    ContainerUnavailable = strcat("Unavailable Percentage: ", containerNotrunningPercent),
    PodStatus = strcat("PodStatus: ", PodStatus), 
    ContainerStatus = strcat("Container Status: ", ContainerStatus)

Containerlogboeken

Containerlogboeken voor AKS worden opgeslagen in de ContainerLogV2-tabel. U kunt de volgende voorbeeldquery's uitvoeren om te zoeken naar de uitvoer van stderr-/stdout-logboeken van doelpods, implementaties of naamruimten.

Containerlogboeken voor een specifieke pod, naamruimte en container

ContainerLogV2
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where PodNamespace == "podNameSpace" //update with target namespace
| where PodName == "podName" //update with target pod
| where ContainerName == "containerName" //update with target container
| project TimeGenerated, Computer, ContainerId, LogMessage, LogSource

Containerlogboeken voor een specifieke implementatie

let KubePodInv = KubePodInventory
| where _ResourceId =~ "clusterResourceID" //update with resource ID
| where Namespace == "deploymentNamespace" //update with target namespace
| where ControllerKind == "ReplicaSet"
| extend deployment = reverse(substring(reverse(ControllerName), indexof(reverse(ControllerName), "-") + 1))
| where deployment == "deploymentName" //update with target deployment
| extend ContainerId = ContainerID
| summarize arg_max(TimeGenerated, *)  by deployment, ContainerId, PodStatus, ContainerStatus
| project deployment, ContainerId, PodStatus, ContainerStatus;

KubePodInv
| join
(
    ContainerLogV2
  | where TimeGenerated >= startTime and TimeGenerated < endTime
  | where PodNamespace == "deploymentNamespace" //update with target namespace
  | where PodName startswith "deploymentName" //update with target deployment
) on ContainerId
| project TimeGenerated, deployment, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

Containerlogboeken voor een mislukte pod in een specifieke naamruimte

    let KubePodInv = KubePodInventory
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where _ResourceId =~ "clustereResourceID" //update with resource ID
    | where Namespace == "podNamespace" //update with target namespace
    | where PodStatus == "Failed"
    | extend ContainerId = ContainerID
    | summarize arg_max(TimeGenerated, *)  by  ContainerId, PodStatus, ContainerStatus
    | project ContainerId, PodStatus, ContainerStatus;

    KubePodInv
    | join
    (
        ContainerLogV2
    | where TimeGenerated >= startTime and TimeGenerated < endTime
    | where PodNamespace == "podNamespace" //update with target namespace
    ) on ContainerId
    | project TimeGenerated, PodName, PodStatus, ContainerName, ContainerId, ContainerStatus, LogMessage, LogSource

Standaardvisualisatiequery's voor Container Insights

Deze query's worden gegenereerd op basis van de out-of-the-box-visualisaties van containerinzichten . U kunt ervoor kiezen om deze te gebruiken als u aangepaste instellingen voor kostenoptimalisatie hebt ingeschakeld in plaats van de standaardgrafieken.

Aantal knooppunten op status

De vereiste tabellen voor deze grafiek bevatten KubeNodeInventory.

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubeNodeInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by Timestamp = bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubeNodeInventory 
| where ClusterId =~ clusterId 
| summarize TotalCount = count(), ReadyCount = sumif(1, Status contains ('Ready')) by ClusterId, Timestamp = bin(TimeGenerated, trendBinSize) 
| extend NotReadyCount = TotalCount - ReadyCount ) on ClusterId, Timestamp 
| project ClusterId, Timestamp, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, ReadyCount = todouble(ReadyCount) / ClusterSnapshotCount, NotReadyCount = todouble(NotReadyCount) / ClusterSnapshotCount;

 rawData 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(ReadyCount, maxListSize), makelist(NotReadyCount, maxListSize) by ClusterId 
| join ( rawData 
| summarize Avg_TotalCount = avg(TotalCount), Avg_ReadyCount = avg(ReadyCount), Avg_NotReadyCount = avg(NotReadyCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_ReadyCount, Avg_NotReadyCount, list_Timestamp, list_TotalCount, list_ReadyCount, list_NotReadyCount

Aantal pods op status

De vereiste tabellen voor deze grafiek bevatten KubePodInventory.

 let trendBinSize = 5m;
 let maxListSize = 1000;
 let clusterId = 'clusterResourceID'; //update with resource ID
 
 let rawData = KubePodInventory 
| where ClusterId =~ clusterId 
| distinct ClusterId, TimeGenerated 
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterId 
| join hint.strategy=broadcast ( KubePodInventory 
| where ClusterId =~ clusterId 
| summarize PodStatus=any(PodStatus) by TimeGenerated, PodUid, ClusterId 
| summarize TotalCount = count(), PendingCount = sumif(1, PodStatus =~ 'Pending'), RunningCount = sumif(1, PodStatus =~ 'Running'), SucceededCount = sumif(1, PodStatus =~ 'Succeeded'), FailedCount = sumif(1, PodStatus =~ 'Failed'), TerminatingCount = sumif(1, PodStatus =~ 'Terminating') by ClusterId, bin(TimeGenerated, trendBinSize) ) on ClusterId, TimeGenerated 
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount - TerminatingCount 
| project ClusterId, Timestamp = TimeGenerated, TotalCount = todouble(TotalCount) / ClusterSnapshotCount, PendingCount = todouble(PendingCount) / ClusterSnapshotCount, RunningCount = todouble(RunningCount) / ClusterSnapshotCount, SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount, FailedCount = todouble(FailedCount) / ClusterSnapshotCount, TerminatingCount = todouble(TerminatingCount) / ClusterSnapshotCount, UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount;

 let rawDataCached = rawData;
 
 rawDataCached 
| order by Timestamp asc 
| summarize makelist(Timestamp, maxListSize), makelist(TotalCount, maxListSize), makelist(PendingCount, maxListSize), makelist(RunningCount, maxListSize), makelist(SucceededCount, maxListSize), makelist(FailedCount, maxListSize), makelist(TerminatingCount, maxListSize), makelist(UnknownCount, maxListSize) by ClusterId 
| join ( rawDataCached 
| summarize Avg_TotalCount = avg(TotalCount), Avg_PendingCount = avg(PendingCount), Avg_RunningCount = avg(RunningCount), Avg_SucceededCount = avg(SucceededCount), Avg_FailedCount = avg(FailedCount), Avg_TerminatingCount = avg(TerminatingCount), Avg_UnknownCount = avg(UnknownCount) by ClusterId ) on ClusterId 
| project ClusterId, Avg_TotalCount, Avg_PendingCount, Avg_RunningCount, Avg_SucceededCount, Avg_FailedCount, Avg_TerminatingCount, Avg_UnknownCount, list_Timestamp, list_TotalCount, list_PendingCount, list_RunningCount, list_SucceededCount, list_FailedCount, list_TerminatingCount, list_UnknownCount

Lijst met containers op status

De vereiste tabellen voor deze grafiek bevatten KubePodInventory en Perf.

 let startDateTime = datetime('start time');
 let endDateTime = datetime('end time');
 let trendBinSize = 15m;
 let maxResultCount = 10000;
 let metricUsageCounterName = 'cpuUsageNanoCores';
 let metricLimitCounterName = 'cpuLimitNanoCores';
 
 let KubePodInventoryTable = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where isnotempty(Computer) 
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, ControllerName, Node = Computer, Pod = Name, ContainerInstance = ContainerName, ContainerID, ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp , 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus;

 let startRestart = KubePodInventoryTable 
| summarize arg_min(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project Node, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), StartRestart = Restarts;

 let IdentityTable = KubePodInventoryTable 
| summarize arg_max(TimeGenerated, *) by Node, ContainerInstance 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, ContainerInstance, InstanceName = strcat(ClusterId, '/', ContainerInstance), ContainerID, ReadySinceNow, Restarts, Status = iff(Status =~ 'running', 0, iff(Status=~'waiting', 1, iff(Status =~'terminated', 2, 3))), ContainerStatusReason, ControllerKind, Containers = 1, ContainerName = tostring(split(ContainerInstance, '/')[1]), PodStatus, LastPodInventoryTimeGenerated = TimeGenerated, ClusterId;

 let CachedIdentityTable = IdentityTable;
 
 let FilteredPerfTable = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' 
| project Node = Computer, TimeGenerated, CounterName, CounterValue, InstanceName ;

 let CachedFilteredPerfTable = FilteredPerfTable;
 
 let LimitsTable = CachedFilteredPerfTable 
| where CounterName =~ metricLimitCounterName 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, LimitsValue = iff(CounterName =~ 'cpuLimitNanoCores', CounterValue/1000000, CounterValue), TimeGenerated;
 let MetaDataTable = CachedIdentityTable 
| join kind=leftouter ( LimitsTable ) on Node, InstanceName 
| join kind= leftouter ( startRestart ) on Node, InstanceName 
| project ClusterName, Namespace, ServiceName, ControllerName, Node, Pod, InstanceName, ContainerID, ReadySinceNow, Restarts, LimitsValue, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind, Containers, ContainerName, ContainerInstance, StartRestart, PodStatus, LastPodInventoryTimeGenerated, ClusterId;

 let UsagePerfTable = CachedFilteredPerfTable 
| where CounterName =~ metricUsageCounterName 
| project TimeGenerated, Node, InstanceName, CounterValue = iff(CounterName =~ 'cpuUsageNanoCores', CounterValue/1000000, CounterValue);

 let LastRestartPerfTable = CachedFilteredPerfTable 
| where CounterName =~ 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, *) by Node, InstanceName 
| project Node, InstanceName, UpTime = CounterValue, LastReported = TimeGenerated;

 let AggregationTable = UsagePerfTable 
| summarize Aggregation = max(CounterValue) by Node, InstanceName 
| project Node, InstanceName, Aggregation;

 let TrendTable = UsagePerfTable 
| summarize TrendAggregation = max(CounterValue) by bin(TimeGenerated, trendBinSize), Node, InstanceName 
| project TrendTimeGenerated = TimeGenerated, Node, InstanceName , TrendAggregation 
| summarize TrendList = makelist(pack("timestamp", TrendTimeGenerated, "value", TrendAggregation)) by Node, InstanceName;

 let containerFinalTable = MetaDataTable 
| join kind= leftouter( AggregationTable ) on Node, InstanceName 
| join kind = leftouter (LastRestartPerfTable) on Node, InstanceName 
| order by Aggregation desc, ContainerName 
| join kind = leftouter ( TrendTable) on Node, InstanceName 
| extend ContainerIdentity = strcat(ContainerName, ' ', Pod) 
| project ContainerIdentity, Status, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), Aggregation, Node, Restarts, ReadySinceNow, TrendList = iif(isempty(TrendList), parse_json('[]'), TrendList), LimitsValue, ControllerName, ControllerKind, ContainerID, Containers, UpTimeNow = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(UpTime), make_datetime(1970,1,1))), ContainerInstance, StartRestart, LastReportedDelta = datetime_diff('Millisecond', endDateTime, LastReported), PodStatus, InstanceName, Namespace, LastPodInventoryTimeGenerated, ClusterId;
containerFinalTable 
| limit 200

Lijst met controllers op status

De vereiste tabellen voor deze grafiek bevatten KubePodInventory en Perf.

 let endDateTime = datetime('start time');
 let startDateTime = datetime('end time');
 let trendBinSize = 15m;
 let metricLimitCounterName = 'cpuLimitNanoCores';
 let metricUsageCounterName = 'cpuUsageNanoCores';
 
 let primaryInventory = KubePodInventory 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| extend Node = Computer 
| where ClusterId =~ 'clusterResourceID' //update with resource ID
| project TimeGenerated, ClusterId, ClusterName, Namespace, ServiceName, Node = Computer, ControllerName, Pod = Name, ContainerInstance = ContainerName, ContainerID, InstanceName, PerfJoinKey = strcat(ClusterId, '/', ContainerName), ReadySinceNow = format_timespan(endDateTime - ContainerCreationTimeStamp, 'ddd.hh:mm:ss.fff'), Restarts = ContainerRestartCount, Status = ContainerStatus, ContainerStatusReason = columnifexists('ContainerStatusReason', ''), ControllerKind = ControllerKind, PodStatus, ControllerId = strcat(ClusterId, '/', Namespace, '/', ControllerName);

let podStatusRollup = primaryInventory 
| summarize arg_max(TimeGenerated, *) by Pod 
| project ControllerId, PodStatus, TimeGenerated 
| summarize count() by ControllerId, PodStatus = iif(TimeGenerated < ago(30m), 'Unknown', PodStatus) 
| summarize PodStatusList = makelist(pack('Status', PodStatus, 'Count', count_)) by ControllerId;

let latestContainersByController = primaryInventory 
| where isnotempty(Node) 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| project ControllerId, PerfJoinKey;

let filteredPerformance = Perf 
| where TimeGenerated >= startDateTime 
| where TimeGenerated < endDateTime 
| where ObjectName == 'K8SContainer' 
| where InstanceName startswith 'clusterResourceID' //update with resource ID
| project TimeGenerated, CounterName, CounterValue, InstanceName, Node = Computer ;

let metricByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by PerfJoinKey, CounterName 
| join (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(Value) by ControllerId, CounterName 
| project ControllerId, CounterName, AggregationValue = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value);

let containerCountByController = latestContainersByController 
| summarize ContainerCount = count() by ControllerId;

let restartCountsByController = primaryInventory 
| summarize Restarts = max(Restarts) by ControllerId;

let oldestRestart = primaryInventory 
| summarize ReadySinceNow = min(ReadySinceNow) by ControllerId;

let trendLineByController = filteredPerformance 
| where CounterName =~ metricUsageCounterName 
| extend PerfJoinKey = InstanceName 
| summarize Value = percentile(CounterValue, 95) by bin(TimeGenerated, trendBinSize), PerfJoinKey, CounterName 
| order by TimeGenerated asc 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value=sum(Value) by ControllerId, TimeGenerated, CounterName 
| project TimeGenerated, Value = iff(CounterName =~ 'cpuUsageNanoCores', Value/1000000, Value), ControllerId 
| summarize TrendList = makelist(pack("timestamp", TimeGenerated, "value", Value)) by ControllerId;

let latestLimit = filteredPerformance 
| where CounterName =~ metricLimitCounterName 
| extend PerfJoinKey = InstanceName 
| summarize arg_max(TimeGenerated, *) by PerfJoinKey 
| join kind=leftouter (latestContainersByController) on PerfJoinKey 
| summarize Value = sum(CounterValue) by ControllerId, CounterName 
| project ControllerId, LimitValue = iff(CounterName =~ 'cpuLimitNanoCores', Value/1000000, Value);

let latestTimeGeneratedByController = primaryInventory 
| summarize arg_max(TimeGenerated, *) by ControllerId 
| project ControllerId, LastTimeGenerated = TimeGenerated;

primaryInventory 
| distinct ControllerId, ControllerName, ControllerKind, Namespace 
| join kind=leftouter (podStatusRollup) on ControllerId 
| join kind=leftouter (metricByController) on ControllerId 
| join kind=leftouter (containerCountByController) on ControllerId 
| join kind=leftouter (restartCountsByController) on ControllerId 
| join kind=leftouter (oldestRestart) on ControllerId 
| join kind=leftouter (trendLineByController) on ControllerId 
| join kind=leftouter (latestLimit) on ControllerId 
| join kind=leftouter (latestTimeGeneratedByController) on ControllerId 
| project ControllerId, ControllerName, ControllerKind, PodStatusList, AggregationValue, ContainerCount = iif(isempty(ContainerCount), 0, ContainerCount), Restarts, ReadySinceNow, Node = '-', TrendList, LimitValue, LastTimeGenerated, Namespace 
| limit 250;

Lijst met knooppunten op status

De vereiste tabellen voor deze grafiek zijn KubeNodeInventory, KubePodInventory en Perf.

 let endDateTime = datetime('start time');
 let startDateTime = datetime('end time');
 let binSize = 15m;
 let limitMetricName = 'cpuCapacityNanoCores';
 let usedMetricName = 'cpuUsageNanoCores'; 
 
 let materializedNodeInventory = KubeNodeInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| project ClusterName, ClusterId, Node = Computer, TimeGenerated, Status, NodeName = Computer, NodeId = strcat(ClusterId, '/', Computer), Labels 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let materializedPerf = Perf 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where ObjectName == 'K8SNode' 
| extend NodeId = InstanceName;

 let materializedPodInventory = KubePodInventory 
| where TimeGenerated < endDateTime 
| where TimeGenerated >= startDateTime 
| where isnotempty(ClusterName) 
| where isnotempty(Namespace) 
| where ClusterId =~ 'clusterResourceID'; //update with resource ID

 let inventoryOfCluster = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Status) by ClusterName, ClusterId, NodeName, NodeId;

 let labelsByNode = materializedNodeInventory 
| summarize arg_max(TimeGenerated, Labels) by ClusterName, ClusterId, NodeName, NodeId;

 let countainerCountByNode = materializedPodInventory 
| project ContainerName, NodeId = strcat(ClusterId, '/', Computer) 
| distinct NodeId, ContainerName 
| summarize ContainerCount = count() by NodeId;

 let latestUptime = materializedPerf 
| where CounterName == 'restartTimeEpoch' 
| summarize arg_max(TimeGenerated, CounterValue) by NodeId 
| extend UpTimeMs = datetime_diff('Millisecond', endDateTime, datetime_add('second', toint(CounterValue), make_datetime(1970,1,1))) 
| project NodeId, UpTimeMs;

 let latestLimitOfNodes = materializedPerf 
| where CounterName == limitMetricName 
| summarize CounterValue = max(CounterValue) by NodeId 
| project NodeId, LimitValue = CounterValue;

 let actualUsageAggregated = materializedPerf 
| where CounterName == usedMetricName 
| summarize Aggregation = percentile(CounterValue, 95) by NodeId //This line updates to the desired aggregation
| project NodeId, Aggregation;

 let aggregateTrendsOverTime = materializedPerf 
| where CounterName == usedMetricName 
| summarize TrendAggregation = percentile(CounterValue, 95) by NodeId, bin(TimeGenerated, binSize) //This line updates to the desired aggregation
| project NodeId, TrendAggregation, TrendDateTime = TimeGenerated;

 let unscheduledPods = materializedPodInventory 
| where isempty(Computer) 
| extend Node = Computer 
| where isempty(ContainerStatus) 
| where PodStatus == 'Pending' 
| order by TimeGenerated desc 
| take 1 
| project ClusterName, NodeName = 'unscheduled', LastReceivedDateTime = TimeGenerated, Status = 'unscheduled', ContainerCount = 0, UpTimeMs = '0', Aggregation = '0', LimitValue = '0', ClusterId;

 let scheduledPods = inventoryOfCluster 
| join kind=leftouter (aggregateTrendsOverTime) on NodeId 
| extend TrendPoint = pack("TrendTime", TrendDateTime, "TrendAggregation", TrendAggregation) 
| summarize make_list(TrendPoint) by NodeId, NodeName, Status 
| join kind=leftouter (labelsByNode) on NodeId 
| join kind=leftouter (countainerCountByNode) on NodeId 
| join kind=leftouter (latestUptime) on NodeId 
| join kind=leftouter (latestLimitOfNodes) on NodeId 
| join kind=leftouter (actualUsageAggregated) on NodeId 
| project ClusterName, NodeName, ClusterId, list_TrendPoint, LastReceivedDateTime = TimeGenerated, Status, ContainerCount, UpTimeMs, Aggregation, LimitValue, Labels 
| limit 250;

 union (scheduledPods), (unscheduledPods) 
| project ClusterName, NodeName, LastReceivedDateTime, Status, ContainerCount, UpTimeMs = UpTimeMs_long, Aggregation = Aggregation_real, LimitValue = LimitValue_real, list_TrendPoint, Labels, ClusterId

Metrische prometheus-gegevens

Voor de volgende voorbeelden is de configuratie vereist die wordt beschreven in metrische gegevens van Prometheus verzenden naar de Log Analytics-werkruimte met Container Insights.

Als u metrische prometheus-gegevens wilt weergeven die door Azure Monitor zijn geschraapt en gefilterd op naamruimte, geeft u prometheus op. Hier volgt een voorbeeldquery voor het weergeven van metrische Prometheus-gegevens uit de default Kubernetes-naamruimte.

InsightsMetrics 
| where Namespace contains "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name

Prometheus-gegevens kunnen ook rechtstreeks op naam worden opgevraagd.

InsightsMetrics 
| where Namespace contains "prometheus"
| where Name contains "some_prometheus_metric"

Als u het opnamevolume van elke grootte van metrische gegevens in GB per dag wilt identificeren om te begrijpen of deze hoog is, wordt de volgende query verstrekt.

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize VolumeInGB = (sum(_BilledSize) / (1024 * 1024 * 1024)) by Name
| order by VolumeInGB desc
| render barchart

In de uitvoer worden resultaten weergegeven die vergelijkbaar zijn met het volgende voorbeeld.

Als u wilt schatten wat de grootte van metrische gegevens in GB is voor een maand om te begrijpen of het aantal gegevens dat is opgenomen in de werkruimte hoog is, wordt de volgende query opgegeven.

InsightsMetrics
| where Namespace contains "prometheus"
| where TimeGenerated > ago(24h)
| summarize EstimatedGBPer30dayMonth = (sum(_BilledSize) / (1024 * 1024 * 1024)) * 30 by Name
| order by EstimatedGBPer30dayMonth desc
| render barchart

In de uitvoer worden resultaten weergegeven die vergelijkbaar zijn met het volgende voorbeeld.

Configuratie- of scrapingfouten

Als u configuratie- of scrapingfouten wilt onderzoeken, retourneert de volgende voorbeeldquery informatieve gebeurtenissen uit de KubeMonAgentEvents tabel.

KubeMonAgentEvents | where Level != "Info"

De uitvoer toont resultaten die vergelijkbaar zijn met het volgende voorbeeld:

Veelgestelde vragen

In deze sectie vindt u antwoorden op veelgestelde vragen.

Kan ik metrische gegevens bekijken die zijn verzameld in Grafana?

Container Insights biedt ondersteuning voor het weergeven van metrische gegevens die zijn opgeslagen in uw Log Analytics-werkruimte in Grafana-dashboards. We hebben een sjabloon opgegeven die u kunt downloaden uit de opslagplaats van het Grafana-dashboard. Gebruik deze om aan de slag te gaan en als referentie om te leren hoe u gegevens opvraagt uit uw bewaakte clusters om deze te visualiseren in aangepaste Grafana-dashboards.

Waarom worden logboeklijnen groter dan 16 kB gesplitst in meerdere records in Log Analytics?

De agent gebruikt het stuurprogramma voor logboekregistratie van Docker JSON-bestanden om de stdout en stderr van containers vast te leggen. Met dit logboekstuurprogramma worden logboeklijnen gesplitst die groter zijn dan 16 kB in meerdere regels wanneer ze worden gekopieerd van stdout of stderr naar een bestand. Gebruik logboekregistratie met meerdere regels om de logboekrecordgrootte tot 64 kB op te halen.

Volgende stappen

Container insights bevat geen vooraf gedefinieerde set waarschuwingen. Zie Prestatiewaarschuwingen maken met Container Insights voor meer informatie over het maken van aanbevolen waarschuwingen voor hoog CPU- en geheugengebruik om uw DevOps- of operationele processen en procedures te ondersteunen.

Delen via