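reduce operator

Article
11/23/2024

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel

Groups a set of strings together based on value similarity.

For each such group, the operator returns a `pattern`, a `count`, and a `representative`. The `pattern` best describes the group, with the `*` character representing a wildcard. The `count` is the number of values in the group, and the `representative` is one of the original values in the group.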

Syntax

T | reduce [kind = ReduceKind] by Expr [with [threshold = Threshold] [, characters = Characters]]

Learn more about syntax conventions.

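Parameters

Name	Type	Required	Description
Expr	`string`	✔️	The value by which to reduce.
Threshold	`real`		A value between 0 and 1 that determines the minimum fraction of rows required to match the grouping criteria in order to trigger a reduction operation. The default value is 0.1. We recommend setting a small threshold value for large inputs. With a smaller threshold value, more similar values are grouped together, resulting in fewer but more similar groups. A larger threshold value requires less similarity, resulting in more groups that are less similar. See Examples.
Characters	`string`		A list of characters that separate terms. The default is every non-ascii numeric character. For examples, see Behavior of Characters parameter.
ReduceKind	`string`		The only valid value is `source`. If `source` is specified, the operator appends the `Pattern` column to the existing rows in the table instead of aggregating by `Pattern`.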
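
The `kind=source` option described above can be sketched as follows. This is a minimal illustration, not an excerpt from a real workload: the `datatable` contents and the `Message` column name are hypothetical, and the threshold value is chosen arbitrarily.

```kusto
// Hypothetical sample data; the column name and values are illustrative only.
datatable(Message: string)
[
    "Connection to host-01 failed",
    "Connection to host-02 failed",
    "Connection to host-03 failed",
    "Disk quota exceeded"
]
// With kind=source, the input rows are kept and a Pattern column is appended,
// rather than the rows being aggregated into one row per pattern.
| reduce kind=source by Message with threshold=0.5
```

Keeping the source rows is useful when other columns in the input need to be retained alongside the pattern each row was assigned.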

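Returns

A table with as many rows as there are groups, and with the columns `pattern`, `count`, and `representative`. The `pattern` best describes the group, with the `*` character representing a wildcard, or a placeholder for an arbitrary insertion string. The `count` is the number of values in the group, and the `representative` is one of the original values in the group.

For example, the result of `reduce by city` might include: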

Pattern	Count	Representative
San *	5182	San Bernard
Saint *	2846	Saint Lucy
Moscow	3726	Moscow
* -on- *	2730	One -on- One
Paris	2716	Paris
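
A query producing a result of this shape might look like the sketch below. The `datatable` contents are hypothetical, the threshold is arbitrary, and the `sort` step assumes the output columns are named `Pattern`, `Count`, and `Representative` as shown in the table headers above.

```kusto
// Hypothetical sample data; a real input would typically have many more rows.
datatable(city: string)
[
    "San Bernard", "San Diego", "San Jose",
    "Saint Lucy", "Saint Paul",
    "Moscow", "Paris"
]
| reduce by city with threshold=0.2
// Assumes the output columns are named Pattern, Count, and Representative.
| sort by Count desc
```

Sorting by `Count` puts the most common patterns first, which is often the most convenient order for reviewing the groups.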

Examples

Small threshold value


range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.001 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	1000	MachineLearningX4

Large threshold value


range x from 1 to 1000 step 1
| project MyText = strcat("MachineLearningX", tostring(toint(rand(10))))
| reduce by MyText  with threshold=0.9 , characters = "X"

Output

Pattern	Count	Representative
MachineLearning*	177	MachineLearningX9
MachineLearning*	102	MachineLearningX0
MachineLearning*	106	MachineLearningX1
MachineLearning*	96	MachineLearningX6
MachineLearning*	110	MachineLearningX4
MachineLearning*	100	MachineLearningX3
MachineLearning*	99	MachineLearningX8
MachineLearning*	104	MachineLearningX7
MachineLearning*	106	MachineLearningX2

Behavior of Characters parameter

If the Characters parameter is unspecified, then every non-ascii numeric character becomes a term separator.


range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str

Output

Pattern	Count	Representative
others	10	

However, if you specify that "Z" is a separator, then it's as if each value in `str` consists of two terms, `foo` and `tostring(x)`:


range x from 1 to 10 step 1 | project str = strcat("foo", "Z", tostring(x)) | reduce by str with characters="Z"

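Output

Pattern	Count	Representative
foo*	10	fooZ1

Apply `reduce` to sanitized input

The following example shows how one might apply the `reduce` operator to a "sanitized" input, in which GUIDs in the column being reduced are replaced before reducing: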

// Start with a few records from the Trace table.
Trace | take 10000
// We will reduce the Text column which includes random GUIDs.
// As random GUIDs interfere with the reduce operation, replace them all
// by the string "GUID".
| extend Text=replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
// Now perform the reduce. In case there are other "quasi-random" identifiers with embedded '-'
// or '_' characters in them, treat these as non-term-breakers.
| reduce by Text with characters="-_"
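
Because the `Trace` table may not exist in every environment, the sanitizing step can also be tried in isolation on inline data. The sketch below uses a hypothetical `datatable` with made-up GUIDs and applies the same `replace_regex` call as above; each GUID should be replaced by the literal string `GUID`.

```kusto
// Hypothetical sample rows; the GUIDs are made up for illustration.
datatable(Text: string)
[
    "Request 3f2504e0-4f89-11d3-9a0c-0305e82c3301 completed",
    "Request 9b2d1c4a-7e6f-4a1b-8c3d-5e6f7a8b9c0d completed",
    "Request 0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d timed out"
]
// Replace every GUID-shaped substring with the literal string "GUID".
| extend Text = replace_regex(Text, @"[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}", @"GUID")
```

Once the output looks as expected, the `reduce by Text with characters="-_"` step from the example above can be appended to the query.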

Related content

autocluster

Note

The implementation of the `reduce` operator is largely based on the paper A Data Clustering Algorithm for Mining Patterns From Event Logs, by Risto Vaarandi.
