Jaa


Stream Analytics Tips - 配列で送られてきたデータの統計処理

メモ的に。

例えば、

{"timestamp":"2018-07-09T15:15:06","items":[
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":0.27201866510883843,"y":2.9749924982781488},
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":3.3017625395682466,"y":4.79066008925003},
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":3.9475029771903083,"y":2.6145762776092516},
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":0.90333345621048167,"y":1.6385073129267},
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":3.1385963517886566,"y":3.7381303583915022},
{"uid":"097e3730-7f9e-45df-8473-7d029da18d07","tick":636667461063351598,"x":2.3104054305285242,"y":3.1593682305698136}]},
{"timestamp":"2018-07-09T15:16:29","items":[
{"uid":"0d59cbb6-7955-415d-b725-f4155ba048cd","tick":636667461895922282,"x":0.15601237311773578,"y":1.4404155297393051},
{"uid":"0d59cbb6-7955-415d-b725-f4155ba048cd","tick":636667461895922282,"x":1.2746048165832669,"y":4.9212264851300169},
{"uid":"0d59cbb6-7955-415d-b725-f4155ba048cd","tick":636667461895922282,"x":3.1073283139184715,"y":2.2427159395267795},
{"uid":"0d59cbb6-7955-415d-b725-f4155ba048cd","tick":636667461895922282,"x":3.4007428276356046,"y":4.9929921235856565}]}

こんな配列形式でIoT Hubを通じてStream Analyticsにデータが送られてくる(配列の要素は可変)場合の統計処理方法を書いておきます。ちなみに、uid、tickは同一配列内では同じ値でかつ、全般にわたってユニークとします。※実用的にはuid、tickのどちらかあれば十分でござる。

具体的には、配列毎のxの合計値、yの平均を求めます。

クエリーはこんな感じ

WITH expanded AS (
SELECT
msg.IoTHub.ConnectionDeviceId as deviceId,
msg.timestamp,
arrayElement.ArrayIndex,
arrayElement.ArrayValue as item
FROM
[iothub] as msg
CROSS APPLY GetArrayElements(msg.items) AS arrayElement
)
SELECT
    deviceId,
item.tick,
SUM(item.x) as sumOfX,
AVG(item.y) as avgOfY,
Count(item.x) as CountOfItems
INTO [blobout]
FROM expanded
GROUP BY item.tick,deviceId, System.Timestamp

uidを使う場合は、tickをuidに置き換えてください。

最初のWITH句で、配列をばらします。(ばらす=一次正規化されたレコードに変換される)
GROUP BY句ではWindow関数必須と思っている人多いと思いますが、System.Timestampを指定すると、Window関数無しでも統計計算可能という機能が実はあります。

出力は、

{"tick":636667461063351598,"deviceid":"devicearrayemulator","sumofx":13.873619420395055,"avgofy":3.1527057945042412,"countofitems":6},{"tick":636667461895922282,"deviceid":"devicearrayemulator","sumofx":7.9386883312550784,"avgofy":3.3993375194954392,"countofitems":4}

となり、Mission Completed!