Opzioni GROUP BY in Synapse SQL

Articolo
04/29/2024

Synapse SQL consente di sviluppare soluzioni mediante l'implementazione di diverse opzioni GROUP BY.

Che cosa fa GROUP BY

La clausola T-SQL GROUP BY viene usata per aggregare i dati in un set di righe di riepilogo.

Il pool SQL serverless non supporta le opzioni GROUP BY. Il pool SQL dedicato supporta un numero limitato di opzioni GROUP BY.

Opzioni GROUP BY supportate nel pool SQL dedicato

GROUP BY include alcune opzioni che il pool SQL dedicato non supporta. Queste opzioni presentano soluzioni alternative, come indicato di seguito:

GROUP BY con ROLLUP
GROUPING SETS
GROUP BY con CUBE

Opzioni di rollup e raggruppamento di set

L'opzione più semplice consiste nell'usare UNION ALL per eseguire il rollup anziché basarsi sulla sintassi esplicita. Il risultato è esattamente lo stesso

Nell'esempio seguente viene usata l'istruzione GROUP BY con l'opzione ROLLUP:

SELECT [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
,      SUM(SalesAmount)             AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t       ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY ROLLUP (
                        [SalesTerritoryCountry]
                ,       [SalesTerritoryRegion]
                )
;

Quando si usa ROLLUP, sono necessarie le aggregazioni seguenti per l'esempio precedente:

Paese e area geografica
Country
Grand Total

Per sostituire ROLLUP e restituire gli stessi risultati, è possibile usare UNION ALL e specificare in modo esplicito le aggregazioni necessarie:

SELECT [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY
       [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
UNION ALL
SELECT [SalesTerritoryCountry]
,      NULL
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY
       [SalesTerritoryCountry]
UNION ALL
SELECT NULL
,      NULL
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey;

Per sostituire GROUPING SETS, si applica lo stesso principio. È sufficiente creare sezioni UNION ALL per i livelli di aggregazione che si vuole visualizzare.

Opzioni Cube

È possibile creare un'operazione GROUP BY WITH CUBE usando l'approccio UNION ALL. Il problema è che il codice può risultare complesso e difficile da gestire. Per risolvere questo problema è possibile usare questo approccio più avanzato.

Il primo passaggio consiste nel definire il cubo che definisce tutti i livelli di aggregazione che si desidera creare. Prendere nota della funzione CROSS JOIN delle due tabelle derivate perché genera tutti i livelli. Il resto del codice è disponibile per la formattazione.

CREATE TABLE #Cube
WITH
(   DISTRIBUTION = ROUND_ROBIN
,   LOCATION = USER_DB
)
AS
WITH GrpCube AS
(SELECT    CAST(ISNULL(Country,'NULL')+','+ISNULL(Region,'NULL') AS NVARCHAR(50)) as 'Cols'
,          CAST(ISNULL(Country+',','')+ISNULL(Region,'') AS NVARCHAR(50))  as 'GroupBy'
,          ROW_NUMBER() OVER (ORDER BY Country) as 'Seq'
FROM       ( SELECT 'SalesTerritoryCountry' as Country
             UNION ALL
             SELECT NULL
           ) c
CROSS JOIN ( SELECT 'SalesTerritoryRegion' as Region
             UNION ALL
             SELECT NULL
           ) r
)
SELECT Cols
,      CASE WHEN SUBSTRING(GroupBy,LEN(GroupBy),1) = ','
            THEN SUBSTRING(GroupBy,1,LEN(GroupBy)-1)
            ELSE GroupBy
       END AS GroupBy  --Remove Trailing Comma
,Seq
FROM GrpCube;

L'immagine seguente illustra i risultati dell'operazione CREATE TABLE AS SELECT:

Group by cube

Il secondo passaggio consiste nel specificare una tabella di destinazione per archiviare i risultati provvisori:

DECLARE
 @SQL NVARCHAR(4000)
,@Columns NVARCHAR(4000)
,@GroupBy NVARCHAR(4000)
,@i INT = 1
,@nbr INT = 0
;
CREATE TABLE #Results
(
 [SalesTerritoryCountry] NVARCHAR(50)
,[SalesTerritoryRegion]  NVARCHAR(50)
,[TotalSalesAmount]      MONEY
)
WITH
(   DISTRIBUTION = ROUND_ROBIN
,   LOCATION = USER_DB
)
;

Il terzo passaggio prevede l'esecuzione del ciclo sul cubo delle colonne per eseguire l'aggregazione. La query verrà eseguita una volta per ogni riga nella tabella temporanea #Cube. I risultati vengono archiviati nella tabella temporanea #Results:

SET @nbr =(SELECT MAX(Seq) FROM #Cube);

WHILE @i<=@nbr
BEGIN
    SET @Columns = (SELECT Cols    FROM #Cube where seq = @i);
    SET @GroupBy = (SELECT GroupBy FROM #Cube where seq = @i);

    SET @SQL ='INSERT INTO #Results
              SELECT '+@Columns+'
              ,      SUM(SalesAmount) AS TotalSalesAmount
              FROM  dbo.factInternetSales s
              JOIN  dbo.DimSalesTerritory t  
              ON s.SalesTerritoryKey = t.SalesTerritoryKey
              '+CASE WHEN @GroupBy <>''
                     THEN 'GROUP BY '+@GroupBy ELSE '' END

    EXEC sp_executesql @SQL;
    SET @i +=1;
END

Infine è possibile restituire i risultati leggendoli dalla tabella temporanea #Results:

SELECT *
FROM #Results
ORDER BY 1,2,3
;

Suddividendo il codice in sezioni e generando un costrutto di ciclo il codice diventa più gestibile e di facile manutenzione.

Passaggi successivi

Per altri suggerimenti sullo sviluppo, vedere la panoramica dello sviluppo.

Condividi tramite