Develop U-SQL with Python, R, and C# for Azure Data Lake Analytics in Visual Studio Code

Article
12/20/2023

Learn how to use Visual Studio Code (VS Code) to write Python, R, and C# code-behind files for U-SQL and submit jobs to the Azure Data Lake service. For more information about Azure Data Lake Tools for VS Code, see Use the Azure Data Lake Tools for Visual Studio Code.

Before you write custom code-behind code, you must open a folder or a workspace in VS Code.

Prerequisites for Python and R

Register the Python and R extension assemblies for your ADL account.

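Open your account in the portal.
- Select [Overview].
- Select [Sample Script].
Select [More].
Select [Install U-SQL Extensions].
A confirmation message is displayed after the U-SQL extensions are installed.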

Note

For the best experience with the Python and R language services, install the VS Code Python and R extensions.

Develop a Python file

Select [New File] in your workspace.

Write your code in a U-SQL file. The following is a code sample.

REFERENCE ASSEMBLY [ExtPython];
@t  = 
    SELECT * FROM 
    (VALUES
        ("D1","T1","A1","@foo Hello World @bar"),
        ("D2","T2","A2","@baz Hello World @beer")
    ) AS 
        D( date, time, author, tweet );

@m  =
    REDUCE @t ON date
    PRODUCE date string, mentions string
    USING new Extension.Python.Reducer("pythonSample.usql.py", pyVersion : "3.5.1");

OUTPUT @m
    TO "/tweetmentions.csv"
    USING Outputters.Csv();

Right-click the script file, and then select [ADL: Generate Python Code Behind File].

The xxx.usql.py file is generated in your working folder. Write your code in the Python file. The following is a code sample.

def get_mentions(tweet):
    return ';'.join( ( w[1:] for w in tweet.split() if w[0]=='@' ) )

def usqlml_main(df):
    del df['time']
    del df['author']
    df['mentions'] = df.tweet.apply(get_mentions)
    del df['tweet']
    return df

Right-click the USQL file, and then select [Compile Script] or [Submit Job] to run the job.
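
Before submitting the job, it can help to check the mention-extraction logic locally. The following is a minimal sketch that uses only plain Python and the two tweet values from the U-SQL sample. It restates `get_mentions` with `str.startswith`, which behaves the same as the `w[0]=='@'` test for the tokens that `split()` returns. It does not exercise `usqlml_main`, which receives a data frame from the U-SQL runtime.

```python
def get_mentions(tweet):
    """Return the @-mentions in a tweet as a ';'-separated string, without the '@'."""
    return ';'.join(w[1:] for w in tweet.split() if w.startswith('@'))


# The two tweet values from the U-SQL sample in this section.
sample_tweets = [
    "@foo Hello World @bar",
    "@baz Hello World @beer",
]

mentions = [get_mentions(t) for t in sample_tweets]
print(mentions)  # ['foo;bar', 'baz;beer']
```

A tweet without any `@` token yields an empty string, so the `mentions` column is never missing for a row.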

Develop an R file

Select [New File] in your workspace.

Write your code in a U-SQL file. The following is a code sample.

DEPLOY RESOURCE @"/usqlext/samples/R/my_model_LM_Iris.rda";
DECLARE @IrisData string = @"/usqlext/samples/R/iris.csv";
DECLARE @OutputFilePredictions string = @"/my/R/Output/LMPredictionsIris.txt";
DECLARE @PartitionCount int = 10;

@InputData =
    EXTRACT SepalLength double,
            SepalWidth double,
            PetalLength double,
            PetalWidth double,
            Species string
    FROM @IrisData
    USING Extractors.Csv();

@ExtendedData =
    SELECT Extension.R.RandomNumberGenerator.GetRandomNumber(@PartitionCount) AS Par,
        SepalLength,
        SepalWidth,
        PetalLength,
        PetalWidth
    FROM @InputData;

// Predict Species

@RScriptOutput =
    REDUCE @ExtendedData
    ON Par
    PRODUCE Par,
            fit double,
            lwr double,
            upr double
    READONLY Par
    USING new Extension.R.Reducer(scriptFile : "RClusterRun.usql.R", rReturnType : "dataframe", stringsAsFactors : false);
OUTPUT @RScriptOutput
TO @OutputFilePredictions
USING Outputters.Tsv();

Right-click the USQL file, and then select [ADL: Generate R Code Behind File].

The xxx.usql.r file is generated in your working folder. Write your code in the R file. The following is a code sample.
```
load("my_model_LM_Iris.rda")
outputToUSQL=data.frame(predict(lm.fit, inputFromUSQL, interval="confidence"))
```
Right-click the USQL file, and then select [Compile Script] or [Submit Job] to run the job.
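
The U-SQL sample in this section tags every row with a partition key (`Par`) and then runs the R script once per key through `REDUCE ... ON Par`. The following Python sketch illustrates that pattern only. The function names, the sample rows, and the key range of 1 to `partition_count` are assumptions made for illustration; the actual range returned by `Extension.R.RandomNumberGenerator.GetRandomNumber` is not stated in this article.

```python
import random
from collections import defaultdict


def assign_partitions(rows, partition_count, seed=None):
    """Tag each row with a random partition key (assumed range: 1..partition_count)."""
    rng = random.Random(seed)
    return [(rng.randint(1, partition_count), row) for row in rows]


def reduce_by_partition(tagged_rows, reducer):
    """Group rows by partition key and call the reducer once per group."""
    groups = defaultdict(list)
    for par, row in tagged_rows:
        groups[par].append(row)
    return {par: reducer(group) for par, group in groups.items()}


# Hypothetical rows standing in for the iris measurements.
rows = [{"SepalLength": 5.0 + i * 0.1} for i in range(20)]

tagged = assign_partitions(rows, partition_count=10, seed=0)
# Use len as a stand-in reducer; the real job runs the R script per group.
counts = reduce_by_partition(tagged, reducer=len)
print(sum(counts.values()))  # 20
```

Because each group is processed independently, the script sees only the rows that share its partition key, which is why the model file is deployed as a resource rather than passed in with the data.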

Develop a C# file

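A code-behind file is a C# file associated with a single U-SQL script. In the code-behind file, you can define UDOs, UDAs, UDTs, and UDFs that are dedicated to that script. They can be used directly in the script without registering the assembly first. The code-behind file is placed in the same folder as its paired U-SQL script file. If the script is named xxx.usql, the code-behind file is named xxx.usql.cs. If you manually delete the code-behind file, the code-behind feature is disabled for its associated U-SQL script. For more information about writing custom code for U-SQL scripts, see Writing and Using Custom Code in U-SQL: User-Defined Functions.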
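
The naming rule above can be sketched as a small helper. This is an illustration only: `code_behind_path` and the suffix table are hypothetical names, not part of the VS Code extension. The suffixes follow the file names used in this article (`xxx.usql.cs`, `pythonSample.usql.py`, `RClusterRun.usql.R`).

```python
from pathlib import PurePosixPath

# Suffixes appended to the script name, taken from the file names in this article.
CODE_BEHIND_SUFFIXES = {"csharp": ".cs", "python": ".py", "r": ".R"}


def code_behind_path(script_path, language):
    """Return the code-behind path that sits next to a .usql script."""
    script = PurePosixPath(script_path)
    if script.suffix != ".usql":
        raise ValueError(f"not a U-SQL script: {script_path}")
    # The code-behind file keeps the full script name and lives in the same folder.
    return script.with_name(script.name + CODE_BEHIND_SUFFIXES[language])


print(code_behind_path("work/xxx.usql", "csharp"))  # work/xxx.usql.cs
```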

Select [New File] in your workspace.

Write your code in a U-SQL file. The following is a code sample.

@a =
    EXTRACT
        Iid int,
        Starts DateTime,
        Region string,
        Query string,
        DwellTime int,
        Results string,
        ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

@d =
    SELECT DISTINCT Region 
    FROM @a;

@d1 =
    PROCESS @d
    PRODUCE
        Region string,
        Mkt string
    USING new USQLApplication_codebehind.MyProcessor();

OUTPUT @d1 
    TO @"/output/SearchLogtest.txt" 
    USING Outputters.Tsv();

Right-click the USQL file, and then select [ADL: Generate CS Code Behind File].

The xxx.usql.cs file is generated in your working folder. Write your code in the CS file. The following is a code sample.

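using Microsoft.Analytics.Interfaces;

namespace USQLApplication_codebehind
{
    // User-defined processor referenced from the U-SQL script as
    // USQLApplication_codebehind.MyProcessor.
    [SqlUserDefinedProcessor]
    public class MyProcessor : IProcessor
    {
        public override IRow Process(IRow input, IUpdatableRow output)
        {
            // Copy the input Region value into both output columns (Region and Mkt).
            output.Set(0, input.Get<string>(0));
            output.Set(1, input.Get<string>(0));
            return output.AsReadOnly();
        }
    }
}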

Right-click the USQL file, and then select [Compile Script] or [Submit Job] to run the job.
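
To make the expected job output concrete, the following Python sketch mimics the two steps of the sample script: `SELECT DISTINCT Region`, followed by a processor that copies the first input column into both output columns. The region values are hypothetical stand-ins for the contents of SearchLog.tsv, and `my_processor` is an illustrative name, not part of any U-SQL API.

```python
def my_processor(row):
    """Mimic MyProcessor.Process: copy input column 0 into output columns 0 and 1."""
    region = row[0]
    return (region, region)


# Hypothetical Region values; the real values come from SearchLog.tsv.
regions = ["en-us", "en-gb", "en-us", "fr-fr"]

# SELECT DISTINCT Region (sorted here only to make the printed order stable).
distinct_regions = sorted(set(regions))

output_rows = [my_processor((region,)) for region in distinct_regions]
print(output_rows)
# [('en-gb', 'en-gb'), ('en-us', 'en-us'), ('fr-fr', 'fr-fr')]
```

Each output row therefore carries the same value in Region and Mkt, which matches the two `output.Set` calls in the C# sample.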
