Transform data by using the Script activity in Azure Data Factory or Synapse Analytics

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

You use data transformation activities in a Data Factory or Synapse pipeline to transform and process raw data into predictions and insights. The Script activity is one of the transformation activities that pipelines support. This article builds on the transform data article, which presents a general overview of data transformation and the supported transformation activities.

Using the script activity, you can execute common operations with Data Manipulation Language (DML), and Data Definition Language (DDL). DML statements like INSERT, UPDATE, DELETE and SELECT let users insert, modify, delete, and retrieve data in the database. DDL statements like CREATE, ALTER and DROP allow a database manager to create, modify, and remove database objects such as tables, indexes, and users.

You can use the Script activity to invoke a SQL script in one of the following data stores in your enterprise or on an Azure virtual machine (VM):

  • Azure SQL Database
  • Azure Synapse Analytics
  • SQL Server Database. If you are using SQL Server, install Self-hosted integration runtime on the same machine that hosts the database or on a separate machine that has access to the database. Self-Hosted integration runtime is a component that connects data sources on-premises/on Azure VM with cloud services in a secure and managed way. See the Self-hosted integration runtime article for details.
  • Oracle
  • Snowflake

The script can contain either a single SQL statement or multiple SQL statements that run sequentially. You can use the Script task for the following purposes:

  • Truncate a table in preparation for inserting data.
  • Create, alter, and drop database objects such as tables and views.
  • Re-create fact and dimension tables before loading data into them.
  • Run stored procedures. If the SQL statement invokes a stored procedure that returns results from a temporary table, use the WITH RESULT SETS option to define metadata for the result set.
  • Save the rowset returned from a query as activity output for downstream consumption.

Syntax details

Here is the JSON format for defining a Script activity:

{ 
   "name": "<activity name>", 
   "type": "Script", 
   "linkedServiceName": { 
      "referenceName": "<name>", 
      "type": "LinkedServiceReference" 
    }, 
   "typeProperties": { 
      "scripts" : [ 
         { 
            "text": "<Script Block>", 
            "type": "<Query> or <NonQuery>", 
            "parameters":[ 
               { 
                  "name": "<name>", 
                  "value": "<value>", 
                  "type": "<type>", 
                  "direction": "<Input> or <Output> or <InputOutput>", 
                  "size": 256 
               }, 
               ... 
            ] 
         }, 
         ... 
      ],     
         ... 
         ] 
      }, 
      "scriptBlockExecutionTimeout": "<time>",  
      "logSettings": { 
         "logDestination": "<ActivityOutput> or <ExternalStore>", 
         "logLocationSettings":{ 
            "linkedServiceName":{ 
               "referenceName": "<name>", 
               "type": "<LinkedServiceReference>" 
            }, 
            "path": "<folder path>" 
         } 
      } 
    } 
} 

The following table describes these JSON properties:

Property name Description Required
name The name of the activity. Yes
type The type of the activity, set to "Script". Yes
typeProperties Specify properties to configure the Script Activity. Yes
linkedServiceName The target database the script runs on. It should be a reference to a linked service. Yes
scripts An array of objects to represent the script. No
scripts.text The plain text of a block of queries. No
scripts.type The type of the block of queries. It can be Query or NonQuery. Default: Query. No
scripts.parameter The array of parameters of the script. No
scripts.parameter.name The name of the parameter. No
scripts.parameter.value The value of the parameter. No
scripts.parameter.type The data type of the parameter. The type is logical type and follows type mapping of each connector. No
scripts.parameter.direction The direction of the parameter. It can be Input, Output, InputOutput. The value is ignored if the direction is Output. ReturnValue type isn't supported. Set the return value of SP to an output parameter to retrieve it. No
scripts.parameter.size The max size of the parameter. Only applies to Output/InputOutput direction parameter of type string/byte[]. No
scriptBlockExecutionTimeout The wait time for the script block execution operation to complete before it times out. No
logSettings The settings to store the output logs. If not specified, script log is disabled. No
logSettings.logDestination The destination of log output. It can be ActivityOutput or ExternalStore. Default: ActivityOutput. No
logSettings.logLocationSettings The settings of the target location if logDestination is ExternalStore. No
logSettings.logLocationSettings.linkedServiceName The linked service of the target location. Only blob storage is supported. No
logSettings.logLocationSettings.path The folder path under which to store logs. No

Activity output

Sample output:

{ 
    "resultSetCount": 2, 
    "resultSets": [ 
        { 
            "rowCount": 10, 
            "rows":[ 
                { 
                    "<columnName1>": "<value1>", 
                    "<columnName2>": "<value2>", 
                    ... 
                } 
            ] 
        }, 
        ... 
    ], 
    "recordsAffected": 123, 
    "outputParameters":{ 
        "<parameterName1>": "<value1>", 
        "<parameterName2>": "<value2>" 
    }, 
    "outputLogs": "<logs>", 
    "outputLogsLocation": "<folder path>", 
    "outputTruncated": true, 
    ... 
} 
Property name Description Condition
resultSetCount The count of result sets returned by the script. Always
resultSets The array which contains all the result sets. Always
resultSets.rowCount Total rows in the result set. Always
resultSets.rows The array of rows in the result set. Always
recordsAffected The row count of affected rows by the script. If scriptType is NonQuery
outputParameters The output parameters of the script. If parameter type is Output or InputOutput.
outputLogs The logs written by the script, for example, print statement. If connector supports log statement and enableScriptLogs is true and logLocationSettings isn't provided.
outputLogsPath The full path of the log file. If enableScriptLogs is true and logLocationSettings is provided.
outputTruncated Indicator of whether the output exceeds the limits and get truncated. If output exceeds the limits.

Note

  • The output is collected every time a script block is executed. The final output is the merged result of all script block outputs. The output parameter with same name in different script block will get overwritten.
  • Since the output has size / rows limitation, the output will be truncated in following order: logs -> parameters -> rows. Note, this applies to a single script block, which means the output rows of next script block won’t evict previous logs.
  • Any error caused by log won’t fail the activity.
  • For consuming activity output resultSets in down stream activity please refer to the Lookup activity result documentation.
  • Use outputLogs when you are using 'PRINT' statements for logging purpose. If query returns resultSets, it will be available in the activity output and will be limited to 5000 rows/ 4MB size limit.

Configure the Script activity using UI

Inline script

Screenshot showing the UI to configure an inline script.

Inline scripts integrate well with Pipeline CI/CD since the script is stored as part of the pipeline metadata.

Logging

Screenshot showing the UI for the logging settings for a script.

Logging options:

  • Disable - No execution output is logged.
  • Activity output - The script execution output is appended to the activity output. Downstream activities can then consume it. The output size is limited to 4 MB.
  • External storage - Persists output to storage. Use this option if the output size is greater than 2 MB or you would like to explicitly persist the output on your storage account.

Note

Billing - The Script activity will be billed as Pipeline activities.

See the following articles that explain how to transform data in other ways: