U-SQL Using APPLY with an Applier UDO
Summary
U-SQL’s user-defined operators (UDOs) provide the ability to write your own custom Applier in C#. The following provides the syntax of the applier expression as you call it in an APPLY.
Syntax
Applier_Expression := ['USING'] applier_type_expression Derived_Table_Alias_With_Types [Readonly_Clause] [Required_Clause].
Remarks
applier_type_expression
The applier typed expression creates an Applier UDO and in addition to the input row data, takes the derived table schema definition as its input as well as the provided clauses. Unlike EXPLODE (or in general rowset expressions) an Applier uses the UDO programming model to get access to the incoming row and to return zero to many rows as a result. If the expression is not resulting in an instance of an IApplier, an error is raised. The result will be typed as specified in the derived table schema.For more information on the UDO programming model see U-SQL Programmability Guide: User-Defined Applier.
Derived_Table_Alias_With_Types
Because the UDO model requires knowledge of the column data types, the derived table schema specification requires the column data types to be specified.
Syntax
Derived_Table_Alias_With_Types := 'AS' Quoted_or_Unquoted_Identifier '(' Column_Definition_List ')'.
Readonly_Clause
The optional READONLY clause can help the UDO programmer to write more efficient code. The optional READONLY clause specifies that either all columns (if specified with *) or the specified columns are read only for the Applier and will be passed through to the output using either the same name or the specified column name in parenthesis.
Syntax
Readonly_Clause := 'READONLY' Star_Or_Readonly_Column_List.
Star_Or_Readonly_Column_List := '*' | Readonly_Column_List.
Readonly_Column_List := Readonly_Column { ',' Readonly_Column }.
Readonly_Column := Column_Identifier [Output_Column_Dependency_Alias].
Output_Column_Dependency_Alias := '(' Quoted_or_Unquoted_Identifier ')'.
Required_Clause
The optionalREQUIRED
clause can help the UDO programmer to write more efficient code. The optionalREQUIRED
clause specifies that either all columns are required on input for the Applier (if specified with *) or the specified columns are required. If a specified column is followed by a list of columns in parenthesis, then the input column is only required if the columns in that list are referenced from the output.
Syntax
Required_Clause := 'REQUIRED' Star_Or_Required_Column_List.
Star_Or_Required_Column_List := '*' | Required_Column_List.
Required_Column_List := Required_Column { ',' Required_Column}.
Required_Column := Column_Identifier [Required_Output_Column_Dependency_List].
Required_Output_Column_Dependency_List := '(' Identifier_List ')'.
Examples
- The examples can be executed in Visual Studio with the Azure Data Lake Tools plug-in.
- The scripts can be executed locally. An Azure subscription and Azure Data Lake Analytics account is not needed when executed locally.
- For simplicity, the example(s) with user-defined code make use of Code-Behind for assembly management. The main advantage of Code-Behind is that the tooling will register the assembly file and add the REFERENCE ASSEMBLY statement automatically. To use Assembly registration instead of Code-Behind, see Using Assemblies: Code-Behind vs. Assembly Registration Walkthrough.
ParserApplier
c# code is placed in the associated Code-Behind .cs file.
using Microsoft.Analytics.Interfaces;
using Microsoft.Analytics.Types.Sql;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
namespace ReferenceGuide_Examples
{
[SqlUserDefinedApplier]
public class ParserApplier : IApplier
{
private string parsingPart;
public ParserApplier(string parsingPart)
{
if (parsingPart.ToUpper().Contains("ALL")
|| parsingPart.ToUpper().Contains("FORMALNAME")
|| parsingPart.ToUpper().Contains("ISOALPHA3CODE")
|| parsingPart.ToUpper().Contains("LATESTRECORDEDPOPULATION")
|| parsingPart.ToUpper().Contains("REGION")
|| parsingPart.ToUpper().Contains("SUBREGION")
)
{
this.parsingPart = parsingPart;
}
else
{
throw new ArgumentException("Incorrect parameter. Please use: 'ALL[FORMALNAME|ISOALPHA3CODE|LATESTRECORDEDPOPULATION|REGION|SUBREGION]'");
}
}
public override IEnumerable<IRow> Apply(IRow input, IUpdatableRow output)
{
string[] properties = input.Get<string>("Properties").Split(',');
// only process with correct number of properties
if (properties.Count() == 5)
{
string FormalName = properties[0];
string IsoAlpha3Code = properties[1];
int LatestRecordedPopulation = -1;
string Region = properties[3];
string Subregion = properties[4];
// Only return LatestRecordedPopulation if it is number, otherwise, -1
if (!int.TryParse(properties[2], out LatestRecordedPopulation))
{
LatestRecordedPopulation = -1;
}
if (parsingPart.ToUpper().Contains("FORMALNAME") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("FormalName", FormalName);
if (parsingPart.ToUpper().Contains("ISOALPHA3CODE") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("IsoAlpha3Code", IsoAlpha3Code);
if (parsingPart.ToUpper().Contains("LATESTRECORDEDPOPULATION") || parsingPart.ToUpper().Contains("ALL")) output.Set<int>("LatestRecordedPopulation", LatestRecordedPopulation);
if (parsingPart.ToUpper().Equals("REGION") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("Region", Region);
if (parsingPart.ToUpper().Contains("SUBREGION") || parsingPart.ToUpper().Contains("ALL")) output.Set<string>("Subregion", Subregion);
}
yield return output.AsReadOnly();
}
}
}
Using ParserApplier
The user-defined Applier acts as a comma-delimited value parser for the country properties. Using Code-Behind from previous section, above.
@countries =
SELECT * FROM
( VALUES
(1, "Afghanistan", "Islamic State of Afghanistan,AFG,28400000,Asia,Southern Asia"),
(3, "Albania", "Republic of Albania,ALB,3785031,Europe,Southern Europe"),
(4, "Algeria", "People's Democratic Republic of Algeria,DZA,34178188,Africa,Northern Africa"),
(6, "Andorra", "Principality of Andorra,AND,87243,Europe,Southern Europe"),
(7, "Angola", "People's Republic of Angola,AGO,12799293,Africa,Middle Africa"),
(10, "Antigua and Barb.", "Antigua and Barbuda,ATG,85632,Americas,Caribbean"),
(11, "Argentina", "Argentine Republic,ARG,42550127,Americas,South America"),
(12, "Armenia", "Republic of Armenia,ARM,2967004,Asia,Western Asia"),
(15, "Australia", "Commonwealth of Australia,AUS,22997671,Oceania,Australia and New Zealand"),
(16, "Austria", "Republic of Austria,AUT,2,Europe,Western Europe")
) AS T(CountryID, CountryName, Properties);
@result =
SELECT
c.CountryID,
c.CountryName,
Properties.FormalName,
Properties.IsoAlpha3Code,
Properties.LatestRecordedPopulation,
Properties.Region,
Properties.Subregion
FROM @countries AS c
CROSS APPLY
new ReferenceGuide_Examples.ParserApplier ("all") AS Properties(
FormalName string,
IsoAlpha3Code string,
LatestRecordedPopulation int,
Region string,
Subregion string);
OUTPUT @result
TO "/Output/ReferenceGuide/StatementsAndExpressions/Appplier/exampleA.txt"
USING Outputters.Text();
U-SQL's CROSS/OUTER APPLY with VALUES
The VALUES clause allows to apply each row of the constructed rowset to be correlated to each row in the rowset source.
@bands =
SELECT *
FROM (VALUES ("Beatles", "George Harrison, John Lennon, Paul McCartney, Ringo Starr"),
("Creedence Clearwater Revival", "Doug Clifford, John Fogerty, Stu Cook, Tom Fogerty"),
("Eagles", "Don Henley, Glenn Frey, Joe Walsh, Timothy Schmit"),
("Pink Floyd", "David Gilmour, Nick Mason, Richard Wright, Roger Watters, Syd Barrett"),
("Rolling Stones", "Charlie Watts, Keith Richards, Mick Jagger, Ronnie Wood")) AS Bands(name, members);
@ca_val1 = SELECT * FROM @bands CROSS APPLY VALUES (1) AS T(x);
@ca_val2 = SELECT * FROM @bands CROSS APPLY VALUES (1),(name.Length) AS T(x);
@ca_val3 = SELECT * FROM @bands CROSS APPLY VALUES (1,name.Length,3) AS T(x,y,z);
OUTPUT @ca_val1
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value1.csv"
USING Outputters.Csv();
OUTPUT @ca_val2
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value2.csv"
USING Outputters.Csv();
OUTPUT @ca_val3
TO "/output/ReferenceGuide/DML/Appplier/crossapply_value3.csv"
USING Outputters.Csv();
See Also
- Query Statements and Expressions (U-SQL)
- FROM Clause (U-SQL)
- SELECT Expression (U-SQL)
- U-SQL SELECT Selecting from CROSS APPLY and OUTER APPLY
- Output Statement (U-SQL)
- U-SQL Programmability Guide: User-Defined Applier
- Extending U-SQL Expressions with User-Code
- How to register U-SQL Assemblies in your U-SQL Catalog