Using Custom Python Libraries with U-SQL
The U-SQL/Python extensions for Azure Data Lake Analytics ship with the standard Python libraries and include pandas and numpy. We've been getting a lot of questions about how to use custom libraries, and the answer turns out to be simple.
Introducing zipimport
PEP 273 (zipimport) gave Python's import statement the ability to import modules from ZIP files. Take a moment to review the zipimport documentation before we proceed.
Here are the basics:
- Create a module (a .py file, etc.)
- ZIP up the module into a .zip file
- Add the full path of the zip file to sys.path
- Import the module
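The four steps above can be sketched as one self-contained script. It writes a demo module into a temporary folder, zips it, deletes the loose .py file so the import can only succeed through zipimport, and then imports it. The temporary folder is an assumption made so the sketch doesn't touch your own files; it isn't part of the workflow itself.

```python
import os
import sys
import tempfile
import zipfile

# Scratch folder so the demo doesn't touch anything in the current directory.
workdir = tempfile.mkdtemp()

# Step 1: create a module (a plain .py file).
module_path = os.path.join(workdir, "mymodule.py")
with open(module_path, "w") as f:
    f.write('hello_world = "Hello World! This is code from a custom module"\n')

# Step 2: ZIP up the module, storing it at the root of the archive.
zip_path = os.path.join(workdir, "mycustommodules.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.write(module_path, arcname="mymodule.py")

# Delete the loose .py so the import below can only be satisfied by the ZIP.
os.remove(module_path)

# Step 3: add the full path of the ZIP file to sys.path.
sys.path.insert(0, zip_path)

# Step 4: import the module; zipimport looks it up inside the archive.
import mymodule

print(mymodule.hello_world)
```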
Build and test a simple zipped package
Before you try this with U-SQL, first master the mechanics of zipimport on your own box.
Create a file called mymodule.py with the following contents:
# demo module
hello_world = "Hello World! This is code from a custom module"
This module defines a single variable called hello_world.
Create a zip file called mycustommodules.zip that contains mymodule.py at the root (not inside a subfolder).
- In Windows, right-click mymodule.py and select Send to > Compressed (zipped) folder
- This will create a file called mymodule.zip
- Rename mymodule.zip to mycustommodules.zip
- NOTE: zipimport doesn't require this rename, but it highlights that the ZIP file's name doesn't have to match the name of the module inside it
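If you're not on Windows, or want to script this step, Python's standard zipfile module can build the same archive. The sketch below uses a hypothetical helper, build_module_zip, which stores each file under its base name so it lands at the archive root, then returns namelist() so you can confirm nothing is nested inside a folder. The demo runs in a tempfile scratch folder (an illustrative choice) and recreates mymodule.py there.

```python
import os
import tempfile
import zipfile


def build_module_zip(zip_path, module_files):
    """Hypothetical helper: write each module file to the root of a new ZIP
    archive and return the archive's member names so the layout can be checked."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in module_files:
            # arcname=basename drops any folder components, so the module sits
            # at the top level of the archive where zipimport looks for it.
            zf.write(path, arcname=os.path.basename(path))
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


# Demo in a scratch folder: recreate mymodule.py, then zip it.
demo_dir = tempfile.mkdtemp()
module_file = os.path.join(demo_dir, "mymodule.py")
with open(module_file, "w") as f:
    f.write("# demo module\n")
    f.write('hello_world = "Hello World! This is code from a custom module"\n')

names = build_module_zip(os.path.join(demo_dir, "mycustommodules.zip"), [module_file])
print(names)  # ['mymodule.py']
```

If namelist() shows something like 'somefolder/mymodule.py' instead, the module isn't at the root and import mymodule will fail.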
Create a test.py Python file in the same folder as mycustommodules.zip.
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule
print(mymodule.hello_world)
Your folder should contain:
- test.py
- mycustommodules.zip
Now run the test.py program
python test.py
The output should look like this:
Hello World! This is code from a custom module
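The 'mycustommodules.zip' entry that test.py adds to sys.path is a relative path, so Python resolves it against the current working directory, not the folder test.py lives in; running the script from another folder raises an ImportError. A slightly more defensive preamble could look like the sketch below. add_zip_to_path is a hypothetical helper, not part of any library, and it avoids f-strings in case the Python version in use predates them.

```python
import os
import sys


def add_zip_to_path(zip_name, base_dir=None):
    """Hypothetical helper: put a ZIP archive at the front of sys.path.

    The archive is resolved against base_dir (default: the current working
    directory). A missing archive raises immediately instead of surfacing
    later as a less obvious ImportError.
    """
    base_dir = base_dir or os.getcwd()
    zip_path = os.path.abspath(os.path.join(base_dir, zip_name))
    if not os.path.isfile(zip_path):
        raise FileNotFoundError("{} not found in {}".format(zip_name, base_dir))
    # Avoid stacking duplicate entries if the preamble runs more than once.
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return zip_path
```

For example, test.py could call add_zip_to_path("mycustommodules.zip", os.path.dirname(os.path.abspath(__file__))) before import mymodule. Printing mymodule.__file__ afterwards shows a path that runs through mycustommodules.zip, which confirms the module was loaded by zipimport.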
Deploying Custom Python Modules with U-SQL
First, upload the mycustommodules.zip file to your ADLS store. In this case we upload it to the root of the default ADLS account for the ADLA account we are using, so its path is "/mycustommodules.zip".
Now, run this U-SQL script:
REFERENCE ASSEMBLY [ExtPython];
DEPLOY RESOURCE @"/mycustommodules.zip";
// mymodule.py is inside the mycustommodules.zip file
DECLARE @myScript = @"
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule
def usqlml_main(df):
    del df['number']
    df['hello_world'] = str(mymodule.hello_world)
    return df
";
@rows =
SELECT * FROM (VALUES (1)) AS D(number);
@rows =
REDUCE @rows ON number
PRODUCE hello_world string
USING new Extension.Python.Reducer(pyScript:@myScript);
OUTPUT @rows
TO "/demo_python_custom_module.csv"
USING Outputters.Csv(outputHeader: true);
The script writes /demo_python_custom_module.csv: a header row (hello_world) followed by a single row containing "Hello World! This is code from a custom module".
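Mistakes in @myScript otherwise only show up after a job is submitted, so it can be worth dry-running the Python part locally first. This sketch assumes pandas is installed locally, that DEPLOY RESOURCE leaves mycustommodules.zip in the working directory of the Python process (which is what the script's relative sys.path entry relies on), and that the reducer passes usqlml_main a pandas DataFrame whose columns match @rows. MY_SCRIPT is a copy of the @myScript text; the os.chdir call is a stand-in for the deployment step and changes the working directory of the whole process.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Copy of the Python text the U-SQL script passes as @myScript.
MY_SCRIPT = """
import sys
sys.path.insert(0, 'mycustommodules.zip')
import mymodule
def usqlml_main(df):
    del df['number']
    df['hello_world'] = str(mymodule.hello_world)
    return df
"""

# Stand-in for DEPLOY RESOURCE (assumption): place the ZIP in the working directory.
workdir = tempfile.mkdtemp()
os.chdir(workdir)
with zipfile.ZipFile("mycustommodules.zip", "w") as zf:
    zf.writestr("mymodule.py",
                'hello_world = "Hello World! This is code from a custom module"\n')

# Execute the script text, then call usqlml_main on a DataFrame shaped like
# @rows (a single column named "number").
namespace = {}
exec(MY_SCRIPT, namespace)
result = namespace["usqlml_main"](pd.DataFrame({"number": [1]}))
print(result)
```

If this prints a one-row DataFrame with a hello_world column, the import path and the function body are at least consistent with what the U-SQL job expects.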
Comments
- Anonymous, June 28, 2017
  This is very helpful!
- Anonymous, June 28, 2017
  Great article! A follow-up question: If I wanted to use a custom library such as tensorflow that uses a different version of numpy than the one pre-installed with the U-SQL Python extension, what would be the best way to do this?