AzureDLFileSystem Class

Access Azure Data Lake Store as if it were a file system.

Inheritance
builtins.object
AzureDLFileSystem

Constructor

AzureDLFileSystem(token_credential=None, **kwargs)

Parameters

Name Description
store_name
Required
str ("")

Store name to connect to. If not supplied, the environment variable azure_data_lake_store_name is used.

token_credential
credentials object

When setting up a new connection, this contains the authorization credentials. Use Azure Identity to get this or define an implementation of azure.core.credentials.TokenCredential

Default value: None
scopes
Required

A list of scopes to use for the token.

url_suffix
Required

Domain to send REST requests to. The end-point URL is constructed using this and the store_name. If None, use default.

api_version
Required
str (2018-09-01)

The API version to target with requests. Changing this value changes the behavior of the requests and can cause unexpected behavior or breaking changes, so change it with caution.

per_call_timeout_seconds
Required
float (60)

Timeout, in seconds, for each call made with the requests library.

kwargs
Required
optional key/values

Other arguments forwarded to the DatalakeRESTInterface constructor.
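
As a rough illustration of the constructor parameters, the sketch below builds a file system object from a store name and a token credential. The helper names resolve_store_name and make_filesystem are hypothetical, DefaultAzureCredential is only one possible TokenCredential, and the azure-identity and azure-datalake-store packages are assumed to be installed. The library performs its own fallback to the azure_data_lake_store_name environment variable; the first helper only makes that lookup order visible and can run without either package.

```python
import os


def resolve_store_name(store_name=None):
    """Hypothetical helper: explicit name first, then the documented env var."""
    return store_name or os.environ.get("azure_data_lake_store_name")


def make_filesystem(store_name=None, credential=None):
    """Hypothetical helper that builds an AzureDLFileSystem."""
    # Deferred imports: both packages are assumed to be installed.
    from azure.identity import DefaultAzureCredential
    from azure.datalake.store.core import AzureDLFileSystem

    if credential is None:
        credential = DefaultAzureCredential()
    return AzureDLFileSystem(
        token_credential=credential,
        store_name=resolve_store_name(store_name),
    )
```

Calling make_filesystem("mystore") would then return an object on which the methods listed below can be used.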

Methods

access

Does such a file/directory exist?

cat

Return contents of file

chmod

Change access mode of path

Note this is not recursive.

chown

Change owner and/or owning group

Note this is not recursive.

concat

Concatenate a list of files into one new file

connect

Establish connection object.

cp

Not implemented. Copy file between locations on ADL

current

Return the most recently created AzureDLFileSystem

df

Resource summary of path

du

Bytes in keys at path

exists

Does such a file/directory exist?

get

Stream data from file at path to local filename

get_acl_status

Gets Access Control List (ACL) entries for the specified file or directory.

glob

Find files (not directories) by glob-matching.

head

Return first bytes of file

info

File information for path

invalidate_cache

Remove entry from object file-cache

listdir

List all elements under directory specified with path

ls

List all elements under directory specified with path

merge

Concatenate a list of files into one new file

mkdir

Make new directory

modify_acl_entries

Modify existing Access Control List (ACL) entries on a file or folder. If an entry does not exist it is added; otherwise it is updated based on the spec passed in. No entries are removed by this process (unlike set_acl).

Note: this is by default not recursive, and applies only to the file or folder specified.

mv

Move file between locations on ADL

open

Open a file for reading or writing

put

Stream data from local filename to file at path

read_block

Read a block of bytes from an ADL file

Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.

If offset+length is beyond the eof, reads to eof.

remove

Remove a file or directory

remove_acl

Remove the entire non-default ACL from the file or folder, including unnamed entries. Default entries cannot be removed this way; use remove_default_acl for that.

Note: this is not recursive, and applies only to the file or folder specified.

remove_acl_entries

Remove existing named Access Control List (ACL) entries on a file or folder. If an entry does not already exist it is ignored. Default entries cannot be removed this way; use remove_default_acl for that. Unnamed entries cannot be removed this way either; use remove_acl for that.

Note: this is by default not recursive, and applies only to the file or folder specified.

remove_default_acl

Remove the entire default ACL from the folder. Default entries do not exist on files; if a file is specified, this operation does nothing.

Note: this is not recursive, and applies only to the folder specified.

rename

Move file between locations on ADL

rm

Remove a file or directory

rmdir

Remove empty directory

set_acl

Set the Access Control List (ACL) for a file or folder.

Note: this is by default not recursive, and applies only to the file or folder specified.

set_expiry

Set or remove the expiration time on the specified file. This operation can only be executed against files.

Note: Folders are not supported.

stat

File information for path

tail

Return last bytes of file

touch

Create empty file

unlink

Remove a file or directory

walk

Get all files below given path

access

Does such a file/directory exist?

access(path, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

invalidate_cache

Whether to invalidate cache

Default value: True

Returns

Type Description
True or False, depending on whether the path exists.

cat

Return contents of file

cat(path)

Parameters

Name Description
path
Required

Path to query

Returns

Type Description
Contents of file

chmod

Change access mode of path

Note this is not recursive.

chmod(path, mod)

Parameters

Name Description
path
Required
str

Location to change

mod
Required
str

Octal representation of access, e.g., "0777" for public read/write. See docs
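
The mod argument is an octal string, which can be error-prone to write by hand. As a minimal sketch, the hypothetical helper octal_mode below (not part of the library) converts three rwx-style triplets for owner, group, and other into a string of that form.

```python
def octal_mode(owner, group, other):
    """Convert three 'rwx'-style triplets into a string such as '0750'."""

    def digit(triplet):
        if len(triplet) != 3:
            raise ValueError("expected a 3-character triplet such as 'r-x'")
        value = 0
        for char, expected, bit in zip(triplet, "rwx", (4, 2, 1)):
            if char == expected:
                value += bit
            elif char != "-":
                raise ValueError(f"unexpected character {char!r} in {triplet!r}")
        return str(value)

    return "0" + digit(owner) + digit(group) + digit(other)


print(octal_mode("rwx", "r-x", "---"))  # 0750

# With an AzureDLFileSystem instance named adl (assumed to exist):
# adl.chmod('data/file.csv', octal_mode("rwx", "r-x", "---"))
```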

chown

Change owner and/or owning group

Note this is not recursive.

chown(path, owner=None, group=None)

Parameters

Name Description
path
Required
str

Location to change

owner
str

UUID of owning entity

Default value: None
group
str

UUID of group

Default value: None

concat

Concatenate a list of files into one new file

concat(outfile, filelist, delete_source=False)

Parameters

Name Description
outfile
Required
path

The file which will be concatenated to. If it already exists, the extra pieces will be appended.

filelist
Required
list of paths

Existing ADL files to concatenate, in order

delete_source

If True, assume that the paths to concatenate exist alone in a directory, and delete that whole directory when done.

Default value: False

Returns

Type Description

connect

Establish connection object.

connect()

cp

Not implemented. Copy file between locations on ADL

cp(path1, path2)

Parameters

Name Description
path1
Required
path2
Required

current

Return the most recently created AzureDLFileSystem

current()

df

Resource summary of path

df(path)

Parameters

Name Description
path
Required
str

Path to query

du

Bytes in keys at path

du(path, total=False, deep=False, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

total

Return the sum of the sizes instead of the list

Default value: False
deep

Recursively enumerate or just use files under current dir

Default value: False
invalidate_cache

Whether to invalidate cache

Default value: True

Returns

Type Description
List of dict of name:size pairs, or total size.

exists

Does such a file/directory exist?

exists(path, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

invalidate_cache

Whether to invalidate cache

Default value: True

Returns

Type Description
True or False, depending on whether the path exists.

get

Stream data from file at path to local filename

get(path, filename)

Parameters

Name Description
path
Required

ADL Path to read

filename
Required
str or Path

Local file path to write to

Returns

Type Description

get_acl_status

Gets Access Control List (ACL) entries for the specified file or directory.

get_acl_status(path)

Parameters

Name Description
path
Required
str

Location to get the ACL.

glob

Find files (not directories) by glob-matching.

glob(path, details=False, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

details

Whether to include file details

Default value: False
invalidate_cache

Whether to invalidate cache

Default value: True

Returns

Type Description
List of files

head

Return first bytes of file

head(path, size=1024)

Parameters

Name Description
path
Required

Path to query

size
int

How many bytes to return

Default value: 1024

Returns

Type Description
First (size) bytes of file

info

File information for path

info(path, invalidate_cache=True, expected_error_code=None)

Parameters

Name Description
path
Required

Path to query

invalidate_cache

Whether to invalidate cache or not

Default value: True
expected_error_code
int

Optionally indicates a specific, expected error code, if any.

Default value: None

Returns

Type Description
File information

invalidate_cache

Remove entry from object file-cache

invalidate_cache(path=None)

Parameters

Name Description
path

Remove the path from object file-cache

Default value: None

Returns

Type Description

listdir

List all elements under directory specified with path

listdir(path='', detail=False, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

detail

Detailed info or not.

Default value: False
invalidate_cache

Whether to invalidate cache or not

Default value: True

Returns

Type Description
List of elements under directory specified with path

ls

List all elements under directory specified with path

ls(path='', detail=False, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

detail

Detailed info or not.

Default value: False
invalidate_cache

Whether to invalidate cache or not

Default value: True

Returns

Type Description
List of elements under directory specified with path

merge

Concatenate a list of files into one new file

merge(outfile, filelist, delete_source=False)

Parameters

Name Description
outfile
Required
path

The file which will be concatenated to. If it already exists, the extra pieces will be appended.

filelist
Required
list of paths

Existing ADL files to concatenate, in order

delete_source

If True, assume that the paths to concatenate exist alone in a directory, and delete that whole directory when done.

Default value: False

Returns

Type Description

mkdir

Make new directory

mkdir(path)

Parameters

Name Description
path
Required

Path to create directory

Returns

Type Description

modify_acl_entries

Modify existing Access Control List (ACL) entries on a file or folder. If an entry does not exist it is added; otherwise it is updated based on the spec passed in. No entries are removed by this process (unlike set_acl).

Note: this is by default not recursive, and applies only to the file or folder specified.

modify_acl_entries(path, acl_spec, recursive=False, number_of_sub_process=None)

Parameters

Name Description
path
Required
str

Location to set the ACL entries on.

acl_spec
Required
str

The ACL specification to use in modifying the ACL at the path in the format '[default:]user|group|other:[entity id or UPN]:r|-w|-x|-,[default:]user|group|other:[entity id or UPN]:r|-w|-x|-,...'

recursive

Specifies whether to modify ACLs recursively or not

Default value: False
number_of_sub_process
Default value: None
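
The acl_spec string is a comma-separated list of entries, each with an optional default: prefix, an entry kind, an optional entity, and a permission triplet. The sketch below assembles such a string from tuples. The helper build_acl_spec is hypothetical, not part of the library, and the entity in the example is a placeholder.

```python
def build_acl_spec(entries):
    """Join (scope, kind, entity, perms) tuples into an acl_spec string.

    scope is '' or 'default'; kind is 'user', 'group' or 'other';
    entity is an entity id or UPN ('' for unnamed entries);
    perms is a triplet such as 'r-x'.
    """
    parts = []
    for scope, kind, entity, perms in entries:
        if scope not in ("", "default"):
            raise ValueError(f"unknown scope {scope!r}")
        if kind not in ("user", "group", "other"):
            raise ValueError(f"unknown entry kind {kind!r}")
        valid = len(perms) == 3 and all(
            char in (expected, "-") for char, expected in zip(perms, "rwx")
        )
        if not valid:
            raise ValueError(f"invalid permission triplet {perms!r}")
        prefix = "default:" if scope == "default" else ""
        parts.append(f"{prefix}{kind}:{entity}:{perms}")
    return ",".join(parts)


spec = build_acl_spec([
    ("", "user", "alice@contoso.com", "rwx"),  # placeholder UPN
    ("default", "group", "", "r-x"),           # unnamed default group entry
])
print(spec)  # user:alice@contoso.com:rwx,default:group::r-x

# With an AzureDLFileSystem instance named adl (assumed to exist):
# adl.modify_acl_entries('data', acl_spec=spec)
```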

mv

Move file between locations on ADL

mv(path1, path2)

Parameters

Name Description
path1
Required

Source Path

path2
Required

Destination path

Returns

Type Description

open

Open a file for reading or writing

open(path, mode='rb', blocksize=33554432, delimiter=None)

Parameters

Name Description
path
Required
string

Path of file on ADL

mode
string

One of 'rb', 'ab' or 'wb'

Default value: rb
blocksize
int

Size of data-node blocks if reading

Default value: 33554432
delimiter
byte(s) or None

For writing delimiter-ended blocks

Default value: None
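
The three modes can be exercised with ordinary file-object calls. The sketch below assumes only that the object passed in has an ADL-style open(path, mode) whose result works as a context manager with write() and read(). InMemoryFS is a hypothetical stand-in so the example runs without an Azure account; it is not part of the library and does not model block sizes or delimiters.

```python
import io


class _Handle(io.BytesIO):
    """In-memory file handle that saves its bytes back to a dict on close."""

    def __init__(self, store, path, mode):
        # 'wb' starts empty; 'rb' and 'ab' start from the stored bytes.
        initial = b"" if mode == "wb" else store.get(path, b"")
        super().__init__(initial)
        if mode == "ab":
            self.seek(0, io.SEEK_END)
        self._store, self._path, self._mode = store, path, mode

    def close(self):
        if not self.closed and self._mode in ("wb", "ab"):
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFS:
    """Hypothetical stand-in exposing only open(path, mode)."""

    def __init__(self):
        self.files = {}

    def open(self, path, mode="rb"):
        if mode not in ("rb", "ab", "wb"):
            raise ValueError("mode must be one of 'rb', 'ab' or 'wb'")
        return _Handle(self.files, path, mode)


def append_record(fs, path, record):
    """Append one newline-terminated record using mode 'ab'."""
    with fs.open(path, "ab") as handle:
        handle.write(record + b"\n")


def read_all(fs, path):
    """Read the whole file using mode 'rb'."""
    with fs.open(path, "rb") as handle:
        return handle.read()


fs = InMemoryFS()  # swap in an AzureDLFileSystem instance for real use
with fs.open("demo.txt", "wb") as handle:
    handle.write(b"Alice, 100\n")
append_record(fs, "demo.txt", b"Bob, 200")
print(read_all(fs, "demo.txt"))  # b'Alice, 100\nBob, 200\n'
```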

put

Stream data from local filename to file at path

put(filename, path, delimiter=None)

Parameters

Name Description
filename
Required
str or Path

Local file path to read from

path
Required

ADL Path to write to

delimiter

Optional delimiter for delimiter-ended blocks

Default value: None

Returns

Type Description

read_block

Read a block of bytes from an ADL file

Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.

If offset+length is beyond the eof, reads to eof.

read_block(fn, offset, length, delimiter=None)

Parameters

Name Description
fn
Required
string

Path to filename on ADL

offset
Required
int

Byte offset to start read

length
Required
int

Number of bytes to read

delimiter
bytes (optional)

Ensure reading starts and stops at delimiter bytestring

Default value: None

Examples


>>> adl.read_block('data/file.csv', 0, 13)
b'Alice, 100\nBo'
>>> adl.read_block('data/file.csv', 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'

Use length=None to read to the end of the file.

>>> adl.read_block('data/file.csv', 0, None, delimiter=b'\n')
b'Alice, 100\nBob, 200\nCharlie, 300'

See also

distributed.utils.read_block
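
To make the delimiter rule concrete, here is a simplified model of it over an in-memory bytes object. The function read_block_bytes is hypothetical and is not the library's implementation; it only follows the description above: the start moves to just after the first delimiter found at or after offset (unless offset is zero), and the end moves to just after the first delimiter found at or after offset + length.

```python
def read_block_bytes(data, offset, length, delimiter=None):
    """Simplified model of a delimiter-aligned block read over bytes."""
    size = len(data)
    start = offset
    end = size if length is None else min(offset + length, size)
    if delimiter:
        if offset > 0:
            # Skip the partial record: start after the next delimiter.
            found = data.find(delimiter, offset)
            start = size if found == -1 else found + len(delimiter)
        # Finish the last record: end after the next delimiter, or at eof.
        found = data.find(delimiter, end)
        end = size if found == -1 else found + len(delimiter)
    return data[start:end]


data = b'Alice, 100\nBob, 200\nCharlie, 300'
print(read_block_bytes(data, 0, 13))                   # b'Alice, 100\nBo'
print(read_block_bytes(data, 0, 13, delimiter=b'\n'))  # b'Alice, 100\nBob, 200\n'
print(read_block_bytes(data, 13, 13, delimiter=b'\n')) # b'Charlie, 300'
```

In this model, consecutive blocks read with the same delimiter neither split nor repeat a record, which is the property that makes delimiter-aligned reads useful for processing a file in parallel chunks.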

remove

Remove a file or directory

remove(path, recursive=False)

Parameters

Name Description
path
Required

The location to remove.

recursive

Whether to also remove all entries below the path, i.e., those returned by walk().

Default value: False

Returns

Type Description

remove_acl

Remove the entire non-default ACL from the file or folder, including unnamed entries. Default entries cannot be removed this way; use remove_default_acl for that.

Note: this is not recursive, and applies only to the file or folder specified.

remove_acl(path)

Parameters

Name Description
path
Required
str

Location to remove the ACL.

remove_acl_entries

Remove existing named Access Control List (ACL) entries on a file or folder. If an entry does not already exist it is ignored. Default entries cannot be removed this way; use remove_default_acl for that. Unnamed entries cannot be removed this way either; use remove_acl for that.

Note: this is by default not recursive, and applies only to the file or folder specified.

remove_acl_entries(path, acl_spec, recursive=False, number_of_sub_process=None)

Parameters

Name Description
path
Required
str

Location to remove the ACL entries.

acl_spec
Required
str

The ACL specification to remove from the ACL at the path in the format (note that the permission portion is missing) '[default:]user|group|other:[entity id or UPN],[default:]user|group|other:[entity id or UPN],...'

recursive

Specifies whether to remove ACLs recursively or not

Default value: False
number_of_sub_process
Default value: None
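
Because the removal spec is the same format minus the permission field, one way to produce it is to derive it from a full spec that was used earlier. The helper strip_permissions below is a hypothetical sketch, not part of the library; it assumes every entry in its input ends with a permission triplet, and the entities shown are placeholders.

```python
def strip_permissions(acl_spec):
    """Drop the trailing permission field from each entry of a full ACL spec."""
    stripped = []
    for entry in acl_spec.split(","):
        fields = entry.split(":")
        # A full entry has at least kind, entity and permissions.
        if len(fields) < 3:
            raise ValueError(f"entry {entry!r} has no permission field")
        stripped.append(":".join(fields[:-1]))
    return ",".join(stripped)


full_spec = "user:alice@contoso.com:rwx,default:group:1234:r-x"  # placeholders
print(strip_permissions(full_spec))  # user:alice@contoso.com,default:group:1234

# With an AzureDLFileSystem instance named adl (assumed to exist):
# adl.remove_acl_entries('data', acl_spec=strip_permissions(full_spec))
```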

remove_default_acl

Remove the entire default ACL from the folder. Default entries do not exist on files; if a file is specified, this operation does nothing.

Note: this is not recursive, and applies only to the folder specified.

remove_default_acl(path)

Parameters

Name Description
path
Required
str

Location to remove the default ACL from.

rename

Move file between locations on ADL

rename(path1, path2)

Parameters

Name Description
path1
Required

Source Path

path2
Required

Destination path

Returns

Type Description

rm

Remove a file or directory

rm(path, recursive=False)

Parameters

Name Description
path
Required

The location to remove.

recursive

Whether to also remove all entries below the path, i.e., those returned by walk().

Default value: False

Returns

Type Description

rmdir

Remove empty directory

rmdir(path)

Parameters

Name Description
path
Required

Directory path to remove

Returns

Type Description

set_acl

Set the Access Control List (ACL) for a file or folder.

Note: this is by default not recursive, and applies only to the file or folder specified.

set_acl(path, acl_spec, recursive=False, number_of_sub_process=None)

Parameters

Name Description
path
Required
str

Location to set the ACL on.

acl_spec
Required
str

The ACL specification to set on the path in the format '[default:]user|group|other:[entity id or UPN]:r|-w|-x|-,[default:]user|group|other:[entity id or UPN]:r|-w|-x|-,...'

recursive

Specifies whether to set ACLs recursively or not

Default value: False
number_of_sub_process
Default value: None

set_expiry

Set or remove the expiration time on the specified file. This operation can only be executed against files.

Note: Folders are not supported.

set_expiry(path, expiry_option, expire_time=None)

Parameters

Name Description
path
Required
str

File path to set or remove expiration time

expire_time
int

The time that the file will expire, corresponding to the expiry_option that was set

Default value: None
expiry_option
Required
str

Indicates the type of expiration to use for the file:

  1. NeverExpire: ExpireTime is ignored.

  2. RelativeToNow: ExpireTime is an integer in milliseconds representing the expiration date relative to when file expiration is updated.

  3. RelativeToCreationDate: ExpireTime is an integer in milliseconds representing the expiration date relative to file creation.

  4. Absolute: ExpireTime is an integer in milliseconds, as a Unix timestamp relative to 1/1/1970 00:00:00.
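
All of the expiry options above take expire_time as an integer number of milliseconds, so a small conversion step is usually needed. The helpers absolute_expiry_ms and relative_expiry_ms below are hypothetical, not part of the library; they are a minimal sketch using only the standard library.

```python
from datetime import datetime, timedelta, timezone


def absolute_expiry_ms(when):
    """Milliseconds since 1970-01-01 00:00:00 UTC, for the Absolute option."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (when - epoch) // timedelta(milliseconds=1)


def relative_expiry_ms(delta):
    """Milliseconds in a timedelta, for the RelativeToNow and
    RelativeToCreationDate options."""
    return delta // timedelta(milliseconds=1)


print(relative_expiry_ms(timedelta(days=7)))  # 604800000
print(absolute_expiry_ms(datetime(1970, 1, 2, tzinfo=timezone.utc)))  # 86400000

# With an AzureDLFileSystem instance named adl (assumed to exist):
# adl.set_expiry('data/file.csv', 'RelativeToNow',
#                expire_time=relative_expiry_ms(timedelta(days=7)))
```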

stat

File information for path

stat(path, invalidate_cache=True, expected_error_code=None)

Parameters

Name Description
path
Required

Path to query

invalidate_cache

Whether to invalidate cache or not

Default value: True
expected_error_code
int

Optionally indicates a specific, expected error code, if any.

Default value: None

Returns

Type Description
File information

tail

Return last bytes of file

tail(path, size=1024)

Parameters

Name Description
path
Required

Path to query

size
int

How many bytes to return

Default value: 1024

Returns

Type Description
Last (size) bytes of file

touch

Create empty file

touch(path)

Parameters

Name Description
path
Required

Path of file to create

Returns

Type Description

unlink

Remove a file or directory

unlink(path, recursive=False)

Parameters

Name Description
path
Required

The location to remove.

recursive

Whether to also remove all entries below the path, i.e., those returned by walk().

Default value: False

Returns

Type Description

walk

Get all files below given path

walk(path='', details=False, invalidate_cache=True)

Parameters

Name Description
path
Required

Path to query

details

Whether to include file details

Default value: False
invalidate_cache

Whether to invalidate cache

Default value: True

Returns

Type Description
List of files