DataLakeFileClient Class

A client to interact with the DataLake file, even if the file may not yet exist.

Inheritance
azure.storage.filedatalake._path_client.PathClient
DataLakeFileClient

Constructor

DataLakeFileClient(account_url: str, file_system_name: str, file_path: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any)

Parameters

Name Description
account_url
Required
str

The URI to the storage account.

file_system_name
Required
str

The file system for the directory or files.

file_path
Required
str

The full path of the file to interact with, e.g. "{directory}/{subdirectory}/{file}".

credential

The credentials with which to authenticate. This is optional if the account URL already has a SAS token. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredential class from azure.identity. If the resource URI already contains a SAS token, it is ignored in favor of an explicit credential, except in the case of AzureSasCredential, where the conflicting SAS tokens raise a ValueError. If using an instance of AzureNamedKeyCredential, "name" should be the storage account name, and "key" should be the storage account key.
Default value: None

Keyword-Only Parameters

Name Description
api_version
str

The Storage API version to use for requests. Default value is the most recent service version that is compatible with the current SDK. Setting to an older version may result in reduced feature compatibility.

audience
str

The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://.blob.core.windows.net.

Examples

Creating the DataLakeFileClient from a connection string.


   from azure.storage.filedatalake import DataLakeFileClient
   DataLakeFileClient.from_connection_string(connection_string, "myfilesystem", "mydirectory/myfile")

Variables

Name Description
url
str

The full endpoint URL to the file, including SAS token if used.

primary_endpoint
str

The full primary endpoint URL.

primary_hostname
str

The hostname of the primary endpoint.

Methods

acquire_lease

Requests a new lease. If the file or directory does not have an active lease, the DataLake service creates a lease on the file/directory and returns a new lease ID.

append_data

Append data to the file.

close

Closes the sockets opened by the client. It need not be called when the client is used as a context manager.

create_file

Create a new file.

delete_file

Marks the specified file for deletion.

download_file

Downloads a file to the StorageStreamDownloader. The readall() method must be used to read all the content, or readinto() must be used to download the file into a stream. Using chunks() returns an iterator which allows the user to iterate over the content in chunks.

exists

Returns True if a file exists and returns False otherwise.

flush_data

Commit the previous appended data.

from_connection_string

Create DataLakeFileClient from a Connection String.

get_access_control

get_file_properties

Returns all user-defined metadata, standard HTTP properties, and system properties for the file. It does not return the content of the file.

query_file

Enables users to select/project on datalake file data by providing simple query expressions. This operation returns a DataLakeFileQueryReader; use readall() or readinto() to get the query data.

remove_access_control_recursive

Removes the Access Control on a path and sub-paths.

rename_file

Rename the source file.

set_access_control

Set the owner, group, permissions, or access control list for a path.

set_access_control_recursive

Sets the Access Control on a path and sub-paths.

set_file_expiry

Sets the time a file will expire and be deleted.

set_http_headers

Sets system properties on the file or directory.

If one property is set for the content_settings, all properties will be overridden.

set_metadata

Sets one or more user-defined name-value pairs for the specified file. Each call to this operation replaces all existing metadata attached to the file. To remove all metadata from the file, call this operation with no metadata dict.

update_access_control_recursive

Modifies the Access Control on a path and sub-paths.

upload_data

Upload data to a file.

acquire_lease

Requests a new lease. If the file or directory does not have an active lease, the DataLake service creates a lease on the file/directory and returns a new lease ID.

acquire_lease(lease_duration: int | None = -1, lease_id: str | None = None, **kwargs) -> DataLakeLeaseClient

Parameters

Name Description
lease_duration
int

Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

lease_id
str

Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

Keyword-Only Parameters

Name Description
if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description

A DataLakeLeaseClient object that can be used as a context manager.
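
The lease_duration rule described above (either -1 for an infinite lease, or a value between 15 and 60 seconds) can be checked locally before a request is sent. The following is a minimal sketch; the helper name validate_lease_duration is hypothetical and is not part of the SDK.

```python
def validate_lease_duration(lease_duration: int = -1) -> int:
    """Return lease_duration if it is -1 (infinite) or within 15..60 seconds.

    Hypothetical helper: it only mirrors the documented rule and does not
    call the DataLake service.
    """
    if lease_duration == -1:
        return lease_duration
    if 15 <= lease_duration <= 60:
        return lease_duration
    raise ValueError(
        f"lease_duration must be -1 or between 15 and 60 seconds, got {lease_duration}"
    )
```

A validated value could then be passed as the lease_duration argument of acquire_lease.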

append_data

Append data to the file.

append_data(data: bytes | str | Iterable[AnyStr] | IO[AnyStr], offset: int, length: int | None = None, **kwargs) -> Dict[str, str | datetime | int]

Parameters

Name Description
data
Required

Content to be appended to file

offset
Required
int

Start position in the file at which the data is appended.

length
int or None

Size of the data in bytes.

Keyword-Only Parameters

Name Description
flush

If true, will commit the data after it is appended.

validate_content

If true, calculates an MD5 hash of the block content. The storage service checks the hash of the content that has arrived with the hash that was sent. This is primarily valuable for detecting bitflips on the wire if using http instead of https as https (the default) will already validate. Note that this MD5 hash is not stored with the file.

lease_action
Literal["acquire", "auto-renew", "release", "acquire-release"]

Used to perform lease operations along with appending data.

"acquire" - Acquire a lease.
"auto-renew" - Renew an existing lease.
"release" - Release the lease once the operation is complete. Requires flush=True.
"acquire-release" - Acquire a lease and release it once the operation is complete. Requires flush=True.

lease_duration
int

Valid if lease_action is set to "acquire" or "acquire-release".

Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

lease

Required if the file has an active lease or if lease_action is set to "acquire" or "acquire-release". If the file has an existing lease, this will be used to access the file. If acquiring a new lease, this will be used as the new lease id. Value can be a DataLakeLeaseClient object or the lease ID as a string.

cpk

Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

Returns

Type Description

A dict of the response headers.

Examples

Append data to the file.


   file_client.append_data(data=file_content[2048:3072], offset=2048, length=1024)
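
The example above appends the third 1024-byte slice of a buffer at offset 2048. The offsets and lengths for a whole sequence of appends, plus the offset to pass to flush_data afterwards, follow from simple arithmetic. A minimal sketch, assuming fixed-size chunks; plan_appends is a hypothetical helper, not an SDK function.

```python
from typing import List, Tuple


def plan_appends(total_length: int, chunk_size: int) -> Tuple[List[Tuple[int, int]], int]:
    """Split total_length bytes into (offset, length) pairs.

    Returns the pairs and the final offset, which equals the length of the
    file once all appended data has been committed.
    """
    if total_length < 0 or chunk_size <= 0:
        raise ValueError("total_length must be >= 0 and chunk_size must be > 0")
    chunks: List[Tuple[int, int]] = []
    offset = 0
    while offset < total_length:
        length = min(chunk_size, total_length - offset)
        chunks.append((offset, length))
        offset += length
    return chunks, offset


chunks, flush_offset = plan_appends(3072, 1024)
```

Each (offset, length) pair corresponds to one append_data call on the matching slice of the content, and flush_offset is the value to pass to flush_data.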

close

Closes the sockets opened by the client. It need not be called when the client is used as a context manager.

close() -> None

create_file

Create a new file.

create_file(content_settings: ContentSettings | None = None, metadata: Dict[str, str] | None = None, **kwargs) -> Dict[str, str | datetime]

Parameters

Name Description
content_settings

ContentSettings object used to set path properties.

metadata

Name-value pairs associated with the file as metadata.

Keyword-Only Parameters

Name Description
lease

Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

umask
str

Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ^u, where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

owner
str

The owner of the file or directory.

group
str

The owning group of the file or directory.

acl
str

Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format "[scope:][type]:[id]:[permissions]".

lease_id
str

Proposed lease ID, in a GUID string format. The DataLake service returns 400 (Invalid request) if the proposed lease ID is not in the correct format.

lease_duration
int

Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change.

expires_on

The time at which the file expires. If expires_on is an int, the expiration time is set as the number of milliseconds elapsed from creation time. If expires_on is a datetime, the expiration time is set to the absolute time provided. If no time zone info is provided, the value is interpreted as UTC.

permissions
str

Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

encryption_context
str

Specifies the encryption context to set on the file.

Returns

Type Description
dict[str, str | datetime]

Response dict (ETag and last modified).

Examples

Create file.


   file_client = filesystem_client.get_file_client(file_name)
   file_client.create_file()
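
The umask rule described for create_file (resulting permission = p & ^u) can be reproduced with ordinary bit arithmetic. A minimal sketch; apply_umask is a hypothetical helper used only to illustrate the calculation, and the result is formatted in the 4-digit octal notation the parameter expects.

```python
def apply_umask(permission: int, umask: int) -> str:
    """Return permission & ~umask as a 4-digit octal string."""
    # Mask to 12 bits so the bitwise complement stays within the octal range.
    result = permission & ~umask & 0o7777
    return format(result, "04o")


# The worked example from the umask description: p = 0777, u = 0057.
example = apply_umask(0o777, 0o057)
# Default file permission (0666) combined with the default umask (0027).
default_file = apply_umask(0o666, 0o027)
```

With the documented defaults, a new file therefore ends up with 0640 and a new directory with 0750, provided the parent folder has no default ACL.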

delete_file

Marks the specified file for deletion.

delete_file(**kwargs) -> None

Keyword-Only Parameters

Name Description
lease

Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description

None.

Examples

Delete file.


   new_client.delete_file()
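
The if_modified_since and if_unmodified_since parameters used by this and other methods treat a datetime without timezone info as UTC and convert timezone-aware values to UTC. A minimal sketch of that rule in plain Python; to_utc is a hypothetical helper that mirrors the documented behavior and is not the SDK's internal code.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value: datetime) -> datetime:
    """Normalize a datetime the way the parameter descriptions state."""
    if value.tzinfo is None or value.utcoffset() is None:
        # Naive datetime: assumed to already be UTC.
        return value.replace(tzinfo=timezone.utc)
    # Aware datetime: converted to UTC.
    return value.astimezone(timezone.utc)


naive = to_utc(datetime(2024, 1, 1, 12, 0))
aware = to_utc(datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2))))
```

Passing timezone-aware datetimes explicitly avoids any ambiguity about which clock a condition refers to.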

download_file

Downloads a file to the StorageStreamDownloader. The readall() method must be used to read all the content, or readinto() must be used to download the file into a stream. Using chunks() returns an iterator which allows the user to iterate over the content in chunks.

download_file(offset: int | None = None, length: int | None = None, **kwargs: Any) -> StorageStreamDownloader

Parameters

Name Description
offset
int

Start of byte range to use for downloading a section of the file. Must be set if length is provided.

length
int

Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.
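
The relationship between offset and length can be sketched as a small range calculation. This is an illustration only: resolve_range is a hypothetical helper, and the truncation of a range that runs past the end of the file is an assumption made for the sketch rather than documented service behavior.

```python
from typing import Optional, Tuple


def resolve_range(
    file_size: int, offset: Optional[int] = None, length: Optional[int] = None
) -> Tuple[int, int]:
    """Return (start, end_exclusive) for a download request."""
    if length is not None and offset is None:
        # Documented rule: offset must be set if length is provided.
        raise ValueError("offset must be set if length is provided")
    start = 0 if offset is None else offset
    if length is None:
        return start, file_size
    # Assumption: a range past the end of the file is truncated.
    return start, min(start + length, file_size)
```

For a 4096-byte file, resolve_range(4096, 1024, 512) describes bytes 1024 up to (but not including) 1536.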

Keyword-Only Parameters

Name Description
lease

If specified, download only succeeds if the file's lease is active and matches this ID. Required if the file has an active lease.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a Customer-Provided Key.

max_concurrency
int

Maximum number of parallel connections to use when transferring the file in chunks. This option does not affect the underlying connection pool, and may require a separate configuration of the connection pool.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately. This method may make multiple calls to the service and the timeout will apply to each call individually.

Returns

Type Description

A streaming object (StorageStreamDownloader)

Examples

Return the downloaded data.


   download = file_client.download_file()
   downloaded_bytes = download.readall()

exists

Returns True if a file exists and returns False otherwise.

exists(**kwargs: Any) -> bool

Keyword-Only Parameters

Name Description
timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description

True if a file exists, otherwise returns False.

flush_data

Commit the previous appended data.

flush_data(offset: int, retain_uncommitted_data: bool | None = False, **kwargs) -> Dict[str, str | datetime]

Parameters

Name Description
offset
Required
int

The length of the file after the previously appended data is committed.

retain_uncommitted_data

Valid only for flush operations. If "true", uncommitted data is retained after the flush operation completes; otherwise, the uncommitted data is deleted after the flush operation. The default is false. Data at offsets less than the specified position are written to the file when flush succeeds, but this optional parameter allows data after the flush position to be retained for a future flush operation.

Keyword-Only Parameters

Name Description
content_settings

ContentSettings object used to set path properties.

close

Azure Storage Events allow applications to receive notifications when files change. When Azure Storage Events are enabled, a file changed event is raised. This event has a property indicating whether this is the final change, to distinguish between an intermediate flush to a file stream and the final close of a file stream. The close query parameter is valid only when the action is "flush" and change notifications are enabled. If the value of close is "true" and the flush operation completes successfully, the service raises a file change notification with a property indicating that this is the final update (the file stream has been closed). If "false", a change notification is raised indicating the file has changed. The default is false. This query parameter is set to true by the Hadoop ABFS driver to indicate that the file stream has been closed.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

lease_action
Literal["acquire", "auto-renew", "release", "acquire-release"]

Used to perform lease operations along with appending data.

"acquire" - Acquire a lease.
"auto-renew" - Renew an existing lease.
"release" - Release the lease once the operation is complete.
"acquire-release" - Acquire a lease and release it once the operation is complete.

lease_duration
int

Valid if lease_action is set to "acquire" or "acquire-release".

Specifies the duration of the lease, in seconds, or negative one (-1) for a lease that never expires. A non-infinite lease can be between 15 and 60 seconds. A lease duration cannot be changed using renew or change. Default is -1 (infinite lease).

lease

Required if the file has an active lease or if lease_action is set to "acquire" or "acquire-release". If the file has an existing lease, this will be used to access the file. If acquiring a new lease, this will be used as the new lease id. Value can be a DataLakeLeaseClient object or the lease ID as a string.

cpk

Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

Returns

Type Description
dict[str, str | datetime]

Response headers as a dict.

Examples

Commit the previous appended data.


   with open(SOURCE_FILE, "rb") as data:
       file_client = file_system_client.get_file_client("myfile")
       file_client.create_file()
       file_client.append_data(data, 0)
       file_client.flush_data(data.tell())

from_connection_string

Create DataLakeFileClient from a Connection String.

from_connection_string(conn_str: str, file_system_name: str, file_path: str, credential: str | Dict[str, str] | AzureNamedKeyCredential | AzureSasCredential | TokenCredential | None = None, **kwargs: Any) -> Self

Parameters

Name Description
conn_str
Required
str

A connection string to an Azure Storage account.

file_system_name
Required
str

The name of file system to interact with.

file_path
Required
str

The full path of the file to interact with, e.g. "{directory}/{subdirectory}/{file}".

credential

The credentials with which to authenticate. This is optional if the account URL already has a SAS token, or the connection string already has shared access key values. The value can be a SAS token string, an instance of AzureSasCredential or AzureNamedKeyCredential from azure.core.credentials, an account shared access key, or an instance of a TokenCredential class from azure.identity. Credentials provided here take precedence over those in the connection string. If using an instance of AzureNamedKeyCredential, "name" should be the storage account name, and "key" should be the storage account key.

Default value: None

Keyword-Only Parameters

Name Description
audience
str

The audience to use when requesting tokens for Azure Active Directory authentication. Only has an effect when credential is of type TokenCredential. The value could be https://storage.azure.com/ (default) or https://.blob.core.windows.net.

Returns

Type Description

A DataLakeFileClient.

get_access_control

get_access_control(upn: bool | None = None, **kwargs) -> Dict[str, Any]

Parameters

Name Description
upn

Optional. Valid only when Hierarchical Namespace is enabled for the account. If "true", the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names. If "false", the values will be returned as Azure Active Directory Object IDs. The default value is false. Note that group and application Object IDs are not translated because they do not have unique friendly names.

Keyword-Only Parameters

Name Description
lease

Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description

response dict containing access control options with no modifications.

get_file_properties

Returns all user-defined metadata, standard HTTP properties, and system properties for the file. It does not return the content of the file.

get_file_properties(**kwargs: Any) -> FileProperties

Keyword-Only Parameters

Name Description
lease

Required if the directory or file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a customer-provided key.

upn

If True, the user identity values returned in the x-ms-owner, x-ms-group, and x-ms-acl response headers will be transformed from Azure Active Directory Object IDs to User Principal Names in the owner, group, and acl fields of FileProperties. If False, the values will be returned as Azure Active Directory Object IDs. The default value is False. Note that group and application Object IDs are not translated because they do not have unique friendly names.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description

All user-defined metadata, standard HTTP properties, and system properties for the file.

Examples

Getting the properties for a file.


   properties = file_client.get_file_properties()

query_file

Enables users to select/project on datalake file data by providing simple query expressions. This operation returns a DataLakeFileQueryReader; use readall() or readinto() to get the query data.

query_file(query_expression: str, **kwargs: Any) -> DataLakeFileQueryReader

Parameters

Name Description
query_expression
Required
str

A query statement, e.g. "SELECT * from DataLakeStorage".

Keyword-Only Parameters

Name Description
on_error

A function to be called on any processing errors returned by the service.

file_format

Optional. Defines the serialization of the data currently stored in the file. The default is to treat the file data as CSV data formatted in the default dialect. This can be overridden with a custom DelimitedTextDialect, or DelimitedJsonDialect or "ParquetDialect" (passed as a string or enum). These dialects can be passed through their respective classes, the QuickQueryDialect enum or as a string.

output_format

Optional. Defines the output serialization for the data stream. By default the data will be returned as it is represented in the file. By providing an output format, the file data will be reformatted according to that profile. This value can be a DelimitedTextDialect or a DelimitedJsonDialect or ArrowDialect. These dialects can be passed through their respective classes, the QuickQueryDialect enum or as a string.

lease

Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Decrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS. Required if the file was created with a Customer-Provided Key.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. Client-side network timeouts are configured separately.

Returns

Type Description
DataLakeFileQueryReader

A streaming object (DataLakeFileQueryReader)

Examples

Select/project on datalake file data by providing simple query expressions.


   from azure.storage.filedatalake import DelimitedJsonDialect, DelimitedTextDialect

   errors = []
   def on_error(error):
       errors.append(error)

   # upload the csv file
   file_client = datalake_service_client.get_file_client(filesystem_name, "csvfile")
   file_client.upload_data(CSV_DATA, overwrite=True)

   # select the second column of the csv file
   query_expression = "SELECT _2 from DataLakeStorage"
   input_format = DelimitedTextDialect(delimiter=',', quotechar='"', lineterminator='\n', escapechar="", has_header=False)
   output_format = DelimitedJsonDialect(delimiter='\n')
   reader = file_client.query_file(query_expression, on_error=on_error, file_format=input_format, output_format=output_format)
   content = reader.readall()
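
The if_modified_since and if_unmodified_since keywords of this method expect UTC values. The following is a minimal sketch of that convention using only the standard library. The helper name as_utc is hypothetical and not part of the SDK; it only mirrors the documented rule that naive datetimes are assumed to be UTC and aware datetimes are converted to UTC.

```python
from datetime import datetime, timedelta, timezone


def as_utc(value: datetime) -> datetime:
    """Return `value` as a timezone-aware UTC datetime (hypothetical helper)."""
    if value.tzinfo is None:
        # A datetime without timezone info is assumed to already be UTC.
        return value.replace(tzinfo=timezone.utc)
    # A datetime with timezone info is converted to UTC.
    return value.astimezone(timezone.utc)


naive = datetime(2024, 1, 15, 12, 0, 0)
aware = datetime(2024, 1, 15, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))

# Both values describe the same instant once normalized.
print(as_utc(naive).isoformat())
print(as_utc(aware).isoformat())
```

A value normalized this way could then be passed as if_modified_since=as_utc(some_datetime).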

remove_access_control_recursive

Removes the Access Control on a path and sub-paths.

remove_access_control_recursive(acl: str, **kwargs: Any) -> AccessControlChangeResult

Parameters

Name Description
acl
Required
str

Removes POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, and a user or group identifier in the format "[scope:][type]:[id]".

Keyword-Only Parameters

Name Description
progress_hook
func(AccessControlChanges)

Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

continuation_token
str

Optional continuation token that can be used to resume a previously stopped operation.

batch_size
int

Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. The batch size should be between 1 and 2000. The default when unspecified is 2000.

max_batches
int

Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

continue_on_failure

If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. Defaults to False.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
AccessControlChangeResult

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Exceptions

Type Description
AzureError

The user can restart the operation using the continuation_token field of AzureError, if the token is available.
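
The acl value for removal carries no permissions part. The following is a minimal sketch of assembling such a string in the "[scope:][type]:[id]" format. The helper name build_removal_acl and the object ID are made up for illustration.

```python
from typing import Iterable, Optional, Tuple

# (scope, type, id); scope is None for access ACLs or "default" for default ACLs.
AclEntry = Tuple[Optional[str], str, str]


def build_removal_acl(entries: Iterable[AclEntry]) -> str:
    """Join entries into a comma-separated '[scope:][type]:[id]' list."""
    parts = []
    for scope, entry_type, identifier in entries:
        prefix = f"{scope}:" if scope else ""
        parts.append(f"{prefix}{entry_type}:{identifier}")
    return ",".join(parts)


# Placeholder object ID, not a real principal.
object_id = "4a9028cf-f779-4032-b09d-970ebe3db258"
acl = build_removal_acl([
    (None, "user", object_id),
    ("default", "user", object_id),
])
print(acl)
```

The resulting string could then be passed as file_client.remove_access_control_recursive(acl=acl), assuming file_client is a DataLakeFileClient.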

rename_file

Rename the source file.

rename_file(new_name: str, **kwargs: Any) -> DataLakeFileClient

Parameters

Name Description
new_name
Required
str

The new name for the file. The value must have the following format: "{filesystem}/{directory}/{subdirectory}/{file}".

Keyword-Only Parameters

Name Description
content_settings

ContentSettings object used to set path properties.

source_lease

A lease ID for the source path. If specified, the source path must have an active lease and the lease ID must match.

lease

Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

source_if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

source_if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

source_etag
str

The source ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

source_match_condition

The source match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
DataLakeFileClient

The renamed file client.

Examples

Rename the source file.


   new_client = file_client.rename_file(file_client.file_system_name + '/' + 'newname')
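
Because new_name must start with the file system name, renaming a file in place means rebuilding the full "{filesystem}/{directory}/{file}" value. The following is a minimal sketch of that step. The helper name sibling_new_name is hypothetical, and it assumes the client exposes the file's path through its path_name attribute.

```python
import posixpath


def sibling_new_name(file_system_name: str, path_name: str, new_basename: str) -> str:
    """Build a new_name value that keeps the file in its current directory."""
    directory = posixpath.dirname(path_name)
    # posixpath.join tolerates an empty directory for files at the root.
    return posixpath.join(file_system_name, directory, new_basename)


print(sibling_new_name("myfilesystem", "reports/2024/data.csv", "data-old.csv"))
print(sibling_new_name("myfilesystem", "data.csv", "data-old.csv"))

# Assumed usage with an existing DataLakeFileClient named file_client:
# new_client = file_client.rename_file(
#     sibling_new_name(file_client.file_system_name, file_client.path_name, "data-old.csv"))
```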

set_access_control

Set the owner, group, permissions, or access control list for a path.

set_access_control(owner: str | None = None, group: str | None = None, permissions: str | None = None, acl: str | None = None, **kwargs) -> Dict[str, str | datetime]

Parameters

Name Description
owner
str

Optional. The owner of the file or directory.
Default value: None

group
str

Optional. The owning group of the file or directory.
Default value: None

permissions
str

Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported. permissions and acl are mutually exclusive.
Default value: None

acl
str

Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format "[scope:][type]:[id]:[permissions]". permissions and acl are mutually exclusive.
Default value: None

Keyword-Only Parameters

Name Description
lease

Required if the file/directory has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
dict[str, str | datetime]

Response dict containing access control options (ETag and last modified).
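
The permissions parameter accepts either symbolic or 4-digit octal notation. The following is a minimal sketch of how the two notations correspond, using POSIX conventions. The helper name symbolic_to_octal is hypothetical, and the handling of the sticky bit as a trailing t or T is an assumption based on POSIX usage rather than something stated on this page.

```python
def symbolic_to_octal(symbolic: str) -> str:
    """Convert e.g. 'rwxrw-rw-' to '0766' (hypothetical helper)."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxrw-rw-'")
    sticky = 0
    digits = []
    for start in range(0, 9, 3):
        r, w, x = symbolic[start:start + 3]
        if r not in "r-" or w not in "w-":
            raise ValueError(f"invalid permission string: {symbolic!r}")
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0)
        if start == 6 and x in "tT":
            # Assumed POSIX convention: 't' is sticky plus execute, 'T' is sticky only.
            sticky = 1
            value += 1 if x == "t" else 0
        elif x == "x":
            value += 1
        elif x != "-":
            raise ValueError(f"invalid permission string: {symbolic!r}")
        digits.append(str(value))
    return f"{sticky}{''.join(digits)}"


print(symbolic_to_octal("rwxrw-rw-"))  # 0766
```

Either form could then be passed as permissions; remember that permissions and acl cannot be supplied in the same call.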

set_access_control_recursive

Sets the Access Control on a path and sub-paths.

set_access_control_recursive(acl: str, **kwargs: Any) -> AccessControlChangeResult

Parameters

Name Description
acl
Required
str

Sets POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format "[scope:][type]:[id]:[permissions]".

Keyword-Only Parameters

Name Description
progress_hook
func(AccessControlChanges)

Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

continuation_token
str

Optional continuation token that can be used to resume a previously stopped operation.

batch_size
int

Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. The batch size should be between 1 and 2000. The default when unspecified is 2000.

max_batches
int

Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

continue_on_failure

If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. Defaults to False.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
AccessControlChangeResult

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Exceptions

Type Description
AzureError

The user can restart the operation using the continuation_token field of AzureError, if the token is available.
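
When max_batches is set, a single call may stop before every sub-path is processed and return a continuation token. The following is a minimal sketch of resuming until no token is returned. The function name apply_acl_in_batches is hypothetical; it assumes the returned AccessControlChangeResult exposes counters and continuation attributes, and that the counters cover only the call that returned them.

```python
def apply_acl_in_batches(client, acl, max_batches=1, batch_size=2000):
    """Call set_access_control_recursive repeatedly until no token remains."""
    totals = {"directories_successful": 0, "files_successful": 0, "failure_count": 0}
    token = None
    while True:
        result = client.set_access_control_recursive(
            acl=acl,
            batch_size=batch_size,
            max_batches=max_batches,
            continuation_token=token,
        )
        # Assumption: counters describe this call only, so they are summed here.
        totals["directories_successful"] += result.counters.directories_successful
        totals["files_successful"] += result.counters.files_successful
        totals["failure_count"] += result.counters.failure_count
        token = result.continuation
        if not token:
            return totals


# Assumed usage with an existing client:
# totals = apply_acl_in_batches(file_client, "user::rwx,group::r-x,other::r--")
```

Limiting max_batches this way keeps each request short, at the cost of more round trips.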

set_file_expiry

Sets the time a file will expire and be deleted.

set_file_expiry(expiry_options: str, expires_on: datetime | int | None = None, **kwargs) -> None

Parameters

Name Description
expiry_options
Required
str

Required. Indicates the mode of the expiry time. Possible values include: 'NeverExpire', 'RelativeToCreation', 'RelativeToNow', 'Absolute'.

expires_on
datetime | int

The time at which the file expires. When expiry_options is RelativeToCreation or RelativeToNow, expires_on should be an int in milliseconds. If expires_on is a datetime, it should be in UTC.
Default value: None

Keyword-Only Parameters

Name Description
timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
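
Since expires_on is an int in milliseconds for the relative modes and a UTC datetime for 'Absolute', the two forms can be prepared as follows. This is a minimal sketch; the helper name to_milliseconds is hypothetical, and the commented calls assume file_client is an existing DataLakeFileClient.

```python
from datetime import datetime, timedelta, timezone


def to_milliseconds(delta: timedelta) -> int:
    """Express a timedelta as a whole number of milliseconds."""
    return delta // timedelta(milliseconds=1)


# Relative modes: an int number of milliseconds.
relative_ms = to_milliseconds(timedelta(hours=1))
print(relative_ms)  # 3600000

# Absolute mode: a timezone-aware datetime in UTC.
absolute_utc = datetime.now(timezone.utc) + timedelta(days=7)

# Assumed usage:
# file_client.set_file_expiry("RelativeToNow", expires_on=relative_ms)
# file_client.set_file_expiry("Absolute", expires_on=absolute_utc)
```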

set_http_headers

Sets system properties on the file or directory.

If one property is set for the content_settings, all properties will be overridden.

set_http_headers(content_settings: ContentSettings | None = None, **kwargs) -> Dict[str, Any]

Parameters

Name Description
content_settings
ContentSettings

ContentSettings object used to set file/directory properties.
Default value: None

Keyword-Only Parameters

Name Description
lease

If specified, set_http_headers only succeeds if the lease on the file or directory is active and matches this ID.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
dict[str, Any]

File/directory-updated property dict (ETag and last modified).
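
Because setting one property overrides all of them, changing a single header usually means starting from the current settings. The following is a minimal sketch of that read-modify-write pattern. The function name set_content_type is hypothetical, and it assumes get_file_properties() returns an object whose content_settings attribute can be modified and passed back.

```python
def set_content_type(file_client, content_type):
    """Change only the content type while resending the other settings."""
    # Start from the settings currently stored on the file.
    settings = file_client.get_file_properties().content_settings
    settings.content_type = content_type
    # All properties are overridden, so the full object is sent back.
    return file_client.set_http_headers(content_settings=settings)


# Assumed usage with an existing DataLakeFileClient:
# set_content_type(file_client, "text/csv")
```

This pattern is not atomic; passing etag and match_condition could guard against concurrent changes.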

set_metadata

Sets one or more user-defined name-value pairs for the specified file. Each call to this operation replaces all existing metadata attached to the file. To remove all metadata from the file, call this operation with an empty metadata dict.

set_metadata(metadata: Dict[str, str], **kwargs) -> Dict[str, str | datetime]

Parameters

Name Description
metadata
Required
dict[str, str]

A dict containing name-value pairs to associate with the file as metadata. Example: {'category':'test'}

Keyword-Only Parameters

Name Description
lease

If specified, set_metadata only succeeds if the lease on the file is active and matches this ID.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
dict[str, str | datetime]

File-updated property dict (ETag and last modified).
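
Since each call replaces all existing metadata, adding a single key requires sending the existing pairs again. The following is a minimal sketch of merging before the call. The function name merge_metadata is hypothetical, and it assumes get_file_properties() exposes the current pairs through a metadata attribute.

```python
def merge_metadata(file_client, updates):
    """Add or change keys without dropping the metadata already stored."""
    current = dict(file_client.get_file_properties().metadata or {})
    current.update(updates)
    # set_metadata replaces everything, so the merged dict is sent in full.
    file_client.set_metadata(current)
    return current


# Assumed usage with an existing DataLakeFileClient:
# merge_metadata(file_client, {"category": "test"})
```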

update_access_control_recursive

Modifies the Access Control on a path and sub-paths.

update_access_control_recursive(acl: str, **kwargs: Any) -> AccessControlChangeResult

Parameters

Name Description
acl
Required
str

Modifies POSIX access control rights on files and directories. The value is a comma-separated list of access control entries. Each access control entry (ACE) consists of a scope, a type, a user or group identifier, and permissions in the format "[scope:][type]:[id]:[permissions]".

Keyword-Only Parameters

Name Description
progress_hook
func(AccessControlChanges)

Callback where the caller can track progress of the operation as well as collect paths that failed to change Access Control.

continuation_token
str

Optional continuation token that can be used to resume a previously stopped operation.

batch_size
int

Optional. If the data set size exceeds the batch size, the operation is split into multiple requests so that progress can be tracked. The batch size should be between 1 and 2000. The default when unspecified is 2000.

max_batches
int

Optional. Defines the maximum number of batches that a single change Access Control operation can execute. If the maximum is reached before all sub-paths are processed, the continuation token can be used to resume the operation. An empty value indicates that the maximum number of batches is unbounded and the operation continues until the end.

continue_on_failure

If set to False, the operation will terminate quickly on encountering user errors (4XX). If True, the operation will ignore user errors and proceed with the operation on other sub-entities of the directory. In case of user errors, a continuation token is only returned when continue_on_failure is True. Defaults to False.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here.

Returns

Type Description
AccessControlChangeResult

A summary of the recursive operations, including the count of successes and failures, as well as a continuation token in case the operation was terminated prematurely.

Exceptions

Type Description
AzureError

The user can restart the operation using the continuation_token field of AzureError, if the token is available.
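
The progress_hook callback can be used to collect the paths that failed to change. The following is a minimal sketch of such a callback. The factory name make_failure_collector is hypothetical, and it assumes the AccessControlChanges object passed to the hook exposes a batch_failures list whose items have name and error_message attributes.

```python
def make_failure_collector():
    """Return a progress hook plus the list it fills with failed paths."""
    failed = []

    def progress_hook(changes):
        # Assumption: each failure carries the path name and an error message.
        for failure in changes.batch_failures:
            failed.append((failure.name, failure.error_message))

    return progress_hook, failed


# Assumed usage with an existing DataLakeFileClient:
# hook, failed = make_failure_collector()
# file_client.update_access_control_recursive(
#     acl="user:<object-id>:r-x", progress_hook=hook, continue_on_failure=True)
```

Combining the hook with continue_on_failure=True lets the operation run to completion while the failures are recorded for a later retry.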

upload_data

Upload data to a file.

upload_data(data: bytes | str | Iterable | IO, length: int | None = None, overwrite: bool | None = False, **kwargs) -> Dict[str, Any]

Parameters

Name Description
data
Required

Content to be uploaded to the file.

length
int

Size of the data in bytes.
Default value: None

overwrite
bool

Whether to overwrite an existing file.
Default value: False

Keyword-Only Parameters

Name Description
content_settings

ContentSettings object used to set path properties.

metadata

Name-value pairs associated with the file as metadata.

lease

Required if the file has an active lease. Value can be a DataLakeLeaseClient object or the lease ID as a string.

umask
str

Optional and only valid if Hierarchical Namespace is enabled for the account. When creating a file or directory and the parent folder does not have a default ACL, the umask restricts the permissions of the file or directory to be created. The resulting permission is given by p & ~u (the bitwise AND of p with the complement of u), where p is the permission and u is the umask. For example, if p is 0777 and u is 0057, then the resulting permission is 0720. The default permission is 0777 for a directory and 0666 for a file. The default umask is 0027. The umask must be specified in 4-digit octal notation (e.g. 0766).

permissions
str

Optional and only valid if Hierarchical Namespace is enabled for the account. Sets POSIX access permissions for the file owner, the file owning group, and others. Each class may be granted read, write, or execute permission. The sticky bit is also supported. Both symbolic (rwxrw-rw-) and 4-digit octal notation (e.g. 0766) are supported.

if_modified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has been modified since the specified time.

if_unmodified_since

A DateTime value. Azure expects the date value passed in to be UTC. If timezone is included, any non-UTC datetimes will be converted to UTC. If a date is passed in without timezone info, it is assumed to be UTC. Specify this header to perform the operation only if the resource has not been modified since the specified date/time.

validate_content

If true, calculates an MD5 hash for each chunk of the file. The storage service checks the hash of the content that has arrived against the hash that was sent. This is primarily valuable for detecting bitflips on the wire if using http instead of https, as https (the default) will already validate. Note that this MD5 hash is not stored with the file. Also note that if enabled, the memory-efficient upload algorithm will not be used, because computing the MD5 hash requires buffering entire blocks, and doing so defeats the purpose of the memory-efficient algorithm.

etag
str

An ETag value, or the wildcard character (*). Used to check if the resource has changed, and act according to the condition specified by the match_condition parameter.

match_condition

The match condition to use upon the etag.

cpk

Encrypts the data on the service-side with the given key. Use of customer-provided keys must be done over HTTPS.

timeout
int

Sets the server-side timeout for the operation in seconds. For more details see https://learn.microsoft.com/rest/api/storageservices/setting-timeouts-for-blob-service-operations. This value is not tracked or validated on the client. To configure client-side network timeouts see here. This method may make multiple calls to the service and the timeout will apply to each call individually.

max_concurrency
int

Maximum number of parallel connections to use when transferring the file in chunks. This option does not affect the underlying connection pool, and may require a separate configuration of the connection pool.

chunk_size
int

The maximum chunk size for uploading a file in chunks. Defaults to 100*1024*1024, or 100MB.

encryption_context
str

Specifies the encryption context to set on the file.

Returns

Type Description
dict[str, Any]

Response dict (ETag and last modified).
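
The umask rule described for this method can be checked with plain integer arithmetic. The following is a minimal sketch; the helper name apply_umask is hypothetical, and the values are the ones quoted in the umask description.

```python
def apply_umask(permission: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return permission & ~umask & 0o7777


# Example from the umask description: p = 0777, u = 0057.
print(format(apply_umask(0o777, 0o057), "04o"))  # 0720

# Default file permission 0666 with the default umask 0027.
print(format(apply_umask(0o666, 0o027), "04o"))  # 0640
```

The SDK itself expects umask as a 4-digit octal string, so a computed value would be formatted with format(value, "04o") before being passed.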

Attributes

api_version

The version of the Storage API used for requests.

Returns

Type Description
str

location_mode

The location mode that the client is currently using.

By default this will be "primary". Options include "primary" and "secondary".

Returns

Type Description
str

primary_endpoint

The full primary endpoint URL.

Returns

Type Description
str

primary_hostname

The hostname of the primary endpoint.

Returns

Type Description
str

secondary_endpoint

The full secondary endpoint URL if configured.

If not available, a ValueError will be raised. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Returns

Type Description
str

Exceptions

Type Description
ValueError

If no secondary endpoint is configured.

secondary_hostname

The hostname of the secondary endpoint.

If not available this will be None. To explicitly specify a secondary hostname, use the optional secondary_hostname keyword argument on instantiation.

Returns

Type Description
str or None

url

The full endpoint URL to this entity, including SAS token if used.

This could be either the primary endpoint or the secondary endpoint, depending on the current location_mode.

Returns

Type Description
str

The full endpoint URL to this entity, including SAS token if used.