Unable to read content of file despite being able to see FileInfo: urlopen error [Errno 5] Input/output error

Daniel Wang 5 Reputation points
2025-03-07T05:28:18.5866667+00:00

I have two notebooks in different accounts, Staging and Production. Both use a linked service with a system-assigned managed identity and a mounted drive, and both run the exact same code.

Both can see the FileInfo (name, size, etc.):

FileInfo(path=file:/synfs/notebook/23/mount1/staging_path/ABC.zip, name=ABC.zip, size=1024),

The Staging environment can read the contents of the file, while Production raises this error:

URLError: <urlopen error [Errno 5] Input/output error: '/synfs/notebook/23/mount1/staging_path/ABC.zip'>

Code for your reference:

mssparkutils.fs.mount("abfss://container_name@account_name.dfs.core.windows.net", "/mount1", {"linkedService": "workspace_storage_test"})

mssparkutils.fs.ls(path)

mssparkutils.fs.ls(f'file:{mssparkutils.fs.getMountPath("/mount1")}{staging_path}')


df0 = pd.read_csv(f'file:{mssparkutils.fs.getMountPath("/mount1")}{staging_path}ABC.zip', compression='zip', sep='|', names=abc, dtype=xyz)
df1 = spark.createDataFrame(df0)
display(df1)
Azure Synapse Analytics

1 answer

  1. Amira Bedhiafi 29,711 Reputation points
    2025-03-07T12:53:01.5766667+00:00

    First, check the permissions: the Managed Identity used in the Production environment must have the necessary permissions to access the storage account and the specific file.

    You also need to verify that the mount point (/mount1) is correctly mounted in the Production environment. You can use the following command to list the mounts:

    
    mssparkutils.fs.mounts()
    
    

    If the mount is not present or incorrect, remount it:

    
    mssparkutils.fs.unmount("/mount1")
    mssparkutils.fs.mount("abfss://container_name@account_name.dfs.core.windows.net", "/mount1", {"linkedService": "workspace_storage_test"})
    

    Verify that the file path is correct and accessible. You can list the contents of the directory to verify:

    
    mssparkutils.fs.ls(f'file:{mssparkutils.fs.getMountPath("/mount1")}{staging_path}')
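
Listing a directory only touches metadata, which is why FileInfo can look fine while the read still fails. A minimal sketch, assuming the mount path returned by getMountPath is a normal local path: read the first few bytes with plain open(), which bypasses the urlopen call that pandas makes for file: URLs. The helper names probe_read and looks_like_zip are hypothetical, not part of mssparkutils.

```python
import os

ZIP_MAGIC = b"PK\x03\x04"  # first four bytes of a non-empty ZIP archive


def probe_read(local_path, nbytes=4):
    """Return the size from metadata and the first nbytes of real data."""
    size = os.stat(local_path).st_size   # metadata only, comparable to fs.ls
    with open(local_path, "rb") as fh:   # actual data read through the mount
        head = fh.read(nbytes)
    return size, head


def looks_like_zip(head):
    """Check whether the bytes read start with the ZIP local-file signature."""
    return head.startswith(ZIP_MAGIC)


# In the notebook (no "file:" prefix; names taken from the question):
# local_path = mssparkutils.fs.getMountPath("/mount1") + staging_path + "ABC.zip"
# size, head = probe_read(local_path)
```

If os.stat succeeds but open() or read() raises OSError with errno 5, the problem is in reading data through the mount rather than in pandas or the path itself.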
    

    The Input/output error might indicate a network issue. Check if there are any network restrictions or firewall rules that might be blocking access to the storage account in the Production environment.
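
One rough way to test the network theory from inside the Production notebook is a plain TCP connection to the storage endpoint. This is a sketch under assumptions: can_connect is a hypothetical helper, and the host is built from the placeholder account name in the question. A successful connection only shows that the endpoint is reachable; it says nothing about authorization.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# In the notebook, with the placeholder account name from the question:
# can_connect("account_name.dfs.core.windows.net")
```

Running the same check in Staging and Production gives a quick comparison of the two network paths.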

    Implement retry logic in your code to handle transient errors:

    
    import time
    from urllib.error import URLError

    import pandas as pd

    # staging_path, abc, and xyz are defined earlier in the notebook.
    zip_path = f'file:{mssparkutils.fs.getMountPath("/mount1")}{staging_path}ABC.zip'
    retries = 3

    for attempt in range(retries):
        try:
            df0 = pd.read_csv(zip_path, compression='zip', sep='|', names=abc, dtype=xyz)
            break
        except URLError:
            if attempt == retries - 1:
                raise  # Out of attempts: re-raise with the original traceback
            time.sleep(5)  # Wait 5 seconds before retrying
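
The same idea can be wrapped in a small reusable helper. This is a sketch, not a Synapse API: retry_io is a hypothetical name, and it catches OSError because URLError is a subclass of OSError, so the helper also covers reads done with plain open(). The delay doubles after each failed attempt.

```python
import time


def retry_io(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on OSError wait base_delay * 2**attempt, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# In the notebook (zip_path, abc, xyz as in the question's code):
# df0 = retry_io(lambda: pd.read_csv(zip_path, compression="zip",
#                                    sep="|", names=abc, dtype=xyz))
```

If every attempt fails with the same errno 5, the error is probably not transient, and the permission and network checks are the better place to look.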
    
    

    Add logging to capture more details about the error:

    
    import logging
    from urllib.error import URLError

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)

    try:
        df0 = pd.read_csv(f'file:{mssparkutils.fs.getMountPath("/mount1")}{staging_path}ABC.zip', compression='zip', sep='|', names=abc, dtype=xyz)
    except URLError as e:
        # e.reason holds the underlying error (Errno 5 in this case)
        logger.error("Failed to read file: %s (reason: %r)", e, e.reason)
        raise
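
To log something more specific than the exception string, the OSError wrapped inside the URLError can be unpacked. A minimal sketch, assuming the reason attribute is an OSError, as the "[Errno 5] Input/output error" text in the question suggests; describe_url_error is a hypothetical helper name.

```python
import errno
from urllib.error import URLError


def describe_url_error(exc):
    """Summarize the OSError wrapped inside a URLError, if there is one."""
    reason = exc.reason
    if isinstance(reason, OSError):
        return {
            "errno": reason.errno,
            "name": errno.errorcode.get(reason.errno, "unknown"),
            "message": reason.strerror,
            "filename": reason.filename,
        }
    # reason can also be a plain string; keep it as the message.
    return {"errno": None, "name": "unknown", "message": str(reason), "filename": None}


# Inside the except block above:
# logger.error("Read failed: %s", describe_url_error(e))
```

The symbolic name and the filename make it easier to compare the Staging and Production logs side by side.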
    