ADF - postgres: Read method cannot be called when another read operation is pending

Mart Roben 0 Reputation points
2024-11-12T09:22:24.8866667+00:00

I'm using Azure Data Factory copy data activity to move data from Azure PostgreSQL to OneLake. I'm using a self-hosted integration runtime for that (because of how our VNETs are set up).

Here is the error I'm getting:

Operation on target activity_copy_data_to_onelake failed: 'Type=Npgsql.NpgsqlException,Message=Exception while reading from stream,Source=Npgsql,''Type=System.NotSupportedException,Message=The Read method cannot be called when another read operation is pending.,Source=System,'

The same setup works perfectly for all the other tables in the same database, but fails for one. It might be because the problematic table is bigger than the others (50 columns x 6M rows).

Here's what I have tried so far, without success:

  • Setting connection timeout to 600 and command timeout to 0 (no timeout) in the linked service connection string (a rough sketch of the connection string is shown after this list).
  • Setting pooling to "no" in the linked service connection string.
  • Running the same query in pgAdmin. It succeeded, but took 18 minutes; in ADF it fails after ~200 seconds. The query itself is a simple SELECT * FROM <table>.
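
For reference, the linked service is defined roughly like this (server, database, credentials, and the integration runtime name are placeholders; the timeout and pooling keywords follow Npgsql conventions, and the exact spellings the connector accepts may differ):

```json
{
    "name": "AzurePostgreSqlLinkedService",
    "properties": {
        "type": "AzurePostgreSql",
        "typeProperties": {
            "connectionString": "Server=myserver.postgres.database.azure.com;Database=mydb;Port=5432;UID=myuser;Password=***;Timeout=600;Command Timeout=0;Pooling=false"
        },
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```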

Does anyone have any ideas on how to diagnose or solve this problem?

Azure Data Factory
Azure Database for PostgreSQL

2 answers

  1. Amira Bedhiafi 26,961 Reputation points
    2024-11-12T10:59:57.2366667+00:00

    For large tables, it’s helpful to partition or batch the data. In the ADF Copy Data activity, try enabling data partitioning and specifying a partition column if you have an integer or date column that can divide the data into manageable chunks (see the sketch below).
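
    A minimal sketch of what the copy activity source could look like with dynamic-range partitioning (here id is a hypothetical integer column and the bounds are illustrative; the partitionOption/partitionSettings properties assume a recent version of the Azure Database for PostgreSQL connector, so check them against your ADF version):

    ```json
    "source": {
        "type": "AzurePostgreSqlSource",
        "partitionOption": "DynamicRange",
        "partitionSettings": {
            "partitionColumnName": "id",
            "partitionLowerBound": "1",
            "partitionUpperBound": "6000000"
        }
    }
    ```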

    Try limiting the maximum concurrent connections in the self-hosted integration runtime configuration, which could reduce competition for resources during the copy operation.

    Consider upgrading the VM or adding more nodes to your self-hosted integration runtime cluster. With more resources, the runtime may handle the large dataset more effectively.

    If partitioning isn't feasible, consider using a paginated query with offsets and limits in the "Source Query" of the Copy Data activity to retrieve data in chunks manually.
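
    If you go the manual route, a sketch of what such a chunked source query might look like (my_table, id, and the chunkIndex parameter are placeholders, and the 500,000-row chunk size is arbitrary; the copy activity would run inside a ForEach or Until loop that increments chunkIndex):

    ```json
    "source": {
        "type": "AzurePostgreSqlSource",
        "query": "SELECT * FROM my_table ORDER BY id LIMIT 500000 OFFSET @{mul(int(pipeline().parameters.chunkIndex), 500000)}"
    }
    ```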

    Check if the CPU and memory usage are spiking during the transfer. If so, consider increasing the compute size or using additional nodes.


  2. Ganesh Gurram 1,510 Reputation points Microsoft Vendor
    2024-11-15T13:45:46.93+00:00

    @Mart Roben - Thanks for the question and for using the MS Q&A forum.

    The error you are encountering, "The Read method cannot be called when another read operation is pending", typically indicates a concurrency issue where multiple read operations are being attempted simultaneously on the same connection.

    Here are some steps you can take to diagnose and potentially solve this problem:

    1. Review Connection Pooling Settings: Ensure that connection pooling settings are correctly configured. Setting pooling to "no" might not be ideal for handling large datasets. Instead, try adjusting the pool size settings.
    2. Increase Command Timeout: You've already set the command timeout to 0 (no timeout), which is a good step. However, ensure that the connection timeout is also sufficiently large to accommodate the data volume.
    3. Optimize Query: Consider optimizing your query if possible. Instead of SELECT *, specify only the necessary columns. Use pagination or batch processing to handle large datasets. This can be done by modifying your query to fetch data in smaller chunks.
    4. Check Self-Hosted Integration Runtime Configuration: Ensure that your self-hosted integration runtime has sufficient resources (CPU, memory) to handle large data transfers. Check for any network latency issues between your self-hosted integration runtime and the Azure PostgreSQL database.
    5. Monitor Database Performance: Use PostgreSQL performance monitoring tools to check for any bottlenecks or issues on the database side. Ensure that the database is not under heavy load and there are no locks or long-running transactions that could affect performance.
    6. Retry Logic: Implement retry logic in your ADF pipeline to handle transient issues. This can be done by setting the retry policy on the Copy Data activity (see the sketch after this list).
    7. Detailed Logs and Diagnostics: Enable detailed logging in ADF to capture more information about the failure. Check PostgreSQL logs for any related errors or warnings during the data transfer.
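
    For step 6, a minimal sketch of the retry settings on the copy activity (the timeout and retry values are illustrative; transient stream errors like this one sometimes succeed on a retry):

    ```json
    {
        "name": "activity_copy_data_to_onelake",
        "type": "Copy",
        "policy": {
            "timeout": "0.12:00:00",
            "retry": 3,
            "retryIntervalInSeconds": 120,
            "secureOutput": false,
            "secureInput": false
        }
    }
    ```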

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click `Accept Answer` and `Yes` for "Was this answer helpful". And if you have any further queries, do let us know.

