Does COPY lock my TimescaleDB database with 70k chunks?

KriShna PrakaSh 0 Reputation points
2025-01-30T14:12:30.2766667+00:00

I am using TimescaleDB in an Azure VM for an IoT database. I need to export data in CSV format using the COPY method, but I have around 70,000 chunks in my hypertable. My concern is whether executing COPY to export data locks the database, as it is actively running.

I specifically need data for the top 2,000 most active users from the past year to analyze seasonal variations. The data should include user profile details and device ID for analysis.

If COPY locks the database, what would be a better alternative to export this data efficiently without impacting database performance?

I don’t need to back up the entire database—just the necessary dataset for analysis. Any recommendations on efficient querying or export methods would be helpful.

Thanks in advance!

Azure Database Migration Service
Azure Database for PostgreSQL

1 answer

  1. Vijayalaxmi Kattimani 1,085 Reputation points Microsoft Vendor
    2025-01-31T11:08:30.24+00:00

    Hi KriShna PrakaSh,

    Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.

    We would like to inform you that using COPY in TimescaleDB to export data does not lock the entire database. A COPY (SELECT ...) TO statement takes only lightweight ACCESS SHARE locks on the tables it reads, so concurrent inserts, updates, and queries keep running. However, scanning a hypertable with around 70,000 chunks generates significant I/O and CPU load, which can still affect the performance of your actively running database.
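
    If you want to confirm this on your own system, you can inspect pg_locks from a second session while the export is running. The query below is a minimal sketch; a COPY (SELECT ...) TO export should show up only with AccessShareLock entries:

    -- Run from a second session while the export is executing.
    -- Lists relation-level locks held by other backends.
    SELECT c.relname, l.mode, l.granted, l.pid
    FROM pg_locks AS l
    JOIN pg_class AS c ON c.oid = l.relation
    WHERE l.locktype = 'relation'
      AND l.pid <> pg_backend_pid()
    ORDER BY c.relname, l.mode;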

    Here are some alternative methods to efficiently export your data without causing significant performance issues:

    • Parallel Export with pg_dump: You can use pg_dump with the -j option to perform a parallel dump. Parallel dumps require the directory output format (-F d) and write to a directory rather than a single file. Splitting the export across multiple jobs reduces the total export time, although each job opens its own connection, so it adds some concurrent load while it runs. Also note that pg_dump exports whole tables in an archive format rather than a filtered CSV, so it suits copying entire tables more than extracting only the dataset you described.
    pg_dump -h <host> -U <user> -d <database> -t <table> -j <number_of_jobs> -F d -f <output_directory>
    
    • Using COPY with a scoped query and smaller batches: Instead of exporting the whole hypertable at once, restrict the COPY to the columns, users, and time range you actually need, and split the export into batches (for example, one month per run) so each statement scans fewer chunks; see the batching sketch after this example. The query below exports only the past year of data for the 2,000 most active users. Note that COPY ... TO '<file>' writes on the server and needs the appropriate privileges; from psql, \copy writes the file on the client instead.
    COPY (
      SELECT user_id, device_id, profile_data, timestamp
      FROM your_hypertable
      WHERE timestamp >= NOW() - INTERVAL '1 year'
        AND user_id IN (
          SELECT user_id
          FROM your_hypertable
          WHERE timestamp >= NOW() - INTERVAL '1 year'
          GROUP BY user_id
          ORDER BY COUNT(*) DESC
          LIMIT 2000
        )
    ) TO '/mnt/data/export.csv' WITH CSV HEADER;
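
    As a sketch of the batching idea (using the same placeholder table and column names, with illustrative month bounds and file path), you can materialize the top-user list once and then export one month per run, so each COPY scans only that month's chunks:

    -- Materialize the top-user list once so it is not recomputed per batch.
    -- A temp table lives only for the session; use an unlogged table if the
    -- batches run across separate sessions.
    CREATE TEMP TABLE top_users AS
    SELECT user_id
    FROM your_hypertable
    WHERE timestamp >= NOW() - INTERVAL '1 year'
    GROUP BY user_id
    ORDER BY COUNT(*) DESC
    LIMIT 2000;

    -- Export one month per run; repeat with the next month's bounds.
    -- The timestamp predicate lets TimescaleDB exclude all other chunks.
    COPY (
      SELECT t.user_id, t.device_id, t.profile_data, t.timestamp
      FROM your_hypertable t
      JOIN top_users u USING (user_id)
      WHERE t.timestamp >= DATE '2024-01-01'
        AND t.timestamp <  DATE '2024-02-01'
    ) TO '/mnt/data/export_2024_01.csv' WITH CSV HEADER;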
    
    • Use Foreign Tables or ETL Tools.
    • Foreign tables: With the postgres_fdw extension, a separate, lightweight PostgreSQL instance can query the production data remotely and write the CSV locally, keeping the export work off the production server; see the sketch after this list.
    • ETL Tools: Use Azure Data Factory, or Python scripts with psycopg2, to fetch and export the data in chunks.
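
    A minimal sketch of the foreign-table approach, run on the separate analysis instance (the server name, connection details, schema, and table name are placeholders, and postgres_fdw must be installed there):

    -- On the separate PostgreSQL instance, not on the production server.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Placeholder connection details for the production TimescaleDB server.
    CREATE SERVER iot_prod FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'prod-host', dbname 'iot', port '5432');
    CREATE USER MAPPING FOR CURRENT_USER SERVER iot_prod
      OPTIONS (user 'readonly_user', password 'secret');

    -- Expose the hypertable as a foreign table with matching columns.
    IMPORT FOREIGN SCHEMA public LIMIT TO (your_hypertable)
      FROM SERVER iot_prod INTO public;

    -- The query executes remotely; the CSV is written on this instance's disk.
    COPY (
      SELECT user_id, device_id, profile_data, timestamp
      FROM your_hypertable
      WHERE timestamp >= NOW() - INTERVAL '1 year'
    ) TO '/mnt/data/export.csv' WITH CSV HEADER;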

    Please refer to the link below for more information.

    https://docs.timescale.com/self-hosted/latest/migration/schema-then-data/

    I hope this response addresses your query and helps you overcome your challenges.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further questions, do let us know.

