Most cost-effective way to migrate existing data

Pål Kristian Halle 110 Reputation points
2024-11-29T12:15:08.8866667+00:00

Hello ADX experts.

I have tested ADX for a while now and have decided to try it in production.

However, a big uncertainty is the migration process. We are looking to migrate about 50 TB of data from an Azure SQL database.

What would be the most cost-effective way of doing this?

When looking at the cost analysis for our test cluster, Hot LRS Write Operations account for about 50% of the total costs. During the test period, we have ingested about 25 GB of uncompressed data per day, costing us about $50 in write operations per day. Based on these numbers, the migration would cost us a fortune (around $100 K) just to get started.
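For reference, the back-of-envelope arithmetic behind that ~$100 K figure, using only the numbers above (assumes write-operation cost scales linearly with data volume):

```python
# Extrapolate the test cluster's write-operation cost to the 50 TB migration.
test_gb_per_day = 25        # uncompressed GB ingested per day during the test
cost_per_day = 50           # USD of Hot LRS write operations per day

migration_gb = 50 * 1024    # 50 TB expressed in GB

days_equivalent = migration_gb / test_gb_per_day   # 2048 "test days" of data
estimated_cost = days_equivalent * cost_per_day    # ~102,400 USD
print(f"~${estimated_cost:,.0f}")                  # → ~$102,400
```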

Is there any way around this?

For the test case I've used the IKustoQueuedIngestClient.IngestFromDataReaderAsync method in the .NET SDK for ingestion, and the data is ingested in small chunks (multiple simultaneous requests from thousands of sensors, each causing a small ingestion).

Looking at the ADXIngestionBatching table, most batches are processed because DataSourcesInBatch has reached the threshold of 500.

Also, what is considered a write operation? For example, would it cost me the same whether I queue 1 kB or 100 kB of data, or would the latter be 100 times more expensive?


Accepted answer
  1. Chandra Boorla 4,845 Reputation points Microsoft Vendor
    2024-12-02T05:17:21.26+00:00

    Hi @Pål Kristian Halle

    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    I understand that migrating 50 TB of data from an Azure SQL Database to Azure Data Explorer (ADX) and managing the associated costs—particularly around write operations—can be challenging. Here are some insights and recommendations to help you minimize those costs.

    Understanding Write Operations:

    • Write Operations in ADX - Write operations in ADX refer to the storage transactions generated when data is ingested into the database. Costs are incurred per transaction, so the number of separate ingestion operations matters more than the raw volume of data ingested.
    • Batching - Batching data effectively can help reduce costs. By increasing the size of each batch, you can minimize the number of write operations. For example, ingesting 1 MB of data in a single operation is generally more cost-effective than ingesting 1 kB of data in 1000 separate operations.
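To make the batching point above concrete, a small sketch comparing operation counts for the same total payload (illustrative only; per-operation pricing varies by region and SKU):

```python
# Same total payload, two batching strategies. Hot LRS write operations are
# billed per storage transaction, so the operation count drives this cost.
payload_kb = 1000                  # total data to ingest, in kB

ops_small_chunks = payload_kb // 1 # 1000 ingestions of 1 kB each
ops_one_batch = 1                  # a single ~1 MB ingestion

print(ops_small_chunks // ops_one_batch)  # → 1000: same data, 1000x the operations
```

In practice each queued ingestion generates more than one storage transaction (blob upload, queue message, batching bookkeeping), but the ratio between the two strategies is what dominates the cost difference.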

    What would be the most cost-effective way of doing this?

    Cost-Effective Migration Strategies:

    • Optimize Batching -
      • Larger Batches - Increase the size of your ingestion batches. Instead of ingesting small chunks of data from individual sensors, aggregate data into larger batches before ingestion. This can significantly reduce the number of write operations.
      • Batching Thresholds - Monitor and adjust the thresholds for batching in ADX to ensure that you are maximizing the amount of data ingested per operation.
    • Compression -
      • Pre-Compression - Compress your data before ingestion. ADX supports ingestion of compressed data, which can reduce the volume of data being ingested and, consequently, the associated costs.
      • Data Formats - Use efficient data formats like Parquet or Avro, which are optimized for storage and can further reduce ingestion costs.
    • Ingestion Methods -
      • Queued Ingestion - Continue using the IKustoQueuedIngestClient but optimize it for larger batches. This method is typically more cost-effective than direct ingestion for large volumes of data.
      • Direct Ingestion - For initial bulk ingestion, consider using direct ingestion methods that might offer better performance and potentially lower costs for very large datasets.
    • Storage Tiering -
      • Hot vs. Cold Storage - Utilize the appropriate storage tier for your data. While Hot LRS (Locally Redundant Storage) is more expensive, it provides faster access. For historical data that is not frequently accessed, consider using Cold or Archive storage tiers, which are more cost-effective.
    • Data Partitioning -
      • Partitioning - Partition your data based on time or other relevant dimensions. This can help in managing and querying the data more efficiently, potentially reducing overall costs.
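As one concrete knob for the batching-threshold point above: the table-level ingestion batching policy can be altered so batches seal on size or time rather than on the item count (the table name and the values below are illustrative, not a recommendation; the defaults are 5 minutes / 500 items / ~1 GB):

```kusto
// Raise the item cap so batches seal less often on DataSourcesInBatch,
// letting size and time drive the sealing instead.
.alter table MySensorData policy ingestionbatching
    '{"MaximumBatchingTimeSpan": "00:05:00", "MaximumNumberOfItems": 2000, "MaximumRawDataSizeMB": 1024}'
```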

    what is considered a write operation? For example, would it cost me the same whether I queue 1 or 100 kB of data, or would the latter be 100 times more expensive?

    In Azure Data Explorer (ADX), a write operation generally refers to the action of ingesting a piece of data into a table. The cost of write operations is not directly tied to the volume of data (e.g., 1 kB vs. 100 kB) but is primarily based on the number of separate ingestion requests or transactions that are processed.

    Migrating large volumes of data to ADX can be cost-intensive if not managed properly. By optimizing batch sizes, using compression, selecting appropriate storage tiers, and employing efficient ingestion methods, you can make the migration process more cost-effective. Always monitor and adjust your strategies based on the actual performance and cost metrics observed during the migration process.
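As a sketch of the client-side aggregation idea described above (plain Python, no SDK calls; the flush callback is where an actual ingestion request would go, and all names here are hypothetical):

```python
# Buffer individual sensor readings and flush them as one large batch,
# turning thousands of tiny ingestions into a handful of big ones.
from typing import Callable

class BatchBuffer:
    def __init__(self, flush: Callable[[list[str]], None],
                 max_items: int = 100_000, max_bytes: int = 100 * 1024 * 1024):
        self._flush = flush          # callback that performs one ingestion
        self._max_items = max_items
        self._max_bytes = max_bytes
        self._items: list[str] = []
        self._bytes = 0

    def add(self, record: str) -> None:
        self._items.append(record)
        self._bytes += len(record)
        # Seal the batch once either threshold is reached.
        if len(self._items) >= self._max_items or self._bytes >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        if self._items:
            self._flush(self._items)   # one ingestion call per sealed batch
            self._items, self._bytes = [], 0

# Usage: four readings become two ingestion calls instead of four.
batches: list[list[str]] = []
buf = BatchBuffer(batches.append, max_items=3)
for reading in ["r1", "r2", "r3", "r4"]:
    buf.add(reading)
buf.flush()
print(len(batches))  # → 2
```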

    For additional information, please refer to: Recommendations for optimizing data costs

    I hope this information helps. Please do let us know if you have any further queries.

    Thank you.


0 additional answers
