How to scale Cosmos DB through partitioning to handle billions of documents

Stefano Falangone 0 Reputation points
2024-09-11T14:34:07.0966667+00:00

At my company we have a MongoDB collection hosted on Cosmos DB that has reached 180 million records and is currently sharded. Sadly and painfully, the chosen partitioning strategy has turned out to be really bad: it suffers both from the scatter/gather pattern and from a monotonically increasing key, which means we experience a terrible hot partition.

The shard key is the daily insert date, meaning that on any given day all writes go to the corresponding physical partition, so we can effectively use only (total RU)/~25 (~25 is the current number of physical partitions, and it keeps increasing with our daily operations).

I have been asked to try and fix this. After reading and watching some resources on MongoDB sharding, I came up with a possible solution, but I am not sure whether it would work.

Our data model is something like this:


    {
        "containerID": <int>,
        "uniqueID": <string>,
        "creationTimestamp": <timestamp>,
        "freeText": <string>,
        "user": <string>,
        "dataset": <int>
    }

Let me give as much information as possible on the current collection:

  • it has 180 million records;
  • its size is ~1 terabyte;
  • each document weighs between 2 and 200 KB, but I estimate most are close to 10-20 KB;
  • containerID and dataset are external keys that increase monotonically;
  • uniqueID is a randomly generated ID.

As for our query patterns, we search with the following use cases (a rough sketch of these queries follows the list):

  • Using containerID with a list of values, e.g. [1, 2, 200]. This field represents a relational key into another database that groups our MongoDB documents together (essentially a foreign key);
  • using a single containerID value, sorting by creationTimestamp and paginating. This is the most frequent query in my experience;
  • getting all the documents by querying on dataset (it has a one-to-many relationship with containerID, so one dataset has multiple containerIDs associated), sorting by creationTimestamp and paginating.
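
For context, here is a minimal pymongo sketch of these three patterns as I understand them (the connection string, database/collection names, page size, and example values are placeholders of mine, not our real code):

    from pymongo import MongoClient, DESCENDING

    # Placeholder connection string and names, for illustration only.
    client = MongoClient("mongodb://<account>.mongo.cosmos.azure.com:10255/?ssl=true")
    col = client["mydb"]["mycollection"]

    PAGE_SIZE = 50
    page_number = 0

    # 1) Documents for a list of containerID values.
    by_ids = list(col.find({"containerID": {"$in": [1, 2, 200]}}))

    # 2) Single containerID, sorted by creationTimestamp, paginated (most frequent query).
    page = list(
        col.find({"containerID": 200})
        .sort("creationTimestamp", DESCENDING)
        .skip(page_number * PAGE_SIZE)
        .limit(PAGE_SIZE)
    )

    # 3) All documents for a dataset, sorted by creationTimestamp, paginated.
    dataset_page = list(
        col.find({"dataset": 42})
        .sort("creationTimestamp", DESCENDING)
        .limit(PAGE_SIZE)
    )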

I investigated the containerID field and discovered that each key has between 1 and 10,000 associated documents. The average is 200-300 values I think, but the distribution is not completely even.

My proposed solution would be to add a field to my data model, hashedContainerID, populated by my application code as a hash of containerID, and use it as the partition key.

This way, I should avoid the problem of a monotonically increasing key (because containerID has been hashed).

I would also like to avoid using itemID as the key because of scatter/gather query patterns: our data keeps growing and we can no longer consider the collection small.
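
A minimal sketch of what I have in mind on the application side (the choice of MD5 here is just my assumption; any stable hash that spreads consecutive integers would do):

    import hashlib

    def hashed_container_id(container_id: int) -> str:
        # Deterministic hash of containerID, stored alongside the original value.
        return hashlib.md5(str(container_id).encode("utf-8")).hexdigest()

    document = {
        "containerID": 12345,
        "hashedContainerID": hashed_container_id(12345),  # candidate partition key
        "uniqueID": "a-random-id",
        "creationTimestamp": "2024-09-11T14:34:07Z",
        "freeText": "some free text",
        "user": "some-user",
        "dataset": 67,
    }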

  • Is hashedContainerID a good partition key, given that a single key can have up to 10,000 entries?
  • Does this solution avoid hot partitions while writing documents?
  • Can this shard key scale up to billions of documents?
  • Is it necessary to compute the hash manually, or can I use containerID directly? I read that Cosmos DB automatically hashes the partition key field, so increasing values at write time may not be a concern because Cosmos DB fixes this behind the scenes?

Thank you for your time!

Azure Cosmos DB

1 answer

  1. Amira Bedhiafi 26,186 Reputation points
    2024-09-11T20:44:01.66+00:00

    To handle billions of documents in CosmosDB, especially in your case where you are experiencing issues with hot partitions due to monotonically increasing partition keys, it's crucial to re-evaluate your partitioning strategy. I'll address the points you raised in your query:

    Is hashedContainerID a good partition key given that for a key I have a range of values up to 10,000 entries?

    Yes, using a hashed version of containerID could be a good approach. By hashing the containerID, you avoid the issue of a monotonically increasing key, which is one of the primary reasons behind the hot partition issue you're experiencing. Since CosmosDB internally uses a hash-based partitioning mechanism, your custom hash for containerID can ensure a more even distribution across partitions. Given that you have a relatively large range of values (up to 10,000 per key), this method should work well for distributing load more evenly, preventing all writes from being concentrated on a single partition.

    Does this solution avoid hot partitions while writing documents?

    Yes, hashing the containerID should prevent hot partitions during document writes. One of the major problems with your current partitioning strategy is that writes for a given day all target the same partition, creating a bottleneck. By distributing the documents based on a hashed containerID, you'll break the concentration of writes into a single partition, making use of more of your available RU/s (Request Units per second) across the physical partitions, and thus avoiding hot partitions.

    Can this shard key scale up to billions of documents?

    Yes, this shard key should scale effectively up to billions of documents. The use of a hash function is a common method to distribute load evenly across many partitions, even as the dataset grows significantly. However, the choice of partition key should always be based on the specific query patterns that you expect in the future. Since you're dealing with billions of records, and your queries frequently involve containerID, distributing by a hashed containerID should be scalable. Just make sure that any future queries can also take advantage of this partition key.
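
    For example, here is a hedged pymongo sketch of the difference between a query whose filter includes the hashed key (which can be routed to a single partition) and one that does not (which fans out across partitions). The connection string, names, and hash function are placeholders mirroring the approach described in the question:

        from pymongo import MongoClient, DESCENDING
        import hashlib

        # Placeholder handles, for illustration only.
        col = MongoClient("mongodb://<account>.mongo.cosmos.azure.com:10255/?ssl=true")["mydb"]["mycollection"]

        def hashed_container_id(container_id: int) -> str:
            return hashlib.md5(str(container_id).encode("utf-8")).hexdigest()

        # Filter includes the partition key: the query can target a single physical partition.
        targeted = (
            col.find({"hashedContainerID": hashed_container_id(200), "containerID": 200})
            .sort("creationTimestamp", DESCENDING)
            .limit(50)
        )

        # Filter omits the partition key: the query fans out (scatter/gather) across
        # all physical partitions and consumes more RUs.
        fan_out = col.find({"dataset": 42}).sort("creationTimestamp", DESCENDING).limit(50)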

    Is it necessary to manually create a hash, or can I even use containerID?

    In CosmosDB, you don't need to manually hash the containerID if it’s already used as a partition key. CosmosDB automatically hashes partition key values behind the scenes. Therefore, you can simply use containerID as the partition key, and CosmosDB will handle distributing the data. However, the important consideration is whether containerID provides enough uniqueness and even distribution across partitions. If it is still monotonically increasing or too clustered, manually hashing containerID can give you finer control and ensure better distribution, but this step is not strictly necessary.
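
    For illustration, a hedged pymongo sketch of creating a new sharded collection keyed directly on containerID; the connection string and names are placeholders, and you should verify the exact shardCollection form expected by your API version against the current Azure Cosmos DB for MongoDB documentation:

        from pymongo import MongoClient

        # Placeholder connection string and names, for illustration only.
        client = MongoClient("mongodb://<account>.mongo.cosmos.azure.com:10255/?ssl=true")

        # Standard MongoDB shardCollection admin command; with containerID as the
        # shard key, Cosmos DB hashes the values itself to distribute the data.
        client.admin.command(
            "shardCollection",
            "mydb.newCollection",
            key={"containerID": "hashed"},
        )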

