Two steps to shard a MongoDB collection


In the post “How to set up a MongoDB Sharded Cluster” we saw how to build a sharded cluster. As we already know, its goal is to scale out and to balance the workload evenly across all our shards.

Today we are going to learn how to shard a collection so that its documents are evenly distributed among our shards.

All administrative tasks on a sharded cluster must be run while connected to a mongos router, not directly to a shard.
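A minimal Python sketch of that connection check follows. It assumes the pymongo driver and a mongos at the hypothetical address localhost:27017, and relies on the fact that a mongos answers the isMaster command with msg set to "isdbgrid", which a plain mongod does not.

```python
def is_mongos(hello_reply: dict) -> bool:
    """Return True if an isMaster/hello reply came from a mongos router.

    A mongos includes msg: "isdbgrid" in its reply; a mongod does not.
    """
    return hello_reply.get("msg") == "isdbgrid"


def demo(uri: str = "mongodb://localhost:27017") -> bool:
    """Connect to a (hypothetical) address and report whether a mongos answered."""
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    client = MongoClient(uri)
    reply = client.admin.command("isMaster")  # newer servers also accept "hello"
    return is_mongos(reply)
```

Calling demo() against your own cluster returns True only when the URI points at a mongos.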

MongoDB shards at the collection level. This means that a single database can contain both sharded and unsharded collections.

First step

Before sharding a collection, we must enable sharding on the database it belongs to. From the mongo shell connected to the mongos, we run sh.enableSharding("<database>").
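For readers driving the cluster from Python, the same step is the enableSharding admin command. This sketch assumes pymongo and the hypothetical database name mydb; the helper accepts any object with a command() method, so it can be exercised without a server.

```python
def enable_sharding(admin_db, db_name: str) -> dict:
    """Run {enableSharding: db_name} on the admin database through a mongos.

    admin_db is anything with a pymongo-style command() method, e.g. client.admin.
    """
    return admin_db.command("enableSharding", db_name)


def demo(uri: str = "mongodb://localhost:27017") -> dict:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    client = MongoClient(uri)  # hypothetical mongos address
    return enable_sharding(client.admin, "mydb")  # "mydb" is a hypothetical name
```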

Collections are sharded on a field (or a set of fields) called the shard key, so it is very important to choose it carefully. You can read about the characteristics a good shard key should have here: Choosing a shard key.
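One property we can check on sample data before committing is cardinality: all documents sharing a shard key value must stay in the same chunk, so a field with few distinct values, or one dominated by a single value, limits how finely MongoDB can split the collection. The helper below is a hypothetical illustration, not a MongoDB tool, and the ‘country’ field used in the test is made up.

```python
from collections import Counter


def shard_key_profile(docs, field):
    """Return (distinct_values, top_share) for a candidate shard key field.

    distinct_values: how many different values the field takes in the sample.
    top_share: fraction of the sample holding the single most common value.
    Documents missing the field are counted under the value None.
    """
    values = Counter(doc.get(field) for doc in docs)
    total = sum(values.values())
    if total == 0:
        return 0, 0.0
    return len(values), values.most_common(1)[0][1] / total
```

A field like ‘username’ scores many distinct values and a small top share; a field where every document holds the same value scores (1, 1.0) and could never be split across shards.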

Second step

We shard the collection with sh.shardCollection("<database>.<collection>", { username: 1 }), using the ‘username’ field as the shard key.
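The equivalent admin command from Python is shardCollection with a key document. A sketch assuming pymongo and the hypothetical namespace mydb.users:

```python
def shard_collection(admin_db, namespace: str, key: dict) -> dict:
    """Run {shardCollection: namespace, key: key} on the admin database.

    admin_db is anything with a pymongo-style command() method, e.g. client.admin.
    """
    return admin_db.command("shardCollection", namespace, key=key)


def demo(uri: str = "mongodb://localhost:27017") -> dict:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    client = MongoClient(uri)  # hypothetical mongos address
    # {"username": 1} is an ascending (ranged) shard key on the username field.
    return shard_collection(client.admin, "mydb.users", {"username": 1})
```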

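When we shard an empty collection, MongoDB creates an index on the shard key; if the collection already contains documents, that index must exist before we shard it. We can list the collection's indexes with db.<collection>.getIndexes().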
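A sketch, assuming pymongo and the hypothetical mydb.users collection, that looks in Collection.index_information() for an index whose key pattern starts with the shard key (a compound index with the shard key as its prefix also qualifies) and creates one if it is missing, which is what a non-empty collection needs before being sharded:

```python
def has_shard_key_index(index_info: dict, shard_key: dict) -> bool:
    """True if some index's key pattern starts with the shard key fields.

    index_info is the dict returned by pymongo's Collection.index_information():
    each value holds a "key" list of (field, direction) pairs.
    """
    wanted = list(shard_key.items())
    return any(list(spec["key"])[: len(wanted)] == wanted for spec in index_info.values())


def demo(uri: str = "mongodb://localhost:27017") -> None:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    coll = MongoClient(uri).mydb.users  # hypothetical mongos and namespace
    if not has_shard_key_index(coll.index_information(), {"username": 1}):
        # A collection that already holds documents needs this before sharding.
        coll.create_index([("username", 1)])
```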

If the collection already contains data, it will be distributed among the shards.

Very easy, right?

Let’s see MongoDB in action. We are going to insert documents into our collection and check that MongoDB distributes them among the shards. First we stop the balancer; then we insert the documents, which must all end up on the shard the collection belongs to (the database's primary shard, shard 0 in our cluster). Finally we start the balancer again and check that MongoDB moves our documents evenly across all the shards.

We set the chunk size to 1 MB so that we do not need to insert many documents before chunks start to split and migrate.
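The chunk size is stored in the config database, in the settings document whose _id is "chunksize", with the value in megabytes (MongoDB accepts 1 to 1024). A minimal pymongo sketch, run through the mongos; the connection address is hypothetical:

```python
def set_chunk_size_mb(config_db, size_mb: int):
    """Store the cluster-wide chunk size, in megabytes, in config.settings."""
    if not 1 <= size_mb <= 1024:
        raise ValueError("chunk size must be between 1 and 1024 MB")
    # Upsert so the document is created if the cluster still uses the default.
    return config_db.settings.update_one(
        {"_id": "chunksize"}, {"$set": {"value": size_mb}}, upsert=True
    )


def demo(uri: str = "mongodb://localhost:27017") -> None:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    set_chunk_size_mb(MongoClient(uri).config, 1)  # hypothetical mongos address
```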

We stop the balancer with sh.stopBalancer().
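On MongoDB 3.4 or later the balancer can also be stopped with the balancerStop admin command and inspected with balancerStatus, whose mode field reads "off" once it is disabled. A pymongo sketch with a hypothetical mongos address:

```python
def stop_balancer(admin_db) -> dict:
    """Run {balancerStop: 1} on the admin database of a mongos."""
    return admin_db.command("balancerStop")


def balancer_is_off(status: dict) -> bool:
    """balancerStatus reports mode "off" while the balancer is disabled."""
    return status.get("mode") == "off"


def demo(uri: str = "mongodb://localhost:27017") -> bool:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    admin = MongoClient(uri).admin  # hypothetical mongos address
    stop_balancer(admin)
    return balancer_is_off(admin.command("balancerStatus"))
```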

We insert the test documents.
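A sketch of the insert step with pymongo, assuming the hypothetical mydb.users collection. The usernames are zero-padded so that their string order matches their numeric order (valid up to 999999 users), which keeps consecutive users in the same shard key range; the document count is an arbitrary choice to tune to your document size.

```python
def make_users(n: int) -> list:
    """Build n hypothetical user documents with distinct, zero-padded usernames."""
    return [{"username": f"user{i:06d}", "n": i} for i in range(n)]


def demo(uri: str = "mongodb://localhost:27017", n: int = 100_000) -> None:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    users = MongoClient(uri).mydb.users  # hypothetical mongos and namespace
    users.insert_many(make_users(n))  # tune n so the data spans several chunks
```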

We check that all the documents have been stored on the shard the collection belongs to, for example with db.<collection>.getShardDistribution() or sh.status().
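One way to verify this from Python is to count the collection's chunks per shard in config.chunks. Depending on the server version, chunk documents identify their collection either by ns or by the collection's uuid (read from config.collections), so the hypothetical filter below accepts either. Assumes pymongo and the hypothetical namespace mydb.users.

```python
from collections import Counter


def chunks_per_shard(chunk_docs) -> dict:
    """Count config.chunks documents per owning shard (the "shard" field)."""
    return dict(Counter(doc["shard"] for doc in chunk_docs))


def chunk_filter(config_db, namespace: str) -> dict:
    """Build a config.chunks filter matching the collection by ns or by uuid."""
    coll_meta = config_db.collections.find_one({"_id": namespace})
    if coll_meta is None:
        raise ValueError(f"{namespace} is not sharded")
    clauses = [{"ns": namespace}]
    if "uuid" in coll_meta:
        clauses.append({"uuid": coll_meta["uuid"]})
    return {"$or": clauses}


def demo(uri: str = "mongodb://localhost:27017") -> dict:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    config = MongoClient(uri).config  # hypothetical mongos address
    return chunks_per_shard(config.chunks.find(chunk_filter(config, "mydb.users")))
```

With the balancer stopped, we would expect every chunk to be reported under a single shard.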

We start the balancer again with sh.startBalancer(), and MongoDB automatically begins moving chunks of documents to the other shards.
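The pymongo counterpart uses the balancerStart admin command (MongoDB 3.4 or later), after which balancerStatus reports mode "full". The connection address is hypothetical.

```python
def start_balancer(admin_db) -> dict:
    """Run {balancerStart: 1} on the admin database of a mongos."""
    return admin_db.command("balancerStart")


def balancer_is_on(status: dict) -> bool:
    """balancerStatus reports mode "full" while the balancer is enabled."""
    return status.get("mode") == "full"


def demo(uri: str = "mongodb://localhost:27017") -> bool:
    from pymongo import MongoClient  # assumes the pymongo driver is installed

    admin = MongoClient(uri).admin  # hypothetical mongos address
    start_balancer(admin)
    return balancer_is_on(admin.command("balancerStatus"))
```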

Finally, we check that the documents have been spread across the shards as we expected.
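MongoDB balances by moving whole chunks, not individual documents. The toy model below is a deliberate simplification, not MongoDB's algorithm: it repeatedly moves one chunk from the most loaded shard to the least loaded one until their chunk counts differ by at most a threshold. The real balancer's migration thresholds and criteria depend on the server version, and the shard names here are made up.

```python
def toy_balance(chunks_by_shard: dict, threshold: int = 1):
    """Simplified balancer: move one chunk at a time from the most loaded shard
    to the least loaded one until their counts differ by at most `threshold`.

    Returns (final_counts, moves), where moves lists (donor, recipient) pairs.
    """
    if threshold < 1:
        # With threshold 0 a single extra chunk would bounce back and forth forever.
        raise ValueError("threshold must be at least 1")
    counts = dict(chunks_by_shard)
    moves = []
    while counts:
        donor = max(counts, key=counts.get)
        recipient = min(counts, key=counts.get)
        if counts[donor] - counts[recipient] <= threshold:
            break
        counts[donor] -= 1
        counts[recipient] += 1
        moves.append((donor, recipient))
    return counts, moves
```

For example, starting from eight chunks on shard0 and none elsewhere, it performs five migrations and ends with 3, 3 and 2 chunks.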

If you are wondering how to know which shard holds the data you need, don't worry: you only have to send the request to the mongos, and it will route it to the right shard and return the result for you.
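Under the hood, mongos keeps a cached copy of the chunk ranges stored on the config servers and sends each query that includes the shard key only to the shard owning that range; queries without the shard key are sent to all shards. A toy version of that range lookup for a single-field ranged key, with made-up bounds and shard names:

```python
from bisect import bisect_right


def route(chunk_lower_bounds, chunk_shards, key):
    """Return the shard owning `key` in a range-sharded collection.

    chunk_lower_bounds must be sorted; chunk i covers
    [chunk_lower_bounds[i], chunk_lower_bounds[i + 1]) and the last chunk is
    open-ended. The first bound stands in for MongoDB's MinKey.
    """
    i = bisect_right(chunk_lower_bounds, key) - 1
    if i < 0:
        raise ValueError("key is below the first chunk's lower bound")
    return chunk_shards[i]
```

With bounds ["", "user000400", "user000800"] and shards ["shard0", "shard1", "shard2"], the key "user000400" routes to shard1, because lower bounds are inclusive.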

That is the end of this post. I hope all the steps are clear and that they help you get the most out of your MongoDB sharded cluster.

Juan Roy

MongoDB Fan & Financial Software Developer


