How does MongoDB balance your data?


We have already studied the chunk concept (What is a MongoDB chunk?) and how MongoDB splits a chunk when it grows beyond the default maximum size (Four steps to split a MongoDB chunk). In this post we are going to study the steps that MongoDB follows to balance our cluster.

The first thing we must know is that we choose which data to balance across our shards; that is, we decide which collections to shard. You can read how to shard a collection in this post: Two steps to shard a MongoDB collection.

The balancing process runs in the background and does not disturb the normal work of the cluster. There is only one migration at a time per cluster, so as not to overload it. Therefore only two shards (their primaries) take part in a migration at any moment. If we do not change the default chunk size, 64MB, this is the maximum amount of data that MongoDB will migrate at a time. This size is big enough to avoid too many migrations and, at the same time, small enough not to overload our database.

Balancing round

Any mongos can begin a balancing round, but only one round can be active at a time. So the mongos must check this through the locks collection in the config database.

Only when the state value of the balancer lock equals 0 can the mongos begin the balancing round.
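The lock check described above can be sketched in a few lines of Python. This is only an illustration with an in-memory dictionary standing in for the locks collection; the function names, the `who` field and the value 2 for a taken lock are assumptions made for the example, not the real mongos code.

```python
UNLOCKED = 0  # the value that allows a balancing round to begin
LOCKED = 2    # assumed value for "taken", used only in this sketch


def try_start_round(locks: dict, mongos_id: str) -> bool:
    """Take the balancer lock only if its state is 0; return True on success."""
    lock = locks.setdefault("balancer", {"state": UNLOCKED, "who": None})
    if lock["state"] != UNLOCKED:
        return False  # another mongos is already running a round
    lock["state"] = LOCKED
    lock["who"] = mongos_id
    return True


def finish_round(locks: dict, mongos_id: str) -> None:
    """Release the lock, but only if this mongos is the one holding it."""
    lock = locks["balancer"]
    if lock["who"] == mongos_id:
        lock["state"] = UNLOCKED
        lock["who"] = None
```

In the real cluster the check and the update must happen as one atomic operation on the config servers; the dictionary here ignores that detail.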

Is the cluster balanced?

The mongos checks the number of chunks per shard and decides whether each sharded collection is balanced. In the post What is a MongoDB chunk? I explain how MongoDB determines this. When a collection is not balanced, the balancer moves the chunks necessary to balance it. When the migration has finished, the mongos updates the locks collection and the balancing round ends.
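As a rough sketch of this decision, the function below compares the shard with the most chunks against the shard with the fewest. The function name and the default threshold of 8 are assumptions chosen for the example; the real thresholds depend on the MongoDB version and on the total number of chunks.

```python
from typing import Optional, Tuple


def pick_migration(chunks_per_shard: dict, threshold: int = 8) -> Optional[Tuple[str, str]]:
    """Return (shard FROM, shard TO), or None when the collection is balanced."""
    most = max(chunks_per_shard, key=chunks_per_shard.get)
    least = min(chunks_per_shard, key=chunks_per_shard.get)
    # Balanced: the gap between the fullest and the emptiest shard is small enough.
    if chunks_per_shard[most] - chunks_per_shard[least] < threshold:
        return None
    return most, least
```

For example, `pick_migration({"s0": 20, "s1": 4, "s2": 10})` proposes moving a chunk from `s0` to `s1`, while `pick_migration({"s0": 10, "s1": 9})` returns `None`.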

Does the chunk need to be split?

OK, the collection is not well balanced. The mongos will choose a chunk to move to another shard. How MongoDB chooses this chunk is covered in the post What is a MongoDB chunk?.

Before moving the chunk, the mongos asks the shard that owns it (shard FROM) whether the chunk is too big and must be split.

The balancing begins

The mongos orders shard FROM to begin the transfer, but before beginning, this shard makes sure that it is not still removing data from a previously migrated chunk.

Please, read this chunk

The shard FROM asks shard TO (the destination shard of the chunk) to read the chunk that has been chosen for migration.

The transfer begins

The chunk belongs to shard FROM until the transfer has finished. Until then, there can be write operations on this chunk, and these must also be transferred to shard TO.
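The idea of copying the chunk and then carrying over the writes that arrived in the meantime can be illustrated with plain dictionaries. This is a simplified model, not the real migration protocol: `migrate_chunk` and the `("upsert" | "delete", key, value)` tuples are names invented for the example.

```python
def migrate_chunk(source: dict, writes_during_copy: list) -> dict:
    """Copy a chunk to shard TO, then replay the writes received while copying."""
    destination = dict(source)  # initial bulk copy of the documents
    for op, key, value in writes_during_copy:
        # Shard FROM still owns the chunk, so it applies the write first...
        if op == "upsert":
            source[key] = value
            destination[key] = value  # ...and the same change is sent to shard TO
        elif op == "delete":
            source.pop(key, None)
            destination.pop(key, None)
    return destination
```

After the replay both copies hold the same documents, which is the condition for handing the chunk over to shard TO.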

The transfer ends

The shard FROM updates the chunk metadata on the config servers to record the migration.

The transferred data is deleted

The shard FROM begins to delete the data that has been moved.

The cache is refreshed

The mongos that ran the migration refreshes its cache. The remaining mongos will still look for this data at shard FROM and will get a STALE CONFIG EXCEPTION. This causes them to read the metadata from the config servers and refresh their caches. The clients will not notice it.
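The refresh-on-error behaviour can be modelled with a small sketch. The classes `Shard`, `Router` and `StaleConfigError` are hypothetical names for this example; they only imitate the idea of a router with a cached chunk map that reloads it from the config servers when a shard rejects the request.

```python
class StaleConfigError(Exception):
    """Raised by a shard that no longer owns the requested chunk."""


class Shard:
    def __init__(self, name: str, chunks: set):
        self.name = name
        self.chunks = chunks

    def find(self, chunk_id: str) -> str:
        if chunk_id not in self.chunks:
            raise StaleConfigError(chunk_id)
        return f"{chunk_id}@{self.name}"


class Router:
    """Plays the role of a mongos holding a cached copy of the chunk map."""

    def __init__(self, config: dict, shards: dict):
        self.config = config       # authoritative map: chunk -> shard name
        self.shards = shards
        self.cache = dict(config)  # local copy that may become stale

    def find(self, chunk_id: str) -> str:
        try:
            return self.shards[self.cache[chunk_id]].find(chunk_id)
        except StaleConfigError:
            self.cache = dict(self.config)  # reload the metadata
            return self.shards[self.cache[chunk_id]].find(chunk_id)
```

The caller of `Router.find` never sees the exception, in the same way that the clients do not notice the refresh.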

I hope this post helps you to understand how MongoDB chooses the chunks to move between shards and the steps it follows to balance the cluster. If you find something wrong or missing, please do not hesitate to use the comments. We will learn from each other.

Juan Roy

MongoDB Fan & Financial Software Developer
