How to Get the MongoDB Chunk Size
Recently we found that traffic was not balanced across the shards of our MongoDB cluster. Investigation showed that the root cause was unevenly distributed data across the shards (chunk balancing != data balancing != traffic balancing). The data distribution looks like this:
Shard | Data Size |
---|---|
mongo-0 | 10.55 GB |
mongo-1 | 25.76 GB |
mongo-2 | 10.04 GB |
Why is the data size of mongo-1 significantly larger than the others when
the number of chunks on the three shards is almost the same? To answer that,
we need to analyze the chunk size distribution across the shards.
Getting the chunk size is not that straightforward. No API directly returns
the size of a chunk. Although the basic chunk information is stored in the
config.chunks collection, it has no data size field. Consider a collection
cxx whose shard key is Uuid. The information for one of its chunks (on one
shard) looks like this:
{
    "_id": {
        "$oid": "603348a8b74404eea898cc25"
    },
    "lastmod": {
        "$timestamp": {
            "t": 0,
            "i": 773
        }
    },
    "lastmodEpoch": {
        "$oid": "5ff83e4ba85a6bd465831542"
    },
    "ns": "cxx",
    "min": {
        "Uuid": "044B96CA34334FBE860D47BD63339B8E"
    },
    "max": {
        "Uuid": "0494E87F8ED34B0E9ECF6BE5363BBD3C"
    },
    "shard": "mongo-2",
    "history": [{
            "validAfter": {
                "$timestamp": {
                    "t": 6261,
                    "i": 1632930119
                }
            },
            "shard": "mongo-2"
        }
    ]
}
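The observation that the chunk count is almost the same on every shard can be checked from the same collection. The following is a minimal sketch, assuming the config.chunks documents carry the ns and shard fields shown above. The function name countChunksPerShard is made up for illustration, and the shell part only runs where db is defined.

```javascript
// Count chunks per shard from an array of config.chunks documents.
// Only the "shard" field of each document is used.
function countChunksPerShard(chunks) {
    var counts = {};
    chunks.forEach(function (chunk) {
        counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
    });
    return counts;
}

// In the mongo shell (where `db` is defined), feed it the real documents.
if (typeof db !== "undefined") {
    var ns = ""; // namespace of the sharded collection, as stored in config.chunks
    var chunkDocs = db.getSiblingDB("config").chunks.find({ ns: ns }).toArray();
    printjson(countChunksPerShard(chunkDocs));
}
```

If the counts are close while the data sizes differ, as in the table above, the chunks themselves must differ in size.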
The good news is that the chunk information contains the min/max shard key
values of the chunk, so we can get the chunk size with the dataSize
command:
{
    dataSize: <string>,
    keyPattern: <document>,
    min: <document>,
    max: <document>,
    estimate: <boolean>
}
The idea is to measure the total size of all the documents in the chunk's key range.
For the example above, run the following command in the mongo shell to
get the chunk size. Remember to set maxTimeMS to limit how long the
command may run:
db.runCommand({ dataSize: "cxx", keyPattern: { "Uuid": 1 }, min: { "Uuid": "044B96CA34334FBE860D47BD63339B8E" }, max: { "Uuid": "0494E87F8ED34B0E9ECF6BE5363BBD3C" }, maxTimeMS:1000 })
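Since the command differs between chunks only in its min and max bounds, it can be built from a chunk document. Here is a rough sketch, assuming a single-field ascending shard key as in the example. The helper name buildDataSizeCommand and its parameters are hypothetical, not part of MongoDB.

```javascript
// Build a dataSize command document for one chunk.
// ns: namespace of the collection; shardKeyField: name of the single shard key field;
// chunk: a config.chunks document; maxTimeMS: time limit in milliseconds.
function buildDataSizeCommand(ns, shardKeyField, chunk, maxTimeMS) {
    var keyPattern = {};
    keyPattern[shardKeyField] = 1;
    var min = {};
    min[shardKeyField] = chunk.min[shardKeyField];
    var max = {};
    max[shardKeyField] = chunk.max[shardKeyField];
    // dataSize stays the first key because it is the command name.
    return { dataSize: ns, keyPattern: keyPattern, min: min, max: max, maxTimeMS: maxTimeMS };
}

// The chunk bounds from the example above.
var exampleChunk = {
    min: { Uuid: "044B96CA34334FBE860D47BD63339B8E" },
    max: { Uuid: "0494E87F8ED34B0E9ECF6BE5363BBD3C" }
};
var cmd = buildDataSizeCommand("cxx", "Uuid", exampleChunk, 1000);
// In the mongo shell: db.runCommand(cmd)
```

The resulting document has the same fields as the command above, so it can be passed to db.runCommand unchanged.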
To analyze the chunk size distribution, we can iterate over every chunk and
get its size with the command above. Putting it together, the final script
looks like this (modify ns and the Uuid field according to your scenario):
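var ns = ""; // namespace of the sharded collection, as stored in config.chunks
db.getSiblingDB("config").chunks.find({ ns: ns }).forEach(function (chunk) {
    // Measure the documents whose shard key lies between chunk.min and chunk.max
    var chunkSize = db.runCommand({ dataSize: ns, keyPattern: { "Uuid": 1 }, min: { "Uuid": chunk.min.Uuid }, max: { "Uuid": chunk.max.Uuid }, maxTimeMS: 1000 });
    // size is reported in bytes; convert it to MB for printing
    print(chunk.shard + " " + chunk.min.Uuid + ": " + tojson(chunkSize.size / 1024 / 1024) + "MB");
});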
Sample output:
mongo-0 00020DFCE1B64BE28098133E09F0DFF9: 27.868586540222168MB
mongo-0 020FC0A3F7DC456D8CA48787A22BFEF5: 0.03055095672607422MB
mongo-1 02109DCC630C4EB68D890BB1C664C732: 2.049802780151367MB
mongo-1 02394113C1FF4C33BB067299D00F2A57: 0.17893218994140625MB
mongo-0 023CB90536CB4D9A83E88BC362CDA80A: 5.774362564086914MB
mongo-0 02AD2533642349398B6B7518914C87E7: 8.088775634765625MB
mongo-1 03463C80D8844550A3F1F259345807DA: 3.041494369506836MB
mongo-0 037F4D692AE34A3FB7F4B5A3205121C2: 1.5527820587158203MB
mongo-1 039C5810056F4205B1FC476A67F003BF: 6.535520553588867MB
mongo-2 04197554460047ADBD5B62B35F4956B6: 2.715306282043457MB
...
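To compare shards, the per-chunk sizes can be summed per shard and the largest chunk picked out. The following is a sketch under assumptions: the function name summarizeChunkSizes and the { shard, min, sizeBytes } record shape are invented for illustration, and the input values are hypothetical rather than real measurements.

```javascript
// Sum per-chunk sizes into per-shard totals and find the largest chunk.
// Each entry of `results` is { shard, min, sizeBytes } (an assumed shape).
function summarizeChunkSizes(results) {
    var totals = {};
    var largest = null;
    results.forEach(function (r) {
        totals[r.shard] = (totals[r.shard] || 0) + r.sizeBytes;
        if (largest === null || r.sizeBytes > largest.sizeBytes) {
            largest = r;
        }
    });
    return { totals: totals, largest: largest };
}

// Hypothetical input, not real measurements.
var summary = summarizeChunkSizes([
    { shard: "shard-a", min: "00", sizeBytes: 4096 },
    { shard: "shard-b", min: "40", sizeBytes: 1024 },
    { shard: "shard-a", min: "80", sizeBytes: 2048 }
]);
// summary.totals is { "shard-a": 6144, "shard-b": 1024 }; summary.largest.min is "00"
```

In the script above, the records could be collected by pushing chunk.shard, chunk.min.Uuid and chunkSize.size into an array instead of printing them.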