
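We recently noticed that traffic across our MongoDB sharded cluster was uneven, and after investigating we found that the root cause was uneven data distribution. Although evenly distributed data does not guarantee evenly distributed traffic, data should still be kept roughly balanced across shards. The data distribution across the three shards looked roughly like this: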

Shard     Data Size
mongo-0   10.55 GB
mongo-1   25.76 GB
mongo-2   10.04 GB

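The data size on mongo-1 is significantly larger than on the other shards, yet the three shards hold roughly the same number of chunks, so we need to analyze the chunk size distribution on each shard.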

Read more »

Recently we found that traffic was not balanced across the shards of our MongoDB cluster. After investigation, the root cause turned out to be that data is not evenly distributed across the shards (chunk balancing != data balancing != traffic balancing). The data distribution looks like this:

Shard     Data Size
mongo-0   10.55 GB
mongo-1   25.76 GB
mongo-2   10.04 GB

Why is the data size of mongo-1 significantly larger than the others when the number of chunks on the 3 shards is almost the same? To answer that, we need to analyze the chunk size distribution across these shards.
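As a rough illustration of what such an analysis could look like, here is a minimal Python sketch that groups per-chunk sizes by shard and reports the count, total, median and maximum. It assumes the per-chunk sizes have already been collected somehow (for example with MongoDB's `dataSize` command run over each chunk's key range); the function name `summarize_chunk_sizes` and the sample numbers are hypothetical, not taken from our cluster.

```python
from collections import defaultdict
from statistics import median


def summarize_chunk_sizes(chunks):
    """Summarize chunk sizes per shard.

    `chunks` is an iterable of (shard_name, size_in_bytes) pairs.
    Returns {shard: {"chunks", "total", "median", "max"}}.
    """
    by_shard = defaultdict(list)
    for shard, size_bytes in chunks:
        by_shard[shard].append(size_bytes)

    summary = {}
    for shard, sizes in sorted(by_shard.items()):
        summary[shard] = {
            "chunks": len(sizes),     # same chunk count can hide...
            "total": sum(sizes),      # ...very different total sizes
            "median": median(sizes),
            "max": max(sizes),
        }
    return summary


# Hypothetical sample: equal chunk counts, unequal chunk sizes.
sample = [
    ("mongo-0", 40), ("mongo-0", 60),
    ("mongo-1", 50), ("mongo-1", 200),
    ("mongo-2", 45), ("mongo-2", 55),
]

if __name__ == "__main__":
    for shard, stats in summarize_chunk_sizes(sample).items():
        print(shard, stats)
```

A shard whose maximum or median chunk size stands far above the others is a candidate for holding oversized chunks, which would explain equal chunk counts with unequal data sizes.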

Read more »

After adding new shards to our production MongoDB cluster (v4.4.6-ent with 5 shards, 3 replicas per shard), we found that the balancer was not working. sh.status() displayed many chunk migration errors:

...
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                7 : Failed with error 'aborted', from mongo-1 to mongo-3
                7208 : Failed with error 'aborted', from mongo-1 to mongo-4
  databases:
        {  "_id" : "X",  "primary" : "mongo-1",  "partitioned" : true,  "version" : {  "uuid" : UUID("xxx"),  "lastMod" : 1 } }
                X.A
                        shard key: { "Uuid" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongo-0       231
                                mongo-1       327
                                mongo-2       230
                                mongo-3       208
...
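To see at a glance which shard pairs are failing, the "Migration Results" lines can be totaled per source and destination. The following is only a sketch: it assumes the line format shown in the output above, and the helper name `failed_migrations` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   7208 : Failed with error 'aborted', from mongo-1 to mongo-4
MIGRATION_LINE = re.compile(
    r"^\s*(\d+)\s*:\s*Failed with error '([^']*)', from (\S+) to (\S+)\s*$"
)


def failed_migrations(status_text):
    """Return a Counter mapping (source, destination) to failed migration count."""
    counts = Counter()
    for line in status_text.splitlines():
        match = MIGRATION_LINE.match(line)
        if match:
            count, _error, source, destination = match.groups()
            counts[(source, destination)] += int(count)
    return counts
```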
Read more »

When a pod is in an error state (CrashLoopBackOff), Kubernetes keeps restarting it. If you try to exec into the pod to check the logs or debug, the following error message appears:

unable to upgrade connection: container not found ("")

This happens because the old container has already been killed, so you can no longer exec into it. So how can we keep the pod from restarting endlessly?

Read more »

Multiple Backup Daemons are typically run when the storage requirements or the load generated by the deployment is too much for a single daemon.

Directly scaling the StatefulSet ops-manager-backup-daemon to multiple instances (e.g. 3) doesn't work: because the mongodb-enterprise-operator is watching the StatefulSet, the MongoDB operator scales the instance count back down to 1 several minutes later.

So how do we scale up the Backup Daemons through the MongoDB Kubernetes operator?

Read more »

What is a change stream? From the official documentation:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

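Here we use change streams to do real-time source-to-target replication. We could not find an existing solution online, presumably because the obvious approach is to rely on a replica set rather than to replicate manually. But the business does have this kind of requirement, for example backing up data across regions between heterogeneous clusters.

The only existing tool we found was MongoShake. But MongoShake is not a commercial project after all, and when we pulled the code and ran it, we found that it did not work properly in our environment:

  • TLS verification had some problems, which we solved by modifying the source code
  • In the all sync mode, only the full replication via the oplog worked; incremental replication via change streams kept throwing errors and could not run properly

Considering that fixing this wheel might take more effort than building our own, we looked into how to do the replication ourselves. The simplest principle is to read the oplog from the source in real time and replay it on the target. That sounds simple, but implementing it may not be so easy, especially when the source is a sharded cluster: you cannot pull the oplog directly through mongos, and instead have to pull data from each shard manually, which is fairly hard to implement.

The good news is that MongoDB has had the change stream feature since v3.6, and since we use MongoDB Ops Manager to manage our sharded clusters, restoring from a snapshot is easy. So all the replication has to do is replay the real-time changes made after the snapshot's point in time.

It looks like we can build this wheel ourselves.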

Read more »

According to the official manual:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

Here we leverage change streams to replicate data from one MongoDB deployment to another in real time.

There are existing tools, such as MongoShake, that do the same thing. However, MongoShake is a little complicated to use. We encountered two issues:

  • We had to modify the source code to use TLS authentication
  • Incremental sync did not work in the all sync_mode

Since our goal is real-time replication, we chose a more straightforward and controllable approach: restore the cluster with MongoDB Ops Manager, then use a change stream to apply real-time changes.
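To make the "apply real-time changes" step concrete, here is a minimal sketch of replaying change events onto a target. It is not our actual implementation: the target is a plain in-memory dict keyed by `_id` standing in for a collection, the function name `apply_change` is hypothetical, and only top-level fields in `updatedFields` and `removedFields` are handled (dotted paths for nested fields are not). The event fields used (`operationType`, `documentKey`, `fullDocument`, `updateDescription`, and `_id` as the resume token) follow MongoDB's documented change event format. In a real setup the events would come from a driver's change stream, for example pymongo's `watch()`.

```python
def apply_change(target, event):
    """Apply one change event to `target` (a dict of _id -> document).

    Returns the event's resume token so the caller can checkpoint it
    and resume the stream from that point after a restart.
    """
    op = event["operationType"]

    if op in ("insert", "replace"):
        # Both carry the whole document, so an upsert-style overwrite works.
        doc = event["fullDocument"]
        target[doc["_id"]] = dict(doc)
    elif op == "update":
        key = event["documentKey"]["_id"]
        doc = target.setdefault(key, {"_id": key})
        desc = event["updateDescription"]
        # Assumption: top-level fields only; dotted paths are not expanded.
        doc.update(desc.get("updatedFields", {}))
        for field in desc.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        target.pop(event["documentKey"]["_id"], None)
    # Other event types (drop, rename, invalidate, ...) are ignored here.

    return event["_id"]
```

Writing inserts and replaces as overwrites and deletes as no-ops when the document is missing keeps the replay idempotent, which matters if the stream is resumed from a checkpoint slightly earlier than the last applied event.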

Read more »

MongoDB change stream is a nice feature. It allows applications to access real-time data changes without the complexity and risk of tailing the oplog.

Recently, when we used a change stream to replicate data from one sharded cluster to another, it immediately made the cluster unstable (several nodes went down and primary elections were triggered), and the latency of read/write operations increased significantly.

Read more »