Finisky Garden

NLP, Software Engineering, Product Design

Recently we found that traffic is not balanced across the shards of our MongoDB cluster. After investigation, the root cause is that data on each shard is not evenly distributed (chunk balancing != data balancing != traffic balancing). The data distribution looks like this:

Shard      Data Size
mongo-0    10.55 GB
mongo-1    25.76 GB
mongo-2    10.04 GB

Why is the data size of mongo-1 significantly larger than the others, while the chunk counts on the three shards are almost the same? To find out, we need to analyze the chunk size distribution across these shards.
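
To see where the bytes actually live, one way is to walk config.chunks and ask the cluster for each chunk's size via the dataSize command. Below is a minimal pymongo sketch under stated assumptions: the URI, the namespace X.A, and the shard key Uuid are placeholders, and the chunk metadata layout is the pre-5.0 one keyed by ns.

    from collections import defaultdict
    from pymongo import MongoClient

    # Connect through mongos; the URI and namespace are placeholders.
    client = MongoClient("mongodb://mongos:27017")

    per_shard = defaultdict(int)
    # Before MongoDB 5.0, config.chunks keys chunks by the "ns" field.
    for chunk in client["config"]["chunks"].find({"ns": "X.A"}):
        stats = client.admin.command(
            "dataSize",
            "X.A",
            keyPattern={"Uuid": 1},
            min=chunk["min"],
            max=chunk["max"],
        )
        per_shard[chunk["shard"]] += stats["size"]

    for shard, size in sorted(per_shard.items()):
        print(f"{shard}: {size / 1024**3:.2f} GB")

Note that dataSize scans each chunk's range, so this can take a while on a large collection.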

Read more »

After adding new shards to our production MongoDB cluster (v4.4.6-ent with 5 shards, 3 replicas for each shard), we found that the balancer was not working. sh.status() displayed many chunk migration errors:

...
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                7 : Failed with error 'aborted', from mongo-1 to mongo-3
                7208 : Failed with error 'aborted', from mongo-1 to mongo-4
  databases:
        {  "_id" : "X",  "primary" : "mongo-1",  "partitioned" : true,  "version" : {  "uuid" : UUID("xxx"),  "lastMod" : 1 } }
                X.A
                        shard key: { "Uuid" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongo-0       231
                                mongo-1       327
                                mongo-2       230
                                mongo-3       208
...
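
To dig into why the migrations abort, the config servers keep a per-migration record in config.changelog. A small pymongo sketch (the URI is a placeholder; verify the event names against your MongoDB version):

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos:27017")  # placeholder URI

    # Failed migrations leave "moveChunk.error" entries in config.changelog.
    recent = (
        client["config"]["changelog"]
        .find({"what": {"$regex": "^moveChunk"}})
        .sort("time", -1)
        .limit(20)
    )
    for entry in recent:
        print(entry["time"], entry["what"], entry.get("details", {}))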
Read more »

When a pod is in an error state (CrashLoopBackOff), Kubernetes keeps restarting it. If you try to exec into the pod to check logs or debug, the following error message appears:

unable to upgrade connection: container not found ("")

This is because the old container has already been killed, so you cannot exec into it anymore. So how can we prevent the pod from endlessly restarting?
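
One common trick (a sketch of a general workaround, not necessarily what the full post does) is to temporarily override the container's entrypoint so it idles instead of crashing, then exec in at leisure. With the Python Kubernetes client, patching the owning Deployment might look like this (my-app, app, and the namespace are placeholders, and the image must contain a sleep binary):

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Replace the crashing entrypoint with an idle command.
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "app", "command": ["sleep", "infinity"]}
                    ]
                }
            }
        }
    }
    apps.patch_namespaced_deployment(name="my-app", namespace="default", body=patch)

Once the replacement pod is up, you can exec into it and run the original binary by hand to see why it crashes.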

Read more »

Multiple Backup Daemons are typically run when the storage requirements or the load generated by the deployment is too much for a single daemon.

Directly scaling the statefulset ops-manager-backup-daemon to multiple instances (e.g. 3) doesn't work: because the mongodb-enterprise-operator is watching the statefulset, the replica count will be scaled back down to 1 by the operator several minutes later.

So how can we scale up the backup daemons through the MongoDB Kubernetes operator?

Read more »

Apex Compile Error

The environment (CUDA 10.0):

$ conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

The apex repo master HEAD:

commit 0c2c6eea6556b208d1a8711197efc94899e754e1 (HEAD -> master, origin/master, origin/HEAD)
Author: Nan Zheng <80790206+nanz-nv@users.noreply.github.com>
Date:   Sat Jul 17 08:53:59 2021 +0800
...

Read more »

What is a change stream? From the official documentation:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

Here we use change streams to do real-time primary-secondary replication. I couldn't find an existing solution online, presumably because the straightforward approach is to rely on a replica set rather than replicate manually. But there are real business needs for it, such as cross-region data backup between heterogeneous clusters.

The only existing wheel I found is MongoShake. But MongoShake is not a commercial product after all, and when I pulled the code and ran it, it did not work properly in our environment:

  • TLS validation had some issues, which we worked around by modifying the source code
  • In the all sync mode, only the full sync via the oplog worked; the incremental sync via change streams kept throwing errors and could not run properly

Considering that fixing the existing wheel might take more effort than building a new one, I looked into how to do primary-secondary replication myself. The simplest principle is to read the oplog from the source cluster in real time and replay it on the target. That sounds simple, but implementing it is not easy, especially when the source is a sharded cluster: you cannot pull the oplog through mongos, but have to pull it from each shard manually, which is considerably harder.

The good news is that MongoDB has supported change streams since v3.6, and since we use MongoDB Ops Manager to manage the sharded cluster, snapshot restores are easy. So all the replication needs to do is replay the real-time changes made after the snapshot's point in time.

It looks like we can build this wheel ourselves.
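
As a sketch of the core replay loop, assuming pymongo, placeholder connection strings, the namespace X.A, and a hypothetical snapshot timestamp, it could look like this:

    from bson.timestamp import Timestamp
    from pymongo import MongoClient

    source = MongoClient("mongodb://source-mongos:27017")  # placeholder
    target = MongoClient("mongodb://target-mongos:27017")  # placeholder

    # Start at the snapshot's cluster time so every later change is replayed.
    snapshot_time = Timestamp(1625097600, 1)  # hypothetical snapshot point

    with source["X"]["A"].watch(
        start_at_operation_time=snapshot_time,
        full_document="updateLookup",
    ) as stream:
        for change in stream:
            op = change["operationType"]
            if op in ("insert", "replace", "update"):
                # fullDocument is the post-image; it can be None if the
                # document was deleted before the lookup ran.
                doc = change.get("fullDocument")
                if doc is not None:
                    target["X"]["A"].replace_one(
                        change["documentKey"], doc, upsert=True
                    )
            elif op == "delete":
                target["X"]["A"].delete_one(change["documentKey"])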

Read more »

From the official manual:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

Here we leverage change streams to replicate data from one MongoDB cluster to another in real time.

There are existing tools such as MongoShake that do the same thing. However, MongoShake is a little complicated to use, and we encountered two issues:

  • We had to modify the source code to make TLS authentication work
  • Incremental sync could not be performed in the all sync_mode

Since our goal is real-time replication, we chose a more straightforward and controllable way: restore a snapshot via MongoDB Ops Manager, then use a change stream to apply real-time changes.
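
One practical detail worth sketching (the bookkeeping collection and names here are hypothetical) is persisting the change stream's resume token, so the replicator can restart without missing changes:

    from pymongo import MongoClient

    client = MongoClient("mongodb://source-mongos:27017")  # placeholder
    tokens = client["meta"]["resume_tokens"]  # hypothetical bookkeeping collection

    saved = tokens.find_one({"_id": "X.A"})
    with client["X"]["A"].watch(
        resume_after=saved["token"] if saved else None,
        full_document="updateLookup",
    ) as stream:
        for change in stream:
            # ... apply the change to the target cluster (as above) ...
            # Persist the resume token so a restart continues from here.
            tokens.replace_one(
                {"_id": "X.A"}, {"_id": "X.A", "token": change["_id"]}, upsert=True
            )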

Read more »

MongoDB change stream is a nice feature. It allows applications to access real-time data changes without the complexity and risk of tailing the oplog.

Recently, when we used a change stream to replicate data from one sharded cluster to another, it immediately made the cluster unstable (several nodes broke down and a primary election was triggered). Read/write latency then increased significantly.

Read more »

After upgrading Hexo to v8.5.0, I found that mathjax no longer rendered formulas correctly. Checking the documentation, the recommended hexo renderer is hexo-renderer-pandoc, while the one I was using, hexo-renderer-kramed, is no longer maintained and no longer recommended.

So I switched to hexo-renderer-pandoc. Formulas now render correctly, but there are new problems: embedded HTML is not recognized correctly, and blockquotes and lists are rendered without line breaks.

Read more »