Finisky Garden

NLP, Software Engineering, Product Design

Using different GitHub accounts for different repos on the same machine is a common need. For example, repo1 is hosted under GitHub account x1 while repo2 is under account x2; how can we conveniently git push each repo to its corresponding remote with the right account on the same machine? The straightforward approach is to set the user name with git config in each repo directory, but that has two problems:

  • Every repo has to be configured separately, which is tedious
  • In some cases it cannot be configured at all. For example, when deploying a Hexo site with hexo-deployer-git, the .deploy_git directory is generated dynamically, so the git account and remote URL it uses are hard to change.

Instead, we can use the SSH config file to associate different GitHub accounts with different repos. Simply define multiple host entries in the SSH config; then, when accessing GitHub, use a virtual host as an alias in place of the real host name github.com.

Read more »

How can we use different GitHub accounts for different repositories on the same machine? For instance, we have two GitHub accounts x1 and x2, where x1 is used for repo1 and x2 for repo2. At first glance, we can set git config in each repository folder via git config user.name xxx. However, this approach has two drawbacks:

  • We need to configure the user name/email in every repository
  • In some cases, the git user cannot be configured by git config at all. For example, with hexo-deployer-git the git repo is automatically generated by the deployer, so it's hard to set the user name manually.

Fortunately, we can leverage the SSH config to associate different GitHub accounts with different repos. Defining multiple host entries does the trick: since we log in to GitHub via SSH, we use a virtual host as an alias that represents the real host name github.com.
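
A minimal sketch of such a ~/.ssh/config (the host aliases and key file names here are made up; point them at your own keys):

Host github-x1
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_x1

Host github-x2
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_rsa_x2

Then point each repo's remote at the matching alias instead of github.com, for example in repo2:

$ git remote set-url origin git@github-x2:x2/repo2.git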

Read more »

Recently I noticed that traffic across the shards of a MongoDB sharded cluster was not well balanced. After some investigation, the root cause turned out to be that the data itself is unevenly distributed. Although balanced data does not imply balanced traffic, we should still try to keep the data roughly balanced across shards. The data distribution of the three shards looked like this:

Shard      Data Size
mongo-0    10.55 GB
mongo-1    25.76 GB
mongo-2    10.04 GB

The data size of the mongo-1 shard is significantly larger than the others, while the chunk counts of the three shards are almost the same, so we need to analyze the chunk size distribution on each shard.

Read more »

Recently we found that traffic was not balanced across the MongoDB cluster shards. After investigation, the root cause is that data on each shard is not evenly distributed (chunk balancing != data balancing != traffic balancing). The data distribution looks like this:

Shard      Data Size
mongo-0    10.55 GB
mongo-1    25.76 GB
mongo-2    10.04 GB

Why is the data size of mongo-1 significantly larger than the others while the chunk counts of the three shards are almost the same? To answer that, we need to analyze the chunk size distribution across these shards.
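
One way to get this distribution is to estimate the size of every chunk with the dataSize command and sum the results per shard. A rough sketch to run in the mongo shell against mongos (the namespace and shard key are placeholders; note that on MongoDB 5.0+ config.chunks references the collection UUID instead of ns):

var ns = "mydb.mycoll";                  // placeholder namespace
var keyPattern = { _id: 1 };             // placeholder shard key
var dbName = ns.split(".")[0];
var totals = {};

db.getSiblingDB("config").chunks.find({ ns: ns }).forEach(function (chunk) {
    var res = db.getSiblingDB(dbName).runCommand({
        dataSize: ns, keyPattern: keyPattern,
        min: chunk.min, max: chunk.max, estimate: true
    });
    totals[chunk.shard] = (totals[chunk.shard] || 0) + res.size;
});
printjson(totals);                       // approximate bytes of data per shard

Alternatively, db.mycoll.getShardDistribution() prints per-shard data size and chunk statistics directly.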

Read more »

After adding new shards to our production MongoDB cluster (v4.4.6-ent, 5 shards with 3 replicas each), we found that the balancer was not working. sh.status() showed many chunk migration errors:

...
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                7 : Failed with error 'aborted', from mongo-1 to mongo-3
                7208 : Failed with error 'aborted', from mongo-1 to mongo-4
  databases:
        {  "_id" : "X",  "primary" : "mongo-1",  "partitioned" : true,  "version" : {  "uuid" : UUID("xxx"),  "lastMod" : 1 } }
                X.A
                        shard key: { "Uuid" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                mongo-0       231
                                mongo-1       327
                                mongo-2       230
                                mongo-3       208
...
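
To dig into why the migrations keep aborting, one place to look besides the mongos and shard logs is the config.changelog collection, which records recent chunk migration events and their error details. A hedged sketch for the mongo shell (field names may vary by version):

db.getSiblingDB("config").changelog.find(
    { what: /moveChunk/, "details.errmsg": { $exists: true } },
    { time: 1, what: 1, ns: 1, "details.errmsg": 1 }
).sort({ time: -1 }).limit(5).pretty()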
Read more »

When a pod is in an error state (CrashLoopBackOff), Kubernetes keeps restarting it. If you try to exec into the pod to check the logs or debug it, the following error message appears:

unable to upgrade connection: container not found ("")

This is because the old pod has been killed, so you can no longer exec into it. So how can we prevent the pod from restarting endlessly?
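
One common workaround (not necessarily the one in the full post) is to temporarily override the container's command so it idles instead of crashing, then exec in and debug. A sketch for a pod managed by a hypothetical Deployment named myapp:

$ kubectl patch deployment myapp --type='json' -p='[
    {"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}
  ]'
$ kubectl exec -it <new-pod-name> -- /bin/sh

Remember to revert the patch once the debugging is done.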

Read more »

Multiple Backup Daemons are typically run when the storage requirements or the load generated by the deployment is too much for a single daemon.

Directly scaling the statefulset ops-manager-backup-daemon to multiple instances (e.g. 3) doesn't work: since the mongodb-enterprise-operator is watching the statefulset, the instance count will be scaled back down to 1 by the MongoDB operator several minutes later.

So how do we scale up the backup daemons through the MongoDB Kubernetes operator?
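
A hedged sketch of the idea: the backup daemon count belongs to the MongoDBOpsManager resource spec that the operator reconciles, so it should be changed there rather than on the statefulset (the resource name is hypothetical; verify the field against your operator version):

apiVersion: mongodb.com/v1
kind: MongoDBOpsManager
metadata:
  name: ops-manager            # hypothetical resource name
spec:
  ...
  backup:
    enabled: true
    members: 3                 # desired number of backup daemon instances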

Read more »

Apex Compile Error

The environment (CUDA 10.0):

$ conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

The apex repo master HEAD:

commit 0c2c6eea6556b208d1a8711197efc94899e754e1 (HEAD -> master, origin/master, origin/HEAD)
Author: Nan Zheng <80790206+nanz-nv@users.noreply.github.com>
Date:   Sat Jul 17 08:53:59 2021 +0800
...

Read more »

What is a change stream? From the official documentation:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

Here we use change streams to do real-time primary-secondary replication. I could not find an existing solution online, presumably because the straightforward way to achieve this is via a replica set rather than doing the replication by hand. But there are real business needs for it, such as backing up data across regions into a heterogeneous cluster.

The only existing wheel I found is MongoShake. MongoShake, however, is not a commercial product, and after pulling the code and running it, it did not work properly in our environment:

  • TLS authentication had some issues, which we solved by modifying the source code
  • In the all sync mode, only the full sync via the oplog completed; the incremental sync via change streams kept throwing errors and could not run properly

Considering that fixing someone else's wheel might be harder than building our own, I looked into how to do the replication myself. The simplest idea is to read the oplog from the source cluster in real time and replay it on the target. That sounds simple, but implementing it is not so easy, especially when the source is a sharded cluster: you cannot pull the oplog directly through mongos, and have to pull data from each shard manually, which makes the implementation considerably harder.

The good news is that MongoDB v3.6 introduced change streams, and since we manage the sharded cluster with MongoDB Ops Manager, a snapshot restore is easy. So all the replication has to do is replay the real-time changes made after the snapshot point.

It looks like we can build this wheel ourselves.

Read more »

From the official manual:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

Here we leverage change streams to replicate data from one MongoDB cluster to another in real time.

There are existing tools such as MongoShake that do the same thing. However, MongoShake is a bit complicated to use, and we encountered two issues:

  • We had to modify the source code to get TLS authentication working
  • Incremental sync did not work in the all sync_mode

Since our goal is real-time replication, we chose a more straightforward and controllable approach: restore the cluster with MongoDB Ops Manager, then apply the real-time changes with a change stream.
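
A minimal sketch of the replay loop in the mongo shell (database, collection, and connection string are placeholders; a real implementation also needs resume-token checkpointing and error handling):

// connect to the target cluster and open a change stream on the source collection
var target = new Mongo("mongodb://target-host:27017").getDB("mydb");
var source = db.getSiblingDB("mydb");

var cursor = source.mycoll.watch([], {
    // in practice, pass startAtOperationTime (the cluster time of the snapshot)
    // or resumeAfter (a saved resume token) to pick up where the restore ended
    fullDocument: "updateLookup"
});

while (cursor.hasNext()) {
    var event = cursor.next();
    switch (event.operationType) {
        case "insert":
        case "replace":
        case "update":
            // fullDocument may be missing if the document was deleted right afterwards
            if (event.fullDocument) {
                target.mycoll.replaceOne({ _id: event.documentKey._id },
                                         event.fullDocument, { upsert: true });
            }
            break;
        case "delete":
            target.mycoll.deleteOne({ _id: event.documentKey._id });
            break;
    }
}

Writing the looked-up full document instead of replaying the raw update description keeps the sketch simple and idempotent, at the cost of an extra lookup per update on the source.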

Read more »