Finisky Garden

NLP, Software Engineering, Product Design


Apex Compile Error

The environment (cuda10.0):

$ conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch

The apex repo master HEAD:

commit 0c2c6eea6556b208d1a8711197efc94899e754e1 (HEAD -> master, origin/master, origin/HEAD)
Author: Nan Zheng <80790206+nanz-nv@users.noreply.github.com>
Date:   Sat Jul 17 08:53:59 2021 +0800
...

Install apex:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The compilation fails with an error like this:

    csrc/mlp.cpp:127:54: error: expected primary-expression before ‘>’ token
           w_ptr.push_back(inputs[i + 1].data_ptr<scalar_t>());
                                                          ^

Solution

Similar discussions: https://github.com/NVIDIA/apex/issues/802 https://github.com/NVIDIA/apex/issues/1139

From the official manual:

Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a single collection, a database, or an entire deployment, and immediately react to them. Because change streams use the aggregation framework, applications can also filter for specific changes or transform the notifications at will.

MongoDB change stream is a nice feature. It allows applications to access real-time data changes without the complexity and risk of tailing the oplog.

Recently, when we used a change stream to replicate data from one sharded cluster to another, the cluster immediately became unstable (several nodes broke down and a primary election was triggered). Read/write latency then increased significantly.
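For reference, subscribing to a change stream takes only a few lines. A minimal pymongo sketch, with hypothetical database and collection names (this is not the replication tool we used, just an illustration of the API):

```python
from typing import Iterable


def build_pipeline(op_types: Iterable[str]) -> list:
    """Aggregation pipeline keeping only the given change event types."""
    return [{"$match": {"operationType": {"$in": list(op_types)}}}]


def tail_changes(uri: str = "mongodb://localhost:27017") -> None:
    # Imported here so the pipeline helper stays usable without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    coll = client["mydb"]["orders"]  # hypothetical db/collection names
    # Change streams require a replica set or sharded cluster (oplog).
    with coll.watch(build_pipeline(["insert", "update"])) as stream:
        for change in stream:
            print(change["operationType"], change.get("documentKey"))


if __name__ == "__main__":
    tail_changes()
```

Because change streams go through the aggregation framework, the `$match` stage above is also where you would filter or transform events before they reach your application.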

Observations

Observations on our production environment:

Recently I found that Google auto ads significantly slow down page loading. There are also many discussions about this. For a static website, fast loading speed is crucial. In this post, we will optimize the PageSpeed Insights score by delay-loading auto ads.

First, let’s check the current PSI score on mobile (screenshot: PSI Mobile).

It seems that the Reduce unused JavaScript section has many items to improve. Check the official Google auto ads script:

Blogroll is natively supported in the NexT theme: all links are shown in the sidebar. However, as your links increase, the sidebar grows longer as well, which makes the page lengthy and distracting. Therefore, we consider creating a dedicated blogroll page.

After searching, I found that most existing approaches need to modify the NexT source code (theme swig template files). The implementation is a bit complicated and breaks the theme’s integrity: when you update the theme later, you will need to manually merge or rebase your changes onto it.

Recently I wanted to simplify the permanent link of each post. From:

/2021/03/21/migrateopsmanager.en/

To:

/migrateopsmanager.en/

A shorter URL is more concise and readable, as the date string means nothing to users. But what about URL backward compatibility?

Change Permanent Link Format Issue

According to the official document, changing the permalink format is pretty easy: just modify :year/:month/:day/:title/ to :title/. However, the real problem is that all existing incoming links become invalid after this modification. The ranking of our site would then be affected, which is unacceptable.
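One way to keep the old links alive is to leave redirect stubs at the old date-based paths (Hexo plugins such as hexo-generator-alias take a similar approach). A rough Python sketch; the post list and output directory are made up for illustration:

```python
import os

# Minimal HTML that both redirects browsers and tells crawlers which
# URL is canonical, so ranking signals transfer to the new path.
REDIRECT_TEMPLATE = """<!DOCTYPE html>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url={new_url}">
<link rel="canonical" href="{new_url}">
"""


def old_to_new(old_path: str) -> str:
    """Map /2021/03/21/migrateopsmanager.en/ -> /migrateopsmanager.en/."""
    parts = [p for p in old_path.split("/") if p]
    return "/" + parts[-1] + "/"


def write_redirects(old_paths, out_dir="public"):
    """Write an index.html redirect stub under each old path."""
    for old in old_paths:
        target = os.path.join(out_dir, old.strip("/"), "index.html")
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(REDIRECT_TEMPLATE.format(new_url=old_to_new(old)))
```

A real site would read the old paths from the sitemap or the posts’ front matter rather than a hard-coded list.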

We have a costly SQL Server database with bad performance. Specifically, some stored procedures (joining several tables on primary keys, each table with ~10M rows) took several minutes to execute. The execution plan showed that index seeks cost 90% of the total time. We finally found the root cause: the indexes had a very high degree of fragmentation. Since the DBA had changed many times, we needed to analyze the database schemas, table disk usage, and stored procedure table dependencies. Based on these results, we cleaned up tables and stored procedures and rebuilt the indexes to improve the DB performance. Here are the queries to accomplish these tasks.

Recently we wanted to deploy MongoDB Ops Manager and the MongoDB deployments in different data centers to improve disaster recovery. If they are deployed in the same data center and it fails, you cannot restore the backup data to a new cluster, as both Ops Manager and the deployments are unavailable.

Of course, we don’t want to re-deploy the existing MongoDB deployments in Kubernetes. But how do we make the deployments send data to the new Ops Manager URL?

Using MongoDB in .NET is easy. However, there are two ways to manipulate documents in C# code: as raw BSON documents or as strongly-typed documents. In this article, we will compare the two by example. Basically, a strongly-typed collection is preferred unless you have a strong reason to use weakly-typed documents (different types in the same collection?).

BsonDocument CRUD

The MongoDB C# Driver official documentation provides examples in this style. I guess the reason is that MongoDB is schemaless and the driver would like to demonstrate how to access documents without a schema. Actually, NoSQL doesn’t mean no SQL but stands for not only SQL. Creating a schema for a collection is still recommended because it makes it easier to access documents and use indexes.
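The post’s examples are in C#; as a language-neutral illustration of the same trade-off, here is a rough Python analogue (the Book type and its fields are invented for this sketch):

```python
from dataclasses import dataclass, asdict

# "Weakly typed": the document is just a dict. A typo in a field name
# silently creates a new field instead of failing.
raw = {"title": "NoSQL Distilled", "pages": 192}
raw["pagse"] = 200  # oops: nothing catches this


# "Strongly typed": a schema class catches such mistakes early and
# keeps documents uniform, which also makes indexes easier to use.
@dataclass
class Book:
    title: str
    pages: int


def to_doc(book: Book) -> dict:
    """Serialize a Book into a plain document for storage."""
    return asdict(book)


def from_doc(doc: dict) -> Book:
    """Deserialize a stored document back into a Book."""
    return Book(title=doc["title"], pages=doc["pages"])
```

The C# driver’s mapped POCO classes play the same role as the dataclass here: the document shape lives in one place instead of being scattered across string keys.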

MongoDB transactions are a nice feature. Although MongoDB uses optimistic concurrency control, write conflicts are unavoidable. The situation becomes worse in multi-document transactions, which modify many documents in one transaction. If a write conflict happens, a MongoCommandException will be thrown:

Exception: Command update failed: Encountered error from mongodb.svc.cluster.local:27017 during a transaction :: caused by :: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction..
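As the message itself suggests, the usual mitigation is to retry. MongoDB marks such failures with the TransientTransactionError label, and driver helpers (e.g. pymongo’s with_transaction) already retry them for you; the loop below is only a simplified sketch of that shape, with a stand-in exception class:

```python
import time


class TransientTransactionError(Exception):
    """Stand-in for a driver exception carrying the transient label."""


def run_with_retry(txn_fn, max_attempts=5, backoff_s=0.01):
    """Run txn_fn, retrying on transient conflicts with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except TransientTransactionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(backoff_s * attempt)
```

The transaction body must be safe to re-run from the start, since everything it did before the conflict is rolled back.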

Today I tried to import an existing MongoDB deployment (outside the Kubernetes cluster) into a MongoDB Ops Manager running in Kubernetes. After installing the MongoDB Agent on the deployment, only the automation functionality worked, while monitoring and backup did not. The root cause is that the agent still tries to post data to Ops Manager’s internal endpoint.

The latest MongoDB Agent is a single binary that contains all three functions: Automation, Monitoring, and Backup. So theoretically, installing and configuring the all-in-one agent on the deployment VM is enough.

I have a sharded cluster (2 shards with 3 mongods each, 3 config servers, and 2 mongoses) which is deployed by MongoDB Ops Manager.

Last week, one of the shard hosts’ status was shown as a grey diamond (hover text: “Last Ping: Never”). Besides, on the Ops Manager servers page, one server had two processes (e.g. sharddb-0 and sharddb-config). However, the cluster still worked well, and we could see the host sharddb-0-0 (shard 0, replica 0) in the mongo shell via sh.status() and rs.status(). What’s wrong with the cluster?

When I execute MongoDB transactions in parallel, I encounter many MongoCommandException errors (code 251, codename NoSuchTransaction):

Command find failed: cannot continue txnId 4 for session 38604515-2584-45a5-a17a-5eb5d34ea6c4 - = with txnId 5. Command find failed: cannot continue txnId 4 for session 38604515-2584-45a5-a17a-5eb5d34ea6c4 - = with txnId 6. Command insert failed: cannot continue txnId 31 for session 3ed7ea61-eae1-440f-8d95-b6e066b35b69 - = with txnId 34.

Problem Analysis

I performed some tests to pinpoint the issue:
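One plausible reading of the messages above is that several parallel transactions shared a single server session, so their txnIds interleaved. The toy session below models that interleaving; it is an illustration of the failure mode, not the driver’s actual Session class:

```python
class Session:
    """Minimal stand-in for a MongoDB server session."""

    def __init__(self):
        self.txn_id = 0
        self.active = None  # txnId currently in progress, if any

    def start_transaction(self):
        self.txn_id += 1
        if self.active is not None:
            # Mirrors the server complaint quoted above.
            raise RuntimeError(
                f"cannot continue txnId {self.active} "
                f"with txnId {self.txn_id}")
        self.active = self.txn_id

    def commit_transaction(self):
        self.active = None


# Sharing one Session across parallel workers triggers the error;
# giving each worker its own session (one start_session() per thread)
# keeps the txnId sequences independent.
```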

Lightsquid is a handy log analyzer for the squid proxy. However, the project has not been maintained since 2009.

Today, I found that lightsquid doesn’t work in 2021. Why?

Lightsquid Error

When I run lightparser, the output looks like this:

$ ./lightparser.pl access.log.66666
>>> use file :: /var/log/squid/access.log.66666
run TIME: 1 sec
LightSquid parser statistic report

                 3312 lines processed (average 3312.00 lines per second)
                    0 lines parsed
                    0 lines recovered
                    0 lines notrecovered
                    0 lines skiped by bad year
                 3312 lines skiped by date filter
                    0 lines skiped by Denied filter
                    0 lines skiped by skipURL filter

WARNING !!!!, parsed 0 lines from total : 3312
please check confiuration !!!!
may be wrong log format selected ?

It seems that all lines are filtered out by the date filter. I double-checked the log format and made sure it is correct and the same as on previous days.
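Before digging into lightparser’s Perl, a quick way to check whether the date filter should match is to parse squid’s native timestamps independently: the first field of each access.log line is epoch seconds with a millisecond fraction. A small Python sketch (the sample line in the test is illustrative):

```python
from datetime import datetime, timezone


def log_line_date(line: str) -> str:
    """Return the YYYYMMDD date of one squid access.log line (UTC).

    Squid's native format starts each line with epoch seconds, e.g.
    "1609459200.000    123 10.0.0.1 TCP_MISS/200 ...".
    """
    epoch = float(line.split()[0])
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y%m%d")
```

If the dates printed this way look correct but lightparser still skips every line “by date filter”, the bug is in the parser’s date handling rather than in the log.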

The sharded cluster is the most complicated MongoDB architecture, and deploying one in Kubernetes is relatively hard. We will go through the deployment process with MongoDB Ops Manager in this post.

Before starting, please go through Create a UserDB ReplicaSet first.

A MongoDB sharded cluster consists of the following components:

  • shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.
  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster.
  • config servers: Config servers store metadata and configuration settings for the cluster.

In this post, we are going to create a sharded cluster with 2 shards (each a 3-instance replica set), 2 mongoses, and 3 config servers.

This is part 5; we will use the generated certificates to enable user database TLS and auth.

MongoDB Ops Manager Series:

  1. Install MongoDB Ops Manager
  2. Create a UserDB ReplicaSet
  3. Expose UserDB to Public
  4. Openssl Generates Self-signed Certificates
  5. Enable UserDB TLS and Auth

Understanding Different Secure Connections

Before we start, look at the following three kinds of TLS connections:

This is part 4; we will create a self-signed CA certificate and three server certificates.

MongoDB Ops Manager Series:

  1. Install MongoDB Ops Manager
  2. Create a UserDB ReplicaSet
  3. Expose UserDB to Public
  4. Openssl Generates Self-signed Certificates
  5. Enable UserDB TLS and Auth

Self-signed certificates are not recommended for production because they cannot prevent man-in-the-middle attacks. But since our main purpose is to encrypt the communication rather than to authenticate, self-signed certificates are acceptable here.

This is part 3; we will expose the user database pods to the public so that a Mongo client is able to access them.

MongoDB Ops Manager Series:

  1. Install MongoDB Ops Manager
  2. Create a UserDB ReplicaSet
  3. Expose UserDB to Public
  4. Openssl Generates Self-signed Certificates
  5. Enable UserDB TLS and Auth

So far, the user database can be accessed only inside the Kubernetes cluster. The official blog’s approach is to expose the pods by NodePort: Connect to a MongoDB Database Resource from Outside Kubernetes

This is part 2; we will create a user database that is a 3-instance ReplicaSet.

MongoDB Ops Manager Series:

  1. Install MongoDB Ops Manager
  2. Create a UserDB ReplicaSet
  3. Expose UserDB to Public
  4. Openssl Generates Self-signed Certificates
  5. Enable UserDB TLS and Auth

The so-called Application Database is the backend DB of Ops Manager; it cannot be used to store user data. The user database is called a MongoDB Deployment. Note that this deployment is different from a Kubernetes Deployment.

It’s pretty easy to configure a standalone MongoDB instance (almost zero configuration). However, if you want to run a production-level MongoDB cluster, the configuration process is non-trivial: replication, sharding, dynamic scaling, backup, transport encryption, and monitoring are all required. Is there a nice tool to help us?

A MongoDB cluster is a distributed system, which is well suited to running in Kubernetes. However, coordinating MongoDB instances usually requires manually running commands on each instance, independently of Kubernetes. Therefore, the MongoDB Enterprise Kubernetes Operator was developed to bridge the gap. Moreover, MongoDB Ops Manager is a great web portal to help with these automation tasks.