Finisky Garden

NLP, Software Engineering, Product Design


When you set up TLS/SSL for MongoDB (see Configure mongod and mongos for TLS/SSL), you might encounter the following errors:

{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"E",  "c":"NETWORK",  "id":23248,   "ctx":"main","msg":"Cannot read certificate file","attr":{"keyFile":"/etc/ssl/testserver1.pem","error":"error:0200100D:system library:fopen:Permission denied"}}
{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"F",  "c":"CONTROL",  "id":20574,   "ctx":"main","msg":"Error during global initialization","attr":{"error":{"code":140,"codeName":"InvalidSSLConfiguration","errmsg":"Can not set up PEM key file."}}}

or

{"t":{"$date":"2020-11-30T08:01:14.545+00:00"},"s":"I",  "c":"ACCESS",   "id":20254,   "ctx":"main","msg":"Read security file failed","attr":{"error":{"code":30,"codeName":"InvalidPath","errmsg":"permissions on / are too open"}}}

So what are the right ownership and permissions for the certificate PEM file? The answer: the PEM file should be readable, but not writable, by the mongodb user.
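On a typical Debian/Ubuntu install, the fix looks something like this (the file path is taken from the error message above; adjust to your own certificate location):

```shell
# Give the mongodb user ownership of the PEM file,
# then make it read-only for the owner (mode 400)
sudo chown mongodb:mongodb /etc/ssl/testserver1.pem
sudo chmod 400 /etc/ssl/testserver1.pem
```

Mode 400 satisfies both error cases: mongod can now open the file, and the permissions are no longer "too open".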

Monitoring MySQL server metrics is crucial for a DBA. Typically, we can simply monitor the recent server status summary through mysqlbench. But what do these metrics mean? Some of them are self-explanatory, such as connections and traffic, while others are not. For example, what’s the difference between Selects per second and Innodb reads per second? How do we measure write performance?
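The dashboard numbers are derived from the server’s raw status counters, which you can inspect directly. As a sketch (these are the standard MySQL status variable names; roughly, Selects per second comes from the Com_select counter while InnoDB reads come from row-level counters such as Innodb_rows_read):

```shell
# Inspect the raw counters behind the dashboard metrics
mysql -e "SHOW GLOBAL STATUS LIKE 'Com_select'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_rows_read'"
# Write-side activity
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_rows_inserted'"
```

Per-second rates are computed by sampling these cumulative counters twice and dividing the delta by the interval.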

The following figure illustrates the server status: [mysqlbench server status]

Master-slave replication is widely used in production. Monitoring the replication lag is a common and critical task. Typically, we are able to get the real-time difference between the master and the slave by periodically checking the Seconds_Behind_Master variable.

According to the documentation:

Seconds_Behind_Master: this field shows an approximation for difference between the current timestamp on the slave against the timestamp on the master for the event currently being processed on the slave.
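A minimal way to pull just that field out of the SHOW SLAVE STATUS output (shown here on a captured sample line, since the real command needs a live slave; on a replica you would pipe from `mysql -e 'SHOW SLAVE STATUS\G'`):

```shell
# Extract Seconds_Behind_Master from the \G-formatted status output.
# The echo stands in for:  mysql -e 'SHOW SLAVE STATUS\G'
echo '        Seconds_Behind_Master: 7' |
  awk -F': ' '/Seconds_Behind_Master/ {print $2}'
# -> 7
```

Running this periodically (e.g. from cron) and alerting on large values is the usual lightweight lag monitor.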

We developed a REST API a long time ago and recently found that the released client has a flaw: the HttpRequestMessage is missing the content-type application/json. In earlier versions we manually deserialized the request JSON by reading the request body, but now we leverage the ASP.NET Core framework to automatically bind the request structure from the API parameter. However, the legacy client no longer works: an HTTP 415 “Unsupported Media Type” error occurs.

Therefore, for backward compatibility, we need to make the server treat all incoming requests as application/json even if they arrive as plain text (the default).

Someone may argue that there is no reason to disable IPv6 on Linux. For me, the reason is that on some IPv6-enabled websites I am frequently classified as a bot who needs to solve captchas, which is really annoying :-( . Considering that the IPv6 address never changes (it is assigned by the service provider), disabling IPv6 is the simplest solution.

Let’s get to the solution. Just use your favorite editor to add one line to /etc/sysctl.conf:
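The line in question is the standard kernel knob (some setups also set the `default` and `lo` variants, but `all` is the essential one):

```shell
# /etc/sysctl.conf -- disable IPv6 on all interfaces
net.ipv6.conf.all.disable_ipv6 = 1
```

Apply it without a reboot via `sudo sysctl -p`, then confirm with `ip -6 addr` (it should show no global IPv6 addresses).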

When we use Entity Framework to manipulate a SQL database, we use Where to query and Include to load related entities (a join operation).

Sample database table schema:

  • Employee: Id, Name, Age
  • Salary: Id, BasePay

Typical query scenarios:

  • Query employee 3’s name => Employee.Where(x => x.Id == 3)
  • Query employee Jack’s age => Employee.Where(x => x.Name == "Jack")
  • Query employee Jack’s basepay => Employee.Where(x => x.Name == "Jack").Include(x => x.Salary) …

To keep the code clean and focused, the following examples omit the dbContext creation logic; we assume db = DbContext(), which contains the Employee table.

I installed an old library, lightsquid, on Ubuntu 20.04. When visiting the CGI page, an internal server error pops up.

Debug the CGI script by running it directly in /var/www/lightsquid:

/var/www/lightsquid$ perl index.cgi
Can't locate CGI.pm in @INC (you may need to install the CGI module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at index.cgi line 19.
BEGIN failed--compilation aborted at index.cgi line 19.

It seems that some modules are missing. After several searches: https://packages.ubuntu.com/search?suite=trusty&arch=any&mode=filename&searchon=contents&keywords=cgi.pm
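CGI.pm was removed from the Perl core distribution in 5.22, so on Ubuntu 20.04 it has to be installed as a separate package (the package name below is what the Ubuntu package search turns up for CGI.pm):

```shell
# Install the CGI module that index.cgi needs
sudo apt-get install -y libcgi-pm-perl

# verify that the module now loads
perl -MCGI -e 'print "CGI loaded\n"'
```

After that, re-running `perl index.cgi` should get past the `Can't locate CGI.pm in @INC` failure.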

Viewing the SQL query generated by Entity Framework is important. The translated SQL query may not be what you expected; sometimes it leads to significant performance issues.

Printing the generated query for an IQueryable<T> in EF Core is different from EF. I finally found a working solution, which is listed below.

The following code is tested with MySql.Data.EntityFrameworkCore 8.0.16. Just put the following class into your project:

using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.Internal;
using Microsoft.EntityFrameworkCore.Storage;

public static class QueryableExtensions
{
    private static readonly TypeInfo QueryCompilerTypeInfo = typeof(QueryCompiler).GetTypeInfo();
    private static readonly FieldInfo QueryCompilerField = typeof(EntityQueryProvider).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryCompiler");
    private static readonly FieldInfo QueryModelGeneratorField = typeof(QueryCompiler).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryModelGenerator");
    private static readonly FieldInfo DataBaseField = QueryCompilerTypeInfo.DeclaredFields.Single(x => x.Name == "_database");
    private static readonly PropertyInfo DatabaseDependenciesField = typeof(Database).GetTypeInfo().DeclaredProperties.Single(x => x.Name == "Dependencies");

    public static string ToSql<TEntity>(this IQueryable<TEntity> query)
    {
        var queryCompiler = (QueryCompiler)QueryCompilerField.GetValue(query.Provider);
        var queryModelGenerator = (QueryModelGenerator)QueryModelGeneratorField.GetValue(queryCompiler);
        var queryModel = queryModelGenerator.ParseQuery(query.Expression);
        var database = DataBaseField.GetValue(queryCompiler);
        var databaseDependencies = (DatabaseDependencies)DatabaseDependenciesField.GetValue(database);
        var queryCompilationContext = databaseDependencies.QueryCompilationContextFactory.Create(false);
        var modelVisitor = (RelationalQueryModelVisitor)queryCompilationContext.CreateQueryModelVisitor();
        modelVisitor.CreateQueryExecutor<TEntity>(queryModel);
        var sql = modelVisitor.Queries.First().ToString();

        return sql;
    }
}

The usage is straightforward: just append .ToSql() to your IQueryable<T>:

Today I found that some pods in the Kubernetes cluster have failed; the status is Waiting: ContainerCreating. The pod events:

MountVolume.SetUp failed for volume "xxxxx" : secret "xxxxx" not found
kubelet aks-agentpool-xxx-vmss000001

Unable to attach or mount volumes: unmounted volumes=[xxxxx], unattached volumes=[xxxxx]: timed out waiting for the condition

I remembered that about one week ago I deleted some secrets in this cluster. Therefore, the problem becomes: how do we recover the deleted secret “xxxxx”?
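If the original values are still available somewhere (a password manager, a deployment pipeline, another cluster), the simplest recovery is to recreate the secret under the exact name the volume mount expects. A sketch, with placeholder name, namespace, and keys:

```shell
# Recreate the secret the pod's volume refers to;
# the secret name must match the one in the pod spec
kubectl create secret generic xxxxx \
  --namespace=<your-namespace> \
  --from-literal=<key>=<value>
```

Once the secret exists again, the kubelet’s mount retry succeeds and the stuck pods move past ContainerCreating on their own.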

Recently, I found a huge number of files named .com.google.Chrome.* being created in my /tmp folder. Obviously the culprit is Chrome. However, after some research, I found no solution to prevent Chrome from creating this garbage.

/tmp$ du -csh .com.google.Chrome.*

8.0K    .com.google.Chrome.00OKwD
104K    .com.google.Chrome.013jYf
172K    .com.google.Chrome.015x5t
...
48K     .com.google.Chrome.Zytrhf
16K     .com.google.Chrome.zz233G
36K     .com.google.Chrome.ZzrsZY
163M    total
/tmp$  find /tmp -name ".com.google.Chrome*" -ls| wc -l
3468
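Absent a way to stop Chrome from writing these files, a pragmatic workaround (a sketch; the one-day threshold is my own choice) is to periodically delete the stale ones, e.g. from cron:

```shell
# Remove day-old Chrome scratch files from /tmp
# (safe while Chrome is closed; files still in use may be recreated)
find /tmp -maxdepth 1 -name ".com.google.Chrome.*" -mtime +1 -exec rm -rf {} +
```

`-maxdepth 1` keeps the sweep from descending into unrelated directories, and `-mtime +1` spares anything Chrome touched in the last 24 hours.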

Update 2020/12/12: I found there are lots of .com.google.Chrome.* files in /tmp/snap.chromium/tmp :-( . Look at the ncdu results:

Yesterday, when I logged in to my VM, I found that the disk was full. Even tab command completion could not be done. I immediately deleted some unused files and the system worked. However, when I rebooted the system, it got stuck at A start job is running for Create Volatile Files and Directories...:

...
[  OK  ] Started File System Check on /dev/disk/cloud/azure_resource-part1.
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Started File System Check on /dev/d…5d996-7436-4ec8-b5d6-0d7b6100aeb5.
         Mounting /data...
[  OK  ] Mounted /data.
[  OK  ] Reached target Local File Systems.
         Starting AppArmor initialization...
         Starting ebtables ruleset management...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting Set console font and keymap...
[  OK  ] Started Set console font and keymap.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started ebtables ruleset management.
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started AppArmor initialization.
[    **] A start job is running for Create V… Directories (13min 46s / no limit)

I searched and found a similar discussion: Boot stuck at “A start job is running for Create Volatile Files and Directories”

After I changed the HOME folder to another place, I copied the ssh config folder from the old HOME to the new place. Supposedly it should just work, right? However, when I logged in to the server with my private key, the server said: “Server Refused Our Key”…

I spent some time figuring out the problem: it is a new-HOME-folder access mode issue; the folder SHOULD NOT have write access for the group.
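Concretely, sshd’s StrictModes check rejects public-key logins when the home directory or ~/.ssh is group- or world-writable. Tightening the modes fixes it (a sketch of the usual safe settings):

```shell
# sshd refuses keys if $HOME or ~/.ssh is group/world-writable
chmod g-w,o-w "$HOME"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, the next login attempt with the same private key should succeed without any server-side change.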

Unity supports three ways of registering types:

  • Instance registration
  • Type registration
  • Factory registration

Typically, Instance registration and Type registration resolve dependencies through ResolvedParameter<T>, while Factory registration resolves dependencies via a factory delegate. In practice, when you want to resolve a List<T> or Dictionary<T1, T2>, Factory registration is what you want. Let’s go through how to resolve a collection of customized classes.

Animal Example

Starting with an example, assume we have an IAnimal interface which has two implementations. There is a Zoo class which accepts List<IAnimal> as a parameter:

If you use Visual Studio 2017 and create a .NET Core project for the first time, the following errors might appear:

The current .NET SDK does not support targeting .NET Standard 2.0. Either target .NET Standard 1.6 or lower, or use a version of the .NET SDK that supports .NET Standard 2.0. XXXProject C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Sdks\Microsoft.NET.Sdk\build\Microsoft.NET.TargetFrameworkInference.targets

Another related issue is that the .NET Core project fails to load. Both errors point out that the corresponding .NET SDK is missing.

Recently we used Grafana to monitor ASP.NET Core apps. We made an interesting observation: sometimes the “allocated memory” is larger than the “working set”. After investigation, we found the root cause: the app uses native DLLs which operate on unmanaged memory, which is not managed by the GC. So how do we correctly collect memory-related metrics in C#?

We will explain these related concepts in this post.

Basic Concepts

Managed vs Unmanaged Code

What is “managed code”?

The error looks like this:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
  ...
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "miniconda3/envs/torch/lib/python3.7/site-packages/zope/interface/declarations.py", line 483, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.

Refer to this issue: https://github.com/NVIDIA/apex/issues/116

A weird problem: executing “import torch” in bash works, but when you run it in a Jupyter notebook:

ImportError Traceback (most recent call last)
<ipython-input-4-8ba1970b60ce> in <module>
 6 import random
 7 
----> 8  import torch
 9 import torch.nn as nn
 10 

~/miniconda3/envs/tf/lib/python3.6/site-packages/torch/__init__.py in <module>
 79 del _dl_flags
 80 
---> 81  from torch._C import *
 82 
 83 __all__ += [name for name in dir(_C)

ImportError: dlopen: cannot load any more object with static TLS

It seems to be a compatibility issue. Even though there is a large volume of discussion, none of it works. Ironically, I accidentally fixed the issue by resolving another one: RuntimeError cuDNN error CUDNN_STATUS_EXECUTION_FAILED Solution

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When you encounter the above issue and Google it, you will find lots of discussions. Unfortunately, very few of them are useful and actually work.

Actually, the root cause is a PyTorch/CUDA/Python compatibility issue.

Solution

The solution is straightforward: simply downgrade PyTorch, or install a different version of CUDA or Python.

My environment:

  1. Ubuntu 18.04 LTS
  2. Python 3.6.9
  3. PyTorch 1.3.0
  4. cuda 10.1

This command resolved my issue (the PyTorch version really matters! from 1.3.0 to 1.2.0):
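The exact command depends on how the environment was created; with pip, the downgrade amounts to something like this (the version comes from the note above; everything else, including the choice of pip over conda, is an assumption):

```shell
# Pin PyTorch back to 1.2.0 inside the active environment
pip install torch==1.2.0
```

With conda environments, the equivalent is pinning `pytorch=1.2.0` together with a matching `cudatoolkit` version.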

When I used kubectl to deploy a service:

$ kubectl --kubeconfig=C:\Users\xxx\.kube\config apply -f deploy.yaml --namespace=xxx

a SchemaError was raised:

EXEC(0,0): Error : SchemaError(io.k8s.api.apps.v1beta1.RollingUpdateStatefulSetStrategy): invalid object doesn't have additional properties

It seems the issue is caused by a kubectl version mismatch: Install and Set Up kubectl

You must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.2 client should work with v1.1, v1.2, and v1.3 master. Using the latest version of kubectl helps avoid unforeseen issues.
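To check whether you are within the supported skew, compare the client and server versions side by side (`--short` was the concise form in kubectl of that era):

```shell
# Client vs. server version at a glance;
# the minor versions should differ by at most one
kubectl version --short
```

If the client is more than one minor version away from the cluster, install a matching kubectl release before retrying the apply.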

Follow these steps to resolve the issue: