Finisky Garden

NLP, Software Engineering, Product Design

This is part 4 of the series: we will create a self-signed CA certificate and three server certificates.

MongoDB Ops Manager Series:

  1. Install MongoDB Ops Manager
  2. Create a UserDB ReplicaSet
  3. Expose UserDB to Public
  4. Openssl Generates Self-signed Certificates
  5. Enable UserDB TLS and Auth

Self-signed certificates are not recommended for production because they cannot prevent man-in-the-middle attacks. Since our main purpose here is to encrypt the communication rather than to authenticate, self-signed certificates are acceptable.

This is part 3 of the series: we will expose the user database pods to the public so that a Mongo client is able to access them.

So far, the user database can be accessed only from inside the Kubernetes cluster. The official blog’s approach is to expose the pods via NodePort: “Connect to a MongoDB Database Resource from Outside Kubernetes”

This is part 2 of the series: we will create a user database as a three-instance ReplicaSet.

The so-called Application Database is the backend DB of Ops Manager and cannot be used to store user data. The user database is called a MongoDB Deployment. Note that this deployment is different from a Kubernetes Deployment.

It’s pretty easy to configure a MongoDB standalone instance (almost zero configuration). However, if you want to run a production-level MongoDB cluster, the configuration process is non-trivial. A production cluster requires replication, sharding, dynamic scaling, backup, transport encryption, and monitoring. Is there a nice tool to help us?

A MongoDB cluster is a distributed system, which makes it well suited to running in Kubernetes. However, coordinating MongoDB instances usually requires manually running commands on each instance, which happens outside of Kubernetes. The MongoDB Enterprise Kubernetes Operator was developed to bridge this gap. Moreover, MongoDB Ops Manager is a great web portal that helps with these automation tasks.

When you set up TLS/SSL for MongoDB (see “Configure mongod and mongos for TLS/SSL”), you might encounter the following errors:

{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"E",  "c":"NETWORK",  "id":23248,   "ctx":"main","msg":"Cannot read certificate file","attr":{"keyFile":"/etc/ssl/testserver1.pem","error":"error:0200100D:system library:fopen:Permission denied"}}
{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"F",  "c":"CONTROL",  "id":20574,   "ctx":"main","msg":"Error during global initialization","attr":{"error":{"code":140,"codeName":"InvalidSSLConfiguration","errmsg":"Can not set up PEM key file."}}}

or

{"t":{"$date":"2020-11-30T08:01:14.545+00:00"},"s":"I",  "c":"ACCESS",   "id":20254,   "ctx":"main","msg":"Read security file failed","attr":{"error":{"code":30,"codeName":"InvalidPath","errmsg":"permissions on / are too open"}}}

So what are the right ownership and permissions for the certificate PEM file? The answer is: the PEM file should be readable but not writable by the user mongodb.
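As a rough illustration of that rule, here is a minimal Python sketch that inspects a file's mode bits. The function name `pem_mode_ok` is hypothetical, and the sketch assumes the file is already owned by the mongodb user; it does not check ownership itself.

```python
import os
import stat


def pem_mode_ok(path: str) -> bool:
    """Return True if the file's owner can read it but cannot write it.

    Only the owner permission bits are inspected. Who the owner is
    (assumed here to be the mongodb user) is not verified.
    """
    mode = os.stat(path).st_mode
    owner_can_read = bool(mode & stat.S_IRUSR)
    owner_can_write = bool(mode & stat.S_IWUSR)
    return owner_can_read and not owner_can_write
```

For example, a file with mode 400 passes this check, while a file with mode 600 does not because the owner can still write to it.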

Monitoring MySQL server metrics is crucial for a DBA. Typically, we can simply monitor the recent server status summary through mysqlbench. But what do these metrics mean? Some of them are self-explanatory, such as connections and traffic, while others are not. For example, what’s the difference between Selects per second and Innodb reads per second? How do we measure write performance?

The following figure illustrates the server status: [figure: mysqlbench server status]

Master-slave replication is widely used in production. Monitoring the replication lag is a common and critical task. Typically, we are able to get the real-time difference between the master and the slave by periodically checking the Seconds_Behind_Master variable.

According to the linked reference:

Seconds_Behind_Master: this field shows an approximation for difference between the current timestamp on the slave against the timestamp on the master for the event currently being processed on the slave.
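To make the periodic check concrete, here is a minimal Python sketch that extracts the value from status text. It assumes the text comes from running `SHOW SLAVE STATUS\G` in the mysql client, which prints one `Field: value` pair per line. The function name is hypothetical. How the text is fetched (client call, scheduling) is left out.

```python
import re
from typing import Optional


def parse_seconds_behind_master(status_text: str) -> Optional[int]:
    """Extract Seconds_Behind_Master from 'Field: value' formatted text.

    Returns None when the server reports NULL, which happens when
    replication is not running, and raises ValueError when the field
    is absent from the text.
    """
    match = re.search(
        r"^\s*Seconds_Behind_Master:\s*(\S+)\s*$", status_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("Seconds_Behind_Master not found in status text")
    value = match.group(1)
    return None if value.upper() == "NULL" else int(value)
```

A monitoring loop could call this on each poll and alert when the value is None or exceeds a threshold. Since the field is only an approximation, a single large reading is weaker evidence than a sustained trend.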

We developed a REST API a long time ago and recently found that the released client has a flaw: its HttpRequestMessage is missing the content-type application/json. In earlier versions we manually deserialized the request JSON by reading the request body, but now we leverage the AspNetCore framework to bind the request structure from the API parameter automatically. As a result, the legacy client no longer works: the server returns an HTTP 415 “Unsupported Media Type” error.

Therefore, for backward compatibility, we need to make the server treat all incoming requests as application/json, even when they arrive as plain text (the default).

Someone may argue that there is no reason to disable IPv6 on Linux. For me, the reason is that some IPv6-enabled websites frequently classify me as a bot and make me solve a captcha, which is really annoying :-( . Considering that the IPv6 address never changes (it is assigned by the service provider), disabling IPv6 is the simplest solution.

Let’s come to the solution. Just use your favorite editor to add one line in /etc/sysctl.conf:

When we use Entity Framework to manipulate a SQL database, we use Where to query and Include to load related entities (a join operation).

Sample database table schema:

  • Employee: Id, Name, Age
  • Salary: Id, BasePay

Typical query scenarios:

  • Query employee 3’s name => Employee.Where(x => x.Id == 3)
  • Query employee Jack’s age => Employee.Where(x => x.Name == "Jack")
  • Query employee Jack’s basepay => Employee.Where(x => x.Name == "Jack").Include(x => x.Salary) …

To keep the code clean and focused, the following examples do not include the dbContext creation logic; we assume db = DbContext(), which contains the Employee table.

I installed an old library, lightsquid, on Ubuntu 20.04. When visiting the CGI page, an internal server error pops up.

Debug the CGI script by running it directly in /var/www/lightsquid:

/var/www/lightsquid$ perl index.cgi
Can't locate CGI.pm in @INC (you may need to install the CGI module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at index.cgi line 19.
BEGIN failed--compilation aborted at index.cgi line 19.

It seems that some modules are missing. After several searches: https://packages.ubuntu.com/search?suite=trusty&arch=any&mode=filename&searchon=contents&keywords=cgi.pm

Viewing the SQL query generated by Entity Framework is important. The translated SQL query may not be what you expected, and sometimes it leads to significant performance issues.

Printing the generated query for an IQueryable<T> in EF Core is different from EF. I finally found a working solution, which is listed below.

The following code is tested in MySql.Data.EntityFrameworkCore 8.0.16. Just put the following class into your project:

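// Namespaces assumed for the EF Core 2.x internal types used below.
using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.Internal;
using Microsoft.EntityFrameworkCore.Storage;

public static class QueryableExtensions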
{
    private static readonly TypeInfo QueryCompilerTypeInfo = typeof(QueryCompiler).GetTypeInfo();
    private static readonly FieldInfo QueryCompilerField = typeof(EntityQueryProvider).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryCompiler");
    private static readonly FieldInfo QueryModelGeneratorField = typeof(QueryCompiler).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryModelGenerator");
    private static readonly FieldInfo DataBaseField = QueryCompilerTypeInfo.DeclaredFields.Single(x => x.Name == "_database");
    private static readonly PropertyInfo DatabaseDependenciesField = typeof(Database).GetTypeInfo().DeclaredProperties.Single(x => x.Name == "Dependencies");

    public static string ToSql<TEntity>(this IQueryable<TEntity> query)
    {
        var queryCompiler = (QueryCompiler)QueryCompilerField.GetValue(query.Provider);
        var queryModelGenerator = (QueryModelGenerator)QueryModelGeneratorField.GetValue(queryCompiler);
        var queryModel = queryModelGenerator.ParseQuery(query.Expression);
        var database = DataBaseField.GetValue(queryCompiler);
        var databaseDependencies = (DatabaseDependencies)DatabaseDependenciesField.GetValue(database);
        var queryCompilationContext = databaseDependencies.QueryCompilationContextFactory.Create(false);
        var modelVisitor = (RelationalQueryModelVisitor)queryCompilationContext.CreateQueryModelVisitor();
        modelVisitor.CreateQueryExecutor<TEntity>(queryModel);
        var sql = modelVisitor.Queries.First().ToString();

        return sql;
    }
}

The usage is straightforward, just append a .ToSql() after your IQueryable<T>:

Today I found that some pods in a Kubernetes cluster had failed with the status Waiting: ContainerCreating. The pod events:

MountVolume.SetUp failed for volume "xxxxx" : secret "xxxxx" not found
kubelet aks-agentpool-xxx-vmss000001

Unable to attach or mount volumes: unmounted volumes=[xxxxx], unattached volumes=[xxxxx]: timed out waiting for the condition

I remember that about one week ago I deleted some secrets in this cluster. Therefore, the problem becomes: how to recover the deleted secret “xxxxx”?

Recently, I found a huge number of files named .com.google.Chrome.* being created in my /tmp folder. Obviously the culprit is Chrome. However, after some research, I found no solution to prevent Chrome from creating this garbage.

/tmp$ du -csh .com.google.Chrome.*

8.0K    .com.google.Chrome.00OKwD
104K    .com.google.Chrome.013jYf
172K    .com.google.Chrome.015x5t
...
48K     .com.google.Chrome.Zytrhf
16K     .com.google.Chrome.zz233G
36K     .com.google.Chrome.ZzrsZY
163M    total
/tmp$  find /tmp -name ".com.google.Chrome*" -ls| wc -l
3468

Update 2020/12/12: I found that there are lots of .com.google.Chrome.* files in /tmp/snap.chromium/tmp :-( . Look at the ncdu results:

Yesterday, when I logged in to my VM, I found that the disk was full. Even tab completion could not be done. I immediately deleted some unused files and the system worked again. However, when I rebooted the system, it got stuck at A start job is running for Create Volatile Files and Directories...:

...
[  OK  ] Started File System Check on /dev/disk/cloud/azure_resource-part1.
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Started File System Check on /dev/d…5d996-7436-4ec8-b5d6-0d7b6100aeb5.
         Mounting /data...
[  OK  ] Mounted /data.
[  OK  ] Reached target Local File Systems.
         Starting AppArmor initialization...
         Starting ebtables ruleset management...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting Set console font and keymap...
[  OK  ] Started Set console font and keymap.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started ebtables ruleset management.
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started AppArmor initialization.
[    **] A start job is running for Create V… Directories (13min 46s / no limit)

I searched and found a similar discussion: Boot stuck at “A start job is running for Create Volatile Files and Directories”

After I changed the HOME folder to another place, I copied the ssh config folder from the old HOME to the new place. Supposedly it should just work, right? However, when I logged in to the server with my private key, the server said: “Server Refused Our Key”…

I spent some time figuring out the problem: it was an access mode issue on the new HOME folder, which SHOULD NOT have write access for the group.
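A quick way to spot this kind of problem is to inspect the directory's mode bits. The Python sketch below is only an illustration: the function name `too_open_for_ssh` is hypothetical, and also flagging write access for others (not just the group) is an assumption added here, not something stated above.

```python
import os
import stat


def too_open_for_ssh(path: str) -> bool:
    """Return True if the path is writable by its group or by others.

    The group-write check mirrors the issue described above; the
    others-write check is an extra assumption of this sketch.
    """
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
```

For example, a home folder with mode 775 would be flagged, while one with mode 755 would not.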

Unity supports three ways of registering types:

  • Instance registration
  • Type registration
  • Factory registration

Typically, Instance registration and Type registration resolve dependencies through ResolvedParameter<T>, while Factory registration resolves dependencies through a factory delegate. In practice, when you want to resolve a List<T> or Dictionary<T1, T2>, Factory registration is what you want. Let’s go through how to resolve a collection of customized classes.

Animal Example

Let’s start with an example. Assume we have an IAnimal interface with two implementations, and a Zoo class that accepts a List<IAnimal> as its parameter:

If you use Visual Studio 2017 and create a .NET Core project for the first time, the following error might appear:

The current .NET SDK does not support targeting .NET Standard 2.0. Either target .NET Standard 1.6 or lower, or use a version of the .NET SDK that supports .NET Standard 2.0. XXXProject C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Sdks\Microsoft.NET.Sdk\build\Microsoft.NET.TargetFrameworkInference.targets

Another related issue is .NET Core project load failed. Both errors point out that the corresponding .NET SDK is missing.

Recently we have been using Grafana to monitor ASP.NET Core apps. We made an interesting observation: sometimes the “allocated memory” is larger than the “working set”. After investigation, we found the root cause: the app uses native DLLs that operate on unmanaged memory, which is not managed by the GC. So how do we correctly collect memory-related metrics in C#?

We will explain these related concepts in this post.

Basic Concepts

Managed vs Unmanaged Code

# What is “managed code”?

The error looks like this:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
  ...
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "miniconda3/envs/torch/lib/python3.7/site-packages/zope/interface/declarations.py", line 483, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.

Refer to this issue: https://github.com/NVIDIA/apex/issues/116