Finisky Garden

NLP, Software Engineering, Product Design


When you set up TLS/SSL for MongoDB (see Configure mongod and mongos for TLS/SSL), you might encounter the following errors:

{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"E",  "c":"NETWORK",  "id":23248,   "ctx":"main","msg":"Cannot read certificate file","attr":{"keyFile":"/etc/ssl/testserver1.pem","error":"error:0200100D:system library:fopen:Permission denied"}}
{"t":{"$date":"2020-11-30T08:02:19.406+00:00"},"s":"F",  "c":"CONTROL",  "id":20574,   "ctx":"main","msg":"Error during global initialization","attr":{"error":{"code":140,"codeName":"InvalidSSLConfiguration","errmsg":"Can not set up PEM key file."}}}

or

{"t":{"$date":"2020-11-30T08:01:14.545+00:00"},"s":"I",  "c":"ACCESS",   "id":20254,   "ctx":"main","msg":"Read security file failed","attr":{"error":{"code":30,"codeName":"InvalidPath","errmsg":"permissions on / are too open"}}}

So what are the right ownership and permissions for the certificate PEM file? The answer: the PEM file should be readable, but not writable, by the mongodb user.
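On a typical Debian/Ubuntu install, the fix looks something like this (the file path is taken from the error message above; adjust to your own certificate location):

```shell
# Give the mongodb user ownership of the PEM file,
# then make it read-only for the owner (mode 400)
sudo chown mongodb:mongodb /etc/ssl/testserver1.pem
sudo chmod 400 /etc/ssl/testserver1.pem
```

Mode 400 satisfies both error cases: mongod can now open the file, and the permissions are no longer "too open".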

Monitoring MySQL server metrics is crucial for a DBA. Typically, we can simply monitor the recent server status summary through mysqlbench. But what do these metrics mean? Some of them are self-explanatory, such as connections and traffic, while others are not. For example, what’s the difference between Selects per second and Innodb reads per second? How do we measure write performance?
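The dashboard numbers are derived from the server’s raw status counters, which you can inspect directly. As a sketch (these are the standard MySQL status variable names; roughly, Selects per second comes from the Com_select counter while InnoDB reads come from row-level counters such as Innodb_rows_read):

```shell
# Inspect the raw counters behind the dashboard metrics
mysql -e "SHOW GLOBAL STATUS LIKE 'Com_select'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_rows_read'"
# Write-side activity
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_rows_inserted'"
```

Per-second rates are computed by sampling these cumulative counters twice and dividing the delta by the interval.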

The following figure illustrates the server status: [mysqlbench server status]

Master-slave replication is widely used in production. Monitoring the replication lag is a common and critical task. Typically, we are able to get the real-time difference between the master and the slave by periodically checking the Seconds_Behind_Master variable.

According to the documentation:

Seconds_Behind_Master: this field shows an approximation for difference between the current timestamp on the slave against the timestamp on the master for the event currently being processed on the slave.
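A minimal way to pull just that field out of the SHOW SLAVE STATUS output (shown here on a captured sample line, since the real command needs a live slave; on a replica you would pipe from `mysql -e 'SHOW SLAVE STATUS\G'`):

```shell
# Extract Seconds_Behind_Master from the \G-formatted status output.
# The echo stands in for:  mysql -e 'SHOW SLAVE STATUS\G'
echo '        Seconds_Behind_Master: 7' |
  awk -F': ' '/Seconds_Behind_Master/ {print $2}'
# -> 7
```

Running this periodically (e.g. from cron) and alerting on large values is the usual lightweight lag monitor.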

We developed a REST API a long time ago and recently found that the released client has a flaw: the HttpRequestMessage is missing the content-type application/json. In earlier versions we manually deserialized the request JSON by reading the request body, but now we leverage the ASP.NET Core framework to automatically bind the request structure from the API parameter. However, the legacy client no longer works: an HTTP 415 “Unsupported Media Type” error occurs.

Therefore, for backward compatibility, we need to make the server treat all incoming requests as application/json even if they arrive as plain text (the default).

Someone may argue that there is no reason to disable IPv6 on Linux. For me, the reason is that on some IPv6-enabled websites I am frequently classified as a bot who needs to solve captchas, which is really annoying :-( . Considering that the IPv6 address never changes (it is assigned by the service provider), disabling IPv6 is the simplest solution.

Let’s get to the solution. Just use your favorite editor to add one line to /etc/sysctl.conf:
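The line in question is the standard kernel knob (some setups also set the `default` and `lo` variants, but `all` is the essential one):

```shell
# /etc/sysctl.conf -- disable IPv6 on all interfaces
net.ipv6.conf.all.disable_ipv6 = 1
```

Apply it without a reboot via `sudo sysctl -p`, then confirm with `ip -6 addr` (it should show no global IPv6 addresses).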

When we use Entity Framework to manipulate a SQL database, we use Where to query and Include to load related entities (a join operation).

Sample database table schema:

  • Employee: Id, Name, Age
  • Salary: Id, BasePay

Typical query scenarios:

  • Query employee 3’s name => Employee.Where(x => x.Id == 3)
  • Query employee Jack’s age => Employee.Where(x => x.Name == "Jack")
  • Query employee Jack’s basepay => Employee.Where(x => x.Name == "Jack").Include(x => x.Salary) …

To keep the code clean and focused, the following examples omit the dbContext creation logic; we assume db = DbContext(), which contains the Employee table.

I installed an old library, lightsquid, on Ubuntu 20.04. When visiting the CGI page, an internal server error pops up.

Debug the CGI script by running it directly in /var/www/lightsquid:

/var/www/lightsquid$ perl index.cgi
Can't locate CGI.pm in @INC (you may need to install the CGI module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at index.cgi line 19.
BEGIN failed--compilation aborted at index.cgi line 19.

It seems that some modules are missing. After several searches: https://packages.ubuntu.com/search?suite=trusty&arch=any&mode=filename&searchon=contents&keywords=cgi.pm
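CGI.pm was removed from the Perl core distribution in 5.22, so on Ubuntu 20.04 it has to be installed as a separate package (the package name below is what the Ubuntu package search turns up for CGI.pm):

```shell
# Install the CGI module that index.cgi needs
sudo apt-get install -y libcgi-pm-perl

# verify that the module now loads
perl -MCGI -e 'print "CGI loaded\n"'
```

After that, re-running `perl index.cgi` should get past the `Can't locate CGI.pm in @INC` failure.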

Viewing the SQL query generated by Entity Framework is important. The translated SQL query may not be what you expected; sometimes it leads to significant performance issues.

Printing the generated query for an IQueryable<T> in EF Core is different from EF. I finally found a working solution, which is listed below.

The following code is tested with MySql.Data.EntityFrameworkCore 8.0.16. Just put the following class into your project:

using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.Internal;
using Microsoft.EntityFrameworkCore.Storage;

public static class QueryableExtensions
{
    private static readonly TypeInfo QueryCompilerTypeInfo = typeof(QueryCompiler).GetTypeInfo();
    private static readonly FieldInfo QueryCompilerField = typeof(EntityQueryProvider).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryCompiler");
    private static readonly FieldInfo QueryModelGeneratorField = typeof(QueryCompiler).GetTypeInfo().DeclaredFields.First(x => x.Name == "_queryModelGenerator");
    private static readonly FieldInfo DataBaseField = QueryCompilerTypeInfo.DeclaredFields.Single(x => x.Name == "_database");
    private static readonly PropertyInfo DatabaseDependenciesField = typeof(Database).GetTypeInfo().DeclaredProperties.Single(x => x.Name == "Dependencies");

    public static string ToSql<TEntity>(this IQueryable<TEntity> query)
    {
        var queryCompiler = (QueryCompiler)QueryCompilerField.GetValue(query.Provider);
        var queryModelGenerator = (QueryModelGenerator)QueryModelGeneratorField.GetValue(queryCompiler);
        var queryModel = queryModelGenerator.ParseQuery(query.Expression);
        var database = DataBaseField.GetValue(queryCompiler);
        var databaseDependencies = (DatabaseDependencies)DatabaseDependenciesField.GetValue(database);
        var queryCompilationContext = databaseDependencies.QueryCompilationContextFactory.Create(false);
        var modelVisitor = (RelationalQueryModelVisitor)queryCompilationContext.CreateQueryModelVisitor();
        modelVisitor.CreateQueryExecutor<TEntity>(queryModel);
        var sql = modelVisitor.Queries.First().ToString();

        return sql;
    }
}

The usage is straightforward: just append .ToSql() to your IQueryable<T>:

Today I found that some pods in the Kubernetes cluster have failed; the status is Waiting: ContainerCreating. The pod events:

MountVolume.SetUp failed for volume "xxxxx" : secret "xxxxx" not found
kubelet aks-agentpool-xxx-vmss000001

Unable to attach or mount volumes: unmounted volumes=[xxxxx], unattached volumes=[xxxxx]: timed out waiting for the condition

I remembered that about one week ago I deleted some secrets in this cluster. Therefore, the problem becomes: how do we recover the deleted secret “xxxxx”?
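If the original values are still available somewhere (a password manager, a deployment pipeline, another cluster), the simplest recovery is to recreate the secret under the exact name the volume mount expects. A sketch, with placeholder name, namespace, and keys:

```shell
# Recreate the secret the pod's volume refers to;
# the secret name must match the one in the pod spec
kubectl create secret generic xxxxx \
  --namespace=<your-namespace> \
  --from-literal=<key>=<value>
```

Once the secret exists again, the kubelet’s mount retry succeeds and the stuck pods move past ContainerCreating on their own.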

Recently, I found a huge number of files named .com.google.Chrome.* being created in my /tmp folder. Obviously the culprit is Chrome. However, after some research, I found no solution to prevent Chrome from creating this garbage.

/tmp$ du -csh .com.google.Chrome.*

8.0K    .com.google.Chrome.00OKwD
104K    .com.google.Chrome.013jYf
172K    .com.google.Chrome.015x5t
...
48K     .com.google.Chrome.Zytrhf
16K     .com.google.Chrome.zz233G
36K     .com.google.Chrome.ZzrsZY
163M    total
/tmp$  find /tmp -name ".com.google.Chrome*" -ls| wc -l
3468
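Absent a way to stop Chrome from writing these files, a pragmatic workaround (a sketch; the one-day threshold is my own choice) is to periodically delete the stale ones, e.g. from cron:

```shell
# Remove day-old Chrome scratch files from /tmp
# (safe while Chrome is closed; files still in use may be recreated)
find /tmp -maxdepth 1 -name ".com.google.Chrome.*" -mtime +1 -exec rm -rf {} +
```

`-maxdepth 1` keeps the sweep from descending into unrelated directories, and `-mtime +1` spares anything Chrome touched in the last 24 hours.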

Update 2020/12/12: I found there are lots of .com.google.Chrome.* files in /tmp/snap.chromium/tmp :-( . Look at the ncdu results:

Yesterday, when I logged in to my VM, I found that the disk was full. Even tab command completion could not be done. I immediately deleted some unused files and the system worked. However, when I rebooted the system, it got stuck at A start job is running for Create Volatile Files and Directories...:

...
[  OK  ] Started File System Check on /dev/disk/cloud/azure_resource-part1.
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Started File System Check on /dev/d…5d996-7436-4ec8-b5d6-0d7b6100aeb5.
         Mounting /data...
[  OK  ] Mounted /data.
[  OK  ] Reached target Local File Systems.
         Starting AppArmor initialization...
         Starting ebtables ruleset management...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting Set console font and keymap...
[  OK  ] Started Set console font and keymap.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started ebtables ruleset management.
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started AppArmor initialization.
[    **] A start job is running for Create V… Directories (13min 46s / no limit)

I searched and found a similar discussion: Boot stuck at “A start job is running for Create Volatile Files and Directories”

After I changed the HOME folder to another place, I copied the ssh config folder from the old HOME to the new place. Supposedly it should just work, right? However, when I logged in to the server with my private key, the server said: “Server Refused Our Key”…

I spent some time figuring out the problem: it is a new-HOME-folder access mode issue; the folder SHOULD NOT have write access for the group.
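Concretely, sshd’s StrictModes check rejects public-key logins when the home directory or ~/.ssh is group- or world-writable. Tightening the modes fixes it (a sketch of the usual safe settings):

```shell
# sshd refuses keys if $HOME or ~/.ssh is group/world-writable
chmod g-w,o-w "$HOME"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, the next login attempt with the same private key should succeed without any server-side change.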

Unity supports three ways of registering types:

  • Instance registration
  • Type registration
  • Factory registration

Typically, Instance registration and Type registration resolve dependencies through ResolvedParameter<T>, while Factory registration resolves dependencies via a factory delegate. In practice, when you want to resolve a List<T> or Dictionary<T1, T2>, Factory registration is what you want. Let’s go through how to resolve a collection of customized classes.

Animal Example

Starting with an example, assume we have an IAnimal interface which has two implementations. There is a Zoo class which accepts List<IAnimal> as a parameter:

If you use Visual Studio 2017 and create a .NET Core project for the first time, the following errors might appear:

The current .NET SDK does not support targeting .NET Standard 2.0. Either target .NET Standard 1.6 or lower, or use a version of the .NET SDK that supports .NET Standard 2.0. XXXProject C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Sdks\Microsoft.NET.Sdk\build\Microsoft.NET.TargetFrameworkInference.targets

Another related issue is that the .NET Core project fails to load. Both errors point out that the corresponding .NET SDK is missing.

Recently we used Grafana to monitor ASP.NET Core apps. We made an interesting observation: sometimes the “allocated memory” is larger than the “working set”. After investigation, we found the root cause: the app uses native DLLs which operate on unmanaged memory, which is not managed by the GC. So how do we correctly collect memory-related metrics in C#?

We will explain these related concepts in this post.

Basic Concepts

Managed vs Unmanaged Code

What is “managed code”?

The error looks like this:

$ python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
  ...
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/__init__.py", line 18, in <module>
    from apex.interfaces import (ApexImplementation,
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 10, in <module>
    class ApexImplementation(object):
  File "miniconda3/envs/torch/lib/python3.7/site-packages/apex/interfaces.py", line 14, in ApexImplementation
    implements(IApex)
  File "miniconda3/envs/torch/lib/python3.7/site-packages/zope/interface/declarations.py", line 483, in implements
    raise TypeError(_ADVICE_ERROR % 'implementer')
TypeError: Class advice impossible in Python3.  Use the @implementer class decorator instead.

Refer to this issue: https://github.com/NVIDIA/apex/issues/116

A weird problem: executing “import torch” in bash works, but when you run it in a Jupyter notebook:

ImportError Traceback (most recent call last)
<ipython-input-4-8ba1970b60ce> in <module>
 6 import random
 7 
----> 8  import torch
 9 import torch.nn as nn
 10 

~/miniconda3/envs/tf/lib/python3.6/site-packages/torch/__init__.py in <module>
 79 del _dl_flags
 80 
---> 81  from torch._C import *
 82 
 83 __all__ += [name for name in dir(_C)

ImportError: dlopen: cannot load any more object with static TLS

It seems to be a compatibility issue. Even though there is a large volume of discussion, none of it works. Ironically, I accidentally fixed the issue by resolving another one: RuntimeError cuDNN error CUDNN_STATUS_EXECUTION_FAILED Solution

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When you encounter the above issue and Google it, you will find lots of discussions. Unfortunately, very few of them are useful and actually work.

Actually, the root cause is a PyTorch/CUDA/Python compatibility issue.

Solution

The solution is straightforward: simply downgrade PyTorch, or install a different version of CUDA or Python.

My environment:

  1. Ubuntu 18.04 LTS
  2. Python 3.6.9
  3. PyTorch 1.3.0
  4. cuda 10.1

This command resolved my issue (the PyTorch version really matters! from 1.3.0 to 1.2.0):
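The exact command depends on how the environment was created; with pip, the downgrade amounts to something like this (the version comes from the note above; everything else, including the choice of pip over conda, is an assumption):

```shell
# Pin PyTorch back to 1.2.0 inside the active environment
pip install torch==1.2.0
```

With conda environments, the equivalent is pinning `pytorch=1.2.0` together with a matching `cudatoolkit` version.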

When I used kubectl to deploy a service:

$ kubectl --kubeconfig=C:\Users\xxx\.kube\config apply -f deploy.yaml --namespace=xxx

a SchemaError was raised:

EXEC(0,0): Error : SchemaError(io.k8s.api.apps.v1beta1.RollingUpdateStatefulSetStrategy): invalid object doesn't have additional properties

It seems the issue is caused by a kubectl version mismatch: Install and Set Up kubectl

You must use a kubectl version that is within one minor version difference of your cluster. For example, a v1.2 client should work with v1.1, v1.2, and v1.3 master. Using the latest version of kubectl helps avoid unforeseen issues.
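To check whether you are within the supported skew, compare the client and server versions side by side (`--short` was the concise form in kubectl of that era):

```shell
# Client vs. server version at a glance;
# the minor versions should differ by at most one
kubectl version --short
```

If the client is more than one minor version away from the cluster, install a matching kubectl release before retrying the apply.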

Follow these steps to resolve the issue: