MongoDB Agent Monitoring and Backup not Working

Today I tried to import an existing MongoDB deployment (running outside the Kubernetes cluster) into MongoDB Ops Manager, which runs inside Kubernetes. After installing the MongoDB Agent on the deployment host, only the automation function worked; monitoring and backup did not. The root cause is that the monitoring and backup modules still try to post data to Ops Manager's internal endpoint.

The latest MongoDB Agent consists of a single binary that contains all three functions: Automation, Monitoring, and Backup. So, in theory, installing and configuring the all-in-one automation agent on the deployment VM should be enough.

Configure the automation agent following the agent installation instructions: edit mmsBaseUrl in /etc/mongodb-mms/automation-agent.config (assuming the Ops Manager public URL is http://myopsmanager.com:8080; other fields such as mmsApiKey are omitted here):

...
mmsBaseUrl=http://myopsmanager.com:8080
...
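Before starting the agent, it can help to confirm which mmsBaseUrl the config file actually contains. The following is a minimal sketch, assuming the file uses plain key=value lines with # comments; parse_agent_config is a hypothetical helper of mine, not part of the MongoDB Agent, and the sample text stands in for the real file.

```python
def parse_agent_config(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments.

    Assumption: the agent config is a flat key=value file.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no "=" at all
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical excerpt; on the VM you would read
# /etc/mongodb-mms/automation-agent.config instead.
sample = """
# other fields such as mmsApiKey are omitted here
mmsBaseUrl=http://myopsmanager.com:8080
"""

config = parse_agent_config(sample)
print(config["mmsBaseUrl"])  # prints http://myopsmanager.com:8080
```

On the VM, replace sample with the contents of the real file, for example open("/etc/mongodb-mms/automation-agent.config").read().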

Start the automation agent service:

$ systemctl start mongodb-mms-automation-agent.service

Verify the agent in Ops Manager and import the deployment. On the Ops Manager server page, only Automation is green, while Monitoring and Backup are still grey (standby).

Check the backup agent log in the deployment VM:

/var/log/mongodb-mms-automation$ cat backup-agent.log
[2021-01-20T07:49:51.245+0000] [header.info] [::0]       AgentVersion = 10.14.17.6445
[2021-01-20T07:49:51.245+0000] [header.info] [::0]      GitCommitHash = 0000000000000000000000000000000000000000
[2021-01-20T07:49:51.245+0000] [header.info] [::0]       ManagedAgent = true
[2021-01-20T07:49:51.245+0000] [header.info] [::0]          GoVersion = go1.14.7
[2021-01-20T07:49:51.245+0000] [header.info] [::0]         mothership = ops-manager-svc.mongodb.svc.cluster.local:8080
[2021-01-20T07:49:51.245+0000] [header.info] [::0]              https = false
[2021-01-20T07:49:51.245+0000] [header.info] [::0]          httpProxy = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]      krb5Principal = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]         krb5Keytab = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] krb5ConfigLocation = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]  gssapiServiceName = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] sslClientCertificate = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] sslTrustedServerCertificates = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] sslRequireValidServerCertificates = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] sslTrustedMMSBackupServerCertificate = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]            logFile = /var/log/mongodb-mms-automation/backup-agent.log
[2021-01-20T07:49:51.245+0000] [header.info] [::0] maxLogFileSizeBytes = 104857600
[2021-01-20T07:49:51.245+0000] [header.info] [::0] maxLogFileDurationHrs = 240
[2021-01-20T07:49:51.245+0000] [header.info] [::0]    enableKeepAlive = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]  keepAliveDuration = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0]       connDeadline = <unset>
[2021-01-20T07:49:51.245+0000] [header.info] [::0] mothershipResponseHeaderTimeout = <unset>
[2021-01-20T07:49:51.444+0000] [agent.info] [backupexecutor/executor.go:Execute:75] Starting backup module version 10.14.17.6445 0000000000000000000000000000000000000000
[2021-01-20T07:49:51.492+0000] [agent.info] [commonbackup/utils.go:func1:217] Listen for shutdown signal
[2021-01-20T07:49:51.494+0000] [agent.warn] [components/agent.go:Iterate:69] Unable to load latest configuration from server. Will try again. Tag: unassigned Err: Op: Get Err: dial tcp: lookup ops-manager-svc.mongodb.svc.cluster.local: Temporary failure in name resolution
[2021-01-20T07:50:50.713+0000] [agent.warn] [components/agent.go:Iterate:69] Unable to load latest configuration from server. Will try again. Tag: unassigned Err: Op: Get Err: dial tcp: lookup ops-manager-svc.mongodb.svc.cluster.local: Temporary failure in name resolution
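The "Temporary failure in name resolution" error can be reproduced outside the agent by asking the VM's resolver for the same name. This is a small sketch using only the Python standard library; can_resolve and report are hypothetical helper names, and the hostnames in the usage note are the ones from this post.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the lookup fails, as it does for the agent above.
        return False


def report(names):
    """Build one human-readable line per hostname."""
    return [
        f"{name}: {'resolves' if can_resolve(name) else 'does not resolve'}"
        for name in names
    ]
```

Running print("\n".join(report(["myopsmanager.com", "ops-manager-svc.mongodb.svc.cluster.local"]))) on the deployment VM shows whether each name is resolvable from there; the cluster-internal name is the one I would expect to fail outside Kubernetes.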

Check the monitoring agent log:

/var/log/mongodb-mms-automation$ cat monitoring-agent.log
[2021-01-20T07:49:51.248+0000] [header.info] [::0]       AgentVersion = 10.14.17.6445
[2021-01-20T07:49:51.248+0000] [header.info] [::0]      GitCommitHash = 0000000000000000000000000000000000000000
[2021-01-20T07:49:51.248+0000] [header.info] [::0]       ManagedAgent = true
[2021-01-20T07:49:51.248+0000] [header.info] [::0]          GoVersion = go1.14.7
...
[2021-01-20T07:49:51.248+0000] [header.info] [::0] maxProfilingEntries = <unset>
[2021-01-20T07:49:51.248+0000] [header.info] [::0]         mmsBaseUrl = http://ops-manager-svc.mongodb.svc.cluster.local:8080
[2021-01-20T07:49:51.248+0000] [header.info] [::0]            logFile = /var/log/mongodb-mms-automation/monitoring-agent.log
[2021-01-20T07:49:51.248+0000] [header.info] [::0] maxLogFileSizeBytes = 104857600
[2021-01-20T07:49:51.248+0000] [header.info] [::0] maxLogFileDurationHrs = 240
[2021-01-20T07:49:51.248+0000] [header.info] [::0] useWindowsEventLog = <unset>
[2021-01-20T07:49:51.248+0000] [header.info] [::0] zlibCompressionLevel = <unset>
[2021-01-20T07:49:51.444+0000] [agent.info] [monitoringexecutor/executor.go:Execute:99] Starting monitoring module
[2021-01-20T07:49:51.444+0000] [services.compressor.info] [monitoring/encoding.go:NewPoolingZlibCompressor:243] Starting 2 compressor handlers
[2021-01-20T07:49:51.444+0000] [agent.info] [monitoring/agent.go:NewOnlineAgent:288] Creating generic consumer pool with a limit of 4 consumers
[2021-01-20T07:49:51.490+0000] [agent.error] [monitoring/agent.go:Iterate:171] Failed to fetch Conf
Failure getting conf. Op: Get Err: dial tcp: lookup ops-manager-svc.mongodb.svc.cluster.local: Temporary failure in name resolution
        at cm/monitoring/server.go:167
        at cm/monitoring/server.go:219
        at cm/monitoring/agent.go:169
        at cm/monitoring/agent.go:261
        at monitoring/monitoringexecutor/executor.go:125
        at louisaberger/procexec/concurrency.go:45
        at src/runtime/asm_amd64.s:1373
[2021-01-20T07:49:51.490+0000] [agent.info] [monitoring/agent.go:Run:266] Done. Sleeping for 27s...

Clearly, the monitoring and backup modules are still using the internal Ops Manager URL. However, I could not find anywhere to configure mmsBaseUrl other than /etc/mongodb-mms/automation-agent.config.
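To compare the endpoint each module reports with the one set in the config, the header lines of the two logs can be parsed. The sketch below assumes the "[header.info] [::0] key = value" layout seen in the logs above; header_settings and endpoint_host are hypothetical helpers, and the sample lines are copied from those logs.

```python
import re
from urllib.parse import urlsplit

# Matches e.g. "[header.info] [::0]   mothership = host:8080"
HEADER_LINE = re.compile(r"\[header\.info\]\s+\[::0\]\s+(\w+)\s+=\s+(.*)$")


def header_settings(log_text: str) -> dict:
    """Collect the key = value pairs from the log header lines."""
    settings = {}
    for line in log_text.splitlines():
        match = HEADER_LINE.search(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def endpoint_host(settings: dict):
    """Return the Ops Manager hostname a module is configured to use."""
    # The monitoring log reports mmsBaseUrl, the backup log reports mothership.
    raw = settings.get("mmsBaseUrl") or settings.get("mothership")
    if raw is None:
        return None
    if "//" not in raw:
        raw = "//" + raw  # let urlsplit treat a bare host:port as a netloc
    return urlsplit(raw).hostname


backup_log = (
    "[2021-01-20T07:49:51.245+0000] [header.info] [::0]         "
    "mothership = ops-manager-svc.mongodb.svc.cluster.local:8080\n"
    "[2021-01-20T07:49:51.245+0000] [header.info] [::0]              https = false"
)
monitoring_log = (
    "[2021-01-20T07:49:51.248+0000] [header.info] [::0]         "
    "mmsBaseUrl = http://ops-manager-svc.mongodb.svc.cluster.local:8080"
)

for name, text in (("backup", backup_log), ("monitoring", monitoring_log)):
    print(name, endpoint_host(header_settings(text)))
```

Both modules report ops-manager-svc.mongodb.svc.cluster.local rather than myopsmanager.com, which matches the name resolution failures in the logs.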

Since the MongoDB Agent is not open source, I could not figure out where these modules get their mmsBaseUrl from.

A simple workaround is to use the hosts file to map the internal endpoint to the public IP of the Ops Manager service (e.g. the IP that myopsmanager.com resolves to). Edit /etc/hosts:

xxx.xxx.xxx.xxx ops-manager-svc.mongodb.svc.cluster.local

With this mapping in place, it works.
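For anyone scripting this workaround, here is a rough sketch that builds the hosts line and checks whether a hosts file already maps the internal name. hosts_entry and is_mapped are hypothetical helpers, and 203.0.113.10 is only a placeholder from the documentation address range, not the real Ops Manager IP.

```python
import ipaddress

INTERNAL_NAME = "ops-manager-svc.mongodb.svc.cluster.local"


def hosts_entry(ip: str, name: str = INTERNAL_NAME) -> str:
    """Return a hosts-file line mapping name to ip."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return f"{ip} {name}"


def is_mapped(hosts_text: str, name: str = INTERNAL_NAME) -> bool:
    """Return True if some non-comment line already maps name."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 2 and name in fields[1:]:
            return True
    return False


existing = "127.0.0.1 localhost\n"
if not is_mapped(existing):
    # Placeholder IP; substitute the public IP of your Ops Manager service.
    existing += hosts_entry("203.0.113.10") + "\n"
print(existing, end="")
```

The sketch only works on a string; writing the result back to /etc/hosts needs root and is left out on purpose.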