[01:12:09] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 18.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:12:17] PROBLEM - MariaDB sustained replica lag on m1 on db2078 is CRITICAL: 19.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[01:14:29] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[01:14:37] RECOVERY - MariaDB sustained replica lag on m1 on db2078 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[06:07:18] Can I get a review on https://gerrit.wikimedia.org/r/758715 ?
[06:07:26] It is for tomorrow's switch
[06:18:28] same for https://gerrit.wikimedia.org/r/c/operations/puppet/+/758716 and https://gerrit.wikimedia.org/r/c/operations/dns/+/758717
[07:04:54] volans: I just got this https://phabricator.wikimedia.org/T300473#7666386 from a reimage, but I am not sure how to try that manually, is that a special cookbook to run?
[07:05:02] https://netbox.wikimedia.org/api/extras/job-results/2404824/
[07:25:29] marostegui: I can check the logs later, most likely the Netbox script succeeded but we were just unable to poll the results. Unfortunately Netbox keeps the results only of the last run.
If there was another concurrent run of the same script against another host, that might explain this
[07:26:34] it is totally safe to re-run the Netbox script: from the Netbox UI, top-right menu, go to Scripts, select Import from PuppetDB (something like that)
[07:26:54] volans: Ah thanks - I can do that then. For the record, no other reimages going on as far as I know. Thanks!
[07:26:56] put the hostname (not fqdn), mark commit changes and run it
[07:27:37] will do thanks!
[07:29:39] that's the last step of the cookbook so you can consider that completed anyway. The only other bit is that if the Netbox status of the host is planned it gets changed to staged, not sure if that applies here
[07:38:59] nah, it is active
[07:40:17] it all went well according to the script output
[07:57:06] ack thx
[08:59:22] marostegui: so I checked the logs and this is weird, there were no other calls to the PuppetDB script on Netbox around that time. And all 10 retries to get the results returned no data (as in result.json()['data'] was None)
[08:59:44] volans: maybe a glitch?
[09:01:25] unless a race condition with the ganeti sync script, let me see if there could be reasons why it doesn't log there
[09:03:22] nah, it does an HTTP POST too, I would have seen that in the logs
[09:03:54] marostegui: if it happens again let me know and I'll dig deeper and make the cookbook automatically retry the POST too, as the script is idempotent and we can re-run it safely
[09:04:07] sorry for the trouble
[09:04:39] volans: will do - so far it hasn't happened again and I have done quite a bunch of reimages already
[09:04:47] In fact, this is the first time I have ever seen this error
[09:05:19] me too
[15:45:01] PROBLEM - MariaDB sustained replica lag on m1 on db2078 is CRITICAL: 66.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[15:45:33] PROBLEM - MariaDB sustained replica lag on m1 on db1117 is CRITICAL: 34 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
[15:47:07] RECOVERY - MariaDB sustained replica lag on m1 on db2078 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2078&var-port=13321
[15:47:37] RECOVERY - MariaDB sustained replica lag on m1 on db1117 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1117&var-port=13321
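
Note on the retry idea from the 09:03:54 message: since the Netbox script is described as idempotent, one possible shape for "retry the POST too" is to resubmit the job whenever every poll for its results comes back empty. The sketch below is only an illustration under that assumption and is not the actual cookbook code. The names `run_idempotent_job`, `submit`, and `fetch_data` are hypothetical: `submit` stands in for whatever issues the HTTP POST and returns something to poll, and `fetch_data` stands in for reading `result.json()['data']`. The default of 10 poll attempts mirrors the number mentioned in the log; the 2 submit attempts is an arbitrary choice.

```python
import time
from typing import Any, Callable, Optional


def run_idempotent_job(
    submit: Callable[[], str],
    fetch_data: Callable[[str], Optional[Any]],
    poll_attempts: int = 10,
    submit_attempts: int = 2,
    delay: float = 0.0,
) -> Any:
    """Submit a job and poll for its data, resubmitting if polling stays empty.

    Resubmitting is only safe because the job is assumed to be idempotent.
    """
    for _ in range(submit_attempts):
        # Hypothetical: performs the POST and returns a handle (e.g. a URL) to poll.
        job_handle = submit()
        for _ in range(poll_attempts):
            # Hypothetical: returns the job's data, or None if nothing is there yet.
            data = fetch_data(job_handle)
            if data is not None:
                return data
            time.sleep(delay)
    raise RuntimeError("no job data after all submit and poll attempts")
```

With this shape, a run whose results never show up (as in the 08:59:22 message) would trigger a second submission instead of failing after the polling loop, and the error is raised only when every submission also yields no data.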