[04:54:16] good morning, we have the s5 switchover soon. That enables us to drop the old templatelinks columns in cebwiki
[06:13:19] morning; I have a couple of errands to run, so I will be starting a bit late today. Also I have a pile of email to catch up on, so maybe ping me if there's something urgent requiring my attention
[06:36:21] great job with the switchover, Amir1
[06:37:14] I will be doing the data check on es2021 against eqiad
[06:38:38] it shouldn't affect production unless a very bad corruption case is detected, in which case the mysql server self-crashes (unlikely, but a possibility)
[06:43:17] jynus: Thanks!
[06:54:03] I've updated the DBError dashboard to filter out cloudweb2002-dev, as that filter hadn't worked since the host switch
[07:20:26] jynus: Thanks. I have been doing it manually so many times I forgot to just add it
[09:17:59] Amir1: there is going to be maintenance (I believe an upgrade of librenms is part of it). I was asked to perform a backup before it (and I will), but normally we also stop replication on one replica for even faster recovery
[09:18:37] noted
[09:18:38] wanted to be sure you would be ok with me stopping replication on e.g. db1117:m1 for 30 minutes or so
[09:19:07] I think it is scheduled for 13:00 UTC, so around that time
[09:19:17] (will take care of downtimes, etc.)
[09:19:43] sure, it doesn't get traffic to my knowledge
[09:19:46] nothing ever happens and we just restart replication right away
[09:20:01] but only until the day it is needed!
[09:20:19] basically contacting you for awareness
[09:21:17] will solidify the plans on the ticket
[09:22:07] I think you also saw the list of affected servers on the meeting calendar; hopefully the list was useful to you
[09:22:14] for tomorrow
[09:22:59] (I will take care of downtime and of shutting down the backup and the backup source dbs)
[09:41:17] the es2021 check is still ongoing, it probably won't be done until tomorrow
[15:43:00] I will leave db1117:m1 with its sql thread stopped for (my) night
[15:43:32] I will restart it early tomorrow morning so it catches up before the PDU maintenance window
[16:34:04] PROBLEM - MariaDB sustained replica lag on s5 on db2094 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2094&var-port=13315
[16:43:24] PROBLEM - MariaDB sustained replica lag on s5 on db2094 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2094&var-port=13315
[16:50:26] RECOVERY - MariaDB sustained replica lag on s5 on db2094 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2094&var-port=13315
[18:48:49] PROBLEM - MariaDB sustained replica lag on s5 on db2094 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2094&var-port=13315
[19:00:25] RECOVERY - MariaDB sustained replica lag on s5 on db2094 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2094&var-port=13315
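
Editor's note on the 09:18 and 15:43 messages: pausing only the SQL (applier) thread on a backup-source replica such as db1117:m1 freezes the applied data at a known point while the IO thread keeps downloading binlogs, which makes recovery faster if the librenms upgrade goes wrong. The sketch below is not the tooling used in the channel; it is a minimal illustration of that technique, assuming pymysql access and placeholder hostnames/credentials.

```python
# Minimal sketch: pause/resume only the SQL (applier) thread on a MariaDB
# replica around a maintenance window. Host and credentials are placeholders.
import time
import pymysql

def run(host, statement):
    """Open a short-lived connection and run one administrative statement."""
    conn = pymysql.connect(host=host, user="admin", password="secret", autocommit=True)
    try:
        with conn.cursor() as cur:
            cur.execute(statement)
    finally:
        conn.close()

REPLICA = "db1117.example.org"  # placeholder for the m1 backup-source replica

# Freeze the applied data; the IO thread keeps fetching binlogs meanwhile.
run(REPLICA, "STOP SLAVE SQL_THREAD")

# ... maintenance (backup, librenms upgrade) happens here ...
time.sleep(30 * 60)  # illustrative 30-minute window

# Resume applying; the replica catches up from the already-downloaded relay logs.
run(REPLICA, "START SLAVE SQL_THREAD")
```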
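Editor's note on the 06:37 data check of es2021 against eqiad: one generic way to compare a host against its counterpart in the other datacenter is to checksum each table on both sides and diff the results. The rough sketch below only illustrates that idea; hostnames, credentials, schema, and table names are placeholders, and production comparisons are normally done chunk by chunk with dedicated tooling rather than whole-table CHECKSUM TABLE runs, which lock the table.

```python
# Rough sketch: compare table checksums between two hosts in different
# datacenters. All names below are placeholders, not production values.
import pymysql

HOSTS = {"codfw": "es2021.example.org", "eqiad": "es1021.example.org"}
TABLES = ["blobs_cluster1", "blobs_cluster2"]  # illustrative table list
SCHEMA = "mywiki"  # placeholder schema name

def checksum(host, schema, table):
    """Return MariaDB's CHECKSUM TABLE value for one table on one host."""
    conn = pymysql.connect(host=host, user="check", password="secret", database=schema)
    try:
        with conn.cursor() as cur:
            cur.execute(f"CHECKSUM TABLE `{table}`")
            return cur.fetchone()[1]  # row is (table_name, checksum)
    finally:
        conn.close()

for table in TABLES:
    values = {dc: checksum(h, SCHEMA, table) for dc, h in HOSTS.items()}
    status = "OK" if len(set(values.values())) == 1 else "MISMATCH"
    print(f"{table}: {values} -> {status}")
```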