[01:08:25] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 7.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:08:35] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 14.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:11:17] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:12:53] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[03:52:39] jynus: the Phabricator migration will be happening today at 15:00 UTC, in case you want to run some last-minute backups
[05:34:51] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=1&var-server=db2109&var-datasource=thanos&var-cluster=wmcs&from=1692758235082&to=1692768874368
[09:39:17] should I stop replication myself?
[09:39:21] Re: phabricator
[11:27:11] jynus: No, I will do it
[11:27:14] But it is at 15:00 UTC
[11:40:22] no worries
[11:47:36] what I mean is that I will stop it on the "production" host
[11:47:52] the one we have for the quick failover
[11:51:17] yes, leaving that to you. I meant I had planned for the backup already
[11:51:36] ah cool
[11:51:46] (and tested, but waiting a bit to run it)
[11:53:49] There is one minor thing: because dbprov1004 hasn't been upgraded to 10.6 yet, preparation will have to happen on the host, it cannot be done beforehand (so xtrabackup /srv/sqldata --prepare before starting MariaDB on the host)
[15:05:16] marostegui: I have a screen with "✔ root@cumin1001:~$ # Do not run unless emergency, will break data # transfer.py --type=decompress dbprov1004.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.m3.2023-08-23--13-34-58.tar.gz db1164.eqiad.wmnet:/srv/sqldata.s3" pending, in case someone else has to run it in an emergency
[15:05:32] oh nice :)
[15:05:52] hopefully if that moment arrives, I can simply promote db1119 to master
[15:05:53] that way one doesn't have to remember the options
[15:06:02] yes ofc
[15:14:52] I think I am going to leave replication stopped in the secondary DC
[15:14:56] Until tomorrow
[15:15:04] It wouldn't hurt
[15:15:27] And on the "hot backup" host too
[15:15:41] But I will restart it on the backup source (db1217) once I get the ok from brennen
[15:16:10] I will remove my screen, though, to prevent accidents
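A minimal sketch of the on-host prepare step mentioned at 11:53:49, assuming the snapshot has already been decompressed into /srv/sqldata on the destination host; the exact backup tool (xtrabackup vs. mariabackup), paths, ownership, and service name are assumptions, not the confirmed procedure:

    # Hypothetical: apply the redo log so the copied datadir is consistent.
    # This is the step that cannot be done beforehand on dbprov1004 (pre-10.6).
    xtrabackup --prepare --target-dir=/srv/sqldata
    # Hypothetical: fix ownership before the service touches the files.
    chown -R mysql:mysql /srv/sqldata
    # Start MariaDB only after the prepare step has completed successfully.
    systemctl start mariadb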
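For reference, pausing and later resuming replication on the secondary-DC and "hot backup" replicas (as discussed from 15:14:52 onwards) comes down to the standard MariaDB statements; this is a generic sketch, not the exact commands or tooling used on these hosts:

    # Generic sketch, run on the replica to pause replication during the migration.
    mysql -e "STOP SLAVE;"
    # Check that both replication threads report "No" and no error is shown.
    mysql -e "SHOW SLAVE STATUS\G"
    # Later, once the migration is confirmed good (e.g. the ok from brennen),
    # resume replication, for example on the backup source (db1217).
    mysql -e "START SLAVE;"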
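The fallback mentioned at 15:05:52, promoting db1119 to master, would in the simplest case look roughly like the outline below; this is purely illustrative, using only standard MariaDB statements, and does not reflect the site-specific switchover tooling that would normally be used:

    # Purely illustrative outline of promoting a replica to master in an emergency.
    mysql -e "STOP SLAVE;"                 # stop applying events from the old master
    mysql -e "SHOW SLAVE STATUS\G"         # confirm everything received has been applied
    mysql -e "RESET SLAVE ALL;"            # drop the old replication configuration
    mysql -e "SET GLOBAL read_only = 0;"   # allow writes on the new master
    # Remaining replicas would then be re-pointed with CHANGE MASTER TO, and
    # proxy/application configuration updated to the new master.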