[04:36:18] Starting to switch s7 codfw master
[05:54:26] Hi! I have been getting 500 errors with "Error in Cassandra table storage backend" for Wikidata RESTBase requests for a while now. See https://phabricator.wikimedia.org/T366414 . Do I need to migrate to the new REST API?
[06:16:53] Going to switch s4 codfw master
[06:17:11] CountCount: I guess we'll need urando.m to answer that, but he's in a different timezone
[06:17:42] CountCount: I've subscribed him to the ticket
[06:18:02] marostegui: Thank you.
[06:49:33] s4 codfw done
[08:56:22] db2207 repool is done, will retry the s2 codfw master switch
[09:19:40] fyi I'm having an issue on codfw s2
[09:20:49] looks like a semi-sync issue, trying to figure out what's triggering the Slave_SQL_Running_State: Waiting for semi-sync ACK from slave
[09:27:44] still having the error on db2207 after disabling semi-sync replication on all replicas
[09:29:23] I'd be tempted to revert my switchmaster
[09:29:50] but I lack the experience to be sure it's doable in this state (cc Amir1 marostegui )
[09:51:21] I can't depool replicas because I'd go past the depool threshold that dbctl allows
[10:27:13] arnaudb: let me check
[10:27:24] Amir1: wait
[10:27:28] incident is over
[10:27:34] oh okay
[10:28:07] deus ex machina/timeout, I don't know what, but something unclogged whatever the codfw s2 master wasn't handling
[10:46:54] Emperor: Sorry, but I saw this alert, I don't know if you're aware:
[10:46:55] FIRING: [7x] DiskSpace: Disk space thanos-be1001:9100:/srv/swift-storage/sdd1 5.577% free
[10:49:51] Amir1: thanks for mentioning; it's related to T351927 (and I have pinged g.odog on the ticket already, last week, I think/hope they're on it)
[10:49:51] T351927: Decide and tweak Thanos retention - https://phabricator.wikimedia.org/T351927
[10:50:17] ah, thanks. Sorry for the extra ping
[10:59:16] no probs, the odd extra ping is better than something getting ignored :)
[11:53:34] in the past three months, clouddb1021 shrank by 1.6TB: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddb1021&var-datasource=thanos&var-cluster=mysql&viewPanel=28&from=1710265806319&to=1718020361201
[13:30:44] gmeet just killed my browser, with you shortly.
[14:45:47] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 12 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[14:45:47] PROBLEM - MariaDB sustained replica lag on s4 on db1221 is CRITICAL: 46.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104
[14:48:47] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[14:52:47] RECOVERY - MariaDB sustained replica lag on s4 on db1221 is OK: (C)10 ge (W)5 ge 4.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104
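
For context on the semi-sync exchange around 09:20-09:30: below is a minimal MariaDB SQL sketch of the kind of checks involved, i.e. spotting a primary stalled on "Waiting for semi-sync ACK from slave" and toggling semi-sync replication off. This is illustrative only, assuming MariaDB's built-in rpl_semi_sync_* variables; it is not the exact procedure used during the incident, where changes go through the normal tooling (Puppet/dbctl) rather than ad-hoc SQL.

```sql
-- Sketch only: inspecting and disabling semi-sync replication on MariaDB.
-- Assumes the built-in semi-sync implementation (rpl_semi_sync_* variables).

-- On the primary: is semi-sync active, and how many replicas are ACKing?
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_%';
SHOW PROCESSLIST;  -- look for threads in "Waiting for semi-sync ACK from slave"

-- On a replica: check the replication threads and the semi-sync slave status.
SHOW SLAVE STATUS\G
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_slave_status';

-- Temporarily disable semi-sync on a replica (runtime change, not persisted);
-- the IO thread has to reconnect for the change to take effect.
SET GLOBAL rpl_semi_sync_slave_enabled = OFF;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Or, on the primary, drop back to asynchronous replication entirely.
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
```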