[04:36:18] Starting to switch s7 codfw master
[05:54:26] Hi! I have been getting 500 errors with "Error in Cassandra table storage backend" for Wikidata RESTBase requests for a while now. See https://phabricator.wikimedia.org/T366414 . Do I need to migrate to the new REST API?
[06:16:53] Going to switch s4 codfw master
[06:17:11] CountCount: I guess we'll need urando.m to answer that, but he's in a different timezone
[06:17:42] CountCount: I've subscribed him to the ticket
[06:18:02] marostegui: Thank you.
[06:49:33] s4 codfw done
[08:56:22] db2207 repool is done, will retry the s2 codfw master switch
[09:19:40] fyi I'm having an issue on codfw s2
[09:20:49] looks like a semi-sync issue, trying to figure out what's triggering the Slave_SQL_Running_State: Waiting for semi-sync ACK from slave
[09:27:44] still having the error on db2207 after disabling semi-sync replication on all replicas
[09:29:23] I'd be tempted to revert my switchmaster
[09:29:50] but I lack the experience to be sure it's doable in this state (cc Amir1 marostegui )
[09:51:21] I can't depool replicas because I'd go past the depool threshold that dbctl allows
[10:27:13] arnaudb: let me check
[10:27:24] Amir1: wait
[10:27:28] incident is over
[10:27:34] oh okay
[10:28:07] deus ex machina/timeout, I don't know what, but something unclogged whatever the codfw s2 master wasn't handling
[10:46:54] Emperor: Sorry, but I saw this alert, I don't know if you're aware:
[10:46:55] FIRING: [7x] DiskSpace: Disk space thanos-be1001:9100:/srv/swift-storage/sdd1 5.577% free
[10:49:51] Amir1: thanks for mentioning; it's related to T351927 (and I have pinged g.odog on the ticket already, last week, I think/hope they're on it)
[10:49:51] T351927: Decide and tweak Thanos retention - https://phabricator.wikimedia.org/T351927
[10:50:17] ah, thanks. Sorry for the extra ping
[10:59:16] no probs, the odd extra ping is better than something getting ignored :)
[11:53:34] in the past three months, clouddb1021 shrank by 1.6TB: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=clouddb1021&var-datasource=thanos&var-cluster=mysql&viewPanel=28&from=1710265806319&to=1718020361201
[13:30:44] gmeet just killed my browser, with you shortly.
[14:45:47] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 12 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[14:45:47] PROBLEM - MariaDB sustained replica lag on s4 on db1221 is CRITICAL: 46.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104
[14:48:47] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[14:52:47] RECOVERY - MariaDB sustained replica lag on s4 on db1221 is OK: (C)10 ge (W)5 ge 4.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1221&var-port=9104
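
For context on the semi-sync exchange around 09:20-09:30: below is a minimal MariaDB SQL sketch of the kind of checks involved, i.e. spotting a primary stalled on "Waiting for semi-sync ACK from slave" and toggling semi-sync replication off. This is illustrative only, assuming MariaDB's built-in rpl_semi_sync_* variables; it is not the exact procedure used during the incident, where changes go through the normal tooling (Puppet/dbctl) rather than ad-hoc SQL.

```sql
-- Sketch only: inspecting and disabling semi-sync replication on MariaDB.
-- Assumes the built-in semi-sync implementation (rpl_semi_sync_* variables).

-- On the primary: is semi-sync active, and how many replicas are ACKing?
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_%';
SHOW PROCESSLIST;  -- look for threads in "Waiting for semi-sync ACK from slave"

-- On a replica: check the replication threads and the semi-sync slave status.
SHOW SLAVE STATUS\G
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_slave_status';

-- Temporarily disable semi-sync on a replica (runtime change, not persisted);
-- the IO thread has to reconnect for the change to take effect.
SET GLOBAL rpl_semi_sync_slave_enabled = OFF;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Or, on the primary, drop back to asynchronous replication entirely.
SET GLOBAL rpl_semi_sync_master_enabled = OFF;
```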