[00:04:22] PROBLEM - MariaDB sustained replica lag on es4 on es1022 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es1022&var-port=9104
[00:05:22] RECOVERY - MariaDB sustained replica lag on es4 on es1022 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=es1022&var-port=9104
[07:21:23] poor db1234, its name was too perfect to type on a TKL
[07:21:27] is it me, or have we had a lot of sad database servers recently?
[07:22:07] I can't speak for "before me", but the rate has indeed increased recently!
[08:35:44] PROBLEM - MariaDB sustained replica lag on s4 on db2206 is CRITICAL: 44 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2206&var-port=9104
[08:37:46] RECOVERY - MariaDB sustained replica lag on s4 on db2206 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2206&var-port=9104
[10:49:51] morning, everyone. back at the office. please let me know if there's anything that needs following up
[10:50:00] kwakuofori: welcome back :)
[10:50:31] thanks, kormat
[12:07:39] corruption on db1156
[12:08:38] probably just needs an index rebuild; it is not a mw host, so not jumping in
[12:11:56] I'll get to it
[13:52:25] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 6480 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[13:55:16] ^ expected?
[13:55:49] I guess it is fallout from db1156
[13:58:06] don't know, but it's reducing
[13:58:58] and it's back in sync
[13:59:04] nice
[13:59:13] that was due to db1156
[13:59:19] I fixed it
[13:59:27] great!
[13:59:33] https://phabricator.wikimedia.org/T363161
[14:03:26] RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[14:13:46] PROBLEM - MariaDB sustained replica lag on s7 on db2122 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2122&var-port=9104
[14:14:46] RECOVERY - MariaDB sustained replica lag on s7 on db2122 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2122&var-port=9104
[14:39:45] urandom: o/ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1021915 when you have time :)
[15:07:00] elukey: oh boy, it's happening!
[15:14:17] urandom: yesss.. ok if we restart restbase-codfw now?
[15:14:27] so it picks up the truststore
[15:23:24] elukey: did you canary it somewhere?
[15:23:43] restbase always makes me (extra) nervous
[15:23:58] but in general, I'm ok to proceed, yes
[15:24:03] elukey: ^^^
[15:25:23] urandom: I didn't; if you want, I can disable puppet, run it on one node, restart the cassandras, and you can double-check
[15:25:26] then we can proceed
[15:25:28] wdyt?
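
(Editor's aside: a minimal sketch of the canary procedure elukey describes at [15:25:23]. This is not the actual change that was run; the host selection, the cassandra-a/cassandra-b unit names, and the log checks are all assumptions for illustration.)

    # Disable puppet fleet-wide for the affected role (at WMF typically via cumin),
    # so only the chosen canary host picks up the new truststore.
    sudo puppet agent --disable "canary: restbase truststore rollout"

    # On the canary host only: re-enable puppet and run it to deploy the truststore.
    sudo puppet agent --enable
    sudo puppet agent --test

    # Restart the Cassandra instances on that host so they load the new truststore.
    # Multi-instance hosts use per-instance units; the names here are assumed.
    sudo systemctl restart cassandra-a.service cassandra-b.service

    # Verify the node rejoins the ring and shows no TLS errors before proceeding.
    nodetool status
    sudo journalctl -u cassandra-a.service --since "10 minutes ago" | grep -iE 'error|ssl'

Only once the canary looks healthy would puppet be re-enabled and the remaining restbase-codfw hosts restarted.
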
[15:37:02] Last job for this section: dump.matomo.2024-04-23--04-13-04 failed! CC btullis I will check now why it failed, but FYI, in case there is known maintenance or something
[15:37:59] Error connecting to database: Access denied for user 'dump'@'10.64.16.31'
[15:39:06] ^ Amir1 I'm almost sure you haven't touched the dump grants for db1208.eqiad.wmnet, but asking so we can rule that out first, as it is the easier option
[15:39:29] most likely it will be something IP-related, or maintenance, or something like that
[15:39:37] I haven't touched dump grants
[15:40:00] at least as far as I remember
[15:40:07] thank you, as I expected
[15:40:30] then debugging the hard way :-D
[15:41:14] yeah, the user is gone. Probably it was cloned from production
[15:41:22] (the data)
[15:41:35] will ask the DE team
[15:43:03] probably T349397
[15:43:04] T349397: Migrate the matomo host to bookworm - https://phabricator.wikimedia.org/T349397
[15:43:07] will comment there
[15:46:11] I reopened https://phabricator.wikimedia.org/T349397#9736182 in case Ben reads this later, as I am 99% sure it is that issue, and it should be easy to correct.
[15:50:37] elukey: up to you; starting w/ codfw is a canary too, in a way
[15:56:16] urandom: will do it tomorrow :)
[15:56:56] :)
[15:58:14] sorry for being slow to respond, my broadband has been a bit flaky, and it seems like a few notifications have been delayed (or dropped)
[15:59:45] I just got a text apology from Google, notice of a (ridiculously small) credit, and a promise that the disruptions are over 🙂
[16:10:42] jynus: Oh, sorry. I'm totally sure that's me. I migrated matomo from matomo1002 to matomo1003 last week and I probably just dropped the grant by mistake.
[16:12:48] there was no mistake, just checking why
[16:13:05] I will add the account, retry the backups and update the ticket
[16:20:46] urandom: np! I'll wait for you tomorrow before proceeding so we can check the canary together; restbase makes me nervous as well :D
[16:20:59] but the worst case would probably be sessionstore :D
[16:21:53] nothing in restbase ever works as it ought to
[16:22:31] sessionstore I expect to be easy and carefree, but obviously there is more at stake :)
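
(Editor's aside: a hedged sketch of restoring the missing 'dump' grant discussed between [15:37:59] and [16:13:05]. The target host, the "piwik" schema name, and the privilege list are assumptions; in practice the account is managed through the usual grants workflow and the password comes from the private repo, not from here.)

    # Assumed location of the matomo database after the matomo1002 -> matomo1003 move.
    DB_HOST=matomo1003.eqiad.wmnet

    # Recreate the backup account and grant it read/dump privileges on the matomo schema.
    mysql -h "$DB_HOST" <<'SQL'
    CREATE USER IF NOT EXISTS 'dump'@'10.64.16.31' IDENTIFIED BY '<password from private repo>';
    GRANT SELECT, SHOW VIEW, TRIGGER, LOCK TABLES ON piwik.* TO 'dump'@'10.64.16.31';
    SQL

After that, the failed dump.matomo job would be retried and the connection confirmed before updating T349397.
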