[01:10:38] PROBLEM - MariaDB sustained replica lag on m1 on db1217 is CRITICAL: 8.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:12:06] RECOVERY - MariaDB sustained replica lag on m1 on db1217 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1217&var-port=13321
[01:13:24] PROBLEM - MariaDB sustained replica lag on m1 on db2160 is CRITICAL: 17.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[01:14:52] RECOVERY - MariaDB sustained replica lag on m1 on db2160 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2160&var-port=13321
[05:40:27] I am going to failover m5 proxy
[06:37:38] good morning y'all
[06:41:44] arnaudb: Good morning and welcome o/
[06:42:41] thank you!
[06:51:17] good morning arnaudb
[06:52:08] hello jynus
[06:56:24] arnaudb: welcome! (Manuel here)
[06:58:22] thank you!
[09:23:46] Emperor: o/
[09:24:05] elukey: he's out today
[09:24:09] when you have a moment do you mind to let me know if https://phabricator.wikimedia.org/T345058 is crazy?
[09:24:12] marostegui: ahhh okok
[09:24:22] I have a cassandra procedure to do
[09:24:41] * marostegui runs
[09:24:52] elukey: you might want to ping urando.m once he's online
[09:25:11] <3
[09:25:32] people don't trust me in #serviceops, I thought to get some love in here, but same thing
[09:25:43] :D :D :D
[09:25:44] elukey: maybe it is time to get the hint
[09:26:11] marostegui: exactly yes, thanks for the direct feedback :D
[09:26:29] my pleasure, you know you can always come here for some more
[09:28:28] I will
[09:29:22] <3
[09:31:50] <3
[10:20:37] marostegui: I haven't touched s6 for reboots nor schema changes. But that's basically the only section left in some cases. Can I do stuff on it? :D
[10:20:50] yeah
[10:21:00] I need to first test RBR on sanitarium on codfw
[10:21:11] Amir1: However, be _very_ careful with sanitarium hosts
[10:21:18] How does the script stop mariadb?
[10:21:36] https://gerrit.wikimedia.org/r/c/operations/software/+/830659/1/dbtools/auto_schema/rolling_restart.py
[10:21:42] stop slave;
[10:21:46] then stop mariadb
[10:22:01] Actually, you've rebooted them before already for all the other sections, right?
[10:22:09] I'm not planning to touch sanitarium yet. Its master will do that though
[10:22:18] Fine yeah
[10:22:20] The master is ok
[10:22:31] cool
[10:22:45] I'll leave rebooting the sanitariums to you, you seem to like it :D
[10:22:52] :-(
[10:23:14] if you do the one that has s6 on it, I can take care of the rest
[10:23:45] https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=1&var-server=db1106&var-datasource=thanos&var-cluster=wmcs&from=1693202599462&to=1693212280891
[10:23:47] yeah I need to install 10.6 there
[10:23:53] btw, extlinks drop of s1: 60GB
[10:23:57] It is going to take me a few days, I want to test stuff on codfw
[10:24:15] sure, I'm on clinic duty this week
[10:24:16] no rush
[13:12:22] Amir1: Going to restart db1155 RIP
[13:12:33] Let's see what breaks this time :-/
[13:12:46] 🤞
[13:13:00] I should make it a cron job
[13:13:18] like chaos monkey but instead of monkey, a godzilla
[13:13:47] bonus point: Doing it on 2am sundays
[13:17:54] ok, host back and replication started
[13:17:55] we'll see
[13:49:29] jynus: can you remind me when m2 backups are taken?
[13:49:46] basically when I can stop m2 backup source for a couple of hours
[13:50:12] right now until 0 utc
[13:50:20] ok! thanks!
[13:50:39] or starting I think around 14 or so (they take a long while because of OTRS, precisely)
[13:50:57] So, you mean they are starting nowish or that I can do it nowish :)
[13:50:57] tomorrow
[13:51:05] now it is free
[13:51:12] great thanks
[14:11:09] marostegui: how did you learn about DC switchover! :-D
[14:11:11] ?
[14:11:38] jynus: The switchover ghost is always around us
[14:11:48] uuuuuuuuuh
[14:12:22] do you know who is leading it?
[14:12:51] I don't know who is specifically doing it this year from serviceops, nop
[14:12:54] claime ^
[14:13:24] kamila will handle it, but we have a kickoff meeting this week to determine exactly when it happens
[14:13:37] nice
[14:13:49] It should have been sept 20th, but it probably won't be
[14:13:56] claime: could you tell her to talk to me once that is decided?
[14:13:59] ofc
[14:14:04] thanks <3
[15:05:08] extlinks in s4 drops 230GB on some hosts and 140GB on some others https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=28&orgId=1&var-server=db2099&var-datasource=thanos&var-cluster=wmcs&from=1693216211027&to=1693235021027
[15:44:38] sobanski: db1118 is ready, I will post on the task
[18:51:25] jynus: is there an onboarding task/page already for arnaudb ?
[18:51:57] yes
[18:52:02] I will pm it to you