[04:28:03] I'm going to start switching the s5 primary
[05:40:48] how can pagelinks be 120G in frwiki :O
[07:09:09] I just pushed this https://gerrit.wikimedia.org/r/c/operations/puppet/+/1042825 - it had been missed for years XD
[08:06:52] marostegui: 10:05:14 <+icinga-wm_> PROBLEM - MariaDB Replica SQL: s2 on db2125 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table recentchanges is corrupt: try to repair it on query. Default database: cswiki. [Query snipped]
[08:06:52] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica do you want me to rebuild the index? or do you want to take a look?
[08:09:12] I will get that fixed, arnaudb, thanks!
[08:09:20] arnaudb: can you create a task?
[08:09:23] so we can track these things
[08:09:33] I've added it to T367261
[08:09:33] T367261: Rebuild recentchanges table everywhere - https://phabricator.wikimedia.org/T367261
[08:09:36] do you want a separate one?
[08:09:41] had a rebuild already happened there and failed afterwards, or hadn't it been touched yet?
[08:10:05] jynus: not done yet
[08:10:30] is it scheduled everywhere? Otherwise I am considering doing it preventively on the backup sources
[08:10:37] yes, it is scheduled everywhere
[08:10:44] ok, then not touching anything
[08:10:44] s3 is done and s6 is running
[08:10:45] jynus: you can follow on T367261
[08:10:48] thanks
[08:11:04] (the host is fixed)
[08:11:16] are you repooling it or do you want me to, marostegui?
[08:11:22] arnaudb: please do :)
[08:11:23] thanks
[08:11:25] ack
[09:26:15] I'm going afk for a bit
[11:54:46] jynus: The cumin2002 reboot will happen in an hour, are your backup jobs already completed?
[11:55:33] yeah, 45 minutes ago
[11:55:43] thank you for asking, moritzm!
[11:56:05] ack, thanks for double-checking, then I'll go ahead in an hour
[11:56:11] actually, the codfw one was 3 hours ago, 45 minutes ago for eqiad
[11:56:37] what would be a good time to do the same for cumin1002?
[11:56:56] any time after 11:45 UTC
[11:58:03] (I've included a buffer there already)
[13:05:20] ^ marostegui, arnaudb, Amir1: what would be a good day next week to reboot cumin1002?
[13:05:46] I've got no schema change scheduled so I won't be blocking here :)
[13:05:57] later in the week would be better for me, I have an s2 schema change that's going to take a while
[13:06:25] e.g. Thursday in a week?
[13:06:30] SGTM
[13:11:12] Sounds good to me too
[13:11:21] Can you send a calendar invite so we are all aware?
[13:11:43] I'll mention it in the team meeting as a reminder
[13:11:53] Amir1: can you do the calendar invite?
[13:12:09] sure
[13:12:12] I will (with invites for you, Amir1, jynus and arnaudb)
[13:12:18] thanks!
[13:13:08] okay, since Moritz is doing it, I won't.
[13:15:14] just done it
[13:16:04] marostegui: is the old master of s5 (eqiad, db1230) free now?
[13:26:40] It is
[13:26:40] Hi folks, anyone up for reviewing https://gerrit.wikimedia.org/r/c/operations/puppet/+/1043061 please? Changing how we partition the moss* nodes
[13:27:10] mostly flagrantly stolen from ms-be_simple.cfg, but only 1 big partition, and no LVM erasure
[13:27:59] Danke moritzm
[13:43:36] awesome. gracias marostegui
[13:43:41] de nada XD
[13:58:40] marostegui: we're testing with volans the new spicerack modules on db1125, is it ok for us to trash the databases on it?
[13:59:02] yeah, you can do whatever you want with it, but leave it up once you are done
[13:59:08] so it doesn't alert on orchestrator
[14:00:03] sure!
[14:00:35] <3 thx
[14:09:18] I would be grateful if anyone could look at my CR today, please, so I can try reimaging some of the apus/moss nodes...
[15:09:15] Also https://gerrit.wikimedia.org/r/c/operations/puppet/+/1043115 to set up the codfw apus cluster, although that's a bit less of an immediate blocker
[15:10:15] done!
[15:11:06] and done!
[15:12:46] marostegui: FYI in the end we had some mercy and didn't trash the database on db1125, we just restarted it a few times and ran an upgrade
[15:13:14] (mercy for me, who would have had to restore the dbs :p)
[15:13:29] also for the db itself
[15:13:30] :D
[15:13:48] as the French would say: merci.
[15:14:14] (sorry, I'm getting tired x)
[15:15:59] arnaudb: thanks :)
[15:24:08] I'm tracking down the remaining dependencies on the Puppet 5 infra: which servers access the swift ring data from volatile? The data is currently synced from both the Puppet 5 and Puppet 7 servers and I'm wondering if we can already stop doing so for Puppet 5
[15:26:39] moritzm: all the nodes in the relevant swift cluster (so ms* and thanos-*)
[15:28:24] ok, given those are all on Puppet 7 by now, I'll make a patch to drop the Puppet 5 sync, then
[19:50:53] I don't know if people are aware, but there are now 180 disk space warnings for thanos, across many hosts (not just one): https://alerts.wikimedia.org/?q=alertname%21%3DSystemdUnitFailed&q=team%3D~data-persistence&q=%40state%3Dactive
[20:30:12] Hey all! A contractor, thisdot, is doing some work on an improved vanishing workflow and there are some schema changes associated with it. Would one of your team be able to take a quick gander at the patch? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/1042323
[20:31:11] (I'm following the "involve as many relevant people as possible" bit of https://wikitech.wikimedia.org/wiki/Schema_changes )
[20:37:00] @Amir1 apologies, just seen you've already looked at the task
[20:50:09] Seddon: ah, yeah, I think it's ready from the DBA point of view, I commented on the patches several times. OTOH, please file a "schema change in production" ticket now
[20:50:50] https://wikitech.wikimedia.org/wiki/Schema_changes
[20:55:25] yeah, clocked that! Will do! I might have further questions
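
For reference, the rebuild of the corrupted recentchanges index discussed around 08:06 could look roughly like the sketch below once the affected replica has been depooled per the linked wikitech "Depooling a replica" runbook. This is a minimal, illustrative sketch only, assuming the table is InnoDB and that a plain in-place rebuild is acceptable; the exact procedure the team uses may differ.

```sql
-- Minimal sketch, not the team's documented procedure: rebuild a corrupted
-- InnoDB table (errno 1034, "Index for table is corrupt") on a replica that
-- has already been depooled.

-- Pause replication on the affected replica before touching the table.
STOP SLAVE;

-- Rebuild the table, its clustered index, and all secondary indexes in place.
ALTER TABLE cswiki.recentchanges FORCE;

-- For InnoDB, OPTIMIZE TABLE maps to the same kind of rebuild:
-- OPTIMIZE TABLE cswiki.recentchanges;

-- Resume replication once the rebuild has completed, then repool the host.
START SLAVE;
```

ALTER TABLE ... FORCE recreates the whole table and all of its indexes, which is normally enough to clear errno 1034 index corruption without restoring from backup.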