[08:00:20] Cteam: welcome to today 🦄! Don’t forget to post your update in thread.
[08:00:20] Feel free to include:
[08:00:20] 1. 🕫 Anything you'd like to share about your work
[08:00:20] 2. ☏ Anything you'd like to get help with
[08:00:20] 3. ⚠ Anything you're currently blocked on
[08:00:20] (this message is from a toolforge job under the admin project)
[11:53:51] Friday:
[11:53:53] * T345811, reimaged one cloudnet and failed over to it
[11:53:54] T345811: [openstack] Upgrade eqiad hosts to bookworm - https://phabricator.wikimedia.org/T345811
[11:53:55] * T349695, tweaked some settings and added slow query logging
[11:53:56] T349695: [toolsdb] MariaDB process is killed by OOM killer (October 2023) - https://phabricator.wikimedia.org/T349695
[11:53:57] Today:
[11:53:59] * T345811, reimaging the second cloudnet
[11:54:01] * T349695, checking the logs and looking for root causes
[11:54:03] * some code reviews
[13:16:16] done:
[13:16:16] * some toolforge mail server improvements (reviews welcome starting from https://gerrit.wikimedia.org/r/c/operations/puppet/+/971890)
[13:16:16] * some buildservice code reviews
[13:16:16] * made a patch to auto-restart sssd-nss after network blips (https://gerrit.wikimedia.org/r/c/operations/puppet/+/970728)
[13:16:16] * clinic duty stuff (toolforge access and quota requests)
[13:16:16] doing:
[13:16:17] * thinking about improving toolforge quota management
[15:31:48] Monday: I am going to actually try to work a full day today! I'm still messing with ceph (T348643) in the background but also hope to look at some recent cinder backup misbehavior. And I'll also likely spend a while re-stabilizing eqiad1 after dhinus reimages rabbitmq hosts :/
[15:31:49] T348643: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643