[08:00:24] Cteam: welcome to today 🦄! Don’t forget to post your update in thread.
[08:00:24] Feel free to include:
[08:00:24] 1. 🕫 Anything you'd like to share about your work
[08:00:24] 2. ☏ Anything you'd like to get help with
[08:00:24] 3. ⚠ Anything you're currently blocked on
[08:00:24] (this message is from a toolforge job under the admin project)
[08:13:27] Done:
[08:13:27] * Not much, lots of things happening yesterday, was kind of a running-after-events day
[08:13:27] Doing:
[08:13:27] * With the ceph hiccup, we discovered that a bunch (14) ceph osd nodes have all their disks affected by bad sectors, and degrading, so started a task to replace them
[08:13:27] * Finished undraining all the ceph osd nodes, so the cluster is whole again, have to plan how to handle the disk issues + the drainage, but in any case, have to free some space if possible, so will continue digging for backups and deleted VMs to cleanup.
[08:13:27] * I want to add the bad sector gathering + alerting
[08:13:27] * I want to deploy the latest endpoint to the builds api
[08:13:28] * I want to add paging from metricsinfra
[08:13:28] * I want to add/look into alerts for the whole cloudvps/tools/paws/superset being down
[08:13:29] Blockers:
[08:13:29] * lack of time xd
[09:12:26] Yesterday:
[09:12:38] * T348643 (debugging disk errors)
[09:12:39] T348643: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643
[09:12:57] * T348668 (debugging Trove error)
[09:12:58] T348668: Trove instances not being created or restarted with configuration group applied - https://phabricator.wikimedia.org/T348668
[09:13:07] Today:
[09:13:23] * continuing with T348668
[09:14:12] * creating a decision request about our incident response process
[09:15:16] * fixing tox in wmcs-cookbooks
[09:24:12] yesterday:
[09:24:12] * dealt with ceph SlowOps fallout
[09:24:12] * updating https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes
[09:24:12] today:
[09:24:12] * poking at metricsinfra stuff
[09:24:13] blockers:
[09:24:13] * cloudvirt-wdqs moves
[09:24:14] * https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/2
[14:28:35] today:
[14:28:36] • Testing, documenting radosgw things