[00:08:18] serviceops, Observability-Logging: Mutate mmkubernetes k8s fields into ECS fields - https://phabricator.wikimedia.org/T292881 (colewhite)
[00:55:19] serviceops, MW-on-K8s, MediaWiki-SettingsLoader, Continuous-Integration-Config, Patch-For-Review: Install php-yaml for use by SettingsLoader - https://phabricator.wikimedia.org/T296331 (Legoktm) In progress→Resolved Done \o/
[00:56:58] serviceops, DBA, Patch-For-Review: Shutdown Tendril and dbtree - https://phabricator.wikimedia.org/T297605 (Dzahn)
[09:13:21] serviceops, Service-deployment-requests: New Service Request memcached-wikifunctions - https://phabricator.wikimedia.org/T297815 (MatthewVernon)
[10:17:15] hello folks
[10:17:34] if you are ok, now that kafka-main2003 has bios+nic upgraded, I can attempt a reimage to buster
[10:20:47] <_joe_> +1
[10:31:33] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2003.codfw.wmnet with OS buster
[10:39:36] ok the bios + nic firmware helped, and the partman reuse recipe worked, it is installing buster
[10:47:21] \o/
[10:47:31] <_joe_> great
[10:51:53] nice :-)
[11:06:40] kafka-main2003 on buster and recovering missing data
[11:06:56] the partman reuse recipe worked, as well as the fixed uid/gid
[11:07:19] so we should be good to upgrade all the other main nodes
[11:08:40] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2003.codfw.wmnet with OS buster completed: - kafka-main2003 (**WARN**) - Downtimed on Ici...
[11:12:04] \o/
[11:16:54] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey)
[11:17:22] serviceops, SRE, ops-codfw: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed - https://phabricator.wikimedia.org/T297422 (elukey) Resolved→Open @Papaul the upgrade worked, I reimaged kafka-main2003 this morning! I'd need to upgrade kafka-mai...
[11:19:15] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey) The BIOS+NIC upgrade for kafka-main2003 worked, it is now running buster. The partman reuse recipe and the fixed uid/gid worked as well. Next steps: 1) Upgrade BIOS+NIC on kafka-main200...
[11:19:30] I left some notes for the next steps in --^
[11:20:36] I may be able to work with Papaul to upgrade the firmwares on main200[12] today/tomorrow, but the rest will be done in 2022 :)
[11:29:32] <_joe_> yeah :)
[11:29:45] <_joe_> thanks a ton for doing this
[11:41:02] serviceops, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥): PipelineLib deploy is broken and needs refactoring to use helm3 - https://phabricator.wikimedia.org/T297809 (Jelto) Hey, thanks for reaching out and sorry for the inconvenience with removal of `helm2`....
[12:22:21] serviceops, MW-on-K8s: On the kube-experimental mwdebug cluster, MediaWiki sees all edits as coming from localhost - https://phabricator.wikimedia.org/T297613 (Joe) Sadly the story is more complex; in fact, only requests coming from the edge contain X-Client-Ip by default, so we need to inject it into an...
[14:34:01] serviceops, SRE, ops-codfw: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed - https://phabricator.wikimedia.org/T297422 (elukey)
[14:34:26] folks Papaul may have time now for firmware upgrade, I'll take down kafka-main2001
[14:38:15] <_joe_> elukey: ++
[15:36:49] serviceops, SRE, ops-codfw: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed - https://phabricator.wikimedia.org/T297422 (Papaul)
[15:44:47] jelto, arnoldokoth: there's new Icinga alerts for gitlab-runner1001/2001: DISK CRITICAL - /run/docker/netns/3cd24ebca5bb is not accessible: Permission denied
[15:45:24] by default /etc/nagios/nrpe.d/check_disk_space.cfg checks all mounts
[15:45:57] but in this case it doesn't have the necessary permissions to access the nsfs mount used by Docker
[15:47:17] but there's a hiera option to pass custom flags: profile::base::check_disk_options
[15:48:28] and with that you can pass an additional option to /usr/lib/nagios/plugins/check_disk, e.g. -i "/run/docker"
[16:58:05] serviceops, SRE, ops-codfw: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed - https://phabricator.wikimedia.org/T297422 (Papaul)
[17:11:39] serviceops, SRE, ops-codfw: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed - https://phabricator.wikimedia.org/T297422 (Papaul) Open→Resolved
[17:11:43] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (Papaul)
[17:25:18] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey) Stalled→Open Thanks to Papaul, kafka-main200[12] are now ready to be reimaged :)
[17:27:53] kafka-main200[12] with firmware upgraded, I may be able to reimage them tomorrow :)
[17:51:29] moritzm: Looking into that.
[20:10:33] serviceops, Parsoid, Patch-For-Review: Compare Parsoid perf on current production servers vs a newer test server - https://phabricator.wikimedia.org/T297259 (Legoktm) OK, mw1456 is depooled and should have PHP/envoy configured the same as parsoid servers do. Once the train rolls out, I'll start runni...