[07:30:16] good morning folks!
[07:30:31] if you are ok I'll start with the reimage of kafka-main1003
[07:30:51] (downtime + stop all services beforehand, kick-off reimage)
[08:08:11] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1003.eqiad.wmnet with OS buster
[08:42:06] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1003.eqiad.wmnet with OS buster completed: - kafka-main1003 (**WARN**) - Downtimed on Ici...
[08:42:22] kafka-main1003 on buster! Will proceed with 1002 in ~20/30 mins
[09:26:46] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1002.eqiad.wmnet with OS buster
[09:26:49] reimaging 1002
[09:59:46] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1002.eqiad.wmnet with OS buster completed: - kafka-main1002 (**WARN**) - Downtimed on Ici...
[10:00:11] and 1002 done
[10:00:13] one to go :)
[10:24:21] proceeding with 1001
[10:30:00] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1001.eqiad.wmnet with OS buster
[10:32:49] serviceops, SRE: Clean up old Docker images on deneb - https://phabricator.wikimedia.org/T287222 (jbond) Open→Resolved a:jbond I have cleaned this up, seems like an old build environment hadn't torn itself down properly. I have manually cleaned up
[11:03:14] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1001.eqiad.wmnet with OS buster completed: - kafka-main1001 (**WARN**) - Downtimed on Ici...
[11:09:53] aaand kafka main on busteR!
[11:12:35] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey) Open→Resolved a:elukey All nodes on buster!
[11:17:51] serviceops, MW-on-K8s, SRE-swift-storage, Shellbox, Patch-For-Review: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (Joe) I did the following test: # - Try to upload the image via Special:Upload to testwiki using "upload via url", which currently has `wmgUsePage...
[11:27:53] serviceops, Release-Engineering-Team, Scap: Deploy Scap version 4.1.1 - https://phabricator.wikimedia.org/T298986 (Joe) Open→Resolved
[14:07:22] elukey: woohooo thank you!
[15:13:16] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: setup/install kubestage100[34] - https://phabricator.wikimedia.org/T293729 (Arnoldokoth)
[15:34:10] serviceops, MW-on-K8s, SRE-swift-storage, Shellbox, Patch-For-Review: Support large files in Shellbox - https://phabricator.wikimedia.org/T292322 (Joe) On mwdebug1002 I have set the excimer time limit (in mediawiki's code), the envoy timeout, the apache timeout, the php-fpm request_terminate_...
[15:42:27] serviceops, Release-Engineering-Team, Scap: Deploy Scap version 4.1.1 - https://phabricator.wikimedia.org/T298986 (dancy) Thanks @Joe!
[16:32:58] serviceops, Data-Engineering, Patch-For-Review: Move kafka clusters to fixed uid/gid - https://phabricator.wikimedia.org/T296982 (herron)
[16:33:13] serviceops, Data-Engineering, Patch-For-Review: Move kafka clusters to fixed uid/gid - https://phabricator.wikimedia.org/T296982 (herron)
[16:37:58] serviceops, Patch-For-Review: Upgrade kafka-main nodes to buster - https://phabricator.wikimedia.org/T296641 (elukey)
[16:38:32] serviceops, Data-Engineering, Patch-For-Review: Move kafka clusters to fixed uid/gid - https://phabricator.wikimedia.org/T296982 (elukey) Open→Resolved a:elukey
[17:42:13] serviceops, DC-Ops, SRE, decommission-hardware, and 2 others: decommission kubestage100[12]-eqiad - https://phabricator.wikimedia.org/T299142 (RobH)
[20:52:34] serviceops, Release-Engineering-Team (Seen): contint hardware refresh - https://phabricator.wikimedia.org/T294276 (hashar)
[20:58:02] serviceops, Release-Engineering-Team (Seen): contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) There is now the new procurement ticket T299081 @hashar also see my comment about OS version on T299081#7620086
[20:58:25] serviceops, Release-Engineering-Team (Seen): contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn) So there will be contint2002 (which could or could not make this contint2001.mgmt debugging moot). There is also the question whether that has anything to do with an OS version upgrad...
[20:59:17] serviceops, Release-Engineering-Team (Seen): contint hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn)
[20:59:57] serviceops, Gerrit, SRE: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (Dzahn)
[21:00:48] serviceops, Gerrit, SRE: replacement for gerrit2001 - https://phabricator.wikimedia.org/T243027 (Dzahn) There is now the new procurement ticket T299081