[00:06:18] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[02:48:18] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:35:51] Toolforge-standards-committee (Maintainer needed): wikistream.toolforge.org needs new maintainers - https://phabricator.wikimedia.org/T251555#10174070 (Pintoch) p:Low→Triage
[03:39:45] Toolforge-standards-committee (Maintainer needed): wikistream.toolforge.org needs new maintainers - https://phabricator.wikimedia.org/T251555#10174083 (Pintoch) Sorry about that! I thought it could be interesting to get a better overview of the urgency of tasks in this board by trying to get a sense of how u...
[03:43:00] (PS1) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158)
[03:46:28] (CR) CI reject: [V:-1] [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[03:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:06:18] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[04:39:01] (approved) raymond-ndibe: maintain-kubeusers: bump to 0.0.169-20240924215037-64da2c2e [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/529 (https://phabricator.wikimedia.org/T375157) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[04:39:05] (merge) raymond-ndibe: maintain-kubeusers: bump to 0.0.169-20240924215037-64da2c2e [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/529 (https://phabricator.wikimedia.org/T375157) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[04:40:40] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[04:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[04:42:34] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[04:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:15:27] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[05:15:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:16:45] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[05:16:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:17:36] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[05:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:18:52] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[05:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:29:37] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster
[05:32:21] !log raymond-ndibe@cloudcumin1001 toolsbeta END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the toolsbeta cluster
[05:32:46] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster
[05:33:10] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the toolsbeta cluster
[05:34:39] Data-Services, DBA: Prepare and check storage layer for moswiki - https://phabricator.wikimedia.org/T375568#10174133 (ABran-WMF) a:ABran-WMF
[05:37:49] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-10
[05:37:51] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-10
[05:38:09] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-10
[05:38:10] !log raymond-ndibe@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-10
[05:38:56] !log raymond-ndibe@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the toolsbeta cluster
[05:48:17] !log raymond-ndibe@cloudcumin1001 toolsbeta Added a new k8s worker-nfs toolsbeta-test-k8s-worker-nfs-10.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[05:48:17] !log raymond-ndibe@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the toolsbeta cluster
[05:49:15] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[05:49:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:50:32] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[05:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[05:59:52] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[05:59:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:06:55] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[06:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:16:34] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[06:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:23:52] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[06:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:25:12] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[06:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:32:06] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[06:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:33:56] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_node for host toolsbeta-test-k8s-worker-nfs-7
[06:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:48:06] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[06:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[06:48:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:49:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[06:55:23] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[06:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:02:24] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[07:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:04:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[07:15:10] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-nfs-7
[07:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:32:29] !log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host toolsbeta-test-k8s-worker-nfs-7
[07:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:32:56] (open) hashar: Remove unused #wikimedia-quibble [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/7 (https://phabricator.wikimedia.org/T346901)
[07:46:17] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-nfs-7
[07:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[07:51:29] (PS2) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158)
[07:54:56] (CR) CI reject: [V:-1] [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[07:56:07] Toolforge (Toolforge iteration 15), Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417#10174288 (dcaro)
[07:56:18] (merge) jjmc89: Remove unused #wikimedia-quibble [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/7 (https://phabricator.wikimedia.org/T346901) (owner: hashar)
[07:57:45] Toolforge (Toolforge iteration 15): [builds-cli,builds-api] `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701#10174286 (dcaro)
[07:57:49] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15), Goal: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#10174294 (dcaro)
[07:58:20] cloud-services-team, Toolforge (Toolforge iteration 15): toolforge: Refresh certs that are not controlled by kubeadm (mid 2024 edition) - https://phabricator.wikimedia.org/T309782#10174290 (dcaro)
[07:58:34] Toolforge (Toolforge iteration 15), Patch-For-Review: [jobs-cli,jobs-api] quota shows different units for limit and usage - https://phabricator.wikimedia.org/T361120#10174283 (dcaro)
[07:59:05] cloud-services-team, Toolforge (Toolforge iteration 15), Patch-For-Review: Toolforge: Replace all bastion with grid-less bookworm based bastion hosts - https://phabricator.wikimedia.org/T314665#10174292 (dcaro)
[07:59:12] Toolforge (Toolforge iteration 15): [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#10174302 (dcaro)
[07:59:23] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7
[07:59:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:59:29] Toolforge (Toolforge iteration 15), Upstream: [builds-builder] golang based images get infinite nested loops for procfile entries - https://phabricator.wikimedia.org/T363417#10174300 (dcaro)
[08:00:25] Toolforge (Toolforge iteration 15), Upstream: [builds-builder,jobs-api,upstream] Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#10174298 (dcaro)
[08:00:27] Toolforge (Toolforge iteration 15), Documentation: [harbor,docs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#10174296 (dcaro)
[08:00:52] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7
[08:00:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:01:23] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15), Epic: [Hypothesis] WE6.3.1 Consulting Toolforge roots/maintainers - https://phabricator.wikimedia.org/T368601#10174314 (dcaro)
[08:01:33] cloud-services-team, Toolforge (Toolforge iteration 15): [infra,k8s,monitoring] Add an alert to warn when the prometheus k8s cert is about to expire - https://phabricator.wikimedia.org/T366579#10174317 (dcaro)
[08:01:47] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15), Patch-For-Review: [builds-builder,builds-api] Upgrade tekton - https://phabricator.wikimedia.org/T374908#10174310 (dcaro)
[08:02:05] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15), Patch-For-Review: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27 - https://phabricator.wikimedia.org/T359641#10174312 (dcaro)
[08:02:28] Toolforge (Toolforge iteration 15), Patch-For-Review: [jobs-api] Split the API, business, and k8s models - https://phabricator.wikimedia.org/T359808#10174323 (dcaro)
[08:02:51] Toolforge (Toolforge iteration 15), Patch-For-Review: [toolforge] Investigate authentication - https://phabricator.wikimedia.org/T363983#10174321 (dcaro)
[08:02:52] Toolforge (Toolforge iteration 15), Patch-For-Review: restarting a continuous jobs causes for some seconds two jobs are running side by side - https://phabricator.wikimedia.org/T375366#10174308 (dcaro)
[08:03:30] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#10174319 (dcaro)
[08:04:53] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15): [components-api] Get a skeleton of API webservice and implement `/tool//deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10174327 (dcaro)
[08:06:17] Toolforge (Toolforge iteration 15), Patch-For-Review: [jobs-api] Save business models in a DB - https://phabricator.wikimedia.org/T359650#10174325 (dcaro)
[08:06:18] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:06:23] Toolforge: [toolforge, toolforge-cli] Experiment with PyInstaller to package CLI tools for buildpack images - https://phabricator.wikimedia.org/T369693#10174350 (dcaro)
[08:06:26] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15), Epic: [Hypothesis] WE6.3.4 By building an "orchestrator" toolforge component (components-api) we will be able to automate most manually-triggered deployments - https://phabricator.wikimedia.org/T375199#10174352 (dcaro)
[08:06:29] Toolforge (Toolforge iteration 15): [maintain-dbusers] When it stops working (ex. nfs got stuck), it still replies as ok to prometheus - https://phabricator.wikimedia.org/T375224#10174353 (dcaro)
[08:06:30] Toolforge (Toolforge iteration 15): lima-kilo installation giving inconsistent result. Sometimes it works, sometimes it doesn't - https://phabricator.wikimedia.org/T375163#10174354 (dcaro)
[08:06:33] Toolforge: [toolforge-cli,jobs-cli,builds-cli,envvars-cli] Explore OpenAPI SDK tooling for client consolidation - https://phabricator.wikimedia.org/T356261#10174341 (dcaro)
[08:06:35] Toolforge (Toolforge iteration 15), Patch-For-Review: add --force to wmcs.toolforge.remove_k8s_node cookbook - https://phabricator.wikimedia.org/T375158#10174355 (dcaro)
[08:06:37] Toolforge (Toolforge iteration 15), Patch-For-Review: [lima-kilo] allow for the creation of a multi-node high availability cluster - https://phabricator.wikimedia.org/T374585#10174356 (dcaro)
[08:06:38] Toolforge (Toolforge iteration 15): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10174357 (dcaro)
[08:06:41] Toolforge (Toolforge iteration 15): [k8s,infra,cookbook] change the hiera under the -k8s-control prefix whet adding/removing an etcd node - https://phabricator.wikimedia.org/T371370#10174359 (dcaro)
[08:06:42] Toolforge (Toolforge iteration 15): Support HTTP health checks in jobs framework - https://phabricator.wikimedia.org/T362621#10174358 (dcaro)
[08:06:46] cloud-services-team, Toolforge (Toolforge iteration 15): [api-gateway] add alert for uptime - https://phabricator.wikimedia.org/T348633#10174361 (dcaro)
[08:06:50] cloud-services-team, Toolforge (Toolforge iteration 15), Patch-For-Review: [toolforge,storage] Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496#10174360 (dcaro)
[08:06:58] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS (Debian Buster Deprecation), Toolforge (Toolforge iteration 15), Epic, Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#10174362 (dcaro)
[08:07:29] Toolforge, Epic: [jobs-cli,builds-cli,toolforge-cli,webservice] Consolidate the Toolforge CLIs - https://phabricator.wikimedia.org/T356262#10174343 (dcaro)
[08:12:40] Toolforge (Toolforge iteration 15): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10174408 (aborrero)
[08:12:45] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367#10174407 (aborrero)
[08:23:28] (PS3) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158)
[08:24:13] !log raymondndibe@wmf3402 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
[08:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:26:06] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
[08:26:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[08:37:21] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: vxlan: verify nova proxy and floating IPs work with new VXLAN-based network - https://phabricator.wikimedia.org/T374828#10174493 (aborrero) Open→In progress
[08:38:57] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596 (aborrero) NEW
[08:40:22] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174521 (aborrero) Open→In progress p:Triage→Medium
[08:42:00] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174541 (aborrero) some checks: `lang=shell-session aborrero@arturo-test-vm5:~$ dig -x 172.16.129.103 +short arturo-test-vm5.cloudinfra-codfw1dev...
[08:56:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-7 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[08:59:42] cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10174566 (fnegri) Thanks @bd808, I'm fine with keeping labtestwiki around until Andrew is back. I have some more questions though :) * do we need to update the passwords detailed in the description of this...
[09:19:52] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10174599 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=087b480f-3f34-4877-a07a-3baa2b98f863) s...
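The remove_k8s_node and add_k8s_node runs above repeat START/END entries in a fixed shape (`!log user@host project START|END (STATUS) - Cookbook name (exit_code=N) ...`). A minimal Python sketch for tallying cookbook outcomes from such lines, assuming the format stays exactly as it appears in this log; the regex, `CookbookEvent`, `parse_cookbook_line` and `summarize` are hypothetical helpers, not part of wmcs-cookbooks or the SAL bot.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed shape of the cookbook !log lines in this channel, e.g.
# "!log raymondndibe@wmf3402 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host ..."
# START lines carry neither a status nor an exit code.
COOKBOOK_RE = re.compile(
    r"!log (?P<user>[\w.-]+)@(?P<host>[\w.-]+) (?P<project>\S+) "
    r"(?P<phase>START|END) (?:\((?P<status>[A-Z]+)\) )?- "
    r"Cookbook (?P<cookbook>[\w.]+)"
    r"(?: \(exit_code=(?P<exit_code>\d+)\))?"
)


@dataclass
class CookbookEvent:
    user: str
    host: str
    project: str
    phase: str               # "START" or "END"
    status: Optional[str]    # "PASS", "FAIL", "ERROR", or None for START
    cookbook: str
    exit_code: Optional[int]


def parse_cookbook_line(line: str) -> Optional[CookbookEvent]:
    """Return the parsed event, or None for lines that are not cookbook START/END entries."""
    m = COOKBOOK_RE.search(line)
    if m is None:
        return None
    code = m.group("exit_code")
    return CookbookEvent(
        user=m.group("user"),
        host=m.group("host"),
        project=m.group("project"),
        phase=m.group("phase"),
        status=m.group("status"),
        cookbook=m.group("cookbook"),
        exit_code=int(code) if code is not None else None,
    )


def summarize(lines: Iterable[str]) -> Counter:
    """Count END events per (cookbook, status); START events are ignored."""
    counts: Counter = Counter()
    for line in lines:
        ev = parse_cookbook_line(line)
        if ev is not None and ev.phase == "END":
            counts[(ev.cookbook, ev.status)] += 1
    return counts
```

Non-cookbook entries such as "Logged the message at ..." or the free-form "Added a new k8s worker-nfs ..." line return `None` and are skipped, so the whole channel log can be fed in unfiltered.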
[09:22:24] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174605 (aborrero)
[09:27:14] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174634 (aborrero) floating IPs DNS records have nothing to do with the new vxlan subnet: `lang=shell-session $ host bastion.bastioninfra-codfw1d...
[09:27:28] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174637 (aborrero) In progress→Resolved
[09:27:53] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: verify DNS integration for new VXLAN-based network - https://phabricator.wikimedia.org/T375596#10174652 (aborrero)
[09:37:36] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#10174720 (fnegri) Apologies for the lack of updates on this task, it remained on my radar but I didn't find the time for a more in-d...
[09:38:11] Tool-Global-user-contributions, Special:GlobalContributions, Stewards-and-global-tools, Epic, and 2 others: [Epic] Implement global contributions feature - https://phabricator.wikimedia.org/T337089#10174688 (kostajh)
[09:40:01] (open) aborrero: codfw1dev: drop a bunch of test and seemingly unused projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/56
[09:44:25] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[09:45:49] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[09:46:40] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[09:46:44] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:48:16] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[09:48:30] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:50:48] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on cloudcephosd1040:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[09:52:18] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[09:52:20] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:53:06] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[09:54:08] (merge) aborrero: codfw1dev: drop a bunch of test and seemingly unused projects [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/56
[09:54:26] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[09:54:59] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[09:59:12] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: cleanup and prepare security groups - https://phabricator.wikimedia.org/T375604 (aborrero) NEW
[10:02:01] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: cleanup and prepare security groups - https://phabricator.wikimedia.org/T375604#10174779 (aborrero) Open→In progress p:Triage→Medium
[10:23:31] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: cleanup and prepare security groups - https://phabricator.wikimedia.org/T375604#10174870 (aborrero) I'm using this script to cleanup security groups that belongs to projects that no longer exist: `lang=bash #!/bin/bash set -e IFS=$'\n' for...
[10:35:34] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: openstack: cleanup and prepare security groups - https://phabricator.wikimedia.org/T375604#10174888 (aborrero) saved output here: {P69412}
[10:44:06] cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10174926 (Ladsgroup) IMHO, since this has been the case for more than 1.5 years, it can wait for a couple more months too? wikiadmin is the user that connect mw CLI tools to the database, wikiuser is the us...
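The 10:23:31 comment on T375604 quotes a bash script for removing security groups that belong to projects that no longer exist, but the log cuts it off after `for...`. The sketch below is not that script; it only illustrates the selection step in Python, assuming the security groups and the current project IDs have already been fetched (for example from `openstack security group list` and `openstack project list`). `SecurityGroup`, `orphaned_groups` and the sample IDs are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class SecurityGroup(NamedTuple):
    # Hypothetical minimal record; real OpenStack security groups carry more fields.
    id: str
    name: str
    project_id: str


def orphaned_groups(
    groups: Iterable[SecurityGroup], existing_project_ids: Set[str]
) -> List[SecurityGroup]:
    """Return the groups whose owning project ID is not in the current project list."""
    return [g for g in groups if g.project_id not in existing_project_ids]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    groups = [
        SecurityGroup("sg-1", "default", "proj-live"),
        SecurityGroup("sg-2", "default", "proj-deleted"),
        SecurityGroup("sg-3", "web", "proj-deleted"),
    ]
    for g in orphaned_groups(groups, {"proj-live"}):
        # Dry run: report only; actual deletion would follow after review.
        print(f"would delete {g.id} ({g.name}) from missing project {g.project_id}")
```

Keeping selection separate from deletion makes a dry run (printing the candidates, as the saved output in {P69412} suggests was done) the default, with the destructive step applied only after review.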
[10:48:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:51:22] cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10174968 (fnegri)
[10:51:28] cloud-services-team, Cloud-VPS, Data-Persistence: Decommission clouddb2002-dev.codfw.wmnet - https://phabricator.wikimedia.org/T369308#10174969 (fnegri)
[10:51:59] cloud-services-team: update labtestwiki user and password - https://phabricator.wikimedia.org/T328289#10174966 (fnegri) p:Triage→Low @Ladsgroup thanks, so if I'm understanding correctly nothing is broken in labtestwiki, but both the CLI and the mw appserver are still using old passwords that have not...
[11:06:48] cloud-services-team: 2024-09-24 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375458#10174998 (fnegri) p:Triage→Medium
[11:07:53] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10175002 (fnegri)
[11:07:55] cloud-services-team: 2024-09-24 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375458#10175000 (fnegri) →Duplicate dup:T375223
[11:08:48] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: Improve WMCS NodeDown alerts - https://phabricator.wikimedia.org/T375479#10175007 (fnegri) p:Triage→Medium
[11:10:22] cloud-services-team: SystemdUnitDown Unit wmf_auto_restart_virtlogd.service on node cloudvirt1063 has been down for long. - https://phabricator.wikimedia.org/T375403#10175013 (fnegri) →Duplicate dup:T375223
[11:10:26] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: 2024-09-21 NodeDown cloudvirt1063 - https://phabricator.wikimedia.org/T375223#10175015 (fnegri)
[12:08:42] (open) aborrero: codfw1dev: track novaproxy.codfw1dev.wmcloud.org record [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/57 (https://phabricator.wikimedia.org/T374828)
[12:15:30] (open) aborrero: zones, records: add common description to resources [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/58
[12:21:07] (update) aborrero: zones, records: add common description to resources [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/58
[12:23:23] PROBLEM - Host cloudcephosd1025 is DOWN: PING CRITICAL - Packet loss = 100%
[12:24:08] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10175282 (dcaro) @wiki_willy Okok, the node is ready, I just shut it down and created a downtime for 180 days, you...
[12:34:28] FIRING: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[12:35:49] (CR) David Caro: [C:+1] "LGTM, some nits but good to merge whichever way" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[12:44:29] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-prometheus-7 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[12:49:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance metricsinfra-prometheus-2 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[12:54:38] (merge) dcaro: change setup order [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/190
[12:59:29] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-prometheus-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[13:00:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-prometheus-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[13:04:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance metricsinfra-prometheus-3 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[13:20:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:42:26] (open) dcaro: toolforge_get_versions: add ingress to the list [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/530
[13:42:28] (update) dcaro: toolforge_get_versions: add ingress to the list [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/530
[13:42:59] (close) dcaro: toolforge_get_versions: add ingress to the list [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/530
[13:46:47] Tool-Global-user-contributions, Special:GlobalContributions, Stewards-and-global-tools, Epic, and 2 others: [Epic] Implement global contributions feature - https://phabricator.wikimedia.org/T337089#10175639 (STran)
[13:51:04] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on cloudcephosd1040:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[13:57:18] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175692 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/70
[13:57:21] supertassu closed https://github.com/toolforge/quarry/pull/70
[13:59:18] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175693 (taavi) Open→Resolved a:taavi
[14:00:00] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175706 (LucasWerkmeister) Works for me now, thanks \o/
[14:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:15:26] !log raymondndibe@wmf3402 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host toolsbeta-test-k8s-worker-nfs-10
[14:15:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:21:59] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance toolsbeta-prometheus-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:25:21] Toolforge: [cookbook,infra] wmcs.toolforge.k8s.worker.drain failed to finish with `KeyError` on one node - https://phabricator.wikimedia.org/T364821#10175859 (taavi) Open→Resolved a:taavi this was fixed in https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1049131
[14:34:28] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance metricsinfra-prometheus-3 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:36:58] RESOLVED: WidespreadPuppetAgentFailure: Widespread puppet agent failures in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[14:39:29] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance tools-prometheus-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[14:48:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:04:33] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#10176057 (wiki_willy) Thanks @dcaro. @Jclark-ctr is out the rest of this week, but should be able to ship these o...
[15:32:43] (open) aborrero: codfw1dev: proxy-codfw1dev: track security groups [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/59
[15:41:16] (open) aborrero: secgroups: for delete_default_rules, use null to avoid changes [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/60
[15:42:13] (update) aborrero: codfw1dev: proxy-codfw1dev: track security groups [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/59
[15:50:14] cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 15): Intermittent redis connection timeouts in Toolforge - https://phabricator.wikimedia.org/T318479#10176235 (fnegri) Looks like timeouts are visible in sample-complex-app, which is using Python and Celery: ` tools.sample-complex-a...
[15:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[15:57:26] (open) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[15:57:30] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:15:54] (CR) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[16:20:13] (CR) David Caro: [C:+1] [wmcs-cookbooks.depool_and_remove_node] force node delete with --force (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[16:22:22] (CR) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[16:23:09] (update) dcaro: tekton: upgrade to v0.59.3 [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/61
[16:23:36] (update) dcaro: tekton: upgrade to v0.59.3 [repos/cloud/toolforge/builds-builder] - https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_requests/61
[16:32:10] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:34:36] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:35:23] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:37:00] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:37:55] (update) dcaro: builds-buidler: upgrade tekton [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/531
[16:56:13] (PS4) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158)
[16:56:25] cloud-services-team, DC-Ops, ops-eqiad, SRE, Patch-For-Review: Put cloudcephosd10[39-41] into service - https://phabricator.wikimedia.org/T372814#10176557 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1002 for host cloudcephosd1039.eqiad.wmnet with OS bul...
[17:00:01] (CR) Raymond Ndibe: [wmcs-cookbooks.depool_and_remove_node] force node delete with --force (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[17:00:49] (CR) CI reject: [V:-1] [wmcs-cookbooks.depool_and_remove_node] force node delete with --force [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1075348 (https://phabricator.wikimedia.org/T375158) (owner: Raymond Ndibe)
[18:48:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[20:07:57] Tool-video-answer-tool, Future-Audiences: Implement attribution requirements for demo video - https://phabricator.wikimedia.org/T374376#10177487 (Maryana)
[20:08:05] Tool-video-answer-tool, Future-Audiences, Spike: Image layout adjustment for lower-res images - https://phabricator.wikimedia.org/T375690 (Maryana) NEW
[20:10:25] Tool-video-answer-tool, Future-Audiences: Implement attribution requirements for demo video - https://phabricator.wikimedia.org/T374376#10177503 (Maryana)
[20:13:59] Tool-video-answer-tool, Future-Audiences: Implement attribution requirements for demo video - https://phabricator.wikimedia.org/T374376#10177514 (Maryana) a:etz
[21:25:54] cloud-services-team, DC-Ops, ops-eqiad, SRE: Put cloudcephosd10[39-41] into service - https://phabricator.wikimedia.org/T372814#10177726 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1002 for host cloudcephosd1039.eqiad.wmnet with OS bullseye completed: - cloudce...
[21:50:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:00:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:48:19] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudidm2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources