[00:07:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:12:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:18:41] (CloudVPSDesignateLeaks) firing: (3) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:21:19] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9699409 (Raymond_Ndibe) >>! In T335592#9691103, @bd808 wrote: > @Raymond_Ndibe I think this feature deserves a section on https://wikitech.wikimedia.org/wiki/Hel...
[00:21:48] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9699410 (Raymond_Ndibe) I think we can mark this as resolved now @taavi
[00:46:51] (ProbeDown) firing: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:51:51] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-5:30000 has failed probes (http_this_tool_does_not_exist_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:57:42] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-api,jobs-api,envvars-api,api-gateway] FIgure out and document how to do non-backwards compatible changes - https://phabricator.wikimedia.org/T356974#9699424 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforg...
[01:39:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[01:41:32] cloud-services-team, Infrastructure-Foundations, SRE, vm-requests: Site: 1 VM for codfw1dev bitu deployment - https://phabricator.wikimedia.org/T362128 (Andrew) NEW
[01:41:48] cloud-services-team, Infrastructure-Foundations, SRE, vm-requests: Site: 1 VM for codfw1dev bitu deployment - https://phabricator.wikimedia.org/T362128#9699485 (Andrew)
[01:41:53] cloud-services-team, wikitech.wikimedia.org, Epic: Set up a bitu instance for codfw1dev - https://phabricator.wikimedia.org/T360795#9699484 (Andrew)
[01:43:43] cloud-services-team, Infrastructure-Foundations, SRE, vm-requests: Site: 1 VM for codfw1dev bitu deployment - https://phabricator.wikimedia.org/T362128#9699486 (Andrew)
[01:59:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[02:23:41] (CloudVPSDesignateLeaks) firing: (3) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[02:28:41] (CloudVPSDesignateLeaks) resolved: (3) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[03:10:13] Quarry: Users get logged out from Quarry every day (or two) - https://phabricator.wikimedia.org/T362025#9699626 (GTrang)
[03:12:29] Wikibugs: Replace Redis queue with custom http solution - https://phabricator.wikimedia.org/T361518#9699628 (bd808) In progress→Resolved
[03:23:22] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9699632 (bd808) test
[03:23:54] cloud-services-team, Cloud-VPS, Patch-For-Review: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963#9699633 (Andrew) Thanks for the feedback! > I'll also list wishlist items, which might be in the works already, below: >...
[03:24:00] (CR) BryanDavis: [C:-2] "Done" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1008016 (https://phabricator.wikimedia.org/T90594) (owner: BryanDavis)
[07:22:12] (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete stub secret [labs/private] - https://gerrit.wikimedia.org/r/1016312 (https://phabricator.wikimedia.org/T360412) (owner: Muehlenhoff)
[07:47:19] Horizon: HTTP 500 trying to load interfaces, UI just keeps spinning - https://phabricator.wikimedia.org/T362138 (Reedy) NEW
[07:50:28] (PuppetSyncFailure) firing: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[08:34:13] (PS1) Majavah: branches: Add REL1_42 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1018193
[08:40:28] (PuppetSyncFailure) resolved: Failed to update Puppet repository /srv/git/operations/puppet on instance metricsinfra-puppetserver-1 in project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[08:46:39] Horizon: HTTP 500 trying to load interfaces, UI just keeps spinning - https://phabricator.wikimedia.org/T362138#9699985 (taavi) @Andrew this looks like a possible issue with one of our local hacks on top of the Neutron dashboard: ` 208.80.154.150 - - [09/Apr/2024:08:44:59 +0000] "GET /project/instances/f72e5...
[09:59:22] cloud-services-team, Cloud-VPS, Patch-For-Review: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963#9700116 (fgiunchedi) >>! In T355963#9699633, @Andrew wrote: > Thanks for the feedback! Of course, thank you for working...
[10:32:52] Toolforge (Toolforge iteration 08), Patch-For-Review: [infra] Add alert when workers have a sustained large amount of D processes - https://phabricator.wikimedia.org/T362093#9700275 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/11 kubernetes: add...
[10:47:57] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api] Remove flask-restful - https://phabricator.wikimedia.org/T359806#9700310 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/234 jobs-api: bump to 0.0.276-20240408100...
[11:23:08] cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: [wmcs][alerting] Allow silencing alerts metricsinfra alerts on alerts.wikimedia.org - https://phabricator.wikimedia.org/T320973#9700421 (taavi)
[11:31:09] (PS1) Muehlenhoff: Remove obsolete dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1018238 (https://phabricator.wikimedia.org/T360636)
[11:37:21] (CR) Clément Goubert: [C:+1] Remove obsolete dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1018238 (https://phabricator.wikimedia.org/T360636) (owner: Muehlenhoff)
[11:48:23] (PS1) Majavah: openstack: cloudgw: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018241
[11:48:23] (PS1) Majavah: openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242
[11:48:23] (PS1) Majavah: openstack: cloudcontrol: Update for designate on cloudcontrols [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243
[11:48:23] (PS1) Majavah: openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244
[11:50:19] (CR) Majavah: [C:+1] ceph: use timedelta instead of integers [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990975 (owner: David Caro)
[11:51:34] (CR) CI reject: [V:-1] openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242 (owner: Majavah)
[11:51:58] (CR) CI reject: [V:-1] openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244 (owner: Majavah)
[11:52:13] (CR) CI reject: [V:-1] openstack: cloudcontrol: Update for designate on cloudcontrols [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243 (owner: Majavah)
[11:53:34] (PS2) Majavah: openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242
[11:53:34] (PS2) Majavah: openstack: cloudcontrol: Update for designate on cloudcontrols [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243
[11:53:34] (PS2) Majavah: openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244
[11:56:30] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[11:56:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:57:05] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[11:57:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[12:02:51] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9700521 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/25 d/changelog: bump to 16.0.6
[12:04:35] Toolforge (Toolforge iteration 08), Patch-For-Review: [jobs-api,jobs-cli] Support job health checks - https://phabricator.wikimedia.org/T335592#9700525 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/25 d/changelog: bump to 16.0.6
[12:08:33] (CR) Muehlenhoff: [V:+2 C:+2] Remove obsolete dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1018238 (https://phabricator.wikimedia.org/T360636) (owner: Muehlenhoff)
[12:10:11] Toolforge (Toolforge iteration 08): [jobs-cli] output logs on stderr - https://phabricator.wikimedia.org/T362153 (dcaro) NEW
[12:10:13] Toolforge (Toolforge iteration 08): [jobs-cli] output logs on stderr - https://phabricator.wikimedia.org/T362153#9700567 (dcaro) p:Triage→Medium
[12:17:43] (PS1) Majavah: hieradata: alerting_host: add fake metricsinfra password [labs/private] - https://gerrit.wikimedia.org/r/1018248 (https://phabricator.wikimedia.org/T320973)
[12:18:32] (CR) Majavah: [V:+2 C:+2] hieradata: alerting_host: add fake metricsinfra password [labs/private] - https://gerrit.wikimedia.org/r/1018248 (https://phabricator.wikimedia.org/T320973) (owner: Majavah)
[12:20:57] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[12:21:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:21:28] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[12:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:27:22] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9700636 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolfo...
[12:28:48] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9700639 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened htt...
[12:33:12] (PS1) Majavah: hieradata: cloudcumin: Add fake metricsinfra password [labs/private] - https://gerrit.wikimedia.org/r/1018257 (https://phabricator.wikimedia.org/T360932)
[12:40:36] Quarry: Error in web instances. - https://phabricator.wikimedia.org/T362157 (rook) NEW
[12:43:46] (CR) Jforrester: [C:+2] "Aha, yes, will add this to the release checklist!" [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1018193 (owner: Majavah)
[12:44:21] cloud-services-team, Cloud-VPS, Patch-For-Review: Allow authenticated write access from the wikiprod network to metricsinfra alertmanager API - https://phabricator.wikimedia.org/T362061#9700684 (taavi) Open→Resolved
[12:44:22] (Merged) jenkins-bot: branches: Add REL1_42 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1018193 (owner: Majavah)
[12:44:59] cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: [wmcs][alerting] Allow silencing alerts metricsinfra alerts on alerts.wikimedia.org - https://phabricator.wikimedia.org/T320973#9700687 (taavi) In progress→Resolved
[12:45:14] (CR) Jforrester: [C:+2] "Done: https://www.mediawiki.org/w/index.php?diff=6460919&oldid=6460442&title=Release_checklist" [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1018193 (owner: Majavah)
[12:54:47] (CR) Majavah: [V:+2 C:+2] hieradata: cloudcumin: Add fake metricsinfra password [labs/private] - https://gerrit.wikimedia.org/r/1018257 (https://phabricator.wikimedia.org/T360932) (owner: Majavah)
[13:16:17] cloud-services-team, Cloud-VPS, Patch-For-Review: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963#9700818 (Andrew) > That's correct yeah, I don't think there are security implications. What I'm after is the possibility to upload/co...
[13:18:10] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[13:20:27] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[13:33:22] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[13:33:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:33:55] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[13:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[13:36:55] (PS3) Majavah: openstack: cloudnet: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018242
[13:36:55] (PS3) Majavah: openstack: cloudcontrol: Update for designate on cloudcontrols [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018243
[13:36:55] (PS3) Majavah: openstack: cloudcontrol: Migrate to spicerack logging and alerting [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018244
[13:38:31] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[13:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:39:02] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[13:39:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:39:06] (PS1) Majavah: vps: remove_instance: Metricsinfra silencing demo [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018272 (https://phabricator.wikimedia.org/T360932)
[13:42:34] (CR) CI reject: [V:-1] vps: remove_instance: Metricsinfra silencing demo [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018272 (https://phabricator.wikimedia.org/T360932) (owner: Majavah)
[13:43:56] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9700927 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolfo...
[13:44:37] (CR) Majavah: "(failure is expected until a new spicerack release is cut)" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1018272 (https://phabricator.wikimedia.org/T360932) (owner: Majavah)
[13:46:15] Horizon: HTTP 500 trying to load interfaces, UI just keeps spinning - https://phabricator.wikimedia.org/T362138#9700939 (Andrew) Yeah, the first thing to test here is reverting dbf47bd5a4100b96741fae320e7e0326e72880dc and see if that changes this.
[14:01:32] Quarry: Error in web instances. - https://phabricator.wikimedia.org/T362157#9701028 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/37
[14:01:42] vivian-rook opened https://github.com/toolforge/quarry/pull/37
[14:08:09] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:08:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:08:42] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:08:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[14:11:15] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
[14:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:11:48] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
[14:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:21:19] cloud-services-team, Toolforge, Patch-For-Review: [infra] Replace PodSecurityPolicy in Toolforge Kubernetes - https://phabricator.wikimedia.org/T279110#9701065 (CodeReviewBot) aborrero opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/238 components: add kyv...
[14:22:12] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[14:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:22:17] !log dcaro@urcuchillay tools END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
[14:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:22:59] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[14:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:23:04] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[14:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:23:13] !log dcaro@urcuchillay tools START - Cookbook wmcs.openstack.cloudvirt.vm_console
[14:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:23:18] !log dcaro@urcuchillay tools END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
[14:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:29:32] Tools: tmg/articlemedia tool not working - https://phabricator.wikimedia.org/T89695#9701094 (thiemowmde) Open→Declined a:thiemowmde
[14:33:02] Quarry: Error in web instances. - https://phabricator.wikimedia.org/T362157#9701132 (rook) Seems like this error is coming from the sqlite file not existing, all the requests seem to be for csv or json files. It is not clear who is calling for these files. This error should probably be caught as a file not f...
[15:20:53] Wikibugs, GitLab (Integrations), Release-Engineering-Team (Priority Backlog 📥): Connect WikiBugs IRC bot to Wikimedia GitLab - https://phabricator.wikimedia.org/T288381#9701478 (dancy) @bd808 I read over your proposal and all of the ideas sound reasonable. The code behind gitlab-webhooks is pretty s...
[15:21:01] Tools, Tech-Docs-Team, Documentation, Wikimedia-Hackathon-2024: [Hackathon 2024] Improve technical documentation of tools - https://phabricator.wikimedia.org/T358040#9701479 (TBurmeister) Created a quick first draft of instructions for how to claim phab tasks and work on docs during the hackathon...
[16:04:34] Toolforge, Epic: [component-api] First iteration of the component API - https://phabricator.wikimedia.org/T362051#9701729 (fnegri)
[16:19:18] Toolforge, Epic: [component-api] First iteration of the component API - https://phabricator.wikimedia.org/T362051#9701785 (fnegri) @dcaro I edited the description of this task to reflect what we discussed in the [Toolforge Monthly Meeting](https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Monthl...
[16:30:22] Toolforge, Epic: [component-api] First iteration of the component API - https://phabricator.wikimedia.org/T362051#9701823 (dcaro) >>! In T362051#9701785, @fnegri wrote: > @dcaro I edited the description of this task to reflect what we discussed in the [Toolforge Monthly Meeting](https://wikitech.wikimedi...
[16:47:04] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[16:57:14] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[16:59:52] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.017 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[17:03:20] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[17:11:00] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[17:12:04] (PS1) Andrea Denisse: wmcs: Remove redundant grafana-labs.discovery.wmnet.key [labs/private] - https://gerrit.wikimedia.org/r/1018318 (https://phabricator.wikimedia.org/T360414)
[17:12:09] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[17:21:07] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[17:24:50] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.317 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[17:27:28] (InstanceDown) firing: Project tools instance tools-k8s-etcd-17 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:32:28] (InstanceDown) resolved: Project tools instance tools-k8s-etcd-17 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[17:38:38] ToolforgeBundle, Community-Tech, CopyPatrol: Session can't be invalidated, causing problems with language selection - https://phabricator.wikimedia.org/T357821#9701949 (MusikAnimal) Open→Resolved a:Samwilson
[17:41:14] (CR) Dzahn: [V:+1 C:+1] wmcs: Remove redundant grafana-labs.discovery.wmnet.key [labs/private] - https://gerrit.wikimedia.org/r/1018318 (https://phabricator.wikimedia.org/T360414) (owner: Andrea Denisse)
[17:42:41] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:47:41] (CloudVPSDesignateLeaks) firing: (3) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:09:55] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[18:09:58] T349207: [infra] Upgrade Toolforge K8s etcd nodes to Bullseye - https://phabricator.wikimedia.org/T349207
[18:10:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:17:37] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[18:20:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[18:21:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-26 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[18:22:41] (CloudVPSDesignateLeaks) firing: (3) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:27:41] (CloudVPSDesignateLeaks) resolved: (3) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:39:26] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[18:39:30] T349207: [infra] Upgrade Toolforge K8s etcd nodes to Bullseye - https://phabricator.wikimedia.org/T349207
[18:41:39] (CR) Andrea Denisse: [C:+2] wmcs: Remove redundant grafana-labs.discovery.wmnet.key [labs/private] - https://gerrit.wikimedia.org/r/1018318 (https://phabricator.wikimedia.org/T360414) (owner: Andrea Denisse)
[18:41:41] (CR) Andrea Denisse: [V:+2 C:+2] wmcs: Remove redundant grafana-labs.discovery.wmnet.key [labs/private] - https://gerrit.wikimedia.org/r/1018318 (https://phabricator.wikimedia.org/T360414) (owner: Andrea Denisse)
[18:55:22] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[19:01:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-test-k8s-etcd-26 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:13:28] (WidespreadPuppetAgentFailure) firing: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:17:53] Tool-ducttape, Abstract Wikipedia team, WikiLambda, Abstract Wikipedia Fix-It tasks: Automatically read AppArmor profiles from Puppet - https://phabricator.wikimedia.org/T342365#9702139 (Jdforrester-WMF) p:Triage→Low
[19:18:28] (WidespreadPuppetAgentFailure) firing: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:22:28] (InstanceDown) firing: Project clouddb-services instance clouddb-services-puppetmaster-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:27:28] (InstanceDown) resolved: Project clouddb-services instance clouddb-services-puppetmaster-2 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[19:28:52] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[19:28:56] T349207: [infra] Upgrade Toolforge K8s etcd nodes to Bullseye - https://phabricator.wikimedia.org/T349207
[19:30:44] cloud-services-team, VPS-Projects, Puppet (Puppet 7.0): Migrate Puppet servers in Cloud Services team managed projects to Puppet 7 - https://phabricator.wikimedia.org/T351453#9702185 (Andrew) Open→Resolved
[19:32:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on clouddb-services-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[19:33:28] (WidespreadPuppetAgentFailure) resolved: Widespread puppet agent failures in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:36:28] (WidespreadPuppetAgentFailure) firing: Widespread puppet agent failures in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:42:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 1 deleted instances on clouddb-services-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[19:42:54] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[19:43:50] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[19:48:28] (WidespreadPuppetAgentFailure) resolved: Widespread puppet agent failures in project gitlab-runners - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:50:06] Cloud-VPS (Debian Buster Deprecation), collaboration-services: replace buster machines in devtools project - https://phabricator.wikimedia.org/T360964#9702306 (Dzahn) gerrit-prod-1001 - wasn't reachable via ssh, soft rebooted it, couldn't ssh as regular user still, but could get in with my separate globa...
[19:52:38] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[19:56:28] (WidespreadPuppetAgentFailure) resolved: Widespread puppet agent failures in project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DWidespreadPuppetAgentFailure
[19:59:43] Cloud-VPS (Debian Buster Deprecation), collaboration-services: replace buster machines in devtools project - https://phabricator.wikimedia.org/T360964#9702314 (Dzahn)
[20:13:41] (CloudVPSDesignateLeaks) firing: (2) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:18:41] (CloudVPSDesignateLeaks) firing: (3) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:21:18] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:29:25] Toolforge (Toolforge iteration 08), Patch-For-Review: [toolforge-cd] remove duplicated run on tag and push to master (just do one if possible) - https://phabricator.wikimedia.org/T353563#9702394 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/...
[20:30:55] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[20:37:28] (InstanceDown) firing: Project toolsbeta instance toolsbeta-test-k8s-etcd-28 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:42:28] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-test-k8s-etcd-28 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[20:44:11] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:44:44] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[20:52:10] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[20:53:41] (CloudVPSDesignateLeaks) firing: (3) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:58:41] (CloudVPSDesignateLeaks) resolved: (3) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:01:57] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[21:04:23] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[21:04:26] T349207: [infra] Upgrade Toolforge K8s etcd nodes to Bullseye - https://phabricator.wikimedia.org/T349207
[21:06:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-26 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:06:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance toolsbeta-test-k8s-etcd-26 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[21:08:50] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[21:09:11] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[21:17:58] (PuppetAgentStaleLastRun) firing: (3) Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-26 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:24:18] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[21:42:58] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance toolsbeta-test-k8s-etcd-28 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[21:57:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-test-k8s-etcd-28 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[22:01:06] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T349207)
[22:01:10] T349207: [infra] Upgrade Toolforge K8s etcd nodes to Bullseye - https://phabricator.wikimedia.org/T349207
[22:12:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:16:41] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
[22:17:58] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[22:25:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-test-k8s-etcd-29 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[22:29:36] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
[22:34:11] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:40:29] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
[22:51:25] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9702619 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/85 [builds-api...
[22:52:11] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
[22:52:35] Toolforge (Toolforge iteration 08), Patch-For-Review: [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9702617 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/61 [builds-cli...
[22:59:00] 10Toolforge (Toolforge iteration 08), 13Patch-For-Review: [builds-api] replace all error message models with ResponseMessages - https://phabricator.wikimedia.org/T361901#9702652 (10Raymond_Ndibe) 05Open→03In progress [23:07:11] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [23:07:47] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) [23:08:06] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_etcd_node [23:13:30] 10Wikibugs, 10GitLab (Integrations), 10Release-Engineering-Team (Priority Backlog 📥): Connect WikiBugs IRC bot to Wikimedia GitLab - https://phabricator.wikimedia.org/T288381#9702700 (10bd808) 05Open→03In progress a:03bd808 I talked with @dancy on irc and have decided to try this variant: > * Implemen... [23:17:50] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)