[00:54:34] (DiskSpace) firing: Disk space cloudbackup1004:9100:/ 5.803% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [01:31:16] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [02:21:21] Grid-Engine-to-K8s-Migration: Migrate mabot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319870#9636495 (Pppery) Declined→Resolved [02:21:35] PROBLEM - Disk space on cloudbackup1004 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=93%): /tmp 0 MB (0% inode=93%): /var/tmp 0 MB (0% inode=93%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1004&var-datasource=eqiad+prometheus/ops [02:21:56] (SystemdUnitDown) firing: (4) The service unit backup_vms.service is in failed status on host cloudbackup1004.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:25:51] ACKNOWLEDGEMENT - Disk space on cloudbackup1004 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=93%): /tmp 0 MB (0% inode=93%): /var/tmp 0 MB (0% inode=93%): Andrew Bogott Andrew resizing things https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1004&var-datasource=eqiad+prometheus/ops [02:26:56] (SystemdUnitDown) firing: (6) The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:31:56] (SystemdUnitDown) firing: (8) The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:36:56] (SystemdUnitDown) firing: (9) The service unit backup_vms.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:41:56] (SystemdUnitDown) firing: (9) The service unit backup_vms.service is in failed status on host cloudbackup1004. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:46:56] (SystemdUnitDown) resolved: (5) The service unit confd_prometheus_metrics.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [02:51:56] (SystemdUnitDown) firing: (2) The service unit postgresql@11-main.service is in failed status on host cloudbackup1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:16:56] (SystemdUnitDown) firing: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:17:01] cloud-services-team: SystemdUnitDown Unit backup_vms.service on node cloudbackup1004 has been down for long. - https://phabricator.wikimedia.org/T360278 (phaultfinder) NEW [04:21:56] (SystemdUnitDown) firing: (2) The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:22:01] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279 (phaultfinder) NEW [04:31:56] (SystemdUnitDown) firing: (3) The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:32:00] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9636523 (phaultfinder) [04:32:48] (PuppetFailure) firing: Puppet has failed on cloudbackup1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [04:32:53] cloud-services-team: PuppetFailure Puppet failure on cloudbackup1004:9100 - https://phabricator.wikimedia.org/T360280 (phaultfinder) NEW [04:39:04] (DiskSpace) resolved: Disk space cloudbackup1004:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [04:41:35] RECOVERY - Disk space on cloudbackup1004 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=cloudbackup1004&var-datasource=eqiad+prometheus/ops [04:46:56] (SystemdUnitDown) firing: (4) The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:47:00] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T360279#9636531 (phaultfinder) [04:51:56] (SystemdUnitDown) firing: The service unit postgresql@11-main.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:51:56] (SystemdUnitDown) firing: (4) The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:56:56] (SystemdUnitDown) resolved: The service unit postgresql@11-main.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [04:56:56] (SystemdUnitDown) resolved: (2) The systemd unit postgresql@11-main.service on node cloudbackup1004 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [05:19:18] (PuppetFailure) resolved: Puppet has failed on cloudbackup1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure [05:31:16] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [07:01:01] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [07:21:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [07:26:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [08:43:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - 
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:48:41] (CloudVPSDesignateLeaks) firing: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:53:41] (CloudVPSDesignateLeaks) firing: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [08:58:41] (CloudVPSDesignateLeaks) resolved: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [09:41:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:03:53] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-12.tools.eqiad1.wikimedia.cloud [10:04:03] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-bastion-12.tools.eqiad1.wikimedia.cloud [10:16:41] (CloudVPSDesignateLeaks) firing: (5) Detected 36 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - 
https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:18:40] Toolforge, Documentation, good first task, Wikimedia-Hackathon-2024: Find and fix inaccuracies in Toolforge Django tutorial - https://phabricator.wikimedia.org/T245683#9636777 (Slst2020) Using the new [[ https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service | Build Service ]] is now th... [10:20:52] Toolforge: [envvars-api, envvars-cli] Create envvar name error message is not user friendly - https://phabricator.wikimedia.org/T360147#9636795 (dcaro) p:Triage→Low Keep in mind though that the validation should be kept as much as possible on the API side, not the client. If you want to debug insid... [10:22:12] Toolforge (Toolforge iteration 07), Patch-For-Review: [jobs-cli] Allow exporting jobs list in YAML format - https://phabricator.wikimedia.org/T320575#9636806 (dcaro) [10:23:05] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9636844 (taavi) [10:23:25] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9636845 (taavi) [10:30:35] cloud-services-team, Infrastructure-Foundations, Spicerack, SRE-tools: [spicerack] Add remote command output to log file - https://phabricator.wikimedia.org/T347093#9636927 (aborrero) I was bitten by this recently. I think the proposal made to show at least _something_ in the logs with...
[10:32:37] Toolforge (Toolforge iteration 07): Rust image build on toolforge fails - https://phabricator.wikimedia.org/T358552#9636943 (Aklapper) Bumping project tag so task shows up on an active workboard [10:44:57] cloud-services-team, Toolforge (Toolforge iteration 07), Patch-For-Review: refresh kube-state-metrics version for k8s 1.24 - https://phabricator.wikimedia.org/T359798#9636981 (CodeReviewBot) aborrero opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_request... [10:46:42] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.23.17 to 1.24.17 (T307651) [10:46:46] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [10:47:04] (Merged) jenkins-bot: Remove Grid Engine support [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1010520 (https://phabricator.wikimedia.org/T314664) (owner: Majavah) [10:47:06] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.23.17 to 1.24.17 (T307651) [10:47:39] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge static server (tools-static.wmflabs.org) to Debian Bullseye - https://phabricator.wikimedia.org/T311913#9636983 (taavi) a:taavi [10:47:42] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9636989 (taavi) [10:48:10] (PS1) Majavah: vps: refresh_puppet_certs: Fix for Puppet 7 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012355 [10:50:35] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-static-2.toolsbeta.eqiad1.wikimedia.cloud [10:50:43] !log taavi@cloudcumin1001 toolsbeta END (FAIL) - Cookbook
wmcs.vps.refresh_puppet_certs (exit_code=99) on toolsbeta-static-2.toolsbeta.eqiad1.wikimedia.cloud [10:51:41] (CloudVPSDesignateLeaks) firing: (5) Detected 36 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:53:02] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-7 from 1.23.17 to 1.24.17 (T359638) [10:53:03] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-7 from 1.23.17 to 1.24.17 (T359638) [10:53:07] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [10:53:13] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.23.17 to 1.24.17 (T359638) [10:54:47] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9637009 (aborrero) saving this info here in case is required later: ` ----- OUTPUT of 'sudo -i kubeadm ...ade plan 1.24.17' -----...
[10:55:23] (PS2) Majavah: vps: refresh_puppet_certs: Fix for Puppet 7 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012355 [10:55:26] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-static-2.toolsbeta.eqiad1.wikimedia.cloud [10:55:55] Cloud-VPS: [cloud-vps] creating a new project can override existing DNS entries - https://phabricator.wikimedia.org/T360294 (fnegri) NEW [10:56:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 36 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [10:56:59] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on toolsbeta-static-2.toolsbeta.eqiad1.wikimedia.cloud [10:58:32] (CR) CI reject: [V:-1] vps: refresh_puppet_certs: Fix for Puppet 7 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012355 (owner: Majavah) [10:59:45] (PS3) Majavah: vps: refresh_puppet_certs: Fix for Puppet 7 [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012355 [11:00:23] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.23.17 to 1.24.17 (T359638) [11:00:27] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [11:00:35] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api [11:00:39] !log aborrero@cloudcumin1001 tools END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-api [11:00:47] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics [11:01:00] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics [11:01:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [11:01:22] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [etcd,infra] Find a backup solution for the etcd database - https://phabricator.wikimedia.org/T339934#9637038 (dcaro) [11:01:51] cloud-services-team, Toolforge (Toolforge iteration 07), Patch-For-Review: refresh kube-state-metrics version for k8s 1.24 - https://phabricator.wikimedia.org/T359798#9637043 (CodeReviewBot) aborrero merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_request... [11:02:37] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587#9637036 (LucasWerkmeister) >>! In T334587#8831957, @taavi wrote: > * Ability to trigger builds when I push a commit to a Git... [11:06:16] Tools, Community-Wishlist-Survey-2023, Wikimedia Wishathon: Investigate Dabfix tool implementation - https://phabricator.wikimedia.org/T336545#9637066 (Aklapper) [11:06:32] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587#9637063 (Sascha) See https://phabricator.wikimedia.org/T360295 for the request to run tests before deploying a tool.
[11:08:40] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.23.17 to 1.24.17 (T359638) [11:08:45] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [11:11:36] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9637078 (taavi) [11:13:56] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.23.17 to 1.24.17 (T359638) [11:14:00] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [11:17:01] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.23.17 to 1.24.17 (T359638) [11:17:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud [11:17:14] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-static-15.tools.eqiad1.wikimedia.cloud [11:17:24] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud [11:19:06] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-static-15.tools.eqiad1.wikimedia.cloud [11:19:46] (PS1) Arturo Borrero Gonzalez: toolforge.k8s.worker.upgrade: give a hint about static pod restarts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012358 [11:23:46] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.23.17 to 1.24.17 (T359638) [11:23:49] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 -
https://phabricator.wikimedia.org/T359638 [11:24:12] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS (Debian Buster Deprecation), Toolforge, Epic, Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#9637136 (taavi) [11:24:53] (CR) Arturo Borrero Gonzalez: [C:+2] toolforge.k8s.worker.upgrade: give a hint about static pod restarts [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1012358 (owner: Arturo Borrero Gonzalez) [11:25:52] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge static server (tools-static.wmflabs.org) to Debian Bullseye - https://phabricator.wikimedia.org/T311913#9637135 (taavi) Open→In progress [11:26:16] Toolforge: Toolforge build service should be able to run unit tests - https://phabricator.wikimedia.org/T360295#9637142 (Aklapper) [11:27:12] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9637144 (aborrero) the control plane is now upgraded: `lang=shell-session aborrero@tools-k8s-control-7:~$ sudo -i kubectl get nodes NAME...
[11:29:27] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 (T307651) [11:29:32] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [11:30:30] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 (T307651) [11:30:32] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 (T307651) [11:31:36] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 (T307651) [11:31:37] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 (T307651) [11:32:39] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 (T307651) [11:32:40] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 (T307651) [11:33:44] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 (T307651) [11:33:45] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 (T307651) [11:35:41] cloud-services-team, Toolforge: Upgrade Toolforge acme-chief hosts to Debian Bullseye or later - https://phabricator.wikimedia.org/T311907#9637172 (taavi) a:taavi [11:39:36] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to
1.24.17 (T307651) [11:39:37] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 (T307651) [11:39:41] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [11:40:08] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-acme-chief-2.toolsbeta.eqiad1.wikimedia.cloud [11:40:40] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 (T307651) [11:40:41] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 (T307651) [11:41:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [11:41:44] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 (T307651) [11:41:45] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 (T307651) [11:42:49] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 (T307651) [11:42:50] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 (T307651) [11:42:51] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on toolsbeta-acme-chief-2.toolsbeta.eqiad1.wikimedia.cloud [11:43:28] 
(PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance toolsbeta-acme-chief-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [11:43:50] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 (T307651) [11:43:51] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 (T307651) [11:44:58] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 (T307651) [11:45:00] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 (T307651) [11:45:03] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [11:46:03] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 (T307651) [11:46:04] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 (T307651) [11:47:06] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 (T307651) [11:47:07] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 (T307651) [11:48:14] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 (T307651) [11:48:15] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 (T307651) [11:48:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance toolsbeta-acme-chief-2 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [11:49:15] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 (T307651) [11:49:16] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 (T307651) [11:50:21] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 (T307651) [11:50:22] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 (T307651) [11:50:26] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [11:51:24] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 (T307651) [11:51:25] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 (T307651) [11:51:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [11:51:56] Toolforge: [builds-api] Toolforge build service should be able to run unit tests - https://phabricator.wikimedia.org/T360295#9637235 (dcaro) [11:52:12] Toolforge: [builds-api] Toolforge build service should be able to run unit
tests - https://phabricator.wikimedia.org/T360295#9637230 (dcaro) Open→Declined This is expected to be run on the CI system where you host your git repository, be that github (github acitons), gitlab (using gitlab C... [11:52:23] cloud-services-team, Toolforge, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project: [builds-api] Add triggering support - https://phabricator.wikimedia.org/T334587#9637236 (dcaro) p:Triage→High [11:52:29] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 (T307651) [11:52:32] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 (T307651) [11:53:35] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 (T307651) [11:53:36] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 (T307651) [11:54:38] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 (T307651) [11:54:39] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 (T307651) [11:55:28] (InstanceDown) firing: Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [11:55:43] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 (T307651) [11:55:48] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [11:56:10] !log aborrero@cloudcumin1001 tools 
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 (T359638) [11:56:14] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [11:56:31] (03PS4) 10Majavah: vps: refresh_puppet_certs: Fix for Puppet 7 [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1012355 (https://phabricator.wikimedia.org/T351453) [11:56:32] (03PS1) 10Majavah: vps: refresh_puppet_certs: Fix Puppet agent profile name [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1012359 [11:58:50] (ProbeDown) firing: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [11:59:27] (03CR) 10CI reject: [V:04-1] vps: refresh_puppet_certs: Fix Puppet agent profile name [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1012359 (owner: 10Majavah) [12:00:06] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud [12:00:18] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud [12:00:28] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-acme-chief-01 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:01:42] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud [12:01:52] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 (T359638) [12:01:56] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 
[12:03:28] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance tools-acme-chief-3 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:03:44] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 (T359638) [12:03:50] (ProbeDown) resolved: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [12:04:29] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 (T359638) [12:04:34] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud [12:05:17] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 (T307651) [12:05:20] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:08:28] (PuppetAgentStaleLastRun) firing: (2) Last Puppet run was over 24 hours ago on instance tools-acme-chief-3 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:11:18] !log aborrero@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 (T307651) [12:11:19] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 (T307651) [12:11:23] T307651: Upgrade Toolforge Kubernetes to version 1.24 - 
https://phabricator.wikimedia.org/T307651 [12:13:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [12:14:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance tools-harbor-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [12:14:30] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud [12:15:39] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 (T307651) [12:15:40] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 (T307651) [12:17:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud [12:18:13] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 (T307651) [12:18:14] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 (T307651) [12:18:17] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:18:28] (PuppetAgentStaleLastRun) resolved: (2) Last Puppet run was over 24 hours ago on instance tools-acme-chief-3 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [12:19:10] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 (T307651) [12:19:11] !log aborrero@cloudcumin1001 tools START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 (T307651) [12:20:17] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 (T307651) [12:20:18] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 (T307651) [12:21:19] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 (T307651) [12:21:22] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 (T307651) [12:22:26] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 (T307651) [12:22:27] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 (T307651) [12:23:31] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 (T307651) [12:23:32] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 (T307651) [12:23:36] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:24:40] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 (T307651) [12:24:41] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 (T307651) [12:25:45] !log aborrero@cloudcumin1001 tools END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 (T307651) [12:25:46] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 (T307651) [12:26:47] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 (T307651) [12:26:48] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 (T307651) [12:27:33] (03PS2) 10Majavah: vps: refresh_puppet_certs: Fix Puppet agent profile name [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1012359 [12:27:49] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 (T307651) [12:27:50] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 (T307651) [12:28:53] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 (T307651) [12:28:54] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 (T307651) [12:28:58] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:29:56] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 (T307651) [12:29:57] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 (T307651) [12:31:03] !log aborrero@cloudcumin1001 tools END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 (T307651) [12:31:04] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 (T307651) [12:32:05] (03PS1) 10Majavah: vps: remove_instance: Use Puppet 7 for cert cleanup [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1012370 (https://phabricator.wikimedia.org/T351453) [12:32:07] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 (T307651) [12:32:08] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 (T307651) [12:32:31] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-filesystemtest-1 [12:33:08] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 (T307651) [12:33:09] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 (T307651) [12:33:21] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-filesystemtest-1 [12:33:52] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-static-1 [12:34:12] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 (T307651) [12:34:13] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 (T307651) [12:34:17] T307651: Upgrade Toolforge Kubernetes to version 1.24 - 
https://phabricator.wikimedia.org/T307651 [12:34:40] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolsbeta-static-1 [12:35:17] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 (T307651) [12:35:18] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 (T307651) [12:36:24] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 (T307651) [12:38:14] (ProbeDown) firing: Service toolsbeta-static-1:80 has failed probes (http_toolsbeta_static_wmcloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-static-1:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [12:39:28] (InstanceDown) firing: Project tools instance tools-filesystemtest-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:43:14] (ProbeDown) resolved: Service toolsbeta-static-1:80 has failed probes (http_toolsbeta_static_wmcloud_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-static-1:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [12:44:28] (InstanceDown) resolved: Project tools instance tools-filesystemtest-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:44:34] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 (T359638) [12:44:40] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - 
https://phabricator.wikimedia.org/T359638 [12:45:18] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 (T359638) [12:45:40] cloud-services-team, Toolforge, Kubernetes: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T349207#9637584 (taavi) [12:45:48] Toolforge: kubernetes 1.27 requires etcd 3.4.22+ or 3.5.6+ - https://phabricator.wikimedia.org/T359642#9637579 (taavi) Bookworm ships with 3.4.23 so merging to {T349207}. [12:45:56] Toolforge: kubernetes 1.27 requires etcd 3.4.22+ or 3.5.6+ - https://phabricator.wikimedia.org/T359642#9637582 (taavi) →Duplicate dup:T349207 [12:45:58] cloud-services-team, Toolforge, Kubernetes: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T349207#9637587 (taavi) [12:46:21] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 (T307651) [12:46:24] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:47:24] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 (T307651) [12:47:25] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 (T307651) [12:48:28] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 (T307651) [12:48:29] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 (T307651) [12:49:36] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 (T307651) [12:49:37] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 (T307651) [12:50:41] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 (T307651) [12:50:42] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 (T307651) [12:51:42] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 (T307651) [12:51:43] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 (T307651) [12:51:46] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:52:47] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 (T307651) [12:52:49] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 (T307651) [12:53:52] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 (T307651) [12:53:53] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 (T307651) [12:54:57] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 (T307651) [12:54:58] !log aborrero@cloudcumin1001 
tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 (T307651) [12:56:09] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 (T307651) [12:56:10] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 (T307651) [12:57:14] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 (T307651) [12:57:15] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 (T307651) [12:57:18] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:58:16] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 (T307651) [12:58:17] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 (T307651) [12:58:22] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [12:59:23] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 (T307651) [12:59:24] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 (T307651) [13:00:30] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 (T307651) [13:00:31] !log aborrero@cloudcumin1001 tools START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 (T307651) [13:01:35] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 (T307651) [13:01:36] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 (T307651) [13:02:42] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 (T307651) [13:02:43] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 (T307651) [13:03:28] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-harbor-1 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [13:03:48] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 (T307651) [13:03:59] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [13:04:28] (InstanceDown) firing: Project toolsbeta instance toolsbeta-puppetdb-02 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [13:04:28] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance tools-harbor-1 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure [13:07:11] 14Grid-Engine-to-K8s-Migration: 14Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - 14https://phabricator.wikimedia.org/T319883#9637757 (10MBH) 14One of my continuous jobs runs on k8s, it fails with error, but it `.err` file doesn't updated, so I can't read what's the error. It starts today, e... 
[13:09:28] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-puppetdb-02 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [13:09:56] (HarborDown) firing: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown [13:09:57] (HarborComponentDown) firing: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown [13:11:02] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 (T307651) [13:11:06] T307651: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651 [13:11:41] (CloudVPSDesignateLeaks) firing: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [13:12:06] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 (T307651) [13:12:07] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 (T307651) [13:12:16] 10Toolforge, 13Patch-For-Review: [envvars-cli] Either hide or show envvars values, but not both - https://phabricator.wikimedia.org/T359558#9637779 (10CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/29 Envvars values update [13:13:14] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.23.17 to 
1.24.17 (T307651) [13:13:15] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 (T307651) [13:14:34] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 (T307651) [13:14:56] 10Wikibugs, 13Patch-For-Review: Print events in closed tasks in grey - https://phabricator.wikimedia.org/T140881#9637794 (10hashar) 05Resolved→03Open I found out wikibugs changed some of its messages text color to grey which has led https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/commit/a4fe92d271... [13:14:57] (HarborDown) resolved: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown [13:14:57] (HarborComponentDown) resolved: No data about Harbor components found. #page - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborComponentDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborComponentDown [13:16:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [13:29:39] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 (T359638) [13:29:44] T359638: [toolsbeta,infra] upgrade kubernetes to 1.24 - https://phabricator.wikimedia.org/T359638 [13:30:33] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 (T359638) [13:30:46] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for 
node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 (T359638) [13:31:39] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 (T359638) [13:31:46] !log aborrero@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 (T359638) [13:32:41] !log aborrero@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 (T359638) [13:33:22] cloud-services-team, Toolforge: Upgrade Toolforge Kubernetes to version 1.25 - https://phabricator.wikimedia.org/T316107#9637874 (aborrero) [13:35:12] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9637872 (aborrero) In progress→Resolved completed. [13:37:22] cloud-services-team, Toolforge, Kubernetes: [infra] Remove TTLAfterFinished from config before upgrade to 1.25 - https://phabricator.wikimedia.org/T349197#9637898 (taavi) [13:45:22] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [13:45:40] (CR) Jforrester: [C:+2] build: Updating mediawiki/mediawiki-codesniffer to 43.0.0 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1011424 (owner: Libraryupgrader) [13:50:38] Toolforge, Kubernetes: Toolforge: replace admission controllers with an existing policy admin project - https://phabricator.wikimedia.org/T335131#9637967 (aborrero) Just noticed: * OPA: CNCF graduated * Kyverno: CNCF incubating * Kubewarnen: CNCF sandbox [13:58:34] Toolforge (Toolforge iteration 07), Patch-For-Review: [toolforge-webservice] Remove old webservice-runner 
code - https://phabricator.wikimedia.org/T358320#9638024 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/31 Remove grid engine support [13:58:36] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9638023 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webser... [14:01:30] Cloud-VPS: [cloud-vps] creating a new project can override existing DNS entries - https://phabricator.wikimedia.org/T360294#9638031 (fnegri) While we find if there's a better way to prevent this, I've added a note to the project creation steps to check for DNS clashes: https://wikitech.wikimedia.org/wiki/Por... [14:03:20] Toolforge, Patch-For-Review: [envvars-cli] Either hide or show envvars values, but not both - https://phabricator.wikimedia.org/T359558#9638039 (CodeReviewBot) sstefanova merged https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/30 d/changelog: bump to 0.0.5 [14:03:53] Toolforge, Patch-For-Review: [envvars-cli] Either hide or show envvars values, but not both - https://phabricator.wikimedia.org/T359558#9638032 (CodeReviewBot) sstefanova opened https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/30 d/changelog: bump to 0.0.5 [14:04:24] Toolforge (Toolforge iteration 07), Patch-For-Review: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9638044 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/32 Tag 0.103.5 [14:04:27] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9638042 
(CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webser... [14:10:29] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638078 (MBH) And webservice's `error.log` now contain only `200` responses, they should be written into `access.log` instead. @dcaro could you see? I also repor... [14:15:22] (HAProxyBackendUnavailable) resolved: HAProxy service nova-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [14:21:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:26:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:32:41] Toolforge (Toolforge iteration 07), Patch-For-Review: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9638137 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/32 Tag 0.103.5 [14:32:43] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9638136 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webser... 
[14:39:45] Toolforge (Toolforge iteration 07), Patch-For-Review: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9638161 (taavi) Open→Resolved
[14:45:13] Grid-Engine-to-K8s-Migration: Migrate dibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319676#9638170 (dcaro) >>! In T319676#9635363, @MBH wrote: > https://github.com/Saisengen/dmitry89-tools Sent PR with an example on how to continue :) https://github....
[14:46:37] cloud-services-team (FY2023/2024-Q3-Q4), Toolforge (Toolforge iteration 07), Goal, Patch-For-Review: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9638171 (taavi)
[14:54:59] Toolforge (Toolforge iteration 07): [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329 (dcaro) NEW
[14:56:24] Toolforge (Toolforge iteration 07): [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9638219 (dcaro) p:Triage→High a:dcaro
[14:56:41] (CloudVPSDesignateLeaks) resolved: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[14:58:11] VPS-project-Wikistats: Add kuswiki to wikistats - https://phabricator.wikimedia.org/T360307#9638234 (Dzahn) a:Dzahn
[14:58:25] VPS-project-Wikistats: Add bewwiki to wikistats - https://phabricator.wikimedia.org/T360314#9638235 (Dzahn) a:Dzahn
[15:01:10] Grid-Engine-to-K8s-Migration: Migrate dibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319676#9638253 (MBH) Thanks, I merged it. Now could I delete https://github.com/Saisengen/wikibots/tree/main/php-tools ? Will you construct a build image, like on my t...
[15:01:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[15:02:24] Toolforge (Toolforge iteration 07): [builds-builder,builds-admission] Remove direct access to tekton from tools and remove the admission controller - https://phabricator.wikimedia.org/T360329#9638264 (CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/merge_reque...
[15:04:19] Grid-Engine-to-K8s-Migration: Migrate dibot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319676#9638284 (dcaro) >>! In T319676#9638253, @MBH wrote: > Thanks, I merged it. Now could I delete https://github.com/Saisengen/wikibots/tree/main/php-tools ? Yes,...
[15:06:03] cloud-services-team (Hardware), DC-Ops, ops-codfw: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9638287 (Jhancock.wm)
[15:06:09] cloud-services-team (Hardware), DC-Ops, ops-codfw: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9638286 (Jhancock.wm) @Andrew we've received these servers. Could you update this ticket with racking requirements and names of the servers?
[15:08:16] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638288 (MBH) I also have a question. As far as I understand, an image we built, running my tools on k8s now, contains some virtual filesystem, and I can't view i...
[15:13:07] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638295 (dcaro) >>! In T319883#9638288, @MBH wrote: > I also have a question. As far as I understand, an image we built, running my tools on k8s now, contains som...
[15:21:42] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638307 (dcaro) >>! In T319883#9637757, @MBH wrote: > One of my continuous jobs runs on k8s, it fails with error, but it `.err` file doesn't updated, so I can't r...
[15:29:27] cloud-services-team, wikitech.wikimedia.org, Epic: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#9638340 (taavi)
[15:43:13] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638413 (MBH) I will try this, but .err file is empty after many hours after crashes, it doesn't look like buffering/caching problem.
[16:22:27] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[16:22:30] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.restart_openstack (exit_code=99)
[16:24:42] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[16:28:12] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[16:30:42] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638706 (dcaro) >>! In T319883#9594043, @MBH wrote: > Thank you very much. Is there no way to automatically remove deleted and renamed tool files from "cgi-bin" f...
[16:32:49] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638720 (dcaro) >>! In T319883#9638413, @MBH wrote: > I will try this, but .err file is empty after many hours after crashes, it doesn't look like buffering/cachi...
[16:38:39] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638737 (MBH) I stopped a bot to load a new version of it, now I re-runned it.
[16:51:46] cloud-services-team, Toolforge (Toolforge iteration 07): Upgrade Toolforge static server (tools-static.wmflabs.org) to Debian Bullseye - https://phabricator.wikimedia.org/T311913#9638856 (dcaro) p:Triage→High
[16:55:12] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638895 (dcaro) >>! In T319883#9638737, @MBH wrote: > I stopped a bot to load a new version of it, now I re-runned it. It seems to be running, no crashes so far,...
[17:03:23] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9638938 (MBH) No, by default it doesn't output anything after I avoided using DotNetWikiBot framework. New version can be stable and not producing errors. But so...
[17:10:31] Wikibugs: Hashar does not like grey foreground color for distinguising closed status events - https://phabricator.wikimedia.org/T360353 (bd808) NEW
[17:12:07] Wikibugs, Patch-For-Review: Print events in closed tasks in grey - https://phabricator.wikimedia.org/T140881#9639017 (bd808) Open→Resolved I created {T360353} for the bug report that was used to reopen this completed task.
[17:13:03] Wikibugs: Hashar does not like grey foreground color for distinguising closed status events - https://phabricator.wikimedia.org/T360353#9639026 (bd808) > * change the grey to be darker / black but that defeat its purposes in other context when a dimmed text is desirable The color used is just the standard A...
[17:14:41] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9639048 (bd808)
[17:15:05] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9639050 (bd808)
[17:15:13] Wikibugs, Patch-For-Review: Print events in closed tasks in grey - https://phabricator.wikimedia.org/T140881#9639051 (bd808)
[17:15:57] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9639058 (dcaro) >>! In T319883#9638938, @MBH wrote: > No, by default it doesn't output anything after I avoided using DotNetWikiBot framework. New version can be...
[17:18:33] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9639079 (bd808) We could try to mimic the Phabricator rendering of the titles of closed tasks by using strikethrough styling instead of a color change. I'm not sure how wel...
[17:22:18] Wikibugs: Hashar does not like grey foreground color for distinguishing closed status events - https://phabricator.wikimedia.org/T360353#9639106 (bd808) @greg and @taavi are the other two #wikibugs watchers who have asked in public about the intent of the grey color implementation. Perhaps they have addition...
[17:23:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance ntp-04 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:26:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance cvn-nfs-1 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:28:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance cloudinfra-internal-puppetmaster-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:36:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance cvn-app10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:36:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:36:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance tf-bastion on project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:39:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:41:28] (PuppetAgentNoResources) firing: (4) No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[17:49:04] Toolforge (Toolforge iteration 07): [jobs-api] Remove flask-restful - https://phabricator.wikimedia.org/T359806#9639231 (dcaro) a:dcaro
[17:49:09] Toolforge: [jobs-api] Refactor before webservice support - https://phabricator.wikimedia.org/T359804#9639234 (dcaro)
[17:49:38] Toolforge (Toolforge iteration 07): [jobs-api] Remove flask-restful - https://phabricator.wikimedia.org/T359806#9639230 (dcaro)
[17:49:59] Toolforge (Toolforge iteration 07): [jobs-api] Remove flask-restful - https://phabricator.wikimedia.org/T359806#9639233 (dcaro) Open→In progress
[17:53:28] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance cloudinfra-internal-puppetmaster-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:08:28] (PuppetAgentNoResources) firing: (3) No Puppet resources found on instance cloudinfra-internal-puppetmaster-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:13:28] (PuppetAgentNoResources) resolved: (3) No Puppet resources found on instance cloudinfra-internal-puppetmaster-02 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:15:09] Toolforge (Toolforge iteration 07), Patch-For-Review: [jobs-api] Remove flask-restful - https://phabricator.wikimedia.org/T359806#9639337 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/67 remove flask restful
[18:16:28] (PuppetAgentNoResources) firing: (4) No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:26:28] (PuppetAgentNoResources) firing: (4) No Puppet resources found on instance cvn-apache10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:29:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance bastion on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:46:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance paws-puppetserver-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:47:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:51:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance extdist-06 on project extdist - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:51:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance cvn-app10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:56:28] (PuppetAgentNoResources) resolved: (2) No Puppet resources found on instance cvn-app10 on project cvn - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[18:56:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance tf-bastion on project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:01:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:08:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance etcd-discovery-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:11:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:14:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:23:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:24:33] Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Synthesise user testing results - https://phabricator.wikimedia.org/T358098#9639513 (KColeman-WMF)
[19:25:17] Cloud-VPS: As a CloudVPS user, I want to specify a wildcard subdomain webproxy to direct to an instance in my project - https://phabricator.wikimedia.org/T360363 (SDunlap) NEW
[19:26:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance metricsinfra-puppetserver-1 on project metricsinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:28:04] Cloud-VPS: Request to add catalyst.wmcloud.org webproxy subdomain for the catalyst CloudVPS project - https://phabricator.wikimedia.org/T360364 (SDunlap) NEW
[19:31:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance paws-puppetserver-1 on project paws - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:32:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance clouddb-services-puppetserver-1 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:34:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance project-proxy-puppetserver-1 on project project-proxy - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:38:28] (PuppetAgentNoResources) resolved: (2) No Puppet resources found on instance cloudinfra-internal-puppetserver-1 on project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[19:41:06] VPS-project-Wikistats: Add bewwiki to wikistats - https://phabricator.wikimedia.org/T360314#9639611 (Dzahn) waiting for T357866 (there is no direct link to it because of the "post creation work" task in between tasks which imho makes things harder without a clear benefit
[19:41:37] VPS-project-Wikistats: Add kuswiki to wikistats - https://phabricator.wikimedia.org/T360307#9639616 (Dzahn) waiting for T359757
[19:59:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance cloud-cumin-05 in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[20:09:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance cloud-cumin-05 in project cloudinfra - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[20:55:37] Toolforge (Software install/update): Provide a Redis container for use within a tool's namespace - https://phabricator.wikimedia.org/T360378 (bd808) NEW
[20:58:52] Toolforge (Software install/update): Provide a Redis container for use within a tool's namespace - https://phabricator.wikimedia.org/T360378#9640019 (bd808) At this time I would propose that the Redis is configured not to use any persistent storage strategy. The lack of persistent volume claims (PVCs) in the...
[21:02:34] Toolforge (Software install/update): Provide a Redis container for use within a tool's namespace - https://phabricator.wikimedia.org/T360378#9640026 (bd808) My near term ulterior motive for this task is being able to add a tool-local Redis to #wikibugs without needing to do a similar amount of work as I did...
[21:21:02] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:21:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:26:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[21:35:15] cloud-services-team, Toolforge: Large number of tools with servicegroup but no matching user - https://phabricator.wikimedia.org/T360379 (taavi) NEW
[22:30:52] (CR) Krinkle: [C:+2] repositories: Add some "performance" repos [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1010647 (owner: Umherirrender)
[22:31:32] (Merged) jenkins-bot: repositories: Add some "performance" repos [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/1010647 (owner: Umherirrender)
[23:00:59] Wikibugs: Explore replacing asyncio-redis with redis.asyncio - https://phabricator.wikimedia.org/T360074#9640336 (bd808) Open→In progress a:bd808
[23:05:25] Wikibugs: Wikibugs testing task - https://phabricator.wikimedia.org/T90594#9640350 (bd808) test
[23:17:31] Wikibugs: Explore replacing asyncio-redis with redis.asyncio - https://phabricator.wikimedia.org/T360074#9640393 (CodeReviewBot) bd808 updated https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/merge_requests/15 Revamp irc bot plugin and queue