[00:01:31] Quarry: Remove redis - https://phabricator.wikimedia.org/T360584#9647966 (Frostly) Maybe consider https://www.dragonflydb.io/replace-redis or https://docs.keydb.dev/ [00:08:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:10:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [00:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [00:13:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [00:16:41] (PrometheusRestarted) firing: Prometheus/cloud restarted: beware monitoring artifacts. - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted [00:41:41] (PrometheusRestarted) resolved: Prometheus/cloud restarted: beware monitoring artifacts. 
- https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_was_restarted - https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?var-datasource=eqiad%20prometheus%2Fcloud - https://alerts.wikimedia.org/?q=alertname%3DPrometheusRestarted [01:41:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 14 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [01:48:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 12 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [01:56:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on metricsinfra-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [02:26:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 719 deleted instances on cloudinfra-cloudvps-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [03:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [04:10:15] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [04:41:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 14 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [04:48:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 12 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [04:56:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on metricsinfra-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [05:26:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 744 deleted instances on cloudinfra-cloudvps-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [06:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [07:41:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 14 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [07:53:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 12 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [07:56:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on metricsinfra-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [08:10:16] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [08:26:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 767 deleted instances on cloudinfra-cloudvps-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [08:30:50] (ProbeDown) firing: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [08:35:50] (ProbeDown) resolved: Service tools-static-15:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-15:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [09:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [09:14:23] (PS1) Muehlenhoff: Delete peopleweb dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1013227 (https://phabricator.wikimedia.org/T360413) [10:20:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [10:22:22] (HAProxyBackendUnavailable) firing: HAProxy service nova-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [10:28:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-test-k8s-ingress-7 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:33:28] (PuppetAgentNoResources) firing: (11) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:38:28] (PuppetAgentNoResources) firing: (15) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:41:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 14 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates 
[10:43:28] (PuppetAgentNoResources) firing: (20) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:48:28] (PuppetAgentNoResources) firing: (24) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:53:01] Quarry, Toolforge, ChangeProp, Commons, and 7 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9648786 (taavi) [10:53:28] (PuppetAgentNoResources) firing: (24) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [10:53:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 11 deleted instances on toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [10:56:28] (PuppetStaleCertificates) firing: Found non-revoked Puppet certificates for 1 deleted instances on metricsinfra-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [10:58:28] (PuppetAgentNoResources) firing: (24) No Puppet resources found on instance toolsbeta-acme-chief-2 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [11:03:28] (PuppetAgentNoResources) resolved: (22) No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources [11:03:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 11 deleted instances on 
toolsbeta-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [11:04:38] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9648841 (dcaro) [11:07:26] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9648853 (dcaro) New hard drives offline uncorrectable values (cloudcephosd1030) are all 0: ` root@cloudcephosd1030... [11:12:22] (HAProxyBackendUnavailable) resolved: HAProxy service nova-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [11:16:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 4 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [11:21:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 1 deleted instances on metricsinfra-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [11:21:28] (PuppetStaleCertificates) firing: (2) Found non-revoked Puppet certificates for 790 deleted instances on cloudinfra-cloudvps-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [11:32:43] !log dcaro@urcuchillay admin START - 
Cookbook wmcs.ceph.osd.depool_and_destroy [11:32:59] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [11:35:20] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [11:35:31] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [11:35:48] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [11:35:57] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [11:36:31] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [11:36:42] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [11:37:51] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [11:38:01] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=99) [11:38:34] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy [11:40:23] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) [11:41:55] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.depool_and_destroy (T348643) [12:09:53] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_haproxy_node [12:10:01] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [12:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [12:16:00] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) [12:17:02] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_haproxy_node [12:20:00] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [12:21:30] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [12:23:18] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) [12:36:30] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [13:07:05] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS, DC-Ops, ops-eqiad, SRE: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643#9648883 (dcaro) [13:16:04] Quarry, Toolforge, ChangeProp, Commons, and 7 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9648979 (Reedy) https://github.com/Snapchat/KeyDB already existed as a fork. https://github.com/Snapchat/KeyDB/issues/798 was fi... [13:28:35] Cloud-VPS: Frequent radosgw 500 errors with OpenTofu - https://phabricator.wikimedia.org/T360626 (taavi) NEW [13:37:14] (PS5) Majavah: vps: create_instance: Do not assume k8s-specific security group [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013013 [13:37:22] (PS3) Majavah: vps: create_instance: Add flag to sign Puppet certs [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013080 [13:37:30] (PS3) Majavah: wmcs_libs: openstack: Improve Neutron port handling [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013081 [13:37:38] (PS6) Majavah: toolforge: Add cookbook to add new K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013082 (https://phabricator.wikimedia.org/T349206) [13:37:46] (PS1) Majavah: toolforge: Add cookbook to remove a K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013302 (https://phabricator.wikimedia.org/T349206) [13:38:25] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-test-k8s-haproxy-3 [13:38:58] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node 
toolsbeta-test-k8s-haproxy-3 [13:39:23] (CR) CI reject: [V:-1] toolforge: Add cookbook to remove a K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013302 (https://phabricator.wikimedia.org/T349206) (owner: Majavah) [13:41:11] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-VPS (Debian Buster Deprecation): Migrate metricsinfra off buster - https://phabricator.wikimedia.org/T360630 (taavi) NEW [13:41:19] (PS2) Majavah: toolforge: Add cookbook to remove a K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013302 (https://phabricator.wikimedia.org/T349206) [13:42:37] (ProbeDown) firing: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [13:44:03] (PS3) Majavah: toolforge: Add cookbook to remove a K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013302 (https://phabricator.wikimedia.org/T349206) [13:47:37] (ProbeDown) resolved: (2) Service toolsbeta-test-k8s-haproxy-3:30000 has failed probes (http_admin_beta_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-test-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [13:50:08] (PS4) Majavah: toolforge: Add cookbook to remove a K8s HAProxy node [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1013302 (https://phabricator.wikimedia.org/T349206) [13:56:42] cloud-services-team, wikitech.wikimedia.org: Bitu not importing key changes directly in LDAP - https://phabricator.wikimedia.org/T360634 (taavi) NEW [13:57:14] cloud-services-team (Hardware), 
DC-Ops, ops-codfw, SRE: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9649435 (Jhancock.wm) [13:58:15] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.depool_and_destroy (exit_code=0) (T348643) [14:01:28] (PuppetStaleCertificates) resolved: Found non-revoked Puppet certificates for 811 deleted instances on cloudinfra-cloudvps-puppetserver-1 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:06:38] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:08:56] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:10:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:10:54] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-test-k8s-haproxy-4 [14:11:57] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node toolsbeta-test-k8s-haproxy-4 [14:12:10] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:14:24] cloud-services-team, wikitech.wikimedia.org: Disable SSH key management on Wikitech - https://phabricator.wikimedia.org/T359544#9649647 (SLyngshede-WMF) [14:15:41] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:15:41] (CloudVPSDesignateLeaks) firing: (4) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - 
https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:19:45] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:19:55] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:21:05] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:21:35] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:25:14] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:25:16] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:28:31] Cloud-VPS: Frequent radosgw 500 errors with OpenTofu - https://phabricator.wikimedia.org/T360626#9649732 (dcaro) The log from the radosgw is a 200 for a HEAD, maybe it's a different request? (usually that's used to get etags/caching stuff afaik). 
[14:31:10] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:35:24] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:35:28] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:35:49] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:35:51] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:37:29] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:37:31] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:41:33] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:44:05] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:44:08] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:44:23] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:48:40] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T348643) [14:49:08] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:49:26] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:50:41] (CloudVPSDesignateLeaks) firing: (5) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:51:32] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:51:59] !log dcaro@urcuchillay admin END (FAIL) - 
Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [14:52:01] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [14:55:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:56:21] !log dcaro@urcuchillay admin END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0) (T348643) [14:58:32] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643 (rook) NEW [14:59:49] Quarry: Scrape prometheus metrics from Quarry - https://phabricator.wikimedia.org/T360220#9649858 (phuedx) >>! In T360220#9634600, @taavi wrote: > They are supposed to be available on https://prometheus.wmcloud.org ([[ https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring | docs ]]) but seems... 
[15:00:06] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.bootstrap_and_add (T348643) [15:03:43] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9649873 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/392 [15:03:56] vivian-rook opened https://github.com/toolforge/paws/pull/392 [15:07:09] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [15:08:47] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99) (T348643) [15:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [15:12:09] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [15:14:18] (CR) BryanDavis: [C:-2] "Let's sit on this idea for a bit while we wait to see if a strong hard fork of Redis shows up following https://redis.com/blog/redis-adopt" [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1012797 (https://phabricator.wikimedia.org/T360378) (owner: BryanDavis) [15:20:43] Quarry, Toolforge, ChangeProp, Commons, and 8 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9649904 
(brennen) For GitLab: I //think// we currently run the bundled Redis in their Omnibus package. In that case, the easiest... [15:22:51] Toolforge (Software install/update), Patch-For-Review: Provide a Redis container for use within a tool's namespace - https://phabricator.wikimedia.org/T360378#9649924 (bd808) In progress→Stalled I have put a -2 lock on my https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+... [15:35:37] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_haproxy_node [15:41:05] Toolforge: Toolforge should not re-invent profile::mail::default_mail_relay - https://phabricator.wikimedia.org/T360651 (taavi) NEW [15:41:31] cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216#9650076 (Jhancock.wm) [15:42:30] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) [15:42:53] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-k8s-haproxy-3 [15:42:54] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=99) for node toolsbeta-k8s-haproxy-3 [15:43:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-k8s-haproxy-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [15:43:39] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-3 [15:44:42] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-3 [15:46:05] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-4 [15:47:08] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node 
(exit_code=0) for node tools-k8s-haproxy-4 [15:47:51] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:52:51] (ProbeDown) resolved: (4) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [15:53:04] cloud-services-team, Toolforge (Toolforge iteration 07), Kubernetes, Patch-For-Review: [infra] Upgrade Toolforge K8s haproxies to Bookworm - https://phabricator.wikimedia.org/T349206#9650132 (taavi) The hosts have been replaced, but the cookbooks are still pending reviews. [15:53:28] (InstanceDown) firing: Project tools instance tools-k8s-haproxy-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [15:58:28] (InstanceDown) resolved: Project tools instance tools-k8s-haproxy-4 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [16:03:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tools-k8s-haproxy-6 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [16:04:15] (CR) Dzahn: [C:+2] Delete peopleweb dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1013227 (https://phabricator.wikimedia.org/T360413) (owner: Muehlenhoff) [16:04:16] (CR) Dzahn: [V:+2 C:+2] Delete peopleweb dummy cert [labs/private] - https://gerrit.wikimedia.org/r/1013227 (https://phabricator.wikimedia.org/T360413) (owner: Muehlenhoff) [16:11:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - 
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:16:41] (CloudVPSDesignateLeaks) firing: (5) Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:51:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [16:52:09] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [16:56:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [17:13:55] (03PS1) 10Dzahn: delete rt.discovery.wmnet dummy key [labs/private] - 10https://gerrit.wikimedia.org/r/1013367 (https://phabricator.wikimedia.org/T360413) [17:14:07] (03PS2) 10Dzahn: delete rt.discovery.wmnet dummy key [labs/private] - 10https://gerrit.wikimedia.org/r/1013367 (https://phabricator.wikimedia.org/T360413) [17:14:32] (03CR) 10Dzahn: [V:03+2 C:03+2] delete rt.discovery.wmnet dummy key [labs/private] - 
10https://gerrit.wikimedia.org/r/1013367 (https://phabricator.wikimedia.org/T360413) (owner: 10Dzahn) [17:20:37] (03PS8) 10David Caro: ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/990977 [17:20:45] (03PS8) 10David Caro: ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/990978 [17:20:53] (03PS8) 10David Caro: ceph: add missing cumin params [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/990979 [17:21:01] (03PS1) 10David Caro: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 [17:22:09] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:25:21] (03CR) 10CI reject: [V:04-1] ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 (owner: 10David Caro) [17:26:34] (03CR) 10CI reject: [V:04-1] ceph: add missing cumin params [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/990979 (owner: 10David Caro) [17:27:30] (03PS2) 10David Caro: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 [17:31:07] (03CR) 10CI reject: [V:04-1] ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 (owner: 10David Caro) [17:32:21] (03PS9) 10David Caro: ceph: add missing cumin params [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/990979 [17:32:29] (03PS3) 10David Caro: ceph: drain and undrain in chunks [cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 [17:33:50] (03CR) 10CI reject: [V:04-1] ceph: drain and undrain in chunks 
[cloud/wmcs-cookbooks] - 10https://gerrit.wikimedia.org/r/1013369 (owner: 10David Caro) [18:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [18:35:57] 10PAWS, 10OpenRefine: Starting OpenRefine projects with Wikimedia Commons extension fails for some (but not all) users on PAWS - https://phabricator.wikimedia.org/T360690 (10Spinster) 03NEW [18:36:37] (03PS1) 10Dzahn: delete planet.discovery.wmnet key [labs/private] - 10https://gerrit.wikimedia.org/r/1013388 (https://phabricator.wikimedia.org/T360413) [18:37:45] 10PAWS, 10OpenRefine: Starting OpenRefine projects with Wikimedia Commons extension fails for some (but not all) users on PAWS - https://phabricator.wikimedia.org/T360690#9651406 (10Spinster) I referred to this ticket here https://github.com/OpenRefine/CommonsExtension/issues/99 [18:38:03] 10PAWS, 10OpenRefine: Starting OpenRefine projects with Wikimedia Commons extension fails for some (but not all) users on PAWS - https://phabricator.wikimedia.org/T360690#9651407 (10Spinster) Works perfectly fine for me in my own PAWS installation by the way. 
[18:57:17] 10PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9651440 (10rook) Getting ` Not accepting cookie auth on POST /user/VRook%20%28WMF%29/rstudio/rpc/client_init: HTTP 403: Forbidden ('_xsrf' argument missing from POST) ` When trying to open rstudio and openrefine [19:01:38] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Cloud-vps Buster deprecation - https://phabricator.wikimedia.org/T331738#9651458 (10Andrew) [19:05:13] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Migrate project-proxy off of Debian Buster - https://phabricator.wikimedia.org/T360693 (10Andrew) 03NEW [19:05:43] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10VPS-Projects, 10Puppet (Puppet 7.0): Are clouddb-wikireplicas-query-1 and the cloudb-services project still useful? - https://phabricator.wikimedia.org/T359810#9651491 (10Andrew) [19:07:54] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Migrate cloudinfra project off of debian buster - https://phabricator.wikimedia.org/T360696 (10Andrew) 03NEW [19:08:17] (03PS2) 10Dzahn: delete planet.discovery.wmnet key [labs/private] - 10https://gerrit.wikimedia.org/r/1013388 (https://phabricator.wikimedia.org/T360413) [19:09:36] (03CR) 10Dzahn: [V:03+2 C:03+2] delete planet.discovery.wmnet key [labs/private] - 10https://gerrit.wikimedia.org/r/1013388 (https://phabricator.wikimedia.org/T360413) (owner: 10Dzahn) [19:20:19] (03PS1) 10Dzahn: delete etherpad.discovery.wmnet dummy key, migrated to cfssl [labs/private] - 10https://gerrit.wikimedia.org/r/1013394 (https://phabricator.wikimedia.org/T360413) [19:33:06] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Toolsbeta: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T360699 (10Andrew) 03NEW [19:37:14] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'appservers' cloud-vps project - 
https://phabricator.wikimedia.org/T360700 (10Andrew) 03NEW [19:38:48] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Toolsbeta: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T360699#9651657 (10taavi) [19:38:50] 10Toolforge: Upgrade toolsbeta-nfs to Debian Bullseye/Bookworm - https://phabricator.wikimedia.org/T360419#9651658 (10taavi) [19:39:01] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS (Debian Buster Deprecation), 10Toolforge, 07Epic, 05Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#9651659 (10taavi) [19:39:09] 10Toolforge: Upgrade toolsbeta-nfs to Debian Bullseye/Bookworm - https://phabricator.wikimedia.org/T360419#9651671 (10taavi) [19:39:18] 10cloud-services-team (FY2023/2024-Q3-Q4), 10Cloud-VPS (Debian Buster Deprecation), 10Toolforge, 07Epic, 05Goal: Toolforge: migrate to Debian Bullseye or later - https://phabricator.wikimedia.org/T311897#9651672 (10taavi) [19:39:44] 06cloud-services-team: Replace or remove Debian Buster VMs in 'monitoring' cloud-vps project - https://phabricator.wikimedia.org/T360703 (10Andrew) 03NEW [19:40:00] 10Toolforge: Upgrade Toolsbeta Redis cluster to Bookworm - https://phabricator.wikimedia.org/T360704 (10taavi) 03NEW [19:40:16] 10Toolforge: Upgrade Toolsbeta Redis cluster to Bookworm - https://phabricator.wikimedia.org/T360704#9651694 (10taavi) [19:40:41] (CloudVPSDesignateLeaks) firing: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:41:50] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10VPS-Projects, 10Puppet (Puppet 7.0): Are clouddb-wikireplicas-query-1 and the cloudb-services project still useful? 
- https://phabricator.wikimedia.org/T359810#9651715 (10Andrew) I'm shutting down clouddb-wikireplicas-query-1 today; if it's... [19:43:04] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'monitoring' cloud-vps project - https://phabricator.wikimedia.org/T360703#9651717 (10taavi) [19:45:41] (CloudVPSDesignateLeaks) firing: (4) Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:49:42] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation), 10VPS-Projects, 10Puppet (Puppet 7.0): Are clouddb-wikireplicas-query-1 and the cloudb-services project still useful? - https://phabricator.wikimedia.org/T359810#9651740 (10Andrew) https://phabricator.wikimedia.org/T272723 [19:50:16] 10Quarry: Remove redis - https://phabricator.wikimedia.org/T360584#9651747 (10Legoktm) >>! In T360584#9647966, @Frostly wrote: > Maybe consider https://www.dragonflydb.io/replace-redis It's not under an OSI-approved license, see https://github.com/dragonflydb/dragonfly/blob/main/LICENSE.md [19:50:22] 10Quarry: Remove redis from Quarry - https://phabricator.wikimedia.org/T360584#9651748 (10Legoktm) [19:50:41] (CloudVPSDesignateLeaks) firing: (5) Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [19:52:04] 10PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T360643#9651751 (10rook) https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/3372 [19:59:27] 10Cloud-Services: Reset of Security Group rules... 
- https://phabricator.wikimedia.org/T360694#9651787 (10Peachey88) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag t... [20:00:50] 10Cloud-VPS: Reset of Security Group rules... - https://phabricator.wikimedia.org/T360694#9651802 (10JJMC89) [20:05:35] 10Cloud-VPS: Reset of Security Group rules... - https://phabricator.wikimedia.org/T360694#9651825 (10bd808) @KHurd-WMF Are you not able to use Horizon to change the rules back? https://openstack-browser.toolforge.org/project/openvas lists you and your sockpuppet account as members (OpenStack for administrator) o... [20:07:16] 10Cloud-VPS: Reset default Security Group rules for the openvas Cloud VPS project - https://phabricator.wikimedia.org/T360694#9651842 (10bd808) [20:19:08] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'traffic' cloud-vps project - https://phabricator.wikimedia.org/T360710 (10Andrew) 03NEW [20:20:21] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'video' cloud-vps project - https://phabricator.wikimedia.org/T360711 (10Andrew) 03NEW [20:22:44] 06cloud-services-team, 10Cloud-VPS (Debian Buster Deprecation): Replace or remove Debian Buster VMs in 'wikidata-dev' cloud-vps project - https://phabricator.wikimedia.org/T360713 (10Andrew) 03NEW [20:28:07] 10Quarry, 10Toolforge, 10ChangeProp, 06Commons, and 9 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9651946 (10Krinkle) [20:35:51] 10Quarry, 10Toolforge, 10ChangeProp, 10GitLab, and 8 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9651962 (10Peachey88) [20:45:08] 10Cloud-VPS: Reset default Security Group rules for the openvas Cloud VPS project - 
https://phabricator.wikimedia.org/T360694#9651988 (10KHurd-WMF) {F42969765} {F42970218} Those are the 4 that are not able to be re-added. The "any" option is not available. [21:06:24] 10cloud-services-team (FY2023/2024-Q3-Q4), 06Infrastructure-Foundations, 10Spicerack, 10SRE-tools, 13Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9652068 (10bking) We're going to upgrade curator (as well as its library)... [21:10:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [21:14:14] 10Quarry, 10Toolforge, 10ChangeProp, 10GitLab, and 8 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9652082 (10Krinkle) In MediaWiki (as deployed at WMF), there exists 1 use of Redis, which is during file uploads via LockManager. T... 
[21:20:10] Wikibugs, Software-Licensing: Relicense from MIT to GPL-3.0-or-later after approval by all substantive contributors - https://phabricator.wikimedia.org/T360718 (bd808) NEW
[21:20:29] Wikibugs, Software-Licensing: Relicense from MIT to GPL-3.0-or-later after approval by all substantive contributors - https://phabricator.wikimedia.org/T360718#9652112 (bd808) p:Triage→High
[21:39:17] Wikibugs, Software-Licensing: Relicense Wikibugs from MIT to GPL-3.0-or-later after approval by all substantive contributors - https://phabricator.wikimedia.org/T360718#9652187 (bd808)
[21:50:28] (InstanceDown) firing: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:00:28] (InstanceDown) resolved: Project cloudinfra instance cloudinfra-cloudvps-puppetserver-1 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[22:00:41] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, Spicerack, SRE-tools, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9652267 (Volans) >>! In T345337#9652068, @bking wrote: > We're going to...
[22:02:49] Quarry, Toolforge, ChangeProp, GitLab, and 8 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9652274 (Ladsgroup) >>! In T360596#9652082, @Krinkle wrote: > In MediaWiki (as deployed at WMF), there exists 1 use of Redis, whi...
[22:04:21] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node on host 'cloudcontrol2004-dev.codfw.wmnet' (T357133)
[22:04:26] T357133: Integrate Bookworm 12.5 point update - https://phabricator.wikimedia.org/T357133
[22:05:01] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) on host 'cloudcontrol2004-dev.codfw.wmnet' (T357133)
[22:05:10] (GaleraClusterSizeMismatch) firing: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[22:10:10] (GaleraClusterSizeMismatch) resolved: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[22:10:52] cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations, Spicerack, SRE-tools, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337#9652299 (bking) > The linked task is this same one. Did you meant to li...
[22:20:41] (CloudVPSDesignateLeaks) firing: (5) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [22:24:10] (GaleraClusterSizeMismatch) firing: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [22:24:22] (HAProxyBackendUnavailable) firing: HAProxy service mysql backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [22:25:41] (CloudVPSDesignateLeaks) resolved: (5) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [22:26:25] 10Toolforge: [jobs-api,jobs-cli] Support services in jobs - https://phabricator.wikimedia.org/T348758#9652350 (10Raymond_Ndibe) a:03Raymond_Ndibe [22:29:10] (GaleraClusterSizeMismatch) resolved: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [22:29:22] (HAProxyBackendUnavailable) resolved: HAProxy service mysql backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - 
https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [22:30:52] (HAProxyBackendUnavailable) firing: (2) HAProxy service mysql backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [22:31:10] (GaleraClusterSizeMismatch) firing: (2) Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [22:31:23] (03PS2) 10Dzahn: delete etherpad.discovery.wmnet dummy key, migrated to cfssl [labs/private] - 10https://gerrit.wikimedia.org/r/1013394 (https://phabricator.wikimedia.org/T360413) [22:34:51] (03CR) 10Dzahn: [V:03+2 C:03+2] delete etherpad.discovery.wmnet dummy key, migrated to cfssl [labs/private] - 10https://gerrit.wikimedia.org/r/1013394 (https://phabricator.wikimedia.org/T360413) (owner: 10Dzahn) [22:35:52] (HAProxyBackendUnavailable) resolved: (2) HAProxy service mysql backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [22:36:10] (GaleraClusterSizeMismatch) resolved: (2) Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch [23:15:40] (03PS1) 10Dzahn: delete aphlict.discovery dummy key, migrated to cfssl [labs/private] - 10https://gerrit.wikimedia.org/r/1013417 (https://phabricator.wikimedia.org/T360413) [23:16:11] (03PS1) 10Dzahn: delete releases.discovery dummy key, migrated to cfssl [labs/private] - 
10https://gerrit.wikimedia.org/r/1013418 (https://phabricator.wikimedia.org/T360413) [23:19:18] (03PS1) 10Dzahn: delete doc.discovery dummy key, migrated to cfssl [labs/private] - 10https://gerrit.wikimedia.org/r/1013419 (https://phabricator.wikimedia.org/T360413) [23:25:51] 10Quarry: Remove redis from Quarry - https://phabricator.wikimedia.org/T360584#9652467 (10Frostly) :( [23:32:28] 10Quarry: Remove redis from Quarry - https://phabricator.wikimedia.org/T360584#9652473 (10bd808) [23:32:35] 10Quarry, 10Toolforge, 10ChangeProp, 10GitLab, and 8 others: Figure out a plan to move forward with regarding Redis License changes - https://phabricator.wikimedia.org/T360596#9652472 (10bd808)