[00:01:37] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[00:02:22] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (2) HAProxy service Abuse has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[00:02:27] <wikibugs>	 cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T352544 (phaultfinder)
[00:03:52] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:11:37] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:15:09] <wikibugs>	 (CR) Eugene233: [C: +2] Merge m2c branch to main [labs/tools/Isa] - https://gerrit.wikimedia.org/r/998266 (https://phabricator.wikimedia.org/T356772) (owner: Eugene233)
[00:15:36] <wikibugs>	 (Merged) jenkins-bot: Merge m2c branch to main [labs/tools/Isa] - https://gerrit.wikimedia.org/r/998266 (https://phabricator.wikimedia.org/T356772) (owner: Eugene233)
[00:18:52] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:31:37] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:32:22] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (3) HAProxy service Abuse has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[00:32:28] <wikibugs>	 cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T352544 (phaultfinder)
[00:33:52] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:33:56] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudcontrol1006 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:37:22] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (3) HAProxy service Abuse has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[00:38:56] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:41:37] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:42:22] <jinxer-wm>	 (HAProxyServiceUnavailable) firing: (3) HAProxy service Abuse has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[00:43:56] <jinxer-wm>	 (SystemdUnitDown) firing: (3) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[00:47:22] <jinxer-wm>	 (HAProxyServiceUnavailable) resolved: (3) HAProxy service Abuse has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[00:52:52] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:55:49] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[00:57:05] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[00:57:52] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (2) HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[01:06:07] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[01:28:56] <jinxer-wm>	 (SystemdUnitDown) firing: (5) The service unit nova-fullstack.service is in failed status on host cloudcontrol1006. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:31:07] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: (3) HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[01:32:23] <jinxer-wm>	 (HAProxyServiceUnavailable) resolved: (2) HAProxy service neutron-api_backend has no available backends on cloudlb1001:9900 - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyServiceUnavailable
[01:32:32] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.restart_openstack
[01:32:52] <jinxer-wm>	 (NeutronAgentDown) firing: (51) Neutron neutron-linuxbridge-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[01:32:59] <jinxer-wm>	 (MetricsinfraAlertmanagerDown) resolved: Metricsinfra alertmanager is unreachable #page - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/MetricsinfraAlertmanagerDown - TODO - https://alerts.wikimedia.org/?q=alertname%3DMetricsinfraAlertmanagerDown
[01:37:15] <logmsgbot_cloud>	 !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0)
[01:38:56] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit rabbitmq_detect_partition.service is in failed status on host cloudrabbit1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudrabbit1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[01:58:45] <jinxer-wm>	 (NovafullstackSustainedFailures) resolved: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[01:59:21] <jinxer-wm>	 (NeutronAgentDown) resolved: (51) Neutron neutron-linuxbridge-agent on cloudnet1005 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[02:47:31] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 36074 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[03:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[03:12:01] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:21:26] <jinxer-wm>	 (SystemdUnitDown) resolved: The systemd unit backup_vms.service on node cloudbackup1004 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[04:37:00] <jinxer-wm>	 (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:47:00] <jinxer-wm>	 (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[05:47:31] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 46874 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[06:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[06:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[06:14:09] <wikibugs>	 Toolforge Build Service: [apt-buildpack] Need local Ubuntu mirror or package cache - https://phabricator.wikimedia.org/T357251 (tstarling)
[06:16:28] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate zoomviewer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320210 (tstarling) This is basically done, but performance seems very bad. Please test to confirm that it's not just me.  The server is not showing significant load while I...
[07:25:30] <wikibugs>	 PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T356448 (LibUp-bot) A new upstream version of OpenRefine is now available: 3.7.9. * https://github.com/OpenRefine/OpenRefine/releases/tag/3.7.9
[07:46:24] <wikibugs>	 (CR) Eugene233: [C: +2] "Thank you so much for this fix." [labs/tools/Isa] - https://gerrit.wikimedia.org/r/998479 (owner: Juniorbesong)
[07:46:50] <wikibugs>	 (Merged) jenkins-bot: BUG: T355466. Solve cannot import name "url_decode" from "werkzeug.urls" [labs/tools/Isa] - https://gerrit.wikimedia.org/r/998479 (owner: Juniorbesong)
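The Isa change above fixes an ImportError: the installed Werkzeug no longer provides `url_decode` in `werkzeug.urls`. The log does not show what the patch actually does; a minimal standard-library stand-in, assuming the caller only needs a query string parsed into a key to list-of-values mapping (the helper name `decode_query` is hypothetical), could look like this:

```python
from urllib.parse import parse_qs


def decode_query(query: str) -> dict[str, list[str]]:
    """Parse a URL query string into {key: [values, ...]}.

    Hypothetical stand-in for werkzeug.urls.url_decode. Note that url_decode
    returned a MultiDict, while this returns a plain dict of lists.
    """
    # keep_blank_values=True keeps keys like "empty=" instead of dropping them;
    # parse_qs also decodes %XX escapes and turns "+" into a space.
    return parse_qs(query, keep_blank_values=True)


print(decode_query("q=a+b&tag=x&tag=y&empty="))
# {'q': ['a b'], 'tag': ['x', 'y'], 'empty': ['']}
```

If code relied on MultiDict behaviour such as `.get()` returning the first value, that would need adapting separately.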
[07:52:01] <jinxer-wm>	 (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[07:55:03] <wikibugs>	 (PS1) Eugene233: SQL statement for pre-ping does not execute [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1000297 (https://phabricator.wikimedia.org/T355983)
[07:57:49] <wikibugs>	 (CR) Eugene233: [C: +2] "Basic fix needs urgent testing." [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1000297 (https://phabricator.wikimedia.org/T355983) (owner: Eugene233)
[07:58:13] <wikibugs>	 (Merged) jenkins-bot: SQL statement for pre-ping does not execute [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1000297 (https://phabricator.wikimedia.org/T355983) (owner: Eugene233)
[08:07:01] <jinxer-wm>	 (OpenstackAPIResponse) resolved: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:07:31] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[08:11:24] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudweb1003.wikimedia.org with OS bullseye
[08:27:59] <wikibugs>	 Cloud-VPS: Automatically install Node.js on cloud instances - https://phabricator.wikimedia.org/T356441 (taavi) Open→Declined No, let's not pull Node, NPM and its hundreds of dependencies to all the instances where it would not be used in most of them.  >  While this might be straightforward for some...
[08:31:31] <wikibugs>	 Cloud-VPS, MediaWiki-Vagrant: Update Vagrant puppet role  to work on Bookworm. - https://phabricator.wikimedia.org/T356551 (taavi) a:taavi
[08:33:09] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate zoomviewer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320210 (Tacsipacsi) The full page load on https://zoomviewer.toolforge.org/index.php?f=Seattle+7.jpg&flash=no took 48 seconds for me as well, but it didn’t feel very long –...
[08:33:19] <wikibugs>	 Cloud-VPS, cloud-services-team: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963 (taavi)
[08:47:31] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 57674 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
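The three ToolsDB lag alerts so far (36074, 46874, 57674) fired exactly three hours apart, and the reported lag grew by exactly 10800 between each pair. Assuming the lag value is in seconds (the alert text does not state a unit), lag growing one second per second of wall-clock time suggests the replica was not advancing at all in that window, rather than merely falling behind slowly. A quick check, using only the numbers from the alerts above:

```python
from datetime import datetime

# (alert time, reported lag) copied from the ToolsToolsDBReplicationLagIsTooHigh alerts
SAMPLES = [("02:47:31", 36074), ("05:47:31", 46874), ("08:47:31", 57674)]


def lag_growth_rates(samples):
    """Return lag growth per second of wall-clock time between consecutive samples.

    Assumes the lag is reported in seconds and all samples fall on the same day.
    A rate of 1.0 means the replica did not advance; 0.0 means the delay is constant.
    """
    rates = []
    for (t0, lag0), (t1, lag1) in zip(samples, samples[1:]):
        elapsed = (datetime.strptime(t1, "%H:%M:%S")
                   - datetime.strptime(t0, "%H:%M:%S")).total_seconds()
        rates.append((lag1 - lag0) / elapsed)
    return rates


print(lag_growth_rates(SAMPLES))  # [1.0, 1.0]
```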
[08:51:41] <wikibugs>	 Cloud-VPS, MediaWiki-Vagrant, Patch-For-Review: Update Vagrant puppet role  to work on Bookworm. - https://phabricator.wikimedia.org/T356551 (taavi) The above patch fixes the Puppet provisioning error, however the vagrant-lxc plugin seems to be broken: `lines=15 taavi@taavi-vagrant:/srv/mediawiki-vag...
[08:54:20] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudweb1003.wikimedia.org with OS bullseye completed: - cloudweb1003 (**PASS**)   - Do...
[08:59:14] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966 (taavi)
[09:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[09:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:12:45] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:13:45] <jinxer-wm>	 (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:22:31] <jinxer-wm>	 (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:23:15] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264 (dcaro) p:Triage→High
[09:23:19] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264 (dcaro) Open→In progress
[09:27:06] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264 (dcaro)
[09:30:40] <wikibugs>	 Cloud-VPS, cloud-services-team: Move cloudcontrol memcached flows to cloud-private - https://phabricator.wikimedia.org/T355417 (taavi) Open→Resolved
[09:31:35] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Adamham - https://phabricator.wikimedia.org/T348663 (taavi) Any news here?
[09:32:58] <wikibugs>	 cloud-services-team, Bitu, Infrastructure-Foundations, LDAP, User-MoritzMuehlenhoff: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663 (MoritzMuehlenhoff)
[09:36:08] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge: "My first Buildpack .NET tool" manual doesn't work due to ERR_CERT_INVALID - https://phabricator.wikimedia.org/T357206 (dcaro) Yep, on toolforge the https endpoint is managed by the proxy, the webservices themselves just have to listen on port `$PORT` using http :)
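dcaro's reply in T357206 describes the split on Toolforge: the front proxy terminates HTTPS, and the webservice itself only serves plain HTTP on the port given in `$PORT`. The task is about a .NET tool; the same idea as a minimal Python sketch (standard library only; the handler, its response body and the fallback port 8000 are illustrative assumptions, not Toolforge defaults) might look like this:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Illustrative handler: answers every GET with a short plain-text body."""

    def do_GET(self):
        body = b"Hello from a plain-HTTP webservice\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=None):
    """Bind a plain-HTTP (no TLS) server on all interfaces.

    Uses the given port, else $PORT, else 8000 (fallback is an assumption).
    TLS is expected to be handled by the front proxy, so none is set up here.
    """
    if port is None:
        port = int(os.environ.get("PORT", "8000"))
    return HTTPServer(("0.0.0.0", port), HelloHandler)
```

Calling `make_server().serve_forever()` starts serving. The point is what is absent: no certificate handling in the app, since a self-signed certificate there would be exactly the kind of thing that produces `ERR_CERT_INVALID` behind the proxy.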
[09:42:18] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge: Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (dcaro) Note that there's no stability or availability assurance for any of the k8s APIs. I understand they are way more powerful than the APIs/abstractions that we...
[09:42:43] <wikibugs>	 Toolforge, cloud-services-team, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919 (dcaro) Note that there's no stability or availability assurance for any of the k8s APIs. I understand...
[09:45:25] <wikibugs>	 Toolforge (Quota-requests): Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209 (dcaro) +1
[09:45:28] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate zoomviewer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320210 (tstarling) It's really not a lot of data -- a previous maintainer turned the JPEG quality down to 50. In Chromium with a viewport width of 1373, reloading with cache...
[09:46:37] <wikibugs>	 Toolforge Build Service: [apt-buildpack] Add local Ubuntu mirror or package cache - https://phabricator.wikimedia.org/T357251 (dcaro)
[09:46:49] <wikibugs>	 Toolforge Build Service: [apt-buildpack] Add local Ubuntu mirror or package cache - https://phabricator.wikimedia.org/T357251 (dcaro) p:Triage→Medium
[09:46:56] <wikibugs>	 Toolforge Build Service: [apt-buildpack] Add local Ubuntu mirror or package cache - https://phabricator.wikimedia.org/T357251 (dcaro) p:Medium→Low
[09:47:30] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge: Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (dcaro) p:Triage→Low
[09:50:03] <wikibugs>	 cloud-services-team: MetricsinfraAlertmanagerDown - https://phabricator.wikimedia.org/T357248 (dcaro) This was a hiccup on neutron side: `  02:45:13 <andrewbogott> Andrew Bogott Ok, quick wrap-up:  It was not a denial of service. Neutron was in a split-brained state which meant it timed out on many operation...
[09:50:12] <wikibugs>	 cloud-services-team: MetricsinfraAlertmanagerDown - https://phabricator.wikimedia.org/T357248 (dcaro) Open→Resolved a:dcaro
[09:51:12] <wikibugs>	 cloud-services-team: HAProxyServiceUnavailable - https://phabricator.wikimedia.org/T352544 (dcaro) Open→Resolved a:dcaro This was a restart due to cloudrabbit instability, making neutron unstable: `  02:45:13 <andrewbogott> Andrew Bogott Ok, quick wrap-up:  It was not a denial of service. Neutron...
[09:51:27] <wikibugs>	 cloud-services-team: CRITICAL - degraded: The following units failed: check-private-data.service on clouddb1015, 1019, 1021 - https://phabricator.wikimedia.org/T355953 (taavi) Open→Resolved ` Feb 12 05:04:51 clouddb1015 systemd[1]: check-private-data.service: Succeeded. `
[09:58:08] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit backup_vms.service on node cloudbackup1004 has been down for long. - https://phabricator.wikimedia.org/T357244 (dcaro) This seems due to the same neutron outage yesterday: ` Feb 11 17:01:37 cloudbackup1004 wmcs-backup[28127]: <class 'neutronclient.common.exceptions....
[09:58:10] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge (Toolforge iteration 05): Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (taavi) a:taavi
[09:58:23] <wikibugs>	 cloud-services-team: SystemdUnitDown  Unit backup_vms.service on node cloudbackup1004 has been down for long. - https://phabricator.wikimedia.org/T357244 (dcaro) Open→Resolved a:dcaro
[10:00:53] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264 (dcaro) Previous instance of this {T355411}
[10:02:12] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge (Toolforge iteration 05), Patch-For-Review: Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/1...
[10:21:49] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge (Toolforge iteration 05), Patch-For-Review: Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/1...
[10:22:15] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Adamham - https://phabricator.wikimedia.org/T348663 (Nahid) Open→Declined We have closed the ticket on T&S' end as we were not successful in confirming the identity. I will close...
[10:32:32] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge (Toolforge iteration 05): Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/189  maintain-kubeusers:...
[10:32:57] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[10:33:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[10:33:15] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[10:33:28] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[10:33:46] <wikibugs>	 Grid-Engine-to-K8s-Migration, Toolforge (Toolforge iteration 05): Tool user not allowed to read jobs/status in Kubernetes - https://phabricator.wikimedia.org/T357172 (taavi) Open→Resolved
[10:33:48] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (taavi)
[10:48:45] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[10:49:00] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[10:50:58] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
[10:51:11] <logmsgbot_cloud>	 !log sstefanova@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
[10:55:16] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (Slst2020) In progress→Resolved
[10:56:18] <wikibugs>	 Toolforge (Toolforge iteration 05): [Toolforge CLI consolidation] Explore OpenAPI tooling - https://phabricator.wikimedia.org/T356261 (Slst2020) a:Slst2020
[10:57:19] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (taavi) @Slst2020 this does not seem resolved to me? I can still reproduce the issue and there are no patches attached to this task.
[11:14:34] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (CodeReviewBot) sstefanova updated https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge_requests/77  quota: show an error if project does not...
[11:18:37] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (Slst2020) >>! In T353701#9533076, @taavi wrote: > @Slst2020 this does not seem resolved to me? I can still reproduce the issue and there are no patches attac...
[11:19:50] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (taavi) Ok, then this task should be stalled and have the robot account permissions task added as a subtask, instead of being marked as Resolved.
[11:21:22] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (Slst2020) Resolved→Stalled
[11:22:42] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (Slst2020)
[11:22:46] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: [harbor] upgrade to 2.10.x - https://phabricator.wikimedia.org/T354507 (Slst2020)
[12:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[12:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:13:38] <wikibugs>	 Cloud-VPS, cloud-services-team, Patch-For-Review, User-aborrero: Some VPS instances still using ns-recursor0 - https://phabricator.wikimedia.org/T346426 (taavi) I think we can remove the redirects here. If someone has Puppet broken for months and did not react to the cloud-announce email when thi...
[12:15:23] <wikibugs>	 Cloud-VPS, cloud-services-team: Use cloud-private and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145 (taavi) p:Triage→Low
[12:15:33] <wikibugs>	 Cloud-VPS, cloud-services-team: Move Cloud VPS internal flows from cloud-hosts to cloud-private - https://phabricator.wikimedia.org/T355416 (taavi) p:Triage→Medium
[12:20:00] <wikibugs>	 Cloud-VPS, cloud-services-team (FY2023/2024-Q3-Q4), Epic, Goal, User-aborrero: openstack eqiad1: introduce cloud-private and cloudlb - https://phabricator.wikimedia.org/T341060 (taavi)
[12:21:27] <wikibugs>	 Cloud-VPS, cloud-services-team, Patch-For-Review, User-aborrero: Some VPS instances still using ns-recursor0 - https://phabricator.wikimedia.org/T346426 (aborrero) >>! In T346426#9533557, @taavi wrote: > I think we can remove the redirects here. If someone has Puppet broken for months and did not...
[12:22:51] <wikibugs>	 Toolforge, cloud-services-team, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919 (Anomie) Provide something better that fits the requirements and I'll look at using it. Last I've heard...
[12:24:41] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures  The automated tests were unable to create, provision and decommission a VM in the last 5h - https://phabricator.wikimedia.org/T357234 (dcaro) Open→Resolved a:dcaro This is running again, might be related to the neutron outage, same as {T357244}
[12:25:25] <wikibugs>	 Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227 (taavi)
[12:34:42] <wikibugs>	 (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/map-of-monuments] - https://gerrit.wikimedia.org/r/1002458 (owner: L10n-bot)
[12:35:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-52
[12:36:32] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-52
[12:36:38] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-53
[12:37:14] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-53
[12:37:20] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:38:04] <wikibugs>	 Toolforge, cloud-services-team, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919 (dcaro) >>! In T321919#9533573, @Anomie wrote: > Provide something better that fits the requirements an...
[12:44:51] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:45:59] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-15
[12:46:14] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-15
[12:46:23] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:55:04] <wikibugs>	 PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T356448 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/374
[12:55:23] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/paws/pull/374
[12:56:03] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-15.tools.eqiad1.wikimedia.cloud to the cluster
[12:56:03] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:57:46] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-54
[12:58:22] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-54
[12:58:37] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-55
[12:59:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-55
[12:59:25] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:04:03] <wikibugs>	 (03Abandoned) 10Kosta Harlan: Link to merging patches docs and add as first step [labs/tools/deploy-commands] - 10https://gerrit.wikimedia.org/r/720741 (owner: 10Kosta Harlan)
[13:04:08] <wikibugs>	 (03Abandoned) 10Kosta Harlan: Link to docs about verifying on mwdebug [labs/tools/deploy-commands] - 10https://gerrit.wikimedia.org/r/720742 (owner: 10Kosta Harlan)
[13:09:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-16.tools.eqiad1.wikimedia.cloud to the cluster
[13:09:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:09:23] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-56
[13:10:00] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-56
[13:10:08] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-57
[13:10:46] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-57
[13:12:03] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:12:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:16:28] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-k8s-worker-56 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:17:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 6 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[13:21:28] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-k8s-worker-56 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:22:26] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-17.tools.eqiad1.wikimedia.cloud to the cluster
[13:22:26] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:23:50] <wikibugs>	 Toolforge Jobs framework, User-aborrero: Support tool-internal networking - https://phabricator.wikimedia.org/T348758 (aborrero)
[13:27:24] <wikibugs>	 Toolforge Jobs framework, cloud-services-team, User-Raymond_Ndibe: Toolforge jobs framework: introduce swagger to the  API - https://phabricator.wikimedia.org/T327279 (aborrero)
[13:29:24] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Jobs framework, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523 (aborrero)
[13:29:48] <wikibugs>	 Toolforge Jobs framework, cloud-services-team, User-aborrero: Toolforge: consider introducing a command line for creating reverse proxies - https://phabricator.wikimedia.org/T337191 (aborrero)
[13:29:50] <wikibugs>	 Toolforge Jobs framework, cloud-services-team, User-aborrero: Toolforge: consider introducing a command line for creating reverse proxies - https://phabricator.wikimedia.org/T337191 (aborrero)
[13:32:52] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-58
[13:33:39] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-58
[13:33:53] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-59
[13:34:13] <wikibugs>	 Toolforge, cloud-services-team, User-aborrero: Toolforge: consider introducing a command line for creating reverse proxies - https://phabricator.wikimedia.org/T337191 (aborrero)
[13:34:32] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-59
[13:34:51] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:40:28] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-k8s-worker-59 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:43:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-18.tools.eqiad1.wikimedia.cloud to the cluster
[13:43:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:45:28] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-k8s-worker-59 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[13:46:30] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-60
[13:47:06] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-60
[13:50:22] <jinxer-wm>	 (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[13:54:08] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge...
[13:55:22] <jinxer-wm>	 (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[14:03:32] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge...
[14:10:31] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repo...
[14:25:20] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-api/-/merge...
[14:26:10] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-61
[14:26:52] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-61
[14:35:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[14:39:24] <wikibugs>	 (03CR) 10Nikerabbit: [V: 03+2] Localisation updates from https://translatewiki.net. [labs/tools/map-of-monuments] - 10https://gerrit.wikimedia.org/r/1002458 (owner: 10L10n-bot)
[14:42:18] <wikibugs>	 (03CR) 10Jforrester: [C: 03+2] "Yeah, we should finish the stylelint 16 upgrade for stylelint-config-wikimedia. Thanks!" [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/999115 (owner: 10Majavah)
[14:42:55] <wikibugs>	 (03Merged) 10jenkins-bot: Bump stylelint to 15.10.1 [labs/libraryupgrader/config] - 10https://gerrit.wikimedia.org/r/999115 (owner: 10Majavah)
[14:47:16] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-19.tools.eqiad1.wikimedia.cloud to the cluster
[14:47:16] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[14:47:40] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-62
[14:48:19] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-62
[14:48:39] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[14:58:23] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-20.tools.eqiad1.wikimedia.cloud to the cluster
[14:58:24] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[15:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[15:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:04:30] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/build...
[15:29:40] <wikibugs>	 cloud-services-team, Observability-Alerting, SRE Observability (FY2023/2024-Q3): Karma UI shows duplicate alerts - https://phabricator.wikimedia.org/T353457 (joanna_borun)
[15:31:02] <wikibugs>	 Cloud-VPS, cloud-services-team, Infrastructure-Foundations, netbox: Netbox device location information not available on the first Puppet run of a device - https://phabricator.wikimedia.org/T347375 (joanna_borun) p:Triage→Medium
[15:36:28] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Infrastructure-Foundations: Remove wmcs-admin access from production cumin hosts - https://phabricator.wikimedia.org/T347979 (MoritzMuehlenhoff) p:Triage→Low
[15:36:58] <wikibugs>	 cloud-services-team, Bitu, Infrastructure-Foundations, LDAP, User-MoritzMuehlenhoff: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663 (MoritzMuehlenhoff) p:Triage→Low
[15:42:15] <wikibugs>	 PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T357027 (rook) Resolved→Open
[15:42:43] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudnet.reboot_node
[15:43:23] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudnet.reboot_node (exit_code=99)
[15:46:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[15:46:30] <wikibugs>	 PAWS: Upgrade Jupyterlab - https://phabricator.wikimedia.org/T357027 (rook) notebook looks like it is downgrading jupyterlab. Notebook is upgrading to 7.1, but is not quite there. Until then it requires jupyterlab of less than 4.1.0. We can wait a little while to see if it resolves.
[15:50:30] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[15:51:44] <jinxer-wm>	 (InterfaceSpeedError) firing: brq05a5494a-18 on cloudvirt2001-dev:9100 has the wrong speed: 1.25e+06. - https://wikitech.wikimedia.org/wiki/Monitoring/check_eth - https://grafana.wikimedia.org/d/000000562 - https://alerts.wikimedia.org/?q=alertname%3DInterfaceSpeedError
[15:51:49] <wikibugs>	 cloud-services-team: InterfaceSpeedError  brq05a5494a-18 on cloudvirt2001-dev:9100 has the wrong speed: 1.25e+06. - https://phabricator.wikimedia.org/T357319 (phaultfinder)
[15:52:42] <wikibugs>	 cloud-services-team: InterfaceSpeedError  brq05a5494a-18 on cloudvirt2001-dev:9100 has the wrong speed: 1.25e+06. - https://phabricator.wikimedia.org/T357319 (taavi) a:taavi Looking as I just rebooted this host.
[15:54:46] <wikibugs>	 cloud-services-team, SRE: ceph: test and decide 1 network interface setup - https://phabricator.wikimedia.org/T325531 (joanna_borun)
[15:56:41] <wikibugs>	 PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T356448 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/374
[15:56:44] <jinxer-wm>	 (InterfaceSpeedError) resolved: brq05a5494a-18 on cloudvirt2001-dev:9100 has the wrong speed: 1.25e+06. - https://wikitech.wikimedia.org/wiki/Monitoring/check_eth - https://grafana.wikimedia.org/d/000000562 - https://alerts.wikimedia.org/?q=alertname%3DInterfaceSpeedError
[15:56:51] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/paws/pull/374
[15:57:25] <wikibugs>	 PAWS: New upstream release for OpenRefine - https://phabricator.wikimedia.org/T356448 (rook) Open→Resolved a:rook
[15:57:46] <wikibugs>	 cloud-services-team: InterfaceSpeedError  brq05a5494a-18 on cloudvirt2001-dev:9100 has the wrong speed: 1.25e+06. - https://phabricator.wikimedia.org/T357319 (taavi) Open→Resolved It fixed itself.
[15:59:07] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:04:32] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[16:10:32] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Jobs framework, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523 (aborrero) Out of curiosity, I generated the server code using https://openapi-generator.tech/ , and I got this...
[16:14:49] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:20:53] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[16:21:02] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:25:35] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge...
[16:26:57] <wikibugs>	 Toolforge Build Service, Patch-For-Review: builds-cli utils/bump_version.sh fails with '--userns: invalid USER mode.' - https://phabricator.wikimedia.org/T354876 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/56  d/changelog: bump to 0....
[16:27:01] <wikibugs>	 Toolforge (Toolforge iteration 04), Patch-For-Review: [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (CodeReviewBot) raymond-ndibe opened https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/56  d/changelog: bump to 0.0.13
[16:27:09] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge...
[16:27:11] <wikibugs>	 Toolforge Build Service, Patch-For-Review: builds-cli utils/bump_version.sh fails with '--userns: invalid USER mode.' - https://phabricator.wikimedia.org/T354876 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/56  d/changelog: bump to 0....
[16:27:16] <wikibugs>	 Toolforge (Toolforge iteration 04), Patch-For-Review: [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-cli/-/merge_requests/56  d/changelog: bump to 0.0.13
[16:29:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[16:30:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:36:52] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99)
[16:39:22] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:39:45] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99)
[16:52:02] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[16:53:36] <wikibugs>	 Tool-gitlab-account-approval: "LDAPInvalidFilterError: malformed filter" error checking user https://gitlab.wikimedia.org/haak - https://phabricator.wikimedia.org/T357328 (bd808)
[17:03:57] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolf...
[17:04:01] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-...
[17:04:38] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[17:06:08] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot
[17:07:51] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (Raymond_Ndibe)
[17:07:57] <wikibugs>	 Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (Raymond_Ndibe)
[17:08:33] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [builds-api] refactor build start response type - https://phabricator.wikimedia.org/T356724 (Raymond_Ndibe) In progress→Resolved
[17:09:10] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (Raymond_Ndibe) In progress→Resolved
[17:10:12] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=0)
[17:17:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 44 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[17:21:42] <wm-bot2>	 !log taavi@runko admin START - Cookbook wmcs.openstack.cloudnet.reboot_node
[17:21:43] <wikibugs>	 Tool-gitlab-account-approval, Patch-For-Review, User-bd808: "LDAPInvalidFilterError: malformed filter" error checking user https://gitlab.wikimedia.org/haak - https://phabricator.wikimedia.org/T357328 (CodeReviewBot) bd808 opened https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/...
[17:21:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:22:09] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate phetools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319965 (Soda) I'm looking into migrating some of the usable aspects (statistics and match + split) of phetools into separate standalone tools. This might take a while however,...
[17:22:57] <wikibugs>	 Tool-gitlab-account-approval, Patch-For-Review, User-bd808: "LDAPInvalidFilterError: malformed filter" error checking user https://gitlab.wikimedia.org/haak - https://phabricator.wikimedia.org/T357328 (CodeReviewBot) bd808 merged https://gitlab.wikimedia.org/toolforge-repos/gitlab-account-approval/-/...
[17:23:02] <wm-bot2>	 !log taavi@runko admin END (FAIL) - Cookbook wmcs.openstack.cloudnet.reboot_node (exit_code=99)
[17:23:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:25:08] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudnet.reboot_node
[17:28:00] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudnet.reboot_node (exit_code=0)
[17:30:08] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudnet.reboot_node
[17:33:24] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudnet.reboot_node (exit_code=0)
[17:37:03] <wikibugs>	 Tool-gitlab-account-approval, User-bd808: "LDAPInvalidFilterError: malformed filter" error checking user https://gitlab.wikimedia.org/haak - https://phabricator.wikimedia.org/T357328 (bd808) In progress→Resolved
[17:40:00] <jinxer-wm>	 (NovafullstackSustainedFailures) firing: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[17:40:05] <wikibugs>	 cloud-services-team: NovafullstackSustainedFailures  The automated tests were unable to create, provision and decommission a VM in the last 5h - https://phabricator.wikimedia.org/T357335 (phaultfinder)
[17:58:46] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Epic: Investigate: How to make the GUC query performant - https://phabricator.wikimedia.org/T355672 (Tchanders) Thanks @MusikAnimal, this is really helpful.  Noting down some thoughts following a conversati...
[18:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[18:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:24:45] <wikibugs>	 Toolforge: rm'ing a specific file on NFS hangs on (dev|login).toolforge.org - https://phabricator.wikimedia.org/T357340 (Count_Count)
[18:29:01] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) resolved: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3661 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[18:29:40] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, Epic: Investigate: How to make the GUC query performant - https://phabricator.wikimedia.org/T355672 (MusikAnimal) I didn't elaborate on IP ranges, but doing that is pretty fast as-is, by simply querying `ip...
[18:33:20] <wikibugs>	 Tool-Pageviews: Massviews is creating URLs which cannot be used - https://phabricator.wikimedia.org/T357087 (MusikAnimal) p:Triage→High
[18:33:49] <wikibugs>	 Tool-Pageviews: Massviews is creating URLs which cannot be used - https://phabricator.wikimedia.org/T357087 (MusikAnimal) >>! In T357087#9531455, @Vahurzpu wrote: > I'm having trouble setting up a dev environment on my local machine, but I'm fairly confident that the problem here is with https://github.com/M...
[18:41:46] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate women-in-red from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320183 (Ragesoss) @dcaro I just disabled the cron.
[18:42:00] <wikibugs>	 Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888 (MusikAnimal) >>! In T306888#9531153, @eranroz wrote: > Beside copyright bot /copypatrol /plagia bot - all jobs of the bot were moved to new toolforge-jobs . > I think we can...
[18:46:17] <wikibugs>	 Toolforge, cloud-services-team: [tools.meta] can't delete file inside cache/wikimedia-wikis.dat - https://phabricator.wikimedia.org/T357098 (bd808)
[18:53:52] <wikibugs>	 Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341 (fnegri)
[18:54:00] <wikibugs>	 Toolforge: rm'ing a specific file on NFS hangs on (dev|login).toolforge.org - https://phabricator.wikimedia.org/T357340 (Count_Count)
[18:54:43] <wikibugs>	 Data-Services, cloud-services-team: ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947 (fnegri)
[18:54:46] <wikibugs>	 Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341 (fnegri)
[18:55:38] <wikibugs>	 Toolforge: rm'ing a specific file on NFS hangs on (dev|login).toolforge.org - https://phabricator.wikimedia.org/T357340 (bd808) ` $ ssh root@tools-nfs-2.tools.eqiad1.wikimedia.cloud $ cd /srv/tools/project/xlinks $ file xlinks xlinks: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linke...
[18:59:45] <wikibugs>	 Toolforge: rm'ing a specific file on NFS hangs on (dev|login).toolforge.org - https://phabricator.wikimedia.org/T357340 (bd808) Things seem to hang in the same way as {T357098}: `lang=shell-session root@tools-nfs-2:/srv/tools/project/xlinks# rm xlinks & [1] 3894371 root@tools-nfs-2:/srv/tools/project/xlinks#...
[19:01:40] <wikibugs>	 Data-Services, cloud-services-team (FY2023/2024-Q3-Q4), Goal: [toolsdb] test creating a new replica host - https://phabricator.wikimedia.org/T344717 (fnegri) While taking a Cinder snapshot as MariaDB is running //seems// to work (MariaDB will fix corrupted tables when restoring the snapshot), the [of...
[19:09:58] <wikibugs>	 Data-Services, cloud-services-team (FY2023/2024-Q3-Q4): [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341 (fnegri) p:Triage→Medium
[19:25:01] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Maintenance, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-02-12 - https://phabricator.wikimedia.org/T357264 (fnegri) In progress→Resolved Replication lag is back to zero: {F41...
[19:28:20] <wikibugs>	 Toolforge: Cannot delete directory from incolabot project on Toolforge - https://phabricator.wikimedia.org/T357342 (Incola)
[19:50:18] <wikibugs>	 Tool-Global-user-contributions, Stewards-and-global-tools, Temporary accounts, XTools, and 2 others: [Design] Prototype and user testing plan - https://phabricator.wikimedia.org/T356099 (KColeman-WMF)
[20:22:42] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 33 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:27:42] <jinxer-wm>	 (CloudVPSDesignateLeaks) resolved: (2) Detected 33 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:01:49] <wmcs-alerts>	 (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed
[21:01:49] <wmcs-alerts>	 (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:28:56] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:38:56] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:40:01] <jinxer-wm>	 (NovafullstackSustainedFailures) firing: Novafullstack tests have been failing for more than 5 hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[21:43:56] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[21:48:56] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:03:56] <jinxer-wm>	 (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown  - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:08:56] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:11:56] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:21:56] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:26:56] <jinxer-wm>	 (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[22:31:56] <jinxer-wm>	 (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown