[00:46:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [01:48:26] 10Grid-Engine-to-K8s-Migration: Migrate wmds-archive from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320179 (10Tgr) There is apparently no `lighttpd-plain` type in Kubernetes. I guess I just pick a programming language at random? [01:51:19] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [01:52:25] 10Grid-Engine-to-K8s-Migration: Migrate wiki-irc from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320148 (10Tgr) https://grid-deprecation.toolforge.org/t/wiki-irc doesn't show the tool as using GridEngine (which it shouldn't; I was planning to use it for T253491 but never ac... [01:56:19] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [02:00:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [02:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [02:32:27] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [03:00:51] 10Grid-Engine-to-K8s-Migration: Migrate derivative from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319671 (10Tgr) Restarted the tool in k8s. Will check in a few days if it's still uploading files ([[https://quarry.wmcloud.org/query/78603|query]]). [03:29:42] (03PS1) 10Andrew Bogott: WMF Hacks create-instance workflow: add a missing comma [openstack/horizon/horizon] (2023.1) - 10https://gerrit.wikimedia.org/r/981705 (https://phabricator.wikimedia.org/T326818) [03:30:05] (03CR) 10Andrew Bogott: [V: 03+2 C: 03+2] WMF Hacks create-instance workflow: add a missing comma [openstack/horizon/horizon] (2023.1) - 10https://gerrit.wikimedia.org/r/981705 (https://phabricator.wikimedia.org/T326818) (owner: 10Andrew Bogott) [03:46:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [05:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [05:00:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [06:32:42] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [06:46:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [08:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [08:00:37] (CephSlowOps) firing: Ceph cluster in eqiad has 107 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [08:00:48] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (10phaultfinder) [08:04:37] (CephClusterInWarning) firing: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [08:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [08:09:37] (CephClusterInWarning) resolved: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - 
https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [08:10:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 35 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [09:01:52] 10Grid-Engine-to-K8s-Migration: Migrate isa from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319818 (10Sebastian_Berlin-WMSE) What is running on GridEngine? https://grid-deprecation.toolforge.org/t/isa shows nothing. `qstat` prints nothing. The webservice uses Kubernetes. [09:04:38] (03CR) 10JMeybohm: [V: 03+2 C: 03+2] kubernetes: Remove cergen certs from kubernetes secrets [labs/private] - 10https://gerrit.wikimedia.org/r/980891 (https://phabricator.wikimedia.org/T300033) (owner: 10JMeybohm) [09:46:03] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [10:00:04] 10Tool-Global-user-contributions, 10Stewards-and-global-tools, 10Temporary accounts, 10XTools, and 2 others: [Design] Create user flows for different GUC scenarios - https://phabricator.wikimedia.org/T349902 (10KColeman-WMF) [10:01:31] 10Tool-Global-user-contributions, 10Stewards-and-global-tools, 10Temporary accounts, 10XTools, and 2 others: [Design] Comparative review - https://phabricator.wikimedia.org/T349907 (10KColeman-WMF) [10:08:24] 10Tool-Global-user-contributions, 10Stewards-and-global-tools, 10Temporary accounts, 10XTools, and 2 others: [Design] Comparative review - https://phabricator.wikimedia.org/T349907 (10KColeman-WMF) [10:32:42] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [10:34:18] 10Toolforge (Toolforge iteration 02): [envvars-cli] move pytest from tox to pre-commit - https://phabricator.wikimedia.org/T351476 (10dcaro) I think we might want to keep pytest outside of pre-commit, we kept it out for the reasons mentioned here: https://github.com/pre-commit/pre-commit-hooks/issues/291 There'... [10:34:29] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (10dcaro) 05Open→03In progress [10:34:34] 10Toolforge (Toolforge iteration 02): [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092 (10dcaro) [11:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [11:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [11:25:37] (CephSlowOps) firing: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [11:25:43] 10cloud-services-team: CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T352570 (10phaultfinder) [11:30:37] (CephSlowOps) resolved: Ceph cluster in eqiad 
has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [11:46:12] (03PS1) 10Ladsgroup: Add Github:wmde/new-lexeme-special-page [labs/codesearch] - 10https://gerrit.wikimedia.org/r/982049 (https://phabricator.wikimedia.org/T351938) [11:53:13] 10Toolforge (Quota-requests): Request increased quota for cewbot, toc, signature-checker, mgp-cewbot Toolforge tool - https://phabricator.wikimedia.org/T353104 (10Kanashimi) [12:47:17] (03CR) 10Hnowlan: [C: 03+1] restbase: add missing keys & certs, remove obsolete [labs/private] - 10https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468) (owner: 10Eevans) [12:51:04] (InstanceDown) firing: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [12:57:33] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:02:33] (SystemdUnitDown) firing: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:07:33] (SystemdUnitDown) resolved: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [13:18:13] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolf... [13:22:37] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-admission (T352774) [13:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:22:42] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [13:23:08] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-admission (T352774) [13:23:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:23:53] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console [13:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:25:44] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) [13:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:25:49] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console [13:25:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:26:21] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) [13:26:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:28:27] !log tools dcaro@urcuchillay 
START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-admission (T352774) [13:28:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:28:31] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [13:28:59] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-admission (T352774) [13:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:31:04] (InstanceDown) resolved: Project toolsbeta instance toolsbeta-bastion-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown [13:31:37] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolf... [13:35:22] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolf... 
[13:35:31] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774) [13:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:35:35] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [13:36:02] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774) [13:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [13:41:40] (03CR) 10Michael Große: [C: 03+1] "🙏" [labs/codesearch] - 10https://gerrit.wikimedia.org/r/982049 (https://phabricator.wikimedia.org/T351938) (owner: 10Ladsgroup) [13:42:56] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774) [13:43:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:43:01] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [13:43:29] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774) [13:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:53:03] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/tool... [13:59:33] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [14:03:38] (03CR) 10D3r1ck01: [C: 03+2] Add Github:wmde/new-lexeme-special-page [labs/codesearch] - 10https://gerrit.wikimedia.org/r/982049 (https://phabricator.wikimedia.org/T351938) (owner: 10Ladsgroup) [14:04:33] (SystemdUnitDown) resolved: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [14:04:35] (03Merged) 10jenkins-bot: Add Github:wmde/new-lexeme-special-page [labs/codesearch] - 10https://gerrit.wikimedia.org/r/982049 (https://phabricator.wikimedia.org/T351938) (owner: 10Ladsgroup) [14:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [14:08:22] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10dcaro) [14:08:37] (CephSlowOps) firing: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - 
https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [14:08:49] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10dcaro) [14:13:37] (CephSlowOps) resolved: Ceph cluster in eqiad has 7 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps [14:32:43] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [14:42:49] 10Cloud Services Proposals, 10Toolforge Build Service, 10cloud-services-team, 10Cloud-Services-Origin-Team, and 3 others: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs - https://phabricator.wikimedia.org/T194332 (10dcaro) [14:42:55] 10Cloud Services Proposals, 10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Cloud-Services-Origin-Team, and 2 others: [toolforge-envvars.api,toolforge-build.api] Support flagging environment variables to be injected at build time - https://phabricator.wikimedia.org/T338142 (10dcaro) a:03dca... [14:43:07] 10Cloud Services Proposals, 10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Cloud-Services-Origin-Team, and 2 others: [toolforge-envvars.api,toolforge-build.api] Support flagging environment variables to be injected at build time - https://phabricator.wikimedia.org/T338142 (10dcaro) 05Open... 
[14:45:49] 10Cloud Services Proposals, 10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Cloud-Services-Origin-Team, and 2 others: [toolforge-envvars.api,toolforge-build.api] Support using custom environment variables at build time - https://phabricator.wikimedia.org/T338142 (10dcaro) [14:53:12] 10Tools, 10WMDE-TechWish-Maintenance, 10WMDE-TechWish-Sprint-2023-11-22, 10WMDE-TechWish-Sprint-2023-12-06: Check technischewuensche tool code and publish in a public repo - https://phabricator.wikimedia.org/T350352 (10WMDE-Fisch) [15:02:34] (SystemdUnitDown) firing: The service unit wikitech_run_jobs.service is in failed status on host cloudweb1004. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudweb1004 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [15:07:34] (SystemdUnitDown) resolved: (2) The service unit wikitech_run_jobs.service is in failed status on host cloudweb1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [15:09:42] (03CR) 10MVernon: [C: 04-1] "One thing that looks a bit strange to me here, but perhaps I misunderstand..." [labs/private] - 10https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468) (owner: 10Eevans) [15:10:23] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10jsn.sherman) 05Open→03In progress a:03jsn.sherman >>! In T352902#9393485, @jhathaway wrote: > @herron & @jsn.... 
[15:22:56] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774) [15:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [15:23:01] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [15:23:12] !log toolsbeta dcaro@urcuchillay END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api (T352774) [15:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [15:24:32] !log toolsbeta dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774) [15:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [15:25:01] !log toolsbeta dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774) [15:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL [15:36:09] !log tools dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api (T352774) [15:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [15:36:14] T352774: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 [15:36:41] !log tools dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api (T352774) [15:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [15:37:33] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1-Q2), 10Goal: Support 'unmanaged' projects in cloud-vps - https://phabricator.wikimedia.org/T326818 (10Andrew) a:03Andrew [15:50:11] 10Cloud Services Proposals, 
10Toolforge (Toolforge iteration 02), 10cloud-services-team, 10Cloud-Services-Origin-Team, and 3 others: [toolforge-envvars.api,toolforge-build.api] Support using custom environment variables at build time - https://phabricator.wikimedia.org/T338142 (10CodeReviewBot) dcaro opene... [15:53:55] 10Cloud-VPS, 10SRE, 10observability, 10Patch-For-Review, and 2 others: ossl rsyslog errors post-migration - https://phabricator.wikimedia.org/T351710 (10fgiunchedi) [16:06:19] (03CR) 10Eevans: restbase: add missing keys & certs, remove obsolete (031 comment) [labs/private] - 10https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468) (owner: 10Eevans) [16:06:25] (03PS4) 10Eevans: restbase: add missing keys & certs, remove obsolete [labs/private] - 10https://gerrit.wikimedia.org/r/981601 (https://phabricator.wikimedia.org/T352468) [16:24:37] 10cloud-services-team: [wmf-sre-laptop] fetch public keys for Cloud bastions - https://phabricator.wikimedia.org/T329322 (10Volans) @fnegri This will require WMCS to publish the fingerprint in some official place (not wikitech) managed by WMCS similar to what we do in [[ https://config-master.wikimedia.org/known... [16:26:28] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-dcaro: [builds-builder] Investigate how to enable mono/dotnet/c# and implement the best one to unblock us to migrate tools - https://phabricator.wikimedia.org/T352774 (10dcaro) To be able to specify the project file to build though we have to p... 
[17:00:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [17:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [17:05:29] 10Cloud-VPS, 10cloud-services-team (FY2023/2024-Q1-Q2), 10DC-Ops, 10SRE, 10ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (10Jclark-ctr) @Andrew Dell is requesting smartctl output showing what drives errors are coming from if you can se... [17:05:52] 10Toolforge, 10SecTeam-Processed, 10Security, 10User-bd808, 10Vuln-Misconfiguration: Should Toolforge allow internationalized domain names / Punycode in tool names, or have some protection against homograph attacks? - https://phabricator.wikimedia.org/T353100 (10sbassett) [17:22:28] (OpenstackAPIResponse) resolved: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [17:28:46] 10cloud-services-team (FY2023/2024-Q1-Q2), 10Cloud-Services-Origin-Alert, 10Cloud-Services-Worktype-Unplanned, 10User-dcaro: [toolsdb] MariaDB process is killed by OOM killer (December 2023) - https://phabricator.wikimedia.org/T353093 (10fnegri) After discussing with @dcaro I started a tmux session in tool... 
[17:34:57] 10Striker, 10GitLab: Maintainer 'Ross Mallett' unable to create new GitLab repositories connected to the 'milhistbot' tool using Striker - https://phabricator.wikimedia.org/T353176 (10bd808) [17:36:35] 10Striker, 10GitLab (Integrations), 10User-bd808: Maintainer 'Ross Mallett' unable to create new GitLab repositories connected to the 'milhistbot' tool using Striker - https://phabricator.wikimedia.org/T353176 (10bd808) 05Open→03In progress p:05Triage→03Medium a:03bd808 [17:52:08] 10Cloud-VPS, 10cloud-services-team: Check Cloud VPS running kernels for ext4 data corruption bug - https://phabricator.wikimedia.org/T353178 (10taavi) [18:09:54] 10Striker, 10GitLab (Integrations), 10User-bd808: Maintainer 'Ross Mallett' unable to create new GitLab repositories connected to the 'milhistbot' tool using Striker - https://phabricator.wikimedia.org/T353176 (10bd808) This is basically the lookup algorithm that is failing when Striker tries to create a new... [18:12:34] 10Striker, 10GitLab (Integrations), 10User-bd808: Maintainer 'Ross Mallett' unable to create new GitLab repositories connected to the 'milhistbot' tool using Striker - https://phabricator.wikimedia.org/T353176 (10bd808) @Hawkeye7 You should be able to fix this problem specifically for your https://gitlab.wik... [18:24:11] 10PAWS: Upgrade paws to k8s 1.24 - https://phabricator.wikimedia.org/T353183 (10rook) [18:26:05] 10PAWS: Test PAWS on k8s 1.25 - https://phabricator.wikimedia.org/T326985 (10rook) This does not work on Antelope. Though the git log and https://docs.openstack.org/magnum/latest/user/index.html#supported-versions suggests it will work on bobcat. [18:34:59] 10Striker, 10GitLab (Integrations), 10User-bd808: Maintainer 'Ross Mallett' unable to create new GitLab repositories connected to the 'milhistbot' tool using Striker - https://phabricator.wikimedia.org/T353176 (10bd808) Past-bd808 knew this was a potential problem, but present bd808 forgot: >>! In T343485#9... 
[18:37:26] (03PS1) 10AntiCompositeNumber: Ignore canary events in SULWatcher [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/982147 [18:37:58] 10Cloud-VPS, 10cloud-services-team: Check Cloud VPS running kernels for ext4 data corruption bug - https://phabricator.wikimedia.org/T353178 (10Andrew) There's only one VM running the linux-image-6.1.0-14 kernel, mint.language.eqiad1.wikimedia.cloud. I'll open a subtask for that. I've disabled the affected b... [18:43:05] 10Striker, 10GitLab (Integrations), 10User-bd808: GitLab users with only provider=cas3 identies as not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (10bd808) [18:43:12] 10Striker, 10GitLab (Integrations), 10User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (10bd808) [18:44:52] 10Cloud-VPS, 10cloud-services-team: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10Andrew) [18:51:32] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10jsn.sherman) It's looking pretty good to me on staging: ` Original Message Message ID <170232041148.496.15449658617... [18:52:07] 10Cloud-VPS, 10cloud-services-team: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10Andrew) [19:02:15] 10Cloud-VPS, 10cloud-services-team: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud - https://phabricator.wikimedia.org/T353185 (10Andrew) Actually, on second thought... your VM may just update itself via unattended upgrades. So the first thing to try is just reboot it, run 'uname -r'... 
[19:07:12] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10jsn.sherman) PR here: https://github.com/WikipediaLibrary/TWLight/pull/1237 [19:13:51] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-dep... [19:14:46] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10taavi) bounces@wmflabs.org does not seem to exist in the aliases file or route anywhere else. `lang=shell-session t... [19:18:43] 10Toolforge Jobs framework, 10Patch-For-Review: toolforge-jobs --wait will only wait 5 minutes - https://phabricator.wikimedia.org/T352945 (10CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/5 Allow specifying a timeout for --wait [19:19:20] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) raymond-ndibe closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/148 envvars-ap... [19:22:28] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-dep... 
[19:27:19] 10Toolforge Jobs framework, 10Patch-For-Review: toolforge jobs restart sometimes times out - https://phabricator.wikimedia.org/T352874 (10CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/49 job: Specify a lower termination grace period [19:30:11] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) raymond-ndibe closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/149 envvars-ap... [19:35:52] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api [19:36:06] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api [19:42:01] 10Cloud-VPS, 10Moderator-Tools-Team (Kanban): enable lists.wikimedia.org or wikimedia.org email addresses to receive dmarc reports for *.wmflabs.org - https://phabricator.wikimedia.org/T352902 (10jsn.sherman) >>! In T352902#9397617, @taavi wrote: > bounces@wmflabs.org does not seem to exist in the aliases file... [19:45:50] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-dep... [19:53:38] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) raymond-ndibe closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/151 envvars-ap... 
[19:56:00] 10Toolforge (Toolforge iteration 02), 10Patch-For-Review, 10User-Raymond_Ndibe: [apis] nginx fails to reload on config change - https://phabricator.wikimedia.org/T350928 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-dep... [20:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [20:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [20:31:17] 10cloud-services-team: [wmf-sre-laptop] fetch public keys for Cloud bastions - https://phabricator.wikimedia.org/T329322 (10bd808) >>! In T329322#9396868, @Volans wrote: > in some official place (not wikitech) @volans, could you elaborate on the specific security or technical limitation that requires this restri... [20:40:02] 10PAWS: Upgrade paws to k8s 1.24 - https://phabricator.wikimedia.org/T353183 (10rook) Perhaps not. 1.24 is not deploying. https://docs.openstack.org/magnum/latest/user/index.html#supported-versions suggests we will be able to upgrade on bobcat. [20:40:18] 10PAWS: Upgrade paws to k8s 1.24 - https://phabricator.wikimedia.org/T353183 (10rook) 05Open→03Resolved [21:05:42] 10Grid-Engine-to-K8s-Migration: Migrate afdstats from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319486 (10Ahecht) I have submitted a [[ https://github.com/enterprisey/afdstats/pull/27 | pull request ]] to bring to code up to Python 3.9 which should allow it to run on kuber... 
[21:37:17] 10Grid-Engine-to-K8s-Migration: Migrate dykmoverbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319711 (10Wugapodes) 05Open→03Resolved a:03Wugapodes This should be running on Kubernetes now, let me know if there are still issues [21:44:28] 10Grid-Engine-to-K8s-Migration: Migrate ganreportbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319763 (10Wugapodes) 05Open→03Resolved a:03Wugapodes This should be running on Kubernetes now, let me know if there are still issues [21:46:13] 10Grid-Engine-to-K8s-Migration: Migrate wugbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320190 (10Wugapodes) 05Open→03Resolved a:03Wugapodes The only task run by this tool is defunct, so I've removed the crontab entry that generated the jsub calls. [21:49:42] 10Grid-Engine-to-K8s-Migration: Migrate deltaquad-bots from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319668 (10AmandaNP) @nskaggs @komla I remember an early on notification about this, but I was not tagged/subscribed here. Only found out this past weekend about this by a... 
[21:49:47] 10Grid-Engine-to-K8s-Migration: Migrate deltaquad-bots from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319668 (10AmandaNP) a:03AmandaNP [22:01:37] 10Grid-Engine-to-K8s-Migration: Migrate yfdyh-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320197 (10JJMC89) 05Stalled→03Open [22:02:02] 10Grid-Engine-to-K8s-Migration: Migrate multichill from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319912 (10JJMC89) 05Stalled→03Open [22:06:33] 10cloud-services-team: [wmf-sre-laptop] fetch public keys for Cloud bastions - https://phabricator.wikimedia.org/T329322 (10Volans) @bd808 ideally a place that: * is automatically updated directly from the source of truth of the bastions (their Puppetmaster?) * is easily downloadable already in the known hosts... [23:05:04] (TfInfraTestDestroyFailed) firing: Terraform failed to destroy the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestDestroyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestDestroyFailed [23:05:04] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed [23:31:03] 10Striker, 10GitLab (Integrations), 10User-bd808: GitLab users with only provider=cas3 identies are not found when Striker attempts to create GitLab repostories - https://phabricator.wikimedia.org/T353176 (10bd808) `lang=shell-session $ export GITLAB_HOST=gitlab.wikimedia.org $ glab api users --paginate > pa...