[00:31:56] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[00:41:56] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:11:55] FIRING: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[01:41:55] RESOLVED: ToolforgeKubernetesCapacity: Kubernetes cluster k8s.tools.eqiad1.wikimedia.cloud:6443 in risk of running out of cpu - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesCapacity - https://grafana.wmcloud.org/d/8GiwHDL4k/kubernetes-cluster-overview?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesCapacity
[04:16:56] (update) kevinpayravi: Adding support for requesting files from latest release [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/13
[04:18:21] (update) kevinpayravi: Adding support for requesting files from latest release [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/13
[04:18:53] (update) kevinpayravi: Adding support for requesting files from latest release [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/13
[06:17:10] (update) raymond-ndibe: [deploy_task, tool_handlers] queue deployments to allow creation of multiple deployments at once [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/131 (https://phabricator.wikimedia.org/T402568)
[06:17:15] (open) raymond-ndibe: [deploy_task, tool_handlers] queue deployments to allow creation of multiple deployments at once [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/131 (https://phabricator.wikimedia.org/T402568)
[07:16:43] Cloud Services Proposals, cloud-services-team, Toolforge: DRAFT Decision request - Improving lima-kilo developer experience - https://phabricator.wikimedia.org/T403051#11175200 (fgiunchedi) +1 to build process in CI as a Pro, it will be pretty handy to be able to get gitlab-ci to run lima-kilo as a job
[07:18:37] cloud-services-team, Tool-openstack-browser: openstack-browser: Display Octavia load balancers - https://phabricator.wikimedia.org/T404419 (taavi) NEW p:Triage→Medium
[07:23:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:28:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[07:32:22] (CR) Brouberol: "check experimental" [labs/private] - https://gerrit.wikimedia.org/r/1187463 (owner: Brouberol)
[08:02:41] cloud-services-team, Cloud-VPS: tofu-infra: opentofu-created flavors may be disabled by default - https://phabricator.wikimedia.org/T391252#11175288 (taavi)
[08:27:37] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[08:29:54] (merge) taavi: tools: dns: Migrate dev.toolforge.org to new Trixie bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/77 (https://phabricator.wikimedia.org/T392510)
[08:29:56] (update) taavi: tools: dns: Migration login.toolforge.org to new Trixie bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/78 (https://phabricator.wikimedia.org/T392510)
[08:42:35] (update) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[08:46:17] (merge) taavi: tools: dns: Migration login.toolforge.org to new Trixie bastion [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/78 (https://phabricator.wikimedia.org/T392510)
[08:46:18] (update) taavi: tools: Drop floating IPs for Bookworm bastions [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/79 (https://phabricator.wikimedia.org/T392510)
[08:49:35] (merge) samwilson: Add GitLab CI to test extension installation [toolforge-repos/wikispore-config] - https://gitlab.wikimedia.org/toolforge-repos/wikispore-config/-/merge_requests/2
[10:04:44] (CR) Ladsgroup: [C:+1] Add a dummy secret file containing the wikiadmin password [labs/private] - https://gerrit.wikimedia.org/r/1187463 (owner: Brouberol)
[10:17:11] (CR) Brouberol: [C:+2] Add a dummy secret file containing the wikiadmin password [labs/private] - https://gerrit.wikimedia.org/r/1187463 (owner: Brouberol)
[10:17:14] (CR) Brouberol: [V:+2 C:+2] Add a dummy secret file containing the wikiadmin password [labs/private] - https://gerrit.wikimedia.org/r/1187463 (owner: Brouberol)
[10:35:03] (PS1) Majavah: inventory: Remove Bookworm based bastions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1187762 (https://phabricator.wikimedia.org/T392510)
[10:40:11] (update) raymond-ndibe: [deploy_task, tool_handlers] queue deployments to allow creation of multiple deployments at once [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/131 (https://phabricator.wikimedia.org/T402568)
[11:39:27] (update) raymond-ndibe: [deploy_task, tool_handlers] queue deployments to allow creation of multiple deployments at once [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/131 (https://phabricator.wikimedia.org/T402568)
[11:40:12] (PS1) NkwadaNora: properly align the shared production text, that was out of position [labs/tools/WdTmCollab] - https://gerrit.wikimedia.org/r/1187775
[12:35:41] (open) taavi: build: Do not install full Keystone server [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/8
[12:36:09] (update) taavi: FIx build issues [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/8
[12:45:17] cloud-services-team, Cloud-VPS, Patch-For-Review, Security: Move cloud-wide root keys to the main puppet repo - https://phabricator.wikimedia.org/T317362#11175947 (fnegri) Thanks @fgiunchedi for the patch! A few docs will need to be updated after it's merged (there might be more, these are the o...
[12:46:47] Cloud-VPS (Quota-requests): Increase iops for recommendation-api project - https://phabricator.wikimedia.org/T404254#11175952 (fnegri)
[12:49:04] Grid-Engine-to-K8s-Migration, Tool-wikitanvirbot: Migrate wikitanvirbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320167#11175957 (Wikitanvir) Open→Resolved a:Wikitanvir .kube/config was manually restored by @dcaro per IRC request.
[12:58:40] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176011 (fnegri) a:fnegri > It also doesn't seem to be the same as T379550 I think it is one of the possible scenarios of {T3...
[12:58:51] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176013 (fnegri)
[12:58:53] cloud-services-team, Cloud-VPS: openstack: keystone may be failing to add users to the bastion project - https://phabricator.wikimedia.org/T379550#11176014 (fnegri)
[13:03:47] (PS1) Majavah: profile: Drop support for ssh-dss keys [labs/striker] - https://gerrit.wikimedia.org/r/1187798
[13:03:55] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176066 (fnegri) Hmm the user seems to be already in the `project-bastion` LDAP group: https://ldap.toolforge.org/user/osleger Ma...
[13:04:12] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176067 (fnegri) Open→In progress p:Triage→High
[13:10:42] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176094 (Urbanecm_WMF) >>! In T404382#11176066, @fnegri wrote: > Maybe it just took a long time? I remember the current process be...
[13:13:50] cloud-services-team, Cloud-VPS: Newly-added member of wikitextexp is not in project-bastion LDAP group, but is in bastion project - https://phabricator.wikimedia.org/T404382#11176115 (fnegri) In progress→Resolved > something run again and caused a re-check? Yes that's very likely. I will ma...
[13:14:16] cloud-services-team, Cloud-VPS: Write a script to fully re-export ENC data to Git - https://phabricator.wikimedia.org/T404425#11176119 (fnegri) p:Triage→Medium
[13:14:23] cloud-services-team (FY2025/26-Q1): WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176120 (PatEhlert) Thank you for your quick responses. The user agent for all requests is axios/0.21.1. Unfortunately I can't provide header information. We...
[13:14:45] cloud-services-team, Cloud-VPS: wmf-auto-restart can get wedged on nfs4 mounts even when the filesystem is excluded - https://phabricator.wikimedia.org/T404322#11176121 (fnegri) p:Triage→Medium
[13:31:47] cloud-services-team: JobUnavailable Reduced availability for job openstack in cloud@codfw - https://phabricator.wikimedia.org/T404109#11176172 (fnegri) Open→Resolved a:fnegri This alert fired a couple times on Sep 9 and then resolved, I'm not sure what was the cause but it doesn't seem concer...
[13:34:02] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: Support proxy backends using IPv6 - https://phabricator.wikimedia.org/T404302#11176181 (fnegri) p:Triage→Low
[13:37:48] cloud-services-team, Toolforge: Mount /etc/openstack/clouds.yaml in mount-enabled containers - https://phabricator.wikimedia.org/T404438 (taavi) NEW
[13:38:05] cloud-services-team, Tool-openstack-browser: openstack-browser: Display Octavia load balancers - https://phabricator.wikimedia.org/T404419#11176206 (taavi)
[13:38:06] cloud-services-team, Toolforge: Mount /etc/openstack/clouds.yaml in mount-enabled containers - https://phabricator.wikimedia.org/T404438#11176207 (taavi)
[13:38:16] Cloud-VPS (Quota-requests): Increase iops for recommendation-api project - https://phabricator.wikimedia.org/T404254#11176215 (Andrew) It sounds like you don't have a specific performance target but you'd just like things to be faster, is that right? Or will 500mb/s actually get you to a point where IO is n...
[13:40:23] (open) taavi: values: Mount /etc/openstack/clouds.yaml [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/36 (https://phabricator.wikimedia.org/T404438)
[13:40:26] (update) taavi: values: Mount /etc/openstack/clouds.yaml [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/36 (https://phabricator.wikimedia.org/T404438)
[13:42:00] cloud-services-team, Toolforge, Patch-For-Review: Mount /etc/openstack/clouds.yaml in mount-enabled containers - https://phabricator.wikimedia.org/T404438#11176229 (taavi) p:Triage→Medium
[13:42:05] (update) taavi: values: Mount /etc/openstack/clouds.yaml [repos/cloud/toolforge/volume-admission] - https://gitlab.wikimedia.org/repos/cloud/toolforge/volume-admission/-/merge_requests/36 (https://phabricator.wikimedia.org/T404438)
[13:57:46] (approved) andrew: FIx build issues [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/8 (owner: taavi)
[13:58:52] (merge) taavi: FIx build issues [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/8
[14:02:11] (open) taavi: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9
[14:02:15] (update) taavi: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9
[14:02:50] (update) taavi: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9
[14:05:25] (update) taavi: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9
[14:12:58] (approved) andrew: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9 (owner: taavi)
[14:14:13] (merge) taavi: Always check for IPv6 addresses [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/9
[14:27:59] Cloud-VPS (Quota-requests): Increase iops for recommendation-api project - https://phabricator.wikimedia.org/T404254#11176380 (fnegri) We're discussing this in the WMCS team. The current (default) limits are: `iops_sec='5000', total_bytes_sec='200000000', write_iops_sec='500'`. I checked in [Grafana](https:...
[14:51:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:56:56] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:58:36] cloud-services-team (FY2025/26-Q1): WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176527 (bd808) >>! In T404347#11173361, @SomeRandomDeveloper wrote: > Specifically https://github.com/multichill/toollabs/blob/37943cad62cefd7dc489f6b56c70c96...
[15:03:04] cloud-services-team (FY2025/26-Q1): WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176565 (fnegri) Thanks, that's helpful! I tried searching in our NAT logs and I found a specific internal IP doing lots of connections to the `api.europeana....
[15:03:26] cloud-services-team (FY2025/26-Q1), Wikidocumentaries: WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176566 (fnegri)
[15:13:54] cloud-services-team (FY2025/26-Q1), Wikidocumentaries: WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176607 (bd808) >>! In T404347#11176564, @fnegri wrote: > This IP points to `hupu2.wikidocumentaries.eqiad1.wikimedia.cloud`, from the #...
[15:26:58] cloud-services-team (FY2025/26-Q1), Wikidocumentaries: wikidocumentaries on WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11176624 (Aklapper)
[15:50:02] (open) taavi: Remove non-useful defaults [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/10
[15:50:07] (update) taavi: Remove non-useful defaults [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/10
[15:51:16] (update) taavi: Remove non-useful defaults [repos/cloud/cloud-vps/nova_fullstack_test] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/nova_fullstack_test/-/merge_requests/10
[16:21:52] (CR) BryanDavis: [C:+1] "I'm not sure these changes will prevent someone from uploading a DSA public key if they actually still have one, but it does remove the th" [labs/striker] - https://gerrit.wikimedia.org/r/1187798 (owner: Majavah)
[16:23:58] (CR) Majavah: [C:+2] profile: Drop support for ssh-dss keys [labs/striker] - https://gerrit.wikimedia.org/r/1187798 (owner: Majavah)
[16:25:24] (Merged) jenkins-bot: profile: Drop support for ssh-dss keys [labs/striker] - https://gerrit.wikimedia.org/r/1187798 (owner: Majavah)
[16:52:48] (PS1) Majavah: alertmanager: Ignore resolved alerts for Phab integration [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1187863
[17:02:55] (CR) BryanDavis: [C:+1] "The only thing more awesome would be if it could find and resolve the original ticket, but that seems pretty specific to our implementatio" [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1187863 (owner: Majavah)
[17:03:12] (CR) Majavah: [C:+2] alertmanager: Ignore resolved alerts for Phab integration [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1187863 (owner: Majavah)
[17:03:53] (Merged) jenkins-bot: alertmanager: Ignore resolved alerts for Phab integration [cloud/metricsinfra/prometheus-configurator] - https://gerrit.wikimedia.org/r/1187863 (owner: Majavah)
[17:21:56] FIRING: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:26:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:42:06] cloud-services-team, Toolforge: Request for deletion of my tool 'firstedit' from toolforge - https://phabricator.wikimedia.org/T404465 (Gnoeee) NEW
[17:42:41] cloud-services-team, Cloud-VPS, Epic: Support IPv6-only VMs - https://phabricator.wikimedia.org/T404466 (taavi) NEW p:Triage→Low
[17:44:18] cloud-services-team, Cloud-VPS, Epic: Support IPv6-only VMs - https://phabricator.wikimedia.org/T404466#11177191 (taavi)
[17:44:22] cloud-services-team, Cloud-VPS, IPv6: Add IPv6 DNS recursor to v6-capable hosts - https://phabricator.wikimedia.org/T397822#11177193 (taavi)
[17:44:25] cloud-services-team, Cloud-VPS, IPv6, Patch-For-Review: Support proxy backends using IPv6 - https://phabricator.wikimedia.org/T404302#11177192 (taavi)
[17:44:26] cloud-services-team, Cloud-VPS, IPv6: Enable IPv6 on Cloud VPS infrastructure services - https://phabricator.wikimedia.org/T392688#11177194 (taavi)
[17:44:48] cloud-services-team, Cloud-VPS, Epic, IPv6: Support IPv6-only VMs - https://phabricator.wikimedia.org/T404466#11177195 (taavi)
[17:54:49] cloud-services-team, Toolforge: Request for deletion of my tool 'firstedit' from toolforge - https://phabricator.wikimedia.org/T404465#11177227 (Gnoeee) a:taavi
[18:10:01] cloud-services-team, Toolforge: Request for deletion of my tool 'firstedit' from toolforge - https://phabricator.wikimedia.org/T404465#11177248 (JJMC89) a:taavi→None
[18:10:51] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:15:35] cloud-services-team (FY2025/26-Q1), Wikidocumentaries: wikidocumentaries on WMCS is sending millions of invalid requests to Europeana.eu servers - https://phabricator.wikimedia.org/T404347#11177254 (TuukkaH) I have re-added the environment variable for the API key and checked that the Europeana requests...
[18:15:51] FIRING: [2x] ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[18:45:07] Toolforge (Toolforge iteration 24), Patch-For-Review: [components-api] Queue builds when the build queue is full - https://phabricator.wikimedia.org/T402568#11177340 (Raymond_Ndibe) Open→In progress
[19:10:54] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471 (Iniquity) NEW
[19:11:11] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177446 (Iniquity) p:Triage→Unbreak!
[19:14:43] cloud-services-team, Toolforge, Wikimedia-production-error: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177452 (Iniquity)
[19:15:36] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177460 (JJMC89) Toolforge is not part of prod.
[19:16:19] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177464 (Iniquity) >>! In T404471#11177459, @JJMC89 wrote: > Toolforge is not part of prod. I didn't know, thanks!
[19:19:02] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177470 (JJMC89) I cannot reproduce. The tool's home page successfully loads and so does the example linked there.
[19:21:01] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177477 (Iniquity) >>! In T404471#11177470, @JJMC89 wrote: > I cannot reproduce. The tool's home page successfully loads and so does t...
[19:21:02] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177480 (dcaro) There's an active alert on one haproxy, probably flapping, looking {F66018633}
[19:24:00] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177484 (dcaro) It seems it's not logging anything since the 8th of September: ` root@tools-k8s-haproxy-5:~# journalctl -f -u haproxy....
[19:27:23] cloud-services-team, Toolforge: Recover missing .kube/config for firstedit tool - https://phabricator.wikimedia.org/T404465#11177486 (SD0001)
[19:29:17] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177492 (dcaro) Both haproxies are failing the health checks for the last 3h or so: {F66018643}
[19:31:05] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177504 (dcaro) tools-k8s-haproxy-6 logs for haproxy services stopped also on sep 8th, just restarted haproxy there also
[19:33:44] cloud-services-team, Toolforge: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177519 (taavi) It seems like we are hitting the HAProxy session limit for ingresses: {F66018640} Looking at the Nginx (front proxy) l...
[19:44:02] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177530 (dcaro) I think it might be geohack getting most the connections: ` root@tools-proxy-9:~# tail -n 10000...
[19:50:18] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177533 (bd808) p:Unbreak!→High Lowering from UBN! to High.
[19:50:51] RESOLVED: ProbeDown: Service tools-k8s-haproxy-5:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/k8s-haproxy - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:51:37] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177538 (dcaro) @taavi's patch is working as expected, geohack is getting throttled, and the rest of tools start...
[19:52:50] cloud-services-team, Toolforge: Unexpected error "Subquery returns more than 1 row" on wiki replicas - https://phabricator.wikimedia.org/T404473 (SD0001) NEW
[19:52:57] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177553 (Iniquity) Thanks! It works :)
[19:53:41] cloud-services-team, Data-Services: Unexpected error "Subquery returns more than 1 row" on wiki replicas - https://phabricator.wikimedia.org/T404473#11177555 (taavi)
[19:55:28] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177557 (dcaro) Open→Resolved a:dcaro Closing now, will think on followups if any next week, tha...
[19:55:31] cloud-services-team, Toolforge: Recover missing .kube/config for firstedit tool - https://phabricator.wikimedia.org/T404465#11177561 (bd808) Open→In progress a:bd808 https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Regenerate_kubernetes_credentials_for_tools_(.kube/config) `lang=she...
[19:56:03] cloud-services-team, Toolforge, Patch-For-Review: Wikimedia Toolforge Error: Failed to load resource: the server responded with a status of 500 - https://phabricator.wikimedia.org/T404471#11177567 (taavi) a:dcaro→taavi
[19:59:22] cloud-services-team, Toolforge: Recover missing .kube/config for firstedit tool - https://phabricator.wikimedia.org/T404465#11177571 (bd808) In progress→Resolved
[20:11:47] cloud-services-team, Data-Services, Community-Tech, Multiblocks: Unexpected error "Subquery returns more than 1 row" on wiki replicas - https://phabricator.wikimedia.org/T404473#11177612 (SD0001) Simplified query with the same issue: `SELECT actor_id from actor`. Reproducible on production when...
[22:38:00] Cloud-VPS (Project-requests): Request creation of gitlab-runners-staging VPS project - https://phabricator.wikimedia.org/T404386#11177958 (dduvall) @Andrew I don't see any zones listed in the project. Is that normal for a new project?
[22:56:36] Cloud-VPS (Project-requests): Request creation of gitlab-runners-staging VPS project - https://phabricator.wikimedia.org/T404386#11177989 (bd808) >>! In T404386#11177958, @dduvall wrote: > @Andrew I don't see any zones listed in the project. Is that normal for a new project? No, it is not normal for a p...