[05:39:29] (update) samwilson: Use HTTP client object from API, with User-Agent set [toolforge-repos/wsexport] - https://gitlab.wikimedia.org/toolforge-repos/wsexport/-/merge_requests/4 (https://phabricator.wikimedia.org/T403435)
[05:58:32] VPS-project-Codesearch, MediaWiki-Engineering: Codesearch: Add "View JSON" link to from action=repos - https://phabricator.wikimedia.org/T363698#11431557 (A_smart_kitten)
[05:59:59] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install clouddb1026-1033 - https://phabricator.wikimedia.org/T409162#11431559 (Marostegui)
[07:48:21] Tool-global-search: Global Search logs you out too often - https://phabricator.wikimedia.org/T411749 (Jack_who_built_the_house) NEW
[08:33:29] Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM: Temporary quota increase for mwoffliner project - https://phabricator.wikimedia.org/T411751 (Benoit74) NEW
[09:10:51] (PS1) Brouberol: growthbook-next: add stub OIDC client secret [labs/private] - https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752)
[10:04:24] (CR) Btullis: [C:+1] growthbook-next: add stub OIDC client secret [labs/private] - https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: Brouberol)
[10:09:25] (CR) Brouberol: [C:+2] growthbook-next: add stub OIDC client secret [labs/private] - https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: Brouberol)
[10:09:27] (CR) Brouberol: [V:+2 C:+2] growthbook-next: add stub OIDC client secret [labs/private] - https://gerrit.wikimedia.org/r/1215081 (https://phabricator.wikimedia.org/T411752) (owner: Brouberol)
[10:19:50] (update) taavi: kubernetes: Alert on misplaced ingress-nginx pods [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/49 (https://phabricator.wikimedia.org/T410382)
[10:32:31]
cloud-services-team, Toolforge: Add paging alert if Toolforge HAProxy connection limit is reached - https://phabricator.wikimedia.org/T410421#11431997 (taavi) a:taavi
[10:32:32] (update) taavi: kubernetes: Alert on misplaced ingress-nginx pods [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/49 (https://phabricator.wikimedia.org/T410382)
[10:34:25] (open) taavi: web: Alert on frontend connection limits [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/53 (https://phabricator.wikimedia.org/T410421)
[10:34:28] (update) taavi: web: Alert on frontend connection limits [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/53 (https://phabricator.wikimedia.org/T410421)
[11:15:35] (update) fnegri: Clean up and adapt imported alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/52 (https://phabricator.wikimedia.org/T410505)
[11:15:40] (update) fnegri: Clean up and adapt imported alerts [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/52 (https://phabricator.wikimedia.org/T410505)
[11:23:18] (approved) fnegri: kubernetes: Alert on misplaced ingress-nginx pods [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/49 (https://phabricator.wikimedia.org/T410382) (owner: taavi)
[11:24:44] (merge) taavi: kubernetes: Alert on misplaced ingress-nginx pods [repos/cloud/toolforge/alerts] - https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/49 (https://phabricator.wikimedia.org/T410382)
[11:26:02] cloud-services-team, Toolforge, Patch-For-Review: Ensure ingress pods get scheduled on ingress nodes - https://phabricator.wikimedia.org/T410382#11432117 (taavi)
Open→Resolved a:taavi
[11:45:28] cloud-services-team, Horizon, Patch-For-Review: Page on cloudweb/horizon down - https://phabricator.wikimedia.org/T411470#11432184 (taavi) p:Triage→High
[11:45:46] cloud-services-team, Cloud-VPS, Data-Services: Plan to make clouddumps more resilient and easier to operate - https://phabricator.wikimedia.org/T411248#11432186 (taavi) p:Triage→Medium
[12:25:03] supertassu closed https://github.com/toolforge/quarry/pull/23
[12:28:31] cloud-services-team, Quarry, CSS, Patch-For-Review: Improve maintenance message CSS - https://phabricator.wikimedia.org/T343644#11432371 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/94
[12:28:41] cloud-services-team, Quarry, CSS: Improve maintenance message CSS - https://phabricator.wikimedia.org/T343644#11432372 (taavi) Open→Resolved
[12:28:50] supertassu closed https://github.com/toolforge/quarry/pull/94
[12:29:04] cloud-services-team, Quarry, CSS: Improve maintenance message CSS - https://phabricator.wikimedia.org/T343644#11432375 (taavi)
[12:32:03] (CR) CI reject: [V:-1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1215139 (owner: L10n-bot)
[12:53:41] supertassu opened https://github.com/toolforge/quarry/pull/95
[13:06:56] FIRING: SystemdUnitDown: The service unit backup_glance_images.service is in failed status on host cloudbackup1003.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:07:37] cloud-services-team, Wikimedia Enterprise Volunteer Request, Wikimedia Enterprise (WME Kanban): Toolforge no longer has IP-based access to Wikimedia Enterprise - https://phabricator.wikimedia.org/T410994#11432455 (RThomas-WMF) In progress→Resolved
[13:13:40] supertassu closed https://github.com/toolforge/quarry/pull/95
[13:15:29] (update) piracalamina: [ Restore scholary works to author works query results ] [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/38 (owner: obediobadiah)
[13:15:40] (close) piracalamina: [ Restore scholary works to author works query results ] [toolforge-repos/paulina] - https://gitlab.wikimedia.org/toolforge-repos/paulina/-/merge_requests/38 (owner: obediobadiah)
[13:42:25] (PS1) Rehan_khan_78: Fix campaign categories handling and filter hidden categories [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1215160
[13:44:57] (PS2) Rehan_khan_78: Fix campaign categories handling and filter hidden categories [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1215160 (https://phabricator.wikimedia.org/T224005)
[14:01:29] cloud-services-team, Horizon, Striker, Infrastructure-Foundations, netops: Move cloudweb hosts to cloud racks? - https://phabricator.wikimedia.org/T411783 (taavi) NEW
[14:33:15] (CR) Arendpieter: "no, I did not use any LLMs or similar tools."
[labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554) (owner: Arendpieter)
[14:38:07] Toolforge, tools-platform-team: jobs-api lists running buildservice images as "unknown" - https://phabricator.wikimedia.org/T411790 (taavi) NEW
[14:47:03] Toolforge, tools-platform-team: jobs-api lists running buildservice images as "unknown" - https://phabricator.wikimedia.org/T411790#11432937 (taavi) p:Triage→Low a:taavi
[14:49:23] (open) taavi: core: image: Set correct state for buildservice images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/257 (https://phabricator.wikimedia.org/T411790)
[14:59:30] (update) taavi: core: image: Set correct state for buildservice images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/257 (https://phabricator.wikimedia.org/T411790)
[15:01:56] FIRING: SystemdUnitDown: The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:21:51] (update) taavi: core: image: Set correct state for buildservice images [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/257 (https://phabricator.wikimedia.org/T411790)
[15:35:08] (PS9) Arendpieter: Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554)
[15:37:29] (CR) CI reject: [V:-1] Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554) (owner: Arendpieter)
[15:39:25] (PS10) Arendpieter: Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554)
[15:40:48] (CR) CI reject: [V:-1] Use IDP for authentication [labs/striker] - https://gerrit.wikimedia.org/r/1189915 (https://phabricator.wikimedia.org/T359554) (owner: Arendpieter)
[15:40:53] cloud-services-team, Toolforge, Kubernetes: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237#11433190 (Andrew) I'm partway into this process but everyone is about to travel so I'm rolling things back to Bullseye everywhere.
[15:42:09] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[15:42:13] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[15:48:38] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99)
[17:13:32] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[17:13:37] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[17:20:06] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0)
[17:26:32] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[17:26:37] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[17:36:05] cloud-services-team, Cloud-VPS (Quota-requests), affects-Kiwix-and-openZIM: Temporary quota increase for mwoffliner project - https://phabricator.wikimedia.org/T411751#11433647 (bd808) +1
[17:36:56] RESOLVED: SystemdUnitDown: The service unit backup_glance_images.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:42:26] RESOLVED: SystemdUnitDown: The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[17:43:21] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[17:52:24] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
[18:02:55] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=0)
[18:05:25] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[18:05:29] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[18:21:13] Toolforge-standards-committee: Check Community CRM for any known conflicts with committee nominees - https://phabricator.wikimedia.org/T411440#11433825 (bd808) Open→Resolved No conflicts were found for candidates Sohom Datta, EggRoll97, JJMC89, Lucas Werkmeister, Pintoch, or SD0001.
[18:24:46] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[18:36:27] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[18:36:32] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[18:44:36] !log andrew@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99)
[18:45:49] !log andrew@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[18:45:53] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[18:53:57] !log andrew@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0)
[19:06:45] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[19:06:51] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[19:13:51] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99)
[19:19:14] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 488 bytes in 0.248 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[19:28:34] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[19:28:39] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[19:35:28] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0)
[19:38:24] !log andrew@cloudcumin1001 tools START - Cookbook
wmcs.toolforge.add_k8s_etcd_node (T375217)
[19:38:29] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[19:54:14] RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.319 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[19:56:13] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[19:56:53] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[19:56:58] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[20:03:16] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99)
[20:03:36] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[20:03:40] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[20:09:14] PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 488 bytes in 0.222 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[20:10:46] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0)
[20:23:53] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[20:23:58] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[20:40:11] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[20:45:49] !log andrew@cloudcumin1001 tools START - Cookbook
wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[20:45:54] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[20:52:04] !log andrew@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99)
[20:53:14] FIRING: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-8.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[20:58:14] RESOLVED: [2x] ToolforgeKubernetesHAproxyServerDown: Toolforge HAProxy server tools-k8s-control-8.tools.eqiad1.wikimedia.cloud is DOWN - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesHAproxyServerDown - https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesHAproxyServerDown
[21:02:51] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (T361237)
[21:02:57] T361237: [infra] Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T361237
[21:10:01] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0)
[21:20:15] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_etcd_node (T375217)
[21:20:20] T375217: Complete upgrading WMCS bare metal hosts to Trixie - https://phabricator.wikimedia.org/T375217
[21:37:33] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
[21:42:00] supertassu opened https://github.com/toolforge/quarry/pull/96
[21:49:14] RECOVERY - toolschecker: All k8s etcd nodes are healthy on
checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.704 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[21:54:06] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11434806 (komla) I've gotten the email list of toolforge and cloudvps users via ldap(using project-tools and project-bastion groups). I've removed duplicates and also taken out those on the optout list. I sti...
[22:54:49] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11435023 (bd808) `lang=shell-session bd808@tools-bastion-14.tools.eqiad1:~$ ldap -b cn=project-bastion,ou=groups,dc=wikimedia,dc=org | grep member: | wc -l 5558 bd808@tools-bastion-14.tools.eqiad1:~$ ldap -b c...
[23:01:57] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11435038 (bd808) One of the things I wondered about was missing lots of users who had opt-ed out of email contact in the past via their Wikitech preferences. That doesn't seem like a big problem however: `lang...
[23:12:50] cloud-services-team: Update make-toolforge-user-list.py - https://phabricator.wikimedia.org/T411545#11435059 (bd808) Ah ha! The https://gitlab.wikimedia.org/repos/cloud/wmcs/wmcs-survey-mail-list/-/blob/main/make-cloudvps-email-list.py script only gathered emails for folks with the `projectadmin` role (repla...
[23:19:37] Tools, PES1.3.3 WP25 Easter Eggs, Software-Licensing: wikipedia25-years-of-wikipedia tool loads and uses non-free JavaScript - https://phabricator.wikimedia.org/T410465#11435077 (taavi) @ATitkov: Ping? Any news here?