[00:23:15] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10824694 (Jhancock.wm)
[02:18:46] cloud-services-team, Cloud-VPS, Patch-For-Review: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10824773 (Andrew) I have made some progress here... I now have octavia launching amphora VMs; an example is 440ef96a-5902-40f3-be40-e0c07791c994 in the 'service' project. Th...
[05:58:51] (open) chuckonwumelu: Temporary: For demo purposes only [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/28
[06:25:38] cloud-services-team, Data-Services, Data-Persistence: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372 (Marostegui) NEW
[07:44:54] cloud-services-team, Cloud-VPS: project-proxy puppetserver CA about to expire - https://phabricator.wikimedia.org/T392792#10825117 (taavi) p:Medium→High a:taavi
[08:00:11] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10825151 (abi_) Open→In progress
[08:00:58] RESOLVED: PuppetCertificateAboutToExpire: Puppet CA certificate Puppet CA: toolsbeta-puppetmaster-04.toolsbeta.eqiad.wmflabs is about to expire in 26d 3h 57m 43s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[08:16:00] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10825206 (Wangombe) a:Wangombe
[08:16:04] Tool-dabfix,
Community-Wishlist-Survey-2023: Investigate Dabfix tool implementation - https://phabricator.wikimedia.org/T336545#10825209 (KSiebert)
[08:17:47] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10825216 (Wangombe) I think this project needs message documentation. See [[ https://t...
[08:21:36] PROBLEM - Wikitech-static main page has content on wikitech-static.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 2517 bytes in 0.118 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:23:28] Tool-campwiz-nxt, translatewiki.net, LPL Essential (LPL Essential 2025 Apr-Jun: CX), Patch-For-Review, Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10825236 (Nokib_Sarkar) @Wangombe Would [this](https://github.com/nokibsarkar/campwiz-...
[08:50:08] wikitech.wikimedia.org, serviceops, Shellbox: Shellbox is broken on wikitech-static due to disk fullness - https://phabricator.wikimedia.org/T338520#10825311 (fnegri) This happened again: ` root@wikitech-static:~# df -h Filesystem Size Used Avail Use% Mounted on udev 979M 0...
[08:50:39] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29762 bytes in 0.438 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[08:51:35] wikitech.wikimedia.org, serviceops, Shellbox: Shellbox is broken on wikitech-static due to disk fullness - https://phabricator.wikimedia.org/T338520#10825315 (fnegri) That worked: ` root@wikitech-static:~# df -h Filesystem Size Used Avail Use% Mounted on udev 979M 0 979M...
[09:10:42] (open) aborrero: codfw1dev: network: make octavia-lb-mgmt network dualstack IPv6/IPv4 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/232 (https://phabricator.wikimedia.org/T394099)
[09:12:36] (approved) taavi: codfw1dev: network: make octavia-lb-mgmt network dualstack IPv6/IPv4 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/232 (https://phabricator.wikimedia.org/T394099) (owner: aborrero)
[09:25:22] cloud-services-team, Cloud-VPS: project-proxy puppetserver CA about to expire - https://phabricator.wikimedia.org/T392792#10825361 (taavi) Open→Resolved
[09:28:09] Striker: Use IDP for authentication in Striker - https://phabricator.wikimedia.org/T359554#10825368 (dcaro) Related {T363983}
[09:41:55] (PS3) Majavah: Swap Memcached driver [labs/striker] - https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278)
[09:41:55] (PS4) Majavah: Upgrade to Django 4.2 LTS [labs/striker] - https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217)
[09:41:55] (PS4) Majavah: build: Remove unused direct dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145822
[09:41:56] (PS4) Majavah: Upgrade non-Django dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145823
[09:43:16] (update) raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011)
[09:47:29] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[09:56:19] (update) raymond-ndibe: [jobs-api] refactor
quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118)
[09:57:14] wikitech.wikimedia.org, serviceops-radar, SRE, Patch-For-Review, SRE-Unowned: Redesign wikitech-static - https://phabricator.wikimedia.org/T376400#10825452 (taavi) The site at http://ec2-54-81-201-239.compute-1.amazonaws.com/ seems to embed images from `upload.wikimedia.org`, for pages like n...
[09:58:03] wikitech.wikimedia.org, Wikidata, Wikimedia-Interwiki-links, Patch-For-Review, Wikidata Integration in Wikimedia projects (Kanban Board): Enable interwiki links to/from Wikitech - https://phabricator.wikimedia.org/T290147#10825455 (taavi) >>! In T290147#10751338, @JoelyRooke-WMDE wrote: > @ta...
[10:11:55] cloud-services-team, Cloud-VPS, Patch-For-Review: Understand Octavia network needs - https://phabricator.wikimedia.org/T394099#10825554 (aborrero) >>! In T394099#10824773, @Andrew wrote: > I have made some progress here... I now have octavia launching amphora VMs; an example is 440ef96a-5902-40f3-be4...
[10:24:15] (update) raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118)
[10:24:29] (update) raymond-ndibe: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136)
[10:24:41] (update) raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118)
[10:28:03] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [harbor, builds-builder] Audit robot account permissions - https://phabricator.wikimedia.org/T361708#10825622 (Raymond_Ndibe) In progress→Resolved
[10:34:37] (CR) Majavah: [V:+1] "Tested locally." [labs/striker] - https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278) (owner: Majavah)
[10:34:39] (CR) Majavah: [V:+1 C:+2] Swap Memcached driver [labs/striker] - https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278) (owner: Majavah)
[10:36:03] (Merged) jenkins-bot: Swap Memcached driver [labs/striker] - https://gerrit.wikimedia.org/r/1145829 (https://phabricator.wikimedia.org/T394278) (owner: Majavah)
[10:39:34] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 20): ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10825660 (fnegri) In progress→Resolved The replica is not back in sync yet, but it's past the moment where I discarded the domain on...
[10:41:36] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 20): ToolsDB: discard obsolete GTID domains - https://phabricator.wikimedia.org/T334947#10825686 (fnegri)
[10:41:37] cloud-services-team, Toolforge: [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341#10825685 (fnegri)
[10:44:52] cloud-services-team, Toolforge: [toolsdb] set gtid_domain_id to 0 - https://phabricator.wikimedia.org/T357341#10825693 (fnegri)
[11:47:16] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 3868 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[12:35:41] Tool-wdrecentchanges: Add the edit damaging score - https://phabricator.wikimedia.org/T393317#10826032 (Gnoeee) In progress→Resolved
[12:41:59] cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 20): [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2025-05-09 - https://phabricator.wikimedia.org/T393766#10826038 (fnegri) In progress→Resolved The replica is back in sync. {F60014637}
[12:43:04] Striker: django.core.cache.backends.memcached.MemcachedCache is removed in Django 4.1 - https://phabricator.wikimedia.org/T394278#10826048 (taavi) Open→Resolved
[12:43:27] Striker: Groups and tools only refreshed at login - https://phabricator.wikimedia.org/T144943#10826052 (taavi) Open→Resolved
[12:44:59] cloud-services-team, Toolforge, IPv6, Kubernetes: Support IPv6 in Toolforge Kubernetes - https://phabricator.wikimedia.org/T380060#10826056 (aborrero) There are several things to consider here. First of all, we just cannot take a chunk of the Cloud VPS `VXLAN/IPv6-dualstack` CIDR and give it to T...
[12:56:22] (update) andrew: codfw1dev: network: make octavia-lb-mgmt network dualstack IPv6/IPv4 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/232 (https://phabricator.wikimedia.org/T394099) (owner: aborrero)
[12:56:36] (merge) andrew: codfw1dev: network: make octavia-lb-mgmt network dualstack IPv6/IPv4 [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/232 (https://phabricator.wikimedia.org/T394099) (owner: aborrero)
[12:57:01] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[12:58:53] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826090 (VRiley-WMF)
[12:58:57] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan+apply for main branch
[13:19:59] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826158 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1072.eqiad.wmnet with OS bookworm
[13:20:05] (open) andrew: octavia-lb-mgmt: move to private ipv6 range [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/233 (https://phabricator.wikimedia.org/T394099)
[13:20:53] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826164 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1074.eqiad.wmnet with OS bookworm
[13:22:48] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install
cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826169 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1071.eqiad.wmnet with OS bookworm
[13:25:00] Toolforge (Toolforge iteration 20): [envvars] show the 'global' envvars when running `toolforge envvars list` - https://phabricator.wikimedia.org/T394408 (dcaro) NEW
[13:25:58] (merge) andrew: octavia-lb-mgmt: move to private ipv6 range [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/233 (https://phabricator.wikimedia.org/T394099)
[13:26:11] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826186 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm
[13:26:53] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan+apply for main branch
[13:27:55] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan+apply for main branch
[13:30:52] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [jobs-api] Periodically refresh image-config data - https://phabricator.wikimedia.org/T357112#10826211 (Raymond_Ndibe) >>! In T357112#10775170, @Raymond_Ndibe wrote: > wondering why we just don't fetch the images from k8s conf...
[13:31:58] Toolforge (Toolforge iteration 20): [envvars] show the 'global' envvars when running `toolforge envvars list` - https://phabricator.wikimedia.org/T394408#10826222 (dcaro) p:Triage→Medium
[13:45:15] Toolforge (Toolforge iteration 20): [envvars] show the 'global' envvars when running `toolforge envvars list` - https://phabricator.wikimedia.org/T394408#10826300 (dcaro)
[13:45:46] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826301 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm executed...
[13:46:32] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826303 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm
[13:49:04] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826313 (VRiley-WMF)
[13:56:38] (update) dcaro: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[13:58:42] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826396 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1072.eqiad.wmnet with OS bookworm completed...
[13:59:40] (update) raymond-ndibe: [jobs-api] periodically refresh image-config data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/160 (https://phabricator.wikimedia.org/T357112)
[14:01:33] (approved) dcaro: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[14:02:05] (update) dcaro: [jobs-cli] health_check and quota refactor [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/97 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[14:02:44] (update) dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: raymond-ndibe)
[14:03:32] (update) dcaro: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[14:04:15] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826479 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1074.eqiad.wmnet with OS bookworm completed...
[14:04:41] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826491 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm executed...
[14:08:51] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826511 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1071.eqiad.wmnet with OS bookworm executed...
[14:13:31] (update) chuckonwumelu: Temporary: For demo purposes only [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/28
[14:18:54] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826598 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm
[14:22:11] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826659 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host cloudvirt1071.eqiad.wmnet with OS bookworm
[14:24:23] (update) raymond-ndibe: Draft: [envvars-api] fix envvars-api EnvvarName regex bug [repos/cloud/toolforge/envvars-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-api/-/merge_requests/54 (https://phabricator.wikimedia.org/T391966)
[14:24:44] (update) dcaro: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[14:24:48] (update) dcaro: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) (owner: raymond-ndibe)
[14:27:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:37:48] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[14:39:55] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826758 (VRiley-WMF)
[14:42:29] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826770 (VRiley-WMF)
[14:45:55] (open) andrew: Move octavia mgmt network from admin to service project [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234 (https://phabricator.wikimedia.org/T394099)
[14:48:45] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234
[14:49:02] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234
[14:53:00] (update) andrew: Move octavia mgmt network from admin to service project [repos/cloud/cloud-vps/tofu-infra] -
https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234 (https://phabricator.wikimedia.org/T394099)
[14:53:03] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234
[14:53:20] !log andrew@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.tofu (exit_code=99) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234
[14:58:12] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826854 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1073.eqiad.wmnet with OS bookworm completed...
[15:02:36] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826875 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1071.eqiad.wmnet with OS bookworm completed...
[15:03:09] cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#10826876 (VRiley-WMF) Open→Resolved
[15:38:03] (close) andrew: Move octavia mgmt network from admin to service project [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/234 (https://phabricator.wikimedia.org/T394099)
[15:40:30] (open) andrew: Mark octavia mgmt network as shared [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235 (https://phabricator.wikimedia.org/T394099)
[15:42:51] (update) andrew: Mark octavia mgmt network as shared [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235 (https://phabricator.wikimedia.org/T394099)
[15:45:54] (approved) aborrero: Mark octavia mgmt network as shared [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235 (https://phabricator.wikimedia.org/T394099) (owner: andrew)
[15:48:39] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235
[15:48:58] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235
[15:49:06] !log andrew@cloudcumin1001 admin START - Cookbook wmcs.openstack.tofu running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235
[15:49:25] !log andrew@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.tofu (exit_code=0) running tofu plan for https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235
[15:49:39] (merge) andrew: Mark
octavia mgmt network as shared [repos/cloud/cloud-vps/tofu-infra] - https://gitlab.wikimedia.org/repos/cloud/cloud-vps/tofu-infra/-/merge_requests/235 (https://phabricator.wikimedia.org/T394099)
[15:56:50] cloud-services-team, Data-Services: enwiki_p query returned empty results on May 14 from ~UTC 0:00 - 05:00 - https://phabricator.wikimedia.org/T394429#10827171 (Pppery)
[16:54:33] cloud-services-team, Cloud-VPS: Stop configuring the openstack osbpo repos on most VMs - https://phabricator.wikimedia.org/T394438 (Andrew) NEW
[17:03:12] cloud-services-team, Data-Services: enwiki_p query returned empty results on May 14 from ~UTC 0:00 - 05:00 - https://phabricator.wikimedia.org/T394429#10827499 (Umherirrender) Your SQL indicates you want all category member of that category, but https://en.wikipedia.org/wiki/Category:A-Class_Austria_arti...
[17:08:16] (open) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:08:25] (open) bd808: zuul: autovoice corvus [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/26
[17:09:02] (update) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:14:19] (update) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:16:40] (merge) bd808: zuul: autovoice corvus [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/26
[17:16:45] (update) dcaro: runtime.k8s.image: periodically
refresh image-config data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/160 (https://phabricator.wikimedia.org/T357112) (owner: raymond-ndibe)
[17:17:03] (update) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:20:52] cloud-services-team, Cloud-VPS: Test (and implement?) Openstack Octavia lbaas - https://phabricator.wikimedia.org/T393783#10827558 (Andrew) Proof of life! https://roundrobin.codfw1dev.wmcloud.org/ uses a round-robin balancer backed by one nginx and one apache server. Depending on your luck you will get...
[17:25:44] cloud-services-team, Cloud-VPS: Test (and implement?) Openstack Octavia lbaas - https://phabricator.wikimedia.org/T393783#10827579 (Andrew) [] fix health check traffic from amphora -> cloudcontrol [] move amphorae out of 'service' project into a properly named 'octavia' project [] adjust haproxy frontend...
[17:26:28] (update) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:33:48] FIRING: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[17:33:53] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2004-dev:9100 - https://phabricator.wikimedia.org/T394443 (phaultfinder) NEW
[17:34:29] (open) bd808: Use account names rather than nicks for marxarelli and jeblair [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/27
[17:35:53] (update) dcaro: runtimes.k8s.images: use config for image refresh interval [repos/cloud/toolforge/jobs-api] (refresh_image_config_data) - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/165
[17:38:36] (merge) bd808: Use account names rather than nicks for marxarelli and jeblair [toolforge-repos/ircservserv-config] - https://gitlab.wikimedia.org/toolforge-repos/ircservserv-config/-/merge_requests/27
[17:48:48] FIRING: PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[17:55:27] cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Consider setting up an https://github.com/knyar/phalerts instance in metricsinfra - https://phabricator.wikimedia.org/T394446 (bd808) NEW
[18:03:48] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed -
https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[18:13:48] RESOLVED: PuppetFailure: Puppet has failed on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[18:29:18] cloud-services-team, Data-Services: enwiki_p query returned empty results on May 14 from ~UTC 0:00 - 05:00 - https://phabricator.wikimedia.org/T394429#10827863 (Audiodude) It has the talk page articles in it, which is what we are looking for (WP 1.0 articles are categorized on their talk pages, not their...
[19:22:34] (PS2) LD: frwiki: Enable the NewUserMessage extension [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1146702 (https://phabricator.wikimedia.org/T382199)
[19:22:34] (CR) LD: "Made as T30689 : https://codesearch.wmcloud.org/search/?q=T30689" [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1146702 (https://phabricator.wikimedia.org/T382199) (owner: LD)
[19:47:13] cloud-services-team: Emails to cloudservices@wikimedia.org from root@beta.toolforge.org bouncing - https://phabricator.wikimedia.org/T394453 (bd808) NEW
[20:31:55] (PS1) AntiCompositeNumber: megatable: Refresh data via /bin/update-www-var [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1146724
[20:43:27] cloud-services-team, Cloud-VPS: Test (and implement?) Openstack Octavia lbaas - https://phabricator.wikimedia.org/T393783#10828201 (Andrew) [] fix health check traffic from amphora -> cloudcontrol [] move amphorae out of 'service' project into a properly named 'octavia' project [] adjust haproxy frontend...
[20:55:30] (CR) AntiCompositeNumber: "Only relevant change is that private wikis were removed from wmgRC2UDPPrefix. This should have no effect on us, I think."
[labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1146724 (owner: AntiCompositeNumber)
[20:58:58] (CR) AntiCompositeNumber: [C:-2] "As Pppery mentioned, this is the wrong place to make this change. The correct change was merged as 9214e5c8613d4dc583a1c8e978f9f13bc56b740" [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1146702 (https://phabricator.wikimedia.org/T382199) (owner: LD)
[21:50:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-9 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[21:51:11] (approved) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/37 (owner: l10n-bot)
[21:51:13] (merge) lucaswerkmeister: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/37 (owner: l10n-bot)
[22:03:48] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[23:10:53] cloud-services-team, Toolforge (Toolforge iteration 20): Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#10828460 (bd808) >>! In T374056#10812792, @dcaro wrote: > @bd808 just released a new flag for the cli `toolforge build start --use-latest-ve...
[23:28:20] Tool-gitlab-content, User-notice: Implement a reverse proxy for gitlab.wikimedia.org raw content that supports mime-type specification - https://phabricator.wikimedia.org/T392431#10828470 (bd808) In progress→Resolved >>! In T392431#10808847, @bd808 wrote: >>>! In T392431#10802966, @UOzurumba...
[23:50:22] cloud-services-team, Toolforge: [build-service] remove legacy fagiani/apt 0.2.5 builder from `--use-latest-versions` stack - https://phabricator.wikimedia.org/T394466 (bd808) NEW
[23:51:07] cloud-services-team, Toolforge: [build-service] remove legacy fagiani/apt 0.2.5 builder from `--use-latest-versions` stack - https://phabricator.wikimedia.org/T394466#10828509 (bd808)
[23:51:11] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [builds-builder] Add support for Heroku's "24" builder stack based on Ubuntu 2024.04 noble - https://phabricator.wikimedia.org/T380127#10828510 (bd808)
[23:56:53] (approved) bd808: Upgrade to ZNC 1.9.0 [toolforge-repos/containers-bnc] - https://gitlab.wikimedia.org/toolforge-repos/containers-bnc/-/merge_requests/1 (https://phabricator.wikimedia.org/T380108)
[23:56:56] (merge) bd808: Upgrade to ZNC 1.9.0 [toolforge-repos/containers-bnc] - https://gitlab.wikimedia.org/toolforge-repos/containers-bnc/-/merge_requests/1 (https://phabricator.wikimedia.org/T380108)