[00:08:49] FIRING: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1031 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[00:12:49] FIRING: [8x] NeutronAgentDownForLong: Neutron neutron-openvswitch-agent on cloudvirt1031 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong
[00:13:00] cloud-services-team: NeutronAgentDownForLong - https://phabricator.wikimedia.org/T394861#10842276 (phaultfinder)
[00:27:49] FIRING: NeutronAgentDownForLong: Neutron neutron-openvswitch-agent on cloudvirt1032 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong
[00:27:53] cloud-services-team: NeutronAgentDownForLong Neutron neutron-openvswitch-agent on cloudvirt1032 has been down for more than 2h - https://phabricator.wikimedia.org/T394875 (phaultfinder) NEW
[00:30:52] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842314 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1034.eqiad.wmnet` - cloudvirt1034.eqiad...
[00:33:17] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842316 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1035.eqiad.wmnet` - cloudvirt1035.eqiad...
[00:36:37] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842319 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1033.eqiad.wmnet` - cloudvirt1033.eqiad...
[00:41:42] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842332 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1034.eqiad.wmnet` - cloudvirt1034.eqiad...
[00:51:19] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842349 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1036.eqiad.wmnet` - cloudvirt1036.eqiad...
[00:57:20] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842350 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1037.eqiad.wmnet` - cloudvirt1037.eqiad...
[01:02:31] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842359 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1038.eqiad.wmnet` - cloudvirt1038.eqiad...
[01:08:12] cloud-services-team, decommission-hardware, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842362 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by andrew@cumin1002 for hosts: `cloudvirt1039.eqiad.wmnet` - cloudvirt1039.eqiad...
[01:12:55] cloud-services-team, DC-Ops, decommission-hardware, ops-eqiad, Patch-For-Review: decommission cloudvirt103[1-9].eqiad.wmnet - https://phabricator.wikimedia.org/T394727#10842371 (Andrew)
[01:51:44] !log andrew@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[01:52:08] !log andrew@cloudcumin1001 cloudvirt-canary END (PASS) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=0) on eqiad1, with recreate False, for hosts list: ['cloudvirt1072']
[02:07:19] RESOLVED: NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1032 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[02:09:19] RESOLVED: NeutronAgentDownForLong: Neutron neutron-openvswitch-agent on cloudvirt1032 has been down for more than 2h - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDownForLong
[02:10:53] Tool-gitlab-content: Add maxage/smaxage cache header controls to gilab-content proxy - https://phabricator.wikimedia.org/T393928#10842448 (bd808) Open→In progress a:bd808
[02:11:28] cloud-services-team: NeutronAgentDownForLong Neutron neutron-openvswitch-agent on cloudvirt1032 has been down for more than 2h - https://phabricator.wikimedia.org/T394875#10842454 (Andrew)
Open→Resolved a:Andrew host was being decommissioned.
[02:11:32] cloud-services-team: NeutronAgentDownForLong - https://phabricator.wikimedia.org/T394861#10842458 (Andrew) Open→Resolved a:Andrew host was being decommissioned.
[02:12:15] cloud-services-team: SystemdUnitDown - https://phabricator.wikimedia.org/T394837#10842462 (Andrew) Open→Resolved a:Andrew A side-effect of my debugging a cinder issue T394790
[02:12:45] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2006-dev:9100 - https://phabricator.wikimedia.org/T394349#10842467 (Andrew) Open→Resolved a:Andrew
[02:12:49] cloud-services-team: PuppetFailure Puppet has failed on cloudcontrol2004-dev:9100 - https://phabricator.wikimedia.org/T394443#10842469 (Andrew) Open→Resolved a:Andrew
[02:13:26] cloud-services-team, Cloud-VPS: Service implementation for Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T394671#10842471 (Andrew) Open→Resolved
[02:14:04] cloud-services-team: PuppetFailure Puppet has failed on cloudvirt1076:9100 - https://phabricator.wikimedia.org/T394706#10842473 (Andrew) Open→Resolved a:Andrew
[02:14:20] cloud-services-team, Cloud-VPS: cloudbackup100[34] still do not actually do 'backy2 cleanup - https://phabricator.wikimedia.org/T394618#10842475 (Andrew) p:Triage→Medium
[02:14:49] cloud-services-team, Cloud-VPS, Patch-For-Review: Stop configuring the openstack osbpo repos on most VMs - https://phabricator.wikimedia.org/T394438#10842476 (Andrew) p:Triage→Medium
[02:14:58] (open) bd808: Add maxage Cache-Control support [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/9 (https://phabricator.wikimedia.org/T393928)
[02:15:35] cloud-services-team, Cloud-VPS, Patch-For-Review: ldaptui fails to create new users - https://phabricator.wikimedia.org/T394341#10842479 (Andrew) p:Triage→Medium
[02:42:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[02:53:31] FIRING: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 3671 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[02:57:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[03:12:19] FIRING: HighIOWaitStalling: High iowait detected on clouddumps1002:9100. - https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[03:17:19] RESOLVED: HighIOWaitStalling: High iowait detected on clouddumps1002:9100.
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage#Dumps - https://grafana.wikimedia.org/d/000000568/wmcs-dumps-general-view - https://alerts.wikimedia.org/?q=alertname%3DHighIOWaitStalling
[03:48:31] RESOLVED: ToolsToolsDBReplicationLagIsTooHigh: ToolsDB replication on tools-db-5 is lagging behind the primary, the current lag is 4268 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[07:06:15] cloud-services-team, Cloud-VPS: ldaptui fails to create new users - https://phabricator.wikimedia.org/T394341#10842641 (SLyngshede-WMF) Open→Resolved Fixed, I've added the missing schemas to the configuration.
[07:16:01] cloud-services-team, Toolforge, OpenRefine, Wikidata: wdreconcile.toolforge.org acting as though HTTP 502 Gateway errors are cached - https://phabricator.wikimedia.org/T257405#10842651 (Pintoch) Open→Resolved a:Pintoch This is already quite some years old, so I don't think it make...
[07:17:28] cloud-services-team, Cloud-VPS: cloudservices: codfw1dev: fix backups - https://phabricator.wikimedia.org/T339894#10842655 (jcrespo) I think this has started failing again, I will create a new ticket.
[07:34:46] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883 (jcrespo) NEW
[07:35:33] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842694 (jcrespo) FYI.
Related: T339894
[07:44:08] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884 (Marostegui) NEW
[07:44:18] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10842710 (Marostegui) p:Triage→Medium
[07:48:50] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842719 (jcrespo) I have cancelled the never-finishing jobs from the host and now other backups flow correctly. I have disabled pupp...
[07:49:37] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842720 (jcrespo)
[07:53:16] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842725 (taavi) a:taavi Thanks for the poke. This host seems to have accidentally gotten rebooted back to the broken kernel from...
[07:54:10] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842730 (jcrespo) Should I retry a backup now, to check this ticket is resolved?
[07:54:56] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842731 (taavi) Yes please.
[07:55:04] cloud-services-team, Data-Services, Quarry: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10842734 (taavi) This is due to a hardware issue with one of the hosts involved in the replication chain to the wiki replicas: {...
[07:55:33] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842736 (jcrespo) Doing. Thanks for the prompt response! I will report if backups now finish correctly.
[08:00:20] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842749 (jcrespo) Open→Resolved The backups now worked nicely, thank you. This is resolved to me. ` Terminated Jobs: Jo...
[08:02:59] (CR) Majavah: [C:+2] Upgrade to Django 4.2 LTS [labs/striker] - https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217) (owner: Majavah)
[08:02:59] cloud-services-team, Cloud-VPS, bacula, Data-Persistence-Backup: cloudservices2005-dev backups are clogging all backups - https://phabricator.wikimedia.org/T394883#10842770 (jcrespo) ` [10:01:54] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 145 jobs https://wikitech.wik...
[08:03:09] (CR) Majavah: [C:+2] build: Remove unused direct dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145822 (owner: Majavah)
[08:05:36] (Merged) jenkins-bot: Upgrade to Django 4.2 LTS [labs/striker] - https://gerrit.wikimedia.org/r/1145821 (https://phabricator.wikimedia.org/T359217) (owner: Majavah)
[08:06:04] (Merged) jenkins-bot: build: Remove unused direct dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145822 (owner: Majavah)
[08:06:11] (PS1) Majavah: tools: Fix sudo policy base DN [labs/striker] - https://gerrit.wikimedia.org/r/1148798 (https://phabricator.wikimedia.org/T394823)
[08:07:03] cloud-services-team, Striker, Patch-For-Review: toolsbeta sudo rules for new tools being created on wrong project - https://phabricator.wikimedia.org/T394823#10842813 (taavi) p:Triage→High a:taavi
[08:08:59] (CR) CI reject: [V:-1] tools: Fix sudo policy base DN [labs/striker] - https://gerrit.wikimedia.org/r/1148798 (https://phabricator.wikimedia.org/T394823) (owner: Majavah)
[08:12:01] (PS2) Majavah: tools: Fix sudo policy base DN [labs/striker] - https://gerrit.wikimedia.org/r/1148798 (https://phabricator.wikimedia.org/T394823)
[08:12:01] (PS1) Majavah: static: Remove broken source map references [labs/striker] - https://gerrit.wikimedia.org/r/1148799
[08:15:21] (CR) Majavah: [C:+2] static: Remove broken source map references [labs/striker] - https://gerrit.wikimedia.org/r/1148799 (owner: Majavah)
[08:16:53] (Merged) jenkins-bot: static: Remove broken source map references [labs/striker] - https://gerrit.wikimedia.org/r/1148799 (owner: Majavah)
[08:19:51] (CR) Majavah: [C:+2] tools: Fix sudo policy base DN [labs/striker] - https://gerrit.wikimedia.org/r/1148798 (https://phabricator.wikimedia.org/T394823) (owner: Majavah)
[08:21:02] Wikibugs: Slack interaction for Wikibugs - https://phabricator.wikimedia.org/T394533#10842874 (Aklapper)
[08:21:16] (Merged) jenkins-bot: tools: Fix sudo policy base DN [labs/striker] - https://gerrit.wikimedia.org/r/1148798 (https://phabricator.wikimedia.org/T394823) (owner: Majavah)
[08:47:35] Striker, Continuous-Integration-Infrastructure, Toolhub, ci-test-error (WMF-deployed Build Failure): Multiple *-pipeline-test jobs failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10843002 (Slack-bot)
[08:47:45] Striker, Continuous-Integration-Infrastructure, Toolhub, ci-test-error (WMF-deployed Build Failure): Multiple *-pipeline-test jobs failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10843004 (Slack-bot)
[08:48:03] (approved) dcaro: dns: use the floating ip for docker-registry.svc.t.o [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/39
[08:48:17] (merge) dcaro: dns: use the floating ip for docker-registry.svc.t.o [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/39
[08:50:43] Striker, Continuous-Integration-Infrastructure, Toolhub, ci-test-error (WMF-deployed Build Failure), Jenkins: Multiple *-pipeline-test jobs failing to load pipelinelib with git error - https://phabricator.wikimedia.org/T386755#10843016 (hashar) This occurred again (T394817) and @dancy fou...
[09:02:20] (PS4) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[09:03:02] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[09:03:03] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
[09:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:15] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
[09:03:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:28] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
[09:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:42] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
[09:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:55] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
[09:03:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:04:08] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
[09:04:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:04:19] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
[09:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:04:28] (CR) David Caro: [V:+1]
"This is ready now (ran it locally to update the images), I'll send a new patch to update the registry name in all the cookbooks." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:04:32] !log dcaro@acme tools Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
[09:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:04:44] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[09:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:05:43] (CR) CI reject: [V:-1] kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:06:51] (open) aborrero: k9s: bump version to 0.50.6 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/242
[09:13:25] (PS5) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[09:15:01] (approved) dcaro: k9s: bump version to 0.50.6 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/242 (owner: aborrero)
[09:15:51] (merge) aborrero: k9s: bump version to 0.50.6 [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/242
[09:16:31] (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM."
[cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:17:47] (CR) David Caro: [C:+2] kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:19:43] (PS6) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787)
[09:19:43] (PS1) David Caro: docker-registry: use the new .svc.toolforge.org hostname [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148809
[09:23:31] (CR) David Caro: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:23:36] (CR) David Caro: [C:+2] kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:25:22] !log dcaro@acme tools START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
[09:25:22] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6
[09:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:25:38] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno-cli:v1.13.6
[09:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:25:50] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyvernopre:v1.13.6
[09:25:52] Logged the message at
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:26:02] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-background-controller:v1.13.6
[09:26:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:26:13] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-cleanup-controller:v1.13.6
[09:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:26:25] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-reports-controller:v1.13.6
[09:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:26:37] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/bitnami-kubectl:1.30.2
[09:26:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:26:47] (CR) Arturo Borrero Gonzalez: [C:+1] "LGTM."
[cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148809 (owner: David Caro)
[09:26:49] !log dcaro@acme tools Updating container image docker-registry.svc.toolforge.org/busybox:1.35
[09:26:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:27:02] !log dcaro@acme tools END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
[09:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:27:13] (Merged) jenkins-bot: kyverno.copy_images_to_registry: update the versions [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148352 (https://phabricator.wikimedia.org/T394787) (owner: David Caro)
[09:30:42] (update) taavi: Create tools-prometheus-9 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/38 (https://phabricator.wikimedia.org/T393697)
[09:32:52] (approved) aborrero: Create tools-prometheus-9 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/38 (https://phabricator.wikimedia.org/T393697) (owner: taavi)
[09:33:51] (merge) taavi: Create tools-prometheus-9 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/38 (https://phabricator.wikimedia.org/T393697)
[09:37:14] (CR) David Caro: [C:+2] docker-registry: use the new .svc.toolforge.org hostname [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148809 (owner: David Caro)
[09:39:28] FIRING: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:41:28] FIRING: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 -
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates
[09:42:09] (Merged) jenkins-bot: docker-registry: use the new .svc.toolforge.org hostname [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1148809 (owner: David Caro)
[09:44:28] RESOLVED: InstanceDown: Project tools instance tools-prometheus-6 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[09:45:00] cloud-services-team, Data-Services: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898 (Bugreporter) NEW
[09:45:59] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-9.tools.eqiad1.wikimedia.cloud
[09:47:16] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-9.tools.eqiad1.wikimedia.cloud
[09:55:52] cloud-services-team, Toolforge (Toolforge iteration 20), IPv6, Patch-For-Review: Rebuild Toolforge Prometheus nodes in v6-dualstack network - https://phabricator.wikimedia.org/T393697#10843311 (taavi) In progress→Resolved
[09:56:10] cloud-services-team, Toolforge: [toolforge-prometheus] upgrade to bookworm - https://phabricator.wikimedia.org/T375523#10843315 (taavi) Open→Resolved
[10:05:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:12:34] (open) dcaro: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62
[10:21:17] (open) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy]
- https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793
[10:27:25] Toolforge (Toolforge iteration 20): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902 (dcaro) NEW
[10:27:33] Toolforge (Toolforge iteration 20): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10843435 (dcaro) p:Triage→Low
[10:27:45] (update) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902)
[10:29:39] (update) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902)
[10:32:47] (open) dcaro: registry: move to the new registry url [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/13
[10:38:10] Toolforge (Toolforge iteration 20): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10843455 (dcaro)
[10:41:01] (update) dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902)
[10:41:11] (update) dcaro: registry: move to the new registry url [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/13
[10:45:59] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) -
https://phabricator.wikimedia.org/T362869#10843471 (dcaro) I uploaded the newer kyverno 1.13 version images
[10:49:19] FIRING: [3x] ProbeDown: Service toolsbeta-legacy-redirector-2:80 has failed probes (http_tools_wmflabs_org_main_page_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-legacy-redirector-2:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:50:43] (approved) aborrero: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902) (owner: dcaro)
[10:51:21] (approved) aborrero: registry: move to the new registry url [repos/cloud/toolforge/image-config] - https://gitlab.wikimedia.org/repos/cloud/toolforge/image-config/-/merge_requests/13 (owner: dcaro)
[10:53:52] Toolforge (Toolforge iteration 20): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10843504 (dcaro)
[10:54:20] RESOLVED: [4x] ProbeDown: Service toolsbeta-legacy-redirector-2:80 has failed probes (http_tools_wmflabs_org_main_page_ip6) - https://wikitech.wikimedia.org/wiki/Runbook#toolsbeta-legacy-redirector-2:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[11:40:20] (PS5) Majavah: Upgrade non-Django dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145823
[11:46:50] (PS6) Majavah: Upgrade non-Django dependencies [labs/striker] - https://gerrit.wikimedia.org/r/1145823
[12:10:38] cloud-services-team, Toolforge, IPv6: Enable IPv6 on toolforge.org - https://phabricator.wikimedia.org/T211575#10843830 (taavi) a:taavi
[12:22:46] (open) taavi: shared: Manage
front proxy server/security groups [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575)
[12:23:09] (update) taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575)
[12:25:32] (update) taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575)
[12:28:59] (update) aborrero: maintain-kubeusers: add logic to drop access to deleted admins [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/71 (https://phabricator.wikimedia.org/T394786)
[12:29:09] (update) taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575)
[12:29:14] (update) taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575)
[12:31:14] (open) aborrero: toolforge: create an 'admin' tool account, with a fake human user [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/243 (https://phabricator.wikimedia.org/T394786)
[12:36:22] (update) dcaro: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755
(https://phabricator.wikimedia.org/T381011) (owner: 10raymond-ndibe) [12:36:44] (03update) 10dcaro: WIP k8s: upgrade to 1.30 [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/241 (https://phabricator.wikimedia.org/T362869) [12:37:11] (03update) 10dcaro: docker: use the .svc.toolforge.org registry name [repos/cloud/cicd/gitlab-ci] - 10https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci/-/merge_requests/62 [12:37:45] 06cloud-services-team, 10Cloud-VPS: wmcs-enc-cli: keystoneauth1.exceptions.http.Forbidden: You are not authorized to perform the requested action: identity:list_services. - https://phabricator.wikimedia.org/T394775#10843912 (10taavi) 05Open→03Resolved a:03taavi [12:40:05] (03update) 10dcaro: docker-registry: use the .svc.toolforge.org one [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/793 (https://phabricator.wikimedia.org/T394902) [12:41:26] (03update) 10taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575) [12:41:32] (03update) 10taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575) [12:54:01] (03update) 10raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 (https://phabricator.wikimedia.org/T394734) [12:54:21] (03update) 10raymond-ndibe: [runtimes.k8s.runtime] fix bug in diff_with_running_job method [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/169 
(https://phabricator.wikimedia.org/T394734) [12:57:02] 06cloud-services-team, 10Data-Services, 06DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10844013 (10Ladsgroup) No objections from my side. [13:00:27] (03close) 10raymond-ndibe: [do not merge] test deprecation metrics [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/167 [13:05:54] 06cloud-services-team, 10Striker: toolsbeta sudo rules for new tools being created on wrong project - https://phabricator.wikimedia.org/T394823#10844069 (10taavi) 05Open→03Resolved This is fixed, and I moved all the existing rules to the correct OU. [13:07:45] (03approved) 10aborrero: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575) (owner: 10taavi) [13:08:02] (03merge) 10taavi: shared: Start migrating front proxies here [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/40 (https://phabricator.wikimedia.org/T211575) [13:11:52] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase [13:12:00] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) [13:12:05] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.openstack.quota_increase [13:12:13] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) [13:15:47] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-proxy-7.toolsbeta.eqiad1.wikimedia.cloud [13:17:33] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on toolsbeta-proxy-7.toolsbeta.eqiad1.wikimedia.cloud [13:19:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run 
was over 24 hours ago on instance toolsbeta-proxy-7 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:19:28] FIRING: TargetDown: Job frontproxy-nginx is unreachable in project toolsbeta instance toolsbeta-proxy-8 - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown [13:20:06] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.vps.refresh_puppet_certs on toolsbeta-proxy-8.toolsbeta.eqiad1.wikimedia.cloud [13:20:43] 10Toolforge (Toolforge iteration 20): [k8s,infra] use the new docker-registry.svc.toolforge.org host everywhere - https://phabricator.wikimedia.org/T394902#10844157 (10dcaro) 05Open→03In progress [13:20:46] 06cloud-services-team, 10Toolforge (Toolforge iteration 20): [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#10844159 (10dcaro) 05Open→03In progress [13:21:34] (03open) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [13:21:52] !log taavi@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on toolsbeta-proxy-8.toolsbeta.eqiad1.wikimedia.cloud [13:22:12] (03update) 10raymond-ndibe: [toolforge-deploy] run specific tests on deploy [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/755 (https://phabricator.wikimedia.org/T381011) [13:22:50] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [13:24:28] RESOLVED: TargetDown: Job frontproxy-nginx is unreachable in project toolsbeta instance toolsbeta-proxy-8 - 
https://prometheus-alerts.wmcloud.org/?q=alertname%3DTargetDown [13:25:21] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [13:26:58] (03open) 10raymond-ndibe: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 [13:27:03] (03update) 10raymond-ndibe: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 [13:27:24] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [13:27:29] (03approved) 10dcaro: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 (owner: 10raymond-ndibe) [13:38:32] FIRING: [2x] PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [13:44:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance toolsbeta-proxy-7 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun [13:49:22] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Engineering, 06Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844267 (10fnegri) I would suggest splitting the `an-redacteddb1001` upgrade to 
a separate task. It can be a... [13:50:57] 06cloud-services-team, 10Data-Services, 06DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10844269 (10fnegri) Sounds good to me. [13:54:15] (03update) 10chuckonwumelu: Temporary: For demo purposes only [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/28 [13:57:53] 06cloud-services-team, 10Cloud-VPS, 13Patch-For-Review: Cloud VPS mail servers should drop mail sent from non-supported domains - https://phabricator.wikimedia.org/T366935#10844296 (10taavi) 05Open→03Resolved Done, and documented at https://wikitech.wikimedia.org/wiki/Help:Email_in_Cloud_VPS. [13:58:36] 10cloud-services-team (FY2024/2025-Q3-Q4), 06Data-Platform-SRE: PuppetConstantChange on clouddumps100[12] - https://phabricator.wikimedia.org/T394921 (10fnegri) 03NEW [14:00:26] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [14:05:13] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [14:06:28] RESOLVED: PuppetStaleCertificates: Found non-revoked Puppet certificates for 1 deleted instances on tools-puppetserver-01 - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetStaleCertificates - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetStaleCertificates [14:07:04] (03update) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [14:08:38] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns 
records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:18:32] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks [14:30:39] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Engineering, 06Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844464 (10JAllemandou) >>! In T394372#10840706, @Ahoelzl wrote: > @JAllemandou any implications for the Dat... [14:31:41] 10cloud-services-team (FY2024/2025-Q3-Q4), 06Data-Platform-SRE, 13Patch-For-Review: PuppetConstantChange on clouddumps100[12] - https://phabricator.wikimedia.org/T394921#10844477 (10XXXX100000) [14:32:55] 10cloud-services-team (FY2024/2025-Q3-Q4), 06Data-Platform-SRE, 13Patch-For-Review: PuppetConstantChange on clouddumps100[12] - https://phabricator.wikimedia.org/T394921#10844493 (10taavi) [14:37:05] (03update) 10raymond-ndibe: [jobs-api] use pydantic for all models [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/139 (https://phabricator.wikimedia.org/T389118) [14:37:51] (03update) 10raymond-ndibe: [jobs-api] refactor quota models [repos/cloud/toolforge/jobs-api] (use_pydantic_for_core_job_model) - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/164 (https://phabricator.wikimedia.org/T389118) [14:41:21] (03approved) 10raymond-ndibe: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 [14:41:27] (03update) 10raymond-ndibe: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 [14:41:31] (03merge) 10raymond-ndibe: [jobs-api penapi spec] change mapping from path to http [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/170 [14:44:09] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.377-20250521144143-46deeb2b [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/792 (https://phabricator.wikimedia.org/T390137) [14:44:14] (03update) 10group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: jobs-api: bump to 0.0.377-20250521144143-46deeb2b [repos/cloud/toolforge/toolforge-deploy] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/792 (https://phabricator.wikimedia.org/T390137) [14:45:16] 10Tool-campwiz-nxt, 06translatewiki.net, 10LPL Essential (LPL Essential 2025 Apr-Jun: CX), 13Patch-For-Review, 07Unplanned-Sprint-Work: Add CampWiz NXT to translatewiki.net - https://phabricator.wikimedia.org/T393850#10844579 (10Wangombe) [14:50:00] 10Toolforge (Toolforge iteration 20): [functional-tests,builds-builder] create a test suite to run builds for all the sample tools we have - https://phabricator.wikimedia.org/T394927 (10dcaro) 03NEW [14:51:39] 10Toolforge (Toolforge iteration 20): [functional-tests,builds-builder] create a test suite to run builds for all the sample tools we have - https://phabricator.wikimedia.org/T394927#10844637 (10dcaro) p:05Triage→03Medium [14:57:04] (03update) 10aborrero: toolforge: create an 'admin' tool account, with a fake human user [repos/cloud/toolforge/lima-kilo] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/243 (https://phabricator.wikimedia.org/T394786) [15:02:18] (03update) 10aborrero: toolforge: create an 'admin' tool account, with a fake human user [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/243 (https://phabricator.wikimedia.org/T394786) [15:03:18] RESOLVED: PuppetConstantChange: Puppet performing a change on every puppet run on clouddumps1001:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange [15:10:40] (03approved) 10aborrero: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) (owner: 10taavi) [15:11:11] (03merge) 10taavi: Enable IPv6 for toolsbeta tools [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/41 (https://phabricator.wikimedia.org/T211575) [15:14:56] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Engineering, 06Data-Persistence, and 2 others: an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930 (10fnegri) 03NEW [15:15:31] (03open) 10taavi: Create tools-proxy-9/10 [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/42 (https://phabricator.wikimedia.org/T211575) [15:17:06] (03approved) 10aborrero: Create tools-proxy-9/10 [repos/cloud/toolforge/tofu-provisioning] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/42 (https://phabricator.wikimedia.org/T211575) (owner: 10taavi) [15:17:21] (03merge) 10taavi: Create tools-proxy-9/10 [repos/cloud/toolforge/tofu-provisioning] 
- 10https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/42 (https://phabricator.wikimedia.org/T211575) [15:18:42] 10cloud-services-team (FY2024/2025-Q3-Q4), 10Data-Services, 06Data-Engineering, 06Data-Persistence, and 3 others: Migrate clouddb* hosts to MariaDB 10.11 - https://phabricator.wikimedia.org/T394372#10844756 (10fnegri) > As you prefer. If better for you, you can start by the an-redacteddb1001 :) No prefere... [15:20:26] (03approved) 10hoo: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) (owner: 10arthurtaylor) [15:20:39] (03update) 10hoo: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) (owner: 10arthurtaylor) [15:21:08] (03merge) 10hoo: Add support for build completion notification [toolforge-repos/phpunit-results-cache] - 10https://gitlab.wikimedia.org/toolforge-repos/phpunit-results-cache/-/merge_requests/10 (https://phabricator.wikimedia.org/T392892) (owner: 10arthurtaylor) [15:22:43] (03update) 10chuckonwumelu: [cli] Adding warning message for beta [repos/cloud/toolforge/components-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/31 (https://phabricator.wikimedia.org/T394277) [15:24:04] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-9.tools.eqiad1.wikimedia.cloud [15:24:38] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-10.tools.eqiad1.wikimedia.cloud [15:26:43] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-9.tools.eqiad1.wikimedia.cloud [15:27:16] !log taavi@cloudcumin1001 
tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-10.tools.eqiad1.wikimedia.cloud [15:31:15] (03update) 10aborrero: toolforge: create an 'admin' tool account, with a fake human user [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/243 (https://phabricator.wikimedia.org/T394786) [15:42:18] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [components-cli,components-api] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10844864 (10dcaro) [15:42:39] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [components-cli,components-api] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10844866 (10dcaro) [15:42:44] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [components-cli,components-api] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10844867 (10dcaro) [15:47:30] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [components-cli,components-api] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10844898 (10dcaro) [15:49:38] 10Toolforge (Toolforge iteration 20), 13Patch-For-Review: [components-cli,components-api] Add a warning message saying it's 'beta' - https://phabricator.wikimedia.org/T394277#10844902 (10dcaro) [15:53:35] 06cloud-services-team, 10Data-Services, 06DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10844920 (10FCeratto-WMF) I can use the 2 hosts to test https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1131954 [15:57:35] 06cloud-services-team, 10Data-Services, 06DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10844942 (10Marostegui) >>! In T394884#10844920, @FCeratto-WMF wrote: > I can use the 2 hosts to test https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1131954 Yes, but for now... 
[16:08:14] (open) taavi: Migrate traffic to tools-proxy-9 [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/43 (https://phabricator.wikimedia.org/T211575)
[16:11:15] (open) taavi: Add AAAA record for *.toolforge.org [repos/cloud/toolforge/tofu-provisioning] (taavi/proxy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/44 (https://phabricator.wikimedia.org/T211575)
[16:12:26] (update) taavi: Add AAAA record for *.toolforge.org [repos/cloud/toolforge/tofu-provisioning] (taavi/proxy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/44 (https://phabricator.wikimedia.org/T211575)
[16:14:35] (update) taavi: Add AAAA record for *.toolforge.org [repos/cloud/toolforge/tofu-provisioning] (taavi/proxy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/44 (https://phabricator.wikimedia.org/T211575)
[16:24:04] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10845098 (FCeratto-WMF) I ran the cookbook in dry-run mode but that has limited utility. I suggest we move forward with this task removing them from prod use and then run restarts i...
[16:25:31] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10845103 (Marostegui) You can restart one of them in codfw if you like now.
[16:53:18] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10845169 (FCeratto-WMF) I could add a flag to filter with hosts to restart (it's meant to restart all hosts by default), I'll get back to this tomorrow.
[17:07:26] cloud-services-team, Toolforge (Toolforge iteration 20), Patch-For-Review: [k8s,infra] Upgrade Toolforge to Uwubernetes (1.30) - https://phabricator.wikimedia.org/T362869#10845214 (dcaro) a:aborrero→dcaro
[17:11:44] (open) raymond-ndibe: [components-api] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[17:12:48] (approved) dcaro: Add AAAA record for *.toolforge.org [repos/cloud/toolforge/tofu-provisioning] (taavi/proxy) - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/44 (https://phabricator.wikimedia.org/T211575) (owner: taavi)
[17:28:46] (update) dcaro: runtime.k8s.image: periodically refresh image-config data [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/160 (https://phabricator.wikimedia.org/T357112) (owner: raymond-ndibe)
[17:48:32] cloud-services-team, Data-Services, DBA: Remove sanitarium hosts from codfw - https://phabricator.wikimedia.org/T394884#10845335 (Marostegui) >>! In T394884#10845169, @FCeratto-WMF wrote: > I could add a flag to filter with hosts to restart (it's meant to restart all hosts by default), I'll get back...
[18:35:41] FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:45:41] RESOLVED: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:47:03] cloud-services-team, Data-Services, Quarry: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10845620 (Marostegui) The server has been fixed and it is now slowly catching up. @Voyagerim I would like to understand where th...
[20:44:24] (merge) bd808: Add maxage Cache-Control support [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/9 (https://phabricator.wikimedia.org/T393928)
[21:25:36] cloud-services-team, Data-Services: Create a view for existencelinks table - https://phabricator.wikimedia.org/T394898#10845909 (Umherirrender) The `linktarget` view should get also an exists check on the new table. Append to the existing where: `or exists ( select 1 from existencelinks where exl_target_...
[21:30:46] Tool-gitlab-content: Add maxage/smaxage cache header controls to gilab-content proxy - https://phabricator.wikimedia.org/T393928#10845920 (bd808)
[21:35:41] FIRING: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:38:30] (open) bd808: Update README and landing page instructions [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/10
[21:39:34] (merge) bd808: Update README and landing page instructions [toolforge-repos/gitlab-content] - https://gitlab.wikimedia.org/toolforge-repos/gitlab-content/-/merge_requests/10
[21:48:22] cloud-services-team, Data-Services, Quarry: Quarry WMCloud (ruwiki_p, section s6) experiencing sustained replication lag (~16 h) - https://phabricator.wikimedia.org/T394859#10845998 (Voyagerim) @Marostegui , expectations regarding replication latency thresholds - namely that web replicas should maint...
[21:53:11] RESOLVED: CloudVPSDesignateLeaks: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[22:28:24] (update) raymond-ndibe: [components-api] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[22:49:39] (update) raymond-ndibe: [components-api] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[23:58:42] (update) raymond-ndibe: [components-api] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)
[23:58:56] (update) raymond-ndibe: [components-api] skip build if refs are same [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/77 (https://phabricator.wikimedia.org/T389044)