[00:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[01:19:35] Tools: 'hoiscript' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349913#9585928 (bd808) >>! In T349913#9287769, @Hoi wrote: > I have deleted the stale logs. The 20k PDF are public domain books to be uploaded to Commons. I think it would take a few more days to complete....
[01:23:18] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9585930 (MusikAnimal) The full error: ` {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"unknown host [cloudelastic1001.wikimedia.org]"}],"type":"illegal_argument_exception","reason":...
[01:42:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[01:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[02:01:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[02:57:51] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9586009 (MusikAnimal) Also poking @VRiley-WMF in case she knows what the problem might be.
[03:02:53] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9586012 (komla) >>! In T306888#9585346, @MusikAnimal wrote: > @komla Going by the [[ https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#Timeline | time...
[03:06:01] (OpenstackAPIResponse) firing: (2) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[03:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[03:15:12] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634#9586029 (tstarling) >>! In T358634#9582468, @Joe wrote: > I don't think it holds any ground for systems involved in live responses or which have strict latency requirements in general. > > For instance, enabling swap...
[03:51:59] Cloud-VPS, serviceops: OOM livelock stalls - https://phabricator.wikimedia.org/T358634#9586043 (tstarling) >>! In T358634#9582894, @dcaro wrote: > To clarify, this task is to request enabling it on CloudVPS instances by default, or to enable it in wiki production machines? (or both?) I wanted to share m...
[04:06:01] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[04:06:38] Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9586050 (bd808)
[05:01:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[05:42:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[05:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[06:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:01:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[08:06:01] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[09:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:27:49] (PS1) Jaime Nuche: jenkins: add security patch bot token to releases instance secrets [labs/private] - https://gerrit.wikimedia.org/r/1007319 (https://phabricator.wikimedia.org/T350065)
[09:32:31] Cloud-VPS, User-aborrero: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9586395 (taavi)
[09:34:05] Cloud-VPS, User-aborrero: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9586398 (taavi) a:taavi
[09:42:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:16:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:21:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:52:41] (CloudVPSDesignateLeaks) firing: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[10:53:19] (CR) EoghanGaffney: [C: +2] jenkins: add security patch bot token to releases instance secrets [labs/private] - https://gerrit.wikimedia.org/r/1007319 (https://phabricator.wikimedia.org/T350065) (owner: Jaime Nuche)
[10:53:23] (CR) EoghanGaffney: [V: +2 C: +2] jenkins: add security patch bot token to releases instance secrets [labs/private] - https://gerrit.wikimedia.org/r/1007319 (https://phabricator.wikimedia.org/T350065) (owner: Jaime Nuche)
[10:57:42] (CloudVPSDesignateLeaks) resolved: (5) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[11:01:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:16:35] Toolforge: Expose Toolforge service names via environment variables - https://phabricator.wikimedia.org/T151002#9586640 (dcaro) > This use-case is sounds to me like a one-size-fits-all mandate that all tools using redis in Toolforge must be adapted to get connection information from a fixed format envvar or...
[11:41:28] (PuppetAgentNoResources) firing: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:43:55] (PS5) Arturo Borrero Gonzalez: kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476)
[11:46:28] (PuppetAgentNoResources) resolved: (2) No Puppet resources found on instance clouddb-services-puppetmaster-2 on project clouddb-services - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:46:56] Toolforge Build Service, cloud-services-team, Patch-For-Review: webservice: Add option to run without NFS mounts - https://phabricator.wikimedia.org/T346605#9586707 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/19 cli: Warn when --m...
[11:49:13] (PS6) Arturo Borrero Gonzalez: kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476)
[12:06:01] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[12:09:13] (PS7) Arturo Borrero Gonzalez: kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476)
[12:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:13:55] (PS1) Btullis: Fix typo in IRC channel name [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783)
[12:14:03] (CR) CI reject: [V: -1] Fix typo in IRC channel name [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783) (owner: Btullis)
[12:15:09] (CR) Btullis: [V: +2 C: +2] Fix typo in IRC channel name [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783) (owner: Btullis)
[12:15:18] (CR) CI reject: [V: -1] Fix typo in IRC channel name [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783) (owner: Btullis)
[12:16:01] (CR) Majavah: [C: -2] "This repository now exists at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783) (owner: Btullis)
[12:24:04] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1007600 (owner: L10n-bot)
[12:24:06] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1007599 (owner: L10n-bot)
[12:43:56] Wikibugs, Data-Platform-SRE, User-bd808: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#9586822 (taavi) test
[12:44:07] Wikibugs, User-bd808: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594#9586824 (taavi)
[12:53:07] (PS1) Arturo Borrero Gonzalez: toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476)
[12:55:27] (PS2) Arturo Borrero Gonzalez: toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476)
[12:56:12] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9586845 (dcaro) Hey, the build went well, the warning permission denied messages are expected too, let me check the image
[13:09:15] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9586863 (dcaro) Running locally seems to work, let me try on toolforge. Locally (using podman): ` dcaro@urcuchillay$ mkdir -p kk dcaro@urcuchillay$ podman run -ti --entryp...
[13:11:30] (CR) Majavah: [C: -1] "See a few minor things inline, otherwise looks good." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[13:12:34] (CR) Majavah: toolforge: add restart-static-pods cookbook (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[13:16:19] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9586882 (dcaro) On toolforge works for me too: ` tools.dcaro-test10@tools-sgebastion-10:~$ toolforge jobs run --image tool-mbh/tool-mbh:latest --command "recompile" --mount=...
[13:20:38] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9586885 (dcaro) Just ran it from within your tool (I hope I did not break anything, cgi-bin was empty): ` tools.mbh@tools-sgebastion-10:~$ toolforge jobs run --image tool-mb...
[13:34:36] Cloud-VPS, FY2023/2024-Q3-Q4, User-aborrero: Migrate Cloud VPS to Neutron Open vSwitch agent - https://phabricator.wikimedia.org/T326373#9586917 (taavi)
[13:35:25] Cloud-VPS, FY2023/2024-Q3-Q4, User-aborrero: Deploy OVS test setup in codfw1dev - https://phabricator.wikimedia.org/T358761 (taavi)
[13:45:38] (PS1) Jforrester: Empty repo, point to GitLab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1007612
[13:45:45] (CR) CI reject: [V: -1] Empty repo, point to GitLab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1007612 (owner: Jforrester)
[13:45:50] (Abandoned) Jforrester: Fix typo in IRC channel name [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1006881 (https://phabricator.wikimedia.org/T352783) (owner: Btullis)
[13:46:30] (CR) Jforrester: [V: +2 C: +2] Empty repo, point to GitLab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1007612 (owner: Jforrester)
[13:46:38] (CR) CI reject: [V: -1] Empty repo, point to GitLab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/1007612 (owner: Jforrester)
[13:50:14] Toolforge iteration 04, Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320#9586970 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/27 d...
[13:50:16] Toolforge Build Service, cloud-services-team, Patch-For-Review: webservice: Add option to run without NFS mounts - https://phabricator.wikimedia.org/T346605#9586971 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/27 d/changelog: bump...
[13:51:03] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9586968 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/27 d/changelog: bump to 0.103.3
[13:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:10:16] (PS8) Arturo Borrero Gonzalez: kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476)
[14:10:18] (PS3) Arturo Borrero Gonzalez: toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476)
[14:13:29] (CR) CI reject: [V: -1] kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[14:13:36] (CR) CI reject: [V: -1] toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[14:34:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:34:57] Toolforge Build Service, cloud-services-team, Patch-For-Review: webservice: Add option to run without NFS mounts - https://phabricator.wikimedia.org/T346605#9587198 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/27 d/changelog: bump...
[14:39:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[14:42:36] tool-wdlocator, translatewiki.net, Language-2024-January-March, Localization Infrastructure FY2023-24, Unplanned-Sprint-Work: Add wdlocator to translatewiki.net - https://phabricator.wikimedia.org/T357495#9587228 (Nikerabbit) I guess there weren't anything to export yet: ` /home/betawiki/conf...
[14:47:47] Toolforge Jobs framework: Support job health checks - https://phabricator.wikimedia.org/T335592#9587269 (Raymond_Ndibe) a:Raymond_Ndibe
[14:51:06] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4, Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9587276 (fnegri) After looking more carefully at the logs above, what happened is that `remove_dangling_cinder_snapshots.service` re...
[15:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:13:23] Tool-global-search: 400 - Bad Request on any Global Search - https://phabricator.wikimedia.org/T358541#9587389 (EBernhardson) @bking this is likely related to the transition of cloudelastic to private ips? I'll take a look later if you don't have ideas.
[15:17:29] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9587402 (MBH) I don't understand why it rewrites my `index.html`, but okay. I have taken it from tool, and also added -w permission for group too.
[15:27:15] Wikibugs, User-bd808: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9587424 (bd808) Open→In progress a:bd808
[15:27:45] Wikibugs, User-bd808: Remove legacy taxonomy.py script - https://phabricator.wikimedia.org/T357928#9587427 (bd808) Open→In progress a:bd808
[15:27:47] Wikibugs, User-bd808: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9587430 (bd808)
[15:31:41] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9587453 (MBH) @dcaro After that I run two your commands again. Result was the same: first command was executed in ~1 minute and outputed the same permission error (it repeat...
[15:45:52] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up - https://phabricator.wikimedia.org/T358774 (fnegri)
[15:46:11] Cloud-VPS, FY2023/2024-Q3-Q4: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up - https://phabricator.wikimedia.org/T358774#9587519 (fnegri)
[15:46:32] Wikibugs: GitLab CI tests fail for MRs from forks because of missing secrets - https://phabricator.wikimedia.org/T358775 (bd808)
[15:47:00] Wikibugs: GitLab CI tests fail for MRs from forks because of missing secrets - https://phabricator.wikimedia.org/T358775#9587534 (bd808) p:Triage→High
[15:49:10] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4, Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9587536 (fnegri) I split the cleanup issue into a separate task: {T358774} The issue described in this task should be fixed by the...
[16:02:54] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4: [wmcs-backup] Race condition between backup and cleanup timers - https://phabricator.wikimedia.org/T358780 (fnegri)
[16:06:01] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:17:56] cloud-services-team, Wikimedia-Medicine, Project-requests: Request creation of mdwiki-offline VPS project - https://phabricator.wikimedia.org/T358023#9587673 (bd808) Discussed in 2024-02-29 WMCS meeting. Consensus is +1 for a VPS for this project. I think we are close to a future with better persist...
[16:29:32] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4, Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9587749 (fnegri) > we should probably combine the two scripts into one to avoid any risk of remove_dangling_cinder_snapshots starti...
[16:29:50] Cloud-VPS, FY2023/2024-Q3-Q4: [wmcs-backup] Race condition between backup and cleanup timers - https://phabricator.wikimedia.org/T358780#9587761 (fnegri) p:Triage→Low a:fnegri
[16:30:31] Cloud-VPS, Data-Services, FY2023/2024-Q3-Q4, Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9587774 (fnegri)
[16:31:08] Cloud-VPS, FY2023/2024-Q3-Q4: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up - https://phabricator.wikimedia.org/T358774#9587771 (fnegri) Open→In progress p:Triage→Medium a:fnegri
[16:36:43] (Abandoned) Dzahn: delete grafana password classes [labs/private] - https://gerrit.wikimedia.org/r/1007011 (owner: Dzahn)
[16:40:59] Grid-Engine-to-K8s-Migration, Growth-Team: Migrate ERANBOT project off of Grid Engine - https://phabricator.wikimedia.org/T306888#9587827 (taavi) We briefly discussed this in our team meeting today. From our perspective the original deadline was February 14th and the current one month period was supposed...
[16:50:29] Cloud-VPS, FY2023/2024-Q3-Q4: [wmcs-backup] Backup snapshots of deleted volumes are never cleaned up - https://phabricator.wikimedia.org/T358774#9587887 (fnegri) The problem sits in the [ImageBackup.remove](https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/p...
[16:52:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[16:57:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[17:10:33] VPS-project-Codesearch: Can't search for multi-line regex any more - https://phabricator.wikimedia.org/T358786 (thiemowmde)
[17:15:57] (PS9) Arturo Borrero Gonzalez: kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476)
[17:15:59] (PS4) Arturo Borrero Gonzalez: toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476)
[17:19:34] (CR) CI reject: [V: -1] kubernetes: refactor static pod restart logic [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1006529 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[17:19:52] (CR) CI reject: [V: -1] toolforge: add restart-static-pods cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1007604 (https://phabricator.wikimedia.org/T358476) (owner: Arturo Borrero Gonzalez)
[17:52:02] Wikibugs: wikibugs only shows milestone name without parent project name - https://phabricator.wikimedia.org/T358653#9588237 (BTullis) We're experiencing something similar to this as well, but I don't know if I should make a new ticket for it. We have a new IRC channel `#wikimedia-data-platform` We have a P...
[17:52:50] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9588239 (dcaro) >>! In T319883#9587402, @MBH wrote: > I don't understand why it rewrites my `index.html`, but okay. I have taken it from tool, and also added -w permission f...
[17:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[17:55:44] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9588278 (dcaro) >>! In T319883#9588239, @dcaro wrote: >>>! In T319883#9587402, @MBH wrote: >> I don't understand why it rewrites my `index.html`, but okay. I have taken it f...
[18:05:19] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9588303 (dcaro) Hmm, the job logs command did not work as expected, and the logs were cleaned up very quick :/ I was able to get: ` cp: cannot create regular file '/data/p...
[18:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[18:17:59] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9588339 (dcaro) This time it finished \o/ ` tools.mbh@tools-sgebastion-11:~$ ls -la /data/project/mbh/public_html/cgi-bin/ total 82988...
[18:28:31] Wikibugs: wikibugs only shows milestone name without parent project name - https://phabricator.wikimedia.org/T358653#9588363 (bd808) p:Triage→High >>! In T358653#9588237, @BTullis wrote: > We're experiencing something similar to this as well, but I don't know if I should make a new ticket for it. It...
[19:08:37] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588554 (Dvorapa) @dcaro Seems it is not deployed yet? `toolforge webservice: error: unrecognized arguments: --health-check-url /healthz`
[19:17:27] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588600 (taavi) The flag is `--health-check-path`.
[19:28:01] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588628 (Dvorapa) Oh, I see. With the flag, https://kmlexport.toolforge.org/ gives 503. Without the flag, it gives 200, so the flag breaks root :/
[19:40:45] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588680 (Dvorapa) ` $ webservice stop Stopping webservice tools.kmlexport@tools-sgebastion-10:~$ webservice --backend kubernetes -m 1536Mi --health-che...
[19:52:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[19:53:09] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588694 (taavi) ` Warning Unhealthy 1s (x18 over 18s) kubelet Startup probe failed: HTTP probe failed with statuscode: 404 ` ` $ curl -...
[19:57:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[20:06:01] (OpenstackAPIResponse) firing: (3) Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:11:08] Grid-Engine-to-K8s-Migration: Migrate mbh from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319883#9588741 (dcaro) The logs vanishing so quickly from k8s is quite inconvenient :/ I'll try to prioritize having a solution
[20:29:06] Grid-Engine-to-K8s-Migration, User-dcaro: Migrate kmlexport from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T356905#9588781 (Dvorapa) No, I just deactivated the probe as it broke the root (I had to revert back to `webservice --backend kubernetes -m 1536Mi perl5.36 st...
[21:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:53:49] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on cloudweb2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[22:06:17] (PS2) Dzahn: delete passwords::mysql::wikimania_scholarships and passwords::tor [labs/private] - https://gerrit.wikimedia.org/r/1007010
[22:08:42] (PS3) Dzahn: delete passwords for wikimania_scholarships, tor, private_static_site [labs/private] - https://gerrit.wikimedia.org/r/1007010
[22:08:48] (CR) Dzahn: [V: +1] "All of these are historic and don't exist anymore in the private repo." [labs/private] - https://gerrit.wikimedia.org/r/1007010 (owner: Dzahn)
[22:09:34] (CR) Dzahn: [V: +2 C: +2] delete passwords for wikimania_scholarships, tor, private_static_site [labs/private] - https://gerrit.wikimedia.org/r/1007010 (owner: Dzahn)
[22:22:24] (CR) Dzahn: "the linked ticket is closed since 2021 but this was never merged" [labs/private] - https://gerrit.wikimedia.org/r/739586 (https://phabricator.wikimedia.org/T282787) (owner: BBlack)
[22:24:03] (CR) Dzahn: "the linked ticket was closed as declined in 2022 but this is still open" [labs/private] - https://gerrit.wikimedia.org/r/672451 (https://phabricator.wikimedia.org/T277483) (owner: Dave Pifke)
[22:28:41] VPS-Projects, cloud-services-team, Puppet 7.0: Migrate per-project Puppet servers to Puppet 7 - https://phabricator.wikimedia.org/T351452#9589183 (Andrew)
[22:29:54] VPS-Projects, cloud-services-team, Puppet 7.0: Migrate Puppet servers in Cloud Services team managed projects to Puppet 7 - https://phabricator.wikimedia.org/T351453#9589184 (Andrew)
[22:31:15] (CR) Dzahn: "open since 2017 but meanwhile added in 2020/2021 - https://gerrit.wikimedia.org/r/c/labs/private/+/572918/1/hieradata/labs.yaml" [labs/private] - https://gerrit.wikimedia.org/r/340148 (owner: Tim Landscheidt)
[22:37:41] Cloud-VPS, FY2023/2024-Q3-Q4, Goal, Puppet 7.0: Migrate Cloud VPS puppet infrastructure to Puppet 7 - https://phabricator.wikimedia.org/T351450#9589192 (Andrew)
[22:37:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:40:27] Cloud-VPS, cloud-services-team, Puppet 7.0: Migrate Cloud VPS central puppet server to Puppet 7 - https://phabricator.wikimedia.org/T351451#9589193 (Andrew)
[22:42:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[22:59:58] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9589241 (tstarling) If HTTP probes are configurable in service.template, can that please be documented [[https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Webservice_templat...
[23:09:48] Toolforge iteration 06, Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9589279 (LucasWerkmeister) >>! In T341919#9589241, @tstarling wrote: > If HTTP probes are configurable in service.template, can that please be documented [[https://wikitech.wikim...
[23:21:29] Toolforge, Epic, User-Raymond_Ndibe: seperate jobs-framework k8s object templates from code - https://phabricator.wikimedia.org/T358815 (Raymond_Ndibe)
[23:25:38] Toolforge, Epic, User-Raymond_Ndibe: seperate jobs-framework k8s object templates from code - https://phabricator.wikimedia.org/T358815#9589302 (Raymond_Ndibe) a:Raymond_Ndibe