[00:10:49] (TfInfraTestApplyFailed) resolved: Terraform failed to apply/create the resources on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[01:14:26] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9570814 (Legoktm) > Not easily, the same Pending status as reported by kube-state-metrics seems to also include things pods where the image configured does not exist and other us...
[01:36:21] Tool-global-search: Global Search is down: 500: Internal Server Error / Could not resolve host: cloudelastic1004.wikimedia.org - https://phabricator.wikimedia.org/T358061#9570830 (matmarex)
[01:49:50] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban), Patch-Needs-Improvement: Git tag/version fetching times out - https://phabricator.wikimedia.org/T334454#9570855 (Samwilson) Open→Resolved The preview does not work in some situations, but it seems a bit flaky: it's som...
[05:38:15] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban): Add Toolforge-specific deploy script for running npm in a separate job - https://phabricator.wikimedia.org/T346009#9571023 (Samwilson) a:Samwilson PR: https://github.com/wikimedia/svgtranslate/pull/728
[06:42:53] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban): Add Toolforge-specific deploy script for running npm in a separate job - https://phabricator.wikimedia.org/T346009#9571062 (Samwilson) The paths referenced in `toolforge/install.yaml` are always relative to the tool's home directo...
[09:31:09] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9571297 (aborrero) I assume that this is for public IPv4 (185.15.x.y, etc) only, no? I guess for priv addresses we will keep using `svc.too...
[09:40:11] Cloud Services Proposals, Toolforge, User-aborrero: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039#9571313 (aborrero)
[09:56:38] Toolforge, User-aborrero: toolforge-jobs job emails should have information on why events happened - https://phabricator.wikimedia.org/T306310#9571355 (aborrero)
[10:09:41] Toolforge (Toolforge iteration 06), Patch-For-Review: Support probes in kubernetes webservices - https://phabricator.wikimedia.org/T341919#9571412 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/23 k8s: add default tcp probes
[10:12:42] Toolforge (Toolforge iteration 06): dbreps job pending to start for 2d16h on Toolforge - https://phabricator.wikimedia.org/T358175#9571414 (taavi) >>! In T358175#9570814, @Legoktm wrote: >> Not easily, the same Pending status as reported by kube-state-metrics seems to also include things pods where the image...
[10:16:50] (PS1) Majavah: Convert remaining images to shell webservice-runner [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1005952 (https://phabricator.wikimedia.org/T293552)
[10:19:22] Toolforge: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9571441 (taavi)
[10:19:26] Toolforge: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9571452 (taavi)
[10:19:35] Toolforge, cloud-services-team, Patch-For-Review: Remove Python/webservice-runner from toolforge web containers - https://phabricator.wikimedia.org/T293552#9571453 (taavi)
[10:19:42] Toolforge: [toolforge-webservice] Remove old webservice-runner code - https://phabricator.wikimedia.org/T358320#9571441 (taavi)
[10:19:51] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review: Toolforge: Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9571455 (taavi)
[10:49:10] (PS1) Majavah: toolforge: k8s: Support containerd as container runtime [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005954 (https://phabricator.wikimedia.org/T284656)
[10:51:23] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005954 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[10:51:36] (CR) Majavah: [C: +2] toolforge: k8s: Support containerd as container runtime [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005954 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[10:54:45] (Merged) jenkins-bot: toolforge: k8s: Support containerd as container runtime [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1005954 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[11:40:12] Cloud-VPS, cloud-services-team, Patch-For-Review, User-aborrero: nova-compute: error running local ceph command - https://phabricator.wikimedia.org/T358101#9571695 (aborrero) Open→Declined on IRC @dcaro suggested to use a class-level dependency instead of a file-level one. I too think tha...
[11:57:38] Toolforge, cloud-services-team: Remove toolschecker grid engine checks - https://phabricator.wikimedia.org/T358333#9571728 (taavi)
[12:01:29] Toolforge, cloud-services-team: Remove toolschecker grid engine checks - https://phabricator.wikimedia.org/T358333#9571743 (taavi) a:taavi
[12:01:53] Toolforge, cloud-services-team: Remove toolschecker grid engine checks - https://phabricator.wikimedia.org/T358333#9571728 (taavi)
[12:01:58] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review: Toolforge: Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9571747 (taavi)
[12:17:19] cloud-services-team, Infrastructure-Foundations, netops, User-aborrero: clouddb: evaluate moving them into cloud-private - https://phabricator.wikimedia.org/T357543#9571774 (aborrero) In {T346947}, in https://gerrit.wikimedia.org/r/c/operations/homer/public/+/973769/comments/dedcd277_a07c883b @cm...
[12:20:24] Toolforge, cloud-services-team, User-aborrero: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9571776 (taavi) a:aborrero
[12:21:43] Toolforge, cloud-services-team, User-aborrero: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9571781 (aborrero)
[12:35:15] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban): Add Toolforge-specific deploy script for running npm in a separate job - https://phabricator.wikimedia.org/T346009#9571830 (TheresNoTime) Merged :)
[13:07:11] Toolforge, cloud-services-team, User-aborrero: Upgrade Toolforge Kubernetes to version 1.24 - https://phabricator.wikimedia.org/T307651#9571871 (aborrero) p:Triage→Medium
[13:13:07] Cloud-VPS (Project-requests): Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337#9571892 (0xDeadbeef)
[13:42:10] Toolforge, cloud-services-team, Patch-For-Review: Remove toolschecker grid engine checks - https://phabricator.wikimedia.org/T358333#9571957 (taavi) Open→Resolved
[13:42:14] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review: Toolforge: Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664#9571958 (taavi)
[13:42:17] Toolforge, cloud-services-team: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030#9571959 (taavi)
[13:45:57] Toolforge, cloud-services-team: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030#9571977 (taavi)
[13:52:29] Toolforge (Toolforge iteration 06): disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9571987 (taavi)
[13:53:11] Toolforge (Toolforge iteration 06): disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9571999 (taavi) p:Triage→High a:taavi
[14:13:11] Toolforge (Toolforge iteration 06), Patch-For-Review: disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9572030 (CodeReviewBot) taavi opened https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/15 Do not hold a ToolsDB connection during archival
[14:14:24] Toolforge (Toolforge iteration 06), Patch-For-Review: disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9572034 (taavi) Open→In progress
[14:29:56] Toolforge, cloud-services-team: Toolforge: systemd monitoring - https://phabricator.wikimedia.org/T215155#9572068 (taavi)
[14:30:01] Cloud-VPS, cloud-services-team, Epic: [Epic] Provide logging/metrics/monitoring SaaS for Cloud VPS tenants - https://phabricator.wikimedia.org/T194333#9572069 (taavi)
[14:33:02] Cloud-VPS, cloud-services-team, Infrastructure-Foundations, Puppet: wmf_auto_restart_cron.service failing in Cloud VPS bookworm instances - https://phabricator.wikimedia.org/T358343#9572080 (taavi)
[14:59:44] ToolforgeBundle, SVG Translate Tool, Community-Tech (CommTech-Kanban): Add Toolforge-specific deploy script for running npm in a separate job - https://phabricator.wikimedia.org/T346009#9572171 (dom_walden) As this appears to be a change to how we deploy/build svgtranslate, I don't know if there is a...
[16:18:24] Toolforge (Toolforge iteration 06), Patch-For-Review: disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9572451 (CodeReviewBot) taavi merged https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/15 Do not hold a ToolsDB connection during archival
[16:42:34] Tool-bub2: Move Wikimedia URLs to .env - https://phabricator.wikimedia.org/T358358#9572538 (wassan.anmol117)
[17:15:42] Toolforge (Toolforge iteration 06): disable-tool is stuck on tools-nfs-2 - https://phabricator.wikimedia.org/T358340#9572627 (taavi) In progress→Resolved
[17:52:13] Cloud-VPS (Project-requests): Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337#9572734 (bd808) > As it uses PostgreSQL, the most straightforward way of hosting it must be on VPS since toolforge doesn't support Postgres. @fnegri Is it reasonably easy for a Toolfo...
[18:03:10] (CR) BryanDavis: [C: +1] Convert remaining images to shell webservice-runner [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1005952 (https://phabricator.wikimedia.org/T293552) (owner: Majavah)
[18:18:27] Cloud-VPS, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9572818 (fnegri) This scenario is also causing an uncaught exception in `wmcs-backup`. I think this happens b...
[18:38:44] Cloud-VPS (Project-requests): Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337#9572898 (fnegri) @bd808 it's reasonably easy but it still involves creating a Cloud-VPS project. It can be created with quotas that only let you create a Trove database, but not instan...
[18:38:57] Cloud-VPS (Project-requests): Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337#9572901 (fnegri) From https://wikitech.wikimedia.org/wiki/Help:Trove_database_user_guide#Key_concepts: > Toolforge tools that are approved to use Trove have a Cloud VPS project that ex...
[18:39:20] Openstack-Magnum: magnum: kubectl fails to connect after time - https://phabricator.wikimedia.org/T336586#9572914 (rook) In progress→Resolved
[18:46:38] Cloud-VPS (Project-requests): Request creation of logger-discord-bot VPS project - https://phabricator.wikimedia.org/T358337#9572955 (fnegri) @0xDeadbeef do you know how much space you need for the database?
[19:04:21] cloud-services-team (FY2022/2023-Q3): Clarify Trove and Toolsdb usage within WMCS - https://phabricator.wikimedia.org/T326754#9572997 (bd808) Current user facing doc for this feature is at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Trove
[20:05:39] Cloud-VPS, Data-Services, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: [toolsdb] [cinder] [ceph] Deleting snapshot does not work - https://phabricator.wikimedia.org/T356904#9573138 (fnegri) The patch above should fix the race condition, but it still won't clean up the snapshots. To...
[21:01:35] Cloud-VPS, PAWS, OpenRefine, SecTeam-Processed, and 2 others: Open refine stored password available in PAWS public - https://phabricator.wikimedia.org/T283839#9573395 (sbassett)
[21:02:10] Cloud-VPS, PAWS, OpenRefine, SecTeam-Processed, and 2 others: Open refine stored password available in PAWS public - https://phabricator.wikimedia.org/T283839#9573399 (sbassett) a:Bstorm
[21:07:38] Tools, Tech-Docs-Team, Documentation, Wikimedia-Hackathon-2024: [Hackathon 2024] Improve technical documentation of tools - https://phabricator.wikimedia.org/T358040#9573436 (apaskulin)
[21:20:59] Tool-ducttape, Abstract Wikipedia team: DUCT exits with "panic: runtime error: invalid memory address or nil pointer dereference" on every run during setup-web-proxy - https://phabricator.wikimedia.org/T357354#9573483 (SDunlap) Open→Resolved
[23:56:27] tool-wdlocator: tomba Kanssa - https://phabricator.wikimedia.org/T358400#9573746 (Berete5212)