[00:09:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[00:58:50] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[01:03:50] (ProbeDown) resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:04:56] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[02:09:56] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[03:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[06:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[08:59:09] Toolforge Build Service: Toolforge refuses to install build-essential - https://phabricator.wikimedia.org/T355575 (dcaro) >>! In T355575#9482119, @Soda wrote: >>>! In T355575#9480311, @dcaro wrote: >> Had a quick look at the code, I see also that there's a lot going on on the [[ https://github.com/sohomdatta...
[09:01:23] Toolforge Build Service: [apt-buildpack] Does not handle virtual packages correctly - https://phabricator.wikimedia.org/T355575 (dcaro)
[09:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[09:17:17] Toolforge (Toolforge iteration 04), Toolforge Jobs framework: [jobs-api] Migrate to Poetry - https://phabricator.wikimedia.org/T354751 (dcaro)
[09:18:07] Toolforge (Toolforge iteration 04), Toolforge Build Service: [dev][harbor] reconcile harbor install methods - https://phabricator.wikimedia.org/T354942 (dcaro)
[09:18:10] Toolforge (Toolforge iteration 03), cloud-services-team, Kubernetes, Patch-For-Review: Upgrade cadvisor - https://phabricator.wikimedia.org/T349795 (dcaro)
[09:18:18] Toolforge (Toolforge iteration 04), Toolforge Build Service, Patch-For-Review: [apt-buildpack] Not sourcing /layers/fagiani_apt/apt/.profile.d/000_apt.sh - https://phabricator.wikimedia.org/T355214 (dcaro)
[09:18:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:18:34] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Create Bookworm-based standalone webservice image - https://phabricator.wikimedia.org/T355231 (dcaro)
[09:18:38] Toolforge (Toolforge iteration 04), Toolforge Jobs framework, Patch-For-Review: [jobs-api] Migrate to Gunicorn - https://phabricator.wikimedia.org/T354752 (dcaro)
[09:18:44] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656 (taavi)
[09:18:50] Toolforge (Toolforge iteration 03), cloud-services-team, Kubernetes, Patch-For-Review: Upgrade cadvisor - https://phabricator.wikimedia.org/T349795 (taavi)
[09:18:57] Toolforge (Toolforge iteration 04), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (dcaro)
[09:19:08] Toolforge (Toolforge iteration 04), Toolforge Build Service, Patch-For-Review, Upstream: [maintain-harbor] Manage project quotas via maintain-harbor - https://phabricator.wikimedia.org/T352417 (dcaro)
[09:19:12] Cloud Services Proposals, Toolforge (Toolforge iteration 04): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (dcaro)
[09:19:38] Toolforge (Toolforge iteration 03), cloud-services-team, Kubernetes, Patch-For-Review: Upgrade cadvisor - https://phabricator.wikimedia.org/T349795 (taavi) Open→Resolved Let's call this done. We're back to normal memory usage after the upgrade.
[09:19:44] Toolforge (Toolforge iteration 04), Toolforge Build Service: [harbor] upgrade to 2.10.x - https://phabricator.wikimedia.org/T354507 (dcaro)
[09:19:47] Toolforge (Toolforge iteration 04), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: [harbor] Redis using all available memory - https://phabricator.wikimedia.org/T354176 (dcaro)
[09:20:00] Toolforge (Toolforge iteration 04), Toolforge Build Service, cloud-services-team, Cloud-Services-Origin-Team, and 2 others: [builds-api] Automatically deploy the webservice when the image is built - https://phabricator.wikimedia.org/T341065 (dcaro)
[09:20:24] Cloud-VPS, Toolforge (Toolforge iteration 04), cloud-services-team: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (dcaro)
[09:20:26] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (dcaro)
[09:20:28] Toolforge (Toolforge iteration 04): Toolforge next user stories - 2024 version - https://phabricator.wikimedia.org/T352857 (dcaro)
[09:21:09] Toolforge (Toolforge iteration 04), Toolforge Build Service, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [tbs.maintain-harbor] Document current setup and admin procedures - https://phabricator.wikimedia.org/T329176 (dcaro)
[09:21:12] Toolforge (Toolforge iteration 04), Toolforge Build Service, Patch-For-Review: [tbs][builds-api] Refactor `internal/builds.go` - https://phabricator.wikimedia.org/T352762 (dcaro)
[09:21:22] Toolforge (Toolforge iteration 04), User-Raymond_Ndibe: [toolforge-cd] find out why we run two gitlab ci/cd pipelines after merge - https://phabricator.wikimedia.org/T353563 (dcaro)
[09:21:26] Toolforge (Toolforge iteration 04): [toolforge-cd] gitlab-ci refactor - https://phabricator.wikimedia.org/T353514 (dcaro)
[09:21:32] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656 (dcaro)
[09:21:38] Toolforge (Toolforge iteration 04): [toolforge API] Investigate ways to present our openapi definitions to users - https://phabricator.wikimedia.org/T354745 (dcaro)
[09:21:40] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] alternatives aren’t being set up - https://phabricator.wikimedia.org/T355215 (dcaro)
[09:21:45] Toolforge (Toolforge iteration 04): [toolforge-cd] discuss the possibility of removing tests from merge request ci/cd pipelines - https://phabricator.wikimedia.org/T353740 (dcaro)
[09:21:50] Toolforge (Toolforge iteration 04), Toolforge Build Service: `build quota` fails if tool has no builds - https://phabricator.wikimedia.org/T353701 (dcaro)
[09:21:53] Toolforge (Toolforge iteration 04), Toolforge Jobs framework, Patch-For-Review: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537 (dcaro)
[09:21:55] Toolforge (Toolforge iteration 04), Toolforge Build Service, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (dcaro)
[09:22:01] Toolforge (Toolforge iteration 04), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2): [tbs] Create a tutorial on how to deploy a Node.js app using Build Service - https://phabricator.wikimedia.org/T353313 (dcaro)
[09:22:03] Toolforge (Toolforge iteration 04), Toolforge Build Service: [maintain-harbor] Improvements to subcommands and config validation - https://phabricator.wikimedia.org/T353059 (dcaro)
[09:22:05] Toolforge (Toolforge iteration 04): [ci] Add shellcheck to pre-commit where missing - https://phabricator.wikimedia.org/T353052 (dcaro)
[09:22:07] Toolforge (Toolforge iteration 04), Toolforge Build Service: [tbs] Add dashboards with the new statistics - https://phabricator.wikimedia.org/T352764 (dcaro)
[09:22:09] Toolforge (Toolforge iteration 04): [dev] Investigate lima-vm as an alternative to Vagrant for lima-kilo - https://phabricator.wikimedia.org/T354406 (dcaro)
[09:22:11] Toolforge (Toolforge iteration 04), Toolforge Build Service: [tbs] cleanup robot account related code - https://phabricator.wikimedia.org/T352763 (dcaro)
[09:22:14] Toolforge (Toolforge iteration 04), Toolforge Build Service, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178 (dcaro)
[09:22:16] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-User, Cloud-Services-Worktype-Maintenance, User-dcaro: [webservice] Error shown when restarting buildpack-based tool - https://phabricator.wikimedia.org/T348312 (dcaro)
[09:22:57] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] Does not handle virtual packages correctly - https://phabricator.wikimedia.org/T355575 (dcaro) a:dcaro
[09:23:23] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[10:03:28] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656 (taavi) > cadvisor does not work Fixed with the upgrade. > I haven't checked if the log file max size still...
[10:40:41] (PS1) David Caro: inventory: split into submodules [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992638
[10:46:48] Data-Services: Configure `report_host` on ToolsDB - https://phabricator.wikimedia.org/T355761 (taavi)
[10:48:47] (CR) Majavah: toolsdb: add cookbook to retrieve stuck table+query (4 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215 (owner: David Caro)
[11:02:36] (CR) David Caro: toolsdb: add cookbook to retrieve stuck table+query (4 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215 (owner: David Caro)
[11:31:00] !log taavi@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the toolsbeta cluster
[11:37:44] !log taavi@cloudcumin1001 admin Added a new k8s worker toolsbeta-test-k8s-worker-10.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[11:37:44] !log taavi@cloudcumin1001 admin END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the toolsbeta cluster
[12:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[12:17:57] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/992147 (owner: L10n-bot)
[12:26:24] Data-Services, Toolforge: Requesting SQL code review for application on Toolforge - https://phabricator.wikimedia.org/T355779 (KBach)
[12:31:44] Data-Services, Toolforge, cloud-services-team: Requesting SQL code review for application on Toolforge - https://phabricator.wikimedia.org/T355779 (taavi)
[12:47:38] PAWS: Remove paws-123-11 cluster - https://phabricator.wikimedia.org/T355785 (rook)
[12:48:47] PAWS: Remove paws-123-11 cluster - https://phabricator.wikimedia.org/T355785 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/paws/pull/367
[12:48:57] vivian-rook opened https://github.com/toolforge/paws/pull/367
[12:56:08] PAWS: Remove paws-123-11 cluster - https://phabricator.wikimedia.org/T355785 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/367
[12:56:20] vivian-rook closed https://github.com/toolforge/paws/pull/367
[12:56:31] PAWS: Remove paws-123-11 cluster - https://phabricator.wikimedia.org/T355785 (rook) Open→Resolved
[13:29:49] Toolforge, Tools-Kubernetes, Kubernetes: Setup monitoring for kubernetes core components. - https://phabricator.wikimedia.org/T131929 (dcaro)
[13:30:29] Toolforge, Tools-Kubernetes, Kubernetes: Monitor that not too many replicasets have a big difference between desired and current+pending - https://phabricator.wikimedia.org/T140561 (dcaro) Open→Declined Will reopen if needed
[13:34:07] Cloud-Services, Toolforge: Convert most top level tool and bastion dns records to CNAMEs - https://phabricator.wikimedia.org/T131796 (dcaro) Open→Resolved a:dcaro The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/proj...
[13:34:44] Cloud-Services, Toolforge: Possible race condition in webservice HSET/HDEL - https://phabricator.wikimedia.org/T122515 (dcaro) Open→Resolved a:dcaro This seems resolved somehow, no more issues since then, will reopen if new issues arise.
[13:36:29] (PS1) Majavah: vps: refresh_puppet_certs: Parse SAL project from FQDN [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992697
[13:40:08] (CR) CI reject: [V: -1] vps: refresh_puppet_certs: Parse SAL project from FQDN [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992697 (owner: Majavah)
[13:40:45] (PS1) Filippo Giunchedi: deployment_server: add dummy oauth2-proxy secrets for jaeger [labs/private] - https://gerrit.wikimedia.org/r/992699 (https://phabricator.wikimedia.org/T320555)
[13:41:59] Data-Services, Toolforge, DBA, Tracking-Neverending: Certain tools users create multiple long running queries that take all memory and/or CPU from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601 (dcaro)
[13:42:02] (CR) Filippo Giunchedi: "Goes with https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/984143" [labs/private] - https://gerrit.wikimedia.org/r/992699 (https://phabricator.wikimedia.org/T320555) (owner: Filippo Giunchedi)
[13:42:05] Tools, Tracking-Neverending: merl tools (tracking) - https://phabricator.wikimedia.org/T69556 (dcaro)
[13:43:46] Toolforge, Stewards-and-global-tools, linkwatcher: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094 (dcaro) Open→Resolved a:dcaro I think linkwatcher DB was moved out of toolsdb to it's own trove DB {T328691}, so this task can be res...
[13:44:00] Tools: https://tools.wmflabs.org/merlbot-web/ 404s - https://phabricator.wikimedia.org/T85739 (taavi) Open→Invalid
[13:44:04] Tools, Tracking-Neverending: merl tools (tracking) - https://phabricator.wikimedia.org/T69556 (taavi)
[13:45:22] Tools, Tracking-Neverending: merl tools (tracking) - https://phabricator.wikimedia.org/T69556 (taavi) Open→Invalid
[13:49:41] Toolforge: [docs] Update Toolforge component README's - https://phabricator.wikimedia.org/T352964 (dcaro) p:Medium→Low
[13:50:00] Toolforge, Sustainability (Incident Followup): Add monitoring for expected load issues on tool labs exec nodes - https://phabricator.wikimedia.org/T109732 (dcaro) Open→Declined The Grid is going away soon, no need to add more monitoring.
[13:54:13] Toolforge, cloud-services-team (Kanban): Include unique ip pageviews in the toolviews report - https://phabricator.wikimedia.org/T317714 (taavi)
[13:54:15] Toolforge, cloud-services-team: Eliminate single point of failure from Toolforge front proxy - https://phabricator.wikimedia.org/T283948 (taavi)
[13:55:28] Toolforge, cloud-services-team: Pull toolviews data from Kubernetes HAProxy or ingress-nginx instead of the front nginx - https://phabricator.wikimedia.org/T284558 (taavi)
[13:57:21] Toolforge, cloud-services-team: Eliminate single point of failure from Toolforge front proxy - https://phabricator.wikimedia.org/T283948 (taavi)
[13:57:23] Toolforge, cloud-services-team, Patch-For-Review: Update webservicemonitor to work without dynamicproxy - https://phabricator.wikimedia.org/T284564 (taavi) Open→Declined
[13:58:08] Toolforge, Documentation: Update [[Help:Toolforge/Pywikibot#Setup_job_submission]] docs - https://phabricator.wikimedia.org/T174084 (taavi) Open→Resolved a:taavi I'm calling this done with https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts.
[13:59:44] Toolforge: [envvars,maintain-kubeusers] create and populate envvars for common service names - https://phabricator.wikimedia.org/T347141 (dcaro) p:Triage→Low
[14:00:00] Toolforge: [harbor] Create backups and/or replication - https://phabricator.wikimedia.org/T336668 (dcaro) p:Triage→High
[14:02:18] Toolforge: Standardize Toolforge CLI user interface looks - https://phabricator.wikimedia.org/T348442 (dcaro) Open→Resolved a:dcaro I think we are already doing this to some extent, and {T348749} will also improve this in the medium term.
[14:02:23] Toolforge, cloud-services-team, Kubernetes: Upgrade Toolforge K8s etcd nodes to Bookworm - https://phabricator.wikimedia.org/T349207 (taavi) p:Triage→Medium
[14:02:28] Toolforge, cloud-services-team, Kubernetes: Upgrade Toolforge K8s haproxies to Bookworm - https://phabricator.wikimedia.org/T349206 (taavi) p:Triage→Medium
[14:02:41] Toolforge: [maintain-harbor] investigate how the tools deletion process currently works and how that can be handled in maintain-harbor - https://phabricator.wikimedia.org/T336813 (dcaro) p:Triage→Medium
[14:03:23] Cloud-Services, Toolforge: Create developer environment using Docker images from Tool Labs Kubernetes - https://phabricator.wikimedia.org/T157733 (dcaro) Open→Resolved a:dcaro We got lima-kilo now for this: https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo
[14:05:47] Toolforge, cloud-services-team: webgrid-lighttpd queues kill OOM jobs with SIGKILL leaving php-cgi processes behind - https://phabricator.wikimedia.org/T153281 (dcaro) Open→Declined No more work is going to be done in the Grid.
[14:07:37] Toolforge: Webservice crashes loudly when out of deployment quota - https://phabricator.wikimedia.org/T354808 (dcaro) p:Triage→Medium
[14:07:50] Toolforge: Webservice crashes loudly when out of deployment quota - https://phabricator.wikimedia.org/T354808 (dcaro)
[14:07:55] Toolforge: Webservice crashes loudly when out of deployment quota - https://phabricator.wikimedia.org/T354808 (dcaro)
[14:08:10] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (dcaro) p:Triage→Medium
[14:09:11] Toolforge (Toolforge iteration 04): [dev] Investigate lima-vm as an alternative to Vagrant for lima-kilo - https://phabricator.wikimedia.org/T354406 (Slst2020) There is now [[ https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/tree/main/lima-vm?ref_type=heads | support for lima ]] in addition to...
[14:09:40] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (Slst2020)
[14:10:09] Toolforge (Toolforge iteration 04): [dev] Investigate lima-vm as an alternative to Vagrant for lima-kilo - https://phabricator.wikimedia.org/T354406 (Slst2020) In progress→Resolved
[14:10:33] Cloud Services Proposals, Toolforge (Toolforge iteration 04): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (Slst2020) Stalled→In progress
[14:13:32] Cloud Services Proposals, Toolforge (Toolforge iteration 04): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (Slst2020) As there is no clear consensus, a decision meeting will be scheduled as described here: https://www.mediawiki.org/wiki/Wikimedia_Cloud_Servic...
[14:18:23] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (Slst2020) Open→Resolved
[14:18:45] Toolforge: [dev] find an alternative to Vagrant - https://phabricator.wikimedia.org/T348960 (Slst2020) With there being support for lima in addition to vagrant now ({T354406}) I think this task can be closed. If anyone feels like exploring other alternatives, feel free to open it again.
[14:22:39] Toolforge, cloud-services-team, Kubernetes: [k8s] Remove TTLAfterFinished from config before upgrade to 1.25 - https://phabricator.wikimedia.org/T349197 (dcaro) p:Triage→Medium
[14:23:01] Toolforge, cloud-services-team, Kubernetes: [k8s] Remove TTLAfterFinished from config before upgrade to 1.25 - https://phabricator.wikimedia.org/T349197 (dcaro)
[14:23:06] Toolforge: Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687 (dcaro) p:Triage→High
[14:23:27] Toolforge: Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687 (dcaro) This might solve {T336668}
[14:24:46] Toolforge, cloud-services-team, Kubernetes: [k8s] Remove TTLAfterFinished from config before upgrade to 1.25 - https://phabricator.wikimedia.org/T349197 (taavi)
[14:24:50] Toolforge, cloud-services-team: Upgrade Toolforge Kubernetes to version 1.25 - https://phabricator.wikimedia.org/T316107 (taavi)
[14:26:36] Toolforge: webservice and webservice-runner have no man pages - https://phabricator.wikimedia.org/T95097 (dcaro) Open→Declined I think this is less relevant now, it will be replaced by an api + thin cli relatively soon.
[14:29:25] Toolforge, cloud-services-team: Toolforge: Ensure long-running Kubernetes pods get container updates applied - https://phabricator.wikimedia.org/T314705 (dcaro) @taavi is this something that you want to still push? It might make sense to put it in the webservice+jobs api service instead of on it's own.
[14:42:02] Toolforge, Documentation: Create a "my first Python webservice" tutorial for Toolforge - https://phabricator.wikimedia.org/T134494 (dcaro) Open→Resolved a:dcaro I think this can be closed as we have https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service/My_first_Buildpack_Python_too...
[14:42:08] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (dcaro)
[14:42:10] cloud-services-team (FY2017-18), Documentation, Goal: Improve "My first Flask OAuth tool" tutorial until it can be used as an example of a "good" tutorial - https://phabricator.wikimedia.org/T177124 (dcaro)
[14:42:20] Toolforge, Documentation: Run a documentation sprint for Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T101659 (dcaro)
[14:43:58] Tool-admin, Toolforge, I18n: Internationalize Toolforge's homepage - https://phabricator.wikimedia.org/T105590 (dcaro) Open→Resolved a:dcaro toolforge.org now redirects to wikitech: https://wikitech.wikimedia.org/wiki/Portal:Toolforge, that currently does not support internationalization...
[14:52:02] Toolforge, Documentation: Update and Improve Toolforge and Cloud VPS Technical Documentation - https://phabricator.wikimedia.org/T203131 (dcaro)
[14:52:08] Toolforge, Documentation: Run a documentation sprint for Cloud VPS and Toolforge - https://phabricator.wikimedia.org/T101659 (dcaro)
[14:52:11] Toolforge, Documentation: Create a "my first PHP webservice" tutorial for Toolforge - https://phabricator.wikimedia.org/T134493 (dcaro) Open→Resolved a:dcaro This is now live: https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_PHP_tool
[15:00:25] Toolforge, cloud-services-team, Documentation: Document and update Gerrit groups for repositories related to Toolforge - https://phabricator.wikimedia.org/T159051 (dcaro) Open→Invalid We have moved most of the toolforge repos to gitlab :), we should move the rest too, I think there's no reaso...
[15:00:37] Toolforge, cloud-services-team: [toollabs-images] Move the repository to gitlab - https://phabricator.wikimedia.org/T355799 (dcaro)
[15:01:38] Toolforge, cloud-services-team: [toollabs-images] Move the repository to gitlab - https://phabricator.wikimedia.org/T355799 (dcaro) p:Triage→Low
[15:08:33] Toolforge, cloud-services-team (Kanban): Revamp the build process for debian packages in Toolforge - https://phabricator.wikimedia.org/T249837 (dcaro)
[15:08:54] Toolforge, cloud-services-team: wmcs-package-build.py: add support for creating a git tag - https://phabricator.wikimedia.org/T272290 (dcaro) Open→Resolved a:dcaro We are using scripts in the different toolforge clients to generate the tags + bump_version
[15:10:52] Toolforge, cloud-services-team (Kanban): Revamp the build process for debian packages in Toolforge - https://phabricator.wikimedia.org/T249837 (dcaro)
[15:11:10] Toolforge, cloud-services-team, Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, and 2 others: [toolforge-cli.build] Implement a --json flag to output machine-readable output - https://phabricator.wikimedia.org/T334589 (dcaro) p:Triage→Low
[15:11:17] Toolforge, cloud-services-team: wmcs-package-build.py: add support for testing the packages - https://phabricator.wikimedia.org/T272289 (dcaro) Open→Declined You can now test the packages on lima-kilo downloading them with the https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/blob/ma...
[15:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[15:26:21] Cloud-VPS (Ubuntu Trusty Deprecation), Toolforge, cloud-services-team (Kanban), Epic: Upgrade the tools gridengine system - https://phabricator.wikimedia.org/T199271 (dcaro)
[15:26:24] Toolforge, cloud-services-team (Kanban), Infrastructure-Foundations, Goal, Puppet: Fully puppetize Grid Engine - https://phabricator.wikimedia.org/T88711 (dcaro)
[15:27:00] Toolforge, Documentation, Puppet: Document our GridEngine set up - https://phabricator.wikimedia.org/T88733 (dcaro) Open→Declined No more grid work is going to be done, we are retiring it :)
[15:28:39] Toolforge, Patch-For-Review, User-dcaro: Toolforge grid automation: consider creating a cookbook to heal the grid from D state procs - https://phabricator.wikimedia.org/T336034 (dcaro) Open→Declined No more work on the grid is going to be done :), we are retiring it
[15:28:44] Toolforge, Cloud-Services-Origin-User, Cloud-Services-Worktype-Unplanned, User-dcaro: Toolforge grid seems overloaded - https://phabricator.wikimedia.org/T335009 (dcaro)
[15:28:49] Data-Services, cloud-services-team: NFS v4.1/2 as possible fix for elevated load and lock contention on our NFS servers - https://phabricator.wikimedia.org/T257945 (dcaro)
[15:29:46] Toolforge: Cleanup duplicate bibleversefinder tools - https://phabricator.wikimedia.org/T91585 (dcaro) Open→Declined Closing for lack of activity, please feel free to open a new one if it's still relevant.
[15:47:33] Toolforge, cloud-services-team, Patch-For-Review: Relocate disable-tool-archive-dbs.service - https://phabricator.wikimedia.org/T353642 (dcaro) p:Triage→High
[15:47:41] Toolforge, cloud-services-team, Puppet (Puppet 7.0): Migrate Toolforge to Puppet 7 - https://phabricator.wikimedia.org/T351494 (dcaro) p:Triage→High
[15:48:57] Toolforge, cloud-services-team (FY2021/2022-Q3): Figure out process for deleting an unused tool - https://phabricator.wikimedia.org/T170355 (dcaro)
[15:49:13] Toolforge: Store state information for the disable tool process outside NFS - https://phabricator.wikimedia.org/T332514 (dcaro) Open→In progress p:Triage→Medium
[15:51:33] Toolforge, cloud-services-team: Remove Python/webservice-runner from toolforge web containers - https://phabricator.wikimedia.org/T293552 (dcaro) p:Triage→Low
[15:52:49] Toolforge, cloud-services-team: Update maintain_kubeusers to use the toolstate database - https://phabricator.wikimedia.org/T334629 (dcaro) p:Triage→Medium
[15:54:52] Toolforge, Composer, Patch-For-Review: Switch Toolforge installation of "composer" to use the Debian package - https://phabricator.wikimedia.org/T287900 (dcaro) p:Triage→Low
[16:43:00] Toolforge, cloud-services-team, User-Raymond_Ndibe: add on-wiki edits of toolforge tools to toolviews report - https://phabricator.wikimedia.org/T317953 (dcaro) p:Triage→High
[16:44:38] Toolforge, cloud-services-team, User-Raymond_Ndibe: add on-wiki edits of toolforge tools to toolviews report - https://phabricator.wikimedia.org/T317953 (dcaro) I think this might need some detail about the design of the solution, @Raymond_Ndibe can you please add what you had in mind?
[17:02:01] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (dcaro) Some 'raw' data on the last 30 days increase of errors per-host/drive: ` cloudcephosd1021-sdb 88 cloudceph...
[17:36:36] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (dcaro) I'm running a script now to gather nicer reports with smartctl included, will send it once it's finished.
[17:44:51] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), DC-Ops, SRE, ops-eqiad: cloudcephosd1021-1034: hard drive sector errors increasing - https://phabricator.wikimedia.org/T348643 (dcaro) Here you go, that has one directory per host, with one file per drive with the total increase of errors in...
[18:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[19:07:59] (PS1) Eric Gardner: releases: Bump Codex to 1.3.1 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/992799
[20:01:54] (CR) Catrope: [C: +2] releases: Bump Codex to 1.3.1 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/992799 (owner: Eric Gardner)
[20:02:36] (Merged) jenkins-bot: releases: Bump Codex to 1.3.1 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/992799 (owner: Eric Gardner)
[20:17:38] Toolforge, cloud-services-team, Patch-For-Review: Relocate disable-tool-archive-dbs.service - https://phabricator.wikimedia.org/T353642 (CodeReviewBot) andrew merged https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/12 Roll archive_dbs stage into the archive stage
[21:14:49] (TfInfraTestApplyFailed) firing: Terraform failed to apply/create the resounces on tf-bastion - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/TfInfraTestApplyFailed - https://prometheus-alerts.wmcloud.org/?q=alertname%3DTfInfraTestApplyFailed
[21:52:03] Toolforge, cloud-services-team, Patch-For-Review: Relocate disable-tool-archive-dbs.service - https://phabricator.wikimedia.org/T353642 (CodeReviewBot) andrew opened https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/13 archive_dbs: fix path to tool_dir
[21:53:01] Toolforge, cloud-services-team, Patch-For-Review: Relocate disable-tool-archive-dbs.service - https://phabricator.wikimedia.org/T353642 (CodeReviewBot) andrew merged https://gitlab.wikimedia.org/repos/cloud/toolforge/disable-tool/-/merge_requests/13 archive_dbs: fix path to tool_dir
[22:18:40] Toolforge, cloud-services-team, Patch-For-Review: Relocate disable-tool-archive-dbs.service - https://phabricator.wikimedia.org/T353642 (Andrew) Open→Resolved
[22:18:45] Toolforge, cloud-services-team, Patch-For-Review: Toolforge: Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664 (Andrew)
[22:18:49] Toolforge, cloud-services-team (FY2021/2022-Q3): Figure out process for deleting an unused tool - https://phabricator.wikimedia.org/T170355 (Andrew)