[00:10:06] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[00:27:02] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[01:05:30] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[01:58:24] Toolforge (Toolforge iteration 20): [envvars-api] bug in envvars-api EnvvarName validation Regex - https://phabricator.wikimedia.org/T391966#10851143 (Raymond_Ndibe) In progress→Resolved
[02:02:17] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10851145 (Raymond_Ndibe) a:Raymond_Ndibe
[02:02:33] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10851148 (Raymond_Ndibe) Open→In progress
[02:02:39] Toolforge (Toolforge iteration 20): [components-api] Add alerts and runbooks for basic service health - https://phabricator.wikimedia.org/T394275#10851150 (Raymond_Ndibe) a:Raymond_Ndibe
[02:14:04] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api] Add basic prometheus metrics - https://phabricator.wikimedia.org/T394276#10851152 (Raymond_Ndibe) using the already existing `prometheus_fastapi_instrumentator` in the repo. Next is the puppet endpoint addition and the grafana dashboards
[02:28:15] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10851153 (Raymond_Ndibe) maybe we should add this "force" field to the config? that way i...
[02:30:07] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10851154 (Raymond_Ndibe) if instead we add this to the cli, then we should also add a way...
[02:53:42] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43
[03:10:05] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43
[03:36:03] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-37 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[04:01:03] RESOLVED: [3x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-37 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProce
[06:30:44] cloud-services-team, Data-Services, DBA, Wikidata, User-notice: Set up x3 replication to wikireplicas - https://phabricator.wikimedia.org/T390954#10851280 (Marostegui) For what is worth, x3 should go to clouddb1020 and clouddb1016 (s5 and s8) as they have plenty of disk space available - also...
[06:32:21] cloud-services-team, Data-Services, DBA, Wikidata, User-notice: Set up x3 replication to wikireplicas - https://phabricator.wikimedia.org/T390954#10851281 (Marostegui) We shouldn't let s8 and x3 to run with each others tables for long. We should try to drop the not needed ones relatively soon...
[07:00:12] Toolforge (Toolforge iteration 20), Patch-For-Review: [components-api,buildsa-api] When building and deploying, if none of the settings changed, the jobs are not restarted - https://phabricator.wikimedia.org/T389044#10851295 (dcaro) >>! In T389044#10851153, @Raymond_Ndibe wrote: > maybe we should add thi...
[07:13:21] cloud-services-team, Toolforge: New upstream release for Pywikibot - https://phabricator.wikimedia.org/T394615#10851311 (taavi) a:taavi
[07:17:17] cloud-services-team, Toolforge: Emails to cloudservices@wikimedia.org from root@beta.toolforge.org bouncing - https://phabricator.wikimedia.org/T394453#10851320 (taavi) Open→Resolved
[07:27:58] cloud-services-team, Data-Services, affects-Kiwix-and-openZIM: enwiki_p query returned empty results on May 14 from ~UTC 0:00 - 05:00 - https://phabricator.wikimedia.org/T394429#10851345 (taavi) Open→Invalid There are essentially three reasons that could cause this issue: * The view confi...
[07:35:20] cloud-services-team, Data-Services, DBA: Add "wikishared" database to wiki replicas - https://phabricator.wikimedia.org/T395072#10851356 (taavi) p:Triage→Low `wikishared` resides on the `x1` section which currently does not exist on the WMCS Wiki Replicas. My understanding is that this is som...
[07:36:32] cloud-services-team, Cloud-VPS, Beta-Cluster-Infrastructure: Consider setting up an https://github.com/knyar/phalerts instance in metricsinfra - https://phabricator.wikimedia.org/T394446#10851370 (taavi) p:Triage→Medium
[07:53:47] cloud-services-team, Data-Services, DBA: Add "wikishared" database to wiki replicas - https://phabricator.wikimedia.org/T395072#10851386 (Marostegui) I think this was more security related than anything else to be honest. We would need the green light from Security team on what can be exposed, what n...
[08:20:31] cloud-services-team, Toolforge: 2025-05-22 Toolforge NFS cleanup - https://phabricator.wikimedia.org/T395000#10851420 (taavi) Open→Resolved p:Triage→High We're down to 78% which should be enough to keep the alerts away for a while.
[08:22:39] Striker: Update Django version used in Striker - https://phabricator.wikimedia.org/T359217#10851428 (taavi) Open→Resolved 4.2 seems like a good place to be now.
[08:27:26] cloud-services-team, Toolforge: /mnt/nfs/labstore-secondary-tools-project no longer seems to be mounted in the new container on Toolforge - https://phabricator.wikimedia.org/T363087#10851447 (taavi) Open→Invalid Bit late here, but closing as invalid since this seems to be working as expected...
[08:43:33] (open) taavi: dns: Remove jobs.svc records [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/47 (https://phabricator.wikimedia.org/T329443)
[08:43:36] (update) taavi: dns: Remove jobs.svc records [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/47 (https://phabricator.wikimedia.org/T329443)
[08:50:49] (approved) fnegri: dns: Remove jobs.svc records [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/47 (https://phabricator.wikimedia.org/T329443) (owner: taavi)
[08:52:38] (merge) taavi: dns: Remove jobs.svc records [repos/cloud/toolforge/tofu-provisioning] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning/-/merge_requests/47 (https://phabricator.wikimedia.org/T329443)
[09:02:06] cloud-services-team (FY2024/2025-Q3-Q4), DC-Ops, ops-eqiad, SRE: Temperature Inlet Temp issue on clouddumps1001:9290 - https://phabricator.wikimedia.org/T383723#10851565 (fnegri) I reverted my [change from last month](https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131051) and moved bac...
[09:24:54] cloud-services-team, Toolforge: attempting to create a python virtual environment on the bastion has a confusing error message - https://phabricator.wikimedia.org/T369477#10851623 (taavi) I looked briefly into this, but I don't see any ways to override this message without overriding Python stdlib files...
[09:25:00] cloud-services-team, Toolforge: attempting to create a python virtual environment on the bastion has a confusing error message - https://phabricator.wikimedia.org/T369477#10851625 (taavi) p:Triage→Low
[09:31:47] (close) taavi: Adding loki to install [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/669 (https://phabricator.wikimedia.org/T386480) (owner: rook)
[09:36:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[10:06:46] (open) taavi: toolforge: Refresh Apt repository information [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/244
[10:06:50] (update) taavi: toolforge: Refresh Apt repository information [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/244
[10:08:40] (update) taavi: toolforge: Refresh Apt repository information [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/244
[10:21:41] cloud-services-team, Data-Services, DBA: Add "wikishared" database to wiki replicas - https://phabricator.wikimedia.org/T395072#10851754 (Ladsgroup) I think there is already a ticket for adding x1 to wikireplicas. We definitely should do that which helps us DBAs argue for moving data out of core sect...
[11:20:58] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10851880 (MoritzMuehlenhoff) I've made the necessary edit to reserve 1000000 to 2000000 for users. Next week I'll create a dummy user with ui...
[11:31:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-65 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[11:32:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[11:38:29] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.02 - 2025.05.23): an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10851923 (BTullis) a:fnegri→BTullis Thanks @fnegri - I can take...
[12:01:40] cloud-services-team, Data-Persistence, Machine-Learning-Team, MediaWiki-extensions-ORES, and 2 others: DBA Review of Tables that ORES Extension will create - https://phabricator.wikimedia.org/T391103#10852001 (Marostegui)
[12:09:56] cloud-services-team, Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122 (taavi) NEW
[12:46:42] cloud-services-team, Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852147 (fnegri) From the parent task: > I ran it with multiple db options and only lawiki was run. So the cookbook must be either run one by one (or xargs) or the old fashioned...
[12:47:52] cloud-services-team, Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852149 (taavi) >>! In T395122#10852147, @fnegri wrote: > I think we can try running with `--all-databases`, though sometimes it hangs due to table locks. See for example T375751#...
[12:49:27] cloud-services-team, Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852150 (fnegri) True that. I'm also double checking why specifying multiple dbs did not work, the options says ` group.add_argument( "--databases", help=(...
[12:53:42] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852159 (fnegri) Open→In progress a:fnegri Looks like the "multiple database" list is supported by maintain-views.py, but not by the `update-views`...
[12:54:34] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852177 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views run by fnegri: Started updating wiki replica views
[13:07:38] (approved) dcaro: toolforge: Refresh Apt repository information [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/244 (owner: taavi)
[13:10:26] Cloud Services Proposals, cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Data-Platform-SRE (2025.05.24 - 2025.06.13): Decision request - Who runs wikireplicas cookbooks - https://phabricator.wikimedia.org/T382607#10852355 (Gehel)
[13:11:10] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services, Data-Persistence, Data-Platform, Data-Platform-SRE (2025.05.24 - 2025.06.13): an-redacteddb1001: upgrade MariaDB to 10.11 - https://phabricator.wikimedia.org/T394930#10852365 (Gehel)
[13:11:48] Data-Services, Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13): Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10852383 (Gehel)
[13:15:56] FIRING: SystemdUnitDown: The service unit backup_glance_images.service is in failed status on host cloudbackup1003. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[13:30:52] cloud-services-team, Toolforge (Toolforge iteration 20): [builds-builder] Upgrade python buildpack to v0.17.0 or newer for Poetry support - https://phabricator.wikimedia.org/T374056#10852520 (DaxServer) I've tested the latest build with poetry project - https://github.com/DaxServer/wikibots/pull/37 - aft...
[13:34:39] (merge) taavi: toolforge: Refresh Apt repository information [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/244
[13:37:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[13:46:21] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852581 (fnegri) The cookbook crashed unexpectedly, I somehow also lost my tmux session. There is nothing in the spicerack logs, these are the last lines: ` 2...
[13:47:14] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852587 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views run by fnegri: Started updating wiki replica views
[14:18:57] cloud-services-team, DC-Ops, SRE: Supporting new hardware in older debian releases - https://phabricator.wikimedia.org/T301162#10852638 (taavi) Open→Resolved I don't see any specific problems here that need addressing so closing in order to get this to stop lingering on our workboard.
[14:19:51] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852640 (ops-monitoring-bot) Cookbook cookbooks.sre.wikireplicas.update-views started by fnegri executed with errors: - an-redacteddb1001.eqiad.wmnet (**PASS**...
[14:20:52] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10852642 (fnegri) Failed again, but with an error message: ` pymysql.err.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction') ===...
[14:29:08] Toolforge (Toolforge iteration 20): [jobs-api] prepend date and pod name to filelog lines - https://phabricator.wikimedia.org/T372025#10852648 (taavi)
[14:29:14] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852649 (taavi)
[14:29:35] Toolforge (Software install/update), Kubernetes: toolforge-jobs: add logrotate - https://phabricator.wikimedia.org/T327165#10852650 (taavi)
[14:29:42] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852651 (taavi)
[14:30:27] cloud-services-team, Toolforge: Simple logrotate service for users of Tools as stopgap before central logging - https://phabricator.wikimedia.org/T152235#10852652 (taavi)
[14:30:28] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852653 (taavi)
[14:30:47] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852657 (taavi)
[14:30:50] cloud-services-team (Kanban), Toolforge Jobs framework, Patch-For-Review: Allow customizing the out/err files with toolforge-jobs - https://phabricator.wikimedia.org/T304421#10852656 (taavi)
[14:31:12] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852660 (taavi)
[14:31:12] cloud-services-team (FY2022/2023-Q3), Toolforge Jobs framework, Patch-For-Review: Allow specifying the path for log files for jobs executed on the new toolforge Jobs framework - https://phabricator.wikimedia.org/T301901#10852659 (taavi)
[14:32:23] cloud-services-team, Toolforge: [toolforge,infra] Cntralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10852661 (taavi)
[14:33:00] cloud-services-team, Toolforge, Patch-For-Review: [o11y,logging,infra] Deploy Loki to store Toolforge log data - https://phabricator.wikimedia.org/T386480#10852663 (taavi)
[14:33:54] Toolforge Jobs framework: toolforge-jobs: merge stdout/stderr output - https://phabricator.wikimedia.org/T302211#10852666 (taavi)
[14:34:00] cloud-services-team, Toolforge, Epic: [toolforge,jobs-api,webservice,storage] Provide modern, non-NFS log solution for Toolforge tools - https://phabricator.wikimedia.org/T127367#10852667 (taavi)
[14:34:48] cloud-services-team, Toolforge: Simple logrotate service for users of Tools as stopgap before central logging - https://phabricator.wikimedia.org/T152235#10852673 (taavi)
[14:34:54] cloud-services-team, Toolforge: [toolforge,infra] Cntralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10852674 (taavi)
[14:35:18] cloud-services-team, Toolforge: [toolforge,infra] Cntralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10852676 (taavi)
[14:35:21] cloud-services-team (Kanban), Data-Services, Toolforge: Prevent overly-large log files - https://phabricator.wikimedia.org/T122508#10852677 (taavi)
[14:35:27] cloud-services-team, Toolforge, Epic, Tracking-Neverending: Make toolforge reliable enough (tracking) - https://phabricator.wikimedia.org/T90534#10852678 (taavi)
[14:37:08] cloud-services-team, Toolforge, Patch-For-Review: [o11y,logging,infra] Deploy Loki to store Toolforge log data - https://phabricator.wikimedia.org/T386480#10852707 (taavi)
[14:37:09] cloud-services-team, Toolforge: [toolforge,infra] Cntralized logging for Toolforge infrastructure logs - https://phabricator.wikimedia.org/T97861#10852708 (taavi)
[14:54:07] (update) taavi: registry-admission: local: Exempt local-path-storage [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/795
[14:54:07] (open) taavi: registry-admission: local: Exempt local-path-storage [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/795
[14:54:07] (update) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] (main-Icb012f1ad81b582b65a569bb493095e12d3fbd72) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[14:54:11] (open) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] (main-Icb012f1ad81b582b65a569bb493095e12d3fbd72) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[14:54:13] (update) taavi: registry-admission: local: Exempt local-path-storage [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/795
[14:54:16] (update) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] (main-Icb012f1ad81b582b65a569bb493095e12d3fbd72) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[14:55:06] cloud-services-team, Toolforge, Patch-For-Review: [o11y,logging,infra] Deploy Loki to store Toolforge log data - https://phabricator.wikimedia.org/T386480#10852755 (taavi) a:taavi Tentatively claiming.
[14:56:55] (update) taavi: logging: Init component [repos/cloud/toolforge/toolforge-deploy] (main-Icb012f1ad81b582b65a569bb493095e12d3fbd72) - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/796 (https://phabricator.wikimedia.org/T386480)
[15:10:56] FIRING: SystemdUnitDown: The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[15:11:03] cloud-services-team: SystemdUnitDown The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://phabricator.wikimedia.org/T395133 (phaultfinder) NEW
[15:19:42] cloud-services-team, Data-Services, affects-Kiwix-and-openZIM: enwiki_p query returned empty results on May 14 from ~UTC 0:00 - 05:00 - https://phabricator.wikimedia.org/T394429#10852808 (Audiodude) Thank you for carefully considering the issue. There haven't been any recent code changes in the t...
[15:30:57] cloud-services-team, Bitu, Infrastructure-Foundations, LDAP: Allocate more available UNIX UIDs for human users - https://phabricator.wikimedia.org/T355663#10852835 (jhathaway) I'm not sure it is much of an issue, but that range overlaps with `systemd-nspawn`'s range, 524288 to 1879048191 [1]. We...
[15:39:51] cloud-services-team, Toolforge: [infra] Reports of slow connectivity from APAC - https://phabricator.wikimedia.org/T395135 (dcaro) NEW
[15:42:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[15:47:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[15:52:16] (update) chuckonwumelu: [api] Adding warning message for beta [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/78 (https://phabricator.wikimedia.org/T394277)
[15:52:34] FIRING: [2x] ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcess
[16:23:04] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65
[16:32:55] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65
[16:46:25] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10853131 (fnegri) The views were created in an-redacteddb1001 and clouddb1017, but failed on clouddb1018. I looked more at the logs and it failed while creating...
[16:51:32] Tool-wdactle, Wikibase Action API (WPP), Wikidata, MW-1.45-notes (1.45.0-wmf.2; 2025-05-20): Add API (option) to format Wikidata entities as plain text in bulk - https://phabricator.wikimedia.org/T393691#10853142 (LucasWerkmeister) Open→Resolved a:LucasWerkmeister The [commit ment...
[17:12:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-38 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[17:30:56] (PS1) Krinkle: build: Add phan, fix violations, enable it [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149703
[17:32:07] (CR) Krinkle: [C:+2] build: Add phan, fix violations, enable it [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149703 (owner: Krinkle)
[17:32:57] (Merged) jenkins-bot: build: Add phan, fix violations, enable it [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149703 (owner: Krinkle)
[17:34:00] (PS2) Krinkle: composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:34:28] (CR) CI reject: [V:-1] composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:35:08] (CR) Krinkle: "Fixing "src/Frontend.php:97 PhanUndeclaredClassMethod Call to method __construct from undeclared class \Legoktm\CloverDiff\CloverXml". Add" [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:35:14] (PS3) Krinkle: composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:35:51] (CR) Krinkle: [C:+2] composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:36:24] (Merged) jenkins-bot: composer: Replace legoktm/clover-diff with wikimedia/clover-diff [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:45:27] Tools, Commons: Increase ZoomViewer JPEG quality - https://phabricator.wikimedia.org/T395153 (JayCubby) NEW
[17:45:52] (CR) Jforrester: composer: Replace legoktm/clover-diff with wikimedia/clover-diff (1 comment) [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1148349 (owner: Jforrester)
[17:54:20] (PS1) Krinkle: docs: Document deployment steps and job config [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149706
[17:55:14] (PS1) Krinkle: Upgrade from PHP 7.4 to PHP 8.2 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707
[17:55:30] (CR) CI reject: [V:-1] Upgrade from PHP 7.4 to PHP 8.2 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707 (owner: Krinkle)
[17:56:46] (CR) Krinkle: [C:+2] docs: Document deployment steps and job config (1 comment) [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149706 (owner: Krinkle)
[17:57:16] (Merged) jenkins-bot: docs: Document deployment steps and job config [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149706 (owner: Krinkle)
[17:57:32] (CR) Krinkle: "recheck" [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707 (owner: Krinkle)
[18:00:16] (PS2) Krinkle: Upgrade from PHP 7.4 to PHP 8.2 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707
[18:13:47] (open) tburmeister: Fix error in Succinct calc; add Overview [toolforge-repos/tech-doc-metrics] - https://gitlab.wikimedia.org/toolforge-repos/tech-doc-metrics/-/merge_requests/5 (https://phabricator.wikimedia.org/T390390)
[18:14:01] (merge) tburmeister: Fix error in Succinct calc; add Overview [toolforge-repos/tech-doc-metrics] - https://gitlab.wikimedia.org/toolforge-repos/tech-doc-metrics/-/merge_requests/5 (https://phabricator.wikimedia.org/T390390)
[18:25:27] (CR) Krinkle: [C:+2] Upgrade from PHP 7.4 to PHP 8.2 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707 (owner: Krinkle)
[18:25:56] (Merged) jenkins-bot: Upgrade from PHP 7.4 to PHP 8.2 [labs/tools/coverme] - https://gerrit.wikimedia.org/r/1149707 (owner: Krinkle)
[18:36:17] cloud-services-team, Toolforge, Countervandalism-Network, Infrastructure-Foundations, Mail: Spam messages to cvn.maintainers@ - https://phabricator.wikimedia.org/T163656#10853482 (Krinkle)
[19:10:56] FIRING: SystemdUnitDown: The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown
[19:29:15] Tool-openstack-browser: Project name link on error page should link to said project - https://phabricator.wikimedia.org/T395161 (Krinkle) NEW
[19:29:22] Tool-openstack-browser: Project name link on error page should link to said project - https://phabricator.wikimedia.org/T395161#10853616 (Krinkle)
[19:41:43] cloud-services-team (FY2024/2025-Q3-Q4), Data-Services: Run maintain-views to create new ORES tables - https://phabricator.wikimedia.org/T395122#10853670 (Ladsgroup) You can specify the table instead, ores_model and ores_classification. It means you have to run it twice but that should do the trick as th...
[19:41:47] (PS1) Krinkle: docroot: Fix link to Grafana dashboard, remove old Nagf link [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149737 (https://phabricator.wikimedia.org/T395164)
[19:56:06] (PS1) Krinkle: setup: Restore `git clone` of cvn-infrastructure repo [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149738
[20:02:36] (PS1) Krinkle: setup: Adopt version-neutral php package aliases, remove unused php-apcu-bc [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149739 (https://phabricator.wikimedia.org/T395164)
[20:04:04] (PS2) Krinkle: setup: Restore `git clone` of cvn-infrastructure repo [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149738 (https://phabricator.wikimedia.org/T395164)
[20:04:08] (CR) Krinkle: [C:+2] docroot: Fix link to Grafana dashboard, remove old Nagf link [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149737 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle)
[20:04:13] (CR) Krinkle: [V:+2 C:+2] setup: Restore `git clone` of cvn-infrastructure repo [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149738 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle)
[20:04:16] (PS2) Krinkle: setup: Adopt version-neutral php package aliases, remove unused php-apcu-bc [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149739 (https://phabricator.wikimedia.org/T395164)
[20:04:19] (CR) Krinkle: [V:+2 C:+2] setup: Adopt version-neutral php package aliases, remove unused php-apcu-bc [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149739 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle)
[20:04:23] (CR) Krinkle: [V:+2 C:+2] docroot: Fix link to Grafana dashboard, remove old Nagf link [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149737 (https://phabricator.wikimedia.org/T395164) (owner: Krinkle)
[20:20:11] (PS1) Krinkle: setup: Autodiscover the /etc/php/X.Y directory [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149744 (https://phabricator.wikimedia.org/T395164)
[20:20:50] (PS2) Krinkle: setup: Autodiscover the /etc/php/X.Y directory [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149744 (https://phabricator.wikimedia.org/T395164)
[20:21:00] (PS3) Krinkle: setup: Autodiscover the /etc/php/X.Y directory [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149744 (https://phabricator.wikimedia.org/T395164)
[20:21:21] (PS4) Krinkle: setup: Autodiscover the /etc/php/X.Y directory [labs/countervandalism/cvn-infrastructure] - https://gerrit.wikimedia.org/r/1149744 (https://phabricator.wikimedia.org/T395164)
[20:22:12] (open) don-vip: Draft: Migrate many media to a common media table [toolforge-repos/spacemedia] -
10https://gitlab.wikimedia.org/toolforge-repos/spacemedia/-/merge_requests/2 [20:25:07] 10Tool-openstack-browser: Project name link on error page should link to said project - https://phabricator.wikimedia.org/T395161#10853726 (10taavi) 05Open→03Resolved a:03taavi [23:10:56] FIRING: SystemdUnitDown: The systemd unit backup_glance_images.service on node cloudbackup1003 has been failing for more than two hours. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=cloudbackup1003 - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown