[02:39:33] FIRING: [4x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[03:50:26] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11072761 (Danilo) I collected more data about the bots disconnections. This is the complete list of bots disconnecting daily with the approximate disconnection time...
[03:50:29] FIRING: PuppetAgentNoResources: No Puppet resources found on instance tools-harbor-2 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:39:33] FIRING: [4x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[06:41:57] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/massmailer] - https://gerrit.wikimedia.org/r/1176464 (owner: L10n-bot)
[06:42:21] (CR) Abijeet Patro: [V:+2] Localisation updates from https://translatewiki.net. [labs/tools/commons-mass-description] - https://gerrit.wikimedia.org/r/1176460 (owner: L10n-bot)
[06:46:05] Tools: wikihistory.toolforge.org is down - https://phabricator.wikimedia.org/T401558#11072866 (Wurgl) Restarted. Runs. Do not blame me! Until spring '24 when cron-jobs were allowed, I had a watchdog job which did this restart automagically when needed. But now, it is impossible for me. So every tool needs a...
[07:08:12] cloud-services-team, Striker: 500 Internal Server Error when trying to access ssh keys on toolsadmin - https://phabricator.wikimedia.org/T401318#11072884 (SLyngshede-WMF) It really shouldn't, I'll look into fixing that :-)
[07:36:11] Tool-gawa: Request to be Added as Co-Owner of the GAWA Repository - https://phabricator.wikimedia.org/T401569 (paulwiki) NEW
[07:38:12] cloud-services-team, Tool-gawa: Request to be Added as Co-Owner of the GAWA Repository - https://phabricator.wikimedia.org/T401569#11072930 (taavi) a:taavi→None
[07:40:16] cloud-services-team, Cloud-VPS: KernelErrors Server cloudlb1001 logged kernel errors - https://phabricator.wikimedia.org/T401543#11072937 (taavi) Open→Resolved a:taavi This was a single memory error caught by ECC. Don't think that requires further action, so closing.
[07:43:08] cloud-services-team, Cloud-VPS: KernelErrors - https://phabricator.wikimedia.org/T401549#11072942 (taavi) cloudgw1003 kernel log: {P80985} Looks like a possible NIC issue?
[07:53:37] (merge) dcaro: readme: add note about potential backwards incompatibility [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/86
[08:12:20] (open) dcaro: functional_tests: harbor uses http2, that does not have `OK` [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/915
[08:13:05] (approved) dcaro: functional_tests: harbor uses http2, that does not have `OK` [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/915
[08:13:07] (merge) dcaro: functional_tests: harbor uses http2, that does not have `OK` [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/915
[08:22:03] cloud-services-team, Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11073013 (fgiunchedi) Thank you @Danilo for the break down, that is quite useful. I have set up a capture of dnsmasq state files on cloudnet hosts to see if we can...
[08:23:01] cloud-services-team, Toolforge (Toolforge iteration 23): [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#11073015 (dcaro) Upgraded toolsbeta, I was running the tests also during that time and they did not fail, there were some weird events...
[08:26:38] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58
[08:33:27] cloud-services-team: Rename my developer usernames for consistency with my Wikimedia SUL account - https://phabricator.wikimedia.org/T401571 (paulwiki) NEW
[08:37:05] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58
[08:41:48] Tools: wikihistory.toolforge.org is down - https://phabricator.wikimedia.org/T401558#11073048 (Xqt) Open→Resolved a:Wurgl I'll ping you in such case cause this is detected by Pywikibot tests ;-) Thanks a lot.
[08:44:34] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud
[08:45:48] Tools: wikihistory.toolforge.org is down - https://phabricator.wikimedia.org/T401558#11073053 (LucasWerkmeister) >>! In T401558#11072866, @Wurgl wrote: > Do not blame me! Until spring '24 when cron-jobs were allowed, I had a watchdog job which did this restart automagically when needed. But now, it is im...
[08:46:10] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud
[08:55:29] RESOLVED: PuppetAgentNoResources: No Puppet resources found on instance tools-harbor-2 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:09:19] cloud-services-team, Toolforge (Toolforge iteration 23): [kyverno] Upgrade to `3.3.9` chart (`1.13` app) for k8s 1.30 support - https://phabricator.wikimedia.org/T394787#11073079 (dcaro) Sent notice for the upgrade window in tools https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia...
[09:12:06] (update) dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127)
[09:24:30] cloud-services-team: Rename my developer usernames for consistency with my Wikimedia SUL account - https://phabricator.wikimedia.org/T401571#11073107 (Aklapper)
[09:25:14] Tools: wikihistory.toolforge.org is down - https://phabricator.wikimedia.org/T401558#11073112 (Wurgl) Thanks. This solves half of the problem. Sometimes, but extreme seldom, once every few month a background process hangs. It seems(!) it waits forever for some API-Answer (maybe name resolving or somethin...
[09:26:00] cloud-services-team: Rename my developer usernames for consistency with my Wikimedia SUL account - https://phabricator.wikimedia.org/T401571#11073113 (Aklapper) Open→Declined > Wikimedia Developer Account: Hi, see https://wikitech.wikimedia.org/wiki/SRE/LDAP/Renaming_users Your Wikimedia GitLab...
[09:30:31] Tools: wikihistory.toolforge.org is down - https://phabricator.wikimedia.org/T401558#11073120 (LucasWerkmeister) Assuming the background process is running in `toolforge jobs`, you can also have a [health check](https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_jobs#Configuring_health_checks_for...
[09:38:41] FIRING: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[09:40:28] FIRING: PuppetAgentFailure: Puppet agent failure detected on instance tools-harbor-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:45:33] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-nfs-58 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:48:07] (merge) dcaro: config: add use_latest_versions to the source build [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/72 (https://phabricator.wikimedia.org/T380127)
[09:48:30] (update) dcaro: components-api: bump to 0.0.140-20250808051543-da1b63fb [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/914 (https://phabricator.wikimedia.org/T400064) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[09:50:25] cloud-services-team, Toolforge: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11073150 (taavi) a:taavi
[09:51:10] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.141-20250811094820-170ea067 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/914 (https://phabricator.wikimedia.org/T380127
https://phabricator.wikimedia.org/T400064)
[09:51:13] (update) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.141-20250811094820-170ea067 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/914 (https://phabricator.wikimedia.org/T380127 https://phabricator.wikimedia.org/T400064)
[09:52:58] cloud-services-team, Toolforge: toolforge: Add Trixie repository to deb.svc.toolforge.org - https://phabricator.wikimedia.org/T401574 (taavi) NEW
[09:53:05] cloud-services-team, Toolforge: toolforge: Add Trixie repository to deb.svc.toolforge.org - https://phabricator.wikimedia.org/T401574#11073171 (taavi) p:Triage→Medium
[09:53:28] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[09:55:23] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
[09:55:28] RESOLVED: PuppetAgentFailure: Puppet agent failure detected on instance tools-harbor-2 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[09:56:49] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[10:00:36] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[10:04:18] FIRING: [4x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[10:07:19] cloud-services-team, Toolforge: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11073222 (taavi) @bd808 and others, how would you feel about dropping the giant list of fonts installed in the PHP image and pushing users needing those to the build
service instead?
[10:15:20] (merge) galrach600: added missing logic to session_post [toolforge-repos/miss-search] (update-cycle-toolforge-testing) - https://gitlab.wikimedia.org/toolforge-repos/miss-search/-/merge_requests/12 (owner: eliza189)
[10:21:51] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[10:25:11] (merge) mhorsey: feat: Add NotFound page for unknown routes [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/2 (owner: vriaa)
[10:25:35] (merge) mhorsey: docs: Update contributing guide for developers without repo access [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/3 (owner: vriaa)
[10:26:52] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[10:27:27] (approved) mhorsey: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1 (owner: vriaa)
[10:29:53] (approved) dcaro: components-api: bump to 0.0.141-20250811094820-170ea067 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/914 (https://phabricator.wikimedia.org/T380127 https://phabricator.wikimedia.org/T400064) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[10:29:58] (merge) dcaro: components-api: bump to 0.0.141-20250811094820-170ea067 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/914 (https://phabricator.wikimedia.org/T380127 https://phabricator.wikimedia.org/T400064) (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[10:30:35] Toolforge (Toolforge
iteration 23), Patch-For-Review: [components-api] store the config used for the deployment in the deployment themselves - https://phabricator.wikimedia.org/T400064#11073263 (dcaro) In progress→Resolved
[10:30:51] Toolforge (Toolforge iteration 23): [components-api] store the config used for the deployment in the deployment themselves - https://phabricator.wikimedia.org/T400064#11073265 (dcaro) Resolved→In progress Missing the cli bits
[10:33:21] (approved) dcaro: build_deb fixes [repos/cloud/toolforge/webservice-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/webservice-cli/-/merge_requests/85 (https://phabricator.wikimedia.org/T400616) (owner: bd808)
[10:34:10] (merge) dcaro: cancel: add the missing autocomplete [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/51
[10:34:26] (update) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[10:39:06] cloud-services-team, Toolforge: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11073273 (LucasWerkmeister) I’d feel better about it if we had confirmation that the build service works for this use case, i.e. that the tricks used by the apt buildpack are enough t...
[10:43:41] cloud-services-team, Cloud-VPS: KernelErrors - https://phabricator.wikimedia.org/T401549#11073279 (fnegri) The error in `cloudlb1001` is a memory error that was automatically corrected, no action needed: ` Aug 10 09:32:16 cloudlb1001 kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware...
[10:49:38] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578 (phaultfinder) NEW
[10:51:00] cloud-services-team, Toolforge: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11073304 (taavi) Stalled→Open
[10:52:47] cloud-services-team, Cloud-VPS: KernelErrors - https://phabricator.wikimedia.org/T401549#11073311 (fnegri) →Duplicate dup:T401578
[10:52:48] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578#11073313 (fnegri)
[10:53:05] cloud-services-team, Cloud-VPS: KernelErrors - https://phabricator.wikimedia.org/T401549#11073316 (fnegri) Merging into the more specific {T401578}
[10:54:58] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578#11073320 (taavi) This feels rather deja vu compared to {T401543}..
[10:59:34] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578#11073324 (fnegri) I think what happened is: 1. kernelerrors logged on cloudlb1001, triggering the creation of {T401543} 2. kernelerrors logged on cloudgw1003, triggering the creation of a mul...
[11:01:41] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578#11073327 (fnegri) cloudlb1001 errors were autocorrected memory errors as noted in {T401543}. cloudgw1003 errors were network errors, as noted in T401549#11072942.
[11:07:28] cloud-services-team, Toolforge, Patch-For-Review: toolforge: Add Trixie repository to deb.svc.toolforge.org - https://phabricator.wikimedia.org/T401574#11073335 (taavi) Open→Resolved Those repos now exist, but are empty.
[11:07:41] cloud-services-team: KernelErrors Server cloudgw1003 logged kernel errors - https://phabricator.wikimedia.org/T401578#11073338 (fnegri) Open→Resolved a:fnegri I'm gonna resolve this as well, if the network errors reoccur it might be worth looking into.
[11:13:51] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://phabricator.wikimedia.org/T401363#11073345 (fnegri) One-off failure to download from kiwix.org, the following runs were successful: ` Aug 06 20:15:...
[11:13:54] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1001 has been failing for more than two hours. - https://phabricator.wikimedia.org/T401363#11073346 (fnegri) Open→Resolved a:fnegri
[11:14:24] cloud-services-team: SystemdUnitDown The systemd unit kiwix-mirror-update.service on node clouddumps1002 has been failing for more than two hours. - https://phabricator.wikimedia.org/T401389#11073349 (fnegri) Open→Resolved a:fnegri Similar to {T401363}. One-off failure to download from kiwix...
[11:22:36] (update) vriaa: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1
[11:26:03] cloud-services-team, Toolforge: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11073363 (taavi) The context for adding those fonts is https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/838939 from 2022 and the [[ https://lists.wikimedia....
[11:28:29] (merge) mhorsey: Basic banner implementation [toolforge-repos/centralnotice-banner-editor] - https://gitlab.wikimedia.org/toolforge-repos/centralnotice-banner-editor/-/merge_requests/1 (owner: vriaa)
[11:30:26] (PS1) Majavah: Add Trixie images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1177345 (https://phabricator.wikimedia.org/T400255)
[11:39:09] (PS2) Majavah: Add Trixie images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1177345 (https://phabricator.wikimedia.org/T400255)
[12:01:46] (update) dcaro: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:03:04] (merge) dcaro: [config] allow reading from stdin [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/54 (https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:06:33] (update) dcaro: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/57 (https://phabricator.wikimedia.org/T395077 https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:07:04] (update) dcaro: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/57 (https://phabricator.wikimedia.org/T395077 https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:07:54] (PS1) Majavah: aptly: Mark Trixie as supported [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574)
[12:07:54] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[12:12:02] !log
dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[12:12:53] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-cli
[12:14:34] (CR) CI reject: [V:-1] aptly: Mark Trixie as supported [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574) (owner: Majavah)
[12:15:45] (CR) David Caro: [C:+1] "Repos are there already 👍" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574) (owner: Majavah)
[12:16:29] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
[12:16:53] (CR) David Caro: [C:+1] "Might need to re-record the cookbook" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574) (owner: Majavah)
[12:17:09] (approved) dcaro: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/57 (https://phabricator.wikimedia.org/T395077 https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:17:14] (merge) dcaro: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/components-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-cli/-/merge_requests/57 (https://phabricator.wikimedia.org/T395077 https://phabricator.wikimedia.org/T398424) (owner: raymond-ndibe)
[12:18:40] Toolforge (Toolforge iteration 23), Patch-For-Review: [components-cli] Allow reading tool configuration from stdin - https://phabricator.wikimedia.org/T398424#11073466 (dcaro) In progress→Resolved This was a bug on the cli side expecting stdin to be json instead of yaml.
[12:20:29] Toolforge (Toolforge iteration 23), Patch-For-Review: [components-cli] Allow reading tool configuration from stdin - https://phabricator.wikimedia.org/T398424#11073472 (dcaro) Note also that with https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95 you can now pass `s...
[12:23:54] (merge) dcaro: config: allow passing source_url [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/95
[12:24:47] (PS2) Majavah: aptly: Mark Trixie as supported [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574)
[12:24:47] (PS1) Majavah: build: Support Python 3.13 with Tox [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177371
[12:26:14] (update) l10n-bot: Localisation updates from https://translatewiki.net. [toolforge-repos/wd-image-positions] - https://gitlab.wikimedia.org/toolforge-repos/wd-image-positions/-/merge_requests/42
[12:26:28] (open) group_203_bot_f4d95069bb2675e4ce1fff090c1c1620: components-api: bump to 0.0.142-20250811122403-0d82d16b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/916
[12:26:44] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:32:20] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
[12:32:36] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:35:29] (CR) Majavah: [C:+2] aptly: Mark Trixie as supported [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574) (owner: Majavah)
[12:37:37] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for
component components-api
[12:40:37] (Merged) jenkins-bot: aptly: Mark Trixie as supported [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1177357 (https://phabricator.wikimedia.org/T401574) (owner: Majavah)
[12:48:41] RESOLVED: CloudVPSDesignateLeaks: Detected 2 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:58:28] (approved) dcaro: components-api: bump to 0.0.142-20250811122403-0d82d16b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/916 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:58:31] (merge) dcaro: components-api: bump to 0.0.142-20250811122403-0d82d16b [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/916 (owner: group_203_bot_f4d95069bb2675e4ce1fff090c1c1620)
[12:59:24] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11073577 (dcaro) Not sure if it's going to be exactly the same, but this might help: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_your_tool#Fetching_your_tool_c...
[13:00:56] (update) dcaro: pre-commit: add check for openapi spec version bump [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/116
[13:01:16] (update) dcaro: pre-commit: add check for openapi spec version bump [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/116
[13:01:31] (update) dcaro: pre-commit: add check for openapi spec version bump [repos/cloud/toolforge/components-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/116
[13:13:31] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11073611 (DamianZaremba) >>! In T401388#11073577, @dcaro wrote: > Not sure if it's going to be exactly the same, but this might help: https://wikitech.wikimedia.org/wiki/Hel...
[13:26:21] (open) dcaro: toolforge_deploy_mr: use the right package names for all clis [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/261
[13:26:30] (update) dcaro: toolforge_deploy_mr: use the right package names for all clis [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/261
[13:32:09] cloud-services-team: Create debian 13.0 Trixie base images in cloud-vps - https://phabricator.wikimedia.org/T401584 (Andrew) NEW
[13:32:30] cloud-services-team, Cloud-VPS: Create debian 13.0 Trixie base images in cloud-vps - https://phabricator.wikimedia.org/T401584#11073678 (Andrew)
[13:36:20] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[13:45:14] cloud-services-team, Cloud-VPS: Create debian 13.0 Trixie base
images in cloud-vps - https://phabricator.wikimedia.org/T401584#11073718 (Andrew) Excerpts: ` [ 3.663935] systemd[1]: No hostname configured, using default hostname. [ 3.666569] systemd[1]: Hostname set to . [ 3.67317...
[13:45:36] cloud-services-team, Cloud-VPS: Fix Puppet version/legacy fact issues with Cloud VPS Trixie image - https://phabricator.wikimedia.org/T401586 (taavi) NEW
[13:49:49] cloud-services-team, Cloud-VPS, Patch-For-Review: Fix Puppet version/legacy fact issues with Cloud VPS Trixie image - https://phabricator.wikimedia.org/T401586#11073742 (taavi) Next up: ` Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error...
[14:04:33] FIRING: [3x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudgw1003 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[14:10:42] (update) raymond-ndibe: api: allow protocol to be specified for ports [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/186 (owner: dcaro)
[14:15:09] (approved) raymond-ndibe: toolforge_deploy_mr: use the right package names for all clis [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/261 (owner: dcaro)
[14:15:14] (update) raymond-ndibe: toolforge_deploy_mr: use the right package names for all clis [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/261 (owner: dcaro)
[14:22:42] (update) raymond-ndibe: d/changelog: bump to 0.0.14 [repos/cloud/toolforge/envvars-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/envvars-cli/-/merge_requests/86 (https://phabricator.wikimedia.org/T363544
https://phabricator.wikimedia.org/T400616)
[14:29:42] cloud-services-team, Trust-and-Safety, LDAP: Reset developer account password and email address for "taxonbot" user - https://phabricator.wikimedia.org/T398220#11074003 (CDanis)
[14:31:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[14:35:25] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11074048 (dcaro) I guess one issue is that different components on the same tool might have different repositories, so you would need a way to specify also which component t...
[14:42:30] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11074093 (DamianZaremba) >>! In T401388#11074048, @dcaro wrote: > I guess one issue is that different components on the same tool might have different repositories, so you w...
[14:44:51] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11074100 (DamianZaremba) I mocked this up for cluebotng staging with: https://github.com/cluebotng/component-configs [source for toolforge components configs] https://github...
[14:46:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[14:48:45] cloud-services-team, Toolforge: toolforge components - support providing ref in webhook - https://phabricator.wikimedia.org/T401388#11074124 (DamianZaremba) >>! In T401388#11074100, @DamianZaremba wrote: > I mocked this up for cluebotng staging with: > https://github.com/cluebotng/component-configs [sour...
[14:54:03] (CR) Majavah: Add Trixie images (1 comment) [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/1177345 (https://phabricator.wikimedia.org/T400255) (owner: Majavah)
[15:14:48] RESOLVED: [3x] KernelErrors: Server cloudgw1003 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudgw1003 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors
[15:15:59] cloud-services-team, Toolforge: TjfCliError - toolforge jobs logs broken - https://phabricator.wikimedia.org/T401422#11074290 (DamianZaremba) Also happening on `cluebotng-staging` ` tools.cluebotng-staging@tools-bastion-12:~$ toolforge jobs logs -f bot ERROR: TjfCliError: Unknown error (sent 1009 (messag...
[15:17:08] cloud-services-team, Toolforge: [TjfCliError] `toolforge jobs logs` breaks on long log lines - https://phabricator.wikimedia.org/T401422#11074325 (DamianZaremba)
[15:39:56] FIRING: [2x] SystemdUnitDown: The service unit hdfs_rsync_unique_devices.service is in failed status on host clouddumps1001.
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [15:47:49] 06cloud-services-team: Rename my developer usernames for consistency with my Wikimedia SUL account - https://phabricator.wikimedia.org/T401571#11074513 (10poro26) >>! In T401571#11073113, @Aklapper wrote: >> Wikimedia Developer Account: > > Hi, see https://wikitech.wikimedia.org/wiki/SRE/LDAP/Renaming_... [15:48:21] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11074514 (10bd808) >>! In T400255#11073221, @taavi wrote: > @bd808 and others, how would you feel about dropping the giant list of fonts installed in the PHP image... [15:49:56] FIRING: [3x] SystemdUnitDown: The service unit hdfs_rsync_mediacounts.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [15:50:05] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [15:54:56] FIRING: [4x] SystemdUnitDown: The service unit hdfs_rsync_mediacounts.service is in failed status on host clouddumps1001. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [16:05:04] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:15:17] 06cloud-services-team, 10Tool-gawa: Request to be Added as Co-Owner of the GAWA Repository - https://phabricator.wikimedia.org/T401569#11074642 (10bd808) > I am the maintainer of the GAWA tool, but due to technical reasons, I am not listed among the members of the repository on Wikimedia’s GitLab. It looks li... [16:24:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [16:32:51] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [16:39:07] (03merge) 10dcaro: toolforge_deploy_mr: use the right package names for all clis [repos/cloud/toolforge/lima-kilo] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/261 [16:56:22] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous 
[repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [16:56:35] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [16:57:26] RESOLVED: [2x] SystemdUnitDown: The service unit hdfs_rsync_mediacounts.service is in failed status on host clouddumps1001. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/SystemdUnitDown - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitDown [17:04:29] 06cloud-services-team, 10Cloud-VPS: Create debian 13.0 Trixie base images in cloud-vps - https://phabricator.wikimedia.org/T401584#11074884 (10Andrew) The remaining issue has to do with sssd services: ` Error: /Stage[main]/Ldap::Client::Sssd/Service[sssd-pam]: Systemd restart for sssd-pam failed! ` ` Faile... [17:05:56] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [17:10:53] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? - https://phabricator.wikimedia.org/T401347#11074934 (10fnegri) @RichSmith thanks for suggesting this! 9GB is not a lot compared to the really "big ones", but I think it's a good idea to move it, especially if you routinely stre... 
[17:14:18] FIRING: KernelErrors: Server cloudcephosd1014 logged kernel errors - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/KernelErrors - https://grafana.wikimedia.org/d/b013af4c-d405-4d9f-85d4-985abb3dec0c/wmcs-kernel-errors?orgId=1&var-instance=cloudcephosd1014 - https://alerts.wikimedia.org/?q=alertname%3DKernelErrors [17:14:29] 06cloud-services-team: KernelErrors Server cloudcephosd1014 logged kernel errors - https://phabricator.wikimedia.org/T401615 (10phaultfinder) 03NEW [17:16:29] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? - https://phabricator.wikimedia.org/T401347#11074976 (10fnegri) p:05Triage→03Medium a:03fnegri [17:19:34] 06cloud-services-team: KernelErrors Server cloudcephosd1014 logged kernel errors - https://phabricator.wikimedia.org/T401615#11074982 (10fnegri) A blip in the NIC: ` fnegri@cloudcephosd1014:~$ sudo journalctl -k --since -1h -- Journal begins at Mon 2025-07-07 17:32:10 UTC, ends at Mon 2025-08-11 17:19:06 UTC. -... 
[17:21:36] 06cloud-services-team: KernelErrors Server cloudcephosd1014 logged kernel errors - https://phabricator.wikimedia.org/T401615#11074983 (10fnegri) 05Open→03Resolved a:03fnegri [17:39:09] (03update) 10dcaro: [jobs-api] split job models to oneoff, scheduled and continuous [repos/cloud/toolforge/jobs-api] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/154 (https://phabricator.wikimedia.org/T389118 https://phabricator.wikimedia.org/T390136) (owner: 10raymond-ndibe) [17:49:03] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [17:49:35] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [17:54:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [17:54:47] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? - https://phabricator.wikimedia.org/T401347#11075063 (10RichSmith) >>! 
In T401347#11074934, @fnegri wrote: > @RichSmith thanks for suggesting this! 9GB is not a lot compared to the really "big ones", but I think it's a good idea... [17:55:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:13:05] 06cloud-services-team, 10Tool-gawa: Request to be Added as Co-Owner of the GAWA Repository - https://phabricator.wikimedia.org/T401569#11075108 (10Andrew) @poro26 I have added the paul26 account to that project, but please see Bryan's comments above and clarify that this is what you want what's up with the oth... [18:18:13] 10Cloud-VPS (Project-requests): Request creation of VPS project - https://phabricator.wikimedia.org/T401619 (10AlvinDulle) 03NEW [18:30:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [18:34:15] 10Tool-Pageviews: Topviews Analysis does not update or merge pageviews after article is moved to a new title - https://phabricator.wikimedia.org/T401475#11075168 (10MusikAnimal) This is {T121912}. There's little I can do to solve it in Topviews, unfortunately. 
[18:35:39] 10Cloud-VPS (Project-requests): Request creation of VPS project - https://phabricator.wikimedia.org/T401619#11075178 (10Pppery) a:05AlvinDulle→03None [18:35:52] 10Tool-Pageviews: Topviews Analysis does not update or merge pageviews after article is moved to a new title - https://phabricator.wikimedia.org/T401475#11075180 (10MusikAnimal) →14Duplicate dup:03T159046 [18:56:57] 06cloud-services-team, 10Toolforge, 13Patch-For-Review: Build Trixie based Toolforge pre-built images - https://phabricator.wikimedia.org/T400255#11075267 (10taavi) No, I have not heard anything from the maintainers of that tool about their plans for a migration. I don't think that is also a reasonable block... [19:02:45] 10Cloud-VPS (Project-requests): Request creation of VPS project - https://phabricator.wikimedia.org/T401619#11075288 (10taavi) Is there a reason this cannot run in Toolforge? > Developer account usernames of requestors: **wikiproject1** This account does not exist. [19:41:32] 06cloud-services-team, 10Toolforge: Investigate daily disconnections of IRC bots hosted in Toolforge - https://phabricator.wikimedia.org/T400223#11075485 (10Danilo) I have logs that register quit when I was online. The oldest register of bots disconnecting daily at same time by "Read error: Connection reset by... [20:15:04] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [20:19:33] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? 
- https://phabricator.wikimedia.org/T401347#11075651 (10DamianZaremba) The disk usage is actually reported as less since I split 1 table into 2 (re-creating it), likely the files got optimised since we haven't updated the binary... [20:21:04] (03CR) 10BryanDavis: Add Trixie images (032 comments) [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/1177345 (https://phabricator.wikimedia.org/T400255) (owner: 10Majavah) [20:21:24] 06cloud-services-team, 10Cloud-VPS (Project-requests): Trove for cluebotng-review? - https://phabricator.wikimedia.org/T401347#11075660 (10DamianZaremba) Note: Reading above refers primarily to `cbng-trainer` which is running jobs under another account, so most of that data stays within the WMCS ecosystem. [20:49:40] 06cloud-services-team, 10Data-Services, 06Data-Persistence: Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11075772 (10Ladsgroup) In our discussions back then during the budgeting. The idea of these new hosts were these: - Adding one instance (not host)... [20:50:35] 06cloud-services-team, 10Data-Services, 06Data-Persistence: Decide how to use the new clouddb hosts (clouddb102[2-5]) - https://phabricator.wikimedia.org/T401295#11075773 (10Ladsgroup) Obviously, the map of which section to where was not defined. 
[21:00:07] 10Cloud-VPS (Project-requests): Request creation of VPS project - https://phabricator.wikimedia.org/T401619#11075793 (10AlvinDulle) Still working on it, How to make it run on Tool forge Got some errors thats why i redirected link to gitlab [21:56:09] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [21:58:40] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [22:00:05] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:00:35] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:00:54] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [22:01:06] (03update) 10raymond-ndibe: [cli] Change port type to allow protocol suffix [repos/cloud/toolforge/jobs-cli] - 
10https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/115 (https://phabricator.wikimedia.org/T400024) [22:05:34] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [22:06:34] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:16:35] RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Node tools-k8s-worker-103 has at least 12 procs in D state, and may be having NFS/IO issues - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [23:35:26] 10VPS-project-Wikistats: Add rkiwiki to wikistats - https://phabricator.wikimedia.org/T392504#11076231 (10Dzahn) 05Open→03Resolved a:03Dzahn ` MariaDB [wikistats]> insert into wikipedias (prefix, lang, loclang, method) values ("rki","Rakhine","ရခိုင်ဘ&#x... 
[23:36:23] 10VPS-project-Wikistats: Add minwikibooks to wikistats - https://phabricator.wikimedia.org/T395504#11076240 (10Dzahn) 05Stalled→03In progress [23:37:50] 10VPS-project-Wikistats: Add minwikibooks to wikistats - https://phabricator.wikimedia.org/T395504#11076246 (10Dzahn) 05In progress→03Resolved a:03Dzahn ` MariaDB [wikistats]> insert into wikibooks (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix="min"; d... [23:39:58] 10VPS-project-Wikistats: Add zghwiktionary to wikistats - https://phabricator.wikimedia.org/T399790#11076251 (10Dzahn) 05Stalled→03Resolved a:03Dzahn ` MariaDB [wikistats]> insert into wiktionaries (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix="zgh"; d... [23:41:35] 10VPS-project-Wikistats: Add madwikisource to wikistats - https://phabricator.wikimedia.org/T391772#11076257 (10Dzahn) 05Open→03Resolved a:03Dzahn ` MariaDB [wikistats]> insert into wikisources (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix="mad"; dzahn@... [23:42:57] 10VPS-project-Wikistats: Add tlwikisource to Wikistats - https://phabricator.wikimedia.org/T388678#11076264 (10Dzahn) 05Stalled→03Resolved a:03Dzahn ` MariaDB [wikistats]> insert into wikisources (prefix, lang, loclang, method) select prefix,lang,loclang,method from wikipedias where prefix="tl"; dzah... [23:44:12] 10cloud-services-team (Hardware), 06DC-Ops, 10ops-eqiad, 06SRE: Q2:rack/setup/install cloudvirt10[68-76] - https://phabricator.wikimedia.org/T382492#11076268 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host cloudvirt1068.eqiad.wmnet with OS bookworm compl... 
[23:56:56] 06cloud-services-team, 06Trust-and-Safety, 07LDAP: Reset developer account password and email address for "taxonbot" user - https://phabricator.wikimedia.org/T398220#11076282 (10doctaxon) @bd808 : The new email address is dr.taxon[at]gmail.com [23:58:49] 10Tool-wikicordo: Limit input on Tool-wikicordo resets to 100 when set below 100 - https://phabricator.wikimedia.org/T401232#11076284 (10Ladsgroup) 05Open→03Resolved a:03Ladsgroup Fixed by https://gitlab.wikimedia.org/toolforge-repos/wikicordo/-/commit/03a09a0e2b5bda654f837047746f7d9e72fec91f