[00:01:03] RECOVERY - Wikitech-static main page has content on wikitech-static.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 29695 bytes in 4.365 second response time https://wikitech.wikimedia.org/wiki/Wikitech-static
[00:05:55] FIRING: MaxConntrack: Max conntrack at 80.56% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:08:28] FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:10:55] RESOLVED: MaxConntrack: Max conntrack at 81.04% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:12:55] FIRING: MaxConntrack: Max conntrack at 80.97% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:13:28] RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:22:55] RESOLVED: MaxConntrack: Max conntrack at 80.7% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:23:55] FIRING: MaxConntrack: Max conntrack at 80.81% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking -
https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:28:10] RESOLVED: MaxConntrack: Max conntrack at 80.61% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:31:28] FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate coibot.linkwatcher.eqiad.wmflabs is about to expire in 26d 23h 48m 37s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[00:33:55] FIRING: MaxConntrack: Max conntrack at 80.87% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:48:10] RESOLVED: MaxConntrack: Max conntrack at 80.1% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:54:25] FIRING: MaxConntrack: Max conntrack at 86.69% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:59:25] RESOLVED: MaxConntrack: Max conntrack at 86.69% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[03:14:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on cloudcephosd1027:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet -
https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[03:34:48] RESOLVED: PuppetZeroResources: Puppet has failed generate resources on cloudcephosd1027:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[06:36:33] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: Fix padding between two rows - https://phabricator.wikimedia.org/T365843#10056661 (KCVelaga) @KKsurendran06 are you still able to work on the fixes? Just checking in on the progress
[08:17:37] (PS8) David Caro: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451
[08:17:37] (PS12) David Caro: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443
[08:17:37] (PS4) David Caro: ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469
[08:17:38] (PS4) David Caro: undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470
[08:17:39] (PS3) David Caro: ceph.drain/undrain_*: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473
[08:17:40] (PS4) David Caro: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060173
[08:17:45] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.drain_rack
[08:17:54] !log dcaro@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=97)
[08:26:39] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack (T371878)
[08:26:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:26:45] T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[08:26:52] !log dcaro@urcuchillay admin END (FAIL) - Cookbook
wmcs.ceph.osd.drain_rack (exit_code=99) (T371878)
[08:26:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:27:39] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack (T371878)
[08:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:32:48] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=99) (T371878)
[08:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:32:54] T371878: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878
[08:33:06] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack
[08:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:37:13] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=99)
[08:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:37:16] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack (T371878)
[08:37:17] !log dcaro@urcuchillay admin END (ERROR) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=97) (T371878)
[08:37:20] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack
[08:37:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:37:50] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.osd.drain_rack (exit_code=99)
[08:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:43:31] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.osd.drain_rack
[08:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:46:28] !log dcaro@urcuchillay admin END (PASS) - Cookbook
wmcs.ceph.osd.drain_rack (exit_code=0)
[08:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:47:10] (PS1) David Caro: ceph.{drain,undrain}*: only drain/undrain osds that are in/out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061951
[08:48:32] (CR) David Caro: [C:+2] ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060173 (owner: David Caro)
[08:48:36] (CR) David Caro: [C:+2] ceph.drain/undrain_*: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473 (owner: David Caro)
[08:48:41] (CR) David Caro: [C:+2] undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470 (owner: David Caro)
[08:48:45] (CR) David Caro: [C:+2] ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469 (owner: David Caro)
[08:48:48] (CR) David Caro: [C:+2] ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[08:48:52] (CR) David Caro: [C:+2] ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[08:52:26] (CR) CI reject: [V:-1] ceph.{drain,undrain}*: only drain/undrain osds that are in/out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061951 (owner: David Caro)
[08:52:57] !log dcaro@urcuchillay admin START - Cookbook wmcs.ceph.wait_for_rebalance
[08:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:53:25] (Merged) jenkins-bot: ceph.bootstrap_and_add: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060451 (owner: David Caro)
[08:53:25] (Merged) jenkins-bot: ceph.undrain: use the size of the drive in TiB as weight [cloud/wmcs-cookbooks] -
https://gerrit.wikimedia.org/r/1060443 (owner: David Caro)
[08:53:50] (Merged) jenkins-bot: ceph.drain*: use --osd-hostname and --cluster-name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060469 (owner: David Caro)
[08:53:50] (Merged) jenkins-bot: undrain_node: wait by default [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060470 (owner: David Caro)
[08:53:51] (Merged) jenkins-bot: ceph.drain/undrain_*: allow filtering by osd-id [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060473 (owner: David Caro)
[08:53:51] (Merged) jenkins-bot: ceph.{drain,undrain}: fix chunking [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1060173 (owner: David Caro)
[08:56:22] (PS2) David Caro: ceph.{drain,undrain}*: only drain/undrain osds that are in/out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061951
[09:01:21] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:02:32] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
[09:02:53] (PS8) David Caro: toolforge.component.deploy: remove the k8s prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059890
[09:02:53] (PS9) David Caro: toolforge.component.deploy: use bump_ as default branch [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905
[09:02:54] (PS12) David Caro: wmcs_libs.common: add run_script [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059906
[09:02:54] (PS14) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[09:02:55] (PS12) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[09:02:57] (PS14) David Caro: toolforge.deploy: run tests and add note to MR [cloud/wmcs-cookbooks]
- https://gerrit.wikimedia.org/r/1059921
[09:03:01] (PS1) David Caro: run_tests: update the deploy repo at the same time [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061954
[09:04:09] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:08:58] Cloud-VPS (Debian Buster Deprecation), Wikispore: Rebuild Wikispore Vagrant boxes on Bullseye or Bookworm - https://phabricator.wikimedia.org/T365934#10056968 (Tgr) Packaging the vagrant box (via `vagrant package`) for export fails with ` tgr@wikispore-test:/srv/mediawiki-vagrant$ vagrant package ==> de...
[09:09:31] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[09:13:42] (PS1) David Caro: deploy: wait by default for the k8s components to finish deploying [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061957
[09:13:58] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:14:05] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
[09:21:11] Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10056996 (Tgr) >>! In T371977#10054698, @DavidBrooks wrote: > T...
[09:39:32] (PS2) David Caro: deploy: wait by default for the k8s components to finish deploying [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061957
[09:40:10] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:41:54] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
[09:45:08] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:50:20] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[09:50:52] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
[09:51:20] (CR) David Caro: [V:+1] "Tested both in tools and toolsbeta" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061954 (owner: David Caro)
[09:51:25] (CR) David Caro: [V:+1] deploy: wait by default for the k8s components to finish deploying [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061957 (owner: David Caro)
[09:51:40] (CR) David Caro: [V:+1] "Deployed the latest api-gateway with this both in tools and toolsbeta" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061957 (owner: David Caro)
[09:57:09] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
[09:58:22] cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10057085 (dcaro)
[09:58:28] (update) dcaro: api-gateway: bump to 0.0.37-20240805134412-40551e32 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/474 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:58:28] (approved) dcaro: api-gateway:
bump to 0.0.37-20240805134412-40551e32 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/474 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:58:48] (update) dcaro: jobs-api: bump to 0.0.327-20240805134054-27dfc026 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/473 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:58:53] (merge) dcaro: api-gateway: bump to 0.0.37-20240805134412-40551e32 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/474 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[09:59:02] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[10:01:11] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[10:18:56] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[10:24:02] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[10:24:25] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[10:29:01] (update) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[10:30:11] (open) dcaro: d/changelog: bump to 0.103.10 [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/51 (https://phabricator.wikimedia.org/T369569)
[10:30:31] (update) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] -
https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/111 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:30:41] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[10:30:53] (approved) dcaro: jobs-api: bump to 0.0.327-20240805134054-27dfc026 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/473 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:30:58] (update) dcaro: jobs-api: bump to 0.0.327-20240805134054-27dfc026 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/473 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:31:25] (merge) dcaro: jobs-api: bump to 0.0.327-20240805134054-27dfc026 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/473 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:31:42] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
[10:33:10] (approved) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/111 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:33:14] (merge) dcaro: poetry: Autoupdate [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/111 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[10:33:57] (update) dcaro: d/changelog: bump to 0.103.10 [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/51 (https://phabricator.wikimedia.org/T369569)
[10:36:09] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38:
jobs-api: bump to 0.0.328-20240812103324-5e8e6c58 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/477
[10:37:21] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
[11:18:40] (CR) Jaime Nuche: "@hi@taavi.wtf Does the change to the config look good? I was hoping to get this merged" [labs/tools/train-blockers] - https://gerrit.wikimedia.org/r/1055894 (owner: Jaime Nuche)
[11:35:43] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
[11:35:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:36:48] !log dcaro@urcuchillay toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
[11:36:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:37:03] !log dcaro@urcuchillay toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
[11:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:42:58] !log dcaro@urcuchillay toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
[11:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:44:09] (PS1) David Caro: toolforge.component.deploy: fix MR commenting for packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061976
[11:44:50] (PS15) David Caro: toolforge.run_tests: use the functional tests [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059907
[11:44:50] (PS13) David Caro: openstack.tofu: use run_script instead of reimplementing it [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059919
[11:44:51] (PS15) David Caro: toolforge.deploy: run tests and
add note to MR [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059921
[11:44:52] (PS3) David Caro: deploy: wait by default for the k8s components to finish deploying [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061957
[11:45:00] (Abandoned) David Caro: toolforge.component.deploy: fix MR commenting for packages [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061976 (owner: David Caro)
[11:45:12] (Abandoned) David Caro: run_tests: update the deploy repo at the same time [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061954 (owner: David Caro)
[11:46:08] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
[11:46:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:51:12] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (T363344)
[11:51:17] !log dcaro@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97) (T363344)
[11:51:18] T363344: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344
[11:51:19] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (T363344)
[11:51:29] !log dcaro@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97) (T363344)
[11:51:51] !log dcaro@cloudcumin1001 admin START - Cookbook wmcs.ceph.osd.undrain_node (T363344)
[11:51:59] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
[11:52:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:52:18] !log dcaro@urcuchillay admin END (FAIL) - Cookbook wmcs.ceph.wait_for_rebalance (exit_code=99)
[11:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:53:48] (CR) David Caro: [C:+2] "Tested with cloudcephosd1037:" [cloud/wmcs-cookbooks] -
https://gerrit.wikimedia.org/r/1061951 (owner: David Caro)
[11:54:26] (approved) dcaro: d/changelog: bump to 0.103.10 [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/51 (https://phabricator.wikimedia.org/T369569)
[11:54:29] (update) dcaro: d/changelog: bump to 0.103.10 [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/51 (https://phabricator.wikimedia.org/T369569)
[11:54:31] (merge) dcaro: d/changelog: bump to 0.103.10 [repos/cloud/toolforge/tools-webservice] - https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/51 (https://phabricator.wikimedia.org/T369569)
[11:57:48] (Merged) jenkins-bot: ceph.{drain,undrain}*: only drain/undrain osds that are in/out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1061951 (owner: David Caro)
[11:59:02] (CR) David Caro: toolforge.component.deploy: use bump_ as default branch (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1059905 (owner: David Caro)
[12:00:19] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component components-api
[12:00:24] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
[12:03:09] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:05:28] !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[12:06:50] (open) raymond-ndibe: Draft: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066)
[12:11:09] !log dcaro@cloudcumin1001 toolsbeta START -
Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:16:24] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[12:22:00] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10057377 (Jgiannelos) Hey, after some reports of CI failures caused...
[12:25:18] (update) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[12:25:45] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[12:29:10] (update) raymond-ndibe: Draft: [maintain-kubeusers] increment default quota for pods, cpu, mem [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/58 (https://phabricator.wikimedia.org/T341066)
[12:31:56] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[12:32:33] (approved) dcaro: jobs-api: bump to 0.0.328-20240812103324-5e8e6c58 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/477 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:32:36] (merge) dcaro: jobs-api: bump to 0.0.328-20240812103324-5e8e6c58 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/477 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[12:33:11] (update) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs
[repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[13:12:24] (close) raymond-ndibe: Draft: [maintain-kubeusers] increment default quota for pods, cpu, mem [repos/cloud/toolforge/maintain-kubeusers] - https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/58 (https://phabricator.wikimedia.org/T341066)
[13:20:08] (update) raymond-ndibe: Draft: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[13:22:22] Data-Services: enwiki.analytics.db.svc.wikimedia.cloud still on replag for more than a week - https://phabricator.wikimedia.org/T372224#10057648 (fnegri) Replag is back to zero on enwiki.analytics.db.svc.wikimedia.cloud, but only because we temporarily depooled clouddb1017 (T367856#10057258) and all traf...
[13:23:57] (update) raymond-ndibe: Draft: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066)
[13:24:37] (update) raymond-ndibe: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[13:26:26] (update) raymond-ndibe: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066)
[13:27:56] (update) raymond-ndibe: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066)
[13:29:44] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10057688 (Andrew) On 2024-08-05 I sent this email to the releng and...
[13:39:49] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/32 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:40:48] (update) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/32 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:41:32] (merge) dcaro: pre-commit: Autoupdate [repos/cloud/toolforge/api-gateway] - https://gitlab.wikimedia.org/repos/cloud/toolforge/api-gateway/-/merge_requests/32 (owner: group_203_bot_4866fc124f4b41659f667468a6115cf3)
[13:43:22] (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: api-gateway: bump to 0.0.38-20240812134143-8502acf1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/478
[14:12:18] Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10057898 (dcaro) So there's three related tables in the postrges database, `execution`, `task` and `schedule`, where the `vendor_type` is `EXECUTION_SWEEP`. The code that throws t...
[14:12:39] Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10057899 (dcaro) To get to the DB: ` root@tools-harbor-1:~# cat /srv/ops/harbor/harbor.yml | grep '^ password:'| cut -d: -f2; psql -h 5ujoynvlt5c.svc.trove.eqiad1.wikimedia.clou...
[14:20:04] Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10057923 (dcaro) jobs are stored in redis: ` root@toolsbeta-harbor-1:~# docker exec -ti redis redis-cli -n 2 127.0.0.1:6379[2]> scan 0 match *9ab67b659a81a000d4a945e4* count 10000...
[14:21:07] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10057924 (Jgiannelos) Given that current production works on bullsey...
[14:25:21] Tool-yearinreview, Indic MediaWiki Developers UG, Indic-TechCom: Fix padding between two rows - https://phabricator.wikimedia.org/T365843#10057935 (KKsurendran06) hey @KCVelaga It's still a work in progress. I've been occupied with other commitments for a while. I plan to resume development in about...
[14:25:41] cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) - https://phabricator.wikimedia.org/T370460#10057938 (Andrew) >>! In T370460#10057924, @Jgiannelos wrote: > Any...
[15:07:25] Data-Services, DBA: Prepare and check storage layer for bdrwiki - https://phabricator.wikimedia.org/T371759#10058103 (ABran-WMF) Open→In progress p:Triage→Medium All done, ready for the views creation.
[15:13:15] !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component api-gateway [15:19:02] !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway [15:27:43] !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component api-gateway [15:33:02] !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway [15:34:09] (approved) dcaro: api-gateway: bump to 0.0.38-20240812134143-8502acf1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/478 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:34:11] (update) dcaro: api-gateway: bump to 0.0.38-20240812134143-8502acf1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/478 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:34:12] (merge) dcaro: api-gateway: bump to 0.0.38-20240812134143-8502acf1 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/478 (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38) [15:44:11] !log dcaro@cloudcumin1001 admin END (ERROR) - Cookbook wmcs.ceph.osd.undrain_node (exit_code=97) (T363344) [15:44:16] T363344: Q4:rack/setup/install cloudcephosd10[35-38] - https://phabricator.wikimedia.org/T363344 [16:20:13] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [gabina] - https://phabricator.wikimedia.org/T372153#10058392 (bd808) @Gabinaluz Please use the second half of the instructions at https://wikitech.wikimedia.org/wiki/Password_and_2FA_res...
[16:27:14] (open) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:34:30] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:35:57] (update) dcaro: Draft: Run as a build service tool [toolforge-repos/fourohfour] - https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/10 (owner: taavi) [16:36:09] (update) dcaro: Run as a build service tool [toolforge-repos/fourohfour] - https://gitlab.wikimedia.org/toolforge-repos/fourohfour/-/merge_requests/10 (owner: taavi) [16:37:37] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:38:16] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:40:55] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [gabina] - https://phabricator.wikimedia.org/T372153#10058471 (Gabinaluz) Thank you. I followed the instructions. Please check `bastion-eqiad1-03.bastion.eqiad1.wikimedia.cloud:/home/gabi...
[16:41:30] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:42:12] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:43:28] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:46:34] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:50:55] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [16:57:33] cloud-services-team, DC-Ops, ops-codfw, SRE: Test new hardware candidate for cloudbackup replacement - https://phabricator.wikimedia.org/T353746#10058585 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host testhost2001.codfw.wmnet with OS bookworm [16:58:14] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[17:01:44] (PS2) BryanDavis: config: Index https://gitlab.wikimedia.org/toolforge-repos/* [labs/codesearch] - https://gerrit.wikimedia.org/r/1060493 (https://phabricator.wikimedia.org/T371992) [17:02:10] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:03:54] (CR) BryanDavis: config: Index https://gitlab.wikimedia.org/toolforge-repos/* (1 comment) [labs/codesearch] - https://gerrit.wikimedia.org/r/1060493 (https://phabricator.wikimedia.org/T371992) (owner: BryanDavis) [17:05:49] (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321) [17:07:09] RESOLVED: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:16:09] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [17:17:40] cloud-services-team (FY2023/2024-Q3-Q4), Data-Services: [wikireplicas] frequent replag spikes in clouddb hosts - https://phabricator.wikimedia.org/T367778#10058676 (fnegri) > I repooled clouddb1019, and reverted my change in the --busy-time parameter of clouddb1015. Let's see what happens to replag in th...
[17:19:06] (update) raymond-ndibe: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066) [17:22:01] (update) raymond-ndibe: [jobs-api] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/115 (https://phabricator.wikimedia.org/T341066) [17:23:52] (update) raymond-ndibe: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066) [17:24:23] (update) raymond-ndibe: [jobs-cli] multi-replica support for continuous jobs [repos/cloud/toolforge/jobs-cli] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/63 (https://phabricator.wikimedia.org/T341066) [17:25:09] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [gabina] - https://phabricator.wikimedia.org/T372153#10058713 (bd808) Open→Resolved a:bd808 `lang=shell-session root@bastion-eqiad1-03:~# ls -lh /home/gabina/2fa-reset-req... [17:30:41] Toolforge (Toolforge iteration 14), Patch-For-Review: [jobs-api,jobs-cli] Support multiple replicas of continuous jobs - https://phabricator.wikimedia.org/T341066#10058721 (Raymond_Ndibe) Open→In progress [17:35:26] cloud-services-team, DC-Ops, ops-codfw, SRE: Test new hardware candidate for cloudbackup replacement - https://phabricator.wikimedia.org/T353746#10058736 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host testhost2001.codfw.wmnet with OS bookworm comple...
[17:49:21] Cloud-VPS, Striker, Tool-gitlab-account-approval, Tool-phab-ban, and 6 others: Removal of writeapi from siteinfo output breaks all mwclient-based bots, including stashbot (Server Admin Log) - https://phabricator.wikimedia.org/T371977#10058772 (AdamWill) mwclient 0.11.0 is now released with this f... [17:51:58] VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10058779 (bd808) >>! In T371993#10054106, @Tgr wrote: > The other major fallout was {T372017}. AWB is still using SVN so that sounds like a challenge. I don't know that codesearch has tried usi... [18:02:57] Tool-phab-ban: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372310 (bd808) NEW [18:03:12] Tool-phab-ban: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372310#10058844 (bd808) [18:04:00] Tool-gitlab-account-approval: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372311 (bd808) NEW [18:04:16] Tool-gitlab-account-approval: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372311#10058858 (bd808) [18:04:55] Tool-schedule-deployment: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372312 (bd808) NEW [18:05:00] Tool-schedule-deployment: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372312#10058872 (bd808) [18:05:26] Striker: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372313 (bd808) NEW [18:05:47] Striker: Upgrade to mwclient 0.11.0 - https://phabricator.wikimedia.org/T372313#10058888 (bd808) [18:40:03] VPS-project-Codesearch: Index known popular MediaWiki client libraries - https://phabricator.wikimedia.org/T371993#10059033 (Ladsgroup) It's still better to migrate off SVN in general. @Reedy wanted to look into it. I don't know if he did.
[20:00:28] Cloud-VPS, Bitu, Infrastructure-Foundations: Find or create .deb package for mwclient 0.11.0 (or mwclient 0.10.0 with writeapi dependency removed) - https://phabricator.wikimedia.org/T372345 (bd808) NEW [20:26:34] Cloud-VPS (Project-requests), Beta-Cluster-Infrastructure: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353 (bd808) NEW [20:33:34] cloud-services-team: radosgw+keystone chokes on projects with '-' in their id - https://phabricator.wikimedia.org/T341509#10059777 (bd808) [21:16:24] FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning [21:31:03] FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses [21:39:06] Cloud-VPS (Debian Buster Deprecation), Humaniki: Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#10060016 (Maximilianklein) update for 2024-08-12 [x] create cinder volume. [x] move project code [x] move mysql-db files [x] create a new debian bookworm inst...
[22:25:04] cloud-services-team, Cloud-VPS, ARM support: Support terraform.wmcloud.org/registry/cloudvps on MacOS arm64 clients - https://phabricator.wikimedia.org/T372361 (bd808) NEW [23:17:16] Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365 (bd808) NEW [23:24:39] Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10060191 (bd808) a:bd808 I'm not quite sure how to start troubleshooting this at the moment... I guess I will start by digging around in logs. [23:25:34] Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10060197 (KFrancis) Hi @Htriedman, my apologies for the delay in getting back to you. Please send your pers... [23:37:34] Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10060208 (bd808) https://gitlab.wikimedia.org/bd808/deployment-prep-opentofu is the tofu config I was trying to apply. The only thing not committ... [23:57:44] Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10060218 (bd808) Things are well and truly broken now: `counterexample $ tofu plan Planning failed. OpenTofu encountered an error while generatin...