[00:15:28] <wmcs-alerts>	 (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:28] <wmcs-alerts>	 (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:53:31] <wmcs-alerts>	 (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8612807079154386/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[01:54:47] <wikibugs>	 Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195#9553976 (Sj) It seems Naypta is not around. Someone else has tentatively offered to take over bot maintenance, and could use guidance, see [[ https://en.wikipedia.org/...
[02:47:47] <wikibugs>	 Grid-Engine-to-K8s-Migration, Chinese-Sites: Migrate zhwiki-perm-qualicheck from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357568#9553986 (Shizhao)
[03:58:31] <wmcs-alerts>	 (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8627037458764815/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[06:58:31] <wmcs-alerts>	 (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8642322504975275/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[07:59:48] <wikibugs>	 cloud-services-team, sre-alert-triage: Alert in need of triage: Wikitech-static MW version up to date (instance wikitech-static.wikimedia.org) - https://phabricator.wikimedia.org/T357880#9554174 (LSobanski)
[08:37:48] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, sre-alert-triage: Alert in need of triage: Wikitech-static MW version up to date (instance wikitech-static.wikimedia.org) - https://phabricator.wikimedia.org/T357880#9554249 (Peachey88)
[08:51:26] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554256 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198  wd-shex-infer: update al...
[08:52:44] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554262 (dcaro) >>! In T357209#9553241, @LucasWerkmeister wrote: > I guess I also need the [limitrange](https://kubernetes.io/docs/tasks/administer-clu...
[08:57:02] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[08:57:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:57:31] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[08:57:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:07] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[09:03:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:37] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[09:03:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:10:41] <wikibugs>	 Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138#9554279 (taavi) Still looks good: `lang=shell-session ubuntu@dbapp:~$ df -h /var/lib/postgresql/ Filesystem      Size  Used Avail Use% Mounted on /dev/sdb        501G  189G...
[09:10:45] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554280 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198  wd-shex-infer: update al...
[09:11:56] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554281 (dcaro) Updated the limitrange: ` root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get limitrange tool-wd-shex-infer -o json | jq '.sp...
[09:12:30] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554283 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudweb1004.wikimedia.org with OS bullseye
[09:31:44] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554295 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12  quota: allow overriding...
[09:31:49] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554297 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/...
[09:34:51] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554298 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199  maintain-kubeusers: bump...
[09:40:25] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554308 (dcaro) p:Triage→Medium
[09:40:31] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554311 (dcaro) Open→In progress
[09:40:46] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554313 (dcaro)
[09:40:52] <wikibugs>	 Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554312 (dcaro)
[09:41:09] <wikibugs>	 Toolforge (Toolforge iteration 05), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554311 (dcaro)
[09:41:57] <wikibugs>	 Toolforge (Toolforge iteration 05), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554318 (CodeReviewBot) dcaro upd...
[09:49:01] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:49:01] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=99) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:49:07] <stashbot>	 T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[09:52:10] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:55:15] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554327 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudweb1004.wikimedia.org with OS bullseye completed: - cloudweb1004 (**WARN**...
[09:58:31] <wmcs-alerts>	 (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8655062025652949/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[09:59:34] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554329 (taavi)
[10:04:18] <wikibugs>	 Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554335 (aborrero)
[10:09:12] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=0) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[10:09:18] <stashbot>	 T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[10:10:05] <icinga-wm_>	 PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[10:10:21] <icinga-wm_>	 PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[10:13:39] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554371 (taavi) Open→Resolved This is complete, and I migrated all cloudweb hosts to Puppet 7.
[10:22:20] <wikibugs>	 Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554417 (Slst2020) a:Slst2020
[10:26:23] <wikibugs>	 (PS9) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[10:30:49] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on cloudvirt2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:30:53] <wikibugs>	 cloud-services-team: PuppetZeroResources  Zero Puppet resources on cloudvirt2004-dev:9100 - https://phabricator.wikimedia.org/T357886#9554423 (phaultfinder)
[10:33:48] <jinxer-wm>	 (PuppetZeroResources) firing: Puppet has failed generate resources on cloudnet2008-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:33:54] <wikibugs>	 cloud-services-team: PuppetZeroResources  Zero Puppet resources on cloudnet2008-dev:9100 - https://phabricator.wikimedia.org/T357887#9554430 (phaultfinder)
[10:35:45] <jinxer-wm>	 (WidespreadPuppetFailure) firing: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[10:35:52] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554442 (phaultfinder)
[10:35:57] <jinxer-wm>	 (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:40:52] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554460 (phaultfinder)
[10:40:57] <jinxer-wm>	 (PuppetZeroResources) firing: (9) Puppet has failed generate resources on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:45:45] <jinxer-wm>	 (WidespreadPuppetFailure) firing: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[10:45:53] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554481 (phaultfinder)
[10:45:57] <jinxer-wm>	 (PuppetZeroResources) firing: (14) Puppet has failed generate resources on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:48:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554499 (phaultfinder)
[10:48:54] <jinxer-wm>	 (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudcontrol2007-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:50:53] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554442 (phaultfinder)
[10:51:02] <jinxer-wm>	 (PuppetZeroResources) firing: (19) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:52:26] <wikibugs>	 Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554507 (Slst2020) Open→In progress
[10:55:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554518 (phaultfinder)
[10:56:06] <jinxer-wm>	 (PuppetZeroResources) firing: (20) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:58:48] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:58:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554521 (phaultfinder)
[11:00:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554524 (phaultfinder)
[11:01:06] <jinxer-wm>	 (PuppetZeroResources) firing: (25) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:05:58] <jinxer-wm>	 (PuppetZeroResources) firing: (24) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:08:49] <jinxer-wm>	 (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:10:53] <jinxer-wm>	 (PuppetZeroResources) firing: (24) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:11:27] <wikibugs>	 cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9554542 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS...
[11:13:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:13:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554553 (phaultfinder)
[11:15:54] <wikibugs>	 cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554564 (phaultfinder)
[11:16:07] <jinxer-wm>	 (PuppetZeroResources) firing: (23) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:18:49] <jinxer-wm>	 (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:20:45] <jinxer-wm>	 (WidespreadPuppetFailure) firing: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:21:02] <jinxer-wm>	 (PuppetZeroResources) firing: (23) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:27:36] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81
[11:28:17] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81
[11:28:54] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:30:45] <jinxer-wm>	 (WidespreadPuppetFailure) resolved: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:30:55] <wikibugs>	 Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554617 (taavi) a:taavi
[11:32:33] <wikibugs>	 Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9554621 (dcaro)
[11:35:24] <wikibugs>	 Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554637 (taavi)
[11:35:28] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-k8s-worker-81 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:39:01] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster
[11:39:01] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:39:20] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82
[11:40:00] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82
[11:40:28] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-k8s-worker-81 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:40:36] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:43:25] <wikibugs>	 Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554675 (Slst2020) @Albertoleoncio Your tool has been granted write access to Elasticsearch now. The credentials are available to your tool as [[ https://wikitech.wikimedia.o...
[11:44:18] <wikibugs>	 Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554676 (Slst2020) In progress→Resolved
[11:44:43] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554678 (taavi)
[11:44:45] <wikibugs>	 Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554679 (taavi)
[11:45:04] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9287249 (taavi) `digero` is currently using 144G of storage.
[11:50:05] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster
[11:50:05] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:50:28] <wmcs-alerts>	 (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-38 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:51:45] <wikibugs>	 Tools: wiki-osm.pl: Use of uninitialized value within @kml in lc at /data/project/osm4wiki/public_html/cgi-bin/wiki/wiki-osm.pl line 166. - https://phabricator.wikimedia.org/T357899#9554688 (taavi)
[11:53:10] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83
[11:53:49] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83
[11:54:06] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:55:28] <wmcs-alerts>	 (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-38 in project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:56:18] <jinxer-wm>	 (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudgw2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[12:00:50] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:00:54] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:00:56] <stashbot>	 T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:02:17] <wikibugs>	 (PS10) Arturo Borrero Gonzalez: openstack: cloudvirt: add post-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004116 (https://phabricator.wikimedia.org/T357765)
[12:02:28] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:02:31] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:03:58] <wikibugs>	 cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9554726 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS book...
[12:04:08] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster
[12:04:09] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:05:26] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84
[12:06:06] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84
[12:08:00] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:11:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:12:28] <wmcs-alerts>	 (InstanceDown) firing: Project tools instance tools-k8s-worker-84 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:15:01] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554766 (jberkel) I've deleted tmp and other unused stuff it's now down to 16GB, is that acceptable?
[12:16:03] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554769 (taavi) That's better. Is there a way to ensure I don't need to manually ping here each time this happens?
[12:16:41] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:17:28] <wmcs-alerts>	 (InstanceDown) resolved: Project tools instance tools-k8s-worker-84 is down   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:17:48] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554772 (jberkel) I'll add a command to automatically clear the tmp storage, that should help
[12:18:33] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster
[12:18:34] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:19:43] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85
[12:20:25] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85
[12:23:50] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:24:34] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:25:18] <wikibugs>	 (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1004654 (owner: L10n-bot)
[12:30:18] <wikibugs>	 Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554800 (taavi)
[12:30:25] <wikibugs>	 Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554813 (taavi)
[12:30:28] <wmcs-alerts>	 (PuppetAgentNoResources) firing: No Puppet resources found on instance tools-k8s-worker-nfs-38 on project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:30:33] <wikibugs>	 Toolforge (Toolforge iteration 05), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#7145936 (taavi)
[12:32:25] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud
[12:32:43] <wikibugs>	 (PS11) Arturo Borrero Gonzalez: openstack: cloudvirt: add post-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004116 (https://phabricator.wikimedia.org/T357765)
[12:32:49] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:32:54] <stashbot>	 T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:33:11] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:33:24] <wikibugs>	 Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554822 (aborrero) +1
[12:33:46] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud
[12:33:47] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase (T357901)
[12:33:51] <stashbot>	 T357901: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901
[12:33:55] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T357901)
[12:34:29] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:34:45] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[12:34:52] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10cloud-services-team, 10Kubernetes, 10Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#9554833 (10taavi)
[12:34:58] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99)
[12:35:53] <wikibugs>	 Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554831 (taavi) Open→Resolved a:taavi
[12:39:10] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[12:39:23] <logmsgbot_cloud>	 !log aborrero@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99)
[12:40:28] <wmcs-alerts>	 (PuppetAgentNoResources) resolved: No Puppet resources found on instance tools-k8s-worker-nfs-38 on project tools   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:42:15] <wikibugs>	 (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1004654 (owner: L10n-bot)
[12:42:50] <wikibugs>	 10cloud-services-team, 10User-aborrero: openstack: nova refuses to admit a compute node after a reimage - https://phabricator.wikimedia.org/T357631#9554855 (10aborrero) Update, after trying the procedure described above by @Andrew I get:  ` Feb 19 12:40:36 cloudvirt1032 nova-compute[27450]: 2024-02-19 12:40:36...
[12:44:32] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster
[12:44:32] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:44:44] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86
[12:45:23] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86
[12:46:13] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:55:00] <wikibugs>	 (CR) David Caro: openstack: cloudvirt: add pre-reimage cookbook (5 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[12:56:20] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster
[12:56:20] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:58:15] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87
[12:58:56] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87
[12:59:45] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:08:09] <wikibugs>	 (CR) Majavah: openstack: cloudvirt: add pre-reimage cookbook (2 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[13:09:39] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9554918 (10LucasWerkmeister) Hm, the job ran now but something didn’t work:  `lang=shell-session $ kubectl logs pod/wd-shex-infer-101-mgxk9...
[13:09:52] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster
[13:09:52] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:09:55] <wikibugs>	 10Toolforge (Quota-requests), 10Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554919 (10LucasWerkmeister) Thanks, the updated limitrange seems to be working!
[13:15:50] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9554938 (10taavi) You seem to be manually adding the volume mounts instead of relying the admission controller, and the code is not adding the `kubernetes.wmcloud.or...
[13:16:19] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5
[13:17:01] <logmsgbot_cloud>	 !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5
[13:34:40] <icinga-wm_>	 PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[13:40:30] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9555003 (10LucasWerkmeister) I see… so the difference to the successful jobs in the test tool is just that I got unlucky with the placement this time. Thanks!  One m...
[14:02:09] <wikibugs>	 10Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9555079 (10taavi)
[14:02:13] <wikibugs>	 10Toolforge, 10cloud-services-team: tools-nfs-2 almost out of disk space (October 2023 edition) - https://phabricator.wikimedia.org/T349895#9555080 (10taavi)
[14:02:19] <wikibugs>	 Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9555076 (taavi) Open→Resolved a:jberkel Thanks!
[14:03:57] <wikibugs>	 Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9555083 (taavi) Open→Resolved We are back to 77% which should be fine for now.
[14:10:00] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9555089 (10taavi) `toolforge: tool` will automatically mount all volumes and add the required config for that. It's not strictly required as you can add the mounts m...
[14:10:07] <wikibugs>	 10Cloud-VPS, 10cloud-services-team, 10User-aborrero: Improve cloudgw filter between VM instances and cloud-private - https://phabricator.wikimedia.org/T356986#9555091 (10cmooney) >>! In T356986#9529581, @taavi wrote: >>  ii - Traffic from VMs to specific cloud-private destinations, using as many rules as nee...
[14:26:12] <wikibugs>	 Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555153 (dcaro) p:Triage→Medium
[14:27:44] <wikibugs>	 10Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555157 (10taavi)
[14:28:47] <wikibugs>	 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555160 (10taavi)
[14:55:52] <wikibugs>	 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555212 (10dcaro) I thought this was closed xd, that's why I opened a new one
[15:02:35] <wikibugs>	 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555220 (10CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12  run: add filelog to buildservice if passed
[15:02:48] <wikibugs>	 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555222 (10CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/61  wrap buildservice
[15:22:43] <wikibugs>	 Tool-Pageviews, Data-Engineering, Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555301 (Framawiki) p:Triage→High
[15:28:38] <wikibugs>	 10Tool-Pageviews, 10Data-Engineering, 10Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555331 (10Framawiki) I don't know if it's just a temporary processing delay, or a breakage. But given the different user reports the same day, I prefer to fill a task.
[15:28:41] <wikibugs>	 10Cloud-Services: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555321 (10Magnus) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task...
[15:35:36] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Build Service, 10Patch-For-Review, 10User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9555388 (10CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/rep...
[15:36:49] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Build Service, 10Patch-For-Review, 10User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9555393 (10CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 ope...
[15:38:29] <wikibugs>	 10Tool-Pageviews, 10Data Products, 10Data-Engineering, 10Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555397 (10lbowmaker)
[15:43:21] <wikibugs>	 10Toolforge, 10User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555422 (10dcaro)
[15:43:24] <wikibugs>	 Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555433 (dcaro) p:Triage→High
[15:43:37] <wikibugs>	 10Toolforge, 10User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555434 (10Raymond_Ndibe)
[15:45:02] <wikibugs>	 10Toolforge, 10User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555447 (10taavi) Dupe of {T348755}?
[15:46:09] <wikibugs>	 10Toolforge, 10User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555452 (10dcaro) >>! In T357915#9555447, @taavi wrote: > Dupe of {T348755}?  yes, I looked into the `Toolforge` tag, not `toolforge jobs api` :facepalm:
[15:47:05] <wikibugs>	 10Toolforge, 10User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555454 (10dcaro)
[15:48:06] <wikibugs>	 10Toolforge, 10Epic: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9555456 (10dcaro)
[15:48:17] <wikibugs>	 10Toolforge, 10Epic, 10User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9246268 (10dcaro)
[15:48:20] <wikibugs>	 10Toolforge, 10Epic, 10User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9555465 (10Raymond_Ndibe)
[15:54:43] <wikibugs>	 Tool-hitaden, Toolforge Build Service: [buildservice,nodejs] nodejs buildpack does not take envvars into account - https://phabricator.wikimedia.org/T353557#9555475 (dcaro) Open→Resolved a:dcaro
[15:55:04] <wikibugs>	 10Toolforge Build Service, 10cloud-services-team (FY2023/2024-Q3-Q4), 10Goal, 10User-Raymond_Ndibe, 10User-aborrero: [harbor] Deploy with Helm - https://phabricator.wikimedia.org/T356301#9555482 (10dcaro)
[15:55:06] <wikibugs>	 10Toolforge, 10cloud-services-team (FY2023/2024-Q3-Q4), 10Goal: Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#9555483 (10dcaro)
[15:55:32] <wikibugs>	 Toolforge Build Service, cloud-services-team (FY2023/2024-Q3-Q4), Goal, User-Raymond_Ndibe, User-aborrero: [harbor] Deploy with Helm - https://phabricator.wikimedia.org/T356301#9555484 (dcaro) p:Triage→Medium
[15:55:50] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service: Build service: Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#9555486 (dcaro) p:Triage→Medium
[15:56:27] <wikibugs>	 Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#9555489 (dcaro) p:Triage→Medium
[15:56:56] <wikibugs>	 Toolforge Build Service: [apt-buildpack] Installed python scripts with a hardcoded shebang to the python binary will not work when installing new pythons - https://phabricator.wikimedia.org/T356500#9555490 (dcaro) p:Triage→Low
[16:02:12] <wikibugs>	 (PS10) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:05:06] <wikibugs>	 (PS11) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:06:32] <wikibugs>	 10Toolforge Build Service, 10Upstream: Python buildpack does not detect requirements from pyproject.toml - https://phabricator.wikimedia.org/T353762#9555515 (10dcaro) I've added the link to this task to all the bulidservice python tutorials for people to discover :)
[16:07:55] <wikibugs>	 (CR) CI reject: [V: -1] openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[16:08:20] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555520 (10dcaro)
[16:10:13] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555528 (10dcaro)
[16:11:18] <wikibugs>	 (PS12) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:11:51] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555529 (dcaro) Open→In progress
[16:12:56] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555539 (10dcaro)
[16:12:59] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9555541 (10dcaro)
[16:13:01] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Jobs framework, 10Patch-For-Review, 10User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9555540 (10dcaro)
[16:13:18] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9449702 (10dcaro)
[16:13:25] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Jobs framework, 10Patch-For-Review, 10User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9509999 (10dcaro)
[16:13:43] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9449702 (10dcaro)
[16:13:46] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9505172 (10dcaro)
[16:13:48] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Epic: Consolidate the Toolforge CLIs - https://phabricator.wikimedia.org/T356262#9555545 (10dcaro)
[16:16:42] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:23:07] <wikibugs>	 (CR) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[16:23:22] <wikibugs>	 (CR) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[17:09:02] <wikibugs>	 10Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555759 (10CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12  run: add filelog to buildservice if passed
[17:09:06] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555760 (10CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12  run: add filelog to buildservice...
[17:14:19] <wikibugs>	 10Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555761 (10JJMC89)
[17:20:15] <wikibugs>	 10Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555789 (10taavi) Did you try rebooting this already?
[17:26:18] <wikibugs>	 10Toolforge, 10cloud-services-team, 10Documentation, 10Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#9555812 (10dcaro)
[17:26:21] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555813 (10dcaro)
[17:27:40] <wikibugs>	 10Tool-Pageviews, 10Data Products, 10Data-Engineering, 10Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555814 (10Sfaci) @BTullis and I have been working on this just a couple of hours ago. A DAG was stuck on Saturday because of a out-of-memory error. We fixe...
[17:31:27] <wikibugs>	 Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555831 (Magnus) Open→Resolved a:Magnus Seems fixed now
[18:35:27] <wikibugs>	 10Wikibugs: Remove legacy taxonomy.py script - https://phabricator.wikimedia.org/T357928#9555982 (10bd808)
[18:36:30] <wikibugs>	 10Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9556002 (10bd808)
[18:36:45] <wikibugs>	 Wikibugs: Replace pywikibot with mwclient in taxonomy.py - https://phabricator.wikimedia.org/T357852#9555999 (bd808) Open→Declined Lets do {T357928} instead per @valhallasw's suggestion.
[18:44:51] <wm-bot2>	 !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[18:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:46:01] <wm-bot2>	 !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[18:46:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:03:44] <wm-bot2>	 !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[19:03:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:04:48] <wm-bot2>	 !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[19:04:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:05:14] <wikibugs>	 10Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9556066 (10LucasWerkmeister) Looks like the required config also includes the `TOOL_DATA_DIR` env variable, so I can probably stop setting that explicitly. (Right no...
[19:17:46] <wikibugs>	 10Toolforge (Toolforge iteration 05), 10Toolforge Build Service, 10Patch-For-Review, 10User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9556089 (10CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/rep...
[20:01:31] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3671 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[20:16:42] <jinxer-wm>	 (CloudVPSDesignateLeaks) firing: (2) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:17:31] <wikibugs>	 10Toolforge Build Service, 10Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#9556194 (10Raymond_Ndibe)
[20:18:13] <wikibugs>	 Toolforge (Toolforge iteration 05), Toolforge Build Service, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9556193 (Raymond_Ndibe) In progress→Resolved
[21:14:16] <wikibugs>	 10Cloud-Services, 10cloud-services-team: Replace or deprecate WMCS uses of report updater - https://phabricator.wikimedia.org/T357856#9556220 (10bd808) @Milimetric, do you know the answer to this question?  If I understand correctly, folks are basically wondering if the https://analytics.wikimedia.org/publishe...
[21:54:50] <wikibugs>	 10Wikibugs, 10User-bd808: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729#9556270 (10bd808) >>! In T357729#9552954, @valhallasw wrote: > Is there any way to get a `tcpdump` for the bot? There's obviously no root access in the container but ma...
[22:08:36] <wikibugs>	 10Cloud-Services, 10cloud-services-team: Replace or deprecate WMCS uses of report updater - https://phabricator.wikimedia.org/T357856#9556274 (10lbowmaker) Thanks @bd808 - I wasn’t aware that was the output and based on those recent-ish tickets I am confident that this is still being used and generated by Repo...
[23:06:31] <wmcs-alerts>	 (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 14753 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[23:27:34] <wikibugs>	 10ToolforgeBundle, 10CopyPatrol, 10Community-Tech (CommTech-Kanban): Session can't be invalidated, causing problems with language selection - https://phabricator.wikimedia.org/T357821#9556377 (10MusikAnimal) >>! In T357821#9554022, @Samwilson wrote: > PR for the lang selection: https://github.com/wikimedia/T...