[00:15:28] (InstanceDown) firing: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:20:28] (InstanceDown) resolved: Project tf-infra-test instance tf-infra-test is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[00:53:31] (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8612807079154386/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[01:54:47] Grid-Engine-to-K8s-Migration: Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320195#9553976 (Sj) It seems Naypta is not around. Someone else has tentatively offered to take over bot maintenance, and could use guidance, see [[ https://en.wikipedia.org/...
[02:47:47] Grid-Engine-to-K8s-Migration, Chinese-Sites: Migrate zhwiki-perm-qualicheck from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T357568#9553986 (Shizhao)
[03:58:31] (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8627037458764815/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[06:58:31] (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8642322504975275/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[07:59:48] cloud-services-team, sre-alert-triage: Alert in need of triage: Wikitech-static MW version up to date (instance wikitech-static.wikimedia.org) - https://phabricator.wikimedia.org/T357880#9554174 (LSobanski)
[08:37:48] cloud-services-team, wikitech.wikimedia.org, sre-alert-triage: Alert in need of triage: Wikitech-static MW version up to date (instance wikitech-static.wikimedia.org) - https://phabricator.wikimedia.org/T357880#9554249 (Peachey88)
[08:51:26] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554256 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198 wd-shex-infer: update al...
[08:52:44] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554262 (dcaro) >>! In T357209#9553241, @LucasWerkmeister wrote: > I guess I also need the [limitrange](https://kubernetes.io/docs/tasks/administer-clu...
[08:57:02] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[08:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:57:31] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[08:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:07] !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
[09:03:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:03:37] !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
[09:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:10:41] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138#9554279 (taavi) Still looks good: `lang=shell-session ubuntu@dbapp:~$ df -h /var/lib/postgresql/ Filesystem Size Used Avail Use% Mounted on /dev/sdb 501G 189G...
[09:10:45] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554280 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/198 wd-shex-infer: update al...
[09:11:56] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554281 (dcaro) Updated the limitrange: ` root@tools-k8s-control-6:~# kubectl -n tool-wd-shex-infer get limitrange tool-wd-shex-infer -o json | jq '.sp...
[09:12:30] cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554283 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudweb1004.wikimedia.org with OS bullseye
[09:31:44] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554295 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/maintain-kubeusers/-/merge_requests/12 quota: allow overriding...
[09:31:49] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554297 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 opened https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/...
[09:34:51] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554298 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/199 maintain-kubeusers: bump...
[09:40:25] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554308 (dcaro) p:Triage→Medium
[09:40:31] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554311 (dcaro) Open→In progress
[09:40:46] cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554313 (dcaro)
[09:40:52] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554312 (dcaro)
[09:41:09] Toolforge (Toolforge iteration 05), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554311 (dcaro)
[09:41:57] Toolforge (Toolforge iteration 05), cloud-services-team (FY2023/2024-Q3-Q4), Cloud-Services-Origin-Team, Cloud-Services-Worktype-Project, User-dcaro: [maintain-kubeusers] Allow setting the requests cpu and mem quota - https://phabricator.wikimedia.org/T357881#9554318 (CodeReviewBot) dcaro upd...
[09:49:01] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:49:01] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=99) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:49:07] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[09:52:10] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[09:55:15] cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554327 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudweb1004.wikimedia.org with OS bullseye completed: - cloudweb1004 (**WARN**...
[09:58:31] (ToolsNfsAlmostFull) firing: Toolforge NFS is 0.8655062025652949/1 full - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsNfsAlmostFull - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsNfsAlmostFull
[09:59:34] cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554329 (taavi)
[10:04:18] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554335 (aborrero)
[10:09:12] !log aborrero@cloudcumin1001 admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=0) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)
[10:09:18] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[10:10:05] PROBLEM - nova-compute proc minimum on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[10:10:21] PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[10:13:39] cloud-services-team, wikitech.wikimedia.org: Upgrade cloudweb hosts to Bullseye - https://phabricator.wikimedia.org/T356966#9554371 (taavi) Open→Resolved This is complete, and I migrated all cloudweb hosts to Puppet 7.
[10:22:20] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554417 (Slst2020) a:Slst2020
[10:26:23] (PS9) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[10:30:49] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudvirt2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:30:53] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudvirt2004-dev:9100 - https://phabricator.wikimedia.org/T357886#9554423 (phaultfinder)
[10:33:48] (PuppetZeroResources) firing: Puppet has failed generate resources on cloudnet2008-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:33:54] cloud-services-team: PuppetZeroResources Zero Puppet resources on cloudnet2008-dev:9100 - https://phabricator.wikimedia.org/T357887#9554430 (phaultfinder)
[10:35:45] (WidespreadPuppetFailure) firing: Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[10:35:52] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554442 (phaultfinder)
[10:35:57] (PuppetZeroResources) firing: (4) Puppet has failed generate resources on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:40:52] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554460 (phaultfinder)
[10:40:57] (PuppetZeroResources) firing: (9) Puppet has failed generate resources on cloudcontrol2004-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:45:45] (WidespreadPuppetFailure) firing: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[10:45:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554481 (phaultfinder)
[10:45:57] (PuppetZeroResources) firing: (14) Puppet has failed generate resources on cloudcontrol2001-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:48:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554499 (phaultfinder)
[10:48:54] (PuppetZeroResources) firing: (3) Puppet has failed generate resources on cloudcontrol2007-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:50:53] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554442 (phaultfinder)
[10:51:02] (PuppetZeroResources) firing: (19) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:52:26] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554507 (Slst2020) Open→In progress
[10:55:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554518 (phaultfinder)
[10:56:06] (PuppetZeroResources) firing: (20) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:58:48] (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[10:58:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554521 (phaultfinder)
[11:00:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554524 (phaultfinder)
[11:01:06] (PuppetZeroResources) firing: (25) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:05:58] (PuppetZeroResources) firing: (24) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:08:49] (PuppetZeroResources) firing: (5) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:10:53] (PuppetZeroResources) firing: (24) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:11:27] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9554542 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS...
[11:13:49] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:13:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554553 (phaultfinder)
[11:15:54] cloud-services-team: PuppetZeroResources - https://phabricator.wikimedia.org/T357889#9554564 (phaultfinder)
[11:16:07] (PuppetZeroResources) firing: (23) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:18:49] (PuppetZeroResources) firing: (6) Puppet has failed generate resources on cloudcontrol2006-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:20:45] (WidespreadPuppetFailure) firing: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:21:02] (PuppetZeroResources) firing: (23) Puppet has failed generate resources on cloudbackup2002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[11:27:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81
[11:28:17] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81
[11:28:54] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:30:45] (WidespreadPuppetFailure) resolved: (2) Puppet has failed on wmcs cluster - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?orgId=1&viewPanel=3&var-cluster=wmcs - https://alerts.wikimedia.org/?q=alertname%3DWidespreadPuppetFailure
[11:30:55] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554617 (taavi) a:taavi
[11:32:33] Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9554621 (dcaro)
[11:35:24] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554637 (taavi)
[11:35:28] (InstanceDown) firing: Project tools instance tools-k8s-worker-81 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:39:01] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster
[11:39:01] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:39:20] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82
[11:40:00] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82
[11:40:28] (InstanceDown) resolved: Project tools instance tools-k8s-worker-81 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[11:40:36] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:43:25] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554675 (Slst2020) @Albertoleoncio Your tool has been granted write access to Elasticsearch now. The credentials are available to your tool as [[ https://wikitech.wikimedia.o...
[11:44:18] Toolforge, cloud-services-team: Elasticsearch credential request for capacity-exchange - https://phabricator.wikimedia.org/T357227#9554676 (Slst2020) In progress→Resolved
[11:44:43] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554678 (taavi)
[11:44:45] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9554679 (taavi)
[11:45:04] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9287249 (taavi) `digero` is currently using 144G of storage.
[11:50:05] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster
[11:50:05] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[11:50:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-38 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:51:45] Tools: wiki-osm.pl: Use of uninitialized value within @kml in lc at /data/project/osm4wiki/public_html/cgi-bin/wiki/wiki-osm.pl line 166. - https://phabricator.wikimedia.org/T357899#9554688 (taavi)
[11:53:10] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83
[11:53:49] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83
[11:54:06] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[11:55:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-38 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[11:56:18] (PuppetZeroResources) resolved: Puppet has failed generate resources on cloudgw2002-dev:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[12:00:50] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:00:54] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:00:56] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:02:17] (PS10) Arturo Borrero Gonzalez: openstack: cloudvirt: add post-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004116 (https://phabricator.wikimedia.org/T357765)
[12:02:28] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:02:31] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:03:58] cloud-services-team, Infrastructure-Foundations, SRE, netops, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9554726 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS book...
[12:04:08] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster
[12:04:09] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:05:26] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84
[12:06:06] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84
[12:08:00] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:11:41] (CloudVPSDesignateLeaks) firing: Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:12:28] (InstanceDown) firing: Project tools instance tools-k8s-worker-84 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:15:01] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554766 (jberkel) I've deleted tmp and other unused stuff it's now down to 16GB, is that acceptable?
[12:16:03] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554769 (taavi) That's better. Is there a way to ensure I don't need to manually ping here each time this happens?
[12:16:41] (CloudVPSDesignateLeaks) firing: (2) Detected 3 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:17:28] (InstanceDown) resolved: Project tools instance tools-k8s-worker-84 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[12:17:48] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9554772 (jberkel) I'll add a command to automatically clear the tmp storage, that should help
[12:18:33] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster
[12:18:34] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:19:43] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85
[12:20:25] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85
[12:23:50] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:24:34] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[12:25:18] (CR) CI reject: [V: -1] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1004654 (owner: L10n-bot)
[12:30:18] Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554800 (taavi)
[12:30:25] Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554813 (taavi)
[12:30:28] (PuppetAgentNoResources) firing: No Puppet resources found on instance tools-k8s-worker-nfs-38 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:30:33] Toolforge (Toolforge iteration 05), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#7145936 (taavi)
[12:32:25] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud
[12:32:43] (PS11) Arturo Borrero Gonzalez: openstack: cloudvirt: add post-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004116 (https://phabricator.wikimedia.org/T357765)
[12:32:49] !log aborrero@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:32:54] T319184: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184
[12:33:11] !log aborrero@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
[12:33:24] Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554822 (aborrero) +1
[12:33:46] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud
[12:33:47] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.openstack.quota_increase (T357901)
[12:33:51] T357901: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901
[12:33:55] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T357901)
[12:34:29] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:34:45] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[12:34:52] Toolforge (Toolforge iteration 05), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656#9554833 (taavi)
[12:34:58] !log aborrero@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99)
[12:35:53] Cloud-VPS (Quota-requests), Tools: Request increased server-group-members quota for tools - https://phabricator.wikimedia.org/T357901#9554831 (taavi) Open→Resolved a:taavi
[12:39:10] !log aborrero@cloudcumin1001 cloudvirt-canary START - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary
[12:39:23] !log aborrero@cloudcumin1001 cloudvirt-canary END (FAIL) - Cookbook wmcs.openstack.cloudvirt.lib.ensure_canary (exit_code=99)
[12:40:28] (PuppetAgentNoResources) resolved: No Puppet resources found on instance tools-k8s-worker-nfs-38 on project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[12:42:15] (CR) Nikerabbit: [V: +2] Localisation updates from https://translatewiki.net. [labs/tools/Isa] - https://gerrit.wikimedia.org/r/1004654 (owner: L10n-bot)
[12:42:50] cloud-services-team, User-aborrero: openstack: nova refuses to admit a compute node after a reimage - https://phabricator.wikimedia.org/T357631#9554855 (aborrero) Update, after trying the procedure described above by @Andrew I get: ` Feb 19 12:40:36 cloudvirt1032 nova-compute[27450]: 2024-02-19 12:40:36...
[12:44:32] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster
[12:44:32] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:44:44] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86
[12:45:23] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86
[12:46:13] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[12:55:00] (CR) David Caro: openstack: cloudvirt: add pre-reimage cookbook (5 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[12:56:20] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster
[12:56:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[12:58:15] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87
[12:58:56] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87
[12:59:45] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[13:08:09] (CR) Majavah: openstack: cloudvirt: add pre-reimage cookbook (2 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[13:09:39] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9554918 (LucasWerkmeister) Hm, the job ran now but something didn’t work: `lang=shell-session $ kubectl logs pod/wd-shex-infer-101-mgxk9...
[13:09:52] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster
[13:09:52] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[13:09:55] Toolforge (Quota-requests), Patch-For-Review: Request increased memory quota for wd-shex-infer Toolforge tool - https://phabricator.wikimedia.org/T357209#9554919 (LucasWerkmeister) Thanks, the updated limitrange seems to be working!
[13:15:50] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9554938 (taavi) You seem to be manually adding the volume mounts instead of relying the admission controller, and the code is not adding the `kubernetes.wmcloud.or...
[13:16:19] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5
[13:17:01] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5
[13:34:40] PROBLEM - ensure kvm processes are running on cloudvirt1032 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[13:40:30] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9555003 (LucasWerkmeister) I see… so the difference to the successful jobs in the test tool is just that I got unlucky with the placement this time. Thanks! One m...
[14:02:09] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9555079 (taavi)
[14:02:13] Toolforge, cloud-services-team: tools-nfs-2 almost out of disk space (October 2023 edition) - https://phabricator.wikimedia.org/T349895#9555080 (taavi)
[14:02:19] Tools: 'digero' tool uses an unreasonable amount of disk space - https://phabricator.wikimedia.org/T349899#9555076 (taavi) Open→Resolved a:jberkel Thanks!
[14:03:57] Toolforge: 2024-02-19: toolforge NFS cleanup - https://phabricator.wikimedia.org/T357882#9555083 (taavi) Open→Resolved We are back to 77% which should be fine for now.
[14:10:00] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9555089 (taavi) `toolforge: tool` will automatically mount all volumes and add the required config for that. It's not strictly required as you can add the mounts m...
[14:10:07] Cloud-VPS, cloud-services-team, User-aborrero: Improve cloudgw filter between VM instances and cloud-private - https://phabricator.wikimedia.org/T356986#9555091 (cmooney) >>! In T356986#9529581, @taavi wrote: >> ii - Traffic from VMs to specific cloud-private destinations, using as many rules as nee...
[14:26:12] Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555153 (dcaro) p:Triage→Medium
[14:27:44] Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555157 (taavi)
[14:28:47] Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555160 (taavi)
[14:55:52] Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555212 (dcaro) I thought this was closed xd, that's why I opened a new one
[15:02:35] Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555220 (CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12 run: add filelog to buildservice if passed
[15:02:48] Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555222 (CodeReviewBot) dcaro updated https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/61 wrap buildservice
[15:22:43] Tool-Pageviews, Data-Engineering, Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555301 (Framawiki) p:Triage→High
[15:28:38] Tool-Pageviews, Data-Engineering, Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555331 (Framawiki) I don't know if it's just a temporary processing delay, or a breakage. But given the different user reports the same day, I prefer to fill a task.
[15:28:41] Cloud-Services: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555321 (Magnus) The #Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task...
[15:35:36] Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9555388 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/rep...
[15:36:49] Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9555393 (CodeReviewBot) project_1317_bot_df3177307bed93c3f34e421e26c86e38 ope...
[15:38:29] Tool-Pageviews, Data Products, Data-Engineering, Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555397 (lbowmaker)
[15:43:21] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555422 (dcaro)
[15:43:24] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555433 (dcaro) p:Triage→High
[15:43:37] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555434 (Raymond_Ndibe)
[15:45:02] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555447 (taavi) Dupe of {T348755}?
[15:46:09] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555452 (dcaro) >>! In T357915#9555447, @taavi wrote: > Dupe of {T348755}? yes, I looked into the `Toolforge` tag, not `toolforge jobs api` :facepalm:
[15:47:05] Toolforge, User-Raymond_Ndibe: [webservice,jobs-api] Move logic to an extendend continuous job - https://phabricator.wikimedia.org/T357915#9555454 (dcaro)
[15:48:06] Toolforge, Epic: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9555456 (dcaro)
[15:48:17] Toolforge, Epic, User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9246268 (dcaro)
[15:48:20] Toolforge, Epic, User-Raymond_Ndibe: Run webservices via the jobs framework - https://phabricator.wikimedia.org/T348755#9555465 (Raymond_Ndibe)
[15:54:43] Tool-hitaden, Toolforge Build Service: [buildservice,nodejs] nodejs buildpack does not take envvars into account - https://phabricator.wikimedia.org/T353557#9555475 (dcaro) Open→Resolved a:dcaro
[15:55:04] Toolforge Build Service, cloud-services-team (FY2023/2024-Q3-Q4), Goal, User-Raymond_Ndibe, User-aborrero: [harbor] Deploy with Helm - https://phabricator.wikimedia.org/T356301#9555482 (dcaro)
[15:55:06] Toolforge, cloud-services-team (FY2023/2024-Q3-Q4), Goal: Move harbor data to object storage service - https://phabricator.wikimedia.org/T350687#9555483 (dcaro)
[15:55:32] Toolforge Build Service, cloud-services-team (FY2023/2024-Q3-Q4), Goal, User-Raymond_Ndibe, User-aborrero: [harbor] Deploy with Helm - https://phabricator.wikimedia.org/T356301#9555484 (dcaro) p:Triage→Medium
[15:55:50] Toolforge (Toolforge iteration 05), Toolforge Build Service: Build service: Calling nontrivial Procfile commands with arguments results in confusing error (“no such file or directory”) - https://phabricator.wikimedia.org/T356016#9555486 (dcaro) p:Triage→Medium
[15:56:27] Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#9555489 (dcaro) p:Triage→Medium
[15:56:56] Toolforge Build Service: [apt-buildpack] Installed python scripts with a hardcoded shebang to the python binary will not work when installing new pythons - https://phabricator.wikimedia.org/T356500#9555490 (dcaro) p:Triage→Low
[16:02:12] (PS10) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:05:06] (PS11) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:06:32] Toolforge Build Service, Upstream: Python buildpack does not detect requirements from pyproject.toml - https://phabricator.wikimedia.org/T353762#9555515 (dcaro) I've added the link to this task to all the bulidservice python tutorials for people to discover :)
[16:07:55] (CR) CI reject: [V: -1] openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[16:08:20] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555520 (dcaro)
[16:10:13] Toolforge (Toolforge iteration 05), Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555528 (dcaro)
[16:11:18] (PS12) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765)
[16:11:51] Toolforge (Toolforge iteration 05), Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555529 (dcaro) Open→In progress
[16:12:56] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555539 (dcaro)
[16:12:59] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9555541 (dcaro)
[16:13:01] Toolforge (Toolforge iteration 05), Toolforge Jobs framework, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9555540 (dcaro)
[16:13:18] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9449702 (dcaro)
[16:13:25] Toolforge (Toolforge iteration 05), Toolforge Jobs framework, Patch-For-Review, User-aborrero: toolforge: introduce OpenAPI to jobs framework - https://phabricator.wikimedia.org/T356523#9509999 (dcaro)
[16:13:43] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge API] Investigate ways to present our multiple Openapi definitions to a future consolidated CLI client - https://phabricator.wikimedia.org/T354745#9449702 (dcaro)
[16:13:46] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9505172 (dcaro)
[16:13:48] Toolforge (Toolforge iteration 05), Epic: Consolidate the Toolforge CLIs - https://phabricator.wikimedia.org/T356262#9555545 (dcaro)
[16:16:42] (CloudVPSDesignateLeaks) firing: (2) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:23:07] (CR) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[16:23:22] (CR) Arturo Borrero Gonzalez: openstack: cloudvirt: add pre-reimage cookbook (3 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/1004088 (https://phabricator.wikimedia.org/T357765) (owner: Arturo Borrero Gonzalez)
[17:09:02] Toolforge (Toolforge iteration 05): [jobs] Enable filelog for buildservice-based images - https://phabricator.wikimedia.org/T357897#9555759 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12 run: add filelog to buildservice if passed
[17:09:06] Toolforge (Toolforge iteration 05), Toolforge Jobs framework: Allow using file logs with build service images - https://phabricator.wikimedia.org/T353537#9555760 (CodeReviewBot) dcaro closed https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-cli/-/merge_requests/12 run: add filelog to buildservice...
[17:14:19] Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555761 (JJMC89)
[17:20:15] Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555789 (taavi) Did you try rebooting this already?
[17:26:18] Toolforge, cloud-services-team, Documentation, Kubernetes: Figure out and document how to call the Kubernetes API as your tool user from inside a pod - https://phabricator.wikimedia.org/T321919#9555812 (dcaro)
[17:26:21] Toolforge (Toolforge iteration 05), User-aborrero: [toolforge] simplify calling the different toolforge apis from within the containers - https://phabricator.wikimedia.org/T356377#9555813 (dcaro)
[17:27:40] Tool-Pageviews, Data Products, Data-Engineering, Pageviews-API: No Pageviews data since 2024-02-17 - https://phabricator.wikimedia.org/T357910#9555814 (Sfaci) @BTullis and I have been working on this just a couple of hours ago. A DAG was stuck on Saturday because of a out-of-memory error. We fixe...
[17:31:27] Cloud-VPS: petscan4 VM inaccessible - https://phabricator.wikimedia.org/T357911#9555831 (Magnus) Open→Resolved a:Magnus Seems fixed now
[18:35:27] Wikibugs: Remove legacy taxonomy.py script - https://phabricator.wikimedia.org/T357928#9555982 (bd808)
[18:36:30] Wikibugs: bd808's big pile of refactoring ideas - https://phabricator.wikimedia.org/T357851#9556002 (bd808)
[18:36:45] Wikibugs: Replace pywikibot with mwclient in taxonomy.py - https://phabricator.wikimedia.org/T357852#9555999 (bd808) Open→Declined Lets do {T357928} instead per @valhallasw's suggestion.
[18:44:51] !log raymond@ubuntu toolsbeta START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[18:44:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[18:46:01] !log raymond@ubuntu toolsbeta END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[18:46:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[19:03:44] !log raymond@ubuntu tools START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
[19:03:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:04:48] !log raymond@ubuntu tools END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
[19:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:05:14] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140#9556066 (LucasWerkmeister) Looks like the required config also includes the `TOOL_DATA_DIR` env variable, so I can probably stop setting that explicitly. (Right no...
[19:17:46] Toolforge (Toolforge iteration 05), Toolforge Build Service, Patch-For-Review, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9556089 (CodeReviewBot) raymond-ndibe merged https://gitlab.wikimedia.org/rep...
[20:01:31] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 3671 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[20:16:42] (CloudVPSDesignateLeaks) firing: (2) Detected 16 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:17:31] Toolforge Build Service, Documentation: [tbs] Improve Harbor quota handling and docs - https://phabricator.wikimedia.org/T351092#9556194 (Raymond_Ndibe)
[20:18:13] Toolforge (Toolforge iteration 05), Toolforge Build Service, User-Raymond_Ndibe: [tbs] Give a meaningful error message when a user exceeds their Harbor quota - https://phabricator.wikimedia.org/T351178#9556193 (Raymond_Ndibe) In progress→Resolved
[21:14:16] Cloud-Services, cloud-services-team: Replace or deprecate WMCS uses of report updater - https://phabricator.wikimedia.org/T357856#9556220 (bd808) @Milimetric, do you know the answer to this question? If I understand correctly, folks are basically wondering if the https://analytics.wikimedia.org/publishe...
[21:54:50] Wikibugs, User-bd808: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729#9556270 (bd808) >>! In T357729#9552954, @valhallasw wrote: > Is there any way to get a `tcpdump` for the bot? There's obviously no root access in the container but ma...
[22:08:36] Cloud-Services, cloud-services-team: Replace or deprecate WMCS uses of report updater - https://phabricator.wikimedia.org/T357856#9556274 (lbowmaker) Thanks @bd808 - I wasn’t aware that was the output and based on those recent-ish tickets I am confident that this is still being used and generated by Repo...
[23:06:31] (ToolsToolsDBReplicationLagIsTooHigh) firing: ToolsDB replication on tools-db-2 is lagging behind the primary, the current lag is 14753 - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolsDBReplication - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolsToolsDBReplicationLagIsTooHigh
[23:27:34] ToolforgeBundle, CopyPatrol, Community-Tech (CommTech-Kanban): Session can't be invalidated, causing problems with language selection - https://phabricator.wikimedia.org/T357821#9556377 (MusikAnimal) >>! In T357821#9554022, @Samwilson wrote: > PR for the lang selection: https://github.com/wikimedia/T...