[00:08:28] <wmcs-alerts>	 FIRING: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[00:13:28] <wmcs-alerts>	 RESOLVED: PuppetAgentStaleLastRun: Last Puppet run was over 24 hours ago on instance tf-infra-test in project tf-infra-test   - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
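Note: the alert text states the condition as "Last Puppet run was over 24 hours ago". The minimal Python sketch below shows that comparison, assuming the last-run timestamp is already known. The real alert is a Prometheus rule, and its metric and expression are not shown in this log. `is_puppet_run_stale` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Threshold from the alert text ("over 24 hours ago"). The real alert is a
# Prometheus rule whose exact expression is not shown here.
STALE_AFTER = timedelta(hours=24)


def is_puppet_run_stale(last_run: datetime, now: datetime,
                        stale_after: timedelta = STALE_AFTER) -> bool:
    """Hypothetical helper: True if the last agent run is older than stale_after."""
    return now - last_run > stale_after


if __name__ == "__main__":
    now = datetime(2024, 8, 13, 0, 8, 28, tzinfo=timezone.utc)
    print(is_puppet_run_stale(now - timedelta(hours=25), now))  # True
    print(is_puppet_run_stale(now - timedelta(hours=1), now))   # False
```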
[00:31:29] <wmcs-alerts>	 FIRING: [2x] PuppetCertificateAboutToExpire: Puppet CA certificate coibot.linkwatcher.eqiad.wmflabs is about to expire in 25d 23h 48m 37s - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/PuppetCertificateAboutToExpire  - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetCertificateAboutToExpire
[00:46:55] <jinxer-wm>	 FIRING: MaxConntrack: Max conntrack at 80.27% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
[00:51:55] <jinxer-wm>	 RESOLVED: MaxConntrack: Max conntrack at 80.27% on cloudvirt1050:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConntrack
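Note: MaxConntrack reports how full the netfilter connection-tracking table is on the host. On Linux, the two inputs are the current entry count and the table limit, exposed as the `nf_conntrack_count` and `nf_conntrack_max` sysctls. The Python sketch below computes the percentage, assuming those standard /proc paths. The alert itself is evaluated in Prometheus, presumably from node_exporter metrics (the target is port 9100). Its threshold is not stated in this log, and the sample numbers are hypothetical.

```python
from pathlib import Path

# Standard Linux netfilter sysctls (assumption: nf_conntrack is loaded).
# The real alert is evaluated in Prometheus, not by reading these files.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage_percent(count: int, maximum: int) -> float:
    """Share of the conntrack table in use, as a percentage rounded to 2 places."""
    if maximum <= 0:
        raise ValueError("conntrack maximum must be positive")
    return round(100.0 * count / maximum, 2)


def read_local_usage() -> float:
    """Read the live count and limit on this host and compute the percentage."""
    count = int(COUNT_PATH.read_text().strip())
    maximum = int(MAX_PATH.read_text().strip())
    return conntrack_usage_percent(count, maximum)


if __name__ == "__main__":
    # Hypothetical numbers chosen to land on the 80.27% seen in the alert.
    print(conntrack_usage_percent(210428, 262144))
```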
[01:16:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:16:24] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[01:16:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:21:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:21:48] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:22:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:24:33] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[01:29:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
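Note: ToolforgeKubernetesWorkerTooManyDProcesses flapped repeatedly on tools-k8s-worker-nfs-20 in this window. "Stuck on IO" refers to processes in uninterruptible sleep (state `D` in `ps`), which is typical when an NFS server stops answering. The Python sketch below is a rough local equivalent. It counts `D`-state processes by reading `/proc/<pid>/stat`. The metric and threshold behind the real alert are not shown here, and both helpers are hypothetical.

```python
import os


def process_state(stat_line: str) -> str:
    """Return the single-letter state field from a /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may contain
    spaces or ')' itself, so parse from the *last* ')' onwards.
    """
    after_comm = stat_line[stat_line.rfind(")") + 2:]
    return after_comm.split(" ", 1)[0]


def count_d_state(proc_root: str = "/proc") -> int:
    """Count processes in uninterruptible sleep ('D'), e.g. blocked on NFS I/O."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if process_state(fh.read()) == "D":
                    total += 1
        except OSError:
            # The process may exit between listdir() and open(); skip it.
            continue
    return total


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        print("processes in D state:", count_d_state())
```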
[01:38:01] <wikibugs>	 (approved) tstarling: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T364648) (owner: samwilson)
[01:48:25] <wikibugs>	 (update) samwilson: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T364648)
[01:50:48] <wikibugs>	 (update) samwilson: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T364648)
[01:53:36] <wikibugs>	 (merge) samwilson: Add hourly update-focus-areas command [toolforge-repos/wishlist] - https://gitlab.wikimedia.org/toolforge-repos/wishlist/-/merge_requests/1 (https://phabricator.wikimedia.org/T364648)
[02:10:27] <wikibugs>	 Cloud-VPS (Project-requests), Beta-Cluster-Infrastructure: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353#10060292 (Andrew) +1 workaround ridiculous bug
[02:13:14] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10060295 (Htriedman) @KFrancis email sent!  and @SLyngshede-WMF this hasn't happened yet, but I'm wondering...
[03:53:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:48:18] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[04:48:48] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[05:16:25] <jinxer-wm>	 FIRING: CephClusterInWarning: Ceph cluster in eqiad is in warning status - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephClusterInWarning - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephClusterInWarning
[05:41:28] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10060402 (Raymond_Ndibe) >>! In T370843#10057898, @dcaro wrote: > So there's three related tables in the postrges database, `execution`, `task` and `schedule`, where the `vendor_ty...
[05:47:39] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10060404 (Raymond_Ndibe) Do we have to manually create an `execution` and corresponding `task` for the above failing `schedules`? can that solve our problem?
[06:01:14] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10060409 (Raymond_Ndibe) * Also something to think about: majority of our schedules are `RETENTION` (I dare say more than 80%). Can the fact that we have all of those schedules sch...
[07:23:06] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10060461 (dcaro) >>! In T370843#10060404, @Raymond_Ndibe wrote: > Do we have to manually create an `execution` and corresponding `task` for the above failing `schedules`? can that...
[07:26:58] <wikibugs>	 Cloud-VPS (Project-requests), Beta-Cluster-Infrastructure: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353#10060463 (dcaro) Open→In progress a:dcaro
[07:28:14] <wm-bot2>	 !log dcaro@urcuchillay deployment_prep_s3 START - Cookbook wmcs.vps.create_project for project deployment_prep_s3 in eqiad1 (T372353)
[07:28:15] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "deployment_prep_s3"
[07:28:15] <stashbot>	 T372353: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353
[07:28:26] <wm-bot2>	 !log dcaro@urcuchillay deployment_prep_s3 END (FAIL) - Cookbook wmcs.vps.create_project (exit_code=99) for project deployment_prep_s3 in eqiad1 (T372353)
[07:28:26] <stashbot>	 wmbot~dcaro@urcuchillay: Unknown project "deployment_prep_s3"
[07:32:32] <wikibugs>	 Cloud-VPS (Project-requests), Beta-Cluster-Infrastructure: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353#10060499 (dcaro) Unfortunately, underscores are not valid domain name characters, so the name would have to be something like `deploymentpreps3`, is th...
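Note: dcaro's objection implies that the project name becomes part of DNS names. Instance hostnames in this log follow `<instance>.<project>.eqiad1.wikimedia.cloud` (e.g. `tools-bastion-13.tools.eqiad1.wikimedia.cloud`). The Python sketch below applies an RFC 1123 hostname-label check, which rejects `deployment_prep_s3` and accepts `deploymentpreps3`. It is not the validation that the `wmcs.vps.create_project` cookbook actually performs, which is not shown here. `suggest_label` is a hypothetical helper.

```python
import re

# RFC 1123 hostname label: 1-63 letters, digits or hyphens, not starting or
# ending with a hyphen. Underscores are not allowed.
_LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_valid_dns_label(name: str) -> bool:
    return _LABEL_RE.fullmatch(name) is not None


def suggest_label(name: str) -> str:
    """Hypothetical helper: lower-case and drop characters not allowed in a label.

    Does not enforce the 63-character limit.
    """
    return re.sub(r"[^a-z0-9-]", "", name.lower()).strip("-")


if __name__ == "__main__":
    for candidate in ("deployment_prep_s3", "deploymentpreps3"):
        print(candidate, is_valid_dns_label(candidate))
    print(suggest_label("deployment_prep_s3"))
```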
[07:33:29] <wm-bot2>	 !log dcaro@urcuchillay tools START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
[07:33:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:39:15] <wm-bot2>	 !log dcaro@urcuchillay tools END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
[07:39:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:40:37] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: `webservice` requires effective user to be the tool user and listed in NSS passwd data - https://phabricator.wikimedia.org/T369569#10060514 (dcaro) In progress→Resolved
[07:42:29] <wikibugs>	 Toolforge (Toolforge iteration 14), Patch-For-Review: [jobs-api] Remove authentication and use the api-gateway provided headers - https://phabricator.wikimedia.org/T367180#10060518 (dcaro) In progress→Resolved
[07:47:09] <wikibugs>	 (update) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[08:09:55] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#10060574 (ayounsi) Resolved→Open https://netbox.wikimedia.org/extras/scripts/results/78992/ `cloudcephosd1039 (WMF11571)  /dcim/devices/5296/  Pr...
[08:24:33] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-6 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[09:22:07] <wikibugs>	 (open) dcaro: toolforge_deploy_mr: use the correct name when registering an mr [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/181
[09:22:31] <wikibugs>	 (update) dcaro: toolforge_deploy_mr: use the correct name when registering an mr [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/181
[09:26:10] <wikibugs>	 (close) dcaro: toolforge_deploy_mr: use the correct name when registering an mr [repos/cloud/toolforge/lima-kilo] - https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_requests/181
[09:31:42] <wikibugs>	 (approved) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[09:31:48] <wikibugs>	 (merge) dcaro: auth: use the header passed by the api gateway [repos/cloud/toolforge/jobs-api] - https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/106 (https://phabricator.wikimedia.org/T367180)
[09:34:42] <wikibugs>	 (open) project_1317_bot_df3177307bed93c3f34e421e26c86e38: jobs-api: bump to 0.0.329-20240813093158-b193b876 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/479 (https://phabricator.wikimedia.org/T367180)
[09:38:51] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:39:20] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#10060929 (dcaro) I got this when trying to set the fqdn (checked others that have the fqdn set on the ipv6, and they don't have the role set, maybe a new...
[09:40:49] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[09:40:57] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:42:20] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
[09:43:05] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:44:22] <jinxer-wm>	 FIRING: HAProxyBackendUnavailable: HAProxy service wikireplica-db-web-s5 backend clouddb1016.eqiad.wmnet is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:48:10] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 toolsbeta END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[09:48:47] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10060978 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fnegri@cumin1002 for host clouddb1016.eqiad.wmnet with OS bookworm
[09:49:13] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
[09:49:22] <jinxer-wm>	 FIRING: [2x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-web-s5 backend clouddb1016.eqiad.wmnet is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[09:54:42] <logmsgbot_cloud>	 !log dcaro@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
[10:03:38] <wikibugs>	 (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[10:07:09] <wikibugs>	 (approved) dcaro: jobs-api: bump to 0.0.329-20240813093158-b193b876 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/479 (https://phabricator.wikimedia.org/T367180) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:07:11] <wikibugs>	 (update) dcaro: jobs-api: bump to 0.0.329-20240813093158-b193b876 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/479 (https://phabricator.wikimedia.org/T367180) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:07:13] <wikibugs>	 (merge) dcaro: jobs-api: bump to 0.0.329-20240813093158-b193b876 [repos/cloud/toolforge/toolforge-deploy] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/479 (https://phabricator.wikimedia.org/T367180) (owner: project_1317_bot_df3177307bed93c3f34e421e26c86e38)
[10:09:11] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14): [components-api] Get a skeleton of API webservice and implement `/tool/<toolname>/deploy` with build-only features - https://phabricator.wikimedia.org/T362069#10061039 (dcaro) a:dcaro→Slst2020
[10:09:44] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] Investigate how to deactivate wal from trove for postrges databases - https://phabricator.wikimedia.org/T370845#10061031 (dcaro) Open→Declined This is invalid now, if we fix the cleanup processes we don't care about the archival (it would be good actua...
[10:11:33] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14): [sct.frontend] Show the backend status - https://phabricator.wikimedia.org/T370324#10061041 (dcaro) In progress→Resolved
[10:13:17] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Patch-For-Review: [sct.backend] Create worker and connect to redis - https://phabricator.wikimedia.org/T370321#10061036 (dcaro) Open→In progress
[10:19:22] <jinxer-wm>	 RESOLVED: [2x] HAProxyBackendUnavailable: HAProxy service wikireplica-db-web-s5 backend clouddb1016.eqiad.wmnet is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[10:27:00] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10061059 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fnegri@cumin1002 for host clouddb1016.eqiad.wmnet with OS bookworm completed: - cl...
[10:28:02] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Goal: Upgrade clouddb* hosts to Bookworm - https://phabricator.wikimedia.org/T365424#10061075 (fnegri)
[10:53:17] <wikibugs>	 (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[10:54:21] <wikibugs>	 (open) dcaro: show task status [toolforge-repos/sample-complex-app-frontend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/2
[10:59:29] <wikibugs>	 (update) dcaro: show task status [toolforge-repos/sample-complex-app-frontend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-frontend/-/merge_requests/2 (https://phabricator.wikimedia.org/T370321)
[11:00:59] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Toolforge (Toolforge iteration 14), Patch-For-Review: [sct.backend] Create worker and connect to redis - https://phabricator.wikimedia.org/T370321#10061157 (dcaro)
[11:51:35] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install cloudcephosd10[39-41] - https://phabricator.wikimedia.org/T363341#10061264 (dcaro) Open→Resolved Done :)
[12:11:48] <wikibugs>	 Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394 (rook) NEW
[12:11:50] <wikibugs>	 superset.wmcloud.org: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372395 (rook) NEW
[12:15:23] <wikibugs>	 (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[12:20:26] <wikibugs>	 (update) dcaro: worker: add simple task and worker process [toolforge-repos/sample-complex-app-backend] - https://gitlab.wikimedia.org/toolforge-repos/sample-complex-app-backend/-/merge_requests/1 (https://phabricator.wikimedia.org/T370321)
[12:24:38] <wikibugs>	 Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061438 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/62
[12:24:45] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/62
[12:30:04] <wikibugs>	 Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397 (rook) NEW
[12:31:05] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/quarry/pull/63
[12:34:46] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/63
[12:40:18] <wikibugs>	 Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061464 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/62
[12:40:27] <notefromgithub>	 vivian-rook closed https://github.com/toolforge/quarry/pull/62
[12:46:08] <wikibugs>	 Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061471 (rook) Open→Resolved a:rook
[12:46:40] <wikibugs>	 Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397#10061479 (rook) https://github.com/toolforge/quarry/pull/63
[12:46:53] <wikibugs>	 Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397#10061480 (rook) Open→Resolved
[12:48:57] <notefromgithub>	 vivian-rook opened https://github.com/toolforge/superset-deploy/pull/29
[12:53:38] <wikibugs>	 cloud-services-team, Cloud-VPS, Infrastructure-Foundations, netops, and 2 others: Upgrade cloudsw1-c8-eqiad and cloudsw1-d5-eqiad to Junos 20+ - https://phabricator.wikimedia.org/T316544#10061502 (cmooney)
[12:53:43] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10061503 (cmooney)
[12:59:21] <wikibugs>	 superset.wmcloud.org: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372395#10061546 (rook) Looks like we're getting `     'global.postgresql.auth.postgresPassword' must not be empty, please add '--set global.postgresql.auth.postgresPassword=$POSTGRES_PASSWORD' to the comma...
[13:11:27] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401 (DaxServer) NEW
[13:14:37] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10061578 (DaxServer) Toolserver verification:  ` tools-bastion-13.tools.eqiad1.wikimedia.cloud:/home/daxserver/password-reset-reque...
[13:22:09] <wikibugs>	 superset.wmcloud.org: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372395#10061610 (rook) Adding the password gives new errors: ` WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/rook/superset-deploy/tofu/kube.config coalesce.go:...
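Note: one of the messages rook quotes is `Kubernetes configuration file is world-readable` for `/home/rook/superset-deploy/tofu/kube.config`. This warning appears to come from the Helm client, which checks the kubeconfig's permission bits. The Python sketch below detects the condition and can restrict the file to its owner (mode 0600). The helper names are hypothetical. Tightening the mode only addresses this warning and is not known to fix the other errors in that comment.

```python
import os
import stat


def is_world_readable(path: str) -> bool:
    """True if users other than the owner and group can read the file."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def restrict_to_owner(path: str) -> None:
    """Set mode 0600 (owner read/write only), a common fix for this warning."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Hypothetical location modelled on the path in the log; only reports.
    kubeconfig = os.path.expanduser("~/superset-deploy/tofu/kube.config")
    if os.path.exists(kubeconfig) and is_world_readable(kubeconfig):
        print(f"{kubeconfig} is world-readable; consider restrict_to_owner()")
```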
[13:27:50] <wikibugs>	 superset.wmcloud.org: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372395#10061643 (rook) May be worth deploying to a parallel cluster to see if error persists in a new cluster.
[13:33:08] <wikibugs>	 superset.wmcloud.org: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372395#10061662 (fnegri)
[13:33:10] <wikibugs>	 cloud-services-team (FY2023/2024-Q3-Q4), superset.wmcloud.org: Allow Superset to query ToolsDB public databases - https://phabricator.wikimedia.org/T367393#10061663 (fnegri)
[13:40:53] <wikibugs>	 (PS1) Lokal Profil: Updating toolforge login host [labs/tools/heritage] - https://gerrit.wikimedia.org/r/1062402
[13:52:15] <wikibugs>	 cloud-services-team (Hardware), DC-Ops, ops-codfw, SRE: Q1:rack/setup/install cloudlb2004-dev - https://phabricator.wikimedia.org/T370678#10061766 (Jhancock.wm) a:Jhancock.wm
[13:59:52] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10061785 (Raymond_Ndibe) >>! In T370843#10060461, @dcaro wrote: >>>! In T370843#10060404, @Raymond_Ndibe wrote: >> Do we have to manually create an `execution` and corresponding `t...
[14:04:41] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10061798 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=50666174-cba4-46b9-8fa9-cdf8d3361058) set by cmooney@cumin1002 for 0:40:00 on 7 host(s) and their servi...
[14:05:35] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10061799 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=3db725ef-06d9-4ef6-8e5f-eecd4b7c5f0f) set by cmooney@cumin1002 for 0:30:00 on 30 host(s) and their serv...
[14:12:03] <icinga-wm>	 PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.011 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
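Note: this toolschecker alert means the check fetched `http://checker.tools.wmflabs.org:80/etcd/k8s` and received HTTP 503 without the expected string `OK` in the body. It recovered at 17:32 with a 200. The Python sketch below models that pass/fail logic as a simplified check_http-style rule, assuming a 5xx status or a missing `OK` string counts as critical. The real Icinga check has more options than this, both functions are hypothetical, and the example does not contact the real endpoint.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET url and return (status, body); urlopen raises on 4xx/5xx, so catch that."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError still carries the status code and the response body.
        return err.code, err.read().decode("utf-8", "replace")


def evaluate(status: int, body: str, expect: str = "OK") -> str:
    """Simplified verdict: a 5xx status or a missing expected string is CRITICAL."""
    if status >= 500 or expect not in body:
        return "CRITICAL"
    return "OK"


if __name__ == "__main__":
    # Example inputs only.
    print(evaluate(503, ""))
    print(evaluate(200, "OK"))
```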
[14:14:35] <wikibugs>	 Tools, Infrastructure-Foundations: Requested offboarding-to-volunteer of HTriedman // Transfer ownership of SpinachBot from HTriedman (WMF) to HTriedman - https://phabricator.wikimedia.org/T371644#10061826 (DSeyfert_WMF) Hi @Htriedman - we've kept your Wiki and 1Password accounts active given your pendin...
[14:25:46] <wikibugs>	 (PS1) Krinkle: Reduce memory usage [labs/codesearch] - https://gerrit.wikimedia.org/r/1062418
[14:27:39] <wikibugs>	 (PS2) Krinkle: Reduce memory usage [labs/codesearch] - https://gerrit.wikimedia.org/r/1062418
[14:32:49] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10061889 (dcaro) > No @dcaro, execution records for the failing schedules do not exist.  We only have one one schedule with the `id` of `3218`, `vendor_id` of `-1` and `vendor_type...
[15:04:28] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10062061 (dcaro) by looking at https://github.com/goharbor/harbor/blob/ccceacfa73db3cb26e2dd3ef8ffa8f706eef3030/src/jobservice/sync/schedule.go#L249, I suspect that the policy as l...
[15:27:55] <jinxer-wm>	 RESOLVED: [2x] GaleraClusterSizeMismatch: Galera in eqiad1 has 2 nodes - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/GaleraClusterSizeMismatch - https://grafana.wikimedia.org/d/galera-cluster-summary/wmcs-openstack-eqiad-galera-cluster-summary - https://alerts.wikimedia.org/?q=alertname%3DGaleraClusterSizeMismatch
[15:28:49] <wikibugs>	 Cloud-VPS (Project-requests), Beta-Cluster-Infrastructure: Request creation of deployment_prep_s3 VPS project - https://phabricator.wikimedia.org/T372353#10062128 (bd808) >>! In T372353#10060499, @dcaro wrote: > Unfortunately, underscores are not valid domain name characters, so the name would have to be...
[15:40:51] <wikibugs>	 Toolforge (Toolforge iteration 14): [harbor] 2024-07-24 Tools harbor db out of space - https://phabricator.wikimedia.org/T370843#10062154 (dcaro) >>! In T370843#10062104, @Raymond_Ndibe wrote: >>>! In T370843#10062061, @dcaro wrote: >> by looking at https://github.com/goharbor/harbor/blob/ccceacfa73db3cb26e2...
[15:42:04] <icinga-wm>	 PROBLEM - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 508 bytes in 3.015 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[15:54:29] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10062184 (DaxServer) Please add the email: daxserver@icloud.com
[16:15:30] <wikibugs>	 (update) dcaro: [toolforge-weld] move _display_message into toolforge weld [repos/cloud/toolforge/toolforge-weld] - https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-weld/-/merge_requests/46 (owner: raymond-ndibe)
[16:23:20] <wikibugs>	 cloud-services-team (FY2024/2025-Q1-Q2), Cloud-VPS: [network,D5] reboot cloudsw-d5 - https://phabricator.wikimedia.org/T371878#10062290 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=15f30d47-cb35-4a71-a13e-bd0b11e61af8) set by cmooney@cumin1002 for 6:00:00 on 7 host(s) and their servi...
[16:54:29] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10062374 (bd808) Open→In progress a:bd808
[16:56:39] <jinxer-wm>	 RESOLVED: CephSlowOps: Ceph cluster in eqiad has 2 slow ops - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/CephSlowOps - https://grafana.wikimedia.org/d/P1tFnn3Mk/wmcs-ceph-eqiad-health?orgId=1&search=open&tag=ceph&tag=health&tag=WMCS - https://alerts.wikimedia.org/?q=alertname%3DCephSlowOps
[17:01:20] <jinxer-wm>	 RESOLVED: [4x] NeutronAgentDown: Neutron neutron-openvswitch-agent on cloudvirt1031 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[17:08:14] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10062397 (bd808) In progress→Resolved @DaxServer I can see your newly set email address in the read-only LDAP replica n...
[17:32:00] <icinga-wm>	 RECOVERY - toolschecker: All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.372 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[17:52:10] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10062495 (DaxServer) Thanks @bd808 I changed the email address and have a new password. When I login using the "DaxServer" acco...
[18:01:46] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account [DaxServer] - https://phabricator.wikimedia.org/T372401#10062541 (bd808) >>! In T372401#10062495, @DaxServer wrote: > However, when I move on to idm.wikimedia.org, the account with "d...
[19:46:00] <jinxer-wm>	 FIRING: OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[19:48:03] <wmcs-alerts>	 FIRING: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[19:51:00] <jinxer-wm>	 FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[20:10:59] <wikibugs>	 cloud-services-team, wikitech.wikimedia.org, Infrastructure-Foundations, serviceops: wikitech self-auth: Allow wikitech to use its own internal authentication - https://phabricator.wikimedia.org/T371588#10062856 (bd808)
[20:34:47] <wikibugs>	 Cloud-VPS (Debian Buster Deprecation), Humaniki: Cloud VPS "wikidumpparse" project Buster deprecation - https://phabricator.wikimedia.org/T367561#10062930 (Maximilianklein) update for 2024-08-13  [x] create cinder volume. [x] move project code [x] move mysql-db files [x] create a new debian bookworm inst...
[21:02:01] <wikibugs>	 Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10063009 (bd808) Manual cleanup of tofu failure: ` $ sudo wmcs-openstack coe cluster list +--------------------------------------+---------------...
[21:19:41] <jinxer-wm>	 FIRING: CloudVPSDesignateLeaks: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[21:23:03] <wmcs-alerts>	 RESOLVED: ToolforgeKubernetesWorkerTooManyDProcesses: Kubernetes worker tools-k8s-worker-nfs-20 has many processes stuck on IO (probably NFS) - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/ToolforgeKubernetesWorkerTooManyDProcesses - https://grafana.wmcloud.org/d/3jhWxB8Vk/toolforge-general-overview - https://prometheus-alerts.wmcloud.org/?q=alertname%3DToolforgeKubernetesWorkerTooManyDProcesses
[21:37:00] <jinxer-wm>	 FIRING: NovafullstackSustainedFailures: Novafullstack tests have been failing for more than 5hours in eqiad - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/NovafullstackSustainedFailures - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-nova-fullstack?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DNovafullstackSustainedFailures
[21:41:42] <wikibugs>	 (PS3) GergesShamon: use date()  instead of strftime() [labs/tools/intuition] - https://gerrit.wikimedia.org/r/1055408 (https://phabricator.wikimedia.org/T331468)
[22:14:08] <wikibugs>	 Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10063118 (bd808) {T332194} looks to have been the same general problem ("Failed to create trustee or trust for Cluster"). Per T332194#8710538 I t...
[22:54:15] <wikibugs>	 Tool-Pageviews: pageviews tool doesn't work in several newer wikis - https://phabricator.wikimedia.org/T371997#10063169 (MusikAnimal) I've added and deployed a few dozen projects that hopefully is now the complete list.  I'm keeping this task open to track the effort to automate this process.
[22:57:29] <wikibugs>	 Tool-Pageviews: Automatically detect available projects in Pageviews - https://phabricator.wikimedia.org/T371997#10063170 (MusikAnimal) Open→In progress p:Triage→High
[23:02:38] <wikibugs>	 Tool-Pageviews: Validate projects on entry in Pageviews instead of bundling the allowlist - https://phabricator.wikimedia.org/T371997#10063191 (MusikAnimal)
[23:03:57] <wikibugs>	 Tool-Pageviews: Add support for Wikifunctions.org - https://phabricator.wikimedia.org/T354285#10063193 (MusikAnimal) Open→Resolved a:MusikAnimal Apologies for the long delay. This is now done: https://pageviews.wmcloud.org/topviews/?project=wikifunctions.org  I'm working on finally, //finally...
[23:12:36] <wikibugs>	 Cloud-VPS, Beta-Cluster-Infrastructure: OpenTofu fails to provision a Magnum managed k8s cluster in deployment-prep - https://phabricator.wikimedia.org/T372365#10063201 (bd808) Open→Resolved ` Apply complete! Resources: 3 added, 0 changed, 0 destroyed. `  The need for "Unrestricted (dangerous...
[23:51:00] <jinxer-wm>	 FIRING: [2x] OpenstackAPIResponse: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse