[00:20:56] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[00:25:56] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[05:05:41] PROBLEM - Check systemd state on clouddb1015 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:06:23] PROBLEM - Check systemd state on clouddb1019 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:13:33] PROBLEM - Check systemd state on clouddb1021 is CRITICAL: CRITICAL - degraded: The following units failed: check-private-data.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:04:39] cloud-services-team (FY2023/2024-Q1-Q2), SRE, ops-eqiad, Goal: cloud @ eqiad: hardware re-racking plan - https://phabricator.wikimedia.org/T341494 (taavi)
[09:05:17] Cloud-VPS, cloud-services-team, DC-Ops, SRE, ops-eqiad: cloudrabbit: connect them via cloudsw and cloud-private - https://phabricator.wikimedia.org/T345610 (taavi) Open→Resolved a:taavi
[09:40:56] (ProbeDown) firing: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[09:45:56] (ProbeDown) resolved: (2) Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown
[10:02:16] Wikibugs, GitLab (Integrations), Release-Engineering-Team (Priority Backlog 📥): Connect WikiBugs IRC bot to Wikimedia GitLab - https://phabricator.wikimedia.org/T288381 (valhallasw) This could now presumably be built on top of https://wikitech.wikimedia.org/wiki/GitLab/Phabricator_integration (either...
[10:42:13] !log taavi@runko toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the toolsbeta cluster
[10:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:43:10] !log taavi@runko toolsbeta END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a control role in the toolsbeta cluster
[10:43:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:47:35] !log taavi@runko toolsbeta START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the toolsbeta cluster
[10:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:54:47] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] Does not handle virtual packages correctly - https://phabricator.wikimedia.org/T355575 (dcaro) In progress→Resolved @Soda This has been fixed now, I was able to install imagemagick successfully (tested a few others too)....
[10:59:24] !log taavi@runko toolsbeta Added a new k8s control toolsbeta-test-k8s-control-7.toolsbeta.eqiad1.wikimedia.cloud to the cluster
[10:59:24] !log taavi@runko toolsbeta END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the toolsbeta cluster
[10:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:01:20] (PS2) Majavah: toolforge: add_k8s_node: Add support for containerd [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992923 (https://phabricator.wikimedia.org/T284656)
[11:01:22] (PS2) Majavah: wmcs_libs: k8s: Fix Kubernetes role usage [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992924
[11:01:24] (PS2) Majavah: Add worker-nfs Toolforge Kubernetes role/prefix [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992925 (https://phabricator.wikimedia.org/T355883)
[11:01:26] (PS2) Majavah: toolforge: add_k8s_node: Allow passing --network [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992926 (https://phabricator.wikimedia.org/T284656)
[11:01:28] (PS1) Majavah: toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499)
[11:02:17] Toolforge, cloud-services-team: Fix the mis-named k8s service in tools and toolsbeta projects - https://phabricator.wikimedia.org/T262562 (taavi)
[11:02:26] Toolforge (Toolforge iteration 04), cloud-services-team, Kubernetes, Patch-For-Review: Toolforge k8s: Migrate workers to Containerd and Bookworm - https://phabricator.wikimedia.org/T284656 (taavi)
[11:05:33] (CR) CI reject: [V: -1] toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499) (owner: Majavah)
[11:21:05] (PS2) Majavah: toolforge: add_k8s_node: Update hiera for control and ingress nodes [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993071 (https://phabricator.wikimedia.org/T274499)
[11:40:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-test-k8s-control-7 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[11:55:28] (PuppetAgentFailure) resolved: Puppet agent failure detected on instance toolsbeta-test-k8s-control-7 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[12:45:05] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (dcaro) For java, I'm missing a couple links, that I think are broken anyhow: ` [step-build] 2024-01-26T12:31:51.804496170Z I don't know how to fix b...
[13:35:42] (CR) FNegri: [C: +1] toolsdb: add cookbook to retrieve stuck table+query [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215 (owner: David Caro)
[13:35:49] (CR) FNegri: [C: +1] inventory: split into submodules [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992638 (owner: David Caro)
[13:37:00] (CR) FNegri: [C: +1] toolforge: add_k8s_node: Add support for containerd [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992923 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[13:40:46] (CR) Majavah: [C: +2] toolforge: add_k8s_node: Add support for containerd [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992923 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[13:44:01] (Merged) jenkins-bot: toolforge: add_k8s_node: Add support for containerd [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992923 (https://phabricator.wikimedia.org/T284656) (owner: Majavah)
[13:44:49] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro)
[13:45:28] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] some packages install broken links - https://phabricator.wikimedia.org/T355217 (dcaro) In progress→Resolved This should be fixed now, the links are created and skipped if unable to do so.
[13:45:59] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro)
[13:46:33] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] alternatives aren’t being set up - https://phabricator.wikimedia.org/T355215 (dcaro) Open→In progress
[13:47:31] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] alternatives aren’t being set up - https://phabricator.wikimedia.org/T355215 (dcaro) This is kind of fixed now, by adding the discovered 'bin' directories to the path: ` local.tf-test@lima-lima-kilo:~$ toolforge build start http...
[13:47:46] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro)
[13:48:23] Toolforge (Toolforge iteration 04), Toolforge Build Service: [apt-buildpack] alternatives aren’t being set up - https://phabricator.wikimedia.org/T355215 (dcaro) In progress→Resolved
[14:22:41] (PS9) David Caro: toolsdb: load the inventory dynamically [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992932
[14:23:08] (PS9) David Caro: toolsdb: add cookbook to retrieve stuck table+query [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215
[14:23:10] (PS9) David Caro: inventory: split into submodules [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992638
[14:23:12] (PS10) David Caro: toolsdb: load the inventory dynamically [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992932
[14:23:42] (CR) David Caro: [C: +2] inventory: split into submodules [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992638 (owner: David Caro)
[14:23:48] (CR) David Caro: [C: +2] toolsdb: add cookbook to retrieve stuck table+query [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215 (owner: David Caro)
[14:29:34] (Merged) jenkins-bot: toolsdb: add cookbook to retrieve stuck table+query [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992215 (owner: David Caro)
[14:29:36] (Merged) jenkins-bot: inventory: split into submodules [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/992638 (owner: David Caro)
[15:13:47] (PS1) Eevans: added (fake) aqsloader creds (Cassandra role) [labs/private] - https://gerrit.wikimedia.org/r/993105 (https://phabricator.wikimedia.org/T355917)
[15:16:05] (CR) Eevans: [V: +2 C: +2] added (fake) aqsloader creds (Cassandra role) [labs/private] - https://gerrit.wikimedia.org/r/993105 (https://phabricator.wikimedia.org/T355917) (owner: Eevans)
[15:18:59] Toolforge (Toolforge iteration 04), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/2...
[15:20:50] cloud-services-team: CRITICAL - degraded: The following units failed: check-private-data.service on clouddb1015, 1019, 1021 - https://phabricator.wikimedia.org/T355953 (Andrew)
[15:24:33] PAWS: move paws-dev to pawsdev - https://phabricator.wikimedia.org/T355543 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/368
[15:24:46] vivian-rook closed https://github.com/toolforge/paws/pull/368
[15:25:48] PAWS: move paws-dev to pawsdev - https://phabricator.wikimedia.org/T355543 (rook) Open→Resolved
[15:26:42] PAWS: Remove paws-dev from codfw1dev - https://phabricator.wikimedia.org/T355954 (rook)
[15:26:57] PAWS: move paws-dev to pawsdev - https://phabricator.wikimedia.org/T355543 (rook)
[15:26:59] PAWS: Remove paws-dev from codfw1dev - https://phabricator.wikimedia.org/T355954 (rook)
[15:30:12] cloud-services-team (FY2023/2024-Q1-Q2), Cloud-Services-Origin-Alert, Cloud-Services-Worktype-Unplanned, User-dcaro: [toolsdb] ToolsToolsDBReplicationLagIsTooHigh - 2024-01-19 - https://phabricator.wikimedia.org/T355411 (dcaro) In progress→Resolved The replica has picked up the slack alre...
[15:37:00] RECOVERY - Check systemd state on clouddb1015 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:42:18] cloud-services-team: CRITICAL - degraded: The following units failed: check-private-data.service on clouddb1015, 1019, 1021 - https://phabricator.wikimedia.org/T355953 (Andrew) This may have been fixed by someone else since I looked at it last night. Right now I'm waiting for the task to re-run on clouddb101...
[15:51:04] RECOVERY - Check systemd state on clouddb1019 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:51:24] RECOVERY - Check systemd state on clouddb1021 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:42:11] cloud-services-team, wikitech.wikimedia.org, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (TBurmeister)
[16:45:16] cloud-services-team, wikitech.wikimedia.org, Phabricator, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (JJMC89)
[17:12:36] cloud-services-team, wikitech.wikimedia.org, Phabricator, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (Dzahn) Hi @TBurmeister for the Phabricator part: I am following steps from https://www.medi...
[17:18:02] cloud-services-team, wikitech.wikimedia.org, Phabricator, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (TBurmeister) I provided my text phrase in the private paste. I hope it's correct because (fac...
[17:47:14] (PS1) BCornwall: Update p::markmonitor to p::ncmonitor::markmonitor [labs/private] - https://gerrit.wikimedia.org/r/993168
[17:47:37] (CR) BCornwall: [V: +2 C: +2] Update p::markmonitor to p::ncmonitor::markmonitor [labs/private] - https://gerrit.wikimedia.org/r/993168 (owner: BCornwall)
[18:13:35] cloud-services-team, wikitech.wikimedia.org, Phabricator, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (Dzahn) Open→In progress p:Triage→High a:Dzahn
[18:32:11] cloud-services-team (Hardware), SRE, ops-codfw, User-dcaro: cloud: prepare codfw for expansion (racks, switches, ceph) - https://phabricator.wikimedia.org/T346661 (nskaggs) a:nskaggs→None
[18:32:18] cloud-services-team (Hardware), Goal: eqiad1: procure 1 additional cloudlb server - https://phabricator.wikimedia.org/T341062 (nskaggs) a:nskaggs→None
[18:37:21] cloud-services-team, wikitech.wikimedia.org, Phabricator, Trust-and-Safety: Account recovery help needed for Developer account Triciaburmeister / TBurmeister - https://phabricator.wikimedia.org/T355958 (Dzahn) In progress→Resolved We had a video meeting and the committed identity was upda...
[19:14:43] Cloud-VPS, cloud-services-team, Goal: Gather feedback from users of the 'unmanaged' debian-12.0-nopuppet image - https://phabricator.wikimedia.org/T355963 (Andrew)
[20:13:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[20:18:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable