[00:43:41] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[00:50:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[00:55:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[01:32:34] Cloud-VPS (Quota-requests): Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090 (rook) ` root@cloudcontrol1005:~# openstack quota set --ram 32768 owidm root@cloudcontrol1005:~# openstack quota set --cores 16 owidm `
[01:32:55] Cloud-VPS (Quota-requests): Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090 (rook) Going to close this, please re-open when you are done upgrading to set the capacity back.
[01:33:02] Cloud-VPS (Quota-requests): Request temporary quota increase for owidm - https://phabricator.wikimedia.org/T356090 (rook) Open→Resolved
[01:34:07] Cloud-VPS (Quota-requests): Request temporary quota increase for videowiki - https://phabricator.wikimedia.org/T356089 (rook) Open→Resolved
[01:34:30] Cloud-VPS (Quota-requests): Request temporary quota increase for videowiki - https://phabricator.wikimedia.org/T356089 (rook) ` root@cloudcontrol1005:~# openstack quota set --ram 32768 videowiki root@cloudcontrol1005:~# openstack quota set --cores 16 videowiki ` All set. Closing the ticket, please re-open w...
[04:35:00] wikitech.wikimedia.org, Gerrit: Can't login into Gerrit with a Wikimedia Developer account with non-unique email address - https://phabricator.wikimedia.org/T270233 (Tgr) >>! In T270233#9494689, @hashar wrote: > Declining since the root cause was two accounts having the same email addresses while Gerrit...
[04:43:42] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[04:49:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[04:54:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1007.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[08:30:46] (CR) Brouberol: [C: +1] Add dummy keytabs for new an-worker1157-1175 [labs/private] - https://gerrit.wikimedia.org/r/993675 (https://phabricator.wikimedia.org/T353776) (owner: Stevemunene)
[08:43:42] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[08:46:23] Grid-Engine-to-K8s-Migration: Migrate croptool from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319653 (dcaro) >>! In T319653#9495429, @Soda wrote: > I'm still working on this off and on, but migrating the app is a bit more involved since php7.4 is not available when bu...
[08:49:47] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160 (Magnus)
[09:11:05] Cloud-VPS, cloud-services-team: Horizon identity -> roles link logs user out when unauthorized - https://phabricator.wikimedia.org/T356162 (fgiunchedi)
[09:33:29] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) Open→In progress a:dcaro
[09:33:52] Toolforge: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163 (Leloiandudu)
[09:35:17] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160 (dcaro) Seems related to {T356163}
[09:35:32] Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163 (dcaro)
[09:37:31] Toolforge: [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164 (dcaro)
[09:37:45] Toolforge: [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164 (dcaro)
[09:37:49] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160 (dcaro)
[09:40:39] wikitech.wikimedia.org, Gerrit: Can't login into Gerrit with a Wikimedia Developer account with non-unique email address - https://phabricator.wikimedia.org/T270233 (hashar) Declined→Open Sure sorry, I have declined the task in a rush while triaging tasks concerning Gerrit. Though surely Gerrit s...
[09:44:50] Toolforge (Toolforge iteration 04): [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164 (dcaro)
[09:55:32] Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163 (Joe) Just stating for the record that `connection refused/reset` messages will come from our edge caching layer, specifically from the tcp stack of our servers there, so it wouldn't be related to a m...
[10:01:54] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160 (dcaro) Might be a red herring, but all the pods are currently running on the new -nfs workers: ` root@tools-k8s-control-6:~# kubectl describe -n tool-listeria pods | grep worker Node:...
[10:10:51] Toolforge (Toolforge iteration 04): [toolforge] several tools get periods of connection refused (104) when connecting to wikis - https://phabricator.wikimedia.org/T356164 (dcaro) It seems both tools are running on the new nfs k8s workers: ` root@tools-k8s-control-6:~# kubectl describe -n tool-listeria pods |...
[10:24:29] (CR) Stevemunene: [V: +2 C: +2] Add dummy keytabs for new an-worker1157-1175 [labs/private] - https://gerrit.wikimedia.org/r/993675 (https://phabricator.wikimedia.org/T353776) (owner: Stevemunene)
[10:28:56] (HarborDown) firing: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[10:48:56] (HarborDown) resolved: Harbor is down - https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Runbooks/HarborDown - https://prometheus-alerts.wmcloud.org/?q=alertname%3DHarborDown
[10:52:24] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) After restarting to flush redis: ` root@tools-harbor-1:/srv/ops/harbor# docker exec -ti redis redis-cli 127.0.0.1:6379> FLUSHALL async...
[10:54:45] Toolforge (Toolforge iteration 04), Toolforge Build Service, cloud-services-team (FY2023/2024-Q1-Q2), User-dcaro: [harbor] Redis using all available memory - https://phabricator.wikimedia.org/T354176 (dcaro) This might be related to {T356037}, flushed redis again because of that one.
[11:19:19] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) Update on deleting all the executions, given that it's many many of them, trying now to run a query that would be more performant than using 'not in': ` delete from execution where...
[11:19:52] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:36:06] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:37:26] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro)
[11:37:34] Toolforge (Toolforge iteration 04), cloud-services-team: Enable ARC support in Toolforge - https://phabricator.wikimedia.org/T356171 (taavi)
[11:48:56] Toolforge (Toolforge iteration 04), cloud-services-team, Patch-For-Review: Enable ARC support in Toolforge - https://phabricator.wikimedia.org/T356171 (taavi) Open→In progress
[11:49:06] Cloud-VPS, Toolforge (Toolforge iteration 04), cloud-services-team, Patch-For-Review: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (taavi)
[12:02:03] Toolforge, observability: Set up monitoring for community cronjobs - https://phabricator.wikimedia.org/T306790 (dcaro) p:Triage→Medium
[12:02:17] Toolforge, observability: Set up monitoring for community cronjobs - https://phabricator.wikimedia.org/T306790 (dcaro) Just noting that grid jobs will be soon phased out, so any effort should probably focus on toolforge jobs (kubernetes based).
[12:06:25] Toolforge (Toolforge iteration 04), Toolforge Build Service, User-Raymond_Ndibe: alert users when they are about to exceed their harbor quota - https://phabricator.wikimedia.org/T353535 (dcaro) @Raymond_Ndibe can you add a note on what's blocking this? (or point to a task if there's any)
[12:33:21] Toolforge (Toolforge iteration 04), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/2...
[12:43:42] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[12:46:46] (PS1) Majavah: Remove Brooke's root key [labs/private] - https://gerrit.wikimedia.org/r/994162
[12:48:45] (PS1) Majavah: Add fake ARC signing keys [labs/private] - https://gerrit.wikimedia.org/r/994163 (https://phabricator.wikimedia.org/T354112)
[12:49:16] (CR) Majavah: [V: +2 C: +2] Add fake ARC signing keys [labs/private] - https://gerrit.wikimedia.org/r/994163 (https://phabricator.wikimedia.org/T354112) (owner: Majavah)
[12:57:25] (CR) Muehlenhoff: [C: +1] Remove Brooke's root key [labs/private] - https://gerrit.wikimedia.org/r/994162 (owner: Majavah)
[13:05:18] (CR) Majavah: [V: +2 C: +2] Remove Brooke's root key [labs/private] - https://gerrit.wikimedia.org/r/994162 (owner: Majavah)
[13:12:24] Cloud-VPS, Toolforge (Toolforge iteration 04), cloud-services-team, Patch-For-Review: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (taavi)
[13:21:41] Toolforge, cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri)
[13:21:54] Toolforge, cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri) Open→In progress
[13:22:05] Toolforge, cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri) a:fnegri
[13:23:03] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri)
[13:25:19] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri) p:Triage→High
[13:40:51] Toolforge (Toolforge iteration 04), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (CodeReviewBot) dcaro opened https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/2...
[13:47:52] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (dcaro) There is a very tricky setup going on, with lima-vm changing iptables creating DNAT and similar tricks to make the...
[14:11:50] Toolforge (Toolforge iteration 04), cloud-services-team: Enable ARC support in Toolforge - https://phabricator.wikimedia.org/T356171 (taavi)
[14:11:54] Toolforge, cloud-services-team, Patch-For-Review: Upgrade Toolforge mail server to Debian Bullseye or later - https://phabricator.wikimedia.org/T311910 (taavi)
[14:11:56] Cloud-VPS, Toolforge (Toolforge iteration 04), cloud-services-team: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (taavi)
[14:38:28] (InstanceDown) firing: Project tools instance tools-sgeexec-10-21 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[14:43:28] (InstanceDown) resolved: (2) Project tools instance tools-k8s-worker-62 is down - https://prometheus-alerts.wmcloud.org/?q=alertname%3DInstanceDown
[15:01:25] Cloud Services Proposals, Toolforge (Toolforge iteration 04): Decision request – Toolforge CLI consolidation - https://phabricator.wikimedia.org/T348749 (Slst2020)
[15:08:26] Toolforge (Toolforge iteration 04), Patch-For-Review: [webservice] php 7.4 containers don't pass through the environment variables to the scripts - https://phabricator.wikimedia.org/T354320 (CodeReviewBot) dcaro merged https://gitlab.wikimedia.org/repos/cloud/toolforge/tools-webservice/-/merge_requests/2...
[15:14:15] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2): [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri) I could not reproduce the issue on a basic Debian VM without our custom config in `bookworm.yaml`, so I tried sele...
[15:20:29] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) In progress→Resolved
[15:21:48] Toolforge (Toolforge iteration 04): [harbor] cleanup execution + task tables - https://phabricator.wikimedia.org/T356037 (dcaro) \o/ only 60 schedules (down from >3k), 6 tasks and 6 executions left in the database, will monitor it a bit to see if there's any errors in the logs or the schedules don't trigger,...
[15:35:22] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2), Patch-For-Review: [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (CodeReviewBot) fnegri opened https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/merge_re...
[15:35:31] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2), Patch-For-Review: [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (fnegri) This patch seems to fix the issue for me: https://gitlab.wikimedia.org/repos/cloud/toolforge...
[15:38:14] Toolforge (Toolforge iteration 04), cloud-services-team (FY2023/2024-Q1-Q2), Patch-For-Review: [lima-kilo] DNS resolution errors when running on M1/M2 CPUs - https://phabricator.wikimedia.org/T356177 (Raymond_Ndibe) Thanks @fnegri I will do that
[15:46:51] (PS4) Pwangai: Append coverage value [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/992929 (https://phabricator.wikimedia.org/T355803)
[15:49:07] (CR) Pwangai: Append coverage value (2 comments) [labs/tools/sonarqubebot] - https://gerrit.wikimedia.org/r/992929 (https://phabricator.wikimedia.org/T355803) (owner: Pwangai)
[16:13:22] Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163 (dcaro) p:Triage→High
[16:13:29] Toolforge: Listeria bot sometimes gets stuck with 104 errors from Wikimedia APIs - https://phabricator.wikimedia.org/T356160 (dcaro) p:Triage→High
[16:40:01] (CR) FNegri: [C: +1] toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670 (owner: Majavah)
[16:40:20] (CR) Majavah: [C: +2] toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670 (owner: Majavah)
[16:43:58] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:44:35] (Merged) jenkins-bot: toolforge: k8s: depool_and_remove_node: Fix END logging [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/993670 (owner: Majavah)
[16:48:42] (CloudVPSDesignateLeaks) firing: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:49:20] Cloud-VPS (Project-requests): Request creation of topic-curator VPS project - https://phabricator.wikimedia.org/T356195 (So9q)
[16:52:40] Cloud-VPS (Project-requests): Request creation of topic-curator VPS project - https://phabricator.wikimedia.org/T356195 (So9q)
[16:53:29] Cloud-VPS (Project-requests): Request creation of topic-curator VPS project - https://phabricator.wikimedia.org/T356195 (So9q)
[16:53:37] (CR) David Caro: [C: +1] openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015 (owner: Majavah)
[16:53:39] (CR) FNegri: [C: +1] openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015 (owner: Majavah)
[16:53:42] (CloudVPSDesignateLeaks) resolved: (2) Detected 4 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[16:53:54] (CR) Majavah: [C: +2] openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015 (owner: Majavah)
[17:09:01] (Merged) jenkins-bot: openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015 (owner: Majavah)
[18:08:23] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-37
[18:09:02] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-37
[18:17:53] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-38
[18:18:31] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-38
[18:22:03] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-39
[18:22:42] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-39
[18:23:22] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-40
[18:24:00] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-40
[18:29:01] !log andrew@cloudcumin1001 tools START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-41
[18:29:41] !log andrew@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-41
[18:32:49] VPS-project-Codesearch, User-MarcoAurelio: Include a "Report bug" type link in CodeSearch footer - https://phabricator.wikimedia.org/T346073 (MarcoAurelio) Open→Resolved
[18:33:18] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[18:38:41] (CloudVPSDesignateLeaks) firing: Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:41:20] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-7.tools.eqiad1.wikimedia.cloud to the cluster
[18:41:20] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[18:42:16] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[18:43:41] (CloudVPSDesignateLeaks) firing: (2) Detected 1 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[18:46:10] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[18:47:54] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8
[18:48:10] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8
[18:48:28] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[18:50:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:51:31] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[18:55:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[18:56:28] (PuppetAgentStaleLastRun) firing: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-8 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[19:03:55] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8
[19:04:10] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8
[19:04:37] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[19:06:28] (PuppetAgentStaleLastRun) resolved: Last Puppet run was over 24 hours ago on instance tools-k8s-worker-nfs-8 in project tools - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentStaleLastRun
[19:12:54] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-8.tools.eqiad1.wikimedia.cloud to the cluster
[19:12:54] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[19:13:01] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[19:16:04] !log taavi@cloudcumin1001 tools END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
[19:16:22] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-9
[19:16:37] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-9
[19:17:09] !log taavi@cloudcumin1001 tools START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
[19:24:44] !log taavi@cloudcumin1001 tools Added a new k8s worker-nfs tools-k8s-worker-nfs-9.tools.eqiad1.wikimedia.cloud to the cluster
[19:24:44] !log taavi@cloudcumin1001 tools END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
[19:48:41] (CloudVPSDesignateLeaks) firing: (2) Detected 25 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[19:53:41] (CloudVPSDesignateLeaks) resolved: (2) Detected 25 stray dns records - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/Designate_record_leaks - https://grafana.wikimedia.org/d/ebJoA6VWz/wmcs-openstack-eqiad-nova-fullstack - https://alerts.wikimedia.org/?q=alertname%3DCloudVPSDesignateLeaks
[20:25:28] (PuppetSyncFailure) firing: Failed to update Puppet repository /var/lib/git/operations/puppet on instance toolsbeta-puppetmaster-04 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[20:33:12] cloud-services-team (Hardware), DC-Ops, ops-codfw: Q#:rack/setup/install X - https://phabricator.wikimedia.org/T356216 (RobH)
[20:33:40] cloud-services-team (Hardware), DC-Ops, ops-codfw: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216 (RobH) a:Andrew
[20:34:33] cloud-services-team (Hardware), DC-Ops, ops-codfw: Q#:rack/setup/install (2) cloudbackup hosts - https://phabricator.wikimedia.org/T356216 (RobH) @andrew: I've assigned this task to you for you to populate the racking details, additionally please add the servers to the site.pp file with the insetup...
[21:08:28] (PuppetAgentFailure) firing: Puppet agent failure detected on instance toolsbeta-mail-01 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentFailure
[21:48:13] Tool-global-search: OAuth period of authentication of less than 24 hours is too short with Global-Search - https://phabricator.wikimedia.org/T356222 (Billinghurst)
[21:50:45] Tool-global-search: OAuth period of authentication of less than 24 hours is too short with Global-Search - https://phabricator.wikimedia.org/T356222 (Billinghurst)
[21:50:53] Tool-global-search: Extend the OAuth duration for global-search - https://phabricator.wikimedia.org/T350521 (Billinghurst)
[21:53:40] Toolforge: ChieBot: Intermittent connection reset by peer errors - https://phabricator.wikimedia.org/T356163 (Leloiandudu) >>! In T356163#9497689, @Joe wrote: > migration to kubernetes (which is still only partial, btw). I was talking about the migration of my tool. It's now running on k8s 100%.
[22:24:02] wikitech.wikimedia.org, Gerrit: Can't login into Gerrit with a Wikimedia Developer account with non-unique email address - https://phabricator.wikimedia.org/T270233 (Tgr) >>! In T270233#9497614, @hashar wrote: > My guess is for us to enforce the email uniqueness on the Wikitech side I agree that would m...
[22:29:53] (PS2) Reedy: branches.json: Add REL1_41 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/994230
[22:32:39] (PS2) Reedy: runner: Use pathlib in most places where possible [labs/libraryupgrader] - https://gerrit.wikimedia.org/r/759863 (owner: Legoktm)
[22:35:16] (CR) CI reject: [V: -1] runner: Use pathlib in most places where possible [labs/libraryupgrader] - https://gerrit.wikimedia.org/r/759863 (owner: Legoktm)
[23:06:15] (PS1) Eevans: (faux) keys & certs for new sessionstore hosts [labs/private] - https://gerrit.wikimedia.org/r/994347 (https://phabricator.wikimedia.org/T353402)
[23:07:32] (CR) Eevans: [V: +2 C: +2] (faux) keys & certs for new sessionstore hosts [labs/private] - https://gerrit.wikimedia.org/r/994347 (https://phabricator.wikimedia.org/T353402) (owner: Eevans)
[23:25:28] (PuppetSyncFailure) firing: Failed to update Puppet repository /var/lib/git/operations/puppet on instance toolsbeta-puppetmaster-04 in project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetSyncFailure
[23:35:22] Cloud-VPS (Project-requests): Request creation of topic-curator VPS project - https://phabricator.wikimedia.org/T356195 (bd808) @So9q Moving the tool to a dedicated project is reasonable if it will actually fix something, but if your guess about URL length is correct I'm not sure how that will be different i...
[23:37:50] (CR) DannyS712: [C: +2] branches.json: Add REL1_41 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/994230 (owner: Reedy)
[23:40:54] (Merged) jenkins-bot: branches.json: Add REL1_41 [labs/libraryupgrader/config] - https://gerrit.wikimedia.org/r/994230 (owner: Reedy)
[23:57:47] Cloud-VPS (Project-requests): Request creation of topic-curator VPS project - https://phabricator.wikimedia.org/T356195 (bd808) With the changes from https://github.com/dpriskorn/WikidataTopicCurator/commit/d532547c74c4ca156d712299f580bc72e50f645a now in place on your Toolforge tool I can't use something lik...