[00:06:55] (MaxConnTrack) firing: Max conntrack at 80% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[00:50:31] Tools: [dplbot] uncategorized articles not working - https://phabricator.wikimedia.org/T355014 (russblau) Open→Resolved The list generator on that page is now working again.
[01:16:56] (MaxConnTrack) firing: Max conntrack at 90% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:21:55] (MaxConnTrack) resolved: Max conntrack at 90% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:23:25] (MaxConnTrack) resolved: Max conntrack at 80% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:29:55] (MaxConnTrack) firing: Max conntrack at 80% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:31:55] (MaxConnTrack) firing: Max conntrack at 90% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:46:25] (MaxConnTrack) resolved: Max conntrack at 80% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[01:46:55] (MaxConnTrack) resolved: Max conntrack at 90% on cloudvirt1060:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack - https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking - https://alerts.wikimedia.org/?q=alertname%3DMaxConnTrack
[02:08:04] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[02:44:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[02:49:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[05:08:04] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[06:05:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[06:10:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[06:27:15] Tools: Tool:Panoviewer - Grid Engine web service cannot be reached. - https://phabricator.wikimedia.org/T354949 (tstarling) a:tstarling
[06:30:20] Tools: Tool:Panoviewer - Grid Engine web service cannot be reached. - https://phabricator.wikimedia.org/T354949 (tstarling) Open→Resolved Following [[https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web|wikitech:Help:Toolforge/Web]], I created a service.template file with `lang=yaml backend: kuber...
[06:34:06] Grid-Engine-to-K8s-Migration: Migrate panoviewer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319953 (tstarling) >>! In T319953#9392450, @tstarling wrote: > I think the worst-case scenario is that the tiling job stops working. I guess the worst case scenario was act...
[06:53:53] (PS1) Eugene233: Suggestions for improvements in some ISA messages [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990679 (https://phabricator.wikimedia.org/T354921)
[07:02:58] Grid-Engine-to-K8s-Migration: Migrate uarchivebot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320104 (Andriy.v) Open→Resolved a:Andriy.v Migration to k8s is completed.
[08:08:04] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[09:49:15] cloud-services-team, Infrastructure-Foundations, netops: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (taavi)
[09:49:37] cloud-services-team, Infrastructure-Foundations, netops: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (taavi)
[09:49:45] Data-Services, cloud-services-team, Patch-For-Review: Move wiki replicas behind cloudlb - https://phabricator.wikimedia.org/T346947 (taavi)
[10:35:46] Toolforge, cloud-services-team: Do something to Toolforge tools with no non-blocked maintainers - https://phabricator.wikimedia.org/T320342 (akosiaris)
[11:02:06] !log taavi@cloudcumin1001 admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot (T355061)
[11:02:12] T355061: MaxConnTrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1060:9100 - https://phabricator.wikimedia.org/T355061
[11:05:11] (PS1) David Caro: ceph: use timedelta instead of integers [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990975
[11:05:13] (PS1) David Caro: ceph.drain_osd_node: improve logs [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990976
[11:05:15] (PS1) David Caro: ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977
[11:05:17] (PS1) David Caro: ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978
[11:05:19] (PS1) David Caro: ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979
[11:05:20] Cloud-VPS, cloud-services-team: wmcs-drain-hypervisor is broken - https://phabricator.wikimedia.org/T355067 (taavi) Open→Resolved
[11:05:24] Cloud-VPS, cloud-services-team: MaxConnTrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1060:9100 - https://phabricator.wikimedia.org/T355061 (taavi)
[11:05:37] (Abandoned) David Caro: ceph: add missing cumin_params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/969321 (owner: David Caro)
[11:06:39] (Abandoned) David Caro: some fixes, to sort out [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/970414 (owner: David Caro)
[11:06:56] (PS1) Majavah: proxy: proxy via cloudcumin1001.eqiad.wmnet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980
[11:08:04] (PuppetAgentNoResources) firing: No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:08:24] (CR) CI reject: [V: -1] ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977 (owner: David Caro)
[11:08:55] (CR) CI reject: [V: -1] ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978 (owner: David Caro)
[11:08:58] (CR) CI reject: [V: -1] ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979 (owner: David Caro)
[11:09:42] (CR) CI reject: [V: -1] proxy: proxy via cloudcumin1001.eqiad.wmnet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[11:12:28] (PS2) Majavah: proxy: proxy via cloudcumin1001.eqiad.wmnet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980
[11:12:54] (PS2) David Caro: ceph.osd.drain_node: force passing the cluster name [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990977
[11:12:56] (PS2) David Caro: ceph.osd.undrain_node: fix help and default batch param [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990978
[11:12:58] (PS2) David Caro: ceph: add missing cumin params [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990979
[11:18:04] (PuppetAgentNoResources) resolved: No Puppet resources found on instance toolsbeta-bastion-6 on project toolsbeta - https://prometheus-alerts.wmcloud.org/?q=alertname%3DPuppetAgentNoResources
[11:21:15] PROBLEM - Host cloudvirt1060 is DOWN: PING CRITICAL - Packet loss = 100%
[11:22:15] RECOVERY - Host cloudvirt1060 is UP: PING OK - Packet loss = 0%, RTA = 1.27 ms
[11:22:47] !log taavi@cloudcumin1001 admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) (T355061)
[11:22:53] T355061: MaxConnTrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1060:9100 - https://phabricator.wikimedia.org/T355061
[11:24:16] (NeutronAgentDown) firing: Neutron neutron-linuxbridge-agent on cloudvirt1060 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[11:34:22] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot for cloudvirt1060.eqiad.wmnet
[11:34:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:35:22] !log taavi@runko admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) for cloudvirt1060.eqiad.wmnet
[11:35:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[11:44:16] (NeutronAgentDown) resolved: Neutron neutron-linuxbridge-agent on cloudvirt1060 is down - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting#Networking_failures - https://grafana.wikimedia.org/d/wKnDJf97z/wmcs-neutron-eqiad1 - https://alerts.wikimedia.org/?q=alertname%3DNeutronAgentDown
[11:45:26] Cloud-VPS, cloud-services-team (FY2023/2024-Q1-Q2), Infrastructure-Foundations, Observability-Alerting: [wmcs-cookbooks] Downtime alerts from cloudcumins - https://phabricator.wikimedia.org/T347490 (taavi) One problem is that spicerack uses this to construct the silence start/end dates: `lang=pyt...
[12:27:37] superset.wmcloud.org: remove old cluster - https://phabricator.wikimedia.org/T354574 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/superset-deploy/pull/16
[12:27:44] vivian-rook opened https://github.com/toolforge/superset-deploy/pull/16
[12:28:33] superset.wmcloud.org: remove old cluster - https://phabricator.wikimedia.org/T354574 (rook) Open→Resolved
[12:28:40] vivian-rook closed https://github.com/toolforge/superset-deploy/pull/16
[12:38:51] Data-Services, cloud-services-team, Data-Engineering, Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108 (taavi) AIUI currently Data Engineering reviews the view changes to ensure the data is ok to publish and then WMCS (or Data Platform?) SREs dep...
[12:39:48] Data-Services, cloud-services-team, Data-Platform, Patch-For-Review: Add global_edit_count to wikireplicas - https://phabricator.wikimedia.org/T344108 (taavi)
[12:42:00] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi)
[12:42:38] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi) a:taavi
[12:51:35] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi)
[12:58:10] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi) I manually added 10GiB to the volume, and updated the Trove database to match. This gives some breathing room: ` /dev/sdb 255G 233G 9.5G 97% /var/lib/post...
[13:01:32] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi) Now failing with: ` chmod: changing permissions of '/var/run/postgresql': Operation not permitted ... 2024-01-16 12:56:25.997 UTC [1] FATAL: could not create lock...
[13:05:03] Cloud-VPS, cloud-services-team: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138 (taavi) The reboot finished cleanly and the instance is now showing as Active/Healthy, although the database is still complaining about those archive files already existing...
[13:06:50] Cloud-VPS, cloud-services-team: MaxConnTrack Netfilter: Maximum number of allowed connection tracking entries alert on cloudvirt1060:9100 - https://phabricator.wikimedia.org/T355061 (taavi) Open→Resolved Cautiously resolving after a reboot.
[13:17:54] cloud-services-team, Infrastructure-Foundations, netops, Patch-For-Review: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (ayounsi) Thanks, that's awesome ! For the record, they will need to be removed from `puppet:modules/network/data/data.yaml` as well as Netbox...
[13:19:55] Grid-Engine-to-K8s-Migration, Wiki-Loves-Monuments-Database: Migrate heritage from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319787 (dcaro) As far as I can see it's jstop/jsub of that other job. I would recommend joining both jobs into the same one, k8s jobs will...
[13:25:11] Grid-Engine-to-K8s-Migration, User-revi: Migrate tc-rc from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320079 (dcaro) @revi I might be able to help, but I would need the tool code to be hosted in a public git repository somewhere (if it's already, can you share the...
[13:39:45] (CR) Sebastian Berlin (WMSE): [C: -1] Fix capitalization in some ISA messages (3 comments) [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990675 (https://phabricator.wikimedia.org/T354920) (owner: Eugene233)
[13:41:21] Cloud-VPS, cloud-services-team: Use cloud-private and cfssl certs for instance live migrations - https://phabricator.wikimedia.org/T355145 (taavi)
[13:42:34] (CR) Amire80: "One little correction needed. Other than that, looks good, thanks!" [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990679 (https://phabricator.wikimedia.org/T354921) (owner: Eugene233)
[13:42:44] (CR) Amire80: [C: -1] Suggestions for improvements in some ISA messages [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990679 (https://phabricator.wikimedia.org/T354921) (owner: Eugene233)
[13:48:29] Toolforge, cloud-services-team: Replace Toolschecker alerts with Prometheus based ones - https://phabricator.wikimedia.org/T313030 (taavi)
[13:52:21] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:52:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:52:35] !log taavi@runko admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[13:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:53:28] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:53:37] !log taavi@runko admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=99)
[13:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:54:19] Grid-Engine-to-K8s-Migration: Migrate phetools from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319965 (dcaro) Some hints, hopefully helpful. > 2. Due to the mixing of PHP and Python code in the frontend, we'll need a single image for the webservice and the various back...
[13:54:41] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:54:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:54:51] !log taavi@runko admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[13:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:55:37] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[13:55:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:55:48] !log taavi@runko admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[13:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:57:27] (PS1) Majavah: openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015
[13:57:29] (PS1) Majavah: openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490)
[14:00:38] Grid-Engine-to-K8s-Migration: Migrate enwikt-translations from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319724 (dcaro) @Erutuon HI! We added some (admittedly simple) support for rust on the toolforge build service (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Bu...
[14:00:50] (CR) CI reject: [V: -1] openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015 (owner: Majavah)
[14:00:56] (CR) CI reject: [V: -1] openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[14:02:14] Grid-Engine-to-K8s-Migration: Migrate copyclear from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319647 (Hannolans) Hi sorry I totally missed this issue. multichill made me aware. Its not so clear for me what I should do. I would like to reschedule the scripts on a daily...
[14:02:52] Grid-Engine-to-K8s-Migration: Migrate copyclear from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319647 (Hannolans) a:Hannolans→komla
[14:04:17] Grid-Engine-to-K8s-Migration: Migrate persondata from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319962 (dcaro) @Wurgl can you elaborate? You can use `toolforge jobs list/show` and `toolforge webservice status` to check things on the k8s side, you can also see some sta...
[14:06:15] (CR) Sebastian Berlin (WMSE): [C: -1] Suggestions for improvements in some ISA messages (1 comment) [labs/tools/Isa] (m2c) - https://gerrit.wikimedia.org/r/990679 (https://phabricator.wikimedia.org/T354921) (owner: Eugene233)
[14:08:13] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro) a:dcaro
[14:16:06] (PS2) Majavah: openstack: cloudvirt: don't try to remove AM silence [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991015
[14:16:08] (PS2) Majavah: openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490)
[14:16:10] (PS1) Majavah: openstack: cloudvirt: don't restore maintenance aggregate [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991020
[14:16:12] (PS1) Majavah: openstack: cloudvirt: set_maintenance: Abort if already in maintenance [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991021
[14:16:14] (PS1) Majavah: openstack: cloudvirt: set_maintenance: Remove real aggregates [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991022
[14:19:19] (CR) CI reject: [V: -1] openstack: cloudvirt: set_maintenance: Remove real aggregates [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991022 (owner: Majavah)
[14:19:45] (CR) CI reject: [V: -1] openstack: cloudvirt: don't restore maintenance aggregate [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991020 (owner: Majavah)
[14:19:51] (CR) CI reject: [V: -1] openstack: cloudvirt: set_maintenance: Abort if already in maintenance [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991021 (owner: Majavah)
[14:19:56] (CR) CI reject: [V: -1] openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[14:22:55] (PS3) Majavah: openstack: cloudvirt: safe_reboot: Downtime during reboot [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490)
[14:22:57] (PS2) Majavah: openstack: cloudvirt: don't restore maintenance aggregate [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991020
[14:22:59] (PS2) Majavah: openstack: cloudvirt: set_maintenance: Abort if already in maintenance [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991021
[14:23:01] (PS2) Majavah: openstack: cloudvirt: set_maintenance: Remove real aggregates [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991022
[14:23:08] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot for cloudvirt1060.eqiad.wmnet
[14:23:11] !log taavi@runko admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) for cloudvirt1060.eqiad.wmnet
[14:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:23:54] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.safe_reboot for cloudvirt1060.eqiad.wmnet
[14:23:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:24:53] !log taavi@runko admin END (FAIL) - Cookbook wmcs.openstack.cloudvirt.safe_reboot (exit_code=99) for cloudvirt1060.eqiad.wmnet
[14:24:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:25:24] !log taavi@runko admin START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance
[14:25:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:25:37] !log taavi@runko admin END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0)
[14:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:25:49] (PS3) Majavah: openstack: cloudvirt: set_maintenance: Abort if already in maintenance [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991021
[14:25:51] (PS3) Majavah: openstack: cloudvirt: set_maintenance: Remove real aggregates [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991022
[14:26:17] (CR) Majavah: "This soft-depends on https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/991017/." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[14:35:18] (CR) FNegri: [C: +1] "I don't remember the list of things that are currently depending on the proxy host when running cookbooks from a laptop." [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[15:14:57] (CR) Volans: openstack: cloudvirt: safe_reboot: Downtime during reboot (2 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[15:16:05] cloud-services-team, Infrastructure-Foundations, netops, Patch-For-Review: Remove cloud-support1-c-eqiad VLAN - https://phabricator.wikimedia.org/T355115 (cmooney) >>! In T355115#9462157, @ayounsi wrote: > For the record, they will need to be removed from `puppet:modules/network/data/data.yaml` a...
[15:18:12] (CR) Majavah: proxy: proxy via cloudcumin1001.eqiad.wmnet (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[15:18:18] (CR) Majavah: [C: +2] proxy: proxy via cloudcumin1001.eqiad.wmnet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[15:20:07] (CR) Majavah: openstack: cloudvirt: safe_reboot: Downtime during reboot (2 comments) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[15:23:45] (Merged) jenkins-bot: proxy: proxy via cloudcumin1001.eqiad.wmnet [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[15:31:08] (CR) FNegri: [C: +1] proxy: proxy via cloudcumin1001.eqiad.wmnet (1 comment) [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/990980 (owner: Majavah)
[15:38:52] Grid-Engine-to-K8s-Migration: Migrate request from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320003 (komla) @FNDE keep us posted on how this goes
[15:41:32] (CR) Volans: "replies inline" [cloud/wmcs-cookbooks] - https://gerrit.wikimedia.org/r/991016 (https://phabricator.wikimedia.org/T347490) (owner: Majavah)
[15:44:41] Toolforge, cloud-services-team: There are some files that I cannot view, delete, or do anything to - https://phabricator.wikimedia.org/T355022 (fnegri)
[15:48:27] Grid-Engine-to-K8s-Migration: Migrate copyclear from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319647 (komla) >>! In T319647#9462296, @Hannolans wrote: > Hi sorry I totally missed this issue. multichill made me aware. Its not so clear for me what I should do. I would l...
[15:54:53] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro) I have added support for recursive dependency checks to the apt buildpack, it's a bi...
[15:55:49] Toolforge (Toolforge iteration 03), Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (dcaro) Open→Resolved
[15:57:51] Grid-Engine-to-K8s-Migration: Migrate wd-shex-infer from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320140 (dcaro) >>! In T320140#9420202, @LucasWerkmeister wrote: > T353698 solved building the image (thanks!); now I’m stuck on T353847. Just deployed a fix for that, it...
[16:09:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:14:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable
[16:29:04] Toolforge (Software install/update): Install dotnet-runtime-3.0 on Toolforge hosts - https://phabricator.wikimedia.org/T238180 (fnegri) Open→Resolved a:fnegri Since this task was opened, we have now launched the new Toolforge Build Service that lets you use more up-to-date .NET versions. https:/...
[16:33:43] Cloud-VPS, Toolforge (Toolforge iteration 03), cloud-services-team, Patch-For-Review: Ensure Toolforge and Cloud VPS comply with Google's new email sender guidelines - https://phabricator.wikimedia.org/T354112 (taavi)
[16:33:57] Toolforge, cloud-services-team: There are some files that I cannot view, delete, or do anything to - https://phabricator.wikimedia.org/T355022 (fnegri) Open→In progress p:Triage→Medium
[16:34:45] Tool-spacemedia, Toolforge, cloud-services-team: NFS broken - Cannot build nor run my tool anymore - https://phabricator.wikimedia.org/T354581 (fnegri) p:Triage→Medium
[16:35:23] Toolforge: [docs] Update Toolforge component README's - https://phabricator.wikimedia.org/T352964 (fnegri) p:Triage→Medium
[16:36:40] Toolforge, Documentation: Find and fix inaccuracies in https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Django_OAuth_tool - https://phabricator.wikimedia.org/T245683 (fnegri)
[16:37:25] Toolforge, Fiwiki-Wikidata-Commons, Documentation: Django socialauth OAUTH login fails with mediawiki backend - https://phabricator.wikimedia.org/T353593 (fnegri) Open→Invalid Closing as Invalid as there is no action needed.
[16:37:37] Toolforge, cloud-services-team: php 8.2 crashes when using XMLReader - https://phabricator.wikimedia.org/T352886 (fnegri) p:Triage→Medium
[16:46:00] (OpenstackAPIResponse) firing: Openstack API average response time is too high. - https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse
[16:49:30] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T354898 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/paws/pull/365
[16:49:43] PAWS: update helm chart and jupyterhub - https://phabricator.wikimedia.org/T354898 (rook) Open→Resolved
[16:49:43] vivian-rook closed https://github.com/toolforge/paws/pull/365
[16:50:36] Toolforge, cloud-services-team: php 8.2 crashes when using XMLReader - https://phabricator.wikimedia.org/T352886 (taavi) I can reproduce this on Toolforge, but not on my laptop which has PHP `8.2.12` from Debian. Updating the container seems like a reasonable first thing to try.
[16:57:47] Grid-Engine-to-K8s-Migration: Migrate betacommand-dev from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319587 (dcaro) >>! In T319587#8300155, @Betacommand wrote: > I tried moving to K8's when they were first introduced. It went about as well as a cat in a room full of ro...
[17:02:47] Grid-Engine-to-K8s-Migration: Migrate blogconverter from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319597 (dcaro) I reached out to the user's volunteer account: https://www.mediawiki.org/wiki/User_talk:HaeB#Migrating_blogconverter_tool_to_k8s
[17:09:18] Grid-Engine-to-K8s-Migration: Migrate chie-bot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319625 (dcaro) >>! In T319625#8327827, @Leloiandudu wrote: > @bd808 hi Bryan. I understand that you're worried about scope creep here. Sorry about that. My initial thought wa...
[17:12:32] 10Grid-Engine-to-K8s-Migration: Migrate dvorapabot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319709 (10dcaro) If you use python, you can now build your own image by using a requirements.txt, that will generate the venv for you automatically. You'll have to have your c... [18:29:59] 10PAWS: Move prometheus inside of the cluster - https://phabricator.wikimedia.org/T355179 (10rook) [19:30:22] (HAProxyBackendUnavailable) firing: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:35:22] (HAProxyBackendUnavailable) resolved: HAProxy service neutron-api_backend backend cloudcontrol1006.private.eqiad.wikimedia.cloud is down - https://wikitech.wikimedia.org/wiki/HAProxy - TODO - https://alerts.wikimedia.org/?q=alertname%3DHAProxyBackendUnavailable [19:47:59] 10Grid-Engine-to-K8s-Migration, 10Wiki-Project-Med: Migrate mdwiki from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319887 (10Harej) [19:49:15] 10Toolforge (Toolforge iteration 03), 10Toolforge Build Service: apt buildpack (Aptfile support): not installing dependencies of packages already present on the build image - https://phabricator.wikimedia.org/T353847 (10LucasWerkmeister) Thanks, but this doesn’t resolve all of my problems, just the one you ret... [19:53:10] 10Cloud-VPS (Project-requests), 10cloud-services-team (Kanban), 10Security-Team, 10Wiki-Project-Med: Request creation of OurWorldinData VPS project - https://phabricator.wikimedia.org/T301044 (10Harej) [20:46:01] (OpenstackAPIResponse) firing: Openstack API average response time is too high. 
- https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse - https://grafana.wikimedia.org/d/UUmLqqX4k - https://alerts.wikimedia.org/?q=alertname%3DOpenstackAPIResponse [21:03:56] (ProbeDown) firing: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [21:08:56] (ProbeDown) resolved: Service tools-k8s-haproxy-4:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-4:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [21:23:56] (ProbeDown) firing: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [21:28:56] (ProbeDown) resolved: Service tools-k8s-haproxy-3:30000 has failed probes (http_admin_toolforge_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-k8s-haproxy-3:30000 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [21:58:51] (ProbeDown) firing: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown [22:03:51] (ProbeDown) 
resolved: Service tools-static-14:80 has failed probes (http_tools_static_wmflabs_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#tools-static-14:80 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://prometheus-alerts.wmcloud.org/?q=alertname%3DProbeDown