[07:36:30] <slyngs>	 I'm rolling out a quick patch for idp-test to fix some styling
[07:39:30] <XioNoX>	 slyngs: easy +1 if you have time https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052851 <3
[07:39:48] <slyngs>	 I like easy
[07:40:46] <slyngs>	 Done
[07:40:49] <XioNoX>	 thx
[07:42:28] <wikibugs>	 netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9964127 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by ayounsi@cumin1002 for hosts: `netbox-dev2002.codfw.wmnet` - netbox-dev2002.codfw.wmnet (**PASS**)   - Downt...
[07:49:14] <wikibugs>	 netops, Infrastructure-Foundations, SRE, Traffic: Do we need ping offload servers at all POPs? - https://phabricator.wikimedia.org/T345809#9964152 (ayounsi) Open→Declined Closing this task as afaik we haven't seen any issue in esams, and the proper path forward is tracked in {T367973}...
[08:01:15] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410#9964208 (elukey) >>! In T354410#9961563, @Volans wrote: > @elukey do you know how much of a...
[08:34:43] <elukey>	 hey folks, I know that https://wikitech.wikimedia.org/wiki/Ganeti#Renumber_(aka_change_network)_a_VM is discouraged, but I am wondering if we could try it for https://phabricator.wikimedia.org/T344230
[08:35:00] <elukey>	 to avoid re-creating new VMs etc.. that may be a little more painful
[08:37:18] <XioNoX>	 elukey: what's the goal? I don't understand the task
[08:38:36] <XioNoX>	 nevermind, got it
[08:38:58] <XioNoX>	 why was it created that way? And what's the issue with creating a new VM in the proper location ?
[08:56:36] <elukey>	 not sure, maybe at the beginning it was more a test than something fully prod-ready
[08:56:52] <elukey>	 creating a new VM is not an issue, only a lot more work
[08:57:08] <elukey>	 there is the etcd ensemble to respect, the k8s mgmt control plane, etc..
[08:57:29] <wikibugs>	 SRE-tools, Infrastructure-Foundations, Spicerack: [spicerack] python-kafka does not support python 3.12, there's a fix but there has not been any releases since 2020 - https://phabricator.wikimedia.org/T354410#9964436 (Volans) Ok, sounds good to me. Thanks for looking into this and yes there is no re...
[09:00:49] <elukey>	 I have never swapped an etcd cluster, maybe there is the possibility of expanding the cluster 3->6 and then cutting off the oldest VMs
[09:01:03] <volans>	 elukey: the cumin broken aliases email boils down to O:etcd::v3::kubernetes returning no hosts, by any chance do you know the status of that one?
[09:01:47] <elukey>	 checking
[09:04:07] <elukey>	 volans: ah maybe I know, wikikube now co-locates etcd and control plane daemons on wikikube-ctrl*
[09:04:45] <elukey>	 so I think that the kubeetcd nodes are not used anymore
[09:05:00] <elukey>	 so probably the cumin alias is obsoled
[09:05:04] <volans>	 we can also just ask serviceops to fix it
[09:05:06] <elukey>	 *obsoleted
[09:05:28] <elukey>	 yep
[09:08:00] <wikibugs>	 Packaging, Infrastructure-Foundations: Package ipxe-qemu - https://phabricator.wikimedia.org/T369136#9964495 (aborrero) hey @ayounsi, I have reviewed the .deb packages that you built. They LGTM. I even installed them on my laptop :-P So from my point of view, you have a +1 to put them on reprepro.  Pleas...
[09:34:50] <volans>	 elukey: {done} (see -serviceops :D )
[09:50:11] <elukey>	 Today after a bit of scavenging I created this revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052934
[09:50:32] <elukey>	 added some of you in Cc, I think we can safely revert but lemme know
[11:55:24] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965147 (Marostegui) databases are ready
[13:49:05] <elukey>	 XioNoX: it seems ok to me judging from the metrics, maybe a little bit more disk space could give us some room for extra logs of netbox 4 that we didn't anticipate, but I am not strongly for it
[13:49:39] <XioNoX>	 elukey: sounds good, like 20G?
[13:50:10] <elukey>	 yep I'd say it is good, even 25G, we are not asking a lot :)
[13:50:29] <elukey>	 memory/cpu is always something that we can change, the disk is a problem
[13:50:38] <XioNoX>	 rgr
[13:50:46] <XioNoX>	 elukey: for the DB too?
[13:51:07] <XioNoX>	 https://grafana.wikimedia.org/d/000000377/host-overview?var-server=netboxdb1002&orgId=1&refresh=5m&var-datasource=thanos&var-cluster=misc
[13:51:55] <elukey>	 yep I think so
[13:52:13] <slyngs>	 Maybe a bit more disk for the database?
[13:52:27] <XioNoX>	 25 too so they're similar then
[13:52:28] <XioNoX>	 thx!
[13:52:28] <slyngs>	 Otherwise I think memory and CPU look fine
[13:55:21] <XioNoX>	 in terms of git, I want to move all the changes that are on the dev branch into main, what's the cleanest way?
[13:55:48] <XioNoX>	 squash them all into one commit ? regular rebase ?
[13:58:15] <XioNoX>	 I'm going with a regular rebase, wish me luck
[14:12:34] <elukey>	 rebase is nice to preserve history, good luck :)
[14:26:42] <wikibugs>	 netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9965587 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
[14:33:47] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965612 (hnowlan) kubernetes* and mw*  are ready
[14:54:07] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6a298ae5-e736-4051-8220-9ec4f352950a) set...
[14:59:29] <XioNoX>	 slyngs: do you remember how we fixed the "'Group' instance expected, got <Group: ops>" error ? it's back on netbox next after the upgrade
[15:00:26] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965719 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=39fcbcd0-8c16-4208-ac06-f4b442e55a54) set...
[15:01:48] <XioNoX>	 ah, the pipeline module needed updating, so it's not picking up the new wheels from the deploy server
[15:03:19] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965737 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2a5cb43e-793c-4103-9499-369354315479) set...
[15:14:12] <wikibugs>	 netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9965772 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
[15:23:29] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9965800 (cmooney) Upgrade complete, all looks good network side at first glance, all online hosts are pingable again.
[15:30:16] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965849 (cmooney) Switch upgrade completed without issue.  All connected hosts are back online and responding to p...
[15:30:30] <XioNoX>	 who manages https://docker-registry.wikimedia.org/python3-build-bookworm ? seems like there is a regression in the latest version
[15:30:40] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965851 (MatthewVernon) Swift looks OK, thanks.
[15:34:02] <wikibugs>	 netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9965869 (Marostegui) Repooling databases
[15:36:36] <elukey>	 XioNoX: it should be a package in production-images, we (SRE) manage it
[15:36:41] <elukey>	 what is the regression?
[15:37:01] <elukey>	 sorry, an image not a package
[15:38:28] <XioNoX>	 `netbox-deploy$ make freeze` works only if I pin the version to `python3-build-bookworm:0.1.0-20240623` in `Dockerfile.build`
[15:39:55] <elukey>	 --verbose :)
[15:40:09] <XioNoX>	 similarly in the Makefile with "latest" vs. "0.1.0-20240623"
[15:40:20] <XioNoX>	 it finishes cleanly but doesn't generate the required file
[15:41:07] <elukey>	 does it work with the 20240630?
[15:41:24] <elukey>	 (just to narrow down)
[15:41:29] <XioNoX>	 haven't tested with that one yet
[15:41:37] <XioNoX>	 I want to unbreak netbox-next first
[15:41:59] <elukey>	 anyway, the images with -date are weekly rebuilds that we do, basically to refresh the OS + packages installed
[15:42:44] <elukey>	 I have to go in a few but if you want to open a task with how to repro I'll work on it tomorrow morning
[15:43:02] <XioNoX>	 yeah, I'll test more and report back
[15:43:03] <XioNoX>	 thx!
[15:43:07] <elukey>	 np!
[15:44:28] <wikibugs>	 netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9965916 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
[16:53:02] <slyngs>	 XioNoX: We updated the ApereoCAS pipeline thingy to 0.0.3
[16:53:35] <slyngs>	 That uses the new Netbox groups, rather than the Django default one
[17:09:17] <jinxer-wm>	 FIRING: SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:05:35] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: generate_vrts_aliases.service on mx-in1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:31:10] <XioNoX>	 Netbox 4.0.7 released, right on time
[19:50:00] <jhathaway>	 :)