[01:55:09] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[02:32:18] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:55:09] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[06:32:18] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:02:02] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:07:02] (SystemdUnitFailed) firing: (2) generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:20:00] (ProbeDown) firing: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/IDM/Runbook - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:30:00] (ProbeDown) resolved: (2) Service idm2001:443 has failed probes (http_idm_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/IDM/Runbook - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:39:53] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[08:44:53] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[08:49:12] moritzm: is the above alert related to your nftables work ? (eg. https://puppetboard.wikimedia.org/report/testvm2005.codfw.wmnet/f900a3eb5915ea5c3c9872a84b408c959e972166 )
[08:49:52] no, I think that's Cathal experimenting with QoS tagging
[09:15:43] noted, thx! (cc topranks) :)
[09:16:54] oh shit sorry let me have a look
[09:20:05] no rush, at first I thought it was my VM as I was playing with testvm2006 :)
[09:21:51] netops, Ganeti, Infrastructure-Foundations, SRE, Patch-For-Review: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152#9683434 (ayounsi) Open→Resolved We can consider this task completed with success. Next step is to discuss the next steps and ope...
[09:21:54] slyngs, volans, I lost a bit track of Netbox and ApereoSocialPipeline. What's left to have it as default in prod ?
[09:22:10] 301 Simon
[09:22:56] We need to switch OIDC authentication as well, so either make a new OIDC client in CAS or change the old one. Probably safest to make a new one
[09:27:25] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683460 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1037.eqiad.wmnet...
[09:30:42] slyngs: is that something we can prepare ahead of time or it needs to be done during any netbox config change ?
[09:30:59] We can do that ahead of time
[09:38:41] slyngs: cool, can I help ? new quarter, new hope at upgrading netbox :)
[09:39:34] It's just a minor thing, shouldn't take more than a few minutes... Famous last words :-)
[09:40:15] New client in CAS and then set the Netbox config to use OIDC.
[09:40:58] and deploy https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/980824 in between
[09:44:00] Yes, the django-social-auth plugin might also need to be at some minimum version, but I think we're way past the minimum version requirement
[09:44:58] Hmm, no we are not
[09:46:17] The new Netbox wants social-auth-core[openidconnect]==4.5.3, which is fine
[10:17:34] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683638 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1037.eqiad.wmnet with...
[10:25:21] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683651 (aborrero)
[10:34:53] (PuppetConstantChange) firing: (2) Puppet performing a change on every puppet run on testvm2005:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[10:35:25] ffs
[10:59:06] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683722 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1038.eqiad.wmnet...
[10:59:10] hey, I had a couple of puppet questions for tomorrow's office hours; I've stuffed them into the gdoc; is that the best/correct way to provide a little advance notice?
[11:35:21] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683890 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1038.eqiad.wmnet with...
[11:35:33] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683893 (aborrero)
[11:50:57] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683960 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1039.eqiad.wmnet...
[11:55:54] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9683981 (aborrero)
[12:07:02] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:34:50] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9684132 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1039.eqiad.wmnet with...
[12:55:46] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9684225 (MoritzMuehlenhoff)
[13:15:20] https://github.com/netbox-community/netbox/releases/tag/v4.0-beta1 :)
[13:53:42] lots of cool stuff there :)
[13:54:11] including perf improvements
[13:54:38] <3 netbox
[13:54:49] yeah rest api improvements caught my eye
[13:55:08] "Support for Python 3.8 and 3.9 has been removed." the current Netbox hosts are Bullseye, so running 3.9
[13:55:46] we could upgrade to bookworm while upgrading to 3.x
[13:55:49] we might be better with building a new netbox-dev VM
[13:56:09] sure prod too if that's easier
[13:56:15] yeah
[13:56:59] we might have to add a lot of feature flags to run them in parallel
[13:57:05] but yeah
[13:58:28] netbox-future? netbox-really-the-next ? netbox-dev? :)
[13:58:40] haha
[13:58:44] netbox-future has a ring to it
[13:58:47] return-to-netbox
[13:59:07] "just when you thought it was safe..."
[13:59:19] or add a o for each new version, netbox, netboox, netbooox, netbooooox
[13:59:20] back-to-netbox actually, sorry bad translation
[13:59:45] versioning done right
[14:00:06] hahahaha
[14:00:55] lots of breaking changes too
[14:01:08] ofc
[14:02:23] volans we don't oversubscribe vCPUs in Ganeti, do we?
[14:03:51] 301 Location: moritzm
[14:04:00] but I don't think so
[14:04:05] ;P
[14:04:29] no, we don't
[14:04:55] Cool, I'm looking at making a small opensearch cluster for stuff that doesn't belong on the main search cluster
[14:05:47] we have some indices that were assigned to our cluster back when it was the only ES/OS cluster
[14:12:14] netbox, Infrastructure-Foundations, Patch-For-Review: Netbox: replace getstats.GetDeviceStats with ntc-netbox-plugin-metrics-ext - https://phabricator.wikimedia.org/T311052#9684539 (ayounsi) for information https://github.com/Eskemm-Numerique/ntc-netbox-plugin-metrics-ext/pull/1 got merged, so `ntc-n...
[14:17:02] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9684564 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1040.eqiad.wmnet...
[14:17:56] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:22:56] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[14:24:55] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9684602 (aborrero)
[14:35:09] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2006:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[14:38:00] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#9684676 (ayounsi)
[15:01:47] netops, cloud-services-team, Infrastructure-Foundations, SRE, Patch-For-Review: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9684744 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1040.eqiad.wmnet with...
[16:07:18] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:35:09] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2006:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange
[20:07:18] (SystemdUnitFailed) firing: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:59:35] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review: Remove elasticsearch-curator dependency from Spicerack/Elastic cookbooks - https://phabricator.wikimedia.org/T361647#9686403 (bking)
[22:35:09] (PuppetConstantChange) firing: Puppet performing a change on every puppet run on testvm2006:9100 - https://puppetboard.wikimedia.org/nodes?status=changed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetConstantChange