[06:01:01] netops, Infrastructure-Foundations: db2137 doesn't get an IP via PXE boot - https://phabricator.wikimedia.org/T357951#9556745 (Marostegui)
[06:01:35] netops, Infrastructure-Foundations: db2137 doesn't get an IP via PXE boot - https://phabricator.wikimedia.org/T357951#9556756 (Marostegui) The host was being reimaged into bookworm. Please feel free to start the reimage yourself anytime.
[06:23:06] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870#9556783 (Marostegui)
[06:25:09] netops, DBA, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate servers in codfw rack B3 from asw-b3-codfw to lsw1-b3-codfw - https://phabricator.wikimedia.org/T355870#9556787 (Marostegui) es2021 is no longer a master and it just needs normal depooling cc @ABran-WMF
[06:32:59] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866#9556801 (Marostegui) @cmooney is there anything pending here or can this be closed?
[07:37:18] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867#9557370 (ops-monitoring-bot) Draining ganeti2028.codfw.wmnet of running VMs
[08:15:45] Mail, Infrastructure-Foundations, SRE, Wikimedia-Mailing-lists: In Mailman3 if a list has no owners, mail goes to root@ - https://phabricator.wikimedia.org/T281753#9557728 (JJMC89)
[09:38:49] (PuppetZeroResources) firing: Puppet has failed generate resources on cumin1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[12:01:50] netops, Infrastructure-Foundations, SRE, cloud-services-team, and 2 others: Move WMCS servers to 1 single NIC - https://phabricator.wikimedia.org/T319184#9558616 (aborrero)
[14:03:35] (SystemdUnitFailed) firing: netbox_ganeti_codfw02_sync.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:08:29] ^ should be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/1005090
[14:18:35] (SystemdUnitFailed) resolved: netbox_ganeti_codfw02_sync.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:48:55] (SystemdUnitFailed) firing: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:53:19] netops, Infrastructure-Foundations, SRE: BGP peering from LSW to K8s hosts using loopback IP not IRB - https://phabricator.wikimedia.org/T357619#9559391 (cmooney)
[14:54:10] netops, Infrastructure-Foundations, SRE: Update K8S BGP groups eqiad row e-f - https://phabricator.wikimedia.org/T357924#9559389 (cmooney) Open→Resolved All done, used statics to avoid any disruption to forwarding.
[15:00:49] (PuppetDisabled) firing: Puppet disabled on ganeti2033:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=ganeti&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[15:18:36] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:31:03] netops, Infrastructure-Foundations, SRE: FPC1 Failure on cr1-esams - https://phabricator.wikimedia.org/T351304#9559649 (RobH)
[15:33:32] Can I get some aptrepo help, please? I'm trying to import a new repo - sudo -E reprepro --verbose --component thirdparty/ceph-reef update bookworm-wikimedia
[15:33:38] running on apt1001 in /srv/wikimedia
[15:33:50] It errors out with: Error: unknown key 'D27D666CD88E42B4'!
[15:34:17] which is a key listed in modules/aptrepo/files/updates, but it's not the ceph key (which is E84AC2C0460F3994)
[15:36:54] There is modules/aptrepo/files/updates-keys/D27D666CD88E42B4_elastic.gpg
[15:41:34] Hello everyone, I'm trying to reimage the alert2001 host to Bookworm but the cookbook fails unexpectedly with these errors: https://phabricator.wikimedia.org/P57360
[15:42:38] A part that caught my attention from that output is that it seems to look up the value of profile::puppet::agent::force_puppet7; however, when checking our Puppet repo I was unable to find any reference to alert2001 using that variable.
[15:42:50] Do you know what I can do to proceed with the reimage? Thanks in advance. :)
[15:45:00] Hm, should the rune at https://wikitech.wikimedia.org/wiki/Reprepro#Adding_a_new_external_repository maybe be sudo -i reprepro rather than -E ?
[15:47:48] ...because the docs generally say to use sudo -i reprepro, and -E would preserve the existing env instead of using the reprepro environment?
[15:48:08] But maybe I misunderstand, so I don't feel like I should just try that without a sanity check here first...
[15:48:36] Emperor: the 'If signing fails' section below seems to suggest you need to set some variables manually if using -E instead of -i
[15:49:22] denisse: what if you set a per-host override for alert2001 and set profile::puppet::agent::force_puppet7: true?
[15:49:38] taavi: yeah, but maybe the doc is just outdated and I should be using -i?
[15:50:04] sukhe: We don't want to reimage with Puppet 7 yet, we plan on doing that after the upgrade.
[15:51:48] netops, Infrastructure-Foundations, SRE: BGP peering from LSW to K8s hosts using loopback IP not IRB - https://phabricator.wikimedia.org/T357619#9559790 (cmooney) Open→Resolved Config pushed out across the estate now, multihop config only added on CRs.
[15:52:04] ok, so setting it to false might help. I do wonder why that needs to be done though at all
[15:52:06] using -i worked
[16:02:28] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544#9559899 (cmooney)
[16:02:46] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867#9559902 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=343ed6db-68dd-4330-8851-9631da7da8d5...
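Note on the reprepro exchange (15:33:32 to 15:52:06): the import failed under sudo -E and succeeded under sudo -i. Per the sudo manual, -E asks sudo to preserve the caller's environment, while -i runs the command through the target user's login shell, so that user's login environment is loaded. The sketch below only prints the working invocation so it can be reviewed before being run; build_reprepro_update is a hypothetical helper name, not part of reprepro or the aptrepo Puppet module, and the component and distribution values are the ones quoted in the log.

```shell
#!/bin/sh
# Dry-run sketch: prints the command instead of executing it, so it is safe
# to run on any machine. build_reprepro_update is a hypothetical helper.
build_reprepro_update() {
    component="$1"      # e.g. thirdparty/ceph-reef (value taken from the log)
    distribution="$2"   # e.g. bookworm-wikimedia (value taken from the log)
    # Use -i (login shell of the target user, root by default) rather than
    # -E (keep the caller's environment), matching what worked in the log.
    printf 'sudo -i reprepro --verbose --component %s update %s\n' \
        "$component" "$distribution"
}

build_reprepro_update thirdparty/ceph-reef bookworm-wikimedia
```

Whether a given host's root login environment actually carries the reprepro settings is an assumption based on the discussion above, so check the printed command against the current Reprepro page on Wikitech before running it.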
[16:03:10] netops, DBA, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw - https://phabricator.wikimedia.org/T355866#9559896 (cmooney) Open→Resolved a:cmooney >>! In T355866#9556801, @Marostegui wrote: > @cmooney is there anythin...
[16:06:27] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867#9559917 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=47f3a57d-6476-4782-ba82-9c2dc99042c9...
[16:17:08] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867#9560025 (cmooney) All links moved successfully and all hosts responding to ping as before.
[16:20:00] netops, Infrastructure-Foundations, SRE, SRE-swift-storage, ops-codfw: Migrate servers in codfw rack A7 from asw-a7-codfw to lsw1-a7-codfw - https://phabricator.wikimedia.org/T355867#9560061 (MatthewVernon) ms and thanos swift both OK post-move.
[18:30:07] Mail, Infrastructure-Foundations, MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), User-notice: Stop sending change notification email if edit is done by a bot - https://phabricator.wikimedia.org/T356984#9560798 (Primefac) Good god. First we can't fix T250856, and now we're making the problem //worse//...
[19:00:49] (PuppetDisabled) firing: Puppet disabled on ganeti2033:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=ganeti&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[22:09:02] CAS-SSO, Infrastructure-Foundations, SRE: Enable webauthn in CAS to replace U2F - https://phabricator.wikimedia.org/T311236#9561647 (Scott_French) FYI, I've added an outdated block to the U2F-based enrollment procedure in https://wikitech.wikimedia.org/wiki/CAS-SSO (as it no longer works). Just menti...
[23:01:04] (PuppetDisabled) firing: Puppet disabled on ganeti2033:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=ganeti&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled