[06:52:49] 10netbox, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10021526 (10ayounsi) [06:52:50] 10netbox, 06Infrastructure-Foundations: Change icinga link to alerts.w.o in netbox device page - https://phabricator.wikimedia.org/T371079#10021527 (10ayounsi) [07:45:39] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:49:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:00:18] o/ [08:12:34] 10netops, 06Infrastructure-Foundations, 06SRE, 13Patch-For-Review: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513#10021674 (10ayounsi) [08:12:36] 10netbox, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10021675 (10ayounsi) [08:49:53] filed a quick python/ldap change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1057814 [08:52:42] 07Puppet: Single member group breaks cross validation script - https://phabricator.wikimedia.org/T371221 (10SLyngshede-WMF) 03NEW [08:55:50] of course CI fails, fixing :) [09:28:48] new netbox out: https://github.com/netbox-community/netbox/releases/tag/v4.0.8 nothing special for us [09:44:03] ack! [09:44:26] speaking about netbox - I filed https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1057826 for the mac address, still very WIP but I am wondering if there is a way to test it [09:45:12] elukey: netbox-next [09:45:43] note that none of this can be merged into prod until before the netbox 4 upgrade [09:45:50] deployed I mean [09:46:01] sure sure [09:46:42] so basically I can cherry pick the change directly on netbox-dev and run it (maybe with dryrun or similar, or for a test node) [09:46:46] see what explodes etc.. [09:46:54] yeah [09:47:00] super thanks [09:55:34] 07Puppet, 13Patch-For-Review: Single member group breaks cross validation script - https://phabricator.wikimedia.org/T371221#10022153 (10SLyngshede-WMF) 05Open→03Resolved [10:17:33] 10netbox, 06Infrastructure-Foundations: Change icinga link to alerts.w.o in netbox device page - https://phabricator.wikimedia.org/T371079#10022270 (10Volans) My 2 cents would be to keep both for now as in icinga you can also see what are the icinga checks configured for that device. [12:11:24] for netbox upgrade, reminder that the deploy host was switched today, so make sure everything related is in sync ;) [13:23:01] good to know, thx [13:26:46] volans: XioNoX: hi, I have sent a couple patches to change the `git::clone` for homer public and private repos, and they should not affect anything hopefully: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056981 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056985/ ;) [13:27:55] nice! I'll let someone who knows this better than me review it [13:28:31] I am not sure who owns that, I assumed Homer would be XioNoX and since the repos are cloned on cumin hosts that would naturally involves volvans [13:28:31] :D [13:44:34] 10netops, 06Infrastructure-Foundations, 06SRE: Netbox automation to move selected hosts from ASW to LSW - https://phabricator.wikimedia.org/T370846#10022944 (10cmooney) [14:28:55] 10netops, 06Infrastructure-Foundations, 06SRE, 06Traffic: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) - https://phabricator.wikimedia.org/T370068#10023237 (10cmooney) p:05Triage→03Medium [14:29:47] 10netbox, 06Infrastructure-Foundations: Netbox logs filling up disk, netbox1002 - https://phabricator.wikimedia.org/T371036#10023244 (10joanna_borun) p:05Triage→03Low [14:29:49] 10netbox, 06Infrastructure-Foundations: Netbox logs filling up disk, netbox1002 - https://phabricator.wikimedia.org/T371036#10023245 (10ayounsi) [14:29:54] 10netbox, 06Infrastructure-Foundations, 13Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10023246 (10ayounsi) [14:30:28] 10netops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-codfw, and 3 others: lvs2013: move uplink to lsw1-c2-codfw and connect to per-rack vlan - https://phabricator.wikimedia.org/T370927#10023247 (10cmooney) p:05Triage→03Medium [14:38:02] 10SRE-tools, 06Infrastructure-Foundations, 10Spicerack, 13Patch-For-Review: gNMI module in Spicerack - https://phabricator.wikimedia.org/T344325#10023279 (10ayounsi) 05Open→03Stalled p:05High→03Low [14:39:31] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#10023289 (10Volans) a:05Volans→03None [14:40:25] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#10023290 (10elukey) [14:41:49] 10SRE-tools, 06DC-Ops, 06Infrastructure-Foundations: Cookbook sre.hardware.upgrade-firmware fails to get firmwares from Dell's website - https://phabricator.wikimedia.org/T357756#10023295 (10joanna_borun) p:05High→03Medium [14:56:00] 10netops, 06DC-Ops, 06Infrastructure-Foundations, 10ops-codfw, 06SRE: Request additional mgmt IP range for frack servers - https://phabricator.wikimedia.org/T370164#10023429 (10cmooney) 05Open→03Resolved Gonna close this one, I see hosts have been assigned to the new range and are reachable ` cmo... [17:23:24] XioNoX: this seems interesting https://github.com/netbox-community/netbox/discussions/12610 [17:26:15] (will check more tomorrow) [17:26:19] * elukey afk [17:59:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:00:39] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:15:39] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:19:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:49:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [18:50:39] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:04:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:09:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:54:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:59:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:44:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:45:39] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:00:39] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:04:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:25:16] 10Mail, 06Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#10025490 (10jhathaway) Sorry for the delay in looking at this, from our logs it looks like th... [21:30:13] 10Mail, 06Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#10025507 (10jhathaway) Opened a ticket, #103465 [21:34:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:39:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:50:39] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [21:54:22] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:24:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:14:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:15:39] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:29:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:34:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed