[01:39:06] Puppet, Cloud-VPS, Infrastructure-Foundations, cloud-services-team: git-sync-upstream failing - https://phabricator.wikimedia.org/T336263 (Andrew) Open→Resolved
[05:40:42] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:55:42] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:17:38] netops, DC-Ops, Infrastructure-Foundations, SRE: Access port speed <= 100Mbps False positives - https://phabricator.wikimedia.org/T336511 (ayounsi)
[06:18:24] netops, DC-Ops, Infrastructure-Foundations, SRE: Access port speed <= 100Mbps False positives - https://phabricator.wikimedia.org/T336511 (ayounsi) I muted the alert for now until we can get to the bottom of it as it was spamming too much.
[06:25:14] netbox, DC-Ops, Infrastructure-Foundations: Netbox device's platform field inconsistency - https://phabricator.wikimedia.org/T336623 (ayounsi) Do we use it for anything?
[07:18:55] netbox, DC-Ops, Infrastructure-Foundations: Netbox device's platform field inconsistency - https://phabricator.wikimedia.org/T336623 (Volans) We do have a bunch of platforms in Netbox, from a quick grep around the `tools/ganeti-netbox-sync.py` script in the netbox-extra repository does set it to `Lin...
[07:23:16] netbox, DC-Ops, Infrastructure-Foundations: Netbox device's platform field inconsistency - https://phabricator.wikimedia.org/T336623 (ayounsi) Sounds good if neither humans nor automation uses it, let's remove it and add a validator to make sure we don't accidentally set it. Afaik we use the manufac...
[08:11:30] SRE-tools, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans)
[09:40:36] netops, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MatthewVernon)
[12:35:19] netops, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (MatthewVernon)
[12:41:49] netops, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (Jelto)
[13:27:15] slyngs: I'm about to merge my dhcp patches, is the testing of the physical host still on?
[13:27:28] Absolutely :-)
[13:27:30] lmk if you want to run it or should I
[13:27:54] If you have time please go ahead.
[13:28:07] where is your code? cumin1001 or 2002?
[13:28:15] cumin1001
[13:28:40] In the "cookbooks" dir in my homedir
[13:28:44] got it
[13:28:45] thx
[13:28:58] FYI I'll be reimaging sretest1001 shortly, anyone using it?
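The 07:23 proposal (remove the Netbox device platform field and add a validator so it cannot be set again by accident) can be illustrated with the plain-Python sketch below. It does not use Netbox's real custom-validation interface; the `Device` dataclass, its fields, `ValidationError` and `validate_device` are hypothetical stand-ins that only show the rule being enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical, simplified stand-in for a Netbox device record."""
    name: str
    platform: Optional[str] = None


class ValidationError(Exception):
    """Raised when a device violates the sketched rule."""


def validate_device(device: Device) -> None:
    # Assumed rule: once the platform field is retired it must stay empty,
    # so any non-empty value (e.g. one written by a sync script) is rejected.
    if device.platform:
        raise ValidationError(
            f"{device.name}: platform must not be set (got {device.platform!r})"
        )


if __name__ == "__main__":
    validate_device(Device(name="example-host"))  # no platform: passes silently
    try:
        validate_device(Device(name="example-vm", platform="example-platform"))
    except ValidationError as exc:
        print(exc)  # example-vm: platform must not be set (got 'example-platform')
```

Rejecting on write, rather than periodically cleaning the field, is what "make sure we don't accidentally set it" suggests; the device and platform names above are made up.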
[13:29:25] If something blows up, feel free to blame me
[13:31:33] :D
[13:49:23] netops, DBA, Data-Platform-SRE, Infrastructure-Foundations, and 9 others: codfw row D switches upgrade - https://phabricator.wikimedia.org/T335042 (ssingh)
[13:53:17] SRE-tools, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans) I've tested a reimage of a physical host and worked fine, we still have a bit of duplication of requests, do you s...
[13:53:50] SRE-tools, netops, Infrastructure-Foundations, SRE, Patch-For-Review: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (Volans)
[13:55:09] slyngs: I had to change one line, commenting on the CR. Apart from that so far it's running, the DHCP worked fine, I'll let you know the reimage
[13:55:14] if it completes fine
[14:22:00] slyngs: reimage completed
[14:22:12] Wuhuuu :-)
[14:22:19] posted the one line fix
[14:22:24] I've made in your home
[14:22:54] I'll update the patch
[14:23:23] thx
[14:30:25] volans LMK if you have time to go over cookbook testing, I'm getting an import error but guessing it's something silly
[14:31:06] inflatador: sure, currently in a meeting so I might reply with some delay
[14:36:50] volans np, paste is here whenever you're finished https://phabricator.wikimedia.org/P48231
[14:39:02] replied to the paste
[14:41:37] volans already tried that...but I got it to work by writing out the whole homedir instead of using `~`. Not sure why...I've done testing this way before and never had to do that
[14:42:23] Thanks for the help though, I'm sure I'll have more questions ;P
[14:42:35] I'll check it out later but IIRC paths should accept the user expansion
[14:48:26] I think it had to do with sudo...probably would work with `sudo -E`
[14:48:54] yup, confirmed
[14:49:02] right, that makes sense
[15:02:59] SRE-tools, netops, Infrastructure-Foundations, SRE: Setup zero touch provisioning (ZTP) for network devices - https://phabricator.wikimedia.org/T336485 (ayounsi) @Volans is it possible to have a full pcap of those `unknown network segment` ?
[16:28:42] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:52:29] topranks, XioNoX: so the new switches in codfw don't have the em0 iface, I guess you added it to the template but not the existing hosts
[16:52:33] should I just add it?
[16:53:05] huh really ??
[16:53:18] https://netbox.wikimedia.org/dcim/devices/4558/
[16:53:23] sorry, you mean in Netbox
[16:53:26] yes
[16:53:35] sorry I didn't specify :D
[16:53:42] np, I'll add it now
[16:53:53] to all the ones that are missing?
[16:55:08] I can do that, was just trying that one via gui though
[16:55:18] and I'm hitting some odd problem, wonder if it's validation related
[16:55:55] it could be
[16:56:22] it should be possible to add them all via GUI using as a selector for example the device type + status
[16:56:43] which error are you getting?
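Returning to the `~` issue from 14:41 to 14:48: a shell expands `~` before running a command, but a program that expands it itself (for example with Python's `expanduser`) reads the `HOME` environment variable, and sudo commonly resets `HOME` to the target user's home unless the environment is preserved (as with `sudo -E`, subject to the local sudoers policy). The sketch below only demonstrates that `HOME` dependence; that the cookbook test setup expands the path this way is an assumption, and `/home/example` is a made-up directory.

```python
import os
import posixpath


def expand_with_home(path: str, home: str) -> str:
    """Expand a leading '~' as POSIX expanduser does when HOME is set.

    HOME is overridden temporarily to mimic what a process sees when it
    runs with a different environment (e.g. under sudo with env_reset).
    """
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        # posixpath.expanduser uses $HOME for a bare '~' prefix.
        return posixpath.expanduser(path)
    finally:
        # Restore the caller's environment exactly as it was.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


if __name__ == "__main__":
    # Hypothetical paths: the user's own environment vs. root's after sudo.
    print(expand_with_home("~/cookbooks", "/home/example"))  # /home/example/cookbooks
    print(expand_with_home("~/cookbooks", "/root"))  # /root/cookbooks
```

Writing out the full home directory, as was done in the log, avoids the question entirely because no expansion is needed.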
[16:56:48] ok so it worked, it would not let me add them unless they were "enabled"
[16:56:54] fair
[16:57:02] but weird
[16:57:06] checking the validator
[16:57:14] it was popping up saying "count_ipaddresses cannot be none"
[16:57:28] and then also saying "name field must be completed" - but I had it in
[16:57:56] I'll try to add the remaining ones via API, with other validation errors it's been easier to see what's happening that way
[16:58:09] some of them in the GUI just flash up as a little popup and disappear
[16:58:20] ok, as you want
[16:58:52] from a quick check the new validator is only checking that on disabled ifaces a bunch of attributes are not set
[16:58:59] just that
[16:59:27] hmm.. maybe a race condition with count_ipaddresses then, I assume it checks there are no IPs
[16:59:30] I'll dig into it
[17:00:17] ack, thx!
[19:43:57] SRE-tools, Infrastructure-Foundations, SRE, Spicerack, Discovery-Search (Current work): Create cookbook to reindex into elasticsearch / cirrus - https://phabricator.wikimedia.org/T219507 (bking) Moving this out of the current work, but this is still a priority for us. Will revisit next quarter.
[19:44:05] SRE-tools, Discovery-Search, Infrastructure-Foundations, SRE, Spicerack: Create cookbook to reindex into elasticsearch / cirrus - https://phabricator.wikimedia.org/T219507 (bking)
[20:28:42] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:33:42] (SystemdUnitFailed) resolved: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
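Per the 16:58 description, the new interface validator only checks that a set of attributes is unset on disabled interfaces, and `count_ipaddresses` was suspected as the source of the "cannot be none" error. A plain-Python sketch of such a rule follows; the `Interface` dataclass, the attribute list and `validate_interface` are hypothetical and not Netbox's API. It treats an unknown (`None`) IP count as zero, which is one possible way to avoid failing on an interface whose count is not yet known; whether that was the actual cause was still being investigated in the log.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical list of attributes that must be empty on a disabled interface.
DISABLED_MUST_BE_UNSET = ("mtu", "description", "lag")


@dataclass
class Interface:
    """Hypothetical, simplified stand-in for a Netbox interface record."""
    name: str
    enabled: bool = True
    mtu: Optional[int] = None
    description: str = ""
    lag: Optional[str] = None
    count_ipaddresses: Optional[int] = None  # may be unknown before saving


def validate_interface(iface: Interface) -> List[str]:
    """Return a list of problems; an empty list means the interface passes."""
    errors: List[str] = []
    if iface.enabled:
        # The sketched rule only applies to disabled interfaces.
        return errors
    for attr in DISABLED_MUST_BE_UNSET:
        if getattr(iface, attr):
            errors.append(f"{iface.name}: {attr} must be unset when disabled")
    # Treat an unknown IP count as zero instead of failing on None.
    if (iface.count_ipaddresses or 0) > 0:
        errors.append(f"{iface.name}: disabled interface must not have IPs")
    return errors


if __name__ == "__main__":
    print(validate_interface(Interface(name="em0", enabled=False)))  # []
    print(validate_interface(Interface(name="em0", enabled=False, mtu=9192)))
```

Under this sketch an enabled `em0` is never checked, which matches the observation at 16:56 that the interfaces could only be added once they were enabled.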