[02:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:12:50] topranks: FYI the homer-diff email has changes pending for 8 core routers, I think is related to some recent changes you made
[09:13:57] huh yeah let me look, thought I’d pushed that out to them all
[09:22:44] thx
[09:23:44] jobo: quick one for the team interface, I noticed that https://wikitech.wikimedia.org/wiki/SRE/Infrastructure_Foundations and the pages underneath it don't show up in https://wikitech.wikimedia.org/wiki/Category:SRE_Infrastructure_Foundations , should we add the category to them too?
[09:58:17] Good catch. Yes, I’ll do that
[09:59:27] PSA: volans|off != volans|on
[10:05:12] don't think the regex is properly escaped there it seems to be failing
[10:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:36:14] netops, Infrastructure-Foundations, SRE: Add BGP to protocols contributing to aggregates - https://phabricator.wikimedia.org/T351456 (cmooney) Open→Resolved a:cmooney
[12:41:36] netops, Ganeti, Infrastructure-Foundations, SRE, Patch-For-Review: Investigate Ganeti in routed mode - https://phabricator.wikimedia.org/T300152 (ayounsi) Cluster and cluster group created in Netbox : https://netbox.wikimedia.org/virtualization/cluster-groups/71/ Next (on Monday?) merge the...
[13:23:41] netops, Infrastructure-Foundations, SRE, ops-codfw: Migrate hosts from codfw row A/B ASW to new LSW devices - https://phabricator.wikimedia.org/T355544 (cmooney)
[13:24:27] netops, Data-Persistence, Infrastructure-Foundations, SRE, ops-codfw: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw - https://phabricator.wikimedia.org/T355549 (cmooney) Open→Resolved a:cmooney All done, things working well on the new switches / EVPN vlans :)
[14:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:19:35] topranks any objection to me running thru https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Move_existing_server_between_rows/racks,_changing_IPs for the canary host (cloudelastic1010) in the next hour or so? I think I can do it myself, but happy to accept help or wait until next week
[15:20:18] plan is https://etherpad.wikimedia.org/p/cloudelastic-T355617 and I believe I just need to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/992547 before starting the process from the wikitech link
[15:20:19] inflatador: none at all yeah feel free, if there are any questions just ping me :)
[15:20:20] T355617: Migrate cloudelastic from public to private IPs - https://phabricator.wikimedia.org/T355617
[15:20:50] topranks excellent, thanks
[16:40:55] topranks if you're still around, I deleted the interfaces from NB but not sure about cable ID/VLAN . Delete changelog is here: https://netbox.wikimedia.org/extras/changelog/?request_id=e72681ac-6817-4527-af21-f59f18d1ff44
[16:47:03] nm. looks like 18 based on https://netbox.wikimedia.org/dcim/devices/615/interfaces/ 's change log
[16:50:48] inflatador: let me look
[16:51:01] cable won't have changed so that stays the same (I should be able to get from log of deletion)
[16:51:11] topranks ACK thanks, I tried running the script but it auto-reverted
[16:51:18] vlan will depend on location but will be "private1--"
[16:52:01] https://netbox.wikimedia.org/extras/changelog/157735/
[17:01:40] topranks I'm a dumbass...wasn't committing the changes
[17:01:43] inflatador: I can't seem to find the old cable ID, but it's not a big deal I think.
[17:01:46] everything's looking good now
[17:01:52] ahaha ok
[17:02:09] I never got to the "fixing" part, too busy looking for that cable ID
[17:02:11] :)
[17:02:25] the "new" one says 4911, so probably matching the old if it rolled back
[17:03:07] inflatador: port looks good, IP assignment etc.
[17:03:37] I think you need to run the sre.network.configure-switch-interfaces as per the steps but should be fine
[17:05:34] Awesome thanks, again. I think the cable ID was 6897 based on the changelogs, but we can revisit as needed
[17:21:58] Everything looking good so far. AFK for ~40 while it reimages
[18:01:19] back. Everything looks great! Thanks again top-ranks and have a great wkend
[18:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:15:34] (SystemdUnitFailed) firing: generate_os_reports.service Failed on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed