[00:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:51:40] SRE-tools, Cassandra, SRE: Create cookbook to do `nodetool repair` across cassandra cluster - https://phabricator.wikimedia.org/T225694#9716407 (LSobanski) @Eevans Tagging with #cassandra in case this may be of interest.
[07:43:56] topranks, XioNoX: FYI I've updated netbox-extras on prod but not dev as there are local changes + local commits
[07:44:47] volans: thanks
[07:45:30] I was working on dev two weeks back but there were also changes, wasn’t sure if it was safe or not so left them also
[07:45:52] I think it is, we can catch up with Arzhel when he’s back
[07:47:42] yeah no hurry, just FYI
[07:48:17] feel free to nuke any of mine at least
[07:48:44] what are you doing awake...
[07:54:21] indeed - and online!
[07:55:00] ok well I guess that’s the all clear volans if you want to zap -next?
[07:55:07] or I can have a look
[07:55:08] ok I'll do
[07:55:16] thanks
[07:56:50] {done}
[08:47:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:20:38] netbox, Cumin, Infrastructure-Foundations, SRE, Patch-For-Review: Cumin: add backend for Netbox - https://phabricator.wikimedia.org/T205900#9716946 (Volans)
[09:29:07] netbox, Cumin, Infrastructure-Foundations, SRE, Patch-For-Review: Cumin: add backend for Netbox - https://phabricator.wikimedia.org/T205900#9716976 (Volans)
[09:43:15] SRE-tools, Infrastructure-Foundations, Spicerack, SRE: Evaluate options for non-root operations with cumin and spicerack cookbooks - https://phabricator.wikimedia.org/T244840#9717060 (Volans) Cumin is currently working with the running user from the `cuminunpriv1001` host (after a kinit) towards...
[10:07:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:31:24] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9717511 (cmooney) FWIW I changed the key-exchange algo configured on mr1-eqsin to see if it would make any difference, from some brief searching the ec21159 one seems to use less cpu than dh group-...
[11:34:41] topranks: for the homer failure in the daily email, is there something I should look into or is known/related to magru?
[11:36:34] volans: hadn't seen it actually - and no changes that I know of would seem to account for it
[11:36:41] let me dig into what's happening
[11:39:57] link_data['z_dev'] = a_int.connected_endpoint.device.name
[11:39:57] AttributeError: type object 'Devices' has no attribute 'name'
[11:40:07] for cr2-eqiad
[11:40:50] yeah
[11:41:45] at very least we're handling the error badly if it's to do with the source data / something missed
[11:41:53] yeah
[11:42:05] possibly a validator we should have - I'm just finishing something here I'll take a closer look shortly
[11:50:50] no worries, thx
[12:17:13] volans: it was to do with magru yeah
[12:17:20] ok
[12:17:30] new link - dc ops made live to check light levels
[12:17:51] interface has a connection - but to a patch panel only, the "z-end" in magru isn't there yet
[12:18:24] for now I just set eqiad side int to disabled and homer is happy, I'll have a think about how to prevent this error longer term
[12:18:45] it's a tricky one cos really the interface should be 'disabled' as the far-side isn't connected
[12:18:59] but we sometimes need to enable it only one side during provisioning to check things
[12:21:45] sure sure make sense, no big deal
[13:02:44] netops, Infrastructure-Foundations, ops-codfw, SRE: codfw: use old asw switches from row A and B as msw switches in row C and D - https://phabricator.wikimedia.org/T361871#9717872 (Papaul) Open→Resolved Since Monday I setup in rack D1 and D2 the juniper switch as management switch and...
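Note on the homer failure discussed between 11:39 and 12:18: the guard or validator mentioned there could look roughly like the sketch below. It is only an illustration under stated assumptions, not homer's or Netbox's actual code. The `Device`, `Endpoint` and `Interface` classes, the `build_link_data` and `validate_interface` helpers, and the interface and device names are all hypothetical stand-ins. The idea is to resolve the far-side device name defensively, so that an interface cabled only as far as a patch panel is reported cleanly instead of raising `AttributeError`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Device:
    name: str


@dataclass
class Endpoint:
    # Hypothetical model: `device` may be None, or an object without a
    # usable `name`, when the far side is not attached to a device yet.
    device: Any


@dataclass
class Interface:
    name: str
    enabled: bool
    connected_endpoint: Optional[Endpoint]


def build_link_data(a_int: Interface) -> Optional[dict]:
    """Return link data, or None when the far-side device cannot be resolved."""
    endpoint = a_int.connected_endpoint
    # getattr with a default covers both a missing object (None) and an
    # object that simply lacks the attribute.
    device = getattr(endpoint, "device", None)
    z_dev = getattr(device, "name", None)
    if z_dev is None:
        return None
    return {"a_int": a_int.name, "z_dev": z_dev}


def validate_interface(a_int: Interface) -> list:
    """Report an enabled interface whose far-side device is missing."""
    if a_int.enabled and build_link_data(a_int) is None:
        return [f"{a_int.name}: enabled but far-side device is missing"]
    return []
```

A validator along these lines would report the half-provisioned link rather than block it, which leaves room for the case raised at 12:18:59 where one side is enabled on purpose during provisioning.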
[14:07:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:20:01] SRE-tools, Cassandra, SRE: Create cookbook to do `nodetool repair` across cassandra cluster - https://phabricator.wikimedia.org/T225694#9718771 (Eevans) →Duplicate dup:T297944
[16:07:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:17:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:17:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:12:25] (SystemdUnitFailed) firing: (2) debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:42:39] netops, DC-Ops, Infrastructure-Foundations, ops-magru: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730 (RobH) NEW
[22:42:42] netops, DC-Ops, Infrastructure-Foundations, ops-magru: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9720829 (RobH)
[22:43:36] netops, DC-Ops, Infrastructure-Foundations, ops-magru, Traffic: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9720831 (RobH)