[01:48:45] (SystemdUnitFailed) firing: (3) slapd.service Failed on ldap-rw2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:48:45] (SystemdUnitFailed) firing: (3) slapd.service Failed on ldap-rw2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:32:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[07:17:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[08:34:39] Packaging, Infrastructure-Foundations: Build and package gnmic - https://phabricator.wikimedia.org/T347461 (ayounsi)
[08:34:55] Packaging, Infrastructure-Foundations: Build and package gnmic - https://phabricator.wikimedia.org/T347461 (ayounsi)
[08:35:02] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322 (ayounsi)
[08:35:14] Packaging, Infrastructure-Foundations: Build and package gnmic - https://phabricator.wikimedia.org/T347461 (ayounsi)
[09:08:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:13:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:18:36] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:23:36] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[09:48:46] (SystemdUnitFailed) firing: (2) upload_puppet_facts.service Failed on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:57:21] (ProbeDown) firing: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:02:21] (ProbeDown) resolved: (2) Service mirror1001:443 has failed probes (http_mirrors_wikimedia_org_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mirror1001:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:53:46] (SystemdUnitFailed) firing: (2) upload_puppet_facts.service Failed on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:53:49] SRE-tools, Infrastructure-Foundations, Spicerack, cloud-services-team: [spicerack] Add remote command output to log file - https://phabricator.wikimedia.org/T347093 (Volans) >>! In T347093#9188497, @fnegri wrote: > Is there a task where I can learn more about this? I don't think we have one open...
[11:08:45] (SystemdUnitFailed) resolved: (2) upload_puppet_facts.service Failed on puppetmaster1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:33] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322 (ayounsi) To keep it somewhere for later, on Dell SONiC it should be on the `/openconfig-qos:qos/interfaces` path. Grouping it by sour...
[13:38:51] netops, Infrastructure-Foundations, SRE: Upgrade asw1-eqsin - https://phabricator.wikimedia.org/T332395 (ayounsi) FYI, the `mgmt_junos` bug (also present on the fasw) might not be fixed by an upgrade, but maybe with the solution exposed in https://www.reddit.com/r/Juniper/comments/mvq8hf/comment/j7gd...
[13:56:19] probably for XioNoX or topranks: but is there any easy way to query all core-routers for static routes, or put differently, run a command on all of them?
[13:56:43] sukhe: what kind?
[13:57:11] I'm more in the opinion that we should get rid of static routes :)
[13:57:24] https://phabricator.wikimedia.org/T300877 and similar
[13:57:36] yeah, even the statics in cr2-codfw point to a decommed host :(
[13:57:43] for the recdns anycast IP
[13:58:08] sukhe: one option is to grep the rancid repository
[13:58:36] having backup static routes for anycast doesn't make sense
[13:58:44] "show configuration routing-options static" and "show route protocol static terse" are probably good commands
[13:59:07] I tend to use rancid / api call from my laptop, not sure if we can run commands on them with cumin
[13:59:07] topranks: yep, I wanted to know how do I run them across all core-routers instead of logging into each one of them individually
[13:59:11] I created them to be extra safe, but due to the nature of anycast if they start being useful we have bigger problems :)
[13:59:39] technically we can run commands with cumin, but I'll leave up to netops to decide what's best here :)
[13:59:40] XioNoX: yeah I think let's either update all of them or just remove them if we have decided that it's no longer worthwhile
[14:00:03] I meant in particular for auditing them, to be clear
[14:00:07] cool, will open a task
[14:00:19] volans: I don't think we can easily iterate over all devices
[14:01:13] but we can easily get them from netbox via spicerack :D
[14:01:14] XioNoX: thanks, let's clean this up as well if that's OK
[14:01:18] I am happy doing the legwork
[14:02:08] sukhe: no worries I can do it
[14:04:14] netops, Infrastructure-Foundations: Remove static routes for anycast prefixes - https://phabricator.wikimedia.org/T347494 (ayounsi)
[14:04:41] thanks!
[14:05:32] netops, Infrastructure-Foundations: Remove static routes for anycast prefixes - https://phabricator.wikimedia.org/T347494 (ayounsi) a:ayounsi
[14:05:40] https://phabricator.wikimedia.org/T347494
[14:07:29] netops, Infrastructure-Foundations: Remove static routes for anycast prefixes - https://phabricator.wikimedia.org/T347494 (ssingh) Thanks for the task! Do we plan to consolidate https://netbox.wikimedia.org/ipam/prefixes/97/ip-addresses/? That is, if we remove the redundant backup static routes, `10.3.0....
[14:12:21] netops, Infrastructure-Foundations: Remove static routes for anycast prefixes - https://phabricator.wikimedia.org/T347494 (ayounsi) Yeah, actually you can use 10.3.0.2/32 for NTP, I won't go through renumbering the syslog VIP.