[00:04:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:08:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:09:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:13:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:14:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:23:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:33:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:34:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:53:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on 
config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:08:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:13:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:14:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[01:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:33:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:49:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:53:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:08:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:09:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:28:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:33:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:38:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:43:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on 
config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:48:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:49:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:53:30] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:08:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:09:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[03:23:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:26:18] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:28:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:38:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:43:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:49:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:54:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:03:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:13:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:14:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:18:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on 
config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:23:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:29:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:34:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:38:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed 
[04:39:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:48:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:49:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:58:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:59:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:03:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:04:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:08:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:18:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:19:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:23:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:24:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:38:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on 
config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:40:01] _joe_: any idea what's up with this spam? ^ (it says conftool so I think about you :) ) [05:40:14] <_joe_> XioNoX: think about jbond instead :) [05:40:30] <_joe_> that's his work on config-master, I'd disable alerts tbh [05:40:38] <_joe_> it's spamming #-operations too [05:43:31] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:44:28] (SystemdUnitFailed) firing: (21) dump-conftool-pools.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:44:58] noted, thx, will mute [06:09:28] (SystemdUnitFailed) firing: (20) envoyproxy.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:11:52] hahaha, and now it's a different one [06:14:28] (SystemdUnitFailed) firing: (20) envoyproxy.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:18:31] (SystemdUnitFailed) firing: (20) envoyproxy.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:19:28] (SystemdUnitFailed) firing: (20) envoyproxy.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:23:49] muted it too for 3 days (cc jbond) [06:24:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:28:58] hahhaa [06:29:07] I give up [06:29:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:43:31] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:44:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:44:29] netops, Infrastructure-Foundations, Patch-For-Review: Adjust routing policy to increase SSH session speed from East Asia to toolforge - https://phabricator.wikimedia.org/T334530 (ayounsi) a:ayounsi [06:52:29] Puppet, netops, Infrastructure-Foundations, SRE, good first task: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (ayounsi) [06:58:31] (SystemdUnitFailed) firing: (19) 
fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:59:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:04:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:08:31] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:14:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:18:31] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:23:31] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:24:28] (SystemdUnitFailed) firing: (19) 
fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:28:31] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:28:59] Puppet, netops, Infrastructure-Foundations, SRE, and 2 others: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (ayounsi) a:ayounsi @MoritzMuehlenhoff is it ok to bump the RAM from 4G to 6G on the rpki* VMs? https://netbox.wikimedia.org/virtualization/virtual-machines/?q=rpki [07:29:28] (SystemdUnitFailed) firing: (19) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:33:31] (SystemdUnitFailed) firing: (20) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:34:28] (SystemdUnitFailed) firing: (20) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:37:48] Puppet, netops, Infrastructure-Foundations, SRE, and 2 others: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (MoritzMuehlenhoff) >>! In T300955#9086089, @ayounsi wrote: > @MoritzMuehlenhoff is it ok to bump the RAM from 4G to 6G on the rpki* VMs? 
https://netbox.wikimedia.org/v... [07:38:31] (SystemdUnitFailed) firing: (20) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:39:26] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Implement better filter on BGP_Customer_out - https://phabricator.wikimedia.org/T340448 (ayounsi) a:ayounsi [07:39:28] (SystemdUnitFailed) firing: (20) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:43:31] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:44:28] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:48:24] Puppet, netops, Infrastructure-Foundations, SRE, and 2 others: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (ops-monitoring-bot) VM rpki2002.codfw.wmnet rebooted by ayounsi@cumin1001 with reason: None [07:48:31] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:53:31] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:54:39] Puppet, netops, Infrastructure-Foundations, SRE, and 2 others: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (ops-monitoring-bot) VM rpki1001.eqiad.wmnet rebooted by ayounsi@cumin1001 with reason: bump ram to 6g [07:58:31] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:03:31] (SystemdUnitFailed) firing: (21) fstrim.service Failed on config-master1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:05:31] XioNoX: thanks, i have added a silence now so hopefully it should be gone, and i will be looking at this today. sorry for the noise all [08:05:50] eh, I was in the middle of adding a silence for the whole host [08:06:01] yes, that's what i have done [08:06:07] cool, thanks! 
[08:06:16] np [08:21:05] jbond: interested if you'd any thoughts [08:21:28] gdnsd is reporting a drop in TCP connections since the ns2 IP change yesterday [08:21:39] across all our instances, dns300x but also all others [08:22:38] everything seems to be working just fine, myself and sukhe can't really think why it is [08:22:57] it's just on the gdnsd "tcp requests" exposed counter, kernel-reported tcp conns don't show the same pattern [08:23:53] i made a quick temp dashboard looking at dns1004 for instance: [08:23:53] https://grafana-rw.wikimedia.org/d/Bv-Zik-Vz/cathal-host-network-temp?from=now-24h&orgId=1&refresh=5m&to=now&var-host=dns1004:9100&var-site= [08:24:13] you can see the pattern on the auth dns one too: [08:24:13] https://grafana.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&refresh=30s&var-datasource=thanos&var-server=All&from=now-2d&to=now [08:30:02] topranks: is 198.35.27.27 now being anycasted? i see traffic on dns[4-6]* after the change [08:30:29] yep... well it was already being anycasted but that's the IP we set for ns2 [08:30:36] so 4-6 fired into being [08:30:56] ack that's what i thought, just wanted to make sure [08:31:06] and what specific alert is firing? [08:31:16] there is no alert firing [08:31:43] just a roughly 50% drop in the value reported by gdnsd_tcp_reqs metric [08:31:50] so this graph? https://grafana.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&refresh=30s&var-datasource=thanos&var-server=All&from=now-2d&to=now&viewPanel=16 [08:31:50] the "this graph looks weird" alert [08:31:57] haha [08:32:22] yeah I'd put it down to some instrumentation problem, but it's clearly the result of the IP change elsewhere so *something* has changed [08:33:14] ohhh [08:33:20] I might have an idea [08:33:51] could it be that the anycast IP didn't get whitelisted somewhere in the gdnsd config? 
[08:34:23] it's perhaps that it's not being counted by the process for that metric [08:34:27] hmm, no, if it was that we wouldn't see any tcp traffic on ulsfo or drmrs [08:34:33] it's certainly answering requests to that IP from all the dns hosts [08:36:10] to confirm, the old ip address is still reachable? [08:36:36] yep 100% [08:36:43] and ahh yes it's still getting data and returning metrics https://grafana-rw.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&refresh=30s&var-datasource=thanos&var-server=All&from=now-2d&to=now&forceLogin&viewPanel=194 [08:36:47] will be until Sunday when we're on site and decom cr3 [08:37:03] dns3001 is also part of the anycast setup [08:37:23] we won't see actual level of residual queries to old ns2 IP until I withdraw the Anycast prefix from esams [08:37:27] could it be that the traffic simply moved to udp because $reason [08:37:29] (in about 30-40 mins time) [08:37:52] i do see a small increase of about 1k udp at the same time as the drop but hard to see [08:37:57] well yeah, TCP is a fall-back mode for DNS, so I did consider whether some weird issue (mtu or something) was suddenly no longer present, meaning less of that fall-back [08:38:06] that's about the level of the drop [08:38:25] what I don't get with that explanation is we see it across the board at all sites and all dns hosts [08:38:46] I don't think it's worth spending too much time on, but I would like to understand [08:38:57] I've seen nothing to suggest any actual dns issues are present [08:39:59] i think there are a couple of things. 1: we introduced the anycast prefix, which would naturally mean that traffic shifted from all dns[1-3] to dns[4-6], which we do see. this would explain why we see some drop on all servers [08:40:17] then there is the possible shift from tcp to udp which is a bit harder to spot but possible [08:41:14] it is strange though [08:41:26] true... 
but on aggregate we've only 50% of the tcp requests that we had [08:41:52] I did also wonder if there were some sort of long-lived tcp conns disrupted or something, but given all IPs are still reachable anything like that should be ok [08:44:34] i'm not sure what logs we have but as a next step we could look at the tcp top talkers from yesterday and today [08:44:46] if there is something obviously missing we could see if they moved to udp [08:45:40] it could be one or two isp caches in some large eu isp that for some reason didn't have udp connectivity but do now [08:45:44] if we look at a longer time scale, we're back at the pre july rate - https://grafana.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&var-datasource=thanos&var-server=All&from=now-90d&to=now&forceLogin&refresh=30s&viewPanel=188 [08:46:05] maybe a question is, why did it increase early july [08:46:46] the udp increase is easier to spot here https://grafana-rw.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&var-datasource=thanos&var-server=All&from=1691672308763&to=1691701193220&forceLogin&viewPanel=241 [08:47:36] ah nevermind, it's just that the eqiad dns hosts got re-imaged/re-named so we just don't have their data before early july [08:47:56] jbond: "Panel not found" [08:48:42] https://grafana-rw.wikimedia.org/d/dzVQfBe4k/authoritative-dns-jbond?orgId=1&viewPanel=241 [08:48:48] try that [08:49:02] also added to https://grafana-rw.wikimedia.org/d/dzVQfBe4k/authoritative-dns-jbond?orgId=1 [08:49:12] just below the tcp graph [08:49:28] so ~1k ops/s bump [08:49:41] which exactly matches the tcp drop [08:49:41] yes seems to be it just shifted to udp [08:49:56] I wonder - when a DNS over UDP query fails - and a system decides it's gonna use TCP instead [08:50:25] In different implementations, does the software internally cache "this server failed on udp, use tcp with it" [08:50:53] and thus, due to some past glitch, we had a bunch of dns resolvers out there using TCP for the old ns2 IP 
[08:50:59] which returned to udp when it changed [08:51:24] topranks: yes servers will cache that type of failure [08:51:24] (it's a neat theory but I don't know how it explains tcp drop on dns100x/dns200x which weren't serving ns2 IP) [08:51:35] however they should periodically retry afaik [08:52:08] yep, but probably the kind of behaviour implemented differently, and perhaps poorly, in different software [08:52:35] topranks: 500 tcp queries were going to dns300* [08:52:53] the drop in the other servers is closer to 100 [08:53:17] and the new servers [4-5] gained about 50 qps [08:53:24] ok [08:53:37] so i think we have both a shift of traffic due to better latency [08:53:46] and a shift to udp because of reasons [08:53:57] $reasons [08:54:10] most common ones are a combination of mtu + ipv6 + edns buffer size [08:54:20] we can leave it as a welcome present for bblack [08:54:32] yeah.... good spot on the uptick in udp, fact it is the same order of magnitude as tcp drop definitely helps explain what's happened [08:54:49] however all our responses should be pretty small, likely below 512, so it is still a bit strange to see the shift to udp or why things needed tcp in the first place [08:54:51] suk.he thinks we're all in for a scolding from b.black upon his return anyway :P [08:55:07] lol [08:55:41] nothing is on fire so far [08:55:44] cool thanks for the insight [08:55:52] yeah we'd know if there was an issue [08:56:01] agree [08:56:09] in fact - the theory about servers failing before and being "stuck" on TCP for ns2 IP [08:56:28] some software may do that for the entire RRset, i.e. 
all the NS entries ns0/ns1/ns2, when even one fails [08:56:31] (wild speculation) [08:56:50] we can move on I think, thanks guys :) [08:57:23] topranks: it's definitely possible but i wouldn't expect that from a decent and up to date piece of software, but there are many isps that do not run such stacks so yes it's possible [08:57:44] 100% [08:58:30] perhaps one of the ddos scrubbers was dropping udp traffic to the ns2 ip, forcing things to go via tcp. [08:58:39] ah [08:58:50] that is a very interesting theory [08:59:38] XioNoX: CF isn't enabled anywhere is it? I know you made a change to those ACLs yesterday [08:59:55] nah, not enabled since at least a week ago [09:00:11] and the current rules allow udp to the authdns VIPs [09:00:37] arelion/LG notify us when mitigation is ongoing [09:01:08] yeah that's as I understood it [09:03:15] anyway - esams depool is pushed to the dns servers [09:03:25] I'll give it 20 mins then withdraw the anycast prefixes [11:02:25] morning folks :) [11:02:33] * sukhe missed the fun [11:07:22] if we are talking about weird timing, at 16:19 UTC yesterday (when we see the sharp drop), there was another event that had just taken place [11:07:42] 12:19:08 < sukhe> https://developers.google.com/speed/public-dns/cache [11:07:45] 12:19:13 < sukhe> we can flush the cache [11:07:48] 12:19:51 < sukhe> done [11:07:51] 12:19 EST is 16:19 UTC [11:08:43] I had purged the cache for ns2 on google DNS [11:09:06] it's close, but I do suspect that the initial drop had already started [12:04:04] Puppet, netops, Infrastructure-Foundations, SRE, good first task: Routinator: use tmpfs - https://phabricator.wikimedia.org/T300955 (ayounsi) Open→Resolved All done! 
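The "stuck on TCP" theory from the 08:49-08:57 exchange above can be sketched as a small simulation. This is only an illustration of the speculated behaviour, not how any particular resolver is known to work: the class `StickyFallbackResolver`, its `retry_udp_after` parameter, and the server labels `old-ns2` / `new-ns2` are all hypothetical names made up for the example, and no real DNS traffic is involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StickyFallbackResolver:
    """Hypothetical resolver that remembers UDP failures per server address."""

    # Assumed value: how long (seconds) a server stays pinned to TCP
    # before UDP is tried again.
    retry_udp_after: float = 300.0
    # server label -> time until which TCP is preferred
    tcp_until: Dict[str, float] = field(default_factory=dict, init=False)

    def report_udp_failure(self, server: str, now: float) -> None:
        """Record that a UDP query to `server` failed at time `now`."""
        self.tcp_until[server] = now + self.retry_udp_after

    def pick_transport(self, server: str, now: float) -> str:
        """Return "tcp" while the server is pinned, otherwise "udp"."""
        until = self.tcp_until.get(server)
        if until is not None and now < until:
            return "tcp"
        # Pin expired (or never set): forget it and go back to UDP.
        self.tcp_until.pop(server, None)
        return "udp"


resolver = StickyFallbackResolver(retry_udp_after=300.0)
resolver.report_udp_failure("old-ns2", now=0.0)      # some past glitch
print(resolver.pick_transport("old-ns2", now=10.0))  # pinned to TCP
print(resolver.pick_transport("new-ns2", now=10.0))  # new address has no history
```

Because the pin is keyed on the server address in this sketch, a renumbered ns2 starts with no failure history, which is the mechanism speculated about in the chat. A resolver that never expires the pin (no periodic UDP retry) would stay on TCP for the old address indefinitely.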
[12:33:31] (SystemdUnitFailed) firing: (7) clean-confd-rundir.service Failed on aux-k8s-worker1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:36:39] * jbond looking [12:48:31] (SystemdUnitFailed) firing: (7) clean-confd-rundir.service Failed on aux-k8s-worker1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:01:27] jbond: was reading the scrollback, thanks for the insight [14:01:56] as for why TCP, we do have some significant traffic from Google towards our authdns'es over 853 (DoT) [14:02:18] Google Public DNS that is [16:25:39] netops, Infrastructure-Foundations, SRE: Add per-output queue graphing for Juniper network devices in LibreNMS - https://phabricator.wikimedia.org/T326322 (ayounsi) Next steps here: * Decide which hosts will run gnmic, I can think of 4 options: ** netflowXXXX (my preferred option, as already monitori... [16:26:55] netops, Infrastructure-Foundations, SRE: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322 (ayounsi) [16:37:44] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Prometheus: ingest SONiC metrics - https://phabricator.wikimedia.org/T335027 (ayounsi) After more investigation, I'm going to roll out gNMIc for more real life testing. As it's multi-platform and should export the... 
[16:49:28] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:53:31] (SystemdUnitFailed) firing: (3) debmonitor-maintenance-gc.service Failed on debmonitor2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed