[00:03:29] (SystemdUnitFailed) resolved: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:08:29] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:19:52] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_eventlogging_analytics.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:48:28] 10Data-Engineering-Planning, 10Data Pipelines: Add Python Linter Checks to CI - https://phabricator.wikimedia.org/T318346 (10odimitrijevic) @lbowmaker now that we are on the other side of the Oozie migration can we prioritize for the next sprint? [02:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [06:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:23:37] 10Data-Engineering, 10Observability-Alerting: Exclude jupyterhub singleuser services from the systemd unit failure alerts - https://phabricator.wikimedia.org/T336951 (10fgiunchedi) Thank you for reaching out @BTullis ! I'll enumerate what I think are our options: 1. Make sure transient units are collected, i.... [08:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:49:42] 10Data-Engineering-Planning, 10Event-Platform Value Stream (Sprint 14 A): Define Service Level Objective (SLO) for mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T333833 (10gmodena) a:03gmodena [10:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:00:32] 10Data-Engineering-Planning, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Upgrade the spark YARN shuffler service on Hadoop workers from version 2 to 3 - https://phabricator.wikimedia.org/T332765 (10BTullis) Uff! I've made a fundamental error. The jar file that I've included in conda-analy... [11:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:12:42] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (10gmodena) >>! In T336656#8861728, @Ottomata wrote: >> s3://-/ (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:43:13] 10Data-Engineering-Planning, 10serviceops, 10Event-Platform Value Stream (Sprint 14 A), 10Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (10gmodena) @Ottomata should we adopt this naming conventions also for DSE? [11:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:06:29] (03PS1) 10Nik Gkountas: show a warning for unused languages with translated messages over 75% [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921329 [12:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:17:34] (03PS2) 10Amire80: WIP Show a warning for unused languages with translated messages over 75% [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921329 (owner: 10Nik Gkountas) [12:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:30:32] (03PS3) 10Amire80: WIP Show a warning for unused languages with localization over 75% [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921329 (https://phabricator.wikimedia.org/T336752) (owner: 10Nik Gkountas) [12:33:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:35:16] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Wikimedia-Hackathon-2023: Add several new languages to wikistats - https://phabricator.wikimedia.org/T337029 (10Amire80) [12:35:37] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Wikimedia-Hackathon-2023: Add several new languages to wikistats - https://phabricator.wikimedia.org/T337029 (10Amire80) a:03Amire80 [12:37:19] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Wikimedia-Hackathon-2023: Add several new languages to wikistats - https://phabricator.wikimedia.org/T337029 (10Amire80) [12:38:04] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Wikimedia-Hackathon-2023: Add several new LTR nguages to wikistats (May 2023) - https://phabricator.wikimedia.org/T337029 (10Amire80) [12:38:14] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Wikimedia-Hackathon-2023: Add several new LTR languages to wikistats (May 2023) - https://phabricator.wikimedia.org/T337029 (10Amire80) [12:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:48:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:53:14] (03PS1) 10Amire80: Add several languages [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921337 (https://phabricator.wikimedia.org/T337029) [12:53:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:58:06] (03PS1) 10Amire80: Remove zh.json [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921339 (https://phabricator.wikimedia.org/T337029) [12:58:32] (03PS4) 10Nik Gkountas: Show a warning for unused languages with localization over 75% [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921329 (https://phabricator.wikimedia.org/T336752) [13:03:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:06:55] 10Data-Engineering, 10Data-Engineering-Wikistats, 10translatewiki.net, 10I18n, and 2 others: Automate adding languages to wikistats - https://phabricator.wikimedia.org/T336752 (10Amire80) [13:08:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:10:10] 10Data-Engineering, 10Data-Engineering-Wikistats, 10Patch-For-Review, 10Wikimedia-Hackathon-2023: Add several new LTR languages to wikistats (May 2023) - https://phabricator.wikimedia.org/T337029 (10Amire80) [13:10:34] 10Data-Engineering-Planning, 10Data-Engineering-Wikistats, 10translatewiki.net, 10I18n: Addition of Slovene language in the WikiStats 2.0 language selector - https://phabricator.wikimedia.org/T320356 (10Amire80) [13:10:59] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n: Wiki selector: "All wikis" is not translated - https://phabricator.wikimedia.org/T246911 (10Amire80) [13:14:40] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n, 10Wikimedia-Hackathon-2023: Untranslatable strings in Wikistats - https://phabricator.wikimedia.org/T337032 (10Amire80) [13:15:00] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n: Wiki selector: "All wikis" is not translated - https://phabricator.wikimedia.org/T246911 (10Amire80) [13:15:02] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n, 10Wikimedia-Hackathon-2023: Untranslatable strings in Wikistats - https://phabricator.wikimedia.org/T337032 (10Amire80) [13:18:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:19:34] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n: In Wikistats, `npm run dev` only shows strings in English - https://phabricator.wikimedia.org/T337033 (10Amire80) [13:20:23] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n: In Wikistats, `npm run dev` only shows strings in English - https://phabricator.wikimedia.org/T337033 (10Amire80) [13:23:00] !log restart monitor_refine_eventlogging_analytics.service on an-launcher1002 [13:23:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:23:29] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:23:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:28:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:33:29] (SystemdUnitFailed) resolved: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:38:29] (SystemdUnitFailed) firing: (2) monitor_refine_eventlogging_analytics.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:47:36] 10Data-Engineering, 10Event-Platform Value Stream (Sprint 14 A), 10Patch-For-Review: mediawiki-page-content-change-enrichment checkpoints should be stored in Swift - https://phabricator.wikimedia.org/T336656 (10Ottomata) > ? I don't have strong feelings about _ vs -, I have a semi strong feeling, mostly ar... [13:48:29] (SystemdUnitFailed) resolved: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:48:35] 10Data-Engineering-Planning, 10serviceops, 10Event-Platform Value Stream (Sprint 14 A), 10Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (10Ottomata) I think it will be easier to not change in in DSE for now, since our i... [13:49:21] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 13): Add Python Linter Checks to CI - https://phabricator.wikimedia.org/T318346 (10lbowmaker) [13:50:02] 10Data-Engineering-Planning, 10Data Pipelines (Sprint 13): Add Python Linter Checks to CI - https://phabricator.wikimedia.org/T318346 (10lbowmaker) @odimitrijevic sure! I added to this current sprint (it was already estimated). If someone has time this sprint they can work on it or it will rollover to next. [13:50:07] 10Data-Engineering-Planning, 10serviceops, 10Event-Platform Value Stream (Sprint 14 A), 10Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (10Ottomata) Oh but let's set job_name to match the same in wikikube as we decided... [13:53:29] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:03:29] (SystemdUnitFailed) resolved: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:04:03] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n, 10Wikimedia-Hackathon-2023: Untranslatable strings in Wikistats - https://phabricator.wikimedia.org/T337032 (10Amire80) [14:08:29] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:16:08] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n, 10RTL, 10Wikimedia-Hackathon-2023: Support right-to-left languages in Wikistats - https://phabricator.wikimedia.org/T251376 (10Amire80) [14:18:29] (SystemdUnitFailed) resolved: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:23:29] (SystemdUnitFailed) firing: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:33:29] (SystemdUnitFailed) resolved: rsync-published.service Failed on stat1009:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:36:45] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure (Q4 Wrap up): Bring stat1009 into service - https://phabricator.wikimedia.org/T336036 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=58b1da63-7fca-4cbc-8725-c1ba80542dd9) set by stevemunene@cumin1001 for 7 days, 0:00:0... [15:16:12] 10Data-Engineering, 10Observability-Alerting: Exclude jupyterhub singleuser services from the systemd unit failure alerts - https://phabricator.wikimedia.org/T336951 (10BTullis) Thanks for the summary @fgiunchedi I agree that option 1 is probably the best option here. I didn't know about the `CollectMode` op... [15:17:02] 10Data-Engineering, 10Observability-Alerting: Reduce alert noise associated with individual users' jupyterhub-singleuser services - https://phabricator.wikimedia.org/T336951 (10BTullis) [15:17:28] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce alert noise associated with individual users' jupyterhub-singleuser services - https://phabricator.wikimedia.org/T336951 (10BTullis) p:05Triage→03Low [15:31:55] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting, 10Patch-For-Review: Reduce alert noise associated with individual users' jupyterhub-singleuser services - https://phabricator.wikimedia.org/T336951 (10BTullis) a:03BTullis [16:05:26] 10Data-Engineering, 10Product-Analytics, 10Research: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (10Mayakp.wiki) [16:10:35] 10Data-Engineering, 10Product-Analytics, 10Research: Investigate relation of UA deprecation to increase in automated traffic and reduction in unique devices - https://phabricator.wikimedia.org/T336715 (10Mayakp.wiki) [16:14:38] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10BTullis) [16:29:04] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10Ottomata) FWIW the monitor_refine_* are a safety step for noticing that something is wrong. The refine_* j... [16:37:13] (DiskSpace) firing: Disk space stat1008:9100:/srv 5.924% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [16:37:49] ^ looking [16:45:11] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10BTullis) >>! In T337052#8865325, @Ottomata wrote: > For IRC spam, I think it would be fine to turn off IRC... [16:46:57] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10mforns) I believe when we migrate Refine to Airflow the number of alerts will diminish. One reason for that... [17:00:53] https://usercontent.irccloud-cdn.com/file/Fgwps26x/image.png [17:07:13] (DiskSpace) resolved: Disk space stat1008:9100:/srv 5.95% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [17:07:25] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10BTullis) [17:08:14] 10Data-Engineering: NEW BUG REPORT remove mysql databases from SQLLab - https://phabricator.wikimedia.org/T337056 (10Mayakp.wiki) [17:08:29] 10Data-Engineering: NEW BUG REPORT remove mysql databases from SQLLab - https://phabricator.wikimedia.org/T337056 (10Mayakp.wiki) p:05Triage→03Low [17:11:51] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10Ottomata) +1 [17:12:15] At least it doesn't look like it's going up steadily. This is the last 30 days. [17:12:20] https://usercontent.irccloud-cdn.com/file/SoLW7mLY/image.png [17:12:27] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=stat1008&var-datasource=thanos&var-cluster=analytics&viewPanel=12&from=now-30d&to=now [17:12:27] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10Ottomata) > We have deployed a [[ https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/89464... [17:12:35] I think we can leave it for the weekend. [17:15:26] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10Ottomata) > There are plans to migrate this job from a systemd timer to Airflow, so it's possible th... [17:16:13] (DiskSpace) firing: Disk space stat1008:9100:/srv 5.882% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [17:39:50] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10mforns) Yes, the ProduceCanaryEvents Airflow DAG would have a similar structure than the Refine one.... [17:43:20] 10Data-Engineering: Implement new AQS endpoints for Knowledge Gap metrics - https://phabricator.wikimedia.org/T337059 (10Milimetric) [17:45:11] 10Data-Engineering: Airflow job to load Knowledge Gap metrics into Cassandra - https://phabricator.wikimedia.org/T337060 (10Milimetric) [17:45:47] 10Data-Engineering, 10Data Pipelines (Sprint 13): Implement new AQS endpoints for Knowledge Gap metrics - https://phabricator.wikimedia.org/T337059 (10Milimetric) [17:46:07] 10Data-Engineering, 10Data Pipelines (Sprint 13): Airflow job to load Knowledge Gap metrics into Cassandra - https://phabricator.wikimedia.org/T337060 (10Milimetric) [18:18:39] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10BTullis) >>! In T337052#8865357, @mforns wrote: > Another reason is that we hopefully(2) won't need to exec... [18:40:46] (03PS1) 10Milimetric: Totally not real code [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921409 [18:41:08] (03Abandoned) 10Milimetric: Totally not real code [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/921409 (owner: 10Milimetric) [18:47:05] (03CR) 10Milimetric: Add test for knowledge gap totals endpoint (031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/915678 (owner: 10Nick Ifeajika) [18:50:21] (03CR) 10Milimetric: query finetuning (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/914799 (owner: 10Nick Ifeajika) [19:25:08] 10Data-Engineering, 10Data-Platform-SRE, 10Shared-Data-Infrastructure: Send a critical alert to data-engineering if produce_canary_events isn't running correctly - https://phabricator.wikimedia.org/T337055 (10Ottomata) > So yea, working on the ProduceCanaryEvents DAG first makes sense no? Ya! > Would movin... [19:40:16] 10Data-Engineering, 10Data-Platform-SRE, 10Observability-Alerting: Reduce IRC/alert noise associated with monitor_refine_ systemd timers from alertmanager - https://phabricator.wikimedia.org/T337052 (10Ottomata) > Will the refine job be running once per hour (per partition?), with a sensor for when the sourc... [19:49:10] (03PS1) 10Legoktm: mediawiki/recentchange: Add new 'notify_url' and 'title_url' fields [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/921414 [20:01:27] (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage [20:12:45] 10Data-Engineering, 10Data-Engineering-Wikistats, 10I18n, 10Wikimedia-Hackathon-2023: Untranslatable strings in Wikistats - https://phabricator.wikimedia.org/T337032 (10Reedy) Seems these aren't too difficult to fix - https://gerrit.wikimedia.org/r/c/analytics/wikistats2/+/921160 [21:16:13] (DiskSpace) firing: Disk space stat1008:9100:/srv 5.641% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=stat1008 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace [22:11:27] (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1001:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1001:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage [23:28:34] 10Data-Engineering, 10MediaWiki-extensions-EventLogging, 10Metrics-Platform-Planning, 10Patch-Needs-Improvement: Deprecate/delete the mw.eventLog.Schema class - https://phabricator.wikimedia.org/T305491 (10Pppery)