[00:37:21] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:45:19] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:07:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:09:25] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:51] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:17:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:36:43] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:46:21] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[01:51:11] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:52:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:00:35] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:02:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:06:51] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:07:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:16:27] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:17:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:21:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[02:36:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[02:37:05] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:37:52] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[02:45:07] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:05:57] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:07:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[03:15:37] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:17:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:35:59] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:45:37] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:28:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[05:35:27] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:42:03] <wikibugs>	 (CR) Ayounsi: Created development/network/probe 1.0.0 schema (1 comment) [schemas/event/primary] - https://gerrit.wikimedia.org/r/907508 (https://phabricator.wikimedia.org/T334417) (owner: Jameel Kaisar)
[05:45:39] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:48:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[06:07:27] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:07:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:15:29] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:17:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:21:53] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:22:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:31:23] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:32:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:36:05] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:45:43] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:19:02] <steve_munene>	 !log Decommission an-worker1132 from the Hadoop cluster for T333091 reimage
[08:19:04] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:19:05] <stashbot>	 T333091: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091
[08:26:29] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) I followed what outlined in https://github.com/RadeonOpenCompute/ROCm/issues/1125#issuecomment-925362329 and created the two fake packages, i...
[08:34:55] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:37:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:45:09] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:47:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:57:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[09:02:09] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster
[09:05:16] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:07:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (9) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:15:28] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:17:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[09:17:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:21:12] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:22:24] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster executed with errors: - an-wo...
[09:22:39] <wikibugs>	 Data-Engineering, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui)
[09:22:51] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:24:17] <wikibugs>	 Data-Engineering, DBA, cloud-services-team: Migrate wiki replicas (clouddb*) hosts to MariaDB 10.6 - https://phabricator.wikimedia.org/T334651 (Marostegui) p:Triage→Medium #data-engineering when would it be a good time to get around 10-15 minutes downtime for clouddb1021? cc @BTullis  #cloud-...
[09:25:55] <wikibugs>	 Data-Engineering, DBA: Migrate dbstore1005 to MariaDB 10.6 - https://phabricator.wikimedia.org/T334652 (Marostegui)
[09:26:01] <wikibugs>	 Data-Engineering, DBA: Migrate dbstore1005 to MariaDB 10.6 - https://phabricator.wikimedia.org/T334652 (Marostegui) p:Triage→Medium
[09:27:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (10) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:30:39] <icinga-wm_>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (10) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:52:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (7) hadoop-yarn-nodemanager.service Failed on an-test-worker1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:06:03] <icinga-wm_>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) produce_canary_events.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:14:16] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:15:25] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster
[10:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:17:54] <elukey>	 steve_munene: o/
[10:18:04] <elukey>	 I checked in https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Cluster/Hadoop/Administration#HDFS_NameNode_Status_Interface and I don't see the node in decommissioned state
[10:18:24] <elukey>	 maybe the process didn't work?
[10:18:40] <elukey>	 not a big deal but after the reimage the node will try to get back in service
[10:19:08] <steve_munene>	 Could it be because it was shut off?
[10:19:33] <elukey>	 in theory no, it is a setting for the namenode
[10:19:44] <elukey>	 let's check when it comes back up
[10:21:16] <steve_munene>	 cool
[10:22:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:25:43] <elukey>	 aqu: o/
[10:25:56] <elukey>	 is the above alert part of the oozie->airflow migration --^ ?
[10:30:16] <elukey>	 (going afk for lunch)
[10:31:54] <aqu>	 Thx I will check 
[10:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:37:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:47:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:22:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:30:25] <wikibugs>	 (PS1) Urbanecm: Personalized praise: Add instrumentation for skipping suggestions [schemas/event/secondary] - https://gerrit.wikimedia.org/r/908520 (https://phabricator.wikimedia.org/T334300)
[11:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:28:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[12:38:56] <aqu>	 Here is a fix for the alert about check_webrequest_partitions https://gerrit.wikimedia.org/r/c/operations/puppet/+/908529/
[12:38:56] <aqu>	 We actually don't need to check for SUCCESS files anymore with Airflow.
[12:48:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[13:52:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:02:53] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:09] <wikibugs>	 (PS4) Snwachukwu: Update pageview hourly and daily druid tables. [analytics/refinery] - https://gerrit.wikimedia.org/r/906595 (https://phabricator.wikimedia.org/T334224)
[14:05:14] <wikibugs>	 (CR) Snwachukwu: Update pageview hourly and daily druid tables. (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/906595 (https://phabricator.wikimedia.org/T334224) (owner: Snwachukwu)
[14:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:15:35] <wikibugs>	 (CR) Ottomata: [C: +2] Created development/network/probe 1.0.0 schema [schemas/event/primary] - https://gerrit.wikimedia.org/r/907508 (https://phabricator.wikimedia.org/T334417) (owner: Jameel Kaisar)
[14:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:28:05] <wikibugs>	 (CR) Snwachukwu: [V: +2 C: +2] Update pageview hourly and daily druid tables. [analytics/refinery] - https://gerrit.wikimedia.org/r/906595 (https://phabricator.wikimedia.org/T334224) (owner: Snwachukwu)
[14:37:15] <jinxer-wm>	 (EventgateValidationErrors) firing: ...
[14:37:16] <jinxer-wm>	 eventgate-analytics-external stream eventlogging_SearchSatisfaction validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[14:43:41] <icinga-wm_>	 PROBLEM - eventgate-analytics-external validation error rate too high on alert2001 is CRITICAL: 5.192 gt 2 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[14:52:15] <jinxer-wm>	 (EventgateValidationErrors) resolved: ...
[14:52:16] <jinxer-wm>	 eventgate-analytics-external stream eventlogging_SearchSatisfaction validation errors detected in past 15 min - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos - https://alerts.wikimedia.org/?q=alertname%3DEventgateValidationErrors
[14:52:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:53:17] <icinga-wm_>	 RECOVERY - eventgate-analytics-external validation error rate too high on alert2001 is OK: (C)2 gt (W)1 gt 0.6331 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[15:02:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:22:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:23:30] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster executed with errors: - an-wo...
[15:27:57] <SandraEbele>	 !log deploying analytics refinery-update pageview druid table
[15:27:58] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:30:24] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster
[15:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:38:49] <elukey>	 ottomata: o/
[15:38:50] <elukey>	 around?
[15:41:46] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) Updated the docs, I was able to run tensorflow on dse-k8s-worker1001 successfully. The remaining issue is to add the proper users to the `ren...
[15:42:37] <SandraEbele>	 !log paused Oozie pageview-druid-hourly job.
[15:42:38] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:47:15] <wikibugs>	 Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) a:elukey
[15:49:19] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster executed with errors: - an-wo...
[15:52:18] <wikibugs>	 Analytics-Radar, Data-Engineering, Event-Platform Value Stream, SRE, Patch-For-Review: Allow kafka clients to verify brokers hostnames when using SSL - https://phabricator.wikimedia.org/T291905 (elukey)
[15:58:53] <wikibugs>	 Data-Engineering, SRE, ops-eqiad, Patch-For-Review: Degraded RAID on an-worker1132 - https://phabricator.wikimedia.org/T333091 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by stevemunene@cumin1001 for host an-worker1132.eqiad.wmnet with OS buster
[16:15:59] <wikibugs>	 Data-Engineering, Data Pipelines: Gather all data-purge into a single job - https://phabricator.wikimedia.org/T262201 (JArguello-WMF)
[16:18:05] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (elukey)
[16:18:52] <wikibugs>	 Data-Engineering, Event-Platform Value Stream: Move Kafka Jumbo's TLS clients to the new bundle - https://phabricator.wikimedia.org/T296064 (elukey) Re-added the Data Engineering tag to triage this task again (requires manual work from the team, see above). Should be quick and easy, the cluster has alrea...
[16:22:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:37:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:47:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:52:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:55:27] <jinxer-wm>	 (HiveServerHeapUsage) firing: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[17:02:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[17:27:47] <ottomata>	 elukey:  o/ sorry in meetings, just saw your ping
[17:45:49] <wikibugs>	 Data-Engineering, Event-Platform Value Stream, Metrics-Platform-Planning, Product-Analytics, WMF-Architecture-Team: Major (API) versioning of Event Platform streams - https://phabricator.wikimedia.org/T332212 (Ottomata) @BPirkle great!  How can we best collaborate and make this happen soonish...
[17:50:27] <jinxer-wm>	 (HiveServerHeapUsage) resolved: Hive Server JVM Heap usage is above 80% on an-coord1002:10100 - https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Alerts#Hive_Server_Heap_Usage - https://grafana.wikimedia.org/d/000000379/hive?panelId=7&fullscreen&orgId=1&var-instance=an-coord1002:10100 - https://alerts.wikimedia.org/?q=alertname%3DHiveServerHeapUsage
[18:10:10] <wikibugs>	 Data-Engineering, serviceops, Event-Platform Value Stream (Sprint 11), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (Ottomata) > I would suggest to experiment with proper values in DSE first (the charts values.yaml suggests 512Mi for...
[19:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:39:20] <wikibugs>	 Data-Engineering, observability, Event-Platform Value Stream (Sprint 11): Produce requests to eventgate-logging-external in eqiad occasionally fail. - https://phabricator.wikimedia.org/T334510 (Ottomata) Tried to do a little debugging today but not with much luck.  The "message timed out" is ultimate...
[20:22:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:32:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[21:37:15] <SandraEbele>	 !log Successfully Deployed analytics refinery using scap, then deployed onto hdfs.
[21:37:16] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:07:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:17:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:37:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:47:48] <jinxer-wm>	 (SystemdUnitFailed) firing: (8) check_webrequest_partitions.service Failed on an-launcher1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed