[00:20:35] <icinga-wm>	 PROBLEM - Check systemd state on an-web1001 is CRITICAL: CRITICAL - degraded: The following units failed: hardsync-published.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:30:13] <icinga-wm>	 RECOVERY - Check systemd state on an-web1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:15:45] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:45:25] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:58:59] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:08:35] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:14:21] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:18:23] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:26:00] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:35:39] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:43:21] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:51:03] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:00:37] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:10:15] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:23:41] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:41:01] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1132 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:53:22] <joal>	 hi team, Naé is sick today, I'll keep her at home and will be mostly unavailable :S
[09:05:35] <elukey>	 <3
[10:53:16] <wikibugs>	 Data-Engineering, Observability-Alerting, User-fgiunchedi: Migrate zookeeper prometheus checks from Icinga to Alertmanager - https://phabricator.wikimedia.org/T309012 (fgiunchedi) Update on this: the SRE bits are done, what's left are zk alerts for 'analytics' Prometheus instance, namely for druid an...
[11:23:07] <wikibugs>	 (PS1) Barakat Ajadi: PaintTiming: Move painttiming to navtiming schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/901169 (https://phabricator.wikimedia.org/T328256)
[12:36:12] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis)
[12:37:06] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade hadoop workers to bullseye - https://phabricator.wikimedia.org/T332570 (BTullis)
[12:37:09] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[12:38:00] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[12:39:55] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Refresh hadoop coordinators an-coord100[1-2] with an-coord[3-4] - https://phabricator.wikimedia.org/T332572 (BTullis)
[12:40:44] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[12:41:55] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade hadoop master to bullseye - https://phabricator.wikimedia.org/T332573 (BTullis)
[12:42:19] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[12:56:56] <btullis>	 As part of the bullseye upgrade for the analytics cluster, we will lose python3.7 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/901196), as I'm not aware of a requirement to forward-port it.
[12:57:34] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade hadoop standby master to bullseye - https://phabricator.wikimedia.org/T332578 (BTullis)
[12:58:16] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[13:02:58] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade an-launcher1002 to bullseye - https://phabricator.wikimedia.org/T332580 (BTullis)
[13:03:21] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade an-launcher1002 to bullseye - https://phabricator.wikimedia.org/T332580 (BTullis)
[13:06:47] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[13:28:20] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade an-test-druid1001 to bullseye - https://phabricator.wikimedia.org/T332584 (BTullis)
[13:30:58] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[13:35:13] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade the druid-public cluster to bullseye - https://phabricator.wikimedia.org/T332589 (BTullis)
[13:38:05] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[14:43:53] <wikibugs>	 Data-Engineering, Observability-Alerting, Patch-For-Review, User-fgiunchedi: Migrate Kafka prometheus alerts from Icinga to Alertmanager - https://phabricator.wikimedia.org/T309010 (fgiunchedi) Open→Resolved a:fgiunchedi The Prometheus Kafka alerts have been migrated from Puppet / Ici...
[14:48:24] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure: Upgrade the druid-analytics cluster to bullseye - https://phabricator.wikimedia.org/T332604 (BTullis)
[14:54:21] <wikibugs>	 Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (BTullis) We have excluded spark2 and python 3.7 from bullseye builds. The `hadoop-yarn-nodemanager` service is...
[15:22:34] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream, SRE-OnFire, serviceops: Incident: 2022-12-09 api appserver worker starvation - https://phabricator.wikimedia.org/T324994 (Joe) Removing the sustainability tag as it doesn't seem like there is any related actionable here. @Clement_Goubert if...
[15:25:31] <wikibugs>	 Data-Engineering-Planning, Event-Platform Value Stream, SRE-OnFire, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Uneven CPU throttling of eventgate-analytics under load - https://phabricator.wikimedia.org/T325068 (Volans)
[15:31:21] <wikibugs>	 Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (xcollazo) > Also the anaconda-wmf package isn't available in bullseye As per https://wikitech.wikimedia.org/wiki/Data_Engineering/Syste...
[16:13:49] <wikibugs>	 Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate popularity_score.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329877 (EBernhardson) needs https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/304 to properly pass t...
[17:26:24] <wikibugs>	 Data-Engineering-Planning, Data Pipelines (Sprint 11): Spark Streaming Dumps POC: Backfill metadata table - https://phabricator.wikimedia.org/T323642 (JArguello-WMF)
[17:40:35] <wikibugs>	 Data-Engineering, Product-Analytics: Add log_search to monthly sqoop list - https://phabricator.wikimedia.org/T332621 (nettrom_WMF)
[17:43:23] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:47:13] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:03:04] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: produce_canary_events.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:09:42] <wikibugs>	 Data-Engineering, Data Pipelines (sprint 10): Differential privacy airflow-dags merge request - https://phabricator.wikimedia.org/T330234 (JArguello-WMF) Open→Resolved
[18:15:34] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:22:02] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:40:28] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1132 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:03:58] <wikibugs>	 Data-Engineering-Planning, Data Pipelines: Review Superset permissions and assign roles as appropriate - https://phabricator.wikimedia.org/T328457 (SNowick_WMF) Hi @BTullis can you please add @JTannerWMF (jtanner) to the `sql_lab` role so that she can access queries and dashboards? Thank you.