[00:45:20] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:58:40] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:08:12] PROBLEM - Check systemd state on an-worker1132 is CRITICAL: CRITICAL - degraded: The following units failed: export_smart_data_dump.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:40:36] RECOVERY - Check systemd state on an-worker1132 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:10:37] (PS1) Phedenskog: navtiming: Add new metrics to allowlist for the navtiming schema. [analytics/refinery] - https://gerrit.wikimedia.org/r/902571
[07:11:45] (PS2) Phedenskog: navtiming: Add new metrics to allowlist for the navtiming schema. [analytics/refinery] - https://gerrit.wikimedia.org/r/902571
[08:33:12] Data-Engineering, Event-Platform Value Stream (Sprint 10): [SPIKE] tune memory and latency of mediawiki-event-enrichment on k8s - https://phabricator.wikimedia.org/T332166 (gmodena) After a few long lasting runs on YARN and k8s all I can say is I see correlation (not necessarily causation!) with OOMs an...
[08:34:02] Data-Engineering, Event-Platform Value Stream (Sprint 10): [NEEDS GROOMING] eventutilities-python: issue async requests from MapFunction context - https://phabricator.wikimedia.org/T332948 (gmodena)
[08:34:14] Data-Engineering, Event-Platform Value Stream (Sprint 10): [NEEDS GROOMING] eventutilities-python: issue async requests from MapFunction context - https://phabricator.wikimedia.org/T332948 (gmodena) a:gmodena
[08:39:28] Data-Engineering, Event-Platform Value Stream (Sprint 10): eventutilities-python: issue async requests from MapFunction context - https://phabricator.wikimedia.org/T332948 (gmodena)
[08:51:23] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate rdf_streaming_updater_reconcile.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329879 (Gehel) Open→Resolved
[08:51:25] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:27] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate import_cirrus_indexes.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329873 (Gehel) Open→Resolved
[08:51:29] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:33] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:36] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate drop_old_data_daily.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329870 (Gehel) Open→Resolved
[08:51:38] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate incoming_links.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329875 (Gehel) Open→Resolved
[08:51:40] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:43] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:50] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:51:56] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate popularity_score.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329877 (Gehel) Open→Resolved
[08:51:58] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:01] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate import_ttl.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329874 (Gehel) Open→Resolved
[08:52:03] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:05] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:08] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): migrate mjolnir application and dag to airflow v2 and spark3 - https://phabricator.wikimedia.org/T329239 (Gehel) Open→Resolved
[08:52:10] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:13] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate export_queries_to_relforge.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329871 (Gehel) Open→Resolved
[08:52:17] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:20] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:25] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate ores_predictions.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329876 (Gehel) Open→Resolved
[08:52:27] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:31] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): [Tracking] Migrate Search Airflow jobs to Airflow 2 and use shared supporting code from the data engineering Airflow - https://phabricator.wikimedia.org/T318414 (Gehel)
[08:52:33] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Migrate query_clicks.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329878 (Gehel) Open→Resolved
[11:30:30] Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Upgrade the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (BTullis)
[11:30:32] Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Upgrade an-test-druid1001 to bullseye - https://phabricator.wikimedia.org/T332584 (BTullis) Open→Resolved a:BTullis
[12:00:48] (PS1) Lgaulia: Add first input delay schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/902693 (https://phabricator.wikimedia.org/T332012)
[12:45:49] Data-Engineering-Planning, Cloud-Services: Review and fix any bugs found in the automated bootstrap process for a ceph mon/mgr server - https://phabricator.wikimedia.org/T332987 (BTullis)
[12:56:57] Data-Engineering-Planning, cloud-services-team: Review and fix any bugs found in the automated bootstrap process for a ceph mon/mgr server - https://phabricator.wikimedia.org/T332987 (taavi)
[13:39:29] (PS4) Nmaphophe: GDI Equity Landscape Tables [analytics/refinery] - https://gerrit.wikimedia.org/r/895737
[13:41:08] (CR) Nmaphophe: GDI Equity Landscape Tables (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/895737 (owner: Nmaphophe)
[13:46:10] (CR) Joal: "Still two nits - hopefully ready by next commit" [analytics/refinery] - https://gerrit.wikimedia.org/r/902068 (owner: Jennifer Ebe)
[13:56:32] (CR) Phedenskog: Add first input delay schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/902693 (https://phabricator.wikimedia.org/T332012) (owner: Lgaulia)
[14:11:46] (PS4) Snwachukwu: Copy add_partition hql script from Oozie to Hql folder. [analytics/refinery] - https://gerrit.wikimedia.org/r/900389 (https://phabricator.wikimedia.org/T330200)
[14:23:36] Data-Engineering-Planning, Data Pipelines, Epic: Support for Product Analytics Data Pipelines Migration to Airflow - https://phabricator.wikimedia.org/T332997 (lbowmaker)
[14:43:13] !log merged alertmanager rules for eventlogging checks being migrated from Icinga T309007
[14:43:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:43:16] T309007: Migrate eventlogging check_prometheus checks to alertmanager - https://phabricator.wikimedia.org/T309007
[14:44:03] heads up here folks. new checks are just the same rules moved over from Icinga, I'll keep an eye on them over next few hours
[14:44:59] topranks: Many thanks indeed for making the changes.
[14:47:10] Data-Engineering-Planning, Data Pipelines: Setup config to allow lineage instrumentation - https://phabricator.wikimedia.org/T333004 (lbowmaker)
[14:52:43] Data-Engineering-Planning, Data Pipelines: [NEEDS GROOMING] Support migration of simple (Hive > Hive) jobs - https://phabricator.wikimedia.org/T333006 (lbowmaker)
[15:32:37] Analytics-Radar, Data-Engineering-Icebox, Machine-Learning-Team, Patch-For-Review: Upgrade ROCm to 4.5 - https://phabricator.wikimedia.org/T295661 (elukey) Hi @fkaelin, we'll definitely try to upgrade during the next quarter to the latest ROCm release :)
[15:44:28] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate mediawiki_revision_recommendation_create.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T330447 (pfischer) a:pfischer
[15:44:58] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate mediawiki_revision_recommendation_create.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T330447 (pfischer) a:pfischer→EBernhardson
[15:45:12] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Migrate search_satisfaction.py from airflow 1 to airflow 2 - https://phabricator.wikimedia.org/T329880 (pfischer) a:pfischer→EBernhardson
[16:11:29] (CR) Snwachukwu: [C: +2] Copy add_partition hql script from Oozie to Hql folder. [analytics/refinery] - https://gerrit.wikimedia.org/r/900389 (https://phabricator.wikimedia.org/T330200) (owner: Snwachukwu)
[16:13:56] (CR) Snwachukwu: [V: +2 C: +2] Copy add_partition hql script from Oozie to Hql folder. [analytics/refinery] - https://gerrit.wikimedia.org/r/900389 (https://phabricator.wikimedia.org/T330200) (owner: Snwachukwu)
[16:15:39] (CR) Snwachukwu: [V: +2 C: +2] Copy add_partition hql script from Oozie to Hql folder. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/900389 (https://phabricator.wikimedia.org/T330200) (owner: Snwachukwu)
[17:18:00] Data-Engineering-Planning, Shared-Data-Infrastructure, Epic: Install Ceph Cluster for Data Engineering - https://phabricator.wikimedia.org/T324660 (BTullis)
[17:27:54] Data-Engineering-Planning, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 10): Deploy ceph mon and mgr processes to data-engineering cluster - https://phabricator.wikimedia.org/T330149 (BTullis) These `mon` and `mgr` daemons are now deployed to the cluster. ` btullis@cephosd1001:~$ sudo ceph -s c...