[00:34:06] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), and 4 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Niharika) \o/
[01:15:07] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:27:49] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: monitor_refine_event.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:03:39] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host an-test-worker1001...
[07:34:16] (PS11) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[07:36:57] (PS12) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[07:39:53] (PS13) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[07:46:39] (PS14) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[07:47:42] (CR) Aqu: "Thanks for the comments DCausse. I've updated the code. Waiting for the answers from the CI now." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072) (owner: Aqu)
[08:00:58] PROBLEM - puppet last run on an-presto1006 is CRITICAL: CRITICAL: Puppet last ran 1 day ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:06:06] RECOVERY - puppet last run on an-presto1006 is OK: OK: Puppet is currently disabled (Create presto cluster for perf testing - T329525 - nfraison), not alerting. Last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:09:07] FI, puppet alerts on an-presto are linked to the presto test we are doing to identify root cause of issues when adding 10 nodes to the prod cluster
[09:15:25] (PS15) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[09:21:56] (CR) Joal: [C: +1] "Comments on comments :) LGTM" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072) (owner: Aqu)
[09:32:20] (PS16) Aqu: Remove Guava from dependency [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072)
[09:33:10] (CR) Aqu: "Thanks Joal." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072) (owner: Aqu)
[09:52:08] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (BTullis) >>! In T329363#8615911, @Halfak wrote: > Hi @BTullis! In the recent past (last 2 years), a lot of ORES...
[10:02:29] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (elukey) >>! In T329363#8617588, @BTullis wrote: >>>! In T329363#8615911, @Halfak wrote: >> Hi @BTullis! In the r...
[11:02:27] Data-Engineering-Planning, Data Pipelines, Product-Analytics: 13 new wikis missing from mediawiki_history - https://phabricator.wikimedia.org/T329119 (EChetty)
[11:04:02] Data-Engineering-Planning, Data Pipelines, Product-Analytics: 13 new wikis missing from mediawiki_history - https://phabricator.wikimedia.org/T329119 (EChetty) p:Triage→High
[11:23:07] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Create airflow v2 instance and supporting repos for search platform - https://phabricator.wikimedia.org/T327970 (ops-monitoring-bot) Cookbook cookbooks.sre.ganeti.reimage started by bking@cumin1001 for host an-airflow1005.eqi...
[11:28:31] Data-Engineering-Planning, Patch-For-Review, Shared-Data-Infrastructure (Shared-Data-Infra Sprint 08): Upgrade Hadoop test cluster to Bullseye - https://phabricator.wikimedia.org/T329363 (BTullis) Great, thanks for that @elukey. That leads me neatly to my next issue of [[https://github.com/wikimedia...
[11:46:31] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): migrate mjolnir application and dag to airflow v2 and spark3 - https://phabricator.wikimedia.org/T329239 (BTullis) Hello all. Am I right in my thinking that: 1) when this task is finished you won't have any further need for...
[12:01:06] (CR) Joal: [C: +1] "LGTM! thanks milimetric :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/887869 (https://phabricator.wikimedia.org/T324482) (owner: Milimetric)
[12:22:15] (PS1) Milimetric: Migrate pageview dumps jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T324482)
[12:29:57] (CR) Joal: "Minor nits" [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T324482) (owner: Milimetric)
[13:52:24] Data-Engineering-Planning, Event-Platform Value Stream (Sprint 08), Patch-For-Review: Streaming services errors should be routed to an error event topic. - https://phabricator.wikimedia.org/T326536 (Ottomata) I asked the mailing list about this need, and someone quickly confirmed it is a [[ https://i...
[14:05:11] Data-Engineering-Planning, Data Pipelines (Sprint 08), Patch-For-Review, SecTeam-Processed, Vuln-VulnComponent: Upgrade Puppet code to make Airflow configuration files compatible with version 2.5.0 - https://phabricator.wikimedia.org/T315580 (BTullis)
[15:00:37] (PS2) Milimetric: Migrate pageview dumps jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T329646)
[15:01:15] (CR) Milimetric: Migrate pageview dumps jobs (7 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T329646) (owner: Milimetric)
[15:25:51] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/887371 (https://phabricator.wikimedia.org/T327074) (owner: Snwachukwu)
[16:12:59] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work): Create airflow v2 instance and supporting repos for search platform - https://phabricator.wikimedia.org/T327970 (ops-monitoring-bot) Cookbook cookbooks.sre.ganeti.reimage was started by bking@cumin1001 for host an-airflow1005...
[16:59:46] (PS3) Milimetric: Migrate pageview dumps jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T329646)
[17:13:29] Data-Engineering-Planning, Data Pipelines, Discovery-Search (Current work), Patch-For-Review: Create airflow v2 instance and supporting repos for search platform - https://phabricator.wikimedia.org/T327970 (BTullis) Hello, In preparation for the Airflow 2.5 upgrade I have created a CR to create...
[17:58:19] (CR) DCausse: Remove Guava from dependency (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/883118 (https://phabricator.wikimedia.org/T327072) (owner: Aqu)
[19:21:30] (PS4) Milimetric: Migrate pageview dumps jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/889532 (https://phabricator.wikimedia.org/T329646)
[20:13:19] PROBLEM - Checks that the local airflow scheduler for airflow @search is working properly on an-airflow1005 is CRITICAL: CRITICAL: /usr/bin/env AIRFLOW_HOME=/srv/airflow-search /usr/lib/airflow/bin/airflow jobs check --job-type SchedulerJob --hostname an-airflow1005.eqiad.wmnet did not succeed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[20:14:01] PROBLEM - Checks that the airflow database for airflow search is working properly on an-airflow1005 is CRITICAL: CRITICAL: /usr/bin/env AIRFLOW_HOME=/srv/airflow-search /usr/lib/airflow/bin/airflow db check did not succeed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow
[22:05:31] Data-Engineering, SRE: Add backend field to webrequest Hive table - https://phabricator.wikimedia.org/T257354 (BCornwall)
[22:06:24] Data-Engineering, SRE: Add backend field to webrequest Hive table - https://phabricator.wikimedia.org/T257354 (BCornwall) Untagging traffic as it seems like there's not much we can do here. Please feel free to retag us if that changes!