[00:11:52] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Draft a project plan for the Hadoop version 3 upgrade - https://phabricator.wikimedia.org/T379748#10719796 (BTullis) I have finished a first draft of: [[https://docs.google.com/document/d/1P7-6UiSjURgudvum62qwQzRQ-7REUGl3OeNfRJxorSU/edit|Hado...
[00:16:09] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python: Wmfdata: Spark session creation fails when there is a zombie session - https://phabricator.wikimedia.org/T367998#10719801 (nshahquinn-wmf)
[07:31:58] Data-Engineering (Q4 2025 April 1st - June 30th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10720343 (brouberol) How did it go?
[10:11:26] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720752 (FCeratto-WMF) Completed sections: s6
[10:12:21] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720760 (Ladsgroup)
[10:27:09] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720829 (Ladsgroup)
[10:27:30] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720830 (FCeratto-WMF) Started section s8
[10:30:43] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720837 (FCeratto-WMF)
[10:55:21] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Draft a project plan for the Hadoop version 3 upgrade - https://phabricator.wikimedia.org/T379748#10720879 (BTullis) Open→Resolved I'll resolve this now, since the draft plan has been written and has been shared for review.
[11:05:05] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720901 (FCeratto-WMF)
[11:09:33] Data-Engineering, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10720922 (FCeratto-WMF)
[13:18:33] Data-Engineering, Data-Platform-SRE, Sustainability (Incident Followup): airflow: Consider restricting the rights for airflow deployers to destroy postgresql clusters - https://phabricator.wikimedia.org/T391348 (BTullis) NEW
[13:46:32] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Spike: Figure how best to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T366544#10721545 (Milimetric) Good work. I wonder if flipping the problem around and writing the latest PK in some faste...
[14:03:35] Data-Engineering, MediaWiki-DomainEvents, Event-Platform: EventBus: replace PageSaveCompleteHook with PageRevisionUpdateListener - https://phabricator.wikimedia.org/T390970#10721663 (gmodena)
[14:03:37] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Event-Platform: Port EventBus PageChangeHooks to Domain Events - https://phabricator.wikimedia.org/T390969#10721665 (gmodena)
[14:06:30] Data-Engineering, Observability-Tracing, Event-Platform: EventGate: Enable OpenTelemetry Propagation - https://phabricator.wikimedia.org/T391353 (Ottomata) NEW
[14:07:25] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Event-Platform: EventBus: replace PageSaveCompleteHook with PageRevisionUpdateListener - https://phabricator.wikimedia.org/T390970#10721740 (gmodena)
[14:08:44] Data-Engineering, Data-Platform-SRE, Sustainability (Incident Followup): airflow: Consider restricting the rights for airflow deployers to destroy postgresql clusters - https://phabricator.wikimedia.org/T391348#10721751 (Gehel) p:Triage→High
[14:10:48] Data-Engineering, Observability-Tracing, Event-Platform: EventGate: Enable OpenTelemetry Propagation - https://phabricator.wikimedia.org/T391353#10721760 (Ottomata)
[14:15:21] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11), Sustainability (Incident Followup): airflow: Consider restricting the rights for airflow deployers to destroy postgresql clusters - https://phabricator.wikimedia.org/T391348#10721794 (Gehel)
[14:17:37] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Event-Platform: Port EventBus PageChangeHooks to Domain Events - https://phabricator.wikimedia.org/T390969#10721802 (gmodena) →Duplicate dup:T391254
[14:18:46] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation: Decomission dumps job `download_enterprise_htmldumps` - https://phabricator.wikimedia.org/T390556#10721807 (xcollazo) Looks like we need to (cleanly) revert https://github.com/wikimedia/operations-puppet/commit/ced0ecef68b3d72ecedeb183701ed3d...
[14:19:24] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Event-Platform: EventBus: replace PageSaveCompleteHook with PageRevisionUpdateListener - https://phabricator.wikimedia.org/T390970#10721809 (gmodena) Things to keep in mind (input from @daniel ) re parent task: - We don't ne...
[14:19:46] Data-Engineering (Q4 2025 April 1st - June 30th), MediaWiki-DomainEvents, Epic, Event-Platform, and 2 others: Hypothesis 5.2.13: EventBus Adoption of Domain Events - https://phabricator.wikimedia.org/T391254#10721813 (gmodena)
[14:30:13] Data-Engineering, observability, Event-Platform: Data Platform, SRE Observability, overlaps, use cases, and potential - https://phabricator.wikimedia.org/T390323#10721922 (akosiaris)
[14:31:26] Data-Engineering, observability, Event-Platform: Data Platform, SRE Observability, overlaps, use cases, and potential - https://phabricator.wikimedia.org/T390323#10721931 (Ottomata)
[14:35:21] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Spike: Figure how best to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T366544#10721939 (xcollazo)
[14:46:55] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10721989 (brouberol) This requires an airflow user to be created on the destination airf...
[14:51:42] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10722006 (brouberol) We're requesting state data about the `mw_content_merge_events_to_m...
[14:52:45] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10722007 (brouberol) That [worked](https://airflow-platform-eng.wikimedia.org/dags/test_...
[14:57:59] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10722042 (brouberol) {F59014994} all done!
[15:09:25] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: EPIC: Trino/minIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation - https://phabricator.wikimedia.org/T377362#10722147 (Jgreen) a:Jgreen→None Epic task, doesn't make sense for me...
[15:09:54] Data-Engineering: Migrate and re-deploy eventgate using new service-utils - https://phabricator.wikimedia.org/T361768#10722154 (Ahoelzl)
[15:12:14] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11), Sustainability (Incident Followup): airflow: Consider restricting the rights for airflow deployers to destroy postgresql clusters - https://phabricator.wikimedia.org/T391348#10722174 (BTullis) a:BTullis→None
[15:16:05] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10722200 (Gehel) @Ahoelzl : could you validate if those tables should or should not be exposed? Is redaction needed?
[15:21:02] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10722274 (Ahoelzl) @Bugreporter thanks for filing. Can you elaborate on the use cases? And also on priority?
[15:29:16] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10722311 (mforns) I tested the new commons-impact-analytics code and the new cassandra/data-gateway setup, both...
[15:32:01] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Spike: Figure how best to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T366544#10722329 (xcollazo) >>! In T366544#10721545, @Milimetric wrote: > Good work. I wonder if flipping the problem ar...
[15:42:55] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10722407 (mforns) During the testing of the Cassandra/DataGateway/AQS late sorting changes, an unrelated bug ca...
[15:46:36] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10722416 (xcollazo) Oh, wow, thanks @brouberol ! I am confused though: does this mean t...
[16:30:40] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10722657 (EChukwukere-WMF) >>! In T370470#10722311, @mforns wrote: > I tested the new commons-impact-analytics...
[16:55:18] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10722718 (mforns) > This matches with the results from the test scripts we wrote. Looks very good to me! Ship i...
[16:59:22] (PS1) Mforns: Fix commons impact metrics all-wikis cassandra loading queries [analytics/refinery] - https://gerrit.wikimedia.org/r/1135077 (https://phabricator.wikimedia.org/T370470)
[17:03:29] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10722731 (EChukwukere-WMF) @mforns yes these results still match what is expected. The test scripts we wrote sh...
[17:05:45] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: EPIC: Trino/minIO/Hive-Standalone-Metaserver/Dagster/Metabase/Superset Implementation - https://phabricator.wikimedia.org/T377362#10722737 (IAckerman-WMF) a:greg
[17:43:29] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Dumps-Generation, Patch-For-Review: Decomission dumps job `download_enterprise_htmldumps` - https://phabricator.wikimedia.org/T390556#10722885 (Ahoelzl)
[17:43:53] Data-Engineering (Q4 2025 April 1st - June 30th), Observability-Tracing, Event-Platform: EventGate: Enable OpenTelemetry Propagation - https://phabricator.wikimedia.org/T391353#10722891 (Ahoelzl)
[17:47:01] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10722897 (Ahoelzl)
[18:28:26] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Dumps-Generation, Patch-For-Review: Decomission dumps job `download_enterprise_htmldumps` - https://phabricator.wikimedia.org/T390556#10723175 (Ahoelzl) a:amastilovic
[18:39:17] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.03.22 - 2025.04.11): Canary failure on airflow platform_eng intsance after migrating to Kubernetes - https://phabricator.wikimedia.org/T390727#10723216 (brouberol) Indeed! I had a reminder to create the whole 9 x 9 combination of u...
[18:57:09] Data-Engineering (Q4 2025 April 1st - June 30th): Update webrequest and unique devices pipelines to derive access_method without m-dot domain - https://phabricator.wikimedia.org/T389696#10723278 (mforns) Hi @Jdlrobson! > For example access_method="mobile web" and skin=vector-2022 is a very different experi...
[20:18:02] Data-Engineering (Q4 2025 April 1st - June 30th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10723523 (tchin) Ok so that wasn't it, now we get this error: ` 25/04/08 19:42:35 ERROR AsyncEventQueue: Listener DatahubSparkListener threw an exception datahub.s...
[20:31:49] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Figure root cause of silent failures when computing metrics for mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T387033#10723634 (tchin) >>! In T387033#10700469, @xcollazo wrote: > Reopening as we had anothe...