[00:19:29] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikilambda_api; backfill missin... - https://phabricator.wikimedia.org/T384876#10566125
[00:20:36] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikilambda_api; backfill missin... - https://phabricator.wikimedia.org/T384876#10566130
[00:20:57] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikifunctions_ui; backfill miss... - https://phabricator.wikimedia.org/T384531#10566131
[00:21:43] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikifunctions_ui; backfill miss... - https://phabricator.wikimedia.org/T384531#10566133
[00:24:56] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikilambda_api; backfill missin... - https://phabricator.wikimedia.org/T384876#10566146
[00:25:09] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: WikiLambda metrics: Repair the allowlist for mediawiki_product_metrics_wikifunctions_ui; backfill miss... - https://phabricator.wikimedia.org/T384531#10566147
[04:03:30] Data-Engineering, Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10566472 (EChukwukere-WMF) @mforns This is ready for testing correct? I might be pinging for questions , if and when I have any
[05:12:42] Data-Engineering, Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10566541 (EChukwukere-WMF) Also was this fix pushed to the QA environment we use for testing ?
[08:42:22] Data-Engineering, Trust and Safety Product Team, Patch-For-Review, Product-Analytics (Kanban): Add mediawiki_product_metrics_incident_reporting_system_interaction to the sanitization allowlist - https://phabricator.wikimedia.org/T384650#10566825 (cchen) @mforns - I've added you as a reviewer on t...
[12:50:12] Analytics, Data-Engineering, Data-Engineering-Icebox: Count the number of video plays - https://phabricator.wikimedia.org/T198628#10567440 (Bugreporter)
[12:53:45] Analytics, Data-Engineering, Data-Engineering-Icebox: Count the number of video plays - https://phabricator.wikimedia.org/T198628#10567446 (AndrewTavis_WMDE) Copying over the task description from the duplicate {T386916} from @jan-david.franke_WMDE: **Feature summary** (what you would like to be abl...
[13:07:05] Analytics, Data-Engineering, Data-Engineering-Icebox: Count the number of video plays - https://phabricator.wikimedia.org/T198628#10567510 (AndrewTavis_WMDE) I'd like to note one part of the above: > Through internal discussions at WMDE we believe that this metric could be derived from the base webr...
[13:53:09] Data-Engineering, Research: Research airflow instance - https://phabricator.wikimedia.org/T386933 (fkaelin) NEW
[14:35:25] Data-Engineering, Research: Research airflow instance - https://phabricator.wikimedia.org/T386933#10567834 (Ottomata) > we use a conda_env variable if we want to configure a custom env to use - e.g. by setting it to a gitlab url. On the k8s research instance this does not seem to work - connection timeou...
[14:45:05] Data-Engineering (Q3 2024 January 1st - March 31th), Data Pipelines, Observability-Metrics, SRE Observability (FY2024/2025-Q3), Technical-Debt: migrate Data Platform Engineering maintained metrics from graphite to prometheus - https://phabricator.wikimedia.org/T372855#10567883 (Ahoelzl) a:...
[14:45:43] Data-Engineering (Q3 2024 January 1st - March 31th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10567886 (Ahoelzl)
[14:46:29] Data-Engineering (Q3 2024 January 1st - March 31th): Integrate Spark with DataHub with lineage (non Data-Engineering Airflow instances) - https://phabricator.wikimedia.org/T386724#10567888 (Ahoelzl) →Duplicate dup:T386862
[14:46:30] Data-Engineering (Q3 2024 January 1st - March 31th): Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10567890 (Ahoelzl)
[14:52:47] Data-Engineering: Migrate and re-deploy eventgate using new service-utils - https://phabricator.wikimedia.org/T361768#10567916 (Ahoelzl)
[14:58:03] Data-Engineering (Q3 2024 January 1st - March 31th): Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430#10567937 (Ahoelzl) Moving this back to backlog.
[14:58:19] Data-Engineering: Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430#10567938 (Ahoelzl)
[15:54:55] Data-Engineering, Research: Research airflow instance - https://phabricator.wikimedia.org/T386933#10568204 (fkaelin) Yes by [[ https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/research/dags/article_quality_dag.py#L20 | default ]] the artifactory is used, the gitlab env is only...
[16:44:50] Data-Engineering (Q3 2024 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10568434 (xcollazo) (A fix for T384625 has been...
[17:24:31] Data-Engineering (Q3 2024 January 1st - March 31th): Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10568571 (amastilovic) > Regarding the migration options, I like that option 1 is much more risk-free. Agreed. @brouberol @Ahoelzl and @Ottomata agree to...
[17:28:14] Data-Engineering (Q3 2024 January 1st - March 31th), Data Pipelines, Observability-Metrics, SRE Observability (FY2024/2025-Q3), Technical-Debt: Disable Data Platform Engineering generated graphite metrics and dashboards - https://phabricator.wikimedia.org/T372855#10568582 (Ottomata)
[17:41:10] Data-Engineering (Q3 2024 January 1st - March 31th), Event-Platform: Gobblin-wmf Gitlab migration and maintenance - https://phabricator.wikimedia.org/T370368#10568634 (Ahoelzl) p:Triage→Medium a:amastilovic
[17:43:41] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Validate `pageview` and `unique_devices` generated from `webrequest_frontend` - https://phabricator.wikimedia.org/T386343#10568662 (Ahoelzl) Open→In progress p:Triage→High
[18:00:01] Data-Engineering (Q3 2024 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10568740 (Ahoelzl) Related https://phabricator....
[18:07:14] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] HAProxy and VarnishKafka should produce compatible datasets - https://phabricator.wikimedia.org/T382571#10568764 (Ahoelzl)
[18:13:55] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Data-Platform-related-Mediawiki-Content-data, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), and 2 others: DAG failing due to failure to ac... - https://phabricator.wikimedia.org/T386114#10568801
[18:26:52] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Data-Platform-related-Mediawiki-Content-data, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), and 2 others: DAG failing due to failure to ac... - https://phabricator.wikimedia.org/T386114#10568865
[18:28:24] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Fix HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354#10568867 (Ahoelzl) p:Triage→High
[18:30:21] Data-Engineering (Q3 2024 January 1st - March 31th), Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list January 2025 - https://phabricator.wikimedia.org/T384259#10568876 (Ahoelzl) p:Triage→Medium
[18:50:03] Data-Engineering (Q3 2024 January 1st - March 31th): Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10568938 (Ahoelzl) Migration tracker: https://docs.google.com/spreadsheets/d/1SKlAv-oTKBhGM1duW7j8j_PFoRUQZozjQCOBhWySITE/edit?gid=1954854246#gid=195485424...
[19:08:54] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content, Dumps 2.0 (Kanban Board): Implement alerting for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T384962#10568984 (Ahoelzl)
[19:09:01] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content: Implement alerting for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T384962#10568985 (Ahoelzl)
[19:10:52] Data-Engineering, Data Pipelines, DPE-Mediawiki-Content, Dumps-Generation, and 3 others: MediaWiki Dumps XML - Provide attribute to indicate that user is temporary account in exported content - https://phabricator.wikimedia.org/T365693#10568994 (Ahoelzl)
[19:11:03] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content, Patch-For-Review: Optimize XML Dump code to be able to handle wikis from simplewiki to enwiki - https://phabricator.wikimedia.org/T381016#10568995 (Ahoelzl)
[19:12:24] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content, Patch-For-Review: Modify code to dump all slots - https://phabricator.wikimedia.org/T384945#10569004 (Ahoelzl)
[19:12:31] Data-Engineering, DPE-Mediawiki-Content: Modify wmf_content.mediawiki_content_history_v1 to include slot origin - https://phabricator.wikimedia.org/T386211#10569006 (Ahoelzl)
[19:12:50] Data-Engineering, DPE-Mediawiki-Content: Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112#10569011 (Ahoelzl)
[19:13:02] Data-Engineering, DPE-Mediawiki-Content: Add table maintenance for wmf_data_ops.data_quality_metrics - https://phabricator.wikimedia.org/T384744#10569012 (Ahoelzl)
[19:13:23] Data-Engineering, DPE-Mediawiki-Content: Investigate Flink app errors - https://phabricator.wikimedia.org/T384724#10569016 (Ahoelzl)
[19:15:04] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2025.02.10 - 2025.02.28): Consider writing Spark files to Ceph (S3) instead of Hadoop - https://phabricator.wikimedia.org/T384500#10569020 (Ahoelzl)
[19:15:14] Data-Engineering, DPE-Mediawiki-Content, Dumps 2.0: Airflow job to do monthly XML dumps - https://phabricator.wikimedia.org/T384381#10569021 (Ahoelzl)
[19:15:20] Data-Engineering, DPE-Mediawiki-Content: Airflow job to do monthly XML dumps - https://phabricator.wikimedia.org/T384381#10569022 (Ahoelzl)
[19:15:36] Data-Engineering, DPE-Mediawiki-Content: Stop using spark.jars.packages - https://phabricator.wikimedia.org/T375298#10569023 (Ahoelzl)
[19:15:57] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2025.02.10 - 2025.02.28), Patch-For-Review: Upgrade Spark to a version with long term Iceberg support, and with fixes to support Dumps 2.0 - https://phabricator.wikimedia.org/T338057#10569024 (Ahoelzl)
[19:16:04] Data-Engineering, DPE-Mediawiki-Content: Put together a DPE Deep Dive session on learnings from Dumps 2 XML generation code - https://phabricator.wikimedia.org/T384392#10569025 (Ahoelzl)
[19:16:20] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content, Patch-For-Review: Refactor code to use new table and column names - https://phabricator.wikimedia.org/T384385#10569026 (Ahoelzl)
[19:16:38] Data-Engineering (Q3 2024 January 1st - March 31th), DPE-Mediawiki-Content, Patch-For-Review: Modify XML dumping code to be able to do 'partial' dumps - https://phabricator.wikimedia.org/T384383#10569033 (Ahoelzl)
[19:16:49] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Productionization of code to dump in XML - https://phabricator.wikimedia.org/T384382#10569034 (Ahoelzl)
[19:17:19] Data-Engineering, DPE-Mediawiki-Content: Time partitioning for mediawiki_content_history - https://phabricator.wikimedia.org/T380773#10569041 (Ahoelzl)
[19:17:51] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2025.02.10 - 2025.02.28): Test if an existing conda environment with Spark 3.1.2 clients works fine with Spark 3.5.3 - https://phabricator.wikimedia.org/T380417#10569047 (Ahoelzl)
[19:18:07] Data-Engineering, DPE-Mediawiki-Content: Figure out how to send in file:// URIs to wmf-event-stream - https://phabricator.wikimedia.org/T380104#10569049 (Ahoelzl)
[19:18:17] Data-Engineering, DPE-Mediawiki-Content: Figure out how to unit test Iceberg tables - https://phabricator.wikimedia.org/T380101#10569050 (Ahoelzl)
[19:18:38] Data-Engineering, DPE-Mediawiki-Content: Implement a canary against MariaDB schema changes - https://phabricator.wikimedia.org/T379921#10569055 (Ahoelzl)
[19:18:48] Data-Engineering, DPE-Mediawiki-Content: Go over tasks in #dumps-generation and figure what makes sense to fix in Dumps 2.0 - https://phabricator.wikimedia.org/T379410#10569056 (Ahoelzl)
[19:19:18] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Discovery-Search (2025.02.10 - 2025.02.28), Epic, Patch-For-Review: EPIC: Update flink jobs to support Flink 1.20 - https://phabricator.wikimedia.org/T376812#10569057 (Ahoelzl)
[19:19:21] Data-Engineering, DPE-Mediawiki-Content: [Event Platform] We should alert on EventBus performance degradation. - https://phabricator.wikimedia.org/T375197#10569058 (Ahoelzl)
[19:19:35] Data-Engineering, Event-Platform: [NEEDS INVESTIGATION][BUG] eventutilities_python operator metrics - https://phabricator.wikimedia.org/T373112#10569059 (Ahoelzl)
[19:19:46] Data-Engineering, Data-Engineering-Icebox, Data Pipelines, Movement-Insights: Keep canonical_data.wikis updated - https://phabricator.wikimedia.org/T241741#10569060 (Ahoelzl)
[19:20:03] Data-Engineering, DPE-Mediawiki-Content: Consider whether we want to dump private wikis - https://phabricator.wikimedia.org/T371509#10569061 (Ahoelzl)
[19:20:32] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation, SRE, Epic: Dumps generation cause disruption to the production environment - https://phabricator.wikimedia.org/T368098#10569062 (Ahoelzl)
[19:20:38] Data-Engineering, DPE-Mediawiki-Content, Epic: Create a deprecation notice on "other dumps" - https://phabricator.wikimedia.org/T364855#10569063 (Ahoelzl)
[19:21:08] Data-Engineering, DPE-Mediawiki-Content: Use the Spark-Iceberg built in CDC mechanism to PoC a replacement for wikimedia_wikitext_current - https://phabricator.wikimedia.org/T366544#10569064 (Ahoelzl)
[19:21:24] Data-Engineering, CirrusSearch, Discovery-Search, DPE-Mediawiki-Content: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script - https://phabricator.wikimedia.org/T366248#10569065 (Ahoelzl)
[19:21:43] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation, Epic: Outreach to producers of "other dumps" to raise awareness about Dumps 2.0 and options for deprecation or migration - https://phabricator.wikimedia.org/T364856#10569066 (Ahoelzl)
[19:21:53] Data-Engineering, DPE-Mediawiki-Content: [SPIKE] Benchmark the run time of batch processing - https://phabricator.wikimedia.org/T379365#10569078 (Ahoelzl)
[19:22:06] Data-Engineering, DPE-Mediawiki-Content, Epic: Dumps 2.0 Phase III: Production level dumps - https://phabricator.wikimedia.org/T366752#10569079 (Ahoelzl)
[19:22:15] Data-Engineering, DPE-Mediawiki-Content: dumps_publish_wikitext_raw_to_xml DAG fails sporadically. Needs sensors to wait before running. - https://phabricator.wikimedia.org/T363941#10569080 (Ahoelzl)
[19:22:44] Data-Engineering, Data-Engineering-Icebox, cloud-services-team, Data-Services, Datasets-General-or-Unknown: Provide dumps using bittorrent - https://phabricator.wikimedia.org/T29653#10569081 (Ahoelzl)
[19:22:57] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation: SHA-256 digest for wiki dumps - https://phabricator.wikimedia.org/T363184#10569085 (Ahoelzl)
[19:23:13] Data-Engineering, Datasets-General-or-Unknown, DPE-Mediawiki-Content, Tech-Docs-Team, Documentation: Dumps documentation: revise and improve landing pages and navigation - https://phabricator.wikimedia.org/T348037#10569086 (Ahoelzl)
[19:23:34] Data-Engineering, DPE-Mediawiki-Content, Dumps-Generation: Wikipedia be-tarask is dumped as be_x_old - https://phabricator.wikimedia.org/T351785#10569087 (Ahoelzl)
[20:41:52] Data-Engineering, Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10569315 (Eevans) >>! In T370470#10566541, @EChukwukere-WMF wrote: > Also was this fix pushed to the QA environment we use for testing ? By QA...
[21:25:22] Data-Engineering, Research: A dataset sensor should work indepent of airflow instance - https://phabricator.wikimedia.org/T386973 (fkaelin) NEW
[21:27:26] Data-Engineering, Research: Research airflow instance - https://phabricator.wikimedia.org/T386933#10569394 (fkaelin) Open→Resolved a:fkaelin Closing this as resolved for the gitlab part, the external sensor fix is tracked with T386973
[23:04:54] Data-Engineering (Q3 2024 January 1st - March 31th), Abstract Wikipedia team, function-evaluator, function-orchestrator, Patch-For-Review: WF service logging seems to be partially missing - https://phabricator.wikimedia.org/T386972#10569673 (tchin) a:tchin
[23:34:18] Data-Engineering (Q3 2024 January 1st - March 31th), Data-Engineering-Icebox, Data-Engineering-Roadmap, Experimentation Lab (Experiment Platform Sprint 2), Google-Summer-of-Code-2025: Currency Equivalency - https://phabricator.wikimedia.org/T386984 (Cain.micah) NEW