[00:00:34] Data-Engineering: Support experimentat platform backend migration to new service_utils - https://phabricator.wikimedia.org/T361770#10473123 (Ahoelzl)
[00:00:55] Data-Engineering: Support experimentat platform backend migration to new service_utils - https://phabricator.wikimedia.org/T361770#10473124 (Ahoelzl) @tchin still an objective?
[00:02:20] Analytics-Radar, Data-Engineering, Data-Engineering-Radar, MediaWiki-Recent-changes, and 3 others: Remove deprecated RCFeedEngine support - https://phabricator.wikimedia.org/T250628#10473150 (Ahoelzl)
[00:03:03] Data-Engineering, Data-Persistence, MediaWiki-Engineering, Temporary accounts, and 3 others: Define MediaWiki user types - https://phabricator.wikimedia.org/T336176#10473176 (Ahoelzl)
[00:07:48] Data-Engineering, Data-Engineering-Icebox: [Anomaly detection] Create a heatmap view in Superset - https://phabricator.wikimedia.org/T301572#10473252 (Ahoelzl)
[00:08:17] Data-Engineering, Data-Engineering-Icebox, Research-Freezer, Stewards-and-global-tools: Collect information about users affected by blocks - https://phabricator.wikimedia.org/T297051#10473254 (Ahoelzl)
[00:08:19] Data-Engineering, Data-Engineering-Icebox: Wikistats feature: run http server in npm run dev - https://phabricator.wikimedia.org/T296890#10473256 (Ahoelzl)
[00:23:21] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-General, MediaWiki-Platform-Team (Radar), Patch-For-Review: Update Pingback to use the Event Platform - https://phabricator.wikimedia.org/T323828#10473288 (Ahoelzl) Open→Resolved
[00:23:43] Data-Engineering (Q2 2024 October 1st - December 31th): Migrate Event Platform Schema Respositories to Gitlab - https://phabricator.wikimedia.org/T366836#10473290 (Ahoelzl) Open→Resolved
[00:23:51] Data-Engineering (Q2 2024 October 1st - December 31th), Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10473292 (Ahoelzl) Open→Resolved
[00:24:00] Data-Engineering (Q2 2024 October 1st - December 31th): Some Gobblin folders don't have `_IMPORTED` flags - https://phabricator.wikimedia.org/T376144#10473293 (Ahoelzl) Open→Resolved
[00:24:13] Data-Engineering (Q2 2024 October 1st - December 31th): load haproxykafka topics into HDFS via gobblin - https://phabricator.wikimedia.org/T377931#10473294 (Ahoelzl) Open→Resolved
[00:24:29] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: [HAProxy transition] Deploy a staging airflow dag for webrequest refinement - https://phabricator.wikimedia.org/T378342#10473296 (Ahoelzl) Open→Resolved
[00:24:33] Data-Engineering (Q2 2024 October 1st - December 31th): Implement a data retention policy for webrequest_frontend datasets - https://phabricator.wikimedia.org/T379024#10473298 (Ahoelzl) Open→Resolved
[00:24:44] Data-Engineering (Q2 2024 October 1st - December 31th): Fix `hdfs_usage` data size columns - https://phabricator.wikimedia.org/T381746#10473300 (Ahoelzl) Open→Resolved
[00:25:01] Data-Engineering (Q2 2024 October 1st - December 31th), Observability-Tracing: service-utils helper for trace header propagation - https://phabricator.wikimedia.org/T371120#10473301 (Ahoelzl) Open→Resolved
[00:25:42] Data-Engineering (Q2 2024 October 1st - December 31th): Implement automated deployment of refinery HQL files to HDFS (via blunderbuss) - https://phabricator.wikimedia.org/T365659#10473305 (Ahoelzl) In progress→Resolved
[00:25:47] Data-Engineering (Q2 2024 October 1st - December 31th): Gitlab CI/CD Component for Blunderbuss - https://phabricator.wikimedia.org/T382348#10473307 (Ahoelzl) Open→Resolved
[00:26:03] Data-Engineering (Q2 2024 October 1st - December 31th), MediaWiki-Platform-Team (Radar): Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate - https://phabricator.wikimedia.org/T353817#10473308 (Ahoelzl) Open→Resolved
[00:26:24] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, SRE: Streamline Data Platform access approvals for WMF staff - https://phabricator.wikimedia.org/T370424#10473310 (Ahoelzl) Open→Resolved
[00:26:26] Data-Engineering (Q2 2024 October 1st - December 31th): Airflow should alert on task failure only after exhausting retries - https://phabricator.wikimedia.org/T377745#10473311 (Ahoelzl) Open→Resolved
[00:26:39] Data-Engineering (Q2 2024 October 1st - December 31th), Patch-For-Review: [Refine refactoring] Refine jobs should be scheduled by Airflow: implementation - https://phabricator.wikimedia.org/T356762#10473312 (Ahoelzl) Open→Resolved
[00:27:59] Data-Engineering: Set up Alerting for Data Quality dags in Airflow. - https://phabricator.wikimedia.org/T377333#10473314 (Ahoelzl)
[00:28:00] Data-Engineering: [Refine Refactoring] Refine Data Quality - late events, RefineMonitor refactor, etc. - https://phabricator.wikimedia.org/T377739#10473316 (Ahoelzl)
[00:28:01] Data-Engineering, Dumps 2.0: [Event Platform] We should alert on EventBus performance degradation. - https://phabricator.wikimedia.org/T375197#10473318 (Ahoelzl)
[00:28:03] Data-Engineering: [Refine Refactoring] Detect inactive event streams / Refine datasets using data recency thresholds - https://phabricator.wikimedia.org/T361498#10473322 (Ahoelzl)
[00:28:04] Data-Engineering: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables - https://phabricator.wikimedia.org/T356866#10473324 (Ahoelzl)
[00:28:06] Data-Engineering, CheckUser, DBA, Trust and Safety Product Team, Schema-change-in-production: Remove cuc_actiontext, cuc_only_for_read_old, and cuc_private from cu_changes on WMF wikis - https://phabricator.wikimedia.org/T370903#10473320 (Ahoelzl)
[00:28:09] Data-Engineering, Epic: [Epic] Migrate Data Engineering maintained NodeJS repositories to GitLab - https://phabricator.wikimedia.org/T366614#10473328 (Ahoelzl)
[00:28:13] Data-Engineering: Airflow RestExternalTaskSensor should be able to sense named dynamic mapped tasks - https://phabricator.wikimedia.org/T372644#10473330 (Ahoelzl)
[00:28:17] Data-Engineering, Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization, and 2 others: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#10473326 (Ahoelzl)
[00:30:41] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Calculate rough HDFS storage requirements for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T383816#10473333 (Ahoelzl)
[00:30:42] Data-Engineering (Q3 2024 January 1st - March 31th): Identify Internal Users of MediaWiki Wikitext Tables - https://phabricator.wikimedia.org/T383743#10473335 (Ahoelzl)
[00:30:43] Data-Engineering (Q3 2024 January 1st - March 31th), Movement-Insights, Patch-For-Review: Test airflow cascading-reruns for projectview-hourly dependent jobs - https://phabricator.wikimedia.org/T383804#10473334 (Ahoelzl)
[00:30:44] Data-Engineering (Q3 2024 January 1st - March 31th): Migrate and re-deploy eventgate using new service runner - https://phabricator.wikimedia.org/T361768#10473336 (Ahoelzl)
[00:30:46] Data-Engineering (Q3 2024 January 1st - March 31th): [Airflow-test] refine_to_hive_hourly_test.refine_hive_dataset.wait_for_gobblin_export is failing recurrently - https://phabricator.wikimedia.org/T382901#10473338 (Ahoelzl)
[00:30:47] Data-Engineering (Q3 2024 January 1st - March 31th): Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10473337 (Ahoelzl)
[00:30:50] Data-Engineering (Q3 2024 January 1st - March 31th): Haproxy kafka and varnishkafka produce compatible datasets - https://phabricator.wikimedia.org/T382571#10473339 (Ahoelzl)
[00:30:54] Data-Engineering (Q3 2024 January 1st - March 31th): Reduce `refine_to_hive_hourly` airflow task number - https://phabricator.wikimedia.org/T380856#10473341 (Ahoelzl)
[00:30:59] Data-Engineering (Q3 2024 January 1st - March 31th): Warning of mismatch in declarations of Webrequest schema - https://phabricator.wikimedia.org/T380916#10473340 (Ahoelzl)
[00:31:03] Data-Engineering (Q3 2024 January 1st - March 31th), MediaWiki-DomainEvents, MW-Interfaces-Team, FY2024-25 KR 5.2 Simplify feature development, OKR-Work: Design and document new Domain Events feature in MediaWiki core - https://phabricator.wikimedia.org/T379959#10473342 (Ahoelzl)
[00:31:09] Data-Engineering (Q3 2024 January 1st - March 31th): [Airflow Optimization] Reduce Overhead in Refine DAG by Precomputing Parameters - https://phabricator.wikimedia.org/T381073#10473346 (Ahoelzl)
[00:31:13] Data-Engineering (Q3 2024 January 1st - March 31th), MediaWiki-DomainEvents, MW-Interfaces-Team: Explore a mechanism for publishing domain events to an event bus - https://phabricator.wikimedia.org/T379935#10473344 (Ahoelzl)
[00:31:19] Data-Engineering (Q3 2024 January 1st - March 31th): [Refine Simplification] Remove Schema Merging in Refine Process by Enforcing Backward Compatibility - https://phabricator.wikimedia.org/T381072#10473347 (Ahoelzl)
[00:31:25] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: [Refine DAG Improvement] Add Parameter to Reduce Spark Driver Logs in Skein Log Collection - https://phabricator.wikimedia.org/T381074#10473345 (Ahoelzl)
[00:31:29] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0, Data-Platform-SRE (2025.01.11 - 2025.01.31): Test if an existing conda environment with Spark 3.1.2 clients works fine with Spark 3.5.3 - https://phabricator.wikimedia.org/T380417#10473348 (Ahoelzl)
[00:31:35] Data-Engineering (Q3 2024 January 1st - March 31th): Find a way to make spark have an intermediate data materialization step before coalescing - https://phabricator.wikimedia.org/T379194#10473350 (Ahoelzl)
[00:31:39] Data-Engineering (Q3 2024 January 1st - March 31th): Write documentation on usage of RestExternalTaskSensor - https://phabricator.wikimedia.org/T378000#10473351 (Ahoelzl)
[00:31:43] Data-Engineering (Q3 2024 January 1st - March 31th), Discovery-Search, Data-Platform-SRE (2025.01.11 - 2025.01.31): Upload an image with flink-k8s-operator version that supports flink 1.20 - https://phabricator.wikimedia.org/T377137#10473352 (Ahoelzl)
[00:31:49] Data-Engineering (Q3 2024 January 1st - March 31th), Discovery-Search, Dumps 2.0, Data-Platform-SRE (2025.01.11 - 2025.01.31), Patch-For-Review: Add relevant kafka clusters to defined airflow connections in puppet - https://phabricator.wikimedia.org/T379676#10473349 (Ahoelzl)
[00:31:55] Data-Engineering (Q3 2024 January 1st - March 31th): Move more of refine_hive_hourly dag logic into RefineConfiguration - https://phabricator.wikimedia.org/T375064#10473355 (Ahoelzl)
[00:32:00] Data-Engineering (Q3 2024 January 1st - March 31th): Update event-producing tools to overwrite `meta.dt` - https://phabricator.wikimedia.org/T376026#10473354 (Ahoelzl)
[00:32:04] Data-Engineering (Q3 2024 January 1st - March 31th): [Placeholder] Clean Up Corresponding Hive Tables After Deprecating Older Stream Configs - https://phabricator.wikimedia.org/T368800#10473358 (Ahoelzl)
[00:32:08] Data-Engineering (Q3 2024 January 1st - March 31th): [SPIKE] Learn and document how to use Flink-CDC from MediaWiki MariaDB locally - https://phabricator.wikimedia.org/T373144#10473357 (Ahoelzl)
[00:32:12] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0, Discovery-Search (Current work), Epic, Patch-For-Review: EPIC: Update flink jobs to support Flink 1.20 - https://phabricator.wikimedia.org/T376812#10473353 (Ahoelzl)
[00:32:16] Data-Engineering (Q3 2024 January 1st - March 31th): Migrate refinery HQL files to CI/CD supported GitLab repository - https://phabricator.wikimedia.org/T362832#10473361 (Ahoelzl)
[00:32:20] Data-Engineering (Q3 2024 January 1st - March 31th), Data-Platform-SRE, Movement-Insights: Fail Spark job or airflow task if unexpected number of output files - https://phabricator.wikimedia.org/T377006#10473360 (Ahoelzl)
[00:32:26] Data-Engineering (Q3 2024 January 1st - March 31th), Metrics Platform: Make jsonschema-tools merge values of enums when merging allOf - https://phabricator.wikimedia.org/T345317#10473362 (Ahoelzl)
[00:32:30] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: Timeout hive-metastore locks - https://phabricator.wikimedia.org/T365563#10473359 (Ahoelzl)
[00:32:34] Data-Engineering (Q3 2024 January 1st - March 31th), Research, Data-Platform-SRE (2025.01.11 - 2025.01.31), Discovery-Search (Current work): Low available space on Hadoop / HDFS - https://phabricator.wikimedia.org/T381707#10473364 (Ahoelzl)
[00:32:40] Data-Engineering (Q3 2024 January 1st - March 31th): Airflow skips canary-event tasks - https://phabricator.wikimedia.org/T380836#10473370 (Ahoelzl)
[00:32:44] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Figure why XML Dump code generates 1000's of files for simplewiki - https://phabricator.wikimedia.org/T381016#10473368 (Ahoelzl)
[00:32:48] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Patch-For-Review: [SPIKE] Experiment with approaches for a incremental updates of MediaWiki data in the Data Lake - https://phabricator.wikimedia.org/T370354#10473366 (Ahoelzl)
[00:32:52] Data-Engineering (Q3 2024 January 1st - March 31th): [Data Quality] Implement wiki completeness check for MediaWiki History - https://phabricator.wikimedia.org/T365203#10473374 (Ahoelzl)
[00:32:56] Data-Engineering (Q3 2024 January 1st - March 31th): Handle Late-Arrived Events from Gobblin into Airflow triggered Refine - https://phabricator.wikimedia.org/T370665#10473378 (Ahoelzl)
[00:33:00] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_dumps.wikitext_raw - https://phabricator.wikimedia.org/T357684#10473372 (Ahoelzl)
[00:33:04] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: Publish Data Engineering maintained NodeJS packages to GitLab and use them in depender code - https://phabricator.wikimedia.org/T366612#10473376 (Ahoelzl)
[00:33:08] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10473380 (Ahoelzl)
[00:33:12] Data-Engineering (Q3 2024 January 1st - March 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10473382 (Ahoelzl)
[00:33:16] Data-Engineering (Q3 2024 January 1st - March 31th): [Developer Experience] Implement CI hql Linting - https://phabricator.wikimedia.org/T360967#10473386 (Ahoelzl)
[00:33:20] Data-Engineering (Q3 2024 January 1st - March 31th): Airflow mapped tasks UI & metrics - https://phabricator.wikimedia.org/T357430#10473384 (Ahoelzl)
[00:33:25] Data-Engineering (Q3 2024 January 1st - March 31th): [Data Quality] Improve Superset visualizations - https://phabricator.wikimedia.org/T372678#10473388 (Ahoelzl)
[00:33:29] Data-Engineering (Q3 2024 January 1st - March 31th), Data Pipelines, Data-Catalog: Upgrade to Spark 3.2 to support Spark lineage for Iceberg tables - https://phabricator.wikimedia.org/T378899#10473390 (Ahoelzl)
[00:33:33] Data-Engineering (Q3 2024 January 1st - March 31th), Movement-Insights: Temporarily Extend Retention Window for webrequest tables - https://phabricator.wikimedia.org/T375943#10473392 (Ahoelzl)
[00:33:37] Data-Engineering (Q3 2024 January 1st - March 31th): Replace service runner with a simplified library to better support metrics and debugging: service-utils - https://phabricator.wikimedia.org/T360924#10473396 (Ahoelzl)
[00:33:41] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: Migrate and re-deploy eventstreams using service-utils - https://phabricator.wikimedia.org/T361769#10473394 (Ahoelzl)
[00:33:45] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Enable HA for the mw-content-history-reconcile-enrich flink application - https://phabricator.wikimedia.org/T375176#10473400 (Ahoelzl)
[00:33:49] Data-Engineering (Q3 2024 January 1st - March 31th), Data Pipelines, Data-Catalog, Patch-For-Review: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10473398 (Ahoelzl)
[00:33:53] Data-Engineering, Data-Platform-SRE (2025.01.11 - 2025.01.31): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10473402 (Ahoelzl)
[00:33:59] Data-Engineering, Traffic, Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10473406 (Ahoelzl)
[00:34:05] Data-Engineering, Data-Platform-SRE (2025.01.11 - 2025.01.31), Patch-For-Review: Upgrade Spark to a version with long term Iceberg support, and with fixes to support Dumps 2.0 - https://phabricator.wikimedia.org/T338057#10473404 (Ahoelzl)
[00:40:08] Data-Engineering (Q3 2024 January 1st - March 31th), Epic: HDFS capacity needs FY24/25 - https://phabricator.wikimedia.org/T384098 (Ahoelzl) NEW
[00:45:20] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Calculate rough HDFS storage requirements for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T383816#10473429 (Ahoelzl)
[00:45:22] Data-Engineering (Q3 2024 January 1st - March 31th), Epic: HDFS capacity needs FY24/25 - https://phabricator.wikimedia.org/T384098#10473430 (Ahoelzl)
[00:49:32] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs HTML dumps - https://phabricator.wikimedia.org/T384099 (Ahoelzl) NEW
[00:49:54] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs HTML dumps - https://phabricator.wikimedia.org/T384099#10473443 (Ahoelzl)
[00:49:56] Data-Engineering (Q3 2024 January 1st - March 31th), Epic: HDFS capacity needs FY24/25 - https://phabricator.wikimedia.org/T384098#10473444 (Ahoelzl)
[00:51:39] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data platform - https://phabricator.wikimedia.org/T384100 (Ahoelzl) NEW
[00:51:55] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data engineering and platform users - https://phabricator.wikimedia.org/T384100#10473457 (Ahoelzl)
[00:52:11] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data engineering and platform users - https://phabricator.wikimedia.org/T384100#10473458 (Ahoelzl)
[00:52:11] Data-Engineering (Q3 2024 January 1st - March 31th), Epic: HDFS capacity needs FY24/25 - https://phabricator.wikimedia.org/T384098#10473459 (Ahoelzl)
[02:20:08] Data-Engineering, Data-Engineering-Icebox, Research-Freezer, Stewards-and-global-tools: Collect information about users affected by blocks - https://phabricator.wikimedia.org/T297051#10473516 (Xaosflux) Regarding the user story in the description, the stewards primary VRT queue has been averaging...
[21:37:48] Data-Engineering, Multi-Content-Revisions, Schema-change: Plain old contents - https://phabricator.wikimedia.org/T384130 (Bugreporter) NEW