[02:58:30] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10767223 (Ottomata) It shouldn't! But the [[ https://gitlab.wikimedia.org/repos/data-e...
[03:07:16] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10767232 (Ottomata) Re the use of the `producers.eventgate.enrich_fields_from_http...
[08:36:00] !log rerun airflow mediawiki_history_check_denormalize
[08:36:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:36:30] !log rerun airflow mediawiki_history_metrics_monthly
[08:36:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:00:06] Data-Engineering: Refine to Hive with Airflow – Kubernetes Resource Optimization - https://phabricator.wikimedia.org/T392668 (Antoine_Quhen) NEW
[09:00:36] Data-Engineering: Refine to Hive with Airflow – Kubernetes Resource Optimization - https://phabricator.wikimedia.org/T392668#10767601 (Antoine_Quhen)
[09:00:38] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10767602 (Antoine_Quhen)
[09:26:41] (CR) Joal: [V:+2 C:+2] "Merge for later deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1136103 (https://phabricator.wikimedia.org/T391767) (owner: Gerrit maintenance bot)
[09:45:36] (CR) Joal: [V:+2 C:+2] "Merge for later deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1138395 (https://phabricator.wikimedia.org/T392499) (owner: Gerrit maintenance bot)
[10:15:37] Data-Engineering (Q4 2025 April 1st - June 30th): Handle Late-Arrived Events from Gobblin into Airflow triggered Refine - https://phabricator.wikimedia.org/T370665#10767724 (Antoine_Quhen)
[10:15:42] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10767725 (Antoine_Quhen)
[12:14:19] Data-Engineering, Traffic, DPE HAProxy Migration, Patch-For-Review: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10767900 (JAllemandou) Thanks @Fabfur ! When the data flows in, we need a schema change and a job modification on our side to make it appear...
[13:30:59] Data-Engineering (Q4 2025 April 1st - June 30th): NEW BUG REPORT significantly increased edit revert rate for 2025-03 edits; Android, iOS, Mobile Web, Other - https://phabricator.wikimedia.org/T391708#10768164 (JAllemandou) Copy pasting from slack: After the rerun of the MediawikiHistory job with less cores...
[13:56:24] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10768243 (tchin) If I'm reading this correctly, now we want in the stream config:...
[13:58:31] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10768244 (tchin) Also reading into this ticket more, any event that is sent that h...
[14:40:00] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Create Airflow pipeline to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391283#10768370 (xcollazo) >>! In T391283#10766162, @xcollazo wrote: > Two successful runs of the Airflow job so far....
[14:55:34] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Create table and pyspark job to produce wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391282#10768446 (xcollazo) All done here, table now available in the data lake `wmf_content....
[15:15:48] Data-Engineering (Q4 2025 April 1st - June 30th): Handle Late-Arrived Events from Gobblin into Airflow triggered Refine - https://phabricator.wikimedia.org/T370665#10768491 (Antoine_Quhen) Resuming work on this task, and after rereading the previous discussions, I propose the following plan: • We already gen...
[15:17:15] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768498 (xcollazo) > So, if the source file size did not change, the artifact shouldn'...
[15:21:25] Data-Engineering: Refine to Hive with Airflow – Switch Over plan - https://phabricator.wikimedia.org/T392696 (Antoine_Quhen) NEW
[15:22:13] Data-Engineering: Refine to Hive with Airflow – Switch Over plan - https://phabricator.wikimedia.org/T392696#10768536 (Antoine_Quhen)
[15:22:17] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10768537 (Antoine_Quhen)
[15:24:00] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10768554 (Antoine_Quhen)
[15:26:33] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10768570 (Antoine_Quhen)
[15:29:48] Data-Engineering (Q4 2025 April 1st - June 30th): Refine to Hive with Airflow – Handle Late-Arrived Events - https://phabricator.wikimedia.org/T370665#10768580 (Antoine_Quhen)
[15:30:18] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768581 (xcollazo) Oh, it may be Gitlab: ` curl -I -L https://gitlab.wikimedia.org/rep...
[15:31:20] Data-Engineering (Q4 2025 April 1st - June 30th): Move more of refine_hive_hourly dag logic into RefineConfiguration - https://phabricator.wikimedia.org/T375064#10768588 (Antoine_Quhen)
[15:31:24] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10768589 (Antoine_Quhen)
[15:34:40] Data-Engineering (Q4 2025 April 1st - June 30th): [Refine Simplification] Remove Schema Merging in Refine Process by Enforcing Backward Compatibility - https://phabricator.wikimedia.org/T381072#10768595 (Antoine_Quhen)
[15:39:55] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10768610 (Ottomata) ^ agree. I'm not sure if dropping the event in the various cas...
[15:54:14] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768668 (Ottomata) Oh ho! Reminds me of {T348958} where we encountered this same prob...
[16:00:45] (PS14) Hasan Akgün (WMDE): Add Prometheus stats push [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344)
[16:01:07] (CR) CI reject: [V:-1] Add Prometheus stats push [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[16:01:54] (PS15) Hasan Akgün (WMDE): Add Prometheus stats push [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344)
[16:02:56] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10768696 (mpopov) > Validation happens after these field values are set, so you co...
[16:03:51] (PS16) Hasan Akgün (WMDE): Add Prometheus stats push [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344)
[16:05:48] Data-Engineering, DPE-Mediawiki-Content, Epic: Daily updated wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391279#10768703 (Ahoelzl)
[16:06:13] (CR) Hasan Akgün (WMDE): "I have updated current PrometheusPushgateway to StatsdExporter and also updated all of the usages." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1136417 (https://phabricator.wikimedia.org/T389344) (owner: Hasan Akgün (WMDE))
[16:08:49] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768716 (xcollazo) >>! In T348958#9658469, @xcollazo wrote: > Ah, good find! > >>>! I...
[16:24:09] Data-Engineering, Data Pipelines, Epic: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#10768784 (Ahoelzl)
[16:24:54] Data-Engineering-Roadmap, Data Pipelines, Epic: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#10768791 (Ahoelzl)
[16:25:05] Data-Engineering-Roadmap, Data Pipelines, Epic: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#10768792 (Ahoelzl) Open→In progress p:Triage→High
[16:34:53] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768829 (xcollazo) Opened https://gitlab.com/gitlab-org/gitlab/-/issues/537696 for Git...
[16:35:16] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Daily updated wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391279#10768835 (Ahoelzl)
[16:35:26] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Daily updated wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391279#10768837 (Ahoelzl) Open→In progress p:Triage→High
[16:37:37] Data-Engineering (Q4 2025 April 1st - June 30th): Refine to Hive with Airflow – Handle Late-Arrived Events - https://phabricator.wikimedia.org/T370665#10768849 (Ahoelzl)
[16:37:38] Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform: Update event-producing tools to overwrite `meta.dt` - https://phabricator.wikimedia.org/T376026#10768848 (Ahoelzl)
[16:37:42] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10768850 (Ahoelzl)
[16:38:04] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10768851 (dr0ptp4kt)
[17:03:01] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, GitLab (Upstream pit of despair 🕳️): Investigate artifact mismatch error when running mw_content_merge_events_to_mw_content_history_daily - https://phabricator.wikimedia.org/T391123#10768947 (thcipriani)
[17:59:58] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Dumps-Generation, Patch-For-Review: Decomission dumps job `download_enterprise_htmldumps` - https://phabricator.wikimedia.org/T390556#10769104 (xcollazo) @brouberol helped us figure the patch was returning an empty PPC run (...
[18:14:51] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494#10769146 (xcollazo) a:xcollazo
[18:19:18] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Modify table maintenance mechanism to support Iceberg's rewrite_position_delete_files() - https://phabricator.wikimedia.org/T391280#10769156 (xcollazo) Open→Resolved Successfull run in prod of `rewrite_position_delete_fi...
[18:27:44] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: Migrate Gobblin to Airflow - https://phabricator.wikimedia.org/T390249#10769182 (amastilovic) Open→Resolved
[19:16:25] (PS1) Mforns: Prepare sqoop and CIM queries for mediawiki image table migration [analytics/refinery] - https://gerrit.wikimedia.org/r/1139117
[19:22:34] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab: FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10769333 (tchin) > `subject_id` should be base64-decodable and its length prior to...
[19:40:12] Data-Engineering (Q4 2025 April 1st - June 30th), Commons-Impact-Metrics, Patch-For-Review: Update Commons Impact Metrics to account for new File table - https://phabricator.wikimedia.org/T389800#10769387 (mforns) There's also this change (that for some reason wasn't automatically posted by gerritbot...
[22:51:04] Data-Engineering: Refine to Hive with Airflow – Update Refine Documentation on Wikitech - https://phabricator.wikimedia.org/T392697#10769784 (Peachey88)
[22:51:10] Data-Engineering: Refine to Hive with Airflow – Post-Migration Cleanup - https://phabricator.wikimedia.org/T392698#10769785 (Peachey88)