[02:04:48] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Patch-For-Review: Fix CommonsCategoryGraphBuilder to reflect latest changes to categorylinks table - https://phabricator.wikimedia.org/T404735#11196165 (GFontenelle_WMF) @amastilovic: I've tested it and it looks like it's working normally now. Than...
[06:59:11] (CR) Joal: [C:+1] "LGTM! Thanks a lot for changing this" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1187584 (https://phabricator.wikimedia.org/T401325) (owner: Aleksandar Mastilovic)
[07:03:30] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11196403 (dcausse) >>! In T401021#11195972,...
[07:22:10] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11196439 (achou) >>! In T401021#11196403, @d...
[08:20:45] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Discovery-Search, Wikidata, Wikidata-Query-Service, and 2 others: [EPIC] Upgrade flink jobs to java 17 - https://phabricator.wikimedia.org/T404340#11196543 (dcausse) I did a quick test using the search flink job, unfortunately it failed bec...
[10:33:12] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Discovery-Search, Wikidata, Wikidata-Query-Service, and 3 others: [EPIC] Upgrade flink jobs to java 17 - https://phabricator.wikimedia.org/T404340#11196850 (dcausse) With some tweaks of the java option we have the search flink job running j...
[12:08:30] (PS1) Joal: Fix bug in cassandra unique-devices loading [analytics/refinery] - https://gerrit.wikimedia.org/r/1189854 (https://phabricator.wikimedia.org/T401666)
[12:08:46] (CR) Joal: [V:+2 C:+2] "Self merging bug correction" [analytics/refinery] - https://gerrit.wikimedia.org/r/1189854 (https://phabricator.wikimedia.org/T401666) (owner: Joal)
[12:23:05] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Add data-rewrite to airflow iceberg maintenance defaults - https://phabricator.wikimedia.org/T404598#11197182 (JAllemandou) a:JAllemandou
[12:24:21] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Engineering-Wikistats, Product-Analytics: Wikistats reports no mobile unique devices for Wikidata, MediaWiki.org, Wikifunctions - https://phabricator.wikimedia.org/T299559#11197196 (JAllemandou)
[12:24:32] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Engineering-Wikistats, Product-Analytics: Wikistats reports no mobile unique devices for Wikidata, MediaWiki.org, Wikifunctions - https://phabricator.wikimedia.org/T299559#11197197 (JAllemandou) a:JAllemandou
[13:24:29] Data-Engineering, Data-Engineering-Radar, Discovery-Search, Infrastructure-Foundations, and 3 others: Elasticsearch dependency upgrade in spicerack - https://phabricator.wikimedia.org/T390860#11197463 (elukey) >>! In T390860#11192639, @RKemper wrote: > Have pushed out various improvements to the...
[13:51:00] Data-Engineering, Data-Engineering-Radar, CirrusSearch, Discovery-Search (2025.09.05 - 2025.09.26), and 2 others: SUP: Serde w/o RowTypeInfo - https://phabricator.wikimedia.org/T404597#11197549 (pfischer) @Ottomata, we were hit by a breaking change in Flink's API when upgrading to 1.20. In order...
[14:09:32] (PS2) Xcollazo: Update changelog.md for v0.3.2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1189588 (https://phabricator.wikimedia.org/T404735) (owner: Aleksandar Mastilovic)
[14:10:25] (CR) Xcollazo: [C:+1] "Added v0.3.1 details." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1189588 (https://phabricator.wikimedia.org/T404735) (owner: Aleksandar Mastilovic)
[14:15:22] (PS3) Santiago Faci: Added `agent.ua_string` as a possible source when parsing user agent [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1186049 (https://phabricator.wikimedia.org/T385180)
[14:18:17] (PS4) Santiago Faci: Added `agent.ua_string` as a possible source when parsing user agent [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1186049 (https://phabricator.wikimedia.org/T385180)
[14:28:45] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: MW Content ingest fails with MERGE INTO error - https://phabricator.wikimedia.org/T404975#11197646 (xcollazo) Similar to T397525#10935336, we attempted the below query to unblock this pipeline, but it failed. Over at T397525#...
[14:40:24] Data-Engineering (Q1 FY25/26 July 1st - September 30th), Data-Persistence, Data-Persistence-Design-Review, Growth-Team, and 3 others: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task - https://phabricator.wikimedia.org/T401021#11197656 (achou) >>! In T401021#11190788, @O...
[16:09:08] (CR) Aleksandar Mastilovic: [C:+2] Update changelog.md for v0.3.2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1189588 (https://phabricator.wikimedia.org/T404735) (owner: Aleksandar Mastilovic)
[16:09:32] (CR) Aleksandar Mastilovic: [V:+2 C:+2] Update changelog.md for v0.3.2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1189588 (https://phabricator.wikimedia.org/T404735) (owner: Aleksandar Mastilovic)
[16:33:15] Data-Engineering (Q1 FY25/26 July 1st - September 30th): Optimize metrics computation for the MW Content Pipeline - https://phabricator.wikimedia.org/T401010#11197919 (xcollazo) Thanks for the numbers @Antoine_Quhen. > monitor snapshot counts on Iceberg tables This is interesting. We could collect this as p...
[16:44:06] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: MW Content ingest fails with MERGE INTO error - https://phabricator.wikimedia.org/T404975#11197939 (xcollazo) Here is the script we put together to fix this: ` offending_rows = spark.sql(""" SELECT * FROM ( SELECT...
[16:45:35] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, Infrastructure-Foundations, Patch-For-Review: proposal: allow analytics-admins to also trigger puppet runs - https://phabricator.wikimedia.org/T404630#11197940 (BTullis) I wouldn't be surprised if Turnilo moves to k8s before long, s...
[16:45:45] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: MW Content ingest fails with MERGE INTO error - https://phabricator.wikimedia.org/T404975#11197941 (xcollazo) But of course, some of these duplicates made their way downstream to `wmf_content.mediawiki_content_current_v1`: `...
[16:54:48] (CR) Aleksandar Mastilovic: [C:+2] Address review comments [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1187584 (https://phabricator.wikimedia.org/T401325) (owner: Aleksandar Mastilovic)
[16:56:22] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: Another instance of duplicate rows on wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T404975#11197960 (xcollazo)
[17:20:46] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: Another instance of duplicate rows on wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T404975#11198021 (xcollazo) Sample of duplicates from `wmf_content.mediawiki_content_current_v1`: ` spark.sql(...
[18:47:11] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: Another instance of duplicate rows on wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T404975#11198200 (xcollazo) This is how we defined the DELETE statement for `wmf_content.mediawiki_content_cur...
[18:59:48] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: Another instance of duplicate rows on wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T404975#11198214 (xcollazo) >>! In T404975#11195698, @xcollazo wrote: > Pausing all [[ https://airflow.wikimed...
[19:06:05] Data-Engineering (Q1 FY25/26 July 1st - September 30th), DPE-Mediawiki-Content: Another instance of duplicate rows on wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T404975#11198223 (xcollazo) Ingest for `mw_content_merge_events_to_mw_content_history_daily__spark_process_eve...