[10:20:28] Data-Engineering-Roadmap, Data-Platform-SRE, Dumps-Generation, MW-on-K8s, and 3 others: WE 5.4 KR - Hypothesis 5.4.4 - Q3 FY24/25 - Migrate current-generation dumps to run on kubernetes - https://phabricator.wikimedia.org/T352650#10631977 (BTullis)
[10:50:49] !log disable puppet on an-worker1[187-208] to bring them into the hadoop cluster in batches T388512
[10:50:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:50:53] T388512: Bring an-worker1[187-208] into the hadoop cluster - https://phabricator.wikimedia.org/T388512
[11:03:41] !log roll restart hadoop master hosts to pick up new hosts an-worker1[187-208] T388512
[11:03:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:03:44] T388512: Bring an-worker1[187-208] into the hadoop cluster - https://phabricator.wikimedia.org/T388512
[12:24:30] Data-Engineering (Q3 2025 January 1st - March 31th), Essential-Work: Support for 4.3.11 - webrequest based scraping detection - https://phabricator.wikimedia.org/T388721#10632311 (XiaoXiao-WMF)
[12:25:26] Data-Engineering (Q3 2025 January 1st - March 31th), Essential-Work: Support for 4.3.11 - webrequest based scraping detection - https://phabricator.wikimedia.org/T388721#10632322 (XiaoXiao-WMF)
[13:33:56] Data-Engineering (Q3 2025 January 1st - March 31th), Dumps-Generation, Growth-Team, GrowthExperiments, and 3 others: structured_data.commons_entity stuck at 2025-01-20 - https://phabricator.wikimedia.org/T387470#10632736 (Cparle) Still stuck at `2025-01-20`
[13:35:07] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10632739 (Cparle) Still stuck at `2025-01-20`...
[13:45:19] !log fail over the hadoop namenode services from an-master1004 to an-master1003
[13:45:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:48:46] !log restart the hadoop-hdfs-namenode service on an-master1004 to pick up the new hosts as well T388512
[13:48:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:48:49] T388512: Bring an-worker1[187-208] into the hadoop cluster - https://phabricator.wikimedia.org/T388512
[13:49:58] (CR) Joal: [V:+2 C:+2] Add tl.wikisource to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1126977 (https://phabricator.wikimedia.org/T388654) (owner: Gerrit maintenance bot)
[13:59:44] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#10632894 (xcollazo) > But, IIUC, there is no SQL syntax for adding columns to struct types inside of arrays or map values. Iceberg does support it: ` spar...
[14:02:33] Data-Engineering: When doing ADD COLUMN to a struct under a map, Iceberg fails to SELECT it - https://phabricator.wikimedia.org/T388793 (xcollazo) NEW
[14:02:53] Data-Engineering: When doing ADD COLUMN to a struct under a map, Iceberg fails to SELECT it - https://phabricator.wikimedia.org/T388793#10632915 (xcollazo)
[14:02:55] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Dumps 2.0 Phase III: Production level dumps - https://phabricator.wikimedia.org/T366752#10632916 (xcollazo)
[14:04:12] Data-Engineering: When doing ADD COLUMN to a struct under a map, Iceberg fails to SELECT it - https://phabricator.wikimedia.org/T388793#10632918 (xcollazo)
[14:04:13] Data-Engineering, DPE-Mediawiki-Content: Modify wmf_content.mediawiki_content_history_v1 to include slot origin - https://phabricator.wikimedia.org/T386211#10632919 (xcollazo)
[14:04:14] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Dumps 2.0 Phase III: Production level dumps - https://phabricator.wikimedia.org/T366752#10632920 (xcollazo)
[14:25:11] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10633010 (xcollazo) >>! In T386255#10619623, @L...
[15:17:12] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list February 2025 - https://phabricator.wikimedia.org/T387592#10633343 (mforns) It's done! I re-ran the monthly jobs to recalculate the metrics for February. Sorry for the delay!
[15:39:44] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#10633499 (Ottomata) `lang=sql ADD COLUMN revision_content_slots.value.origin_rev_id bigint; ` \( ゚ヮ゚)/ Where did you find that syntax?!?! Does it work...
[16:34:41] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group for DSantamaria - https://phabricator.wikimedia.org/T388693#10633803 (BCornwall)
[17:01:10] PROBLEM - Webrequests Varnishkafka log producer on cp3074 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[17:05:10] RECOVERY - Webrequests Varnishkafka log producer on cp3074 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[17:06:37] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10633985 (BTullis) > So the dumps are indeed mo...
[17:40:47] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634110 (EBernhardson) Pulling the last million events from `codfw.mediawiki.page_change.v1` and filtering for auth.wikimedia.org...
[17:45:39] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634121 (Ottomata) Responsible code: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventBus/+/refs/heads/m...
[17:50:24] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634147 (EBernhardson) The messages themselves come from a job in the `NewUserMessage` extension, but i don't see anything in tha...
[17:52:14] Analytics-Data-Problem, Data-Engineering, Abstract Wikipedia team, Movement-Insights: Unique devices per country spikes on wikifunctions - https://phabricator.wikimedia.org/T364872#10634162 (Ahoelzl) @Mayakp.wiki weird, we did comprehensively backfill, including Druid. Is there a way you can veri...
[17:58:37] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634210 (Ottomata) > referencing the expected database. this makes sense, as database is more akin to wiki_id, which is also cor...
[18:00:13] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634217 (Ottomata) Actually, this could be a problem for dumps 2 via page_content_change enrichment! Enrichment happens via a ac...
[18:01:57] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list February 2025 - https://phabricator.wikimedia.org/T387592#10634231 (GFontenelle_WMF) Thanks so much, @mforns! We really appreciate this!
[18:02:08] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list February 2025 - https://phabricator.wikimedia.org/T387592#10634233 (GFontenelle_WMF) Open→Resolved
[18:37:56] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#10634329 (xcollazo) > Where did you find that syntax?!?! https://iceberg.apache.org/docs/nightly/spark-ddl/#alter-table-add-column > Does it work for arr...
[19:04:06] Data-Engineering, Data-Engineering-Icebox, Data Pipelines: Refine: Use Spark SQL instead of Hive JDBC - https://phabricator.wikimedia.org/T209453#10634453 (Ottomata) Amazing. So we just need to get that bug fixed, convert everything to Iceberg, and then we can stop using JDBC! ;) BTW, I updated the...
[19:33:26] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634514 (xcollazo) >>! In T388825#10634217, @Ottomata wrote: > Actually, this could be a problem for dumps 2 via page_content_cha...
[19:39:49] Data-Engineering (Q3 2025 January 1st - March 31th), Infrastructure-Foundations, netops: Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10634521 (dr0ptp4kt) For keyword search later: This is manifesting in alerts with subject:"DiskSpace druid_...
[19:43:28] Data-Engineering (Q3 2025 January 1st - March 31th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10634525 (mforns) @Eevans Hey! Sorry for the delay (we are in the middle of the Airflow migration to Kuberne...
[19:53:50] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10634549 (xcollazo) > If you drop down the port...
[20:01:36] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10634575 (xcollazo) > I also thought I'd check...
[20:03:41] Data-Engineering, Event-Platform: Some events in mediawiki.page_change.v1 refers to auth.wikimedia.org in meta.uri and meta.domain - https://phabricator.wikimedia.org/T388825#10634579 (Ottomata) Huh! For the reconciliation. So, if meta.domain is bad in page_change, page_content_change will fail getting...
[21:12:19] Data-Engineering (Q3 2025 January 1st - March 31th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10634722 (Eevans) >>! In T370470#10620744, @Eevans wrote: > > [ ... ] > > Once we're good there, I think the...
[21:21:36] Data-Engineering (Q3 2025 January 1st - March 31th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10634743 (Eevans) And v1.0.12 of the Gateway image has been deployed: `lang=sh-session eevans@deploy2002:/s...