[04:02:08] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10502822 (VirginiaPoundstone) @jebe-wmf are you also preparing a visualization dashboard (in Superset) and a findings and limitations report? Do you ne...
[04:22:49] Data-Engineering, Trust and Safety Product Team, Product-Analytics (Kanban): Add mediawiki_product_metrics_incident_reporting_system_interaction to the sanitization allowlist - https://phabricator.wikimedia.org/T384650#10502827 (cchen)
[04:26:37] Data-Engineering, Trust and Safety Product Team, Product-Analytics (Kanban): Add mediawiki_product_metrics_incident_reporting_system_interaction to the sanitization allowlist - https://phabricator.wikimedia.org/T384650#10502828 (cchen)
[06:45:18] Data-Engineering, Dumps 2.0: Modify code to dump all slots - https://phabricator.wikimedia.org/T384945#10502939 (Pppery) Commons is AFAIK the only current user of MCR.
[08:45:44] Data-Engineering, Traffic, Patch-For-Review: Rollout haproxykafka on all hosts - https://phabricator.wikimedia.org/T378578#10503087 (Fabfur) In progress→Resolved
[08:47:49] (CR) Gmodena: [C:+1] Keep event.mediawiki_page_change_v1 [analytics/refinery] - https://gerrit.wikimedia.org/r/1114803 (owner: DCausse)
[09:42:27] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10503212 (ValterVB) Resolved→Open Problem not solved, happened again with the dumps of 23 January 2025.
[11:38:39] Data-Engineering, DBA, Schema-change-in-production: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592#10503709 (Marostegui)
[11:49:12] Data-Engineering, DBA, Schema-change-in-production: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592#10503741 (Marostegui)
[13:07:03] Data-Engineering, MediaWiki-Core-Hooks, MediaWiki-DomainEvents, MW-Interfaces-Team, Event-Platform: Implement DomainEventDispatcher (baseline) - https://phabricator.wikimedia.org/T377229#10503931 (Aklapper)
[13:15:32] Data-Engineering, MediaWiki-extensions-WikimediaEvents, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review: Owners phpunit test does not work with subfolders - https://phabricator.wikimedia.org/T352472#10503989 (phuedx)
[13:16:58] Data-Engineering, MediaWiki-extensions-WikimediaEvents, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review: Owners phpunit test does not work with subfolders - https://phabricator.wikimedia.org/T352472#10503999 (phuedx) Open→Resolved a:phuedx Being **bold**.
[13:26:44] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: TLS connection for hive-standalone-metaserver with minio - https://phabricator.wikimedia.org/T385031 (Jgreen) NEW
[13:28:16] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: TLS connection for hive-standalone-metaserver with minio - https://phabricator.wikimedia.org/T385031#10504058 (Jgreen)
[13:37:53] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10504086 (JEbe-WMF) I was not aware you could have a visualisation dashboard on superset, i am exploring this.
Yes i could also use an extra set of eyes
[14:12:08] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: TLS connection for hive-standalone-metaserver with minio - https://phabricator.wikimedia.org/T385031#10504383 (Jgreen)
[14:23:34] Data-Engineering, Dumps 2.0: Modify code to dump all slots - https://phabricator.wikimedia.org/T384945#10504423 (xcollazo) >>! In T384945#10502939, @Pppery wrote: > Commons is AFAIK the only current user of MCR. Right, `slot_roles` tables agree: ` spark.sql(""" SELECT * FROM mediawiki_slot_roles WHERE s...
[14:27:54] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10504441 (Ottomata) I just tested Growth's HomepageVisit stream in beta:...
[14:40:51] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10504623 (xcollazo) I've applied same fix as in T383030#10449493. Verified that `itwiki` looks good at https://dumps.wikimedia.org/itwiki/20250120/. @ValterVB please verify.
[14:43:03] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10504673 (Sgs) Thanks for testing @Ottomata, I was testing a different u...
[14:56:47] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10504765 (Ottomata) Okay! For PHP submitted events: https://gerrit.wik...
[15:13:30] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10504861 (Alserv) Seems frwiki at https://dumps.wikimedia.org/frwiki/20250120/ has the same problem: "Partial dump" since many days, no progress visible.
[15:15:38] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10504870 (Alserv) Thanks ! Now frwiki is fine.
[15:22:17] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10504927 (mpopov) > EventLogging PHP extension is explicitly setting `ht...
[15:28:14] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10504959 (Ottomata) I think the reason it is done this way is that when...
[16:00:05] (PS13) Peter Fischer: Rewrite MediawikiDumper partitioning implementation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892 (https://phabricator.wikimedia.org/T381016)
[16:05:57] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Optimize XML Dump code to be able to handle wikis from simplewiki to enwiki - https://phabricator.wikimedia.org/T381016#10505139 (pfischer) The latest revision has shown it can process enwiki in approxima...
[16:06:44] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Optimize XML Dump code to be able to handle wikis from simplewiki to enwiki - https://phabricator.wikimedia.org/T381016#10505147 (pfischer)
[16:30:18] (CR) CI reject: [V:-1] Rewrite MediawikiDumper partitioning implementation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892 (https://phabricator.wikimedia.org/T381016) (owner: Peter Fischer)
[16:38:23] (PS14) Peter Fischer: Rewrite MediawikiDumper partitioning implementation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892 (https://phabricator.wikimedia.org/T381016)
[16:39:23] (CR) Peter Fischer: "Thank you for your comments, I revised the code." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892 (https://phabricator.wikimedia.org/T381016) (owner: Peter Fischer)
[17:08:35] (CR) CI reject: [V:-1] Rewrite MediawikiDumper partitioning implementation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1101892 (https://phabricator.wikimedia.org/T381016) (owner: Peter Fischer)
[17:16:28] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Identify Internal Users of MediaWiki Wikitext Tables - https://phabricator.wikimedia.org/T383743#10505502 (Snwachukwu)
[18:00:21] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list January 2025 - https://phabricator.wikimedia.org/T384259#10505717 (mforns) Looking!
[18:45:19] !log Deploying latest DAGs to the analytics Airflow instance. T358375.
[18:45:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:45:22] T358375: Declare wmf_content.mediawiki_content_history_v1 a production table - https://phabricator.wikimedia.org/T358375
[19:08:56] !log Ran the following to get rid of old data under wmf_dumps: T358375#10506036
[19:08:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:08:59] T358375: Declare wmf_content.mediawiki_content_history_v1 a production table - https://phabricator.wikimedia.org/T358375
[19:12:41] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10506064 (Ottomata) Alright, deploying eventgate-analytics-external 1.9....
[19:19:29] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10506075 (Ottomata) I've fully deployed eventgate-analytics-external v1....
[19:24:13] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: Enable Event Platform instruments to opt out of collecting User-Agent data - https://phabricator.wikimedia.org/T382173#10506080 (Ottomata)
[19:25:48] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform: [Event Platform] Disable default collection of user agent for analytics streams - https://phabricator.wikimedia.org/T384964#10506091 (Ottomata)
[19:26:01] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Event-Platform: [Event Platform] Disable default collection of user agent for analytics streams - https://phabricator.wikimedia.org/T384964#10506092 (Ottomata)
[19:28:53] Data-Engineering (Q3 2024 January 1st - March 31th), Event-Platform, Patch-For-Review: Upgrade eventgate-wikimedia to node20 - https://phabricator.wikimedia.org/T383814#10506099 (Ottomata) ==== Status 2025-01-29 v1.10.0 has been released with node20. It is running in beta. I'd like to deploy to so...
[19:40:14] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10506124 (xcollazo) @tchin can we close thi...
[19:51:16] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10506133 (ValterVB) There are 2 skipped files on itwiki (also in other wiki): 2025-01-23 18:58:22 skipped All pages with complete edit history (.7z) 2025-01-23 18:58:22 skipped All pages with...
[19:59:01] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10506159 (xcollazo) >>!
In T383030#10506133, @ValterVB wrote: > There are 2 skipped files on itwiki (also in other wiki): > 2025-01-23 18:58:22 skipped All pages with complete edit history (.7...
[20:05:29] Data-Engineering, Dumps-Generation: Wikimedia Downloads not complete - https://phabricator.wikimedia.org/T383030#10506193 (ValterVB) Oops, I didn't know that. Thanks
[20:46:43] Data-Engineering (Q3 2024 January 1st - March 31th), MediaWiki-Engineering, MediaWiki-General, Wikimedia-production-error: PHP Unknown error: EventLoggingLegacyConverter: Failed proxying legacy EventLogging event query string to WMF Event Platform ... - https://phabricator.wikimedia.org/T383939#10506289
[21:15:25] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Identify Internal Users of MediaWiki Wikitext Tables - https://phabricator.wikimedia.org/T383743#10506401 (Snwachukwu) There aren't any hive query or script I found using these tables. The dumps are currently used by Research and Platfo...
[21:46:23] Data-Engineering, Dumps 2.0: Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112 (xcollazo) NEW