[00:37:16] Data-Engineering, Event-Platform, Patch-For-Review, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Delete redundant mobile- and desktopwebuiactions event in WikimediaEvents - https://phabricator.wikimedia.org/T376065#10216054 (Jdlrobson...
[00:47:30] Quarry: Set query result retention time - https://phabricator.wikimedia.org/T360041#10216064 (Base) If you do do this, it would be good to only remove the older runs results, but leave the most recent run result for each query, or as a less desirable alternative to keep those for only published queries (but...
[06:16:48] Data-Engineering, DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10216380 (ABran-WMF) p:High→Medium
[06:17:18] Data-Engineering, DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10216382 (ABran-WMF)
[06:24:51] Data-Engineering, DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10216390 (ABran-WMF)
[06:54:47] Data-Engineering, Data-Platform, Dumps-Generation, Trust and Safety Product Team, and 3 others: Hide autoblocks from the globalblocks table database dump - https://phabricator.wikimedia.org/T376726#10216450 (MoritzMuehlenhoff)
[06:58:18] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE (2024.09.28 - 2024.10.18), Patch-For-Review: Update druid config to automatically drop unused segments - https://phabricator.wikimedia.org/T376118#10216466 (JAllemandou) > Should I set this value to true so that we delete all u...
[07:00:59] (CR) Joal: [V:+2 C:+2] "Merging" [analytics/refinery] - https://gerrit.wikimedia.org/r/1078089 (https://phabricator.wikimedia.org/T376572) (owner: Gerrit maintenance bot)
[07:05:19] (CR) Joal: "One nit, almost good to go. Thanks @dandreescu@wikimedia.org" [analytics/refinery] - https://gerrit.wikimedia.org/r/1078733 (https://phabricator.wikimedia.org/T375527) (owner: Milimetric)
[07:58:01] Data-Engineering, CirrusSearch, Data Products, MediaWiki-extensions-EventLogging, and 2 others: Error: Call to a member function getPageAsLinkTarget() on null - https://phabricator.wikimedia.org/T368543#10216600 (Aklapper) Still an issue in `1.43.0-wmf.26` in the logs. Does #Data-Engineering tria...
[09:55:45] Data-Engineering, CirrusSearch, Data Products, MediaWiki-extensions-EventLogging, and 3 others: Error: Call to a member function getPageAsLinkTarget() on null - https://phabricator.wikimedia.org/T368543#10216926 (phuedx) p:Triage→Low [[ https://logstash.wikimedia.org/goto/0ab1e2fb9a4c028e...
[09:55:46] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882 (JAllemandou) NEW
[09:55:48] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10216938 (JAllemandou) p:Triage→Unbreak!
[10:14:45] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10217000 (JAllemandou)
[10:20:26] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10217012 (BTullis)
[10:31:13] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10217047 (JAllemandou) The problem is visible on HDFS: ` hdfs dfs -du -s -h /wmf/data/wmf/webrequest/webrequest_source=text/year=2024/mo...
[10:47:00] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10217071 (JAllemandou) No such pattern on raw data: ` hdfs dfs -du -s -h /wmf/data/raw/webrequest/webrequest_text/year=2024/month=10/day...
[11:31:42] Analytics, Data-Engineering, Observability-Logging, SRE, and 2 others: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645#10217225 (matmarex) I've been told that this project would let me process Logstash data with SQL queries, and I would like that very much.
[12:47:46] Data-Engineering, Data Pipelines, Patch-Needs-Improvement: [Iceberg] Update Refine Sanitize to insert into Iceberg tables - https://phabricator.wikimedia.org/T311739#10217496 (Aklapper)
[12:48:30] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10217506 (Ottomata) Wha? and the _SUCCESS flag is written even if the job is failing? Or...its not failing somehow?
[13:17:51] Analytics, Data-Engineering, Observability-Logging, SRE, and 2 others: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL - https://phabricator.wikimedia.org/T291645#10217619 (Ottomata)
[13:19:10] Analytics, Data-Engineering, Observability-Logging, SRE, and 2 others: Produce ECS formatted logstash logs to Event Platform, allowing them to be queried in the WMF Data Lake with SQL - https://phabricator.wikimedia.org/T291645#10217616 (Ottomata)
[13:28:33] (PS2) Milimetric: Shift is_redirect_to_pageview upstream to webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/1078733 (https://phabricator.wikimedia.org/T375527)
[13:28:38] (CR) Milimetric: Shift is_redirect_to_pageview upstream to webrequest (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1078733 (https://phabricator.wikimedia.org/T375527) (owner: Milimetric)
[13:35:19] (CR) Joal: [C:+2] "LGTM! We need some operations for this when we deploy, let's be careful :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/1078733 (https://phabricator.wikimedia.org/T375527) (owner: Milimetric)
[13:38:31] Data-Engineering, DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10217747 (ABran-WMF)
[16:35:31] Data-Engineering, Event-Platform, Patch-For-Review, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Delete redundant mobile- and desktopwebuiactions event in WikimediaEvents - https://phabricator.wikimedia.org/T376065#10218648 (Jdlrobson...
[17:13:50] Data-Engineering, Event-Platform, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#10218756 (KSarabia-WMF)
[18:05:29] Data-Engineering, Data Products (Data Products Sprint 19): Add wikitech (labswiki) to the sqoop list - https://phabricator.wikimedia.org/T217792#10219034 (VirginiaPoundstone) Open→Resolved
[18:08:08] Data-Engineering, Event-Platform: [NEEDS GROOMING] We should improve the code health of gobblin-wmf - https://phabricator.wikimedia.org/T370368#10219039 (amastilovic)
[18:15:59] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219063 (Ottomata) Joseph says is related: {T347076}
[19:06:17] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219256 (Ottomata) From Joseph in [[ https://wikimedia.slack.com/archives/C05RHK7PS6Q/p1728584479417629?thread_t...
[19:36:50] Data-Engineering (Sprint 5), Data-Platform-SRE: Add a spark global config for better file commit strategy - https://phabricator.wikimedia.org/T351388#10219403 (Ottomata) Updating paper trail. `max_active_runs_per_dag` was reset back to 3 in [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/100...
[19:40:23] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219415 (Antoine_Quhen) Yes, we suspect this issue is related to concurrency access on the...
[19:49:00] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219432 (Ottomata) I'm going to try to summarize the suspected problem from today's [[ htt...
[19:50:22] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219436 (Ottomata) @JAllemandou, If this is truly the problem, why don't we see an error l...
[20:04:46] Data-Engineering, Event-Platform, Patch-For-Review, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#10219489 (bwang) a:bwan...
[21:02:44] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219706 (Ottomata) Discussed ^ with @Antoine_Quhen [[ https://wikimedia.slack.com/archives...
[21:08:02] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights, Patch-For-Review: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10219738 (Ottomata)
[23:21:48] Data-Engineering, Data Pipelines: Add support for repository artifacts in Airflow - https://phabricator.wikimedia.org/T322690#10220210 (amastilovic) @Ottomata @mforns I think we should expand the scope of this refactor to include redefining the relationships between `Artifact`, `ArtifactLocator`, `Artifa...
[23:47:45] Data-Engineering, Event-Platform: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#10220274 (MunizaA) Hi, I took a stab at this and was able to put together a job that enriches page change events by retrieving the HTML from [MW Rest API](https://www.med...