[01:19:38] Data-Engineering, Event-Platform, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#10220398 (Edtadros)
[01:45:37] Data-Engineering, Event-Platform, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#10220416 (Edtadros) a:Edtadros→Jdlrobson...
[01:48:32] Data-Engineering, Event-Platform, MW-1.43-notes (1.43.0-wmf.27; 2024-10-15), Patch-For-Review, and 2 others: Delete redundant mobile- and desktopwebuiactions event in WikimediaEvents - https://phabricator.wikimedia.org/T376065#10220425 (Edtadros) a:Edtadros→Jdlrobson @jdlrobson, this look...
[07:29:13] btullis, brouberol o/ - qq: can we drop the docker images called "/wikimedia/datahub-*" ?
[07:29:20] from the docker registry I mean
[07:29:33] (in favor of the gitlab-based ones)
[07:31:13] elukey: Yes, please feel free to go ahead.
[07:36:29] super
[07:36:54] I expanded the debmonitor reports to other docker images in the repo, I'll ping you folks for some java updates :)
[07:37:06] (still the old ones, sooo painful to track on k8s)
[07:39:24] Data-Engineering (Sprint 5), Data-Platform-SRE: Add a spark global config for better file commit strategy - https://phabricator.wikimedia.org/T351388#10220566 (JAllemandou) In the discussion above I made a mistake: I stated that the jobs fail while they don't. They generate corrupted data, as in a po...
[07:51:53] Yes please!
[08:32:06] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10220654 (JAllemandou) This morning I started to backfill hours with missing data and monitored jobs. Since jobs...
[08:46:54] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10220691 (brouberol) That's good news @JAllemandou !
[08:52:54] Data-Engineering, Temporary accounts, Event-Platform: Prepare EventBus for temp accounts - https://phabricator.wikimedia.org/T374811#10220694 (gmodena) > Feature checks a user name (possibly then checking if it is registered) We do check for user names for serializing events posted to EventGate in t...
[09:11:34] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform, Patch-For-Review: [Event Platform] Instrument EventBus with prometheus MW Statslib - https://phabricator.wikimedia.org/T363587#10220714 (gmodena)
[11:14:48] Data-Engineering, Event-Platform: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#10220993 (dr0ptp4kt) @MunizaA okay if feedback comes in a week or two (or even three)? Just wanted to know if it's on a time-sensitive critical path - folks are handling...
[12:12:29] Data-Engineering, DBA, Schema-change-in-production: Drop deprecated abuse filter fields on wmf wikis - https://phabricator.wikimedia.org/T367781#10221203 (ABran-WMF)
[13:15:28] Data-Engineering, Temporary accounts, Event-Platform: Prepare EventBus for temp accounts - https://phabricator.wikimedia.org/T374811#10221361 (Ottomata) > EventFactory.php (deprecated since 0.5.0). The only implication I can think of is that the to-be-deprecated event streams, e.g. mediawiki.page-mo...
[13:15:31] Data-Engineering (Sprint 5), Data-Platform-SRE: Add a spark global config for better file commit strategy - https://phabricator.wikimedia.org/T351388#10221363 (Ottomata) > actually we should have set the parameter by default on every job. Do you mean always? Or just when backfilling or writing in pa...
[13:55:50] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10221458 (xcollazo) >>! In T376882#10219432, @Ottomata wrote: > ... > So, it is likely that we have been experien...
[13:57:32] Data-Engineering, Data-Platform-SRE, SRE Observability: [Data Platform] Install a Prometheus connector for Presto, pointed at thanos-query - https://phabricator.wikimedia.org/T347430#10221464 (CDanis) @Ahoelzl why was this moved to "Radar (External Teams)" column? Per @BTullis's post, I think this w...
[14:15:53] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10221514 (Ottomata) I just [[ https://issues.apache.org/jira/browse/MAPREDUCE-7331?focusedCommentId=17888669&page...
[14:21:29] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Some hours of webrequest are not refined entirely - https://phabricator.wikimedia.org/T376882#10221519 (Ottomata) > the issue from this ticket started way before on Sept 9th as per T376882#10217047. Yes, bu...
[14:24:27] (PS1) Btullis: Update the smtp server settings for email from refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1079529 (https://phabricator.wikimedia.org/T325394)
[14:37:05] Data-Engineering, Temporary accounts, Event-Platform: Prepare EventBus for temp accounts - https://phabricator.wikimedia.org/T374811#10221586 (gmodena) Hey @Ottomata >> EventFactory.php (deprecated since 0.5.0). > The only implication I can think of is that the to-be-deprecated event streams, e...
[15:08:02] Data-Engineering, Temporary accounts, Event-Platform: Prepare EventBus for temp accounts - https://phabricator.wikimedia.org/T374811#10221665 (Ottomata) > I’d prefer consumers transition away from legacy streams rather than adding features to a deprecated code path/streams. +1 > FWIW I could not fin...
[15:10:37] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10221670 (Ottomata)
[15:17:45] Data-Engineering, Data Pipelines: [Airflow Migration] Migrate reportupdater jobs - https://phabricator.wikimedia.org/T307540#10221684 (mforns) a:mforns→None
[15:21:41] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: Fail Spark job or airflow task if unexpected number of output files - https://phabricator.wikimedia.org/T377006 (Ottomata) NEW
[15:34:20] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010 (Prototyperspective) NEW
[15:48:48] Data-Engineering, Data Pipelines: Add support for repository artifacts in Airflow - https://phabricator.wikimedia.org/T322690#10221865 (mforns) Oh my. Just realized I've been neglecting this task for months. Sorry for that. @Ottomata > @mforns in this workflow_utils MR, @amastilovic and I are consideri...
[16:03:42] Data-Engineering, Data Pipelines: Add support for repository artifacts in Airflow - https://phabricator.wikimedia.org/T322690#10221917 (Ottomata) Quick comment, but let's find time to discuss further! IIRC, the original intention was: - An Artifact has a Source and multiple Caches. - Sources and Caches...
[16:06:12] Data-Engineering (Sprint 5), Data-Platform-SRE: Add a spark global config for better file commit strategy - https://phabricator.wikimedia.org/T351388#10221919 (Ottomata) Declined→Open Let's reopen this then.
[16:06:18] Data-Engineering (Sprint 5), Data-Platform-SRE: Globally configure spark to use fileoutputcommitter.algorithm.version=1 to avoid concurrent write issues - https://phabricator.wikimedia.org/T351388#10221921 (Ottomata)
[16:10:16] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10221949 (Ottomata)
[16:10:18] Data-Engineering (Sprint 5), Data-Platform-SRE: Globally configure spark to use fileoutputcommitter.algorithm.version=1 to avoid concurrent write issues - https://phabricator.wikimedia.org/T351388#10221950 (Ottomata)
[16:10:28] Data-Engineering (Sprint 5), Data-Platform-SRE: Globally configure spark to use fileoutputcommitter.algorithm.version=2 to avoid concurrent write issues - https://phabricator.wikimedia.org/T351388#10221953 (Ottomata)
[16:15:10] Data-Engineering, Data-Engineering-Jupyter, CAS-SSO, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Allow login to JupyterHub via CAS - https://phabricator.wikimedia.org/T260386#10222001 (BTullis) I have written a proposal document (restricted to WMF) about how to im...
[16:21:55] Data-Engineering, Data-Engineering-Jupyter, CAS-SSO, Infrastructure-Foundations, Data-Platform-SRE (2024.09.28 - 2024.10.18): Improve the JupyterHub services and use CAS/SSO - https://phabricator.wikimedia.org/T260386#10222015 (BTullis)
[16:28:31] Data-Engineering, Event-Platform: Add CI step to event schema repositories to test to fail if a schema is deleted - https://phabricator.wikimedia.org/T377023 (Ottomata) NEW
[16:39:53] Data-Engineering, Event-Platform: Add CI step to event schema repositories to test to fail if a schema is deleted - https://phabricator.wikimedia.org/T377023#10222130 (xcollazo) +1 to do it at Gitlab CI time, as it would be a proactive, rather than reactive, measure.
[18:01:12] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010#10222394 (rook) It's been awhile since I've looked at that code. When I worked on it it was to have the stopped status appear when someone manually presses the "stop" button, I thought I added it just for that, but maybe it...
[18:03:46] Quarry: [bug] Quarry queries are stopped - https://phabricator.wikimedia.org/T377010#10222415 (Prototyperspective) @rook Yes, I did not press the stop button for any of the queries that were stopped and it only displays the above two lines and not any further info like some error code. Other example: https:/...
[18:50:39] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10222517 (Ottomata) @JAllemandou I started backfilling webrequest_actor_metrics_hourly but got a little confu...
[19:22:12] (CR) JHathaway: [C:+1] Update the smtp server settings for email from refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1079529 (https://phabricator.wikimedia.org/T325394) (owner: Btullis)
[19:42:12] Data-Engineering, Data-Platform, Temporary accounts: Add MW table 'cu_log' to data lake - https://phabricator.wikimedia.org/T364398#10222632 (mpopov) **Update**: @Ahoelzl is going to sync with DPE about sqoop and will follow up after the group is aligned on making more MariaDB wiki replica data avail...
[19:58:39] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10222640 (Ottomata) @xcollazo helped me. I was using the Airflow CLI, and thought the right command was airfl...
[20:01:46] Data-Engineering: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565#10222642 (Mayakp.wiki) Unique devices could also be made more reliable from better bot detection T373630
[21:17:03] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10222778 (xcollazo) OpsWeek note: We are getting the following failures: ` FAIL: refinery-drop-webrequest-ra...
[21:21:49] Data-Engineering, Event-Platform, Web Team Essential Work 2024 (Migrate to new Event Platform), Web-Team-Backlog (FY2024-25 Q2 Sprint 1): Deprecate use of desktop- and mobilewebuiactions in Event Platform - https://phabricator.wikimedia.org/T368678#10222793 (KSarabia-WMF) **Quick Update:** [[ htt...
[21:26:36] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE, Movement-Insights: 2024-10-10 Data Loss Incident - webrequest Hive table - https://phabricator.wikimedia.org/T376882#10222815 (xcollazo) I generated the new (temporary) checksum via: ` sudo -u hdfs bash cd /srv/deployment/anal...