[00:03:27] Data-Engineering (Q1 2024 July 1st - September 30th), Product-Analytics, Patch-For-Review: [SPIKE] Experiment with approaches for a incremental updates of MediaWiki data in the Data Lake - https://phabricator.wikimedia.org/T370354#10173808 (Ottomata) Hm, something else to look into: https://github.c...
[00:12:07] Data-Engineering: [SPIKE] Learn and document how to use Flink-CDC from MediaWiki MariaDB locally - https://phabricator.wikimedia.org/T373144#10173827 (Ottomata)
[00:55:35] Analytics-Data-Problem, Design-System-Team, MW-Interfaces-Team, Web-Team-Backlog, JavaScript: Send Api-User-Agent header from MediaWiki client-side code - https://phabricator.wikimedia.org/T373874#10173947 (Jdlrobson)
[03:57:31] Analytics, Data-Engineering, Observability-Logging, SRE, and 2 others: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645#10174094 (Cpetrillo) This would be very useful for us to be able to understand if known problematic reusers (see: https://phabricator.wikimedia....
[07:09:15] (CR) Kosta Harlan: "Sorry, I should have noted in the task -- I believe we are now supposed to use https://gitlab.wikimedia.org/repos/data-engineering/schemas" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1075329 (https://phabricator.wikimedia.org/T356105) (owner: Máté Szabó)
[08:07:02] Data-Engineering, Java-Scala-Standardization, Release-Engineering-Team (Radar): Java projects hosted on Gerrit should publish artifacts to Gitlab - https://phabricator.wikimedia.org/T370400#10174364 (Gehel)
[08:07:08] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are ... - https://phabricator.wikimedia.org/T367405#10174365
[08:07:18] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization, Epic: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#10174366 (Gehel)
[08:07:33] Data-Engineering, Java-Scala-Standardization, Release-Engineering-Team (Radar): Java projects hosted on Gerrit should publish artifacts to Gitlab - https://phabricator.wikimedia.org/T370400#10174369 (Gehel) →Duplicate dup:T367405
[08:07:43] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE, Discovery-Search, Java-Scala-Standardization: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all dependencies are ... - https://phabricator.wikimedia.org/T367405#10174371
[09:37:26] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10174701 (Gehel)
[09:43:52] !log root@an-test-worker1002:/tmp# find *_resources -type f -mtime +60 -exec rm {} \;
[09:43:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:52:35] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10174752 (BTullis) >>! In T347510#10147817, @kostajh...
[09:54:56] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10174759 (kostajh) >>! In T347510#10174752, @BTullis...
[10:13:46] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10174824 (BTullis) @kostajh I have run those queries...
[10:18:21] Data-Engineering, Data-Platform-SRE (2024.09.06 - 2024.09.27), Temporary accounts (Blockers to minor pilot wiki deployment): Generate a list of Superset users affected by changes to IP masking/temp users - https://phabricator.wikimedia.org/T347510#10174832 (BTullis) Please do let me know if there is...
[11:20:20] Data-Engineering, [DEPRECATED] wdwb-tech, Citoid, Content-Transform-Team, and 9 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118#10175060 (akosiaris)
[13:27:20] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 2 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables - https://phabricator.wikimedia.org/T356701#10175441 (fkaelin) From a data/metrics usage perspective, the `user_is_anonymous` field seems t...
[13:31:10] Data-Engineering, Traffic: Cookie % has been rejected because it is foreign and does not have the "Partitioned" attribute - https://phabricator.wikimedia.org/T375256#10175471 (Tgr) This only affects cookies on cross-domain requests. Not sure if that's a problem. The logspam can be prevented by setting th...
[13:45:14] Data-Engineering, Data Pipelines, Data Products, Dumps-Generation, and 2 others: MediaWiki Dumps XML - Provide attribute to indicate that user is temporary account in exported content - https://phabricator.wikimedia.org/T365693#10175611 (xcollazo) Can we please describe the use case for this?
[13:48:56] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341#10175651 (gmodena) Spark supports both streaming as well as batch read/writes to Kafka: https://spark.apache.org/docs/3.5.3/structured-streami...
[13:51:47] (CR) Xcollazo: [C:+2] Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677) (owner: Milimetric)
[13:57:18] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175692 (github-toolforge-bot) supertassu closed https://github.com/toolforge/quarry/pull/70
[13:58:48] (CR) Ottomata: Implement custom jdbc datasource (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677) (owner: Milimetric)
[13:59:18] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175693 (taavi) Open→Resolved a:taavi
[13:59:30] (CR) Ottomata: Implement custom jdbc datasource (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677) (owner: Milimetric)
[14:00:00] Quarry: Quarry login fails due to redirect to plaintext HTTP URL - https://phabricator.wikimedia.org/T361471#10175706 (LucasWerkmeister) Works for me now, thanks \o/
[14:03:24] Analytics, Data-Engineering, Observability-Logging, SRE, and 2 others: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645#10175713 (Ottomata) This would also help with some analysis in {T375146}
[14:04:40] (Merged) jenkins-bot: Implement custom jdbc datasource [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1071624 (https://phabricator.wikimedia.org/T372677) (owner: Milimetric)
[15:54:02] Analytics, Data-Engineering, EventStreams, Wikidata, and 3 others: Expose rdf-streaming-updater.mutation content through EventStreams - https://phabricator.wikimedia.org/T294133#10176252 (dcausse) The work has been broken up into smaller tasks: - T374918: to define the schema that will be used an...
[16:03:11] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 2 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables - https://phabricator.wikimedia.org/T356701#10176296 (Ottomata) @fkaelin in case you haven't seen: https://www.mediawiki.org/wiki/Talk:User...
[16:12:44] Data-Engineering, Dumps 2.0, Event-Platform: [SPIKE] how can we support Spark producer/consumers in Event Platform - https://phabricator.wikimedia.org/T374341#10176322 (Ottomata) > I would like to investigate if this tool can help: https://github.com/databricks-industry-solutions/json2spark-schema. A...
[16:45:41] Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform, Patch-For-Review: Migrate Event Platform Schema Respositories to Gitlab - https://phabricator.wikimedia.org/T366836#10176475 (Snwachukwu) - [[ https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines | Event Platform...
[17:31:02] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 2 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables - https://phabricator.wikimedia.org/T356701#10176856 (fkaelin) @Ottomata thanks for sharing - the intricacies of naming/classifying the use...
[17:35:16] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 2 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables - https://phabricator.wikimedia.org/T356701#10176880 (Ottomata) I think so. > FYI: Discussion about this has moved to phab:T337103. The le...
[20:42:38] Data-Engineering, Research-engineering, Research-Freezer, Event-Platform: [Research Engineering Request] Productionized Edit Types - https://phabricator.wikimedia.org/T351225#10177591 (Ottomata) > Fetch current and parent wikitext from API (or perhaps consume from page-change-based stream that al...
[20:54:55] (PS1) Aqu: Add useragent to eventlogging legacy fragement eventcapsule [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1075636 (https://phabricator.wikimedia.org/T356762)
[20:59:23] (PS2) Aqu: Add useragent to eventlogging legacy fragement eventcapsule [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1075636 (https://phabricator.wikimedia.org/T356762)
[20:59:52] (CR) CI reject: [V:-1] Add useragent to analytics/legacy events [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1075637 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[22:10:05] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board): Implement Airflow Dataset class for RestExternalTaskSensor - https://phabricator.wikimedia.org/T372647#10177878 (amastilovic) OK, so the schema I proposed in the Google Doc looks like this: ` iceberg_wmf_dummy_dataset: dat...
[22:16:58] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board): Implement Airflow Dataset class for RestExternalTaskSensor - https://phabricator.wikimedia.org/T372647#10177893 (amastilovic) = Unification of all the instance-specific datasets.yaml files = Currently, each instance specifie...
[22:35:32] Data-Engineering, Data Products, Data-Platform, Movement-Insights, and 2 others: Temporary Accounts Initiative (IP Masking) - Add user_is_temp to data tables - https://phabricator.wikimedia.org/T356701#10177956 (Mayakp.wiki) @fkaelin : I agree with your point. But our mandate here was to get an i...