[07:25:51] I imagine it's the wrong time of day to be asking this question here :D but... I'm looking at creating an initial run of some metrics around technical contributor retention, and I feel like the best place to put some of the data might end up being the data lake. How would I go about asking if this would be "ok" tm...
[07:27:34] This would be a collection of data from public sources of such activity, such as 1) IRC logs from wb-bot collected into a table of timestamp + username + channel; 2) github, gitlab, gerrit, phabricator interactions such as timestamp, ID of object, type of action, username of actor; 3) collected git repo information, i.e. timestamp, commit hash, users / users of commit; 4) mailing list emails + timestamps and email addresses; 5) rss feed authors
[08:06:23] Data-Engineering, Data-Engineering-Radar, Data-Persistence, DBA, and 4 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11736923 (JAllemandou) > The big database import (sqoop) into the Data Lake starts on the first of each month at 05:00. The sqoop...
[08:24:23] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412#11736954 (APizzata-WMF) >but I don't think we should automatically disable the filtering when we reach the valid_unti...
[09:15:17] (CR) A-pizzata: [V:+2] change mapper-weight for localuser, fix mr comment [analytics/refinery] - https://gerrit.wikimedia.org/r/1256302 (https://phabricator.wikimedia.org/T411116) (owner: A-pizzata)
[11:12:50] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform: Fix PyFlink log levels - https://phabricator.wikimedia.org/T419997#11737709 (JMonton-WMF) Here there is an example where a Flink application failed and the main reason for failure was reported as INFO: The log is quite big, and can...
[11:18:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform, Patch-For-Review: PyFlink: Handle messages bigger than max.size - https://phabricator.wikimedia.org/T420448#11737715 (JMonton-WMF) Even with the 19MB check + 1MB or margin, the application has failed due to a big message: https:...
[11:57:30] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Test Kitchen, Technical-Debt: Deprecate and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11737814 (Sfaci) > Update: T417510 is scheduled to go on prod by 19 March for all wikis and wil...
[12:45:06] Data-Engineering, Data-Engineering-Radar, Data-Persistence, DBA, and 4 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11737959 (Raine) >>! In T419980#11735868, @AtUkr wrote: > A mass deletion of categories has just been launched on ruwikinews, with...
[12:48:53] Data-Engineering, Data-Engineering-Radar, Data-Persistence, DBA, and 4 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11737992 (Raine) >>! In T419980#11737958, @Raine wrote: >>>! In T419980#11735868, @AtUkr wrote: >> A mass deletion of categories ha...
[12:58:11] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Test Kitchen, Technical-Debt: Deprecate and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11738020 (Sfaci) Related the above I have also confirmed that there are no validation errors in...
[13:22:56] !log Test Kitchen edge-unique experiments (poll 16571) - adds: logged-out-retention-round4; removes: none; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[13:22:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:34:29] Data-Engineering, Data Pipelines: Add user_central_id to mediawiki_history and mediawiki_history_reduced Hive tables - https://phabricator.wikimedia.org/T365648#11738236 (Ottomata) Open→Resolved This was completed in Fall 2025.
[14:18:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th), MediaWiki-extensions-CentralAuth, MediaWiki-Platform-Team: CentralAuth's localuser table contains many nulls and duplicate mappings - https://phabricator.wikimedia.org/T411116#11738485 (APizzata-WMF) Just ran the following with the watchful eye...
[14:21:43] (CR) Jforrester: "I think you have to self-merge in this repo." [analytics/refinery] - https://gerrit.wikimedia.org/r/1256413 (https://phabricator.wikimedia.org/T420615) (owner: Jforrester)
[15:21:47] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Technical-Debt, Test Kitchen (Experiment Platform Sprint 21): Deprecate and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11738796 (Sfaci)
[15:22:04] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Technical-Debt, Test Kitchen (Experiment Platform Sprint 21): Deprecate and remove mw.eventLog.submitClick() - https://phabricator.wikimedia.org/T415210#11738799 (Sfaci) a:seanleong-WMDE→None
[15:22:27] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Test Kitchen: Logged in reader retention logging - https://phabricator.wikimedia.org/T420621#11738803 (tchin) > we should cover as many wikis as feasible That itself could be its own task, but I'm assuming that 100% sampling on every wiki is *probab...
[15:31:24] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform, Patch-For-Review: PyFlink: Handle messages bigger than max.size - https://phabricator.wikimedia.org/T420448#11738858 (Ottomata) That is quite surprising! I highly doubt that 1MB margin is not enough. Maybe? Maybe there is some...
[17:20:57] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412#11739639 (mforns) >>but I don't think we should automatically disable the filtering when we reach the valid_until dat...
[17:38:12] Data-Engineering: when analyzing a Wikifunctions dump, parent_id in page creation revisions is sometimes 0 and sometimes None - https://phabricator.wikimedia.org/T420974 (Amire80) NEW
[17:54:34] (CR) Mforns: [V:+2 C:+2] pageviews/allowlist: Add Abstract Wikipedia [analytics/refinery] - https://gerrit.wikimedia.org/r/1256413 (https://phabricator.wikimedia.org/T420615) (owner: Jforrester)
[17:54:49] (CR) Mforns: [V:+2 C:+2] "Done" [analytics/refinery] - https://gerrit.wikimedia.org/r/1256413 (https://phabricator.wikimedia.org/T420615) (owner: Jforrester)
[17:56:52] (CR) Mforns: [V:+2 C:+2] "Thank you!" [analytics/refinery] - https://gerrit.wikimedia.org/r/1256413 (https://phabricator.wikimedia.org/T420615) (owner: Jforrester)
[18:46:16] (PS6) Snwachukwu: Extend mediarequest Cassandra loads with poster/plays for video-requests API [analytics/refinery] - https://gerrit.wikimedia.org/r/1250005 (https://phabricator.wikimedia.org/T415202)
[19:22:38] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412#11740365 (JAllemandou) What I like with the `valid_until` field is the possibility to keep old inactive records, in c...
[19:41:16] !log Test Kitchen edge-unique experiments (poll 17690) - adds: none; removes: none; fields: attribution-research-short-baseline-run - xLab/MPIC/TK tips at https://w.wiki/FwuD
[19:41:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:56:18] Data-Engineering: Load Google Search Console data into the Data Lake - https://phabricator.wikimedia.org/T420996 (nshahquinn-wmf) NEW
[20:04:00] Data-Engineering: Load Google Search Console data into the Data Lake - https://phabricator.wikimedia.org/T420996#11740693 (nshahquinn-wmf) While this would be very useful for #movement-insights, from our perspective it's not top priority (unlike, for example, T418032).
[20:08:18] (PS7) Snwachukwu: Extend mediarequest Cassandra loads with poster/plays for video-requests API [analytics/refinery] - https://gerrit.wikimedia.org/r/1250005 (https://phabricator.wikimedia.org/T415202)
[20:08:41] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11740702 (Scott_French)
[20:44:24] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11740929 (Ottomata) Approved.
[20:53:55] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics_privatedata_users and SQL Lab for AnnieKim_WMDE - https://phabricator.wikimedia.org/T420500#11740960 (Scott_French) Open→Resolved a:Scott_French Thanks, all! @AnnieKim_WMDE - Your [[ http...
[21:01:02] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to data and Superset for Daria-WMDE (Daria Ammalainen (WMDE)) - https://phabricator.wikimedia.org/T420716#11740987 (Scott_French) @Daria-WMDE - Great, thank you! Once the NDA comes through, I believe that should be everything we need to en...
[21:05:34] (CR) Mforns: Extend mediarequest Cassandra loads with poster/plays for video-requests API (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/1250005 (https://phabricator.wikimedia.org/T415202) (owner: Snwachukwu)
[21:05:55] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to superset for alice.moutinho - https://phabricator.wikimedia.org/T420751#11741004 (Scott_French) @Alice.moutinho - Great, thank you - I see alicem LDAP account was created. Once the NDA comes through, I believe that should be everything...
[21:13:31] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to Superset for keren.ramirezWMDE - https://phabricator.wikimedia.org/T420896#11741047 (Scott_French)
[21:21:13] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Implement list of JA3N-JA4H pairs to be tagged as automated into the bot detection pipeline - https://phabricator.wikimedia.org/T420412#11741062 (mforns) @JAllemandou Ah! I understand now. Like: - Initially, we set valid_until to NULL. - Then, the day w...
[21:30:16] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research-engineering, Research-Freezer, Event-Platform: Update edit-type flink job with new schema - https://phabricator.wikimedia.org/T421005 (AKhatun_WMF) NEW
[22:19:02] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users level 3 for bvibber - https://phabricator.wikimedia.org/T420406#11741336 (Scott_French) Open→Resolved a:Scott_French Alright, I think that should do it! @bvibber - The c...