[07:40:21] Analytics, Data-Engineering-Icebox, ContentTranslation, Language-analytics, and 2 others: Special:ContentTranslationStats is slow and getting crowded - https://phabricator.wikimedia.org/T325790#9836602 (Nikerabbit)
[07:44:00] Data-Engineering: Clickstream datasets only reference 'other' link type, no 'link' - https://phabricator.wikimedia.org/T366042 (JAllemandou) NEW
[07:44:06] Data-Engineering, Discovery-Search, Dumps-Generation: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043 (dcausse) NEW
[07:46:06] Data-Engineering (Q4 2024 April 1st - June 30th): Clickstream datasets only reference 'other' link type, no 'link' - https://phabricator.wikimedia.org/T366042#9836629 (JAllemandou) a:JAllemandou
[07:46:20] Data-Engineering (Q4 2024 April 1st - June 30th): Clickstream datasets only reference 'other' link type, no 'link' - https://phabricator.wikimedia.org/T366042#9836627 (JAllemandou)
[07:53:01] !log manually rerun clickstream job for 2024-04 to pick up linktarget data that was not present at the moment it ran automatically (T366042)
[07:53:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:53:05] T366042: Clickstream datasets only reference 'other' link type, no 'link' - https://phabricator.wikimedia.org/T366042
[08:31:47] Data-Engineering, Discovery-Search, Dumps-Generation: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9836704 (BTullis) It might be related to {T325228} and this patch: 1029220: Move dumps::generation::worker::dumper_misc_crons_only role | https://gerrit.wi...
[08:39:47] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9836734 (Gehel)
[08:59:21] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9836824 (Lucas_Werkmeister_WMDE)
[10:00:17] I'd switch an-test-druid1001 to nftables some time today, but it involves a reboot (the old iptables kernel modules can't be unloaded at run time). Since it's a test host it's unlikely to be an issue, but let me know if that would be a bad time
[10:29:27] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837114 (BTullis) I'm looking into this, but I haven't found an exact cause yet. Looking at `wikibase/wikidatawiki` first: I can see...
[10:30:49] btullis: I have created a patch for later review: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1036614
[10:31:04] btullis: woops - should have said "Hi" first - my apologies
[10:31:19] joal: No worries :-)
[10:31:58] FYI, I am on leave for most of this week, so I only have an hour or two now. Happy to review and merge this now, if it helps.
[10:32:15] I am mainly working on T366043 at the moment.
[10:32:16] T366043: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043
[10:32:52] ack btullis - do you need help with this?
[10:32:57] Oh, I see it can't be deployed until after a refinery deploy.
[10:33:22] correct btullis - I pinged you mostly for awareness and verification :)
[10:33:51] btullis: I hope the refinery patches will be deployed today or tomorrow, so maybe we could deploy the puppet one later this week if ok for you
[10:34:51] joal: I'm happy to +1, but as mentioned I will be out for the rest of the week, so it will have to be b.rouberol or s.tevemunene to help deploy at the correct time.
[10:35:59] good for btullis - I'm happy to work on this with b.rouberol or s.tevemunene for review as well if you prefer
[10:36:14] +me - good for *me*
[10:36:31] Thanks <3 - As for help o
[10:37:29] n the current issue, I fear that this might also fall on someone else's shoulders. I'm currently looking into why new files appear on dumpsdata1006, but don't appear to be getting synced to clouddumps1002.
[10:37:54] ack btullis
[10:38:04] I'll add whatever I find to the ticket.
[10:38:48] Ping if help is needed - dumps are complicated and we should be on this together. This message is actually valid for xcollazo as well
[10:39:55] OK, if you have a few minutes then, shall we look together now? Batcave?
[10:40:06] sure, joining!
[11:08:58] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837306 (BTullis) This is the list of jobs that run on this miscellaneous cron job host: - adds-changes...
[11:13:12] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837319 (BTullis) We have a patch to switch the NFS server from dumpsdata1006 to dumpsdata1003. However, it w...
[11:40:52] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837365 (BTullis) I believe that we may be able to fix this by running the following on dumpsdata1006: ` /usr/...
[11:53:07] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837388 (BTullis) A dry-run of that command looks like this: ` dumpsgen@dumpsdata1003:/data/otherdumps/wikibas...
[11:54:13] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837389 (BTullis) I have disabled puppet on dumpsdata1003 and temporarily disabled the `dumps-rsyncer` system...
[11:57:38] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9837392 (BTullis) Running the following command as the `dumpsgen` user, in a screen session, on dumpsdata1006....
[12:06:25] (CR) Ottomata: "> These events require an ad-hoc jsonschema, because" [schemas/event/primary] - https://gerrit.wikimedia.org/r/1036272 (https://phabricator.wikimedia.org/T314956) (owner: Gmodena)
[12:14:26] Data-Engineering: Add page-title to the x_analytics header - https://phabricator.wikimedia.org/T366004#9837456 (Ottomata) This begs a question that @Milimetric and others have discussed for a while: Using webrequests to identify pageviews is error prone and computationally expensive. Could we emit a pagevi...
[12:15:45] Analytics, Data-Engineering, Data-Engineering-Wikistats, Data Products: Pageviews complete dumps have lots of rows with article name = '-' - https://phabricator.wikimedia.org/T365321#9837459 (Ottomata) > but as of now we have not devised a solution for this problem A solution: https://phabri...
[13:27:40] Data-Engineering, Data Products (Data Products Sprint 14), Web-Team-Backlog (FY2023-24 Q4 Sprint 5): Follow-Up Ticket for QA: Validate Sample Rate Adjustments - https://phabricator.wikimedia.org/T365489#9837809 (WDoranWMF)
[14:08:58] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, GitLab (Pipeline Services Migration), Patch-For-Review: Migrate Data Engineering Pipelinelib repos to GitLab - https://phabricator.wikimedia.org/T344730#9838008 (Ottomata) @Snwachukwu, for libraries, we should think about how we w...
[14:59:34] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, GitLab (Pipeline Services Migration), Patch-For-Review: Migrate Data Engineering Pipelinelib repos to GitLab - https://phabricator.wikimedia.org/T344730#9838181 (tchin) >>! In T344730#9838008, @Ottomata wrote: > Can we publish and...
[15:19:21] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, GitLab (Pipeline Services Migration), Patch-For-Review: Migrate Data Engineering Pipelinelib repos to GitLab - https://phabricator.wikimedia.org/T344730#9838248 (Ottomata) Nit: Can we rename https://gitlab.wikimedia.org/repos/da...
[15:33:15] Data-Engineering (Q4 2024 April 1st - June 30th), Event-Platform, GitLab (Pipeline Services Migration), Patch-For-Review: Migrate Data Engineering Pipelinelib repos to GitLab - https://phabricator.wikimedia.org/T344730#9838311 (Ottomata) > Yes, that's actually what I do for service-utils Very c...
[17:35:43] Analytics-Data-Problem, Data-Platform, Movement-Insights: Unique devices per country spikes on wikifunctions - https://phabricator.wikimedia.org/T364872#9839103 (Pcoombe) By far the most shown CentralNotice campaign so far this year (by a factor of about 20) has been Wiki Loves Folklore which ran 1 F...
[17:41:24] (PS2) DCausse: search: add missing lexeme fields [schemas/event/primary] - https://gerrit.wikimedia.org/r/1036566 (https://phabricator.wikimedia.org/T365692)
[18:21:05] Data-Engineering, Data-Platform-SRE, Discovery-Search, Dumps-Generation, Patch-For-Review: Some dumps are not available since mid may 2024 - https://phabricator.wikimedia.org/T366043#9839293 (BTullis) {F54554249, width=80%} It looks like the https://dumps.wikimedia.org/other/categoriesrdf/dai...
[18:26:47] Data-Engineering, Event-Platform: Orchestrate gobblin ingestion task with Airflow and config store. - https://phabricator.wikimedia.org/T361094#9839335 (Ottomata) Gobblin uses ESC to discover streams to ingest. Given that [[ https://phabricator.wikimedia.org/T361853#9791692 | we will not be removing sup...
[18:47:14] Data-Engineering, Data Pipelines: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#9839453 (Ottomata)
[18:47:16] Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review: [Refine refactoring] Extract refine schema management into a dedicated tool - https://phabricator.wikimedia.org/T356762#9839452 (Ottomata)
[18:57:56] Quarry: move testing off blubberoid - https://phabricator.wikimedia.org/T366107 (rook) NEW
[19:21:17] Quarry: tox - https://phabricator.wikimedia.org/T366112 (rook) NEW
[19:22:15] Quarry: tox - https://phabricator.wikimedia.org/T366112#9839688 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/44
[19:32:12] Quarry: move testing off blubberoid - https://phabricator.wikimedia.org/T366107#9839736 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/45
[19:34:27] Quarry: move testing off blubberoid - https://phabricator.wikimedia.org/T366107#9839743 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/45
[19:38:12] Quarry: tox - https://phabricator.wikimedia.org/T366112#9839746 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/44
[19:39:51] Quarry: move testing off blubberoid - https://phabricator.wikimedia.org/T366107#9839748 (rook) Open→Resolved
[19:39:55] Quarry: tox - https://phabricator.wikimedia.org/T366112#9839749 (rook) Open→Resolved
[19:47:14] a-team - dumps are broken, is it already a known issue?
[19:48:38] (also, topic links to phab kanban which is no longer in use...)
[20:12:10] Data-Engineering, Observability-Logging, Traffic, Patch-For-Review: HAProxy should not log information we don't actually need - https://phabricator.wikimedia.org/T365566#9839808 (Fabfur) Open→Resolved
[20:15:08] Quarry: [bug] Access denied for user 'quarry'@'172.16.2.72' (using password: NO) - https://phabricator.wikimedia.org/T365374#9839817 (GTrang) Open→Resolved a:GTrang Bug appears to have been fixed.
[20:18:34] (PS6) Xcollazo: SQL queries that format the base Commons Impact Metrics datasets into the expected shape for the 14 Cassandra tables. [analytics/refinery] - https://gerrit.wikimedia.org/r/1023461 (https://phabricator.wikimedia.org/T358707)
[20:22:16] (CR) Xcollazo: [V:+2 C:+2] "Patch set 6 resolves all issues found in review from Mforns." [analytics/refinery] - https://gerrit.wikimedia.org/r/1023461 (https://phabricator.wikimedia.org/T358707) (owner: Xcollazo)
[21:32:33] Data-Engineering, Movement-Insights, Epic: [Data Quality] Implement basic data quality metrics for Unique Devices datasets - https://phabricator.wikimedia.org/T357833#9840184 (Mayakp.wiki) As a part of SDS1.1.2, I have created a [[ https://docs.google.com/document/d/1d39qFJ26z8RtoS8LAgchPpgutFHMhX1WB...
[22:04:37] (CR) Ebernhardson: [C:+2] search: add missing lexeme fields [schemas/event/primary] - https://gerrit.wikimedia.org/r/1036566 (https://phabricator.wikimedia.org/T365692) (owner: DCausse)
[22:05:13] (Merged) jenkins-bot: search: add missing lexeme fields [schemas/event/primary] - https://gerrit.wikimedia.org/r/1036566 (https://phabricator.wikimedia.org/T365692) (owner: DCausse)
[22:13:47] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting permissions for analytics-privatedata-users (with kerberos) for Mareike Heuer - https://phabricator.wikimedia.org/T364715#9840295 (colewhite)
[22:18:01] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting permissions for analytics-privatedata-users (with kerberos) for Mareike Heuer - https://phabricator.wikimedia.org/T364715#9840315 (colewhite) a:MareikeHeuerWMDE→colewhite Added Data Engineering tag for provisioning...