[01:38:04] Data-Engineering, Movement-Insights, Wmfdata-Python, GitLab (Project Migration): Move Wmfdata-Python from Github to Gitlab - https://phabricator.wikimedia.org/T304544#10248839 (nshahquinn-wmf)
[01:41:59] Data-Engineering, Movement-Insights, Wmfdata-Python, GitLab (Project Migration): Move Wmfdata-Python from Github to Gitlab - https://phabricator.wikimedia.org/T304544#10248853 (nshahquinn-wmf) @fkaelin is interested in co-hosting a Wmfdata work session during the Research and Data Science offsite...
[06:42:09] (Abandoned) Aqu: Update Refine smtp server - backport to 0.2.49 [analytics/refinery/source] (0.2.49) - https://gerrit.wikimedia.org/r/1081914 (https://phabricator.wikimedia.org/T377698) (owner: Aqu)
[09:00:01] Data-Engineering (Q2 2024 October 1st - December 31th), Data-Platform-SRE: Refine jobs fail to send alert emails due to a decommissioned MX - https://phabricator.wikimedia.org/T377698#10249268 (gmodena)
[09:02:19] Data-Engineering, Proton, Recommendation-API, Event-Platform, events: WikiKube: Rename the last few "production" named helm releases to use "main" instead - https://phabricator.wikimedia.org/T377805 (akosiaris) NEW
[09:03:05] Data-Engineering, Proton, Recommendation-API, serviceops, and 2 others: WikiKube: Rename the last few "production" named helm releases to use "main" instead - https://phabricator.wikimedia.org/T377805#10249287 (akosiaris)
[09:09:06] Data-Engineering, Proton, Recommendation-API, serviceops, and 2 others: WikiKube: Rename the last few "production" named helm releases to use "main" instead - https://phabricator.wikimedia.org/T377805#10249299 (akosiaris) p:Triage→Medium
[09:57:27] Data-Engineering, Dumps 2.0 (Kanban Board), Event-Platform: Update eventutilities_python wrappers to support Flink 1.20 - https://phabricator.wikimedia.org/T374359#10249380 (gmodena)
[10:01:14] Data-Engineering, Dumps 2.0 (Kanban Board), Event-Platform: Update eventutilities_python wrappers to support Flink 1.20 - https://phabricator.wikimedia.org/T374359#10249391 (gmodena) Quick update on this. I build the wrappers locally, using a SNAPSHOT of wikimedia-event-utilities that bundles Flin...
[10:01:26] Data-Engineering, Dumps 2.0 (Kanban Board), Event-Platform: Update eventutilities_python wrappers to support Flink 1.20 - https://phabricator.wikimedia.org/T374359#10249403 (gmodena)
[10:05:02] Data-Engineering, Discovery-Search (Current work), Epic, Event-Platform, Patch-For-Review: EPIC: Update flink jobs to support Flink 1.19 - https://phabricator.wikimedia.org/T376812#10249407 (gmodena) >>! In T376812#10225722, @dcausse wrote: >>>! In T376812#10225610, @gmodena wrote: >> @dcauss...
[10:32:33] Data-Engineering-Icebox: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712#10249513 (elukey) Open→Declined
[10:32:48] Data-Engineering, Event-Platform: Create EventStream's equivalent to irc.wikimedia.org's #central channel - https://phabricator.wikimedia.org/T240182#10249516 (elukey) Open→Declined
[10:42:56] Analytics, Data-Engineering, Event-Platform: Port architecture of irc-recentchanges to Kafka - https://phabricator.wikimedia.org/T234234#10249536 (elukey) Open→Resolved a:elukey To keep archives happy - in T376014 we moved irc.wikimedia.org's backend to https://github.com/paravoid/ircs...
[11:06:36] Hi analytics team! ollie_wmde and I are trying to recreate the wikidata map today (following https://github.com/wmde/wikidata-map) and want to check we're doing everything right. We want to publish the dataset so it can then be ingested by the shiny visualisation tool. As I understand it this data is clearly in the tier 3 category (it's just data from public wikidata). Do we need to fill in the google form mentioned on wikitech?
[11:07:03] from what I can see it's only available to the foundation gsuite users
[11:08:01] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08), Discovery-Search (Current work): Unable to find ingested tables in datahub - https://phabricator.wikimedia.org/T376657#10249674 (BTullis) >>! In T376657#10242489, @EBernhardson wrote: > Curious, i can't say what changed but today I'm gett...
[11:11:05] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08), Discovery-Search (Current work): Unable to find ingested tables in datahub - https://phabricator.wikimedia.org/T376657#10249676 (BTullis) >>! In T376657#10234102, @brouberol wrote: > Actually, this [daily DAG](https://airflow-test-k8s.wik...
[11:24:14] tarrow: You could export the data on wmcloud, to be absolutely sure it's all public I suppose?
[11:26:22] awight: We're happy to have it just live in the exports. Just trying to be good and follow https://wikitech.wikimedia.org/wiki/Data_Platform/Web_publication but we can't do the final step
[11:27:15] (we're also realising that some analytics clients e.g. 1011 are missing `one-off` in /srv/published/)
[11:30:32] awight: actually, maybe you're already answering what we were going to follow up with at the end: could we have run this somewhere else? Is there a hadoop on cloud? I did look and didn't see one
[11:34:03] tarrow: Thanks, the private form is new to me as well (instruction added in March). +1 there seems to be a gap in supporting external orgs through these legal reviews.
[11:35:00] Hadoop is only available on analytics clients, you're right.
[11:38:30] I can see that if there was a private-data free replica of hadoop on cloud that would save a lot of "worry" about making sure that nothing slips by. I could imagine a fair few analytics tasks actually only care about the content of projects and don't really need to be run in such a privileged environment if there was the compute etc. somewhere else
[11:39:52] (CR) Aqu: [V:+2 C:+2] Event deduplication via windowing (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1080306 (https://phabricator.wikimedia.org/T369845) (owner: Aqu)
[11:43:17] Starting build #22 for job analytics-refinery-maven-release
[11:43:49] (CR) Aqu: [C:+2] Add refinery-source jars for v0.2.49.2 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1082083 (owner: Maven-release-user)
[11:43:52] (CR) Aqu: [V:+2 C:+2] Add refinery-source jars for v0.2.49.2 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1082083 (owner: Maven-release-user)
[12:04:00] Project analytics-refinery-maven-release build #22: SUCCESS in 20 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/22/
[12:13:43] Starting build #20 for job analytics-refinery-update-jars
[12:15:22] (PS1) Maven-release-user: Add refinery-source jars for v0.2.53 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1082199
[12:15:22] Project analytics-refinery-update-jars build #20: SUCCESS in 1 min 38 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/20/
[12:42:30] tarrow: I agree, and I've had the same need myself for maybe 50% of my hadoop tasks. But I'm sure the tradeoffs have been considered!
[12:45:39] tarrow: btw, you can ping the team with your original question by using the alias given in the channel topic, or bearloga who added the form...
[12:46:31] yeah, I was kinda waiting a few hours for people in the americas to wake up
[12:47:04] (CR) Aqu: [C:+2] Add refinery-source jars for v0.2.53 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1082199 (owner: Maven-release-user)
[12:47:08] (CR) Aqu: [V:+2 C:+2] Add refinery-source jars for v0.2.53 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/1082199 (owner: Maven-release-user)
[12:49:56] !log about to deploy analytics/refinery with refinery/source 0.2.49.2 & 0.2.53
[12:49:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:57:10] awight: tarrow real quick, your need is understood, but it has never been prioritized
[12:57:30] https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Public_Data_Lake was an older idea (we have newer ones now) but it was canned years ago
[12:57:36] tarrow: what's your role?
[12:57:50] (i have meetings starting so will be a little afk)
[12:58:58] ottomata: wmde software engineer :)
[12:59:44] ottomata: nothing urgent needed from you; we are able to do what we want (need?) to do. I just wanted to also follow the instructions to the letter and couldn't due to no permissions for the g-form
[13:03:26] ottomata: yes just to highlight tarrow's original question which I've muddied, the "web publication" guidelines now include a final step of filling out the "data publication log form" but the permissions are set to WMF org-only it seems.
[13:03:44] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08), Discovery-Search (Current work): Unable to find ingested tables in datahub - https://phabricator.wikimedia.org/T376657#10250091 (BTullis) Open→Resolved
[13:12:16] indeed!
[13:13:39] pinging you in #data-engineering-collab in slack and pinging appropriate folks
[13:32:15] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10250191 (Ottomata)
[13:34:40] Data-Engineering, Discovery-Search (Current work), Epic, Event-Platform, Patch-For-Review: EPIC: Update flink jobs to support Flink 1.19 - https://phabricator.wikimedia.org/T376812#10250202 (dcausse) >>! In T376812#10249407, @gmodena wrote: >>>! In T376812#10225722, @dcausse wrote: >>>>! In T...
[13:36:00] Data-Engineering, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Upload an image with flink-k8s-operator version that supports flink 1.20 - https://phabricator.wikimedia.org/T377137#10250212 (dcausse)
[13:36:13] Data-Engineering, Discovery-Search (Current work), Dumps 2.0 (Kanban Board), Event-Platform, Patch-For-Review: Bump eventutilities to support flink 1.20 - https://phabricator.wikimedia.org/T377130#10250214 (dcausse)
[13:36:14] Data-Engineering, Discovery-Search (Current work), Epic, Event-Platform, Patch-For-Review: EPIC: Update flink jobs to support Flink 1.20 - https://phabricator.wikimedia.org/T376812#10250207 (dcausse)
[13:36:54] Data-Engineering, Discovery-Search, Data-Platform-SRE (2024.10.19 - 2024.11.08): Create and distribute a flink base image with flink 1.20.0 - https://phabricator.wikimedia.org/T377134#10250216 (dcausse)
[13:50:35] Starting build #3 for job wikimedia-event-utilities-maven-release
[13:54:40] Project wikimedia-event-utilities-maven-release build #3: SUCCESS in 4 min 5 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release/3/
[14:04:51] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10250346 (Ottomata) Hello! Some questions: - what are your needs for eventual consistency? Do you need to be sure you don’t miss anything? Or if you missed a few...
[14:05:45] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10250351 (Ottomata) It might be helpful to make a little design doc (in this phab ticket is fine) with some requirements for this. Providing proposed solutions is goo...
[14:26:53] ottomata: cheers! I fear I've inadvertently opened a can of worms :P
[14:47:13] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10250617 (Ottomata) Also of interest: Consuming recent updates to Iceberg tables: - https://iceberg.apache.org/docs/nightly/spark-procedures/#examples_13 - https://ww...
[14:47:45] Data-Engineering, Data-Engineering-Wikistats, dark-mode: Dark mode support for stats.wikimedia.org - https://phabricator.wikimedia.org/T370758#10250626 (VirginiaPoundstone) Thank you for this feature request @Diskdance. Wikistats is not in an active enhancement development currently, but we will kee...
[15:02:51] tarrow: are they tasty worms?
[15:05:22] Looks like the link to that hidden Google form was only added earlier this year
[15:05:45] Shame the conversation about it is happening closed off on wmf slack, as I'd like to know what I'm meant to do when I get to that step too 🤣
[15:06:23] And I don't have slack access
[15:15:04] !log Deployed refinery using scap, then deployed onto hdfs
[15:15:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:17:50] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10250827 (Cparle) > what are your needs for eventual consistency? Do you need to be sure you don’t miss anything? Or if you missed a few (like during upstream outages)...
[15:44:00] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10251060 (Cparle) > It might be helpful to make a little design doc (in this phab ticket is fine) with some requirements for this. Providing proposed solutions is good...
[16:11:44] (CR) Mforns: Shift is_redirect_to_pageview upstream to webrequest (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1078733 (https://phabricator.wikimedia.org/T375527) (owner: Milimetric)
[16:19:51] Data-Engineering, Proton, Recommendation-API, serviceops, and 2 others: WikiKube: Rename the last few "production" named helm releases to use "main" instead - https://phabricator.wikimedia.org/T377805#10251202 (Ottomata) Thank you! Please let us know when you plan to do eventgate-*s. IIRC, ther...
[16:31:03] Data-Engineering, Structured-Data-Backlog (Current Work): [L] Track commons deletion requests - https://phabricator.wikimedia.org/T370898#10251265 (Ottomata) > Missing a few is ok so long as we eventually pick them up. Sounds like missing is not okay :) > I don't think that would allow us to get histor...
[16:59:52] haha tarrow indeed!
[17:00:25] addshore: indeed. i'm sorry about the slack thing too.
[17:00:46] addshore: can we get you slack access?
[17:01:55] Is it allowed for a mere volunteer? ;)
[17:05:10] oh perhaps not
[17:05:13] rawr
[17:05:36] addshore: what step are you expecting to get to?
[17:05:45] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10251435 (amastilovic) @brouberol automatic sync from Ceph every 5 minutes might cause some issues with the HDFS synchronizer. We need to think about the scen...
[17:21:35] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10251511 (Ottomata) > automatically pulled and serialized within 5 minutes Not sure if I missed this somewhere, but how will artifact file syncing be trigger...
[17:26:07] Analytics, Data-Engineering-Icebox, cloud-services-team, Data-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251522 (taavi)
[17:27:08] Analytics, Data-Engineering-Icebox, cloud-services-team, Data-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251525 (taavi) Sorry to poke a many years old ticket.. but what still needs t...
[17:38:12] Analytics, Data-Engineering-Icebox, cloud-services-team, Data-Services: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950#10251589 (Ottomata) @taavi many tickets were declined for complexity reasons, bu...
[18:11:29] ottomata: 5. Of https://wikitech.wikimedia.org/wiki/Data_Platform/Web_publication I believe
[18:27:33] addshore: yes but i mean...as a volunteer, how would you be using data in data platform for publication?
[18:37:40] So, tom and Ollie today were essentially following docs I wrote a year ago for publishing the wikidata map. The one thing seemingly that changed is that 5th bullet point
[18:37:59] So, that process is one that I have followed, and now I guess technically can't :D
[18:38:28] Tldr, taking wikidata data (already public) querying it in Hadoop etc, and publishing the result (without being supplemented by any other data).
[18:39:14] Anyway, I'd mostly just assume IAR here and ignore bullet point 5, as I don't think it makes any sense in that case? But who knows, as I can't see the content of the form
[18:44:38] okay, makes sense! just was making sure I didn't misunderstand what you as a volunteer were trying to do. thank you!
[19:56:08] Data-Engineering, Data-Platform-SRE (2024.10.19 - 2024.11.08): Design a suitable DAG deployment method - https://phabricator.wikimedia.org/T368033#10252028 (brouberol) @amastilovic sorry, I wasn't clear. We currently sync the DAGs every 5 minutes with `git-sync` as a stopgap, while the HDFS synchronizer...