[03:52:54] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub - https://phabricator.wikimedia.org/T306896#10076247 (tchin) I ran a job using our regular prod configs just without iceberg tables. It ran successfully and outputted this: `lang=j...
[03:56:14] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub - https://phabricator.wikimedia.org/T306896#10076249 (tchin) It seems like right now, unless we upgrade to at least Spark 3.4 and Iceberg 1.4, we will not be able to use Datahub's s...
[05:23:48] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10076308 (Marostegui) Running this schema change on the old enwiki master (db1184)
[07:29:45] (PS1) KCVelaga: Bug: T372724 [analytics/refinery] - https://gerrit.wikimedia.org/r/1063940 (https://phabricator.wikimedia.org/T372724)
[07:32:49] (CR) KCVelaga: "Stream config: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/%2B/master/wmf-config/ext-EventStreamConfig.php#" [analytics/refinery] - https://gerrit.wikimedia.org/r/1063940 (https://phabricator.wikimedia.org/T372724) (owner: KCVelaga)
[09:16:26] (CR) KCVelaga: Add MP fragment schema for translation workflows (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687) (owner: Mforns)
[09:41:15] Quarry, cloud-services-team (FY2024/2025-Q1-Q2): Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10076746 (fnegri) I created the quarry_readonly user manually because Trove doesn't let me create read-only users: ` MariaDB [(none)]> CREATE USER quarry_readonly@'172.16.%...
[09:42:48] Quarry, cloud-services-team (FY2024/2025-Q1-Q2): Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10076764 (fnegri) The `quarry_p` database was also created manually, but I've also added the schema to `schema.sql` in https://github.com/toolforge/quarry/pull/61.
[09:43:50] Data-Engineering (Q1 2024 July 1st - September 30th), Event-Platform, Patch-For-Review: Rollback haproxy feed automated ingestion - https://phabricator.wikimedia.org/T372456#10076766 (gmodena) > Remove the Gobblin MapReduce job that loads Kafka topics. Removing this gobbling job resulted in this al...
[10:34:26] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform: [BUG] MediawikiPageContentChangeEnrichAvailability is firing - https://phabricator.wikimedia.org/T372768#10076969 (gmodena) I did some investigation both on the affected Kafka topics and related hive `even...
[11:04:24] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform: [BUG] MediawikiPageContentChangeEnrichAvailability is firing - https://phabricator.wikimedia.org/T372768#10076997 (gmodena) > The alert compares the ratio of flink_taskmanager_job_task_operator_event_proce...
[11:42:40] (Abandoned) Nik Gkountas: Show a warning for unused languages with localization over 75% [analytics/wikistats2] - https://gerrit.wikimedia.org/r/921329 (https://phabricator.wikimedia.org/T336752) (owner: Nik Gkountas)
[12:14:03] Quarry: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881 (fnegri) NEW
[12:25:59] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884 (rook) NEW
[12:27:44] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886 (rook) NEW
[12:28:00] Quarry, PAWS: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881#10077284 (rook)
[12:47:46] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886#10077321 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/64
[12:50:00] Quarry: Remove warning message - https://phabricator.wikimedia.org/T372886#10077322 (rook) Open→Resolved a:rook
[14:04:12] !log deployed airflow analytics after slight adjustment to pageview definition T368303
[14:11:48] (PS2) Mforns: Add MP fragment schema for translation workflows [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687)
[14:12:24] (CR) CI reject: [V:-1] Add MP fragment schema for translation workflows [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687) (owner: Mforns)
[14:13:59] (PS3) Mforns: Add MP fragment schema for translation workflows [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687)
[14:14:53] (CR) Mforns: Add MP fragment schema for translation workflows (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687) (owner: Mforns)
[14:25:50] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10077780 (Ottomata)
[14:29:25] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10077794 (BTullis) Looks like we might need to prioritise: {T338057} then?
[14:30:55] Data-Engineering, Data Pipelines, Data-Catalog: Ingest a test hive database into datahub - https://phabricator.wikimedia.org/T372899 (Ottomata) NEW
[14:43:14] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10077861 (tchin) Yeah I think we should prioritize that. I also tested joins and they work: {F57282734,width=100%}
[14:48:02] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub with lineage - https://phabricator.wikimedia.org/T306896#10077893 (xcollazo) >>! In T306896#10077861, @tchin wrote: > Yeah I think we should prioritize that. > > I also tested join...
[15:44:54] Data-Engineering, Data Products, Data-Platform, DBA, Schema-change-in-production: Change page.page_links_updated to fixed-length timestamp in wmf wikis - https://phabricator.wikimedia.org/T371742#10078143 (Ladsgroup) The schema change is changing the field type from `varbinary(14)` to `binary...
[16:20:39] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078379 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/65
[16:26:13] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078440 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/65
[16:26:31] Quarry: Remedy warning in install - https://phabricator.wikimedia.org/T372884#10078441 (rook) Open→Resolved a:rook
[16:30:41] Quarry, PAWS: deploy.sh shows warnings - https://phabricator.wikimedia.org/T372881#10078462 (rook) Open→Resolved
[16:40:23] (CR) Nik Gkountas: [C:+1] "Looks good to me" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1061096 (https://phabricator.wikimedia.org/T369687) (owner: Mforns)
[17:26:46] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to for ifeatu_nnaobi_wmde - https://phabricator.wikimedia.org/T371796#10078703 (Ottomata) I can approve for `analytics-privatedata-users`. Approved!
[17:31:09] Data-Engineering, Data Pipelines, Data-Catalog: Ingest a test hive database into datahub - https://phabricator.wikimedia.org/T372899#10078737 (tchin) What should the databases be called and where should it live? Should we have seperate databases for hive and iceberg tables? `/wmf/data/wmf_test` and `...
[17:38:36] Data-Engineering, Data Pipelines, Data-Catalog: Ingest a test hive database into datahub - https://phabricator.wikimedia.org/T372899#10078758 (Ottomata) > Should we have seperate databases for hive and iceberg tables? I think likely yes? > What should the databases be called? `wmf_test` and `wmf_t...
[17:47:51] Data-Engineering, Data Pipelines, Data-Catalog: Ingest a test hive database into datahub - https://phabricator.wikimedia.org/T372899#10078797 (xcollazo) >>! In T372899#10078758, @Ottomata wrote: >> Should we have seperate databases for hive and iceberg tables? > I think likely yes? Presto doesn't l...
[18:47:36] Data-Engineering, Data Pipelines, Data-Catalog: Ingest a test hive database into datahub - https://phabricator.wikimedia.org/T372899#10079003 (Ottomata) > What about test_sandbox and test_iceberg_sandbox ? Maybe 'test' and 'sandbox' are redundant? How about just `sandbox` and `sandbox_iceberg`? I'...
[19:06:48] (PS1) Aleksandar Mastilovic: Put the whole backfill command to be run in alert email body [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1064071
[19:10:02] (CR) Ottomata: Put the whole backfill command to be run in alert email body (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1064071 (owner: Aleksandar Mastilovic)
[23:25:28] (CR) Aleksandar Mastilovic: ci: migrate to new parent pom (1 comment) [analytics/gobblin-wmf] - https://gerrit.wikimedia.org/r/1062711 (https://phabricator.wikimedia.org/T360219) (owner: Gehel)