[01:03:01] Analytics-Radar, Data-Engineering, Event-Platform, WMF-JobQueue, and 3 others: Queuing jobs is extremely slow - https://phabricator.wikimedia.org/T292048 (matmarex) Open→Resolved (per above)
[06:22:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (Joe) I might be very naive, but wouldn't it make sense to write a prometheus alert that checks if there are no / few messag...
[07:30:43] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (Joe) Specifically I just plotted ` sum(irate(rdkafka_producer_topic_partition_msgs{instance="cp3050:9132", source="webrequ...
[08:01:02] Data-Engineering-Kanban, Data Engineering Planning (Sprint 01), Patch-For-Review: Upgrade to latest PrestoDB and enable iceberg support - https://phabricator.wikimedia.org/T311525 (JAllemandou) Thanks Andrew for the details! Indeed I had messed up my table creation (I have played a bit with compressi...
[08:42:24] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add alert for varnishkafka low/zero messages per second to alertmanager - https://phabricator.wikimedia.org/T300246 (BTullis) Thanks @joe for the insight and clarity. I think you're quite right and I wasn't seeing the wood for all of the tr...
[09:25:50] Data-Engineering-Kanban, Data-Catalog, Data Engineering Planning (Sprint 01), Patch-For-Review: Integrate Superset with DataHub - https://phabricator.wikimedia.org/T306903 (BTullis) >>! In T306903#8061835, @Ottomata wrote: > Idea: > > What about running a second instance of superset on the same...
[09:28:47] (PS2) Joal: [WIP] Update refine to use Iceberg for event_sanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811212 (https://phabricator.wikimedia.org/T311739)
[12:02:51] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add the conftool pooled/depooled status and weight into prometheus for each service - https://phabricator.wikimedia.org/T309189 (BTullis) @Joe has suggested a better alert trigger than the one I had previously used, so we no longer need thi...
[12:05:11] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Add the conftool pooled/depooled status and weight into prometheus for each service - https://phabricator.wikimedia.org/T309189 (BTullis) Open→Declined
[13:51:52] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Upgrade Turnilo - https://phabricator.wikimedia.org/T301990 (BTullis) Open→Resolved
[13:52:12] Data-Engineering, Data-Engineering-Kanban, Patch-For-Review: Upgrade Turnilo - https://phabricator.wikimedia.org/T301990 (BTullis) I'm resolving this ticket because the upgrade is done. There is still an issue but I'll work on that in T308778
[14:50:40] (PS3) Joal: [WIP] Update refine to use Iceberg for event_sanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811212 (https://phabricator.wikimedia.org/T311739)
[14:54:06] (CR) Joal: [WIP] Update refine to use Iceberg for event_sanitize (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/811212 (https://phabricator.wikimedia.org/T311739) (owner: Joal)
[16:04:53] (CR) Mforns: "I reviewed this patch carefully again." [analytics/refinery] - https://gerrit.wikimedia.org/r/694547 (https://phabricator.wikimedia.org/T270433) (owner: Mforns)
[16:06:33] joal, I gave another round of review to the everlasting deletion script changes. I left a couple comments. If you have time, can you please give a quick review? I'm planning to thoroughly test it with all existing jobs.
[18:58:51] I recently deployed a survey to jawiki and I'm wondering how we (Design Research) can consume that data. It's a simple external survey but the initiations/responses data would be useful.
[19:02:28] From what I understood we need to have 'analytics-privatedata-users' and to work with hive.
[19:07:27] We don't have the need for PII or data engineering expertise. So I'm wondering if we could access a cleaned version of that data on Superset.
[21:19:45] Data-Engineering, CheckUser, MW-1.38-notes (1.38.0-wmf.26; 2022-03-14), MW-1.39-notes (1.39.0-wmf.17; 2022-06-20), and 3 others: Update CheckUser for actor and comment table - https://phabricator.wikimedia.org/T233004 (Zabe) There is a problem with AbuseFilter sending non-existing users to the Ch...
[21:29:09] Data-Engineering-Kanban, Security-Team, SecTeam-Processed, Security, Vuln-Infoleak: AQS Cassandra superuser has default password - https://phabricator.wikimedia.org/T311652 (sbassett) p:Triage→Low
[23:25:17] PROBLEM - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:26:27] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The following units failed: eventlogging_to_druid_editattemptstep_hourly.service,eventlogging_to_druid_navigationtiming_hourly.service,eventlogging_to_druid_prefupdate_hourly.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:27:45] PROBLEM - Check unit status of eventlogging_to_druid_editattemptstep_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_editattemptstep_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:30:17] PROBLEM - Check unit status of eventlogging_to_druid_prefupdate_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_prefupdate_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers