[00:02:03] PROBLEM - statsv Varnishkafka log producer on cp7007 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[00:02:13] PROBLEM - statsv Varnishkafka log producer on cp4037 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[00:02:13] PROBLEM - Webrequests Varnishkafka log producer on cp4038 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[00:02:15] PROBLEM - Webrequests Varnishkafka log producer on cp4052 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[00:02:16] PROBLEM - statsv Varnishkafka log producer on cp4038 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[02:22:56] Data-Engineering, Event-Platform: Bug: event validation error: mediawiki.page-restrictions-change - https://phabricator.wikimedia.org/T390012 (Ottomata) NEW
[02:25:33] Data-Engineering, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013 (Ottomata) NEW
[02:36:05] Data-Engineering, Event-Platform: Bug: event validation error: mediawiki.page-restrictions-change - https://phabricator.wikimedia.org/T390012#10676786 (Ottomata) Hm, actually this looks like it has been happening longer than a week: https://grafana.wikimedia.org/goto/0F4MjOoHR?orgId=1
[08:45:23] Data-Engineering: Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10677289 (JAllemandou) a:JAllemandou
[08:45:57] Data-Engineering, Event-Platform, Patch-For-Review: [Event Platform] Declare webrequest as an Event Platform stream - https://phabricator.wikimedia.org/T314956#10677295 (JAllemandou) It seems this is not happening. Should we decline?
[08:47:25] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: [HAProxy migration] Fix webrequest_frontend deletion job - https://phabricator.wikimedia.org/T387749#10677299 (JAllemandou) Open→Resolved
[08:52:49] Data-Engineering (Q3 2025 January 1st - March 31th): Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10677317 (JAllemandou)
[08:55:15] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration, Patch-For-Review: [HAProxy migration] Compile expected migration delta, switch over plan and communicate - https://phabricator.wikimedia.org/T387750#10677320 (JAllemandou)
[08:55:16] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration, Patch-For-Review: [HAProxy migration] HAProxy and VarnishKafka should produce compatible datasets - https://phabricator.wikimedia.org/T382571#10677321 (JAllemandou)
[08:57:36] Data-Engineering (Q3 2025 January 1st - March 31th): Switch webrequest dataset to feed from HAProxy instead of VarnishKafka - https://phabricator.wikimedia.org/T386177#10677326 (JAllemandou)
[08:57:39] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration, Patch-For-Review: [HAProxy migration] Compile expected migration delta, switch over plan and communicate - https://phabricator.wikimedia.org/T387750#10677327 (JAllemandou)
[09:07:00] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic: Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029 (JAllemandou) NEW
[09:19:27] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031 (BTullis) NEW
[09:19:49] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10677403 (BTullis) a:BTullis
[09:30:44] (PS1) Joal: Update webrequest_frontend validation [analytics/refinery] - https://gerrit.wikimedia.org/r/1131276 (https://phabricator.wikimedia.org/T389797)
[09:45:22] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic: Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677468 (JAllemandou)
[09:57:09] (CR) Aqu: [C:+1] Update webrequest_frontend validation [analytics/refinery] - https://gerrit.wikimedia.org/r/1131276 (https://phabricator.wikimedia.org/T389797) (owner: Joal)
[10:00:13] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic: Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677545 (elukey) Thanks a lot for the heads up! I am checking the Benthos [[ https://gerrit.wikimedia.org/r/plugins/gitiles/opera...
[10:04:57] Data-Engineering (Q3 2025 January 1st - March 31th), Traffic: Migrate Benthos `webrequest_sampled_live` to feed from HAProxy data - https://phabricator.wikimedia.org/T390029#10677578 (JAllemandou) Nice catch @elukey :) This behavior (no `dt`) doesn't exist anymore with HAProxy. However there still are ba...
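The 00:02 alerts at the top of this log report `PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf` (or `webrequest.conf`), meaning that no running process had that argument string on its command line. As a rough illustration only, assuming the alert comes from a check_procs-style plugin that counts processes by matching an argument substring, the logic looks something like the Python sketch below. `read_cmdlines` and `procs_status` are hypothetical helpers, not the monitoring code actually deployed on the cp hosts.

```python
import os
from typing import Iterable, List


def read_cmdlines(proc_root: str = "/proc") -> List[str]:
    """Return the command line of each visible process (Linux /proc layout)."""
    cmdlines: List[str] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are PIDs
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # Arguments are NUL-separated; join them with spaces for matching.
        cmdlines.append(raw.rstrip(b"\0").replace(b"\0", b" ").decode(errors="replace"))
    return cmdlines


def procs_status(cmdlines: Iterable[str], args: str, min_procs: int = 1) -> str:
    """Hypothetical check: count processes whose command line contains `args`.

    Reports CRITICAL when fewer than `min_procs` processes match.
    """
    count = sum(1 for cmd in cmdlines if args in cmd)
    state = "OK" if count >= min_procs else "CRITICAL"
    return f"PROCS {state}: {count} processes with args {args}"
```

On a Linux host, `procs_status(read_cmdlines(), "/usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf")` would flip to the CRITICAL string as soon as the producer process had exited, which is the state the alerts captured.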
[10:19:06] (CR) Joal: [V:+2 C:+2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1131276 (https://phabricator.wikimedia.org/T389797) (owner: Joal)
[10:20:50] !log Deploying refinery (scap + hdfs)
[10:20:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:03:20] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10677777 (BTullis) p:Triage→High
[11:10:24] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10677789 (BTullis) Looking at a random caching proxy host from the list of those affected, we can see that the varnishkafka service was reloaded at 00:00:10 th...
[11:13:30] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10677798 (Fabfur) Thanks for reporting this, @BCornwall is already working on it with https://gitlab.wikimedia.org/repos/sre/varnishkafka/-/merge_requests/5
[11:37:53] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10677884 (BTullis) →Duplicate dup:T389978
[11:38:25] Data-Engineering, Traffic, Data-Platform-SRE (2025.03.22 - 2025.04.11), Patch-For-Review: varnishkafka 1.1.0-5 exits on SIGHUP - https://phabricator.wikimedia.org/T389978#10677885 (BTullis)
[13:52:58] Data-Engineering (Q3 2025 January 1st - March 31th), Patch-For-Review: [Refine DAG Improvement] Add Parameter to Reduce Spark Driver Logs in Skein Log Collection - https://phabricator.wikimedia.org/T381074#10678501 (Antoine_Quhen) Open→In progress Lets rollout progressively. The property file is...
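Ticket T389978 ("varnishkafka 1.1.0-5 exits on SIGHUP") together with the service reload at 00:00:10 accounts for the 00:02 alerts: on POSIX systems the default action for SIGHUP is to terminate the process, so a daemon that installs no handler simply dies when the signal arrives. The Python sketch below shows the common pattern of catching SIGHUP and turning it into a deferred reload. It assumes, as the ticket title suggests, that the reload delivers SIGHUP; `Producer` is a hypothetical stand-in and is neither varnishkafka's own (C) code nor the fix in merge request 5.

```python
import signal
from types import FrameType
from typing import Optional


class Producer:
    """Hypothetical long-running log producer that reloads on SIGHUP.

    Without a handler, SIGHUP's default action terminates the process;
    registering one turns the signal into "reload and keep running".
    """

    def __init__(self, config_path: str) -> None:
        self.config_path = config_path
        self.reload_requested = False
        self.reloads = 0
        # POSIX-only: signal.SIGHUP does not exist on Windows.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum: int, frame: Optional[FrameType]) -> None:
        # Keep the handler minimal: just record that a reload was requested.
        self.reload_requested = True

    def poll(self) -> None:
        """One main-loop iteration: apply a pending reload, if any."""
        if self.reload_requested:
            self.reload_requested = False
            # A real daemon would re-read self.config_path here.
            self.reloads += 1


if __name__ == "__main__":
    p = Producer("/etc/varnishkafka/statsv.conf")
    signal.raise_signal(signal.SIGHUP)  # simulate the signal a reload sends
    p.poll()
    print("still running, reloads =", p.reloads)
```

Setting a flag in the handler and doing the work in the main loop keeps the handler trivial; the alternative of doing nothing at all is exactly the failure mode reported, because the process is then killed by SIGHUP's default action.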
[14:05:10] Data-Engineering, Event-Platform, Patch-For-Review: [Event Platform] Declare webrequest as an Event Platform stream - https://phabricator.wikimedia.org/T314956#10678659 (Ottomata) I still think we should do this, even if we are not doing it now. Having this declared as a stream, even if not fully f...
[14:11:07] Data-Engineering, Data-Platform-SRE (2025.03.22 - 2025.04.11): Multiple varnishkafka service failures - https://phabricator.wikimedia.org/T390031#10678715 (brouberol) Duplicate→Resolved
[14:23:45] Data-Engineering, Cassandra, Commons-Impact-Metrics: Recreate top-based Cassandra tables for Commons Impact Metrics - https://phabricator.wikimedia.org/T374268#10678773 (EChukwukere-WMF) @mforns pls confirm this is ready for testing ? and the changes can be tested in the QA env correct ?
[14:29:37] Data-Engineering: AQS 2.0: follow-up deprecation work - https://phabricator.wikimedia.org/T390065 (Milimetric) NEW
[15:03:32] Data-Engineering (Q3 2025 January 1st - March 31th), Infrastructure-Foundations, netops: Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10679023 (BTullis) Please could someone expedite this, if possible? We still have some alerts that are flag...
[15:37:46] Data-Engineering (Q3 2025 January 1st - March 31th), Infrastructure-Foundations, netops: Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10679226 (JAllemandou) a:JAllemandou
[15:38:54] Data-Engineering (Q3 2025 January 1st - March 31th): Deprecate `webrequest_sampled_128` druid datasource - https://phabricator.wikimedia.org/T385198#10679236 (JAllemandou) a:JAllemandou
[15:58:34] Data-Engineering (Q3 2025 January 1st - March 31th): Deprecate `webrequest_sampled_128` druid datasource - https://phabricator.wikimedia.org/T385198#10679349 (JAllemandou)
[16:04:42] Data-Engineering (Q3 2025 January 1st - March 31th): Deprecate `webrequest_sampled_128` druid datasource - https://phabricator.wikimedia.org/T385198#10679417 (Volans) @JAllemandou thanks for the heads up. I still have live the whole old superset dashboard (and related charts) using the `_128` dataset but AFA...
[16:09:23] Data-Engineering, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013#10679435 (Ottomata) Likely culprit is in this block: https://github.com/wikimedia/mediawiki-extensions-EventBus/b...
[16:10:31] Data-Engineering, Traffic, Data-Platform-SRE (2025.03.22 - 2025.04.11), Patch-For-Review: varnishkafka 1.1.0-5 exits on SIGHUP - https://phabricator.wikimedia.org/T389978#10679443 (BCornwall) In progress→Resolved This appears to be
[16:11:10] Data-Engineering, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013#10679451 (Ottomata) If the errors are only for GlobalVanishJob, then I'd guess that job is setting `$params['reque...
[16:39:22] Data-Engineering, SRE, Traffic-Icebox, MobileFrontend (Tracking): RFC: Remove m-dot subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998#10679643 (Krinkle)
[17:00:55] Data-Engineering (Q3 2025 January 1st - March 31th), Data Pipelines, Observability-Metrics, Essential-Work, and 2 others: Disable Data Platform Engineering generated graphite metrics and dashboards - https://phabricator.wikimedia.org/T372855#10679794 (AndrewTavis_WMDE) Cross posting from https://...
[18:07:50] Data-Engineering: NEW/CHANGE FEATURE REQUEST: Make Event Registration Tool's data available in Data Lake - https://phabricator.wikimedia.org/T389662#10680188 (mpopov) **Quick update**: with the plans to make Event Registration available on more wikis, analytics for that is going to be really hard to do if it...
[18:48:55] Data-Engineering, SRE, Traffic-Icebox, MobileFrontend (Tracking): RFC: Remove m-dot subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998#10680288 (Krinkle)
[18:58:18] Data-Engineering (Q3 2025 January 1st - March 31th), Product-Analytics, Event-Platform: [BUG] new eventgate-wikimedia header enrich config loses client set headers - https://phabricator.wikimedia.org/T387908#10680311 (Ottomata) Ah! There isn't a bug in eventgate-wikimedia. We just forgot that the h...
[19:00:18] Data-Engineering (Q3 2025 January 1st - March 31th), Product-Analytics, Event-Platform: [BUG] new eventgate-wikimedia header enrich config loses client set headers - https://phabricator.wikimedia.org/T387908#10680313 (Ottomata) Hm, the `kaios_app.error` stream also has this data. Let's keep it there...
[19:02:56] Data-Engineering (Q3 2025 January 1st - March 31th), Product-Analytics, Event-Platform: [BUG] eventgate-logging-external drops previously collected http request headers - https://phabricator.wikimedia.org/T387908#10680320 (Ottomata)
[19:09:03] Data-Engineering (Q3 2025 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: [BUG] eventgate-logging-external drops previously collected http request headers - https://phabricator.wikimedia.org/T387908#10680332 (Ottomata) [[ https://wikitech.wikimedia.org/w/index.php?...
[19:22:52] Data-Engineering (Q3 2025 January 1st - March 31th), Product-Analytics, Event-Platform, Patch-For-Review: [BUG] eventgate-logging-external drops previously collected http request headers - https://phabricator.wikimedia.org/T387908#10680398 (Ottomata) Open→In progress p:Triage→High
[21:23:49] Data-Engineering, MW-Interfaces-Team, WMF-JobQueue, Event-Platform: Bug: event validation error: bad mediawiki.job.* meta.request_id field - https://phabricator.wikimedia.org/T390013#10680787 (A_smart_kitten) >>! In T390013#10679451, @Ottomata wrote: > If the errors are only for GlobalVanishJob,...
[23:35:32] Data-Engineering, Data-Engineering-Radar, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Movement-Insights: Unique Devices seasonal trends on small projects - https://phabricator.wikimedia.org/T344381#10681185 (Mayakp.wiki) Open→Resolved a:Mayakp.wiki We can go ahead...