[00:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[04:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[06:54:45] joal: looking into this, sorry yesterday was a busy day
[06:55:29] I've seen from https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems that this could be caused by non-existing series but trying to get data for `sum by (cluster, instance, site) (irate(haproxykafka_saturation_errors{channel='kafka-msg'}[5m]))` results in a correct value(s)
[06:55:47] so I'm a little puzzled on why this wouldn't work
[07:20:10] Hi fabfur
[07:38:25] I must say that is puzzling indeed :S
[07:49:24] is that against the correct prometheus instance?
[07:50:27] fabfur: I don't see data for it in ops/codfw
[07:50:55] or the other reported DCs
[07:51:31] but I see it in ops/esams for example
[07:52:04] that's strange
[08:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[08:58:10] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10829003 (Gehel) p:Triage→Medium
[08:58:49] Data-Engineering, Data-Platform-SRE, Data-Services: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10829005 (Gehel) @EBernhardson do you know enough about JsonConfig (and maybe charts) to validate Ben's analysis above?
[09:00:40] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE: Provide tooling to instantiate ad-hoc temporary Airflow DEV environments - https://phabricator.wikimedia.org/T393521#10829011 (Gehel) p:Triage→Medium
[09:01:16] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.05.02 - 2025.05.23): Remove `analytics` instance folder in airflow repo - https://phabricator.wikimedia.org/T394015#10829016 (Gehel)
[09:01:19] Data-Engineering, Technical-blog-posts, Data-Platform-SRE (2025.05.02 - 2025.05.23): Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#10829020 (Gehel)
[09:01:35] Data-Engineering, Data-Services, Data-Platform-SRE (2025.05.02 - 2025.05.23): Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10829021 (Gehel)
[09:22:30] joal: I've silenced the alert while I investigate
[09:48:04] ack fabfur - thanks a lot for your work on this
[14:12:56] Data-Engineering (Q4 2025 April 1st - June 30th): Modify scap config files so that we pull artifacts from main rather than deprecated analytics config - https://phabricator.wikimedia.org/T394343#10830099 (xcollazo)
[15:07:26] Data-Engineering, MW-Interfaces-Team, observability: [Needs grooming] Turnilo: include authentication status in request data cube - https://phabricator.wikimedia.org/T332864#10830349 (Milimetric) This is currently in "Backlog" on the #data-engineering board, along with over 100 other tasks. To me it...
[15:10:05] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830355 (Milimetric) p:Low→High
[15:27:25] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830415 (mforns) Looking into this.
[15:41:44] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830467 (Vgutierrez) it looks like not only WMF-Last-Access-Global is impacted by this: ` vgutierrez@carrot:~$ curl -v "ht...
[15:44:18] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830494 (Vgutierrez) Varnish seems to be rewriting the host header from commons.wikimedia.org to en.wikipedia.org: ` - R...
[15:45:06] Data-Engineering (Q4 2025 April 1st - June 30th), Cassandra, Patch-For-Review: Audit and update AQS Cassandra roles & grants - https://phabricator.wikimedia.org/T313877#10830496 (Eevans) >>! In T313877#10826959, @JAllemandou wrote: > Hi @Eevans, > * I've checked with @mforns +, we can safely delete t...
[15:49:52] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830521 (Vgutierrez) this is triggered by the following VCL logic: ` # normalize all /static to the same hostname...
[15:49:54] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830522 (mforns) Thank you @Vgutierrez! It makes sense that the issue is not in the uniques code, since the Cookie request...
[15:57:45] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830533 (mforns) > this is triggered by the following VCL logic: > ` > # normalize all /static to the same hostname for ca...
[15:59:04] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830535 (Vgutierrez) >>! In T367346#10830533, @mforns wrote: >> this is triggered by the following VCL logic: >> ` >> # no...
[16:12:27] Data-Engineering (Q4 2025 April 1st - June 30th), Cassandra, Patch-For-Review: Audit and update AQS Cassandra roles & grants - https://phabricator.wikimedia.org/T313877#10830568 (Eevans) In addition to [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/1147026 | r1147026 ]] we'll also need to ap...
[16:13:46] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830573 (mforns) Oh, thanks a lot for finding this @Vgutierrez! So, we know the root cause. But it seems that, if we fixe...
[16:19:26] Data-Engineering, Infrastructure-Foundations, Traffic: WMF-Last-Access-Global cookie set on wrong domain when accessing static assets - https://phabricator.wikimedia.org/T367346#10830580 (Vgutierrez) >>! In T367346#10830573, @mforns wrote: > Oh, thanks a lot for finding this @Vgutierrez! > > So, we...
[16:21:24] Data-Engineering, Data-Services, Data-Platform-SRE (2025.05.02 - 2025.05.23): Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10830585 (EBernhardson) I don't know a ton about this, but i took a look at it. A few thoughts: * `x1` doesn't really mean i...
[17:14:26] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10830705 (mforns) I think we can make use of the existing methods: `k8s_proof_url()` or `is_running_in_kubernetes()`. @brouberol, usually wh...
[17:14:54] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Support for 4.3.11 - webrequest based scraping detection - https://phabricator.wikimedia.org/T388721#10830707 (Ahoelzl) a:mforns→None
[17:39:07] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10830812 (brouberol) @mforns we //should// have an entry in the service mesh for datahub. We did cut corners at the time, and circumvented the...
[17:44:56] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: Enable Spark data lineage for all Airflow instances - https://phabricator.wikimedia.org/T386862#10830824 (tchin) I think the solution is to make the code aware of both endpoints, and then pick the correct one inside the `SparkSubmitOpera...
[18:34:54] Analytics-Radar, Data-Engineering, Data-Engineering-Icebox, SRE, and 4 others: Requests for /static get an invalid WMF-Last-Access cookie for wikipedia.org on non-Wikipedia requests - https://phabricator.wikimedia.org/T261803#10831043 (Krinkle)
[18:49:50] Data-Engineering (Q4 2025 April 1st - June 30th): Spike on choosing a solution for DagProperties - https://phabricator.wikimedia.org/T394541 (mforns) NEW
[19:55:21] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494#10831285 (xcollazo) Ok tests are looking good as per https://gitlab.wikimedia.org/repos/data-engineering/air...
[20:16:12] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Add data quality metrics to mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T392494#10831335 (xcollazo) Open→In progress
[21:30:13] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, MediaWiki-Platform-Team, serviceops: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10831569 (Scott_French)