[06:12:13] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645#10585903 (Marostegui) This is done, so I am going to revert back to RBR
[06:18:23] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop event_variant column from echo_event - https://phabricator.wikimedia.org/T385645#10585909 (Marostegui) Open→Resolved All done
[10:45:20] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Fix HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354#10586458 (JAllemandou) After a talk with @Fabfur , we have agreed on keeping the values HAProxy sends us. Varnish bei...
[10:48:30] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Fix HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354#10586466 (JAllemandou)
[10:54:36] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Take actions on HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354#10586486 (JAllemandou)
[13:00:37] Data-Engineering (Q3 2024 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 7 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10586671 (JAllemandou) We have an airflow job a...
[13:07:14] (CR) Joal: [V:+2 C:+2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1119725 (https://phabricator.wikimedia.org/T386464) (owner: Gerrit maintenance bot)
[13:11:59] (CR) Joal: [V:+2 C:+2] "Merging for next deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/1120172 (https://phabricator.wikimedia.org/T386631) (owner: Gerrit maintenance bot)
[13:34:26] Data-Engineering (Q3 2024 January 1st - March 31th), Traffic, DPE HAProxy Migration: [HAProxy migration] Some 200 requests in VK are logged as 400 in HAProxy - https://phabricator.wikimedia.org/T387451 (JAllemandou) NEW
[13:38:23] Data-Engineering, Traffic: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454 (JAllemandou) NEW
[14:05:36] Data-Engineering, Traffic: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10586980 (Fabfur)
[14:12:08] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10587023 (HCoplin-WMF) @ottomata -- are you able to grant me the LDAP access? I created the phabricator ticket per the instructions, but I'm not sure ho...
[14:16:56] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10587035 (Ottomata) I probably can but it will take me ages to remember how to do it. I just bumped DPE-SRE, they are usually pretty snappy. BTW, you w...
[14:18:56] brouberol, btullis o/
[14:19:13] something is hammering krb1001 from ~11:50 UTC
[14:19:14] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=krb1001&var-datasource=thanos&var-cluster=misc&from=1740656134956&to=1740665502576
[14:19:19] there was also a previous burst in the morning
[14:19:39] I am checking logs but have you changed/added anything today that could explain this?
[14:20:35] huh, that is odd.. I don't remember changing anything, except deploying a new airflow instance in k8s, which runs a couple of jobs an hour
[14:20:46] do we have a trace of the origin of the burst of requests in the logs?
[14:20:52] yeah I am checking
[14:21:24] the bulk seems to be an-presto-related
[14:23:03] root@krb1001:/var/log/kerberos# zgrep an-presto krb5kdc.log | wc -l
[14:23:04] 37320104
[14:23:17] ~50% of the auth requests are from an-prestoXXXX
[14:23:20] Data-Engineering (Q3 2024 January 1st - March 31th), function-evaluator, function-orchestrator, Abstract Wikipedia team (25Q3 (Jan–Mar)), Essential-Work: WF service logging seems to be partially missing - https://phabricator.wikimedia.org/T386972#10587050 (DSantamaria) Open→In progress
[14:24:54] hmm, ok, so that might be coming from some data-engineering activity, but it does not match with any infrastructural change we've made. I can reach out to DE to see if we're running something particularly heavy
[14:26:54] I reached out on slack, #data-engineering-team
[14:37:34] joal is looking at what could be causing the spike of presto queries
[14:47:41] we've identified a large query that basically was reading 2 whole months of webrequest data. It might not explain the whole event but definitely contributed to the load.
[14:54:57] elukey: (paste from slack) "I think I have our culprit - he's done with his queries, and I explained he should be using Spark for the type of stuff he was doing."
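(Editor's note) The `zgrep an-presto krb5kdc.log | wc -l` check above counts log lines that mention a host prefix. A minimal Python sketch of the same tally, which also reports the share of all lines, might look like the following. The function names and the sample lines are hypothetical and only for illustration; nothing here reflects the real layout of `krb5kdc.log`, since the code does a plain substring match just as `zgrep` does with a fixed string.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def share_of_matches(lines: Iterable[str], needle: str) -> Tuple[int, int, float]:
    """Count lines containing `needle`; return (matches, total, share)."""
    matches = 0
    total = 0
    for line in lines:
        total += 1
        if needle in line:
            matches += 1
    # Avoid dividing by zero when the log is empty.
    share = matches / total if total else 0.0
    return matches, total, share


def read_log(path: str) -> Iterator[str]:
    """Yield lines from a plain-text or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        yield from handle


if __name__ == "__main__":
    # Hypothetical sample lines, not real KDC log entries.
    sample = [
        "request from an-presto1001",
        "request from some-other-host",
        "request from an-presto1002",
        "request from yet-another-host",
    ]
    matches, total, share = share_of_matches(sample, "an-presto")
    print(f"{matches} of {total} lines ({share:.0%}) mention an-presto")
```

On a real host, `share_of_matches(read_log(path), "an-presto")` would stream the file line by line rather than loading a multi-gigabyte log into memory.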
[14:55:20] i saw that / is 100% full on krb1001 btw
[14:55:52] Thanks elukey <3
[14:57:12] and brouberol obviously <3
[14:57:32] _tips fedora_
[14:57:46] m'pleasure
[15:03:11] brouberol: thanks, sigh
[15:03:12] 16G krb5kdc.log
[15:03:12] 15G krb5kdc.log.1
[15:03:12] 39G total
[15:03:21] truncated one log volans
[15:03:22] all good
[15:03:36] brouberol: ack thanks!
[15:03:46] :D
[15:13:41] Data-Engineering (Q3 2024 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 7 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10587244 (Cparle) FYI `structured_data.commons_...
[15:21:23] Data-Engineering, Growth-Team, GrowthExperiments, Structured-Data-Backlog, and 2 others: structured_data.commons_entity stuck at 2025-01-20 - https://phabricator.wikimedia.org/T387470 (Cparle) NEW
[15:23:39] Data-Engineering, Growth-Team, GrowthExperiments, Structured-Data-Backlog, and 2 others: structured_data.commons_entity stuck at 2025-01-20 - https://phabricator.wikimedia.org/T387470#10587343 (Cparle)
[16:07:54] Data-Engineering, Traffic, DPE HAProxy Migration: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10587663 (Ahoelzl)
[16:08:07] Data-Engineering, Traffic, DPE HAProxy Migration: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10587664 (Fabfur) Given that it's just 4 bytes more, I think we can add this (I would do after we complete the migration, given that is a change on how we manage t...
[17:32:36] Data-Engineering, Data Pipelines, Data-Catalog: Upgrade to Spark 3.2 to support Spark lineage for Iceberg tables - https://phabricator.wikimedia.org/T378899#10588277 (JAllemandou) The plan is to move to Spark 3.5. We need to sync with SREs to define when we do this (before HAdoop 3, after, same time...)
[17:34:57] Data-Engineering, Data-Platform-SRE, Epic: Upgrade Hadoop to version 3.3.6 and Hive to version 4.0.1 - https://phabricator.wikimedia.org/T379385#10588284 (JAllemandou) This is great @xcollazo :) I know some people still use hive. I guess when we remove it, people will be forced to move to Spark.
[17:53:55] Data-Engineering (Q3 2024 January 1st - March 31th): [HAProxy migration] Take actions on HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354#10588383 (JAllemandou)
[20:32:17] Data-Engineering, Release-Engineering-Team (Radar): Create a GitLab CI/CD Component project for WMF CI/CD templates and components - https://phabricator.wikimedia.org/T382430#10588908 (brennen) > I added Release-Engineering-Team just as an FYI and in case they have advice for us. RelEng, feel free to put...
[20:40:15] Data-Engineering, Traffic, DPE HAProxy Migration: Add HAproxy termination field to webrequest - https://phabricator.wikimedia.org/T387454#10588919 (Fabfur)
[20:53:42] (Abandoned) Xcollazo: Add knc.wikipedia to pageview allowlist [analytics/refinery] - https://gerrit.wikimedia.org/r/1115418 (https://phabricator.wikimedia.org/T385185) (owner: Gerrit maintenance bot)
[23:39:34] Data-Engineering, Data-Persistence, MediaWiki-Page-derived-data, Schema-change: Add page_is_redirect/page_namespace/page_title index - https://phabricator.wikimedia.org/T387537 (tstarling) NEW
[23:50:28] Data-Engineering, Data-Persistence, MediaWiki-Page-derived-data, Schema-change: Add page_is_redirect/page_namespace/page_title index - https://phabricator.wikimedia.org/T387537#10589387 (Pppery) It's also worth pointing out that Special:PrefixIndex can be abused to do this - https://en.wikipedia....