[00:06:06] Analytics-Radar, Data-Engineering, Data-Engineering-Icebox, MediaWiki-Action-API, Patch-For-Review: Run ETL for event.mediawiki_api_request into aggregate tables - https://phabricator.wikimedia.org/T137321#10635186 (bd808)
[08:12:40] Data-Engineering, Data-Platform-SRE (2025.03.01 - 2025.03.21), Discovery-Search (2025.03.01 - 2025.03.21): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10635423 (Gehel)
[08:18:58] !log clear APT cache on stat1011 (`sudo apt-get clean`)
[08:18:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:39:51] Data-Engineering, Data-Platform-SRE (2025.03.01 - 2025.03.21), Discovery-Search (2025.03.01 - 2025.03.21): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10635446 (dcausse) [[https://gitlab.wikimedia.org/repos/search-platf...
[08:51:48] brouberol, btullis o/
[08:51:49] https://grafana.wikimedia.org/d/oITUqwKIk/netfilter-connection-tracking?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-percent_filter=20&var-hosts=krb1001:9100&from=now-12h&to=now
[08:52:06] krb1001 is getting saturated again from time to time, it seems presto again
[08:52:18] could you please check if any DE job is causing this?
[08:52:30] not currently ongoing, but I see alerts since 2 AM UTC that flap
[08:58:12] ack, I've asked in the data-engineering channel
[09:04:15] brouberol: ah ok sorry I can try to do it next time, on slack?
[09:07:26] no worries! It's just that the team has more context on the queries themselves
[09:07:30] yep, on slcak
[09:07:32] *slack
[09:14:15] brouberol: FYI we also have ~70-80k requests/hour that return "Server not found in Kerberos database" and those all come from the same 4 combinations you can see in https://etherpad.wikimedia.org/p/volans-tmp2
[09:18:17] hmm, both of these instances are being migrated to kubernetes (an-launcher1002 is an ongoing process, while an-airflow1004) will happen next monday, so I expect these errors will disappear soon
[09:18:30] great! thx
[09:18:35] (we're not seeing them for any instance we've already migrated)
[09:18:45] thanks for flagging!
[09:18:49] unless the jobs that create them are migrated too :D
[09:19:01] I mean still show the error after the migration
[09:19:27] looking at the principals, I think this happens because they connect to postgresql via the an-db1001.eqiad.wmnet hostname
[09:20:02] what I don't fully understand is that I don't think we've setup the PG connection via Kerberos in any way
[09:21:10] so, I'm not 100% sure of the root cause, TBH, but given that the airflow instances running on kubernetes have their own dedicated DB running on k8s itself, I don't think we should be seeing this post migration
[09:21:37] makes sense, thx!
[09:22:06] stevemunene: would you mind keeping volans in the loop when you're done migrating airflow-platform-eng, so we can make sure some of these errors have disappeared?
[09:22:23] Data-Engineering, Data-Platform-SRE (2025.03.01 - 2025.03.21), Discovery-Search (2025.03.01 - 2025.03.21): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10635581 (dcausse) @pfischer reminded me that it's a known issue: T3...
[09:25:19] Data-Engineering, serviceops, Data-Platform-SRE (2025.03.01 - 2025.03.21), Discovery-Search (2025.03.01 - 2025.03.21): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10635587 (JMeybohm) Hi @dcausse , I'll be out for sa...
[09:31:59] Ack, sure will brouberol
[10:47:35] Data-Engineering (Q3 2025 January 1st - March 31th), Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 6 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10635681 (BTullis) >>! In T386255#10634575, @xc...
[13:28:57] Data-Engineering, Commons, Data-Persistence, MediaWiki-File-management, and 4 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#10636159 (zdev) >>! In T28741#10541452, @Ladsgroup wrote: >>>! In T28741#10...
[15:21:52] Data-Engineering (Q3 2025 January 1st - March 31th), Commons-Impact-Metrics, Patch-For-Review: [CIM] Skewed ranking with the top Editors monthly API - https://phabricator.wikimedia.org/T370470#10636520 (mforns) Thanks @eevans! Will do.
[16:39:48] Data-Engineering, Data-Engineering-Radar, ConfirmEdit (CAPTCHA extension), MediaWiki-extensions-EventLogging, FY2024-25 WE4.2.3 CAPTCHA evaluation framework: Send captcha API response data to event logging - https://phabricator.wikimedia.org/T379179#10636882 (kostajh) @Reedy @acooper In `exte...
[17:12:20] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: [HAProxy migration] Compile expected migration delta, switch over plan and communicate - https://phabricator.wikimedia.org/T387750#10637035 (Ahoelzl)
[17:12:42] Data-Engineering (Q3 2025 January 1st - March 31th), DPE HAProxy Migration: [HAProxy migration] Compile expected migration delta, switch over plan and communicate - https://phabricator.wikimedia.org/T387750#10637038 (Ahoelzl) Impact assessment: https://docs.google.com/document/d/1cCSGzLUfVWUHjqG5v5VdLADs...
[17:50:12] Data-Engineering, Commons, Data-Persistence, MediaWiki-File-management, and 4 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#10637163 (Magnus) I am trying to fix my tools to use the new tables, once a...
[19:09:27] Data-Engineering, Commons, Data-Persistence, MediaWiki-File-management, and 4 others: Migrate file tables to a modern layout (image/oldimage; file/filerevision; add primary keys) - https://phabricator.wikimedia.org/T28741#10637363 (Jdforrester-WMF) >>! In T28741#10637163, @Magnus wrote: > I am tr...
[19:11:39] Data-Engineering, tech-decision-forum, Event-Platform: MediaWiki Event Carried State Transfer - Problem Statement - https://phabricator.wikimedia.org/T291120#10637369 (Ottomata)
[19:11:40] Analytics, Data-Engineering, DBA, Event-Platform: Eventually Consistent MediaWiki State Change Events - https://phabricator.wikimedia.org/T120242#10637370 (Ottomata)
[20:05:56] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content: Implement alerting for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T384962#10637502 (tchin) Here's an issue I currently see: the `data_quality_ops.data_quality_alerts` doesn't have a column to put...
[20:20:50] (PS1) TChin: Support inserting ResultKey into DeequVerificationSuiteToDataQualityAlerts [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1127964 (https://phabricator.wikimedia.org/T384962)
[20:24:13] (PS1) TChin: Add columns to data_quality_alerts to support inserting ResultKey [analytics/refinery] - https://gerrit.wikimedia.org/r/1127967 (https://phabricator.wikimedia.org/T384962)
[20:44:43] (CR) Xcollazo: Add columns to data_quality_alerts to support inserting ResultKey (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/1127967 (https://phabricator.wikimedia.org/T384962) (owner: TChin)
[20:48:25] Data-Engineering, Data-Engineering-Radar, DBA, Charts (Sprint 17), and 2 others: Deploy patch-gjlw_namespace_text.sql on x1.commonswiki for JsonConfig - https://phabricator.wikimedia.org/T385917#10637690 (bvibber)
[20:48:26] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Patch-For-Review: Implement alerting for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T384962#10637691 (xcollazo) >For instance if we want to alert on T388439 there isn't a way currently to dif...
[20:48:36] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[20:48:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[20:58:36] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[20:58:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[21:11:36] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[21:11:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[21:16:36] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[21:16:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[23:02:51] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users group for DSantamaria - https://phabricator.wikimedia.org/T388693#10638118 (BCornwall) @ATsay-WMF Do you approve of this? Thanks!