[06:59:56] now acked alerts about varnishkafka should be removed too (waiting for puppet run)
[07:06:41] I think that if we want to also remove these https://alerts.wikimedia.org/?q=%40state%3Dactive&q=alertname%3DVarnishkafkaNoMessages we need to edit this https://gerrit.wikimedia.org/r/plugins/gitiles/operations/alerts/+/refs/heads/master/team-data-engineering/varnishkafka.yaml
[07:10:11] I can remove that but also I'd like to merge https://gerrit.wikimedia.org/r/c/operations/alerts/+/1136383 if anyone from DE is available for a small review
[07:13:55] Data-Engineering-Radar, Observability-Logging, Traffic, Patch-For-Review: Shutdown varnishkafka instances - https://phabricator.wikimedia.org/T393772#10825044 (Fabfur)
[07:27:51] Hi fabfur - +1 on the new HAProxyKafka alerts, I'm ok for you to remove the varnishkafka ones :)
[07:28:26] ack!
[07:28:28] thx
[08:48:41] FIRING: AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[08:53:41] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[12:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[13:51:47] Data-Engineering, Machine-Learning-Team, Essential-Work: Make the revert risk predictions datasets available for analysis - https://phabricator.wikimedia.org/T388453#10826318 (JAllemandou) This has been done by @fkaelin. The dataset is maintained by a research-team pipeline, it should only be radar f...
[13:57:53] Data-Engineering (Q4 2025 April 1st - June 30th): [OpsWeek] RefineSanitize fails to send emails - https://phabricator.wikimedia.org/T393202#10826379 (xcollazo) This patch did not help :( @JAllemandou: Do you think it is worth it to continue pursuing this?
[13:59:15] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10826411 (xcollazo) >>! In T393012#10821876, @xcollazo wrote: > set `"spark.sql.iceberg.locality.enabled":"true"` on `wmf_...
[14:02:31] Data-Engineering (Q4 2025 April 1st - June 30th): [OpsWeek] RefineSanitize fails to send emails - https://phabricator.wikimedia.org/T393202#10826462 (JAllemandou) I don't think it is worth it. I hope we'll migrate to AirflowRefine soon enough.
[15:05:39] Data-Engineering (Q4 2025 April 1st - June 30th): [OpsWeek] RefineSanitize fails to send emails - https://phabricator.wikimedia.org/T393202#10826881 (xcollazo) Makes sense. Closing.
[15:06:39] Hi fabfur, we have received alert emails after the patch you deployed earlier today. The main message is: Pint reporter promql/series found problem(s) in /srv/alerts/ops/team-data-engineering_haproxykafka.yaml: prometheus "ops" at http://127.0.0.1:9900/ops didn't have any series for "haproxykafka_saturation_errors" metric in the last 1w
[15:06:44] runbook = https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems
[15:06:47] summary = Linting problems found for HaproxyKafkaDeliveryErrors
[15:07:23] hi joal, I'll look into that soon, I'm currently deploying a change on cp hosts
[15:07:40] no rush fabfur, thank you!
[15:15:00] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, Data Pipelines, and 4 others: Merge ks-Arab and ks-Deva to ks - https://phabricator.wikimedia.org/T314476#10826953 (srishakatux)
[15:56:53] Data-Engineering, Research: Incremental HTML wiki content dataset to support "Who are moderators" - https://phabricator.wikimedia.org/T380874#10827173 (leila)
[16:08:57] Data-Engineering, Data-Engineering-Radar, Growth-Team, GrowthExperiments, and 5 others: mw.track: support for histogram metrics - https://phabricator.wikimedia.org/T383563#10827241 (Michael) @MSantos I see you moved this task to "Needs Input (waiting)" on the #mediawiki-engineering board. Could y...
[16:37:36] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10827411 (xcollazo) Ok, even though there are more avenues to explore, like tuning the HDFS Namenode further, or going dee...
[16:43:09] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10827422 (xcollazo) Ran the following: ` $ hostname -f an-launcher1002.eqiad.wmnet $ sudo -u analytics bash $ kerberos...
[16:44:08] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10827424 (xcollazo) Triggered a manual table maintenance run: https://airflow.wikimedia.org/dags/table_maintenance_iceberg...
[16:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[17:31:13] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content, Patch-For-Review: Enable merge-on-read for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T393012#10827607 (xcollazo) Still to do here: [] Run daily maintenance for a couple days to clear up remain...
[17:34:22] Data-Engineering (Q4 2025 April 1st - June 30th), DPE HAProxy Migration: [HAProxy Migration ]MAY-2025 Delete `webrequest_deprecated` data and DAGs - https://phabricator.wikimedia.org/T391003#10827627 (JAllemandou) →Duplicate dup:T394011
[17:34:26] Data-Engineering (Q4 2025 April 1st - June 30th), Traffic, Patch-For-Review: Clean-up varnishkafka webrequest leftovers in Hadoop-world - https://phabricator.wikimedia.org/T394011#10827630 (JAllemandou)
[17:35:22] Data-Engineering (Q4 2025 April 1st - June 30th), Traffic, Patch-For-Review: Clean-up varnishkafka webrequest leftovers in Hadoop-world - https://phabricator.wikimedia.org/T394011#10827632 (JAllemandou)
[17:48:39] Data-Engineering (Q4 2025 April 1st - June 30th), Traffic, Patch-For-Review: Clean-up varnishkafka webrequest leftovers in Hadoop-world - https://phabricator.wikimedia.org/T394011#10827714 (JAllemandou)
[18:33:53] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Neslihan_Turan_WMDE - https://phabricator.wikimedia.org/T394395#10827898 (BCornwall) Open→In progress p:Triage→Medium a:WMDECyn
[18:34:27] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Neslihan_Turan_WMDE - https://phabricator.wikimedia.org/T394395#10827905 (BCornwall) L3/NDA is indeed valid, but the approval needs to happen still. @WMDECyn, Can you please comment here with your approva...
[18:36:14] Data-Engineering, LDAP-Access-Requests, SRE, Patch-For-Review: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10827909 (BCornwall)
[19:47:58] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab (Experiment Platform Sprint 6): FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10828084 (dr0ptp4kt)
[20:01:59] Data-Engineering (Q4 2025 April 1st - June 30th), Experimentation Lab (Experiment Platform Sprint 6): FY 24-25 SDS 2.4.9 CDN Synthetic Beacon: EventGate & Varnish: update to receive events from beacon event v2 - https://phabricator.wikimedia.org/T391959#10828121 (dr0ptp4kt) An updated version of EventGat...
[20:53:42] FIRING: [4x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem