[00:19:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[00:19:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[04:19:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[04:19:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:19:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[08:19:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:38:05] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11676605 (Gehel)
[08:38:17] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11676615 (Gehel) a:Gehel
[09:05:34] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11676689 (Gehel) @Scardenasmolinar : you need to first request production/shell access as documented...
[12:11:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th): [OpsWeek] Testing on airflow-devenvs can generate false alerts such as SLO misses - https://phabricator.wikimedia.org/T416596#11677211 (AndrewTavis_WMDE) WMDE is using the `EmailOperator` in our DAGs a lot for notifying stakeholders that their data is a...
[12:17:28] Data-Engineering: druid_load_webrequest_sampled_live_hourly - https://phabricator.wikimedia.org/T419121 (dr0ptp4kt) NEW
[12:17:52] Data-Engineering: druid_load_webrequest_sampled_live_hourly SerDe error in singular DAG run - https://phabricator.wikimedia.org/T419121#11677231 (dr0ptp4kt)
[12:19:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[12:19:14] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[12:39:48] Data-Engineering: druid_load_webrequest_sampled_live_hourly SerDe error in singular DAG run - https://phabricator.wikimedia.org/T419121#11677329 (dr0ptp4kt)
[13:04:53] Data-Engineering, ChangeProp, EventStreams, MediaWiki-Engineering, and 15 others: Migrate node-based services in production to node22 - https://phabricator.wikimedia.org/T393434#11677382 (Krinkle)
[13:44:11] Data-Engineering: Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11677499 (dr0ptp4kt) We've seen some issues with getting at log data that would help in troubleshooting this sort of thing. @amastilovic noted that https://github.com/apache/airflow/issues...
[13:56:10] Data-Engineering (Q3 FY25/26 January 1st - March 31th): druid_load_webrequest_sampled_live_hourly SerDe error in singular DAG run - https://phabricator.wikimedia.org/T419121#11677540 (Antoine_Quhen) Open→Resolved a:Antoine_Quhen Data cleaned with: `python import json sc = spark.sparkContext...
[14:02:04] Data-Engineering (Q3 FY25/26 January 1st - March 31th): druid_load_webrequest_sampled_live_hourly SerDe error in singular DAG run - https://phabricator.wikimedia.org/T419121#11677563 (dr0ptp4kt) @Antoine_Quhen, 🍪 for you. Well done!
[14:18:23] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Adapt Sqoop for imagelinks schema changes - https://phabricator.wikimedia.org/T416481#11677623 (Snwachukwu) Thank you @Zabe for the explanation. Indeed I used stale data from last sqoop run.
[15:06:01] Data-Engineering, Data-Engineering-Radar, Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#11677830 (cscott) Yeah, sounds good.
[15:30:22] Data-Engineering (Q3 FY25/26 January 1st - March 31th): druid_load_webrequest_sampled_live_hourly SerDe error in singular DAG run - https://phabricator.wikimedia.org/T419121#11677953 (Antoine_Quhen) One step higher in the problem is here: https://gerrit.wikimedia.org/g/operations/puppet/+/f0d57f3f75c39d9...
[15:40:27] PROBLEM - Check if active EventStreams endpoint is delivering messages. on alert1002 is CRITICAL: CRITICAL: No EventStreams message was consumed from https://stream.wikimedia.org/v2/stream/recentchange within 10 seconds. https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams/Administration
[15:40:38] Data-Engineering: Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11678029 (Gehel) Tagging #data-platform-sre for visibility
[15:40:53] Data-Engineering, Data-Platform-SRE (2026-02-13 - 2026-03-06): Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11678030 (Gehel)
[16:19:14] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[16:19:19] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag