[03:48:54] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[03:48:54] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[05:51:49] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11666512 (Marostegui)
[05:51:55] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Update imagelinks primary key on wmf production - https://phabricator.wikimedia.org/T415786#11666513 (Marostegui) Open→Resolved Finally all done!
[05:58:51] Data-Engineering, DBA, Schema-change-in-production: Drop cuc_agent & cuc_ip from cu_changes, cule_agent & cule_ip from cu_log_event, and cupe_agent & cupe_ip from cu_private_event on WMF wikis - https://phabricator.wikimedia.org/T418465#11666524 (Marostegui)
[07:00:06] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a data product of IP range to owner/provenance label - https://phabricator.wikimedia.org/T418466#11666555 (KCVelaga_WMF) @JAllemandou @GGoncalves-WMF I put together a initial list on [[ https://docs.google.com/spreadsheets/d/159uXRqEsRsVmFG9jP8a-...
[07:39:58] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-private-users for maxbinderWMF - https://phabricator.wikimedia.org/T417655#11666615 (MoritzMuehlenhoff) Open→Resolved Sounds good. The maxbinderwmf account is now disabled, resolving the task
[07:49:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[07:49:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:10:44] Data-Engineering, Infrastructure-Foundations, Traffic, Patch-For-Review: Export development_network_probe data to Puppet servers for CDN deployment - https://phabricator.wikimedia.org/T402512#11666664 (elukey) I am finally able to query a week worth of IPs from webrequest and dump a txt file on t...
[08:32:55] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06): Reduce noise from HdfsRpcQueueLength alert - https://phabricator.wikimedia.org/T418152#11666735 (Gehel)
[08:34:24] Data-Engineering, Data-Platform-SRE (2026-02-13 - 2026-03-06): Transfer ownership of Watchlist CTR dashboard to Mikhail - https://phabricator.wikimedia.org/T418485#11666739 (Gehel)
[08:38:59] Data-Engineering, cloud-services-team, Data-Services, Data-Platform-SRE (2026-02-13 - 2026-03-06): Drop support for cl_to, cl_collation and il_to from wikireplicas - https://phabricator.wikimedia.org/T417492#11666764 (Gehel)
[08:39:31] Data-Engineering, Data-Engineering-Radar, FR-Tech-Analytics, Data-Platform-SRE (2026-02-13 - 2026-03-06): Create FR Tech Airflow instance - https://phabricator.wikimedia.org/T417213#11666766 (Gehel)
[08:45:10] Data-Engineering, Data-Engineering-Radar, SRE, SRE-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11666786 (Gehel)
[09:08:57] an-worker1168 has Puppet disabled since Feb 3 and the server has consequently been evicted from Puppetdb, please fix this
[09:18:47] btullis: is this a host with a broken RAID array?
[09:31:18] Data-Engineering, Data-Engineering-Radar, SRE, SRE-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11666968 (Jelto)
[09:58:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Research, Event-Platform, Patch-For-Review: Event stream with latest revision HTML & parent revision HTML diff - https://phabricator.wikimedia.org/T360794#11667078 (JMonton-WMF) I had concerns about the `delete` as we are not doing anything...
[10:08:41] FIRING: MediawikiPageContentChangeEnrichAvailability: ...
[10:08:47] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[10:13:41] RESOLVED: MediawikiPageContentChangeEnrichAvailability: ...
[10:13:41] Low percentage of enriched events produced by mw_page_content_change_enrich in codfw - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=codfw%20prometheus/k8s&var-namespace=mw-page-content-change-enrich&var-helm_release=main&var-operator_name=All&var-flink_job_name=mw_page_content_change_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiPageContentChangeEnrichAvailability
[10:29:36] Data-Engineering, Data-Engineering-Radar, SRE, SRE-Access-Requests, and 2 others: Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11667247 (Jelto) Open→Resolved a:Jelto The access should be available in roughly 30 minut...
[11:42:08] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11667548 (BTullis) I've added the puppet definition of the x1 section to `an-redacteddb1001` now. However, the service didn'...
[11:49:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[11:49:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[12:05:42] Data-Engineering, cloud-services-team, Data-Persistence, Data-Platform-SRE, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11667642 (BTullis) I have also fixed up the Icinga checks by manually creating the grants required to carry out the chec...
[12:06:20] Data-Engineering, cloud-services-team, Data-Persistence, Data-Platform-SRE, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11667658 (BTullis) Open→Stalled a:BTullis→None
[12:29:32] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11667797 (GGoncalves-WMF)
[12:30:44] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11667803 (GGoncalves-WMF)
[12:42:48] Data-Engineering, cloud-services-team, Data-Persistence, Data-Platform-SRE, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11667849 (Marostegui) >>! In T407485#11667548, @BTullis wrote: > I've added the puppet definition of the x1 section to `...
[12:43:31] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06): Reduce noise from HdfsRpcQueueLength alert - https://phabricator.wikimedia.org/T418152#11667859 (JAllemandou) a:JAllemandou
[14:06:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Refactor our existing Airflow dags to use EasyDAG & DagProperties - https://phabricator.wikimedia.org/T336738#11668129 (xcollazo) Open→In progress a:xcollazo
[14:16:35] Data-Engineering, cloud-services-team, Data-Persistence, Data-Platform-SRE, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11668200 (BTullis) >>! In T407485#11667849, @Marostegui wrote: >>>! In T407485#11667548, @BTullis wrote: >> This will be...
[14:22:45] Data-Engineering, cloud-services-team, Data-Persistence, Data-Platform-SRE, and 3 others: Set up x1 replication to an-redacteddb1001 - https://phabricator.wikimedia.org/T407485#11668231 (Marostegui) No, I really don't have strong feelings about it. As this is host is owned by your team, I am happ...
[15:07:37] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Movement-Insights: Investigate and repair pageviews and unique devices spike starting in Nov 2025 - https://phabricator.wikimedia.org/T416933#11668506 (OSefu-WMF)
[15:17:47] Data-Engineering, Data-Engineering-Radar, Test Kitchen, Wikidata, Wikidata Analytics: Add rcshowwikidata property to the existing PrefUpdate instrumentation for wmf_raw.mediawiki_user_properties - https://phabricator.wikimedia.org/T418246#11668531 (JVanderhoop-WMF) Thanks to @phuedx for some...
[15:18:26] Data-Engineering, collaboration-services, Dumps-Generation, Phabricator, and 4 others: Should we remove the Phabricator dump? - https://phabricator.wikimedia.org/T417824#11668534 (A_smart_kitten)
[15:24:17] Data-Engineering (Q3 FY25/26 January 1st - March 31th), collaboration-services, Dumps-Generation, Phabricator, and 4 others: Should we remove the Phabricator dump? - https://phabricator.wikimedia.org/T417824#11668591 (xcollazo)
[15:40:50] Data-Engineering, Privacy Engineering, Security-Team, Data-Platform-SRE (2026-02-13 - 2026-03-06), SecTeam-Processed: Privacy review of x1 tables in preparation of adding them to wikireplicas - https://phabricator.wikimedia.org/T415219#11668718 (Gehel)
[15:44:22] Data-Engineering (Q3 FY25/26 January 1st - March 31th): [OpsWeek] Testing on airflow-devenvs can generate false alerts such as SLO misses - https://phabricator.wikimedia.org/T416596#11668744 (Gehel)
[15:49:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[15:49:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[16:29:44] Data-Engineering (Q3 FY25/26 January 1st - March 31th): haproxykafka and varnishkafka sent different uri_paths - https://phabricator.wikimedia.org/T418767#11669032 (Milimetric) link drop for later research: [[ https://docs.google.com/document/d/1cCSGzLUfVWUHjqG5v5VdLADsbzmMklczQ1YG7oghGl8/edit?tab=t.0#headi...
[17:12:02] Data-Engineering, MW-Interfaces-Team, Event-Platform: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#11669245 (AGhirelli-WMF) Hey, Looking at this, I think adding a dedicated `x-mw-render-id` header is the right cal...
[17:15:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Event-Platform: Linter for mediawiki-event-enrichment - https://phabricator.wikimedia.org/T418889 (JMonton-WMF) NEW
[17:39:03] Data-Engineering, Data-Engineering-Wikistats, Pageviews-Anomaly: Sudden traffic increase on 1 November 2025 - https://phabricator.wikimedia.org/T412655#11669380 (GGoncalves-WMF) Hi, we're still investigating this in T416933 and haven't made changes to the underlying data, so the trend you're seeing f...
[19:21:32] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: Deploy turnilo to dse-k8s-eqiad - https://phabricator.wikimedia.org/T416113#11670047 (JAllemandou)
[19:22:19] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Engineering-Radar, Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: Reduce noise from HdfsRpcQueueLength alert - https://phabricator.wikimedia.org/T418152#11670049 (JAllemandou)
[19:22:56] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-02-13 - 2026-03-06), Patch-For-Review: HdfsTotalFilesHeap warning - https://phabricator.wikimedia.org/T418551#11670054 (JAllemandou)
[19:36:24] Data-Engineering, Data-Engineering-Radar, SRE, SRE-Access-Requests, Data-Platform-SRE (2026-02-13 - 2026-03-06): Requesting access to analytics-platform-eng-admins for milimetric - https://phabricator.wikimedia.org/T417906#11670175 (JerryWang-WMF) Approved. Thanks
[19:43:00] Data-Engineering: refine_webrequest_hourly_text.refine_webrequest probably needs more memory - https://phabricator.wikimedia.org/T418552#11670225 (dr0ptp4kt) We're getting more warnings on this DAG and it's causing downstreams to get in trouble. This was precipitated by a Gobblin ingestion failure clogging t...
[19:47:15] Data-Engineering: refine_webrequest_hourly_text.refine_webrequest probably needs more memory - https://phabricator.wikimedia.org/T418552#11670232 (dr0ptp4kt)
[19:49:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[19:49:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[20:05:41] Data-Engineering, MW-Interfaces-Team, Event-Platform: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#11670307 (Ottomata) > Any preference on the final header name? No preference from DPE side, just as long as the na...
[20:06:30] Data-Engineering, Content-Transform-Team, MW-Interfaces-Team, Event-Platform: Expose MediaWiki Parser render_id as a response header in relevant MW REST API endpoints - https://phabricator.wikimedia.org/T418792#11670312 (Ottomata) Adding #content-transform-team in case they have any thoughts.
[20:23:25] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Epic, MW-1.46-notes (1.46.0-wmf.18; 2026-03-03): Roll instrument out to 100% of enwiki - https://phabricator.wikimedia.org/T418385#11670459 (Milimetric) This has been s[[ https://test-kitchen.wikimedia.org/instrument/bot-detection-2026-03 | ched...
[21:02:04] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Create a data product of IP range to owner/provenance label - https://phabricator.wikimedia.org/T418466#11670646 (JAllemandou) a:JAllemandou→None
[21:28:24] Data-Engineering: refine_webrequest_hourly_text.refine_webrequest probably needs more memory - https://phabricator.wikimedia.org/T418552#11670796 (dr0ptp4kt) Bumping the memory to 20GB didn't work: ` yarn logs -appOwner analytics -applicationId application_1764064841637_2026494 ... Container killed by YARN...
[21:39:27] Data-Engineering, Patch-For-Review: refine_webrequest_hourly_text.refine_webrequest probably needs more memory, executors - https://phabricator.wikimedia.org/T418552#11670827 (dr0ptp4kt)
[23:49:13] FIRING: [2x] MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[23:49:13] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[23:51:19] Data-Engineering, Patch-For-Review: refine_webrequest_hourly_text.refine_webrequest probably needs more memory, executors - https://phabricator.wikimedia.org/T418552#11671459 (dr0ptp4kt) 24 GB memory, 128 executors did the trick. Leaving that in place for the moment. A number of downstreams are recoverin...