[01:06:43] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Visualizing inconsistencies and reconciles via Superset - https://phabricator.wikimedia.org/T420787#11756622 (xcollazo) ## T420787 — MWCH Data Quality Dashboard: Summary ### What we built [[ https://superset.wikimedia.org/superset/dashboard/757 | A Su...
[04:22:48] FIRING: EventgateLatency: Elevated latency for POST events on eventgate-analytics in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[04:27:48] RESOLVED: EventgateLatency: Elevated latency for POST events on eventgate-analytics in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[05:13:18] Data-Engineering, CheckUser-SuggestedInvestigations, DBA, Product Safety and Integrity, Schema-change-in-production: Drop cusi_case, cusi_signal, and cusi_user tables from wikis where they are unused - https://phabricator.wikimedia.org/T421353#11756816 (Marostegui) p:Triage→Medium The...
[05:47:04] Data-Engineering, Data-Engineering-Radar, DBA, GlobalBlocking, and 2 others: Drop global_block_whitelist from closed wikis - https://phabricator.wikimedia.org/T420525#11756859 (Marostegui) Table not written for a while on the master: ` aawiki -rw-rw---- 1 mysql mysql 80K Dec 5 15:09 /srv/sqldata...
[06:31:01] Data-Engineering, CheckUser-SuggestedInvestigations, DBA, Product Safety and Integrity, Schema-change-in-production: Drop cusi_case, cusi_signal, and cusi_user tables from wikis where they are unused - https://phabricator.wikimedia.org/T421353#11756911 (Marostegui) a:Marostegui
[07:19:36] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Visualizing inconsistencies and reconciles via Superset - https://phabricator.wikimedia.org/T420787#11756947 (JAllemandou) This is awesome work, it will really help building trust in the dataset. Kudos @xcollazo and @APizzata-WMF :)
[08:38:08] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-06 - 2026-03-27), Event-Platform, Patch-For-Review: Increase Max Message Size in Kafka Jumbo to 20MB - https://phabricator.wikimedia.org/T420356#11757051 (Gehel)
[09:22:48] FIRING: EventgateLatency: Elevated latency for POST events on eventgate-analytics-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[09:27:48] RESOLVED: EventgateLatency: Elevated latency for POST events on eventgate-analytics-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[09:49:53] Data-Engineering, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: ERROR AsyncEventQueue: Listener DatahubSparkListener threw an exception - https://phabricator.wikimedia.org/T400207#11757306 (Gehel)
[09:50:52] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Create alert on Airflow scheduler slow down - https://phabricator.wikimedia.org/T411405#11757330 (Gehel)
[09:50:58] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Superset "track job" button leads to broken URL - https://phabricator.wikimedia.org/T410149#11757334 (Gehel)
[09:51:07] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Wikimedia Enterprise, Wikimedia Enterprise - Content Integrity, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Implement an Airflow operator for moving data from point A to B - https://phabricator.wikimedia.org/T405360#11757344...
[09:51:10] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Move the dumps_v1 DAGs from the Airflow test_k8s instance to the main instance - https://phabricator.wikimedia.org/T404084#11757348 (Gehel)
[09:51:32] Data-Engineering, BetaFeatures, cloud-services-team, Data-Services, and 2 others: Create view for betafeatures_user_counts table in wiki replicas - https://phabricator.wikimedia.org/T402145#11757352 (Gehel)
[09:52:08] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data Pipelines, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Airflow dynamic task mapping logs mix up when, on rerun, an id is mapped to a different map_index_template - https://phabricator.wikimedia.org/T408802#11757372 (Gehe...
[09:52:45] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Investigate Gobblin failures - https://phabricator.wikimedia.org/T419436#11757383 (Gehel)
[09:53:15] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Task Tries and Logs for Airflow DAGs sometimes unavailable - https://phabricator.wikimedia.org/T419162#11757389 (Gehel)
[09:54:02] Data-Engineering, Technical-blog-posts, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#11757402 (Gehel)
[09:54:30] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work, Patch-For-Review: Provide an access to MaxMind GeoIP in DSE K8S pods - https://phabricator.wikimedia.org/T405509#11757408 (Gehel)
[09:54:44] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Investigate Gobblin failures - https://phabricator.wikimedia.org/T419436#11757412 (JAllemandou)
[09:54:59] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Analyze SQL queries generating metrics - https://phabricator.wikimedia.org/T420434#11757422 (Gehel)
[09:55:33] Data-Engineering, Test Kitchen, Data-Platform-SRE (2026-03-27 - 2026-04-17): Airflow instance for Experiment Platform - https://phabricator.wikimedia.org/T416709#11757430 (Gehel)
[09:55:39] Data-Engineering, DPE-Mediawiki-Content, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: When wikis cannot be exported due to SiteInfo, don't fail them - https://phabricator.wikimedia.org/T408819#11757434 (Gehel)
[09:55:49] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Do performance testing of a big Hadoop Table hosted by Ceph - https://phabricator.wikimedia.org/T381416#11757436 (Gehel)
[09:56:31] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Deploy turnilo to dse-k8s-eqiad - https://phabricator.wikimedia.org/T416113#11757448 (Gehel)
[09:56:53] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work, Patch-For-Review: Carry out end-user testing of spark on kubernetes - https://phabricator.wikimedia.org/T412925#11757452 (Gehel)
[09:57:26] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE (2026-03-27 - 2026-04-17), Patch-For-Review: Requesting Kerberos access for SCardenas (WMF) - https://phabricator.wikimedia.org/T418664#11757465 (Gehel)
[09:57:32] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Optimize enqueueing of refine_webrequest_hourly pipeline - https://phabricator.wikimedia.org/T419050#11757469 (Gehel)
[09:57:58] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17), Essential-Work: Blunderbuss: Move Hadoop/HDFS XML configuration into Helm deployment chart - https://phabricator.wikimedia.org/T402323#11757481 (Gehel)
[09:58:04] Data-Engineering, cloud-services-team, Data-Persistence, Data-Services, and 3 others: Set up x1 replication to Wiki Replicas - https://phabricator.wikimedia.org/T395881#11757479 (Gehel)
[09:58:14] Data-Engineering, Data-Engineering-Radar, cloud-services-team, Data-Persistence, and 3 others: Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#11757483 (Gehel)
[09:58:24] Data-Engineering, Data-Engineering-Radar, Privacy Engineering, Security-Team, and 2 others: Privacy review of x1 tables in preparation of adding them to wikireplicas - https://phabricator.wikimedia.org/T415219#11757487 (Gehel)
[10:19:14] Data-Engineering (Q3 FY25/26 January 1st - March 31th), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11757591 (TheDJ) where exactly are these file heuristics btw ? I noticed that the docs list - midi twice (I think...
[10:25:42] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Data-Platform-SRE (2026-03-27 - 2026-04-17): Investigate Gobblin failures - https://phabricator.wikimedia.org/T419436#11757616 (JAllemandou) We have experienced failures in the past few days days (March 24, 25, 26, 27). Here's a summary of the detai...
[13:18:23] Data-Engineering, Data-Engineering-Radar, Data-Persistence, DBA, and 4 others: ICU 72 upgrade: `categorylinks` table swap - https://phabricator.wikimedia.org/T419980#11758203 (Raine)
[13:44:38] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11758347 (phuedx) I've filtered out validation errors of this type from [the Eventgate v...
[13:45:22] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11758361 (phuedx) >>! In T421152#11753017, @Manvikesarwani09 wrote: > I have uploaded a...
[13:46:59] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11758371 (phuedx) We're seeing a large number of event validation errors for experiment-...
[13:56:18] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11758393 (Michael) Thank you for investigating this, @phuedx! I'm this to our tracking c...
[14:19:29] Data-Engineering (Q3 FY25/26 January 1st - March 31th), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11758501 (Snwachukwu) > midi twice (I think one of them should be .mid) > tiff twice (one should be .tif) @TheDJ In...
[14:30:02] !log Test Kitchen edge-unique experiments (poll 33866) - adds: none; removes: synth-aa-test-traffic-impact-2, synth-aa-test-traffic-impact-1, synth-aa-test-traffic-impact-3; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD
[14:30:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:41:13] Data-Engineering (Q3 FY25/26 January 1st - March 31th), AQS2.0: Consider updating our heuristics for media type classification in AQS / wikistats - https://phabricator.wikimedia.org/T419882#11758550 (Snwachukwu) > not listing opus > not listing .mpeg and .mpg I did a check on other media class to see wh...
[14:43:33] Data-Engineering, DPE-Mediawiki-Content, Essential-Work: When wikis cannot be exported due to SiteInfo, don't fail them - https://phabricator.wikimedia.org/T408819#11758555 (xcollazo)
[14:44:43] Data-Engineering, DPE-Mediawiki-Content, Essential-Work: When wikis cannot be exported due to SiteInfo, don't fail them - https://phabricator.wikimedia.org/T408819#11758559 (xcollazo) Noting that this has only happened once so far.
[15:13:33] Data-Engineering (Q3 FY25/26 January 1st - March 31th), Patch-For-Review: Optimize metrics computation for the MW Content Pipeline - https://phabricator.wikimedia.org/T401010#11758656 (xcollazo) I am now of the opinion that we should just sunset these metrics: * On {T420787}, we were able to leverage th...
[15:57:53] Data-Engineering, Growth-Team: Investigate empty Constructive edit rate of newer editors (mobile web) - https://phabricator.wikimedia.org/T421514 (Sgs) NEW
[15:58:39] Data-Engineering, Growth-Team: Investigate empty Constructive edit rate of newer editors (mobile web) - https://phabricator.wikimedia.org/T421514#11758823 (Sgs)
[15:58:46] Data-Engineering, Growth-Team: Investigate empty Constructive edit rate of newer editors (mobile web) - https://phabricator.wikimedia.org/T421514#11758826 (Sgs) p:Triage→Medium
[16:22:46] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Traffic referrer analysis - https://phabricator.wikimedia.org/T421516 (Ahoelzl) NEW
[16:26:30] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, Prod-Kubernetes, and 2 others: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11758922 (brouberol) a:brouberol
[16:26:33] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, Prod-Kubernetes, and 2 others: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11758924 (brouberol)
[16:26:35] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, Prod-Kubernetes, and 2 others: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11758927 (brouberol) Open→In progress
[16:36:06] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, Prod-Kubernetes, and 3 others: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11758969 (brouberol) a:brouberol→BTullis
[16:36:13] Data-Engineering, Data-Engineering-Radar, Dumps-Generation, Prod-Kubernetes, and 3 others: mediawiki-dumps-legacy is running without security policy on dse-k8s-eqiad - https://phabricator.wikimedia.org/T419259#11758981 (brouberol)
[17:15:39] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11759097 (phuedx)
[17:15:46] Data-Engineering, Data-Engineering-Radar, Growth-Team, MediaWiki-extensions-WikimediaEvents, and 4 others: Could not hoist data into experiment.subject_id for event - https://phabricator.wikimedia.org/T421152#11759099 (phuedx) a:phuedx
[20:23:48] FIRING: EventgateLatency: Elevated latency for POST events on eventgate-analytics-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[20:25:26] Well well well if that isn’t our good friend fsgroupChangePolicy https://blog.cloudflare.com/one-line-kubernetes-fix-saved-600-hours-a-year/
[20:28:48] RESOLVED: EventgateLatency: Elevated latency for POST events on eventgate-analytics-external in codfw. - https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate - https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?viewPanel=79&orgId=1&var-service=eventgate-analytics-external - https://alerts.wikimedia.org/?q=alertname%3DEventgateLatency
[20:48:46] Data-Engineering, Data-Engineering-Radar, MediaWiki-extensions-EventLogging, Essential-Work, Test Kitchen (Experiment Platform Sprint 21): Migrate "WikiLambda API" instrument to use the Test Kitchen SDK - https://phabricator.wikimedia.org/T415254#11759831 (Sfaci) Posting here some technical d...
[23:33:18] Data-Engineering (Q3 FY25/26 January 1st - March 31th): Visualizing inconsistencies and reconciles via Superset - https://phabricator.wikimedia.org/T420787#11760360 (Ahoelzl) Great work. A few suggestions: - **Top completeness and loss rate numbers** - it should be highlighted that this is across all wikis...