[03:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[06:35:51] Data-Engineering (Q4 2025 April 1st - June 30th), Event-Platform, Patch-For-Review: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions - https://phabricator.wikimedia.org/T347282#10990516 (gmodena) Re-opening. We saw a data leak on `mw-page-content-change...
[07:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[07:19:10] https://phabricator.wikimedia.org/T399152 (Requesting access to analytics-privatedata-users for addshore) Not sure if anyone in particular wants to "sponser" me from the WMF side? :D
[09:12:49] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991019 (brouberol) a:brouberol
[09:12:52] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991021 (brouberol)
[09:12:55] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991022 (brouberol) Open→In progress
[09:14:20] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10991025 (ops-monitoring-bot) Start pool of db2161 gradually with 4 steps - Pooling in - fceratto@cumin1002
[09:14:21] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10991026 (ops-monitoring-bot) Completed pool of db2161 gradually with 4 steps - Pooling in - fceratto@cumin1002
[09:15:14] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10991040 (ops-monitoring-bot) Start pool of db2240 gradually with 4 steps - Pooling in - fceratto@cumin1002
[09:15:17] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10991041 (ops-monitoring-bot) Completed pool of db2240 gradually with 4 steps - Pooling in - fceratto@cumin1002
[09:18:21] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991056 (brouberol) Thanks for the report! We have identified an issue with the...
[09:31:36] Data-Engineering (Q4 2025 April 1st - June 30th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10991068 (Antoine_Quhen) Currently implementing the plan here: https://docs.google.com/document/d/1PXGlHfnZIwr54H4RVO...
[10:35:24] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991276 (brouberol) I have backfilled the non corrupted dumps to https://dumps.w...
[11:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[11:23:36] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Wikidata, Data-Platform-SRE (2025.07.05 - 2025.07.25), and 2 others: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10991414 (Lydia_Pintscher)
[13:42:55] (CR) Mforns: [C:+2] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120647 (https://phabricator.wikimedia.org/T300023) (owner: Xcollazo)
[13:55:49] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Spike: Figure out a strategy to use Airflow's ExternalTaskMarker for our webrequest pipeline - https://phabricator.wikimedia.org/T399203 (xcollazo) NEW
[13:57:31] (Merged) jenkins-bot: Delete dead code that used to generate wmf.wikidata_item_page_link. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1120647 (https://phabricator.wikimedia.org/T300023) (owner: Xcollazo)
[14:29:27] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work, Patch-For-Review: Druid job of mediawiki_history_reduced overwhelms the cluster, using 85%+ of its capacity - https://phabricator.wikimedia.org/T399013#10992354 (xcollazo) >>! In T399013#10992294, @gerritbot wrote: > Change #1167286 **...
[15:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[15:53:56] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Druid job of mediawiki_history_reduced overwhelms the cluster, using 85%+ of its capacity - https://phabricator.wikimedia.org/T399013#10992686 (xcollazo) Ran the following: ` $ hostname -f an-master1003.eqiad.wmnet $ sudo -u yarn bash $...
[16:09:08] Data-Engineering, LDAP-Access-Requests, SRE: Grant Access to Product's Superset & Turnilo for SKivlehan - https://phabricator.wikimedia.org/T393626#10992745 (SKivlehan-WMF) In progress→Resolved I'm in! Marking as Resolved, thank you all for the assistance here.
[16:38:30] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Druid job of mediawiki_history_reduced overwhelms the cluster, using 85%+ of its capacity - https://phabricator.wikimedia.org/T399013#10992866 (xcollazo)
[16:43:18] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Druid job of mediawiki_history_reduced overwhelms the cluster, using 85%+ of its capacity - https://phabricator.wikimedia.org/T399013#10992886 (xcollazo) Copying the new queue policy here for completeness: > \# Specific user limit...
[16:43:45] Data-Engineering (Q4 2025 April 1st - June 30th), Essential-Work: Druid job of mediawiki_history_reduced overwhelms the cluster, using 85%+ of its capacity - https://phabricator.wikimedia.org/T399013#10992888 (xcollazo) a:xcollazo
[19:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[19:45:17] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Wikidata, Data-Platform-SRE (2025.07.05 - 2025.07.25), Essential-Work: wikidata-20250707-all.json.gz is corrupted - https://phabricator.wikimedia.org/T399077#10993385 (brouberol) The backfill to `clouddumps1001` is done as well.
[22:19:44] Data-Engineering, DBA, Schema-change-in-production: Add cl_timestamp_id index to categorylinks table - https://phabricator.wikimedia.org/T399249 (Zabe) NEW
[23:18:34] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem