[01:02:45] Data-Engineering, ChangeProp, Observability-Tracing, Patch-For-Review: Implement tracing across changeprop-jobqueue - https://phabricator.wikimedia.org/T395038#10851062 (mszabo) I recommend https://gitlab.wikimedia.org/mszabo/the-worst-cloud-provider-you-have-ever-seen for local testing.
[03:23:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[07:23:32] FIRING: [2x] AlertLintProblem: Linting problems found for HaproxyKafkaDeliveryErrors - https://wikitech.wikimedia.org/wiki/Alertmanager#Alert_linting_found_problems - TODO - https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem
[08:05:19] Data-Engineering, Data-Persistence, MediaWiki-Core-Revision-backend, Schema-change: Rethink rev_sha1 field - https://phabricator.wikimedia.org/T389026#10851403 (Marostegui)
[08:07:13] (CR) Aqu: [V:+2 C:+2] Add EvolveAndRefineToHiveTable job to refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1148317 (https://phabricator.wikimedia.org/T369845) (owner: Aqu)
[08:20:54] (Merged) jenkins-bot: Add EvolveAndRefineToHiveTable job to refine [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1148317 (https://phabricator.wikimedia.org/T369845) (owner: Aqu)
[08:29:30] Data-Engineering, Data-Platform-SRE (2025.05.02 - 2025.05.23): datahub.wikimedia.org is down - https://phabricator.wikimedia.org/T395057#10851451 (BTullis) Datahub is back up and running now. It seems mainly related to resources, with many components being OOMKilled upon startup. We have increased the r...
[08:39:11] (PS1) Joal: Update pageview druid loading HQL code [analytics/refinery] - https://gerrit.wikimedia.org/r/1149607
[08:49:08] Data-Engineering, Data-Platform-SRE (2025.05.02 - 2025.05.23): Fix the hard dependency between the Airflow scheduler and the DataHub GMS service - https://phabricator.wikimedia.org/T395106#10851514 (BTullis)
[08:51:01] Data-Engineering, Data-Engineering-Radar, DBA, Schema-change-in-production: Drop afl_patrolled_by from abuse_filter_log in production - https://phabricator.wikimedia.org/T391056#10851515 (FCeratto-WMF) Ok. The remaining part requires DC master flips AFAIK. I'm going to sync up with Amir.
[08:58:21] Data-Engineering, Data-Platform-SRE (2025.05.02 - 2025.05.23): Fix the hard dependency between the Airflow scheduler and the DataHub GMS service - https://phabricator.wikimedia.org/T395106#10851527 (Gehel) p:Triage→High
[09:00:18] Analytics-Data-Problem, Discovery-Search, serviceops, Data-Platform-SRE (2025.05.02 - 2025.05.23): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10851544 (Gehel) →Duplicate dup:T354853
[09:18:20] Data-Engineering (Q4 2025 April 1st - June 30th): Facilitate automatic artifact cache warming for airflow-dags artifacts - https://phabricator.wikimedia.org/T392244#10851595 (Antoine_Quhen) I’d like to hold off on decommissioning an-launcher1002 for now. It still hosts the analytics and hdfs sudoable users t...
[09:32:33] Data-Engineering (Q4 2025 April 1st - June 30th): Facilitate automatic artifact cache warming for airflow-dags artifacts - https://phabricator.wikimedia.org/T392244#10851643 (JAllemandou) >>! In T392244#10851595, @Antoine_Quhen wrote: > I’d like to hold off on decommissioning an-launcher1002 for now. It stil...
[10:05:31] Starting build #37 for job analytics-refinery-maven-release
[10:25:57] Project analytics-refinery-maven-release build #37: SUCCESS in 20 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release/37/
[11:30:08] Data-Engineering: Unexpected hostname values are being detected by the pageview pipeline - https://phabricator.wikimedia.org/T395118 (BTullis) NEW
[11:35:05] Data-Engineering: Unexpected hostname values are being detected by the pageview pipeline - https://phabricator.wikimedia.org/T395118#10851916 (JAllemandou) We should! Historically we have not been very restrictive on the URL for pageviews: it allows us to not to have to update the code regularly. If it's nee...
[12:13:27] Analytics-Data-Problem, serviceops, Data-Platform-SRE (2025.05.02 - 2025.05.23), Discovery-Search (2025.05.02 - 2025.05.23): Search Update Pipeline requests to Action API are logged as coming from 127.0.0.1 - https://phabricator.wikimedia.org/T388855#10852059 (Gehel)
[13:01:15] Data-Engineering, Data-Engineering-Radar, CirrusSearch, Structured Data Engineering, and 3 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10852202 (Gehel)
[13:01:41] Data-Engineering, Java-Scala-Standardization, Discovery-Search (2025.05.24 - 2025.06.13): Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10852218 (Gehel)
[13:01:57] Data-Engineering, Data-Platform-SRE, Java-Scala-Standardization, Discovery-Search (2025.05.24 - 2025.06.13), Patch-For-Review: Migrate existing Java packages to deploying to Gitlab, including new version of parent pom, validation that all depen... - https://phabricator.wikimedia.org/T367405#10852220
[13:06:37] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.05.24 - 2025.06.13): Enable lock transaction management on the Hive Metastore - https://phabricator.wikimedia.org/T386854#10852264 (Gehel)
[13:08:45] Data-Engineering, Discovery-Search, Java-Scala-Standardization, Data-Platform-SRE (2025.05.24 - 2025.06.13), Epic: [Epic] Replace Archiva with Gitlab artifact repositories - https://phabricator.wikimedia.org/T367315#10852313 (Gehel)
[13:10:04] Data-Engineering (Q4 2025 April 1st - June 30th), Dumps-Generation, Data-Platform-SRE (2025.05.24 - 2025.06.13): The 20250401 dumps haven't started on time because the mediawikiwiki dump from 20250320 is looping - https://phabricator.wikimedia.org/T390839#10852351 (Gehel)
[13:10:44] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13): Fix the hard dependency between the Airflow scheduler and the DataHub GMS service - https://phabricator.wikimedia.org/T395106#10852367 (Gehel)
[13:10:50] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13): datahub.wikimedia.org is down - https://phabricator.wikimedia.org/T395057#10852369 (Gehel)
[13:11:46] Data-Engineering, Data-Services, Data-Platform-SRE (2025.05.24 - 2025.06.13): Create wiki replicas views for globaljsonlinks tables - https://phabricator.wikimedia.org/T387419#10852383 (Gehel)
[13:12:04] Data-Engineering, Data-Engineering-Radar, Infrastructure-Foundations, SRE, Data-Platform-SRE (2025.05.24 - 2025.06.13): Rebuild Spark images with Bookworm / bullseye-backports deprecation - https://phabricator.wikimedia.org/T390139#10852395 (Gehel)
[13:12:20] Data-Engineering-Radar, HaproxyKafka, Traffic, Data-Platform-SRE (2025.05.24 - 2025.06.13), and 2 others: Replicate current low-message alerting from VarnishKafka - https://phabricator.wikimedia.org/T391810#10852393 (Gehel)
[13:13:19] Data-Engineering, Technical-blog-posts, Data-Platform-SRE (2025.05.24 - 2025.06.13): Write a blog post about the recent Airflow migration to Kubernetes - https://phabricator.wikimedia.org/T393603#10852417 (Gehel)
[13:13:29] Data-Engineering (Q4 2025 April 1st - June 30th), Data-Platform-SRE (2025.05.24 - 2025.06.13), Patch-For-Review: Remove `analytics` instance folder in airflow repo - https://phabricator.wikimedia.org/T394015#10852414 (Gehel)
[13:14:01] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13): Airflow UI sometimes shows no response for a DAG run task with many mapped tasks - https://phabricator.wikimedia.org/T381479#10852427 (Gehel)
[13:14:25] Data-Engineering, Data-Engineering-Radar, CirrusSearch, Structured Data Engineering, and 3 others: Migrate image recommendation to use page_weighted_tags_changed stream - https://phabricator.wikimedia.org/T372912#10852434 (Gehel)
[13:14:55] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13), Documentation: https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log should be on Wikitech - https://phabricator.wikimedia.org/T387878#10852450 (Gehel)
[13:15:11] Data-Engineering, Data-Engineering-Radar, Discovery-Search, Infrastructure-Foundations, and 2 others: Elasticsearch dependency upgrade in spicerack - https://phabricator.wikimedia.org/T390860#10852448 (Gehel)
[13:36:22] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13): datahub.wikimedia.org is down - https://phabricator.wikimedia.org/T395057#10852545 (BTullis) Open→Resolved I have created a follow-up task here: {T395126} to address the problems in the chart that prevent it from being deployed cl...
[13:40:11] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE: Conda-Analytics has package conflict when trying to install R with key packages (R-Arrow and R-Stringi) - https://phabricator.wikimedia.org/T391911#10852559 (BTullis)
[13:46:37] Data-Engineering, ContentTranslation, Metrics Platform: Update WMF-deployed extensions to use mw.config checks instead of manual m-dot URL hacks - https://phabricator.wikimedia.org/T390923#10852584 (Jdlrobson-WMF) Hey @Krinkle can we recommend something other than `mw.config.get( 'wgMFMode' )` going...
[13:54:03] Data-Engineering, Data-Platform-SRE (2025.05.24 - 2025.06.13), Documentation: https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log should be on Wikitech - https://phabricator.wikimedia.org/T387878#10852601 (BTullis) At the risk of getting into a long discussion about naming things, should we con...
[14:37:11] Data-Engineering, ContentTranslation, Metrics Platform: Update WMF-deployed extensions to use mw.config checks instead of manual m-dot URL hacks - https://phabricator.wikimedia.org/T390923#10852709 (Krinkle) > I understand there are concerns around forced style calculation in certain situations To w...
[14:54:22] Data-Engineering: Unexpected hostname values are being detected by the pageview pipeline - https://phabricator.wikimedia.org/T395118#10852750 (BTullis) >>! In T395118#10851916, @JAllemandou wrote: > We should! > Historically we have not been very restrictive on the URL for pageviews: it allows us to not to h...
[15:03:24] Data-Engineering (Q4 2025 April 1st - June 30th): [OpsWeek] RefineSanitize fails to send emails - https://phabricator.wikimedia.org/T393202#10852777 (xcollazo) In progress→Resolved
[16:27:05] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: MediaWiki Content History alerts too much for minor reconcile issues - https://phabricator.wikimedia.org/T395139 (xcollazo) NEW
[16:27:22] Data-Engineering (Q4 2025 April 1st - June 30th), DPE-Mediawiki-Content: MediaWiki Content History alerts too much for minor reconcile issues - https://phabricator.wikimedia.org/T395139#10853011 (xcollazo)
[16:27:23] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Dumps 2.0 Phase III: Production level dumps - https://phabricator.wikimedia.org/T366752#10853012 (xcollazo)
[16:30:37] Data-Engineering, Data-Engineering-Icebox, Data-Platform-SRE: LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750#10853022 (BTullis) Open→Declined I think we can decline this now. We have managed pretty well without LVS in the analytics vlans up until now, so I feel that we...
[16:31:35] Data-Engineering, Data-Engineering-Icebox, Data-Platform-SRE (2025.05.24 - 2025.06.13): LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750#10853025 (BTullis)
[17:08:02] Data-Engineering-Roadmap, DPE-Mediawiki-Content, Epic: Daily updated wmf_content.mediawiki_content_current_v1 - https://phabricator.wikimedia.org/T391279#10853211 (xcollazo) In progress→Resolved Copy pasting final Asana report: **Hypothesis**: “If we provide a daily updated table wmf_con...
[18:05:26] Data-Engineering, ContentTranslation, Metrics Platform: Update WMF-deployed extensions to use mw.config checks instead of manual m-dot URL hacks - https://phabricator.wikimedia.org/T390923#10853421 (Jdlrobson-WMF) Hey @krinkle > To write JavaScript at scale, you generally want to treat the DOM as an...
[18:22:57] Data-Engineering, ContentTranslation, Metrics Platform: Update WMF-deployed extensions to use mw.config checks instead of manual m-dot URL hacks - https://phabricator.wikimedia.org/T390923#10853452 (Krinkle) >>! In T390923#10853421, @Jdlrobson-WMF wrote: > I think it's already a huge anti-pattern tha...
[20:00:22] Data-Engineering, FR-Tech-Analytics, FR-tech-data-integrity, fundraising-tech-ops: Low volume in new webrequest feed - https://phabricator.wikimedia.org/T395089#10853694 (Dwisehaupt) a:Dwisehaupt We had a discussion in slack regarding this: https://wikimedia.slack.com/archives/C055QGPTC69/p17...
[20:06:34] Data-Engineering, FR-Tech-Analytics, FR-tech-data-integrity, fundraising-tech-ops: Low volume in new webrequest feed - https://phabricator.wikimedia.org/T395089#10853705 (Dwisehaupt) For reference, the service restarts with the final good config were: * frban1002 - 2025-05-23 19:18:33 UTC * frban...
[23:07:38] Data-Engineering, FR-Tech-Analytics, FR-tech-data-integrity, fundraising-tech-ops: Low volume in new webrequest feed - https://phabricator.wikimedia.org/T395089#10854054 (Dwisehaupt) Started collecting the backfill data on frban1001 using the following in a screen session: ` sudo kcat -C -X secur...