[00:16:19] Data-Engineering, Research, Event-Platform: Implement stream of HTML content on mw.page_change event - https://phabricator.wikimedia.org/T360794#10490925 (leila) Thanks for following up, Virginia. Can you add the specific questions you want us to think about on our end somewhere in this task? thanks....
[05:05:36] FIRING: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[05:05:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[05:30:36] RESOLVED: MediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag: ...
[05:30:36] High Kafka consumer lag for mw_content_history_reconcile_enrich in eqiad - TODO - https://grafana.wikimedia.org/d/K9x0c4aVk/flink-app?orgId=1&var-datasource=eqiad%20prometheus/k8s-dse&var-namespace=mw-content-history-reconcile-enrich&var-helm_release=production&var-operator_name=All&var-flink_job_name=mw_content_history_reconcile_enrich - https://alerts.wikimedia.org/?q=alertname%3DMediawikiContentHistoryReconcileEnrichHighKafkaConsumerLag
[08:38:15] Data-Engineering, DBA, Schema-change-in-production: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592#10491383 (Marostegui)
[08:39:01] Data-Engineering, DBA, Schema-change-in-production: Add normalization columns to categorylinks table - https://phabricator.wikimedia.org/T384592#10491384 (Marostegui)
[11:01:47] Data-Engineering, Data-Platform-SRE, Dumps-Generation, MW-1.39-notes, and 3 others: WE 5.4 KR - Hypothesis 5.4.6 - Q3 FY24/25 - Validate Dumps 1.0 compatibility with PHP 8.1 - https://phabricator.wikimedia.org/T382484#10491695 (BTullis) ` 2025-01-23 17:10:40: enwiki SUCCESS: done. Dump of wiki en...
[11:26:51] Data-Engineering, Data-Platform-SRE, Dumps-Generation, MW-1.39-notes, and 3 others: WE 5.4 KR - Hypothesis 5.4.6 - Q3 FY24/25 - Validate Dumps 1.0 compatibility with PHP 8.1 - https://phabricator.wikimedia.org/T382484#10491763 (BTullis) >>! In T382484#10491695, @BTullis wrote: > It's obviously sm...
[12:24:52] Data-Engineering: Agree on Plan for Anonymous Reader Analytics - https://phabricator.wikimedia.org/T324554#10492001 (Aklapper) Adding #Data-Engineering to this lingering open task without project tags, so the task shows up on a workboard and can be found. Feel free to correct the project tag.
[12:24:53] Data-Engineering: Anonymous Reader Analytics - https://phabricator.wikimedia.org/T322629#10492003 (Aklapper) Adding #Data-Engineering to this lingering open task without project tags, so the task shows up on a workboard and can be found. Feel free to correct the project tag.
[13:14:54] Data-Engineering, Data-Engineering-Radar, Wikidata, Discovery-Search (Current work), Event-Platform: Configure https://stream.wikimedia.org to expose rdf-streaming-updater.mutation - https://phabricator.wikimedia.org/T374921#10492095 (Gehel) Open→Resolved
[13:16:54] Data-Engineering (Q3 2024 January 1st - March 31th), Research, Data-Platform-SRE (2025.01.11 - 2025.01.31), Discovery-Search (Current work): Low available space on Hadoop / HDFS - https://phabricator.wikimedia.org/T381707#10492099 (Gehel) Closing this task as we've been able to reduce usage suffi...
[13:23:59] Data-Engineering (Q3 2024 January 1st - March 31th), Research, Data-Platform-SRE (2025.01.11 - 2025.01.31), Discovery-Search (Current work): Low available space on Hadoop / HDFS - https://phabricator.wikimedia.org/T381707#10492154 (Gehel) Open→Resolved a:Gehel
[13:42:35] Data-Engineering, Data-Platform-SRE (2025.01.11 - 2025.01.31), Essential-Work: Update canary_events DAG to use an internal domain and/or the service mesh to obtain its eventstream config - https://phabricator.wikimedia.org/T384329#10492217 (Gehel)
[14:18:18] Data-Engineering, Dumps-Generation, MediaWiki-Platform-Team, serviceops: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10492315 (jijiki) Stalled→In progress
[14:33:33] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Migrate from Semantic UI to Codex - https://phabricator.wikimedia.org/T384047#10492354 (Milimetric)
[14:33:54] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Migrate from Vue 2 to Vue 3 - https://phabricator.wikimedia.org/T384046#10492355 (Milimetric)
[14:34:11] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Update tests - https://phabricator.wikimedia.org/T384048#10492357 (Milimetric)
[14:34:32] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Update routing - https://phabricator.wikimedia.org/T384049#10492358 (Milimetric)
[14:34:38] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Update state management - https://phabricator.wikimedia.org/T384050#10492359 (Milimetric)
[14:34:54] Data-Engineering, Data-Engineering-Wikistats, Essential-Work: Epic - update Wikistats dependencies - https://phabricator.wikimedia.org/T384042#10492360 (Milimetric)
[14:38:11] Data-Engineering, Data-Engineering-Icebox, Data-Engineering-Wikistats, PageViewInfo, and 3 others: Pageviews Analysis 3.0 (Vue + Codex) - https://phabricator.wikimedia.org/T378549#10492388 (Milimetric) (fyi: I'm starting Wikistats maintenance work now, at a reduced 10% time Fridays kind of pace)
[15:14:16] Data-Engineering, Data-Platform-SRE, Dumps-Generation, MW-1.39-notes, and 3 others: WE 5.4 KR - Hypothesis 5.4.6 - Q3 FY24/25 - Validate Dumps 1.0 compatibility with PHP 8.1 - https://phabricator.wikimedia.org/T382484#10492468 (xcollazo) >>! In T382484#10491763, @BTullis wrote: >>>! In T382484#10...
[15:17:44] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10492483 (xcollazo) (I just fixed an issue...
[15:58:06] Data-Engineering, Dumps-Generation, MediaWiki-Platform-Team, serviceops: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10492650 (Scott_French)
[16:35:53] Data-Engineering (Q3 2024 January 1st - March 31th), Discovery-Search, Dumps 2.0, Data-Platform-SRE (2025.01.11 - 2025.01.31), Patch-For-Review: Add relevant kafka clusters to defined airflow connections in puppet - https://phabricator.wikimedia.org/T379676#10492852 (xcollazo)
[16:39:44] Data-Engineering: Implement full parity between HiveSensor and RESTExternalTaskSensor - https://phabricator.wikimedia.org/T384726 (amastilovic) NEW
[16:54:40] Data-Engineering: Implement full parity between HiveSensor and RESTExternalTaskSensor - https://phabricator.wikimedia.org/T384726#10493007 (xcollazo)
[17:12:24] Data-Engineering (Q3 2024 January 1st - March 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10493064 (Ahoelzl)
[17:13:17] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data engineering and platform users - https://phabricator.wikimedia.org/T384100#10493068 (Ahoelzl) p:Triage→High
[17:13:33] Data-Engineering (Q3 2024 January 1st - March 31th), Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10493069 (Ahoelzl) p:Triage→High
[17:13:39] Data-Engineering (Q3 2024 January 1st - March 31th): Handle Late-Arrived Events from Gobblin into Airflow triggered Refine - https://phabricator.wikimedia.org/T370665#10493070 (Ahoelzl) p:Triage→Medium
[17:13:55] Data-Engineering (Q3 2024 January 1st - March 31th), Patch-For-Review: [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10493071 (Ahoelzl) p:Triage→Medium
[17:14:04] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board), Patch-For-Review: Optimize XML Dump code to be able to handle wikis from simplewiki to enwiki - https://phabricator.wikimedia.org/T381016#10493073 (Ahoelzl) p:Triage→Medium
[17:14:14] Data-Engineering (Q3 2024 January 1st - March 31th), Product-Analytics, Patch-For-Review: [SPIKE] Experiment with approaches for a incremental updates of MediaWiki data in the Data Lake - https://phabricator.wikimedia.org/T370354#10493074 (Ahoelzl) p:Triage→Low
[17:15:47] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data engineering and platform users - https://phabricator.wikimedia.org/T384100#10493077 (Ahoelzl)
[17:17:14] Data-Engineering (Q3 2024 January 1st - March 31th): HDFS capacity needs data engineering and platform users - https://phabricator.wikimedia.org/T384100#10493078 (Ahoelzl) @JAllemandou besides webrequests, are there any other data sets that need extended retention? Also, are there scenarios, e.g. backfill, w...
[17:22:31] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10493108 (xcollazo) The task is failing wit...
[17:23:17] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10493110 (xcollazo) Now running at https://...
[17:31:39] Data-Engineering, MediaWiki-extensions-WikimediaEvents, Patch-For-Review: Owners phpunit test does not work with subfolders - https://phabricator.wikimedia.org/T352472#10493124 (phuedx) The above patch is not perfect by any stretch of the imagination but it gets us to where we want to be. LMK what yo...
[17:56:19] Data-Engineering, Data-Platform-SRE, Dumps-Generation, MW-on-K8s, and 4 others: WE 5.4 KR - Hypothesis 5.4.4 - Q3 FY24/25 - Migrate current-generation dumps to run on kubernetes - https://phabricator.wikimedia.org/T352650#10493191 (BTullis) I have had some more thoughts about how to get this to w...
[18:02:37] !log [data lake temp accounts] re-ran druid loading task in mediawiki_history_reduced for 2024-12
[18:02:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:13:26] Data-Engineering, Data-Platform-SRE, Dumps-Generation, MW-1.39-notes, and 3 others: WE 5.4 KR - Hypothesis 5.4.6 - Q3 FY24/25 - Validate Dumps 1.0 compatibility with PHP 8.1 - https://phabricator.wikimedia.org/T382484#10493282 (BTullis) >>! In T382484#10492468, @xcollazo wrote: >>>! In T382484#10...
[18:30:09] Data-Engineering (Q3 2024 January 1st - March 31th), function-evaluator, Wikifunctions, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: Function Evaluator log data loss due to ECS nonconforming fields - https://phabricator.wikimedia.org/T383448#10493299 (Ahoelzl)
[18:30:19] Data-Engineering (Q3 2024 January 1st - March 31th), function-evaluator, Wikifunctions, Abstract Wikipedia team (25Q3 (Jan–Mar)), Patch-For-Review: Function Evaluator log data loss due to ECS nonconforming fields - https://phabricator.wikimedia.org/T383448#10493300 (Ahoelzl) a:tchin
[18:33:58] Data-Engineering (Q3 2024 January 1st - March 31th): Haproxy kafka and varnishkafka produce compatible datasets - https://phabricator.wikimedia.org/T382571#10493309 (Ahoelzl) a:JAllemandou
[18:35:16] Data-Engineering, Data-Platform-SRE, Epic: HDFS capacity needs FY24/25 - https://phabricator.wikimedia.org/T384098#10493313 (Ahoelzl)
[18:35:54] Data-Engineering, Epic, Patch-For-Review: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition - https://phabricator.wikimedia.org/T354694#10493314 (Ahoelzl)
[18:36:44] Data-Engineering, Data Pipelines, Data-Catalog: Upgrade to Spark 3.2 to support Spark lineage for Iceberg tables - https://phabricator.wikimedia.org/T378899#10493317 (Ahoelzl)
[18:43:28] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Create and test a new produced_by config for the datalake table - https://phabricator.wikimedia.org/T381432#10493351 (amastilovic)
[18:44:37] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Identify Internal Users of MediaWiki Wikitext Tables - https://phabricator.wikimedia.org/T383743#10493357 (Ahoelzl)
[18:44:53] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10493369 (Ahoelzl)
[18:47:11] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Create and test a new produced_by config for the datalake table - https://phabricator.wikimedia.org/T381432#10493375 (amastilovic) In progress→Resolved
[19:16:09] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Create and test a new produced_by config for the datalake table - https://phabricator.wikimedia.org/T381432#10493467 (xcollazo) I can't see the DAG at https://airflow-platform-eng.wikimedia.org/home, perhaps we are missing...
[20:09:12] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10493698 (xcollazo) I just realized we have...
[21:04:44] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10493782 (tchin) Nice catch! Putting up a p...
[21:08:22] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list January 2025 - https://phabricator.wikimedia.org/T384259#10493790 (xcollazo) Next scheduled deployment is Tue Jan 28, thus adding this to the deployment train at https://etherpad.wikime...
[21:19:34] Data-Engineering (Q3 2024 January 1st - March 31th), Experimentation Lab, Dumps 2.0 (Kanban Board), Patch-For-Review: Dashboard and alerting of data quality metrics for wmf_content.mediawiki_content_history_v1 - https://phabricator.wikimedia.org/T357684#10493798 (xcollazo) First `compute_metrics`...
[21:20:01] !log [data lake temp accounts] deployment of data lake temp accounts changes is complete
[21:20:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:20:31] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Create and test a new produced_by config for the datalake table - https://phabricator.wikimedia.org/T381432#10493800 (amastilovic) You were right, I haven't deployed it at the time. I deployed it now but it doesn't work du...
[21:26:14] Data-Engineering (Q3 2024 January 1st - March 31th), Dumps 2.0 (Kanban Board): Create and test a new produced_by config for the datalake table - https://phabricator.wikimedia.org/T381432#10493803 (amastilovic) Resolved→In progress
[21:49:02] Data-Engineering, Commons-Impact-Metrics: "Pageview counts for a given category" not offering "deep" metrics - https://phabricator.wikimedia.org/T382733#10493836 (Dominicbm) >>! In T382733#10480190, @mforns wrote: > @GFontenelle_WMF The deep metrics are only available for the primary categories (the ones...
[21:54:42] Data-Engineering, Commons-Impact-Metrics: "Pageview counts for a given category" not offering "deep" metrics - https://phabricator.wikimedia.org/T382733#10493849 (GFontenelle_WMF) Hi, @mforns. Thanks! That's correct, they both are not primary categories. I guess the only way to get this data is to add th...
[22:13:42] Data-Engineering, Commons-Impact-Metrics, Commons-Impact-Metrics-Requests: Update Commons Impact Metrics allow-list January 2025 - https://phabricator.wikimedia.org/T384259#10493923 (GFontenelle_WMF) Hi @xcollazo! Just a quick note that I was talking with @mforns and we will need a few more days to u...
[22:25:10] Data-Engineering, Commons-Impact-Metrics: "Pageview counts for a given category" not offering "deep" metrics - https://phabricator.wikimedia.org/T382733#10493964 (mforns) > I guess the only way to get this data is to add them as primary to the allow-list, correct? Yes!
[22:53:49] Data-Engineering, Wmfdata-Python: Deprecate the Hive module - https://phabricator.wikimedia.org/T384541#10494035 (nshahquinn-wmf) p:Low→Medium
[22:57:47] Data-Engineering, Data-Engineering-Radar, Wmfdata-Python: Add type annotations to Wmfdata-Python and make the type-checking CI blocking - https://phabricator.wikimedia.org/T381656#10494048 (nshahquinn-wmf) p:Medium→Low
[23:06:34] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python: Retrieve host & port info when connecting to MariaDB replicas on the cluster - https://phabricator.wikimedia.org/T340472#10494070 (nshahquinn-wmf)
[23:06:36] Analytics-Kanban, Data-Engineering, Data-Engineering-Icebox, Product-Analytics, Wmfdata-Python: wmfdata.mariadb relies on analytics-mysql being available - https://phabricator.wikimedia.org/T292479#10494073 (nshahquinn-wmf) →Duplicate dup:T340472
[23:07:14] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python: Enable Wmfdata-Python to access MariaDB replicas from the cluster - https://phabricator.wikimedia.org/T340467#10494074 (nshahquinn-wmf)
[23:09:07] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python: Let user specify cnf to use when connecting to MariaDB - https://phabricator.wikimedia.org/T340469#10494079 (nshahquinn-wmf) p:Triage→Low
[23:30:59] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python, Epic: Enable Wmfdata-Python to access MariaDB replicas from the cluster - https://phabricator.wikimedia.org/T340467#10494107 (nshahquinn-wmf)
[23:31:06] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python, Epic: Enable Wmfdata-Python to access MariaDB replicas from the cluster - https://phabricator.wikimedia.org/T340467#10494108 (nshahquinn-wmf) p:Triage→Low
[23:33:22] Data-Engineering, Data-Engineering-Radar, Product-Analytics, Wmfdata-Python: Retrieve host & port info when connecting to MariaDB replicas on the cluster - https://phabricator.wikimedia.org/T340472#10494109 (nshahquinn-wmf) p:Triage→Low To some extent, this is blocked on {T293700}. If th...
[23:34:46] Data-Engineering, Data-Engineering-Icebox, Product-Analytics, Wmfdata-Python: Set up Wmfdata-Python integration test suite to run automatically - https://phabricator.wikimedia.org/T304547#10494128 (nshahquinn-wmf) p:Medium→Low
[23:40:08] Data-Engineering, Data-Engineering-Icebox, Product-Analytics, Wmfdata-Python: wmfdata should display more progress information and metadata when running a query - https://phabricator.wikimedia.org/T259808#10494134 (nshahquinn-wmf) Open→Declined I don't think this is worth doing. In mo...
[23:41:36] Data-Engineering, Data-Engineering-Icebox, Product-Analytics, Wmfdata-Python: PyHive ignores SET statements with a leading newline - https://phabricator.wikimedia.org/T334442#10494138 (nshahquinn-wmf) Open→Declined We will be deprecating Hive soon (T384541).