[00:18:23] Analytics-Canonical-Data, Data-Engineering, Data-Engineering-Icebox, Movement-Insights: Automatically update the canonical data tables - https://phabricator.wikimedia.org/T339928#10608241 (nshahquinn-wmf)
[00:19:00] Analytics-Canonical-Data, Data-Engineering, Data-Engineering-Icebox, Movement-Insights: Automatically update the canonical data tables - https://phabricator.wikimedia.org/T339928#10608246 (nshahquinn-wmf) I've updated the description so it has a pretty full description about how I would approach...
[00:19:38] Analytics-Canonical-Data, Data-Engineering, Data-Engineering-Icebox, Movement-Insights: Automatically update the canonical data tables - https://phabricator.wikimedia.org/T339928#10608248 (nshahquinn-wmf)
[02:52:10] Data-Engineering, Data-Engineering-Radar, Data-Platform-SRE, serviceops, Event-Platform: Make eventstreams-internal available to WMF staff without an ssh tunnel - https://phabricator.wikimedia.org/T348763#10608493 (Ottomata)
[03:09:51] Analytics-Canonical-Data, Data-Engineering, Data-Engineering-Icebox, Movement-Insights: Automatically update the canonical data tables - https://phabricator.wikimedia.org/T339928#10608515 (Ottomata) Hm, I wonder if some of the work here (interacting with git repos using airflow jobs) could be sha...
[03:09:53] Data-Engineering, Traffic: GeoDNS: Pipeline from event.development_network_probe to operations/dns.git - https://phabricator.wikimedia.org/T380626#10608518 (Ottomata) Hm, I wonder if some of the work here (interacting with git repos using airflow jobs) could be shared with {T339928}.
[06:06:31] !log removing dse-k8s-etcd1001 from the dse-k8s cluster to allow a reimage to bookworm T377875
[06:06:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:06:34] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[06:11:41] !log reimaging dse-k8s-etcd1001
[06:11:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:30:34] Data-Engineering, Data-Engineering-Radar, DBA, Charts (Sprint 17), Schema-change-in-production: Deploy patch-gjlw_namespace_text.sql on x1.commonswiki for JsonConfig - https://phabricator.wikimedia.org/T385917#10608634 (Marostegui) @bvibber can you confirm this schema change needs to be appli...
[07:31:40] !log removing dse-k8s-etcd1003 from the dse-k8s cluster to allow a reimage to bookworm T377875
[07:31:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:31:43] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[07:37:44] !log reimaging dse-k8s-etcd1003
[07:37:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:47:43] Hey folks, there are alerts going off for multiple an-worker hosts about the megaraid write cache policy. They are flapping quite a lot. Can one of you take a look?
[08:11:15] Hi RhinosF1, sorry for the noise. Having a look at them
[08:11:37] stevemunene: no worries
[08:19:47] Checked all the recent alerts and these are the recently repurposed hosts from https://phabricator.wikimedia.org/T382410 an-worker106[5-8]
[08:20:44] We have new HDDs that have arrived and are about to begin the expansion so I do not think we shall keep them for long
[08:22:18] I think it's the BBU rather than drives that normally causes that alert
[08:22:39] But if we do plan to we shall reach out to dc ops like last time we had these https://phabricator.wikimedia.org/T318659
[08:25:21] I suggest reaching out then you can see the quote anyway
[08:25:31] And maybe downtiming the alerts if they are all known faulty
[08:27:21] Ack, that sounds good
[10:07:04] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10609026 (mfossati)
[10:12:39] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10609057 (mfossati)
[10:13:13] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10609060 (mfossati)
[10:13:23] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work): [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10609064 (mfossati)
[10:16:52] (CR) Peter Fischer: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115440 (https://phabricator.wikimedia.org/T384385) (owner: Peter Fischer)
[10:23:04] (CR) CI reject: [V:-1] Adapt table/column names [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115440 (https://phabricator.wikimedia.org/T384385) (owner: Peter Fischer)
[10:24:41] Data-Engineering: [Maintenance] Reduce number of HDFS files - https://phabricator.wikimedia.org/T347975#10609104 (JAllemandou) As pointed out in the task-desc, the main solution we envision for this is the iceberg-ification of `event_sanitized`. Maybe we can use this task as a global epic for other sub-tasks...
[11:03:00] Hi RhinosF1 - Thanks, yes we will look at relaxing the alerts for these hosts, to stop the notifications. They are old and the RAID battery will have failed, but the alerts indicate that they are running in a safer, slower configuration. We can live with this for as long as we need them, so I will relax the checks. Sorry for the spam.
[11:20:12] Data-Engineering, Data-Engineering-Radar, Cassandra, Data Pipelines, and 2 others: Create puppet resource for adding/updating/deleting secrets or other small files on HDFS - https://phabricator.wikimedia.org/T323692#10609353 (BTullis) I haven't had any further time to work on this recently, so I'...
[11:20:41] Data-Engineering, Data-Engineering-Radar, Cassandra, Data Pipelines, and 2 others: Create puppet resource for adding/updating/deleting secrets or other small files on HDFS - https://phabricator.wikimedia.org/T323692#10609360 (BTullis) a:BTullis→None
[11:24:31] Data-Engineering, Data-Engineering-Jupyter, Data-Engineering-Radar, Data-Platform-SRE (2025.03.01 - 2025.03.21): Cannot spawn a Jupyter server on stat1010 - https://phabricator.wikimedia.org/T385647#10609403 (BTullis) Open→Resolved a:BTullis I believe that this has been fixed now....
[12:10:07] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Data-Platform-SRE (2025.03.01 - 2025.03.21), Essential-Work, Patch-For-Review: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10609573 (1...
[12:30:05] (PS6) Peter Fischer: Adapt table/column names [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115440 (https://phabricator.wikimedia.org/T384385)
[12:34:51] (PS5) Peter Fischer: Partial dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115441 (https://phabricator.wikimedia.org/T384383)
[12:43:52] (CR) CI reject: [V:-1] Adapt table/column names [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115440 (https://phabricator.wikimedia.org/T384385) (owner: Peter Fischer)
[13:09:35] Data-Engineering (Q3 2025 January 1st - March 31th), Infrastructure-Foundations, netops: Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10609723 (BTullis) @JAllemandou - would you like any help to implement these new retention parameters, or c...
[13:30:36] !log initialize election of new leader on dse-k8s-etcd cluster to allow reimage of dse-k8s-etcd1002 T377875
[13:30:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:30:39] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[14:50:20] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Data-Platform-SRE (2025.03.01 - 2025.03.21), Essential-Work, Patch-For-Review: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10610123...
[14:51:48] (CR) Xcollazo: [C:+2] Partial dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1115441 (https://phabricator.wikimedia.org/T384383) (owner: Peter Fischer)
[14:58:43] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content: Estimate effort for migrating wmf.wikidata_entity to the new mediawiki content pipelines - https://phabricator.wikimedia.org/T388040#10610223 (Ottomata) Ah! Discussed with Joseph today. 'metadata' just means not page content. I...
[15:07:01] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Research: A dataset sensor should work indepent of airflow instance - https://phabricator.wikimedia.org/T386973#10610241 (xcollazo) Open→Resolved
[15:57:34] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content: Estimate effort for migrating wmf.wikidata_entity to the new mediawiki content pipelines - https://phabricator.wikimedia.org/T388040#10610541 (xcollazo) >>! In T388040#10610223, @Ottomata wrote: > Ah! Discussed with Joseph today....
[16:09:52] (PS1) Milimetric: Add a closed flag to the project namespace map dataset [analytics/refinery] - https://gerrit.wikimedia.org/r/1125184 (https://phabricator.wikimedia.org/T241741)
[16:26:14] (CR) Snwachukwu: [C:+2] "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/1125184 (https://phabricator.wikimedia.org/T241741) (owner: Milimetric)
[16:29:26] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Dumps-Generation: commonswiki dump stuck for 20250301 - https://phabricator.wikimedia.org/T387992#10610717 (Ahoelzl)
[16:29:31] Data-Engineering (Q3 2025 January 1st - March 31th), DPE-Mediawiki-Content, Dumps-Generation: commonswiki dump stuck for 20250301 - https://phabricator.wikimedia.org/T387992#10610723 (Ahoelzl) p:Triage→High
[18:06:11] Data-Engineering, DPE-Mediawiki-Content: [Dumps 2] Investigate reasons for remaining inconsistencies - https://phabricator.wikimedia.org/T385112#10611005 (Ottomata)
[21:42:40] (PS1) Joal: Bump memory for gobblin event_default job [analytics/refinery] - https://gerrit.wikimedia.org/r/1125238
[21:51:25] (CR) Ottomata: [C:+2] Bump memory for gobblin event_default job [analytics/refinery] - https://gerrit.wikimedia.org/r/1125238 (owner: Joal)
[21:51:43] (CR) Ottomata: [V:+2 C:+2] Bump memory for gobblin event_default job [analytics/refinery] - https://gerrit.wikimedia.org/r/1125238 (owner: Joal)
[23:03:35] (PS1) Joal: Remove recentchange from event_default gobblin job [analytics/refinery] - https://gerrit.wikimedia.org/r/1125244
[23:15:39] (CR) Joal: [V:+2 C:+2] "self-merging hotfix" [analytics/refinery] - https://gerrit.wikimedia.org/r/1125244 (owner: Joal)
[23:19:03] !log Deploying refinery onto an-launcher1002 to remove recentchange from gobblin
[23:19:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[23:20:49] !log Force killing gobblin failing job to let next one with patched code run
[23:20:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log