[00:41:58] Data-Engineering, Growth-Structured-Tasks, Growth-Team, Image-Suggestions, and 5 others: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20 - https://phabricator.wikimedia.org/T386255#10547183 (HNordeenWMF)
[00:44:39] Data-Engineering, Data Pipelines: Add azwiki to clickstream - https://phabricator.wikimedia.org/T386288 (Nemoralis) NEW
[00:55:34] Data-Engineering, Data Pipelines: Add azwiki to clickstream - https://phabricator.wikimedia.org/T386288#10547203 (Nemoralis) Open→Resolved a:Nemoralis Because the website isn't readable in dark mode, I didn't notice the `az.wikipedia.org` in the dropdown.
[08:58:59] Data-Engineering, Data-Platform-SRE: Grow number of Gobblin mappers ingesting `webrequest_frontend` data - https://phabricator.wikimedia.org/T386174#10548070 (JAllemandou) Open→Resolved a:JAllemandou After a day of running, the Gobblin runs are a bit faster and more stable: {F58394464}...
[09:00:06] Data-Engineering, Data-Platform-SRE: Grow number of Gobblin mappers ingesting `webrequest_frontend` data - https://phabricator.wikimedia.org/T386174#10548083 (JAllemandou)
[09:34:36] (PS1) Joal: Add is_redirect_to_pageview to webrequest_frontend [analytics/refinery] - https://gerrit.wikimedia.org/r/1119465 (https://phabricator.wikimedia.org/T386176)
[09:35:35] Data-Engineering: Requesting Kerberos Password Reset - https://phabricator.wikimedia.org/T386225#10548319 (SCherukuwada) Open→Resolved a:SCherukuwada Miraculously, I just remembered my password. No reset needed any more.
[09:55:14] !log draining dse-k8s-worker1003 ready for reimage to bookworm and containerd for T377875
[09:55:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:55:18] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[10:00:16] (CR) Gmodena: [C:+1] Add is_redirect_to_pageview to webrequest_frontend [analytics/refinery] - https://gerrit.wikimedia.org/r/1119465 (https://phabricator.wikimedia.org/T386176) (owner: Joal)
[10:12:19] !log reimaging dse-k8s-worker1003
[10:12:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:36:51] (CR) Joal: [V:+2 C:+2] "Merging and manually deploying to fast-track" [analytics/refinery] - https://gerrit.wikimedia.org/r/1119465 (https://phabricator.wikimedia.org/T386176) (owner: Joal)
[10:39:37] (CR) Joal: [C:+2] Keep event.mediawiki_page_change_v1 [analytics/refinery] - https://gerrit.wikimedia.org/r/1114803 (owner: DCausse)
[10:39:39] (CR) Joal: [V:+2 C:+2] Keep event.mediawiki_page_change_v1 [analytics/refinery] - https://gerrit.wikimedia.org/r/1114803 (owner: DCausse)
[10:44:12] !log Deploying refinery
[10:44:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:50:12] !log Deploying refinery to HDFS
[10:50:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:54:05] !log Alter wmf_staging.webrequest schema adding is_redirect_to_pageview after is_pageview
[10:54:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:54:45] !log Unpause webrequest_frontend DAGs
[10:54:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:06:09] Data-Engineering (Q3 2024 January 1st - March 31th): Validate `pageview` and `unique_devices` generated from `webrequest_frontend` - https://phabricator.wikimedia.org/T386343 (JAllemandou) NEW
[11:40:26] Data-Engineering: Add `is_redirect_to_pageview` field to `wmf_staging.webrequest` table - https://phabricator.wikimedia.org/T386176#10548786 (JAllemandou) Open→Resolved a:JAllemandou Confirmed the field is now set from 2025-02-13T10:00 onward.
[12:19:56] !log draining dse-k8s-worker1004 ready for reimage to bookworm and containerd for T377875
[12:19:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:20:00] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[12:24:24] Data-Engineering, ActiveAbstract, Dumps-Generation, Patch-For-Review: Undeploy and archive ActiveAbstract - https://phabricator.wikimedia.org/T382069#10548928 (Ladsgroup) >>! In T382069#10546922, @xcollazo wrote: > +1 to move ahead and stop this dump. Deployed. In the next run, there shouldn't b...
[12:24:28] !log reimaging dse-k8s-worker1004
[12:24:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:47:22] Data-Engineering, Data-Engineering-Radar, DBA, MediaWiki-Platform-Team, and 3 others: Drop module_deps table in WMF prod - https://phabricator.wikimedia.org/T385997#10549001 (Ladsgroup) I will drop the table and remove it from the catalog the week after to make sure all code has reached production.
[12:50:08] Data-Engineering (Q3 2024 January 1st - March 31th): Fix HAProxy `uri_host` and `accept_language` differences with VarnishKafka - https://phabricator.wikimedia.org/T386354 (JAllemandou) NEW
[12:55:03] Data-Engineering (Q3 2024 January 1st - March 31th): Validate `pageview` and `unique_devices` generated from `webrequest_frontend` - https://phabricator.wikimedia.org/T386343#10549039 (JAllemandou)
[13:23:35] !log draining dse-k8s-worker1005 ready for reimage to bookworm and containerd for T377875
[13:23:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:23:38] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[13:26:18] !log reimaging dse-k8s-worker1005
[13:26:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:52:38] Data-Engineering, Research: Incremental HTML wiki content dataset to support "Who are moderators" SDS 1.2.3 - https://phabricator.wikimedia.org/T380874#10549155 (XiaoXiao-WMF) DP decided to not prioritize it in Q3. Moving to freezer.
[13:52:59] Data-Engineering, Research, Research-Freezer: Incremental HTML wiki content dataset to support "Who are moderators" SDS 1.2.3 - https://phabricator.wikimedia.org/T380874#10549156 (XiaoXiao-WMF)
[13:58:38] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10549187 (YLiou_WMF) Resolved→Open
[14:08:10] Data-Engineering, Data-Engineering-Radar, DBA, MediaWiki-Platform-Team, and 3 others: Drop module_deps table in WMF prod - https://phabricator.wikimedia.org/T385997#10549206 (Hokwelum) Sounds good! >>! In T385997#10549001, @Ladsgroup wrote: > I will drop the table and remove it from the catalog t...
[14:16:47] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10549226 (YLiou_WMF) Unfortunately, I'm reopening this as I'm experiencing an issue logging into JupyterHub. This app...
[14:27:09] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10549262 (Ottomata)
[14:29:16] !log draining dse-k8s-worker1006 ready for reimage to bookworm and containerd for T377875
[14:29:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:29:20] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[14:33:18] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10549298 (Ottomata) Thanks for this Aleks! > One-by-one DAG: test them in test-k8s, apply necessary modifications, migrate to airflow-analytics. Another variation on this idea: we make a n...
[14:33:28] Data-Engineering, Data-Platform-SRE (2025.02.10 - 2025.02.28): Grow number of Gobblin mappers ingesting `webrequest_frontend` data - https://phabricator.wikimedia.org/T386174#10549299 (Gehel)
[14:35:07] !log reimaging dse-k8s-worker1006
[14:35:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:35:58] Data-Engineering, Data-Platform-SRE (2025.02.10 - 2025.02.28): Grow kafka partition number for topics `webrequest_frontend_text` and `webrequest_frontend_upload` - https://phabricator.wikimedia.org/T386173#10549311 (Gehel)
[14:38:21] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE (2025.02.10 - 2025.02.28): Cannot spawn a Jupyter server on stat1010 - https://phabricator.wikimedia.org/T385647#10549331 (Gehel)
[14:38:30] Data-Engineering, Data-Engineering-Jupyter, Data-Platform-SRE (2025.02.10 - 2025.02.28): Cannot spawn a Jupyter server on stat1010 - https://phabricator.wikimedia.org/T385647#10549332 (Gehel) p:Triage→High
[14:39:15] Data-Engineering, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Mediawiki Content: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10549336 (Gehel)
[14:39:57] Data-Engineering, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Mediawiki Content: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10549340 (Gehel) Movign to "Blo...
[14:41:12] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10549344 (BTullis) It looks like the cause of this is that the `yliou` account is not a member of either the `wmf` or...
[14:53:43] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10549415 (BTullis) I added the record from `mwmaint1002` with the following command: ` btullis@mwmaint2002:~$ sudo mo...
[15:09:47] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10549517 (brouberol) @Ottomata the only issue that I have with that is that we'd lose the opportunity to migrate the DB data, and would effectively start from an empty database, as there wou...
[15:20:19] Data-Engineering, Event-Platform: [NEEDS GROOMING] We should improve the code health of gobblin-wmf - https://phabricator.wikimedia.org/T370368#10549582 (Ottomata) FWIW, I'm working on doing eventutilties too here: https://gitlab.wikimedia.org/otto/eventutilities/-/blob/master/.gitlab-ci.yml?ref_type=hea...
[15:32:00] Data-Engineering, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Mediawiki Content: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10549649 (xcollazo) Copy pastin...
[16:09:35] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10549996 (Gehel) Do we think that the access logs we have are sufficiently relevant? Given our bandwidth limitation, I suspect that most users are downl...
[16:17:42] !log draining dse-k8s-worker1007 ready for reimage to bookworm and containerd for T377875
[16:17:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:17:45] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[16:19:54] Data-Engineering, Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Mediawiki Content, Patch-For-Review: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114#10550044 (...
[16:20:02] !log reimaging dse-k8s-worker1007
[16:20:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:25:03] Data-Engineering (Q3 2024 January 1st - March 31th), Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Essential-Work, and 2 others: DAG failing due to failure to acquire lock on wmf_data_ops.data_qu... - https://phabricator.wikimedia.org/T386114#10550067
[16:25:55] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10550069 (mforns) We have finished the log path parsing code that will classify and augment the dump webrequest logs. See: https://docs.google.com/sprea...
[16:35:22] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10550138 (YLiou_WMF) @BTullis the yliou account seems to work to login to jupyterhub now! Thank you for all your help!
[16:35:38] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10550139 (YLiou_WMF) Open→Resolved
[16:36:03] Data-Engineering (Q3 2024 January 1st - March 31th): Validate `pageview` and `unique_devices` generated from `webrequest_frontend` - https://phabricator.wikimedia.org/T386343#10550164 (JAllemandou)
[16:46:10] Data-Engineering (Q3 2024 January 1st - March 31th), Essential-Work: Analyze Dumps Usage Through Apache Logs - https://phabricator.wikimedia.org/T383175#10550199 (mforns) > Do we think that the access logs we have are sufficiently relevant? Given our bandwidth limitation, I suspect that most users are do...
[16:48:27] Data-Engineering (Q3 2024 January 1st - March 31th), Image-Suggestions, Section-Level-Image-Suggestions, Structured-Data-Backlog, Mediawiki Content: [SPIKE] Check the Wikimedia content history dataset - https://phabricator.wikimedia.org/T385787#10550221 (Ahoelzl)
[17:06:38] !log draining dse-k8s-worker1008 ready for reimage to bookworm and containerd for T377875
[17:06:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:06:41] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[17:12:30] !log reimaging dse-k8s-worker1008
[17:12:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:20:48] Data-Engineering, Data-Engineering-Radar, BDC-Implementation, Data-Platform-SRE, Epic: [Trino] Develop procedure and scripting for Trino cluster maintenance. - https://phabricator.wikimedia.org/T386391 (Jgreen) NEW
[17:23:34] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10550326 (Ottomata) Hm, true. Okay.
[17:27:01] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10550331 (BTullis) >>! In T386282#10549517, @brouberol wrote: > @Ottomata the only issue that I have with that is that we'd lose the opportunity to migrate the DB data, and would effectively...
[17:34:18] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10550345 (Ottomata) > `production` instead of `main` production seems more like a phase or env that many other instances might have as well. E.g. image suggestions dag is 'production' but...
[18:01:10] !log draining dse-k8s-worker1009 ready for reimage to bookworm and containerd for T377875
[18:01:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:01:13] T377875: Migrate dse-k8s cluster from docker to containerd - https://phabricator.wikimedia.org/T377875
[18:04:46] !log reimaging dse-k8s-worker1009
[18:04:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:24:16] !log deployed latest DAGs to analytics Airflow instance. T386114.
[20:24:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:24:22] T386114: DAG failing due to failure to acquire lock on wmf_data_ops.data_quality_metrics table - https://phabricator.wikimedia.org/T386114
[20:35:08] Data-Engineering, Release-Engineering-Team: Create a GitLab CI/CD Component project for WMF CI/CD templates and components - https://phabricator.wikimedia.org/T382430#10550811 (Ottomata)
[20:35:45] Data-Engineering, Release-Engineering-Team: Create a GitLab CI/CD Component project for WMF CI/CD templates and components - https://phabricator.wikimedia.org/T382430#10550812 (Ottomata) I added #release-engineering-team just as an FYI and in case they have advice for us. RelEng, feel free to put this i...
[20:36:01] Data-Engineering, Discovery-Search, Java-Scala-Standardization: Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406 (Ottomata) NEW
[20:36:10] Data-Engineering, Discovery-Search, Java-Scala-Standardization: Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10550841 (Ottomata)
[20:36:11] Data-Engineering, Java-Scala-Standardization: Resolve conflict between GitLab CI automated package deployment token variable names - https://phabricator.wikimedia.org/T386056#10550840 (Ottomata)
[20:38:32] Data-Engineering (Q3 2024 January 1st - March 31th), Data-Platform (Data Platform Ops Week Working Group), Data-Platform-SRE (2025.02.10 - 2025.02.28), Essential-Work, and 2 others: DAG failing due to failure to acquire lock on wmf_data_ops.data_qu... - https://phabricator.wikimedia.org/T386114#10550843
[20:38:42] Data-Engineering, Discovery-Search, Java-Scala-Standardization: Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406#10550845 (Ottomata) FWIW I have been working while wikimedia-event-utilities to gitlab as part of {T368927}. See: https://gitlab.wikimedia.org/otto/...
[20:46:11] Data-Engineering, Language and Product Localization, MediaWiki-extensions-Translate, MW-Interfaces-Team, and 3 others: Intermittent JobQueueError due to "Unable to deliver all events: 500: Internal Server Error" - https://phabricator.wikimedia.org/T386138#10550871 (Urbanecm)
[20:51:02] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10550873 (YLiou_WMF) Resolved→Open
[20:52:37] Data-Engineering, Language and Product Localization, MediaWiki-extensions-Translate, MW-Interfaces-Team, and 3 others: Intermittent JobQueueError due to "Unable to deliver all events: 500: Internal Server Error" - https://phabricator.wikimedia.org/T386138#10550901 (Urbanecm) Filled {T386410} as a...
[20:54:40] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10550911 (YLiou_WMF) Unfortunately I'm now experiencing a separate issue! I'm trying to install R and am receiving th...
[20:58:54] Data-Engineering, Language and Product Localization, MediaWiki-extensions-Translate, MW-Interfaces-Team, and 3 others: Intermittent JobQueueError due to "Unable to deliver all events: 500: Internal Server Error" - https://phabricator.wikimedia.org/T386138#10550918 (Nikerabbit)
[21:05:49] Data-Engineering, Data-Engineering-Radar, Research-engineering, Data-Platform-SRE (2025.02.10 - 2025.02.28): Requesting Ceph S3 credentials for research - https://phabricator.wikimedia.org/T385608#10550949 (bking) I have added the 4TB quota as requested: ` sudo radosgw-admin quota set --quota-sc...
[21:12:45] Data-Engineering, SRE, SRE-Access-Requests: Requesting access to analytics-privatedata-users ssh access and Kerberos identity for YLiou_WMF - https://phabricator.wikimedia.org/T385220#10550958 (BTullis) Strangely, that file isn't displaying for me. It's showing as restricted. It might be an idea to...
[21:14:01] Data-Engineering, Data-Engineering-Radar, Research-engineering, Data-Platform-SRE (2025.02.10 - 2025.02.28): Requesting Ceph S3 credentials for research - https://phabricator.wikimedia.org/T385608#10550963 (BTullis) p:Triage→Medium a:bking
[21:30:41] Data-Engineering, Language and Product Localization, MediaWiki-extensions-Translate, MW-Interfaces-Team, and 3 others: Intermittent JobQueueError due to "Unable to deliver all events: 500: Internal Server Error" - https://phabricator.wikimedia.org/T386138#10550995 (hnowlan) It looks like eventgat...
[22:28:31] Data-Engineering: Migrate analytics Airflow DAGs to k8s Airflow deployment - https://phabricator.wikimedia.org/T386282#10551261 (brouberol) > That's a good point but maybe the old DB data doesn't have as much value as we might think it does. If it does not, then I'm 100% for it!
[23:00:52] Data-Engineering, Data-Engineering-Radar, Research-engineering, Data-Platform-SRE (2025.02.10 - 2025.02.28): Requesting Ceph S3 credentials for research - https://phabricator.wikimedia.org/T385608#10551290 (bking) Open→Resolved Per [[ https://wikimedia.slack.com/archives/C055QGPTC69/p...