[06:40:22] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Cynthia Makonyango WMDE - https://phabricator.wikimedia.org/T371689#10041144 (WMDECyn) Yes I do.
[07:10:30] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10041171 (JoelyRooke-WMDE) Yes I will need access to private data, just not ssh key entry
[08:27:16] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Remove Benthos from ulsfo hosts - https://phabricator.wikimedia.org/T370741#10041302 (Vgutierrez) we had some alerts ongoing during the weekend due to this task: ` FIRING: SystemdUnitFailed: wmf_auto_restart_b...
[08:30:09] Data-Engineering, Data Products, Observability-Logging, Traffic, Patch-For-Review: Remove Benthos from ulsfo hosts - https://phabricator.wikimedia.org/T370741#10041303 (Vgutierrez) p:Triage→Medium
[09:35:39] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041467 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm
[10:32:59] !log failing over HDFS namenode on hadoop-test cluster to an-master1002 for T366555
[10:33:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:49:15] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041698 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm executed with errors: - an-...
[10:50:21] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041699 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm
[10:51:45] !log failing over HDFS namenode on hadoop-test cluster back to an-master1001 for T366555
[10:51:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:17:52] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041750 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host an-conf1004.eqiad.wmnet with OS bookworm completed: - an-conf1004 (*...
[11:42:29] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041784 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm
[12:11:39] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041850 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host an-conf1005.eqiad.wmnet with OS bookworm completed: - an-conf1005 (*...
[12:16:17] Data-Engineering: Refine optimizations on output and parallelization - https://phabricator.wikimedia.org/T371803 (Antoine_Quhen) NEW
[12:18:17] Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review: [Refine refactoring] Refine jobs should be scheduled by Airflow: implementation - https://phabricator.wikimedia.org/T356762#10041876 (Antoine_Quhen) I've isolated the optimization here https://phabricator.wikimedia.org/T371803
[12:27:00] Data-Engineering: Refine optimizations on output and parallelization - https://phabricator.wikimedia.org/T371803#10041888 (Antoine_Quhen) The test code: https://gitlab.wikimedia.org/-/snippets/149
[12:27:57] Data-Engineering: Refine optimizations on output and parallelization - https://phabricator.wikimedia.org/T371803#10041904 (Antoine_Quhen)
[12:27:58] Data-Engineering (Q1 2024 July 1st - September 30th), Patch-For-Review: [Refine refactoring] Refine jobs should be scheduled by Airflow: implementation - https://phabricator.wikimedia.org/T356762#10041905 (Antoine_Quhen)
[12:30:10] (PS31) Aqu: Refactor Refine to be triggerd by Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762)
[12:30:20] (CR) Aqu: Refactor Refine to be triggerd by Airflow (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[12:32:21] (CR) CI reject: [V:-1] Refactor Refine to be triggerd by Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[12:50:25] (CR) Btullis: [V:+2 C:+2] Add aewikimedia to the sqoop list [analytics/refinery] - https://gerrit.wikimedia.org/r/1054285 (https://phabricator.wikimedia.org/T362529) (owner: Btullis)
[12:52:56] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10041984 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm
[13:20:21] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10042054 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host an-conf1006.eqiad.wmnet with OS bookworm completed: - an-conf1006 (*...
[13:24:01] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10042064 (SLyngshede-WMF)
[13:24:12] Data-Engineering, Data-Platform-SRE, SRE: Streamline Data Platform access approvals for WMF staff - https://phabricator.wikimedia.org/T370424#10042058 (SLyngshede-WMF) @Ottomata I'm just removing the SRE-Access-Requests tag to remove this from the Clinic Duty dashboard.
[13:30:14] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10042068 (SLyngshede-WMF) @KFrancis Do we have an NDA for @JoelyRooke-WMDE @JoelyRooke-WMDE without a shell account (S...
[13:33:42] Data-Engineering: Event Utilities partially downloads schemas - https://phabricator.wikimedia.org/T309717#10042078 (Ottomata) Another example happened today: application_1719935448343_494307 ` 24/08/04 16:26:16 ERROR Refine: Failed refinement of dataset hdfs://analytics-hadoop/wmf/data/raw/eventlogging_leg...
[13:39:46] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10042083 (JoelyRooke-WMDE) I believe I have already signed the NDA when I got basic LDAP access (https://phabricator.wiki...
[16:07:40] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10042756 (Dzahn) >>! In T371584#10042067, @SLyngshede-WMF wrote: > @KFrancis Do we have an NDA for @JoelyRooke-WMDE I c...
[16:23:51] (PS32) Aqu: Refactor Refine to be triggerd by Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762)
[16:31:13] (CR) Aqu: Refactor Refine to be triggerd by Airflow (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[16:43:30] Quarry, cloud-services-team (FY2023/2024-Q3-Q4): Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10042952 (bd808)
[16:51:38] (CR) Ottomata: Refactor Refine to be triggerd by Airflow (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[16:52:53] (CR) Ottomata: Refactor Refine to be triggerd by Airflow (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[16:58:02] (PS1) Milimetric: Add temporary dashboard pointing to old data [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/1059914 (https://phabricator.wikimedia.org/T342267)
[16:58:20] (CR) Milimetric: [V:+2 C:+2] "tested this locally, seems ok" [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/1059914 (https://phabricator.wikimedia.org/T342267) (owner: Milimetric)
[16:58:55] Quarry, cloud-services-team (FY2023/2024-Q3-Q4): Allow Quarry to query its own database - https://phabricator.wikimedia.org/T367415#10043015 (bd808) >>! In T367415#10038537, @fnegri wrote: > @bd808 I'm interested in your opinion on this one. I created a pull request, but I'm also wondering if anybody sti...
[17:08:26] (PS33) Aqu: Refactor Refine to be triggerd by Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762)
[17:16:10] (CR) Ottomata: Refactor Refine to be triggerd by Airflow (10 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[17:19:02] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 5 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10043076 (EBernhardson) a:EBernhardson→None While the stack trace indicates this...
[17:24:23] Data-Engineering, Discovery-Search (Current work), MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Wikimedia-production-error: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814#10043092 (EBernha...
[18:16:48] Data-Engineering (Q1 2024 July 1st - September 30th): [Spike] [Refine Refactoring] List out all production Refine datasets that need to be migrated to the config store (Airflow and Iceberg) - https://phabricator.wikimedia.org/T361498#10043241 (Ahoelzl) @aqu do you have the stats on last created events per da...
[18:25:24] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10043259 (Liz) Do you have any idea how much longer this job will take? It's been a couple days now.
[18:52:17] (CR) Michael Große: docs(image_suggestion_interaction): fix doc message (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1059254 (https://phabricator.wikimedia.org/T335716) (owner: Sergio Gimeno)
[19:01:07] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 5 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10043372 (Ottomata) This error message happens when eventgate-wikimedia succeeds in fetc...
[19:06:59] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10043398 (KFrancis) Hi all, I am also confirming we have an NDA on file for @JoelyRooke-WMDE. Thanks!
[19:57:18] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10043578 (Jclark-ctr)
[19:57:29] Data-Engineering, DC-Ops, ops-eqiad, SRE: Q4:rack/setup/install an-conf100[4-6] - https://phabricator.wikimedia.org/T364429#10043580 (Jclark-ctr) Open→Resolved
[20:06:33] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 5 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10043600 (EBernhardson) > did these errors happen in a short time frame? They are sprea...
[20:16:14] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 5 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10043613 (EBernhardson) Perhaps one notable bit, it looks like these error messages star...
[21:13:14] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform, MW-1.43-notes (1.43.0-wmf.15; 2024-07-23): [Event Platform] Instrument EventBus with prometheus MW Statslib - https://phabricator.wikimedia.org/T363587#10043737 (Ottomata) I enabled on all wikis! Woohoo!
[21:13:30] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board), Event-Platform, MW-1.43-notes (1.43.0-wmf.15; 2024-07-23): [Event Platform] Instrument EventBus with prometheus MW Statslib - https://phabricator.wikimedia.org/T363587#10043738 (Ottomata)
[21:39:26] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Joely Rooke WMDE - https://phabricator.wikimedia.org/T371584#10043764 (Dzahn) User should be converted from "LDAP-only" to "analytics-privatedata-users" (looks like shell access, but...
[21:44:53] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Cynthia Makonyango WMDE - https://phabricator.wikimedia.org/T371689#10043774 (Dzahn) side comment: Is it technically even possible to have approvals before we know what is being app...
[21:47:20] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Cynthia Makonyango WMDE - https://phabricator.wikimedia.org/T371689#10043769 (Dzahn) I'm pretty sure this access would be like T371584 for Joely Rooke, so analytics-privatedata-user...
[21:59:47] Data-Engineering, SRE, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for Cynthia Makonyango WMDE - https://phabricator.wikimedia.org/T371689#10043786 (Dzahn) I amended to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1059371 in a way which I think...
[22:21:36] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10043806 (GTrang) The enwiki database is the last remaining database that is still on replag. The commonswiki and testcommonswiki databases are not replagge...
[23:22:06] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10043858 (Liz) I know, right now the replag is 61 hours.