[07:13:56] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10060454 (Marostegui) First of all, apologies for the delay this is causing. However, this is coming from a production maintenance and there's not much else...
[08:57:19] Data-Engineering: Reset kerberos password for WMDE-leszek - https://phabricator.wikimedia.org/T365137#10060787 (WMDE-leszek) Hello, anything I could do to bump this?
[09:01:34] (PS38) Aqu: Refactor Refine to be triggerd by Airflow [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762)
[09:14:46] (CR) Aqu: Refactor Refine to be triggerd by Airflow (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[10:02:57] !log Temporarily disable gobblin timers to upgrade Airflow T365449
[10:03:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:03:00] T365449: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449
[11:19:06] Data-Engineering, Data Products (Data Products Sprint 17), Patch-For-Review: Add wikitech (labswiki) to the sqoop list - https://phabricator.wikimedia.org/T217792#10061200 (phuedx) a:Milimetric
[11:22:25] About to reboot `an-launcher1002.eqiad.wmnet` in the next 5
[11:28:47] !log reboot an-launcher1002.eqiad.wmnet for T365449 and T366555
[11:29:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:29:20] T365449: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449
[12:11:48] Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394 (rook) NEW
[12:21:19] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10061436 (Marostegui) The ALTER finished on clouddb1017 and the host is catching up at a rate that goes 1:4 (seconds:seconds) or 1:5, depending on the trans...
[12:24:38] Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061438 (github-toolforge-bot) vivian-rook opened https://github.com/toolforge/quarry/pull/62
[12:30:04] Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397 (rook) NEW
[12:40:18] Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061464 (github-toolforge-bot) vivian-rook closed https://github.com/toolforge/quarry/pull/62
[12:46:08] Quarry: Improve idempotency detection with helm diff - https://phabricator.wikimedia.org/T372394#10061471 (rook) Open→Resolved a:rook
[12:46:40] Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397#10061479 (rook) https://github.com/toolforge/quarry/pull/63
[12:46:53] Quarry: remove k8s_123_2 cluster from tofu - https://phabricator.wikimedia.org/T372397#10061480 (rook) Open→Resolved
[13:49:38] !log deployed Airflow upgrade to v 2.9.3 for analytics instance. T365449.
[13:49:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:49:41] T365449: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449
[14:02:19] (CR) STran: [C:+1] Replace references to removed GlobalBlocking message keys [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/1059311 (https://phabricator.wikimedia.org/T332401) (owner: Dreamy Jazz)
[14:12:23] !log restarting an-db1001 for T366555
[14:13:46] btullis: o/ now you are probably going to throw something at me, but did you see https://phabricator.wikimedia.org/T372257 ?
[14:15:59] !log restarting all airflow schedulers
[14:16:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:16:55] elukey: Thanks. I had forgotten about it :-)
[14:17:18] :D
[14:17:20] Will do it after this has finished, although now would have been the perfect time.
[14:32:39] (Abandoned) Milimetric: Replace references to removed GlobalBlocking message keys [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/1059311 (https://phabricator.wikimedia.org/T332401) (owner: Dreamy Jazz)
[14:34:59] (Abandoned) Milimetric: Add a script for checking number of pages published despite failures [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/282312 (https://phabricator.wikimedia.org/T127283) (owner: Amire80)
[15:03:13] (CR) Ottomata: Enable pivoting with varied casing (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364) (owner: Milimetric)
[15:30:14] (PS2) Milimetric: Enable pivoting with varied casing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364)
[15:30:44] (CR) Milimetric: Enable pivoting with varied casing (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364) (owner: Milimetric)
[16:00:55] Data-Engineering, Data-Platform-SRE (2024.07.29 - 2024.08.16): Reset kerberos password for WMDE-leszek - https://phabricator.wikimedia.org/T365137#10062219 (Gehel) p:Triage→High
[16:06:44] Data-Engineering, Data-Platform-SRE (2024.07.29 - 2024.08.16): Reset kerberos password for WMDE-leszek - https://phabricator.wikimedia.org/T365137#10062232 (BTullis) a:BTullis
[16:11:08] Data-Engineering, Data-Platform-SRE (2024.07.29 - 2024.08.16): Reset kerberos password for WMDE-leszek - https://phabricator.wikimedia.org/T365137#10062251 (BTullis) Apologies for having missed this request @WMDE-leszek. It seems to have fallen through the cracks. Following the instructions here: https:...
[16:19:36] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), Release-Engineering-Team (Radar): Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10062265 (Ottomata) @Stevemunene @BTullis for future upgrades: - Gobblin systemd timer...
[17:02:03] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), Release-Engineering-Team (Radar): Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10062394 (BTullis) Ah, sorry. That's 100% my fault. This was counter to some previous g...
[17:17:20] (CR) Snwachukwu: [C:+2] Enable pivoting with varied casing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364) (owner: Milimetric)
[17:29:01] (Merged) jenkins-bot: Enable pivoting with varied casing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364) (owner: Milimetric)
[17:39:45] !log ran the following to kill zombie dumps process from weeks ago: 'kerberos-run-command analytics yarn application -kill application_1719935448343_454537'
[17:39:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:28:52] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), Release-Engineering-Team (Radar): Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10062745 (Ottomata) > sensors should recover cleanly from a restart, so pausing ingesti...
[19:32:55] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), Release-Engineering-Team (Radar): Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10062756 (Ottomata) We may have another minor problem: In [[ http://localhost:8600/dag...
[19:36:05] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), Release-Engineering-Team (Radar): Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10062763 (Ottomata) Ah, here is more detail on failed sensor log. It can't actually loa...
[20:04:04] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10062820 (Ottomata) No errors since I restarted eventgate-main in codfw. So why did thi...
[20:06:26] Data-Engineering, Data-Platform-SRE (2024.07.29 - 2024.08.16): Reset kerberos password for WMDE-leszek - https://phabricator.wikimedia.org/T365137#10062822 (WMDE-leszek) Open→Resolved many thanks @BTullis ! Got an email, reset the password, authentication works.