[03:44:37] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10056500 (Liz) But, the million dollar question is, when will it ever be completed? Obviously not in 26 hours since we are currently at 209 hours. I know th...
[07:03:34] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10056707 (Rchard2scout) >>! In T367856#10048617, @Ladsgroup wrote: > I checked the wikireplica slave stauts. For clouddb1013 it's: >> Slave_SQL_Running_Stat...
[09:18:31] (CR) Jakob: [C:+2] "Yay no more eval!" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1059396 (https://phabricator.wikimedia.org/T371706) (owner: Lucas Werkmeister (WMDE))
[09:33:48] (CR) Jakob: [C:+2] Add missing `static` to function [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1059395 (owner: Lucas Werkmeister (WMDE))
[09:34:39] (Merged) jenkins-bot: Add missing `static` to function [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1059395 (owner: Lucas Werkmeister (WMDE))
[09:34:40] (Merged) jenkins-bot: Rewrite WikimediaDbSectionMapper::loadDbMap() [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1059396 (https://phabricator.wikimedia.org/T371706) (owner: Lucas Werkmeister (WMDE))
[09:38:59] (PS1) Lucas Werkmeister (WMDE): Add missing `static` to function [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061963
[09:39:46] (CR) Lucas Werkmeister (WMDE): [C:+2] Add missing `static` to function [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061963 (owner: Lucas Werkmeister (WMDE))
[09:40:17] (Merged) jenkins-bot: Add missing `static` to function [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061963 (owner: Lucas Werkmeister (WMDE))
[09:40:29] (PS1) Lucas Werkmeister (WMDE): Rewrite WikimediaDbSectionMapper::loadDbMap() [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061964 (https://phabricator.wikimedia.org/T371706)
[09:42:59] Data-Engineering, Data-Platform-SRE (2024.07.29 - 2024.08.16): request for new matomo site: trace.wikimedia.org/ - https://phabricator.wikimedia.org/T371124#10057024 (BTullis) a:BTullis
[09:43:22] (CR) Lucas Werkmeister (WMDE): [C:+2] Rewrite WikimediaDbSectionMapper::loadDbMap() [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061964 (https://phabricator.wikimedia.org/T371706) (owner: Lucas Werkmeister (WMDE))
[09:43:52] (Merged) jenkins-bot: Rewrite WikimediaDbSectionMapper::loadDbMap() [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1061964 (https://phabricator.wikimedia.org/T371706) (owner: Lucas Werkmeister (WMDE))
[10:06:40] (Abandoned) Lucas Werkmeister (WMDE): DNM: hacks to test the parent change locally [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/1059397 (https://phabricator.wikimedia.org/T371706) (owner: Lucas Werkmeister (WMDE))
[11:19:21] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057226 (Ladsgroup) >>! In T367856#10056707, @Rchard2scout wrote: > @Ladsgroup, am I right in thinking that once the copy to the tmp table is done, the tmp...
[11:25:15] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057258 (Ladsgroup) I just depooled it. Everything is now using the web database for now (clouddb1013) that means they will have a lower timeout for a whil...
[12:43:33] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057397 (fnegri) Ladsgroup: thanks, I was planning to do the same if the lag kept on increasing. The temp table is now at 262G so hopefully the lag on clo...
[12:44:48] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057402 (fnegri) I was wrong: the temp table is only at 87G: ` root@clouddb1017:/srv/sqldata.s1/enwiki# ls -Ssh | head total 1.1T 262G revision.ibd 97G p...
[12:50:20] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057570 (fnegri) I have increased the timeout on clouddb1013 to 10800, I will check if clouddb1013 can handle the load. ` fnegri@clouddb1013:~$ sudo vi /e...
[13:00:37] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), and 2 others: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10057592 (Stevemunene) After testing on the `an-test-client-1002` we are ready to start the deployment. First I...
[13:05:11] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057598 (fnegri) > Could it be possible that some really big jobs on 'enwiki' that started before the maintainence are still running and are stuck in a loo...
[13:05:25] !log Bump airflow version on `an-test-client1002` T365449
[13:05:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:05:28] T365449: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449
[13:17:55] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10057633 (RedDirector) Thanks for all the work that has been done on the maintenance.
[13:26:27] Data-Engineering, collaboration-services, Data Pipelines, Data-Platform-SRE (2024.07.29 - 2024.08.16), and 2 others: Upgrade Airflow to 2.9.3 - https://phabricator.wikimedia.org/T365449#10057654 (Stevemunene) Upgraded the test instance to v2.9.3 by re enabling puppet and running puppet, then rest...
[13:36:33] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE: Add Retry package to airflow conda environment - https://phabricator.wikimedia.org/T372279 (Snwachukwu) NEW
[13:41:12] Data-Engineering: Change the way Refine handles its status (currently flags in partitions) - https://phabricator.wikimedia.org/T312785#10057751 (Ottomata) →Duplicate dup:T369900
[13:41:29] Data-Engineering (Q1 2024 July 1st - September 30th), Dumps 2.0 (Kanban Board): Develop Airflow ExternalTaskSensor to orchestrate DAG dependencies - https://phabricator.wikimedia.org/T369900#10057753 (Ottomata)
[13:43:14] Data-Engineering (Q1 2024 July 1st - September 30th): [Refine Refactoring] [Spike] Define a concept and provide a PoC for dynamic DAG execution in Airflow - https://phabricator.wikimedia.org/T356362#10057757 (Ottomata) Open→Resolved Being bold and resolving. This has been done for canary events a...
[13:46:17] Data-Engineering, Data Pipelines, Patch-For-Review: Fix generation of _IMPORTED flags by Gobblin - https://phabricator.wikimedia.org/T365223#10057792 (Ottomata) @Antoine_Quhen now that canary events are being produced twice an hour, can we resolve this?
[13:48:54] Data-Engineering, Patch-For-Review: Timeout hive-metastore locks - https://phabricator.wikimedia.org/T365563#10057806 (Ottomata) @Antoine_Quhen can you add just a little bit more info to this task please? Why do we need want this? What happens if we don't do it? I see there is a patch and this is conn...
[13:54:48] (CR) Ottomata: Refactor Refine to be triggerd by Airflow (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1016808 (https://phabricator.wikimedia.org/T356762) (owner: Aqu)
[14:37:15] Data-Engineering, Discovery-Search (Current work), MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), Wikimedia-production-error: '.event.pageViewId' should be string, '.event.subTest' should be string, '.event.searchSessionId' should be string - https://phabricator.wikimedia.org/T286814#10057978 (EBernha...
[14:40:19] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE (2024.07.29 - 2024.08.16): Add Retry package to airflow conda environment - https://phabricator.wikimedia.org/T372279#10057992 (BTullis) a:BTullis→Stevemunene Assigning this to @Stevemunene because he has already started work...
[14:40:29] Data-Engineering (Q1 2024 July 1st - September 30th), Data-Platform-SRE (2024.07.29 - 2024.08.16): Add Retry package to airflow conda environment - https://phabricator.wikimedia.org/T372279#10057997 (BTullis) p:Triage→High
[15:33:21] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10058226 (dr0ptp4kt) Untagging Search Platform, so that Growth and folks on EventGate (D...
[15:39:08] Data-Engineering, Java-Scala-Standardization, Release-Engineering-Team (Radar): Java projects hosted on Gerrit should publish artifacts to Gitlab - https://phabricator.wikimedia.org/T370400#10058278 (Gehel)
[15:53:02] (PS1) Milimetric: Remove scripts related to old hive version [analytics/refinery] - https://gerrit.wikimedia.org/r/1062044 (https://phabricator.wikimedia.org/T342267)
[15:54:27] (CR) Milimetric: [V:+2 C:+2] Remove scripts related to old hive version [analytics/refinery] - https://gerrit.wikimedia.org/r/1062044 (https://phabricator.wikimedia.org/T342267) (owner: Milimetric)
[16:22:43] Data-Engineering (Q1 2024 July 1st - September 30th): [Refine Refactoring] Refine jobs should be scheduled by Airflow: deployment - https://phabricator.wikimedia.org/T369845#10058395 (Ottomata) a:Antoine_Quhen
[16:24:29] Data-Engineering, Data Pipelines: Refine jobs should be scheduled by Airflow - https://phabricator.wikimedia.org/T307505#10058418 (Ottomata) > How did we over come the 'gotcha' described in the task description here? >> The main issue is that the source data for the Refine pipeline can be updated after i...
[16:26:04] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10058428 (Liz) Yes, looks like the long wait is over. Thank you for helping the servers/system run smoothly.
[16:57:25] (CR) STran: [C:+2] Replace mentions to removed GlobalBlocking message keys [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1059307 (https://phabricator.wikimedia.org/T332401) (owner: Dreamy Jazz)
[16:58:02] (Merged) jenkins-bot: Replace mentions to removed GlobalBlocking message keys [schemas/event/secondary] - https://gerrit.wikimedia.org/r/1059307 (https://phabricator.wikimedia.org/T332401) (owner: Dreamy Jazz)
[16:58:50] Data-Engineering, Data Products, DBA, Schema-change-in-production: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856#10058604 (fnegri) > I suggest depooling clouddb1017 now, that'd redirect all the traffic to web which is caught up and it also makes clouddb1017 move forwar...
[17:06:08] !log Ran " ALTER TABLE wmf_dumps.wikitext_inconsistent_rows_rc1 SET TBLPROPERTIES ( 'commit.retry.num-retries' = '10' ); ". T368756.
[17:06:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:06:12] T368756: Airflow job to orchestrate the dumps reconcilliation emission mechanism - https://phabricator.wikimedia.org/T368756
[18:45:20] Data-Engineering (Q1 2024 July 1st - September 30th), Data Pipelines, Data-Catalog: Spike: Integrate Spark with DataHub - https://phabricator.wikimedia.org/T306896#10059094 (tchin)
[19:14:22] Analytics-Data-Problem, Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 17), MediaWiki-Platform-Team (Radar): Investigate surprising "10% Other" portion of Analytics Browsers report - https://phabricator.wikimedia.org/T342267#10059304 (Milimetric) the new graph...
[19:32:16] Data-Engineering, Data Products, Traffic: New software: haproxykafka - https://phabricator.wikimedia.org/T370668#10059470 (Fabfur)
[19:42:55] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059525 (Ottomata) > i rolled another restart on eventgate-main, but it doesn't seem to...
[19:46:39] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059533 (Ottomata) Actually, just going to do a rolling restart.
[19:51:08] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059544 (Ottomata) Let's watch and see...
[19:55:00] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059548 (Ottomata) > I think codfw is currently active DC, and eventgate-main pods hav...
[20:28:07] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059746 (EBernhardson) Interesting! Poking at my bash history, it looks like i rolled t...
[20:51:59] Data-Engineering, Add-Link, CirrusSearch, Growth-Team, and 4 others: revalidateLinkRecommendations.php fails periodically with JobQueueError: Could not enqueue jobs - https://phabricator.wikimedia.org/T371767#10059819 (Ottomata) > would expect the large majority of jobs to come from the active dc...
[22:59:54] Analytics-Data-Problem, Data-Engineering, Data-Engineering-Dashiki, Data Products (Data Products Sprint 17), MediaWiki-Platform-Team (Radar): Investigate surprising "10% Other" portion of Analytics Browsers report - https://phabricator.wikimedia.org/T342267#10060142 (Krinkle) Hm.. it seems th...
[23:07:34] Data-Engineering, Data Products (Data Products Sprint 17): Bug: pivot does not handle varied casing - https://phabricator.wikimedia.org/T372364 (Milimetric) NEW
[23:08:59] (PS1) Milimetric: Enable pivoting with varied casing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/1062152 (https://phabricator.wikimedia.org/T372364)