[07:20:11] Analytics-Radar, Dumps-Generation: xmldatadumps dumpstatus.json files only readable by root - https://phabricator.wikimedia.org/T287989 (ArielGlenn) Open→Resolved Yes, the underlying issue (cron job that kept running when dumpsdata roles were switched) is fixed. Closing!
[08:30:40] Analytics-Radar, Product-Analytics, Growth-Team (Current Sprint), MW-1.37-notes (1.37.0-wmf.18; 2021-08-09), Patch-For-Review: Add geolocation information to Growth schemas - https://phabricator.wikimedia.org/T287121 (Tgr) @Krinkle pointed out on https://gerrit.wikimedia.org/r/c/mediawiki/ext...
[09:12:42] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 2 others: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) p:Triage→Medium a:BTullis
[09:13:20] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 2 others: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis)
[09:13:26] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban: Deploy an-test-coord1002 as a Ganeti VM to facilitate failover testing of analytics coordinator role - https://phabricator.wikimedia.org/T287864 (BTullis)
[09:29:14] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 2 others: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) I realize that this is a bit of a big VM at 32 GB, but I'm not sur...
[09:59:58] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 2 others: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) Proceeding with this now. ` btullis@cumin1001:~$ sudo cookbook sre...
[10:13:03] Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (BTullis)
[10:20:43] Analytics, Patch-For-Review: Use types in Analytics Puppet classes/profiles/etc.. - https://phabricator.wikimedia.org/T252617 (BTullis) a:BTullis I'll work on this whenever possible and try to apply the technique to all of my new code.
[11:03:05] really liked this post on os, scylla and the dangers of super configurable systems: https://www.scylladb.com/2019/03/20/discord-on-the-joy-of-opinionated-systems/
[11:13:51] Analytics, SRE, Patch-For-Review: Trash cleanup cron spams on an-test hosts - https://phabricator.wikimedia.org/T286442 (BTullis) OK, in that case I've done the following to clear this bit of cron spam temporarily. ` btullis@an-test-client1001:~$ ls -l /srv/home ls: cannot access '/srv/home': No suc...
[11:36:05] Analytics-Clusters, Data-Engineering: LVS in Analytics VLANs - https://phabricator.wikimedia.org/T288750 (BTullis) I'm also in favour of option 2. I think that it's the cleanest solution and ultimately presents the lowest risk of the three options. I appreciate that this involves work for both the DCOps...
[11:41:49] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, and 2 others: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) `ganeti1016` was allocated as the primary and `ganeti1017` as the...
[12:23:32] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis)
[13:06:34] Analytics-Radar, Dumps-Generation: xmldatadumps dumpstatus.json files only readable by root - https://phabricator.wikimedia.org/T287989 (JAllemandou) Many thanks @ArielGlenn, @elukey and @Ottomata for investigating and fixing :)
[13:07:54] Thank you for the post nuria :)
[13:08:38] Heya team - I didn't deploy yesterday, doing today (refinery only, with restart of the monthly pageview-dump job)
[13:09:00] !log Deploying refinery using scap
[13:09:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:10:06] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/713662 (owner: Joal)
[13:26:41] (Abandoned) Michael DiPietro: upgrade quarry to python 3.9 [analytics/quarry/web] - https://gerrit.wikimedia.org/r/710305 (https://phabricator.wikimedia.org/T288249) (owner: Michael DiPietro)
[13:34:10] !log Deploy refinery onto HDFS
[13:34:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:40:27] !log Kill restart pageview-monthly_dump job and 2 backfilling jobs
[13:40:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:45:40] Analytics, Data-Engineering: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration - https://phabricator.wikimedia.org/T288766 (odimitrijevic) p:Triage→Medium
[14:47:53] Analytics, Data-Engineering: Deploy an-test-launcher1002 as a Ganeti VM to test high-availability of scheduled jobs - https://phabricator.wikimedia.org/T288767 (odimitrijevic) p:Triage→Medium
[14:51:13] Analytics, Data-Engineering: Use corosync and pacemaker for presto coordinator active/standby configuration - https://phabricator.wikimedia.org/T287967 (odimitrijevic) p:Triage→Medium
[15:04:38] joal: hi! for the cassandra3 checking script, which log files did you refer to?
[15:18:42] Analytics, Analytics-Kanban, Event-Platform, MW-1.37-notes (1.37.0-wmf.17; 2021-08-02), Patch-For-Review: EchoMail and EchoInteraction Event Platform Migration - https://phabricator.wikimedia.org/T287210 (mforns)
[15:32:38] btullis: do you want to skip the sync later on? (I see that Razzi and Andrew declined)
[15:33:54] Yes, that's fine by me.
[16:08:49] (PS1) Lucas Werkmeister (WMDE): Add active_items.php to daily.03.sh [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714803 (https://phabricator.wikimedia.org/T286903)
[16:12:48] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:23:42] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:55:04] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (jcrespo) Hey @BTullis, I run upon this ticket while on clinic duty. Apol...
[17:23:06] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) Hi @jcrespo - Sincere apologies. I hadn't meant to bypass the s...
[17:30:55] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (jcrespo) I said I didn't intend to block anything, and I mean it. Please...
[17:38:34] (CR) Ladsgroup: [C: +2] "looool" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714803 (https://phabricator.wikimedia.org/T286903) (owner: Lucas Werkmeister (WMDE))
[17:38:46] (PS1) Ladsgroup: Add active_items.php to daily.03.sh [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714680 (https://phabricator.wikimedia.org/T286903)
[17:38:52] (CR) Ladsgroup: [C: +2] Add active_items.php to daily.03.sh [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714680 (https://phabricator.wikimedia.org/T286903) (owner: Ladsgroup)
[17:39:39] (Merged) jenkins-bot: Add active_items.php to daily.03.sh [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714803 (https://phabricator.wikimedia.org/T286903) (owner: Lucas Werkmeister (WMDE))
[17:40:03] (Merged) jenkins-bot: Add active_items.php to daily.03.sh [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/714680 (https://phabricator.wikimedia.org/T286903) (owner: Ladsgroup)
[17:49:23] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (Cmjohnson)
[17:57:21] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (Cmjohnson)
[18:02:29] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (jcrespo) Hey, @robh, by any chance, is it possible that there could be a...
[18:11:58] Analytics, Analytics-Wikistats: wikistats: montly pageview dumps are not bz2 files - https://phabricator.wikimedia.org/T287684 (JAllemandou) a:JAllemandou
[18:12:47] Analytics, Analytics-Wikistats: wikistats: montly pageview dumps are not bz2 files - https://phabricator.wikimedia.org/T287684 (JAllemandou) The fix has been found and data regeneration is on its way. It will take a few days to get done, please be patient :)
[18:20:39] Analytics, Analytics-Wikistats: wikistats: montly pageview dumps are not bz2 files - https://phabricator.wikimedia.org/T287684 (Radim.kubacki) Thanks. No problem.
[18:30:16] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (RobH) >>! In T289664#7309598, @jcrespo wrote: > Hey, @robh, by any chanc...
[18:31:20] Analytics-Clusters, Analytics-Kanban: Migrate eventlog1002 to buster - https://phabricator.wikimedia.org/T278137 (Cmjohnson)
[18:31:40] joal, how about testing historical pagecounts?
[18:32:04] ah, interesting mforns - I had not thought about that one
[18:32:29] mforns: actually historical pagecounts don't have pagetitles, so we should be fine
[18:33:14] I'll add them to the list of 'generated' checks
[18:33:44] Thank you mforns :)
[18:33:56] mforns: cause the data is aggregated IIRC :)
[18:34:02] yes yes
[18:35:40] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (jcrespo) > Please note an 'inventory/spare' host doesn't mean no mgmt ap...
[19:31:22] (PS7) Joal: [WIP] Add cassandra3 to oozie loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649)
[19:46:10] (PS2) Michael DiPietro: celery update [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714632
[19:47:48] (CR) Michael DiPietro: "A few changes to get python 3.7 working on VMs" [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714632 (owner: Michael DiPietro)
[19:49:46] (CR) Andrew Bogott: [C: +1] celery update [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/714632 (owner: Michael DiPietro)
[22:22:36] Analytics, Data-Engineering: Deploy an-test-launcher1002 as a Ganeti VM to test high-availability of scheduled jobs - https://phabricator.wikimedia.org/T288767 (BTullis)
[22:32:38] Analytics, Data-Engineering: Deploy an-test-presto1002 as a Ganeti VM to test Presto and Alluxio integration - https://phabricator.wikimedia.org/T288766 (BTullis)
[22:44:15] Analytics-Clusters, Analytics-Kanban, Data-Engineering, Data-Engineering-Kanban, vm-requests: Site: Eqiad - 1 VM request for analytics test cluster - coordinator replica role - https://phabricator.wikimedia.org/T289664 (BTullis) Thanks @jcrespo - that does seem to me like a very viable optio...
[23:42:33] (PS1) Shay Nowick: Creating android_setting_action schema Bug: T285779 [schemas/event/secondary] - https://gerrit.wikimedia.org/r/714871 (https://phabricator.wikimedia.org/T285779)
[23:42:35] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/714871 (https://phabricator.wikimedia.org/T285779) (owner: Shay Nowick)