[13:00:50] mornin joal !
[13:15:32] Analytics: Refactor webrequest_source partitions and oozie jobs - https://phabricator.wikimedia.org/T116387 (Ottomata) Open→Declined Never worked on, and we are doing a small change to how webrequest ingestion works as part of T271232, although it probably won't do what this ticket desires. Moving...
[13:19:27] Hi ottomata :)
[13:20:21] hello!
[13:25:46] ok so!
[13:25:54] yes joal empty run is good!
[13:26:04] i can merge the systemd timer for webrequest_test anytime
[13:26:09] ack
[13:26:59] so, empty run now, more-or-less regular manual runs while I have the kids, then timer?
[13:27:05] or timer right now?
[13:27:11] Oh an
[13:27:20] There is another thing we need to do ottomata
[13:27:27] joal: i guess we can just do 1 manual run
[13:27:29] and then do the timer
[13:27:32] no difference really, right?
[13:27:34] works for me ottomata
[13:27:45] I should have the time to actually do that before the kids :)
[13:28:31] ok great!
[13:29:37] !log Clean gobblin state_store and data before starting webrequest_test on analytics-test-hadoop
[13:29:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:29:57] !log Run first manual empty job for webrequest_test on analytics-test-hadoop
[13:29:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:31:03] ottomata: ready for timer test :)
[13:31:18] joal: first run done? or you want to do first run with timer?
[13:31:22] i guess we could eh?
[13:31:28] ottomata: first run done
[13:31:32] oh wow
[13:31:34] ok merging timer then
[13:31:38] ottomata: there was no data :)
[13:31:48] oh because of latest?
[13:31:53] correct ottomata
[13:31:57] ottomata: in case: sudo -u analytics PYTHONPATH=/srv/deployment/analytics/refinery/python:$PYTHONPATH kerberos-run-command analytics /srv/deployment/analytics/refinery/bin/gobblin --sysconfig /srv/deployment/analytics/refinery/gobblin/common/analytics-test-hadoop.sysconfig.properties /srv/deployment/analytics/refinery/gobblin/jobs/webrequest_test.pull
[13:32:02] the command I manually ran
[13:32:18] great
[13:32:38] i'll check that that looks like what is in the systemd timer unit
[13:33:06] ack
[13:34:29] /usr/local/bin/kerberos-run-command analytics /srv/deployment/analytics/refinery/bin/gobblin --sysconfig=/srv/deployment/analytics/refinery/gobblin/common/analytics-test-hadoop.sysconfig.properties --jar=/srv/deployment/analytics/refinery/artifacts/gobblin-wmf.jar /srv/deployment/analytics/refinery/jobs/webrequest_test.pull
[13:35:10] hmm jobconfig file not correct
[13:35:14] fixing
[13:35:22] ottomata: typo - /srv/deployment/analytics/refinery/jobs/webrequest_test.pull --> /srv/deployment/analytics/refinery/gobblin/jobs/webrequest_test.pull
[13:37:00] ottomata: I created a CR to bump the AQS druid snapshot
[13:37:21] ottomata: The procedure Luca wrote is here: /srv/deployment/analytics/refinery/jobs/webrequest_test.pull
[13:37:24] oops
[13:37:32] ottomata: here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS#Deploy_new_History_snapshot_for_Wikistats_Backend
[13:37:59] oh ok i merged
[13:38:04] lemme get the gobblin thing fixed
[13:38:08] then can do the other steps
[13:38:36] sure ottomata
[13:39:43] Analytics, Analytics-Kanban, Patch-For-Review: Replace Camus by Gobblin - https://phabricator.wikimedia.org/T271232 (Ottomata)
[13:44:04] ok joal
[13:44:11] shall i force a run of the timer?
[13:44:20] sure let's try :)
[13:44:55] hmm we need PYTHONPATH
[13:44:57] in the unit
[13:44:58] fixing
[13:47:43] joal: parallelizing
[13:47:59] going to restart aqs servers and test on canary
[13:48:30] ottomata: ping me when canary is ready, I can test
[13:48:34] ok great
[13:49:33] joal: canary ready
[13:49:46] testing
[13:49:49] eqi aqs1004
[13:49:53] woppsy
[13:50:45] all good ottomata - you can finish the deploy
[13:50:59] ok
[13:53:14] ok joal gobblin job running
[13:53:28] checking on yarn ottomata
[13:54:03] joal: job just finished
[13:54:12] Number of bytes written=232064
[13:54:34] ottomata: there is now data in the correct folder :)
[13:54:38] great!
[13:54:53] also, FYI aqs restart finished
[13:55:09] ottomata: I just checked stats.wikimedia.org - all looks good
[13:55:21] ottomata: thank you for the fast actions :)
[13:55:33] joal: i think wrong group ownership on webrequest_gobblin?
[13:55:46] hm - I don't think so :)
[13:56:03] hmmm
[13:56:15] ottomata: we had it to analytics, but you asked for a change
[13:56:16] oh its different than webrequest
[13:56:17] but i think you are right
[13:56:19] RIIGHHHTHTH
[13:56:20] very good!
[13:56:23] :)
[13:56:32] I changed it, that was on purpose
[13:56:36] the gids are not synced yet
[13:56:41] i just saw 99 on an-test-coord
[13:56:51] we are waiting for all nodes to be on buster to enforce consistent gids
[13:57:03] hdfs dfs -ls looks correct
[13:57:23] ottomata: I suggest we let gobblin run while I'm with the kids, then we finalize the migration?
[13:57:29] ok!
[13:57:34] perfect :)
[13:57:42] see you in ~1h30
[13:57:49] ok!
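The two fixes made to the timer unit earlier in the log (the job file living under gobblin/jobs/ rather than jobs/, noted at 13:35, and the missing PYTHONPATH, noted at 13:44) can be sketched as a small helper that assembles the manual invocation in one place. This is only an illustration: gobblin_invocation is a hypothetical function, not part of refinery, and it uses only the paths and the --sysconfig option that appear in the manually run command at 13:31:57.

```python
import posixpath

REFINERY = "/srv/deployment/analytics/refinery"  # path taken from the log


def gobblin_invocation(job_name, sysconfig=None, refinery=REFINERY):
    """Return (env, argv) for one gobblin job run (hypothetical helper)."""
    # The wrapper script needs refinery's python directory on PYTHONPATH.
    env = {"PYTHONPATH": posixpath.join(refinery, "python")}
    argv = [posixpath.join(refinery, "bin", "gobblin")]
    if sysconfig is not None:
        argv += [
            "--sysconfig",
            posixpath.join(refinery, "gobblin", "common", sysconfig),
        ]
    # Job files live under gobblin/jobs/, not jobs/.
    argv.append(posixpath.join(refinery, "gobblin", "jobs", job_name + ".pull"))
    return env, argv


env, argv = gobblin_invocation(
    "webrequest_test", sysconfig="analytics-test-hadoop.sysconfig.properties"
)
print(env["PYTHONPATH"])
print(" ".join(argv))
```

Building the environment and the argument list together makes it harder for the unit and the manual command to drift apart; the sudo and kerberos-run-command wrapping from the log is left out of the sketch.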
[13:58:08] hm the topic dir got written with 777 perms
[13:58:08] drwxrwxrwx 3 analytics analytics-privatedata-users 4096 Jul 6 13:53 webrequest_test_text
[13:59:50] i guess it doesn't really matter since parent dir is correct
[14:01:15] oh, FYI i have a dentist appt in 3 hrs
[14:16:22] !log restarted aqs for july mw history snapshot deploy
[14:16:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:24:59] (PS1) Gerrit maintenance bot: Add jv.wikisource to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/703454 (https://phabricator.wikimedia.org/T286241)
[15:34:38] Heya ottomata - Here I am
[15:35:37] hello!
[15:35:44] runs looking good from here joal
[15:35:47] No gobblin failure so far?
[15:35:50] \o/
[15:36:05] Let's replicate the plan we have for webrequest?
[15:36:11] I'll list here:
[15:36:17] ok!
[15:36:20] - Stop gobblin and camus
[15:36:27] - Move camus data
[15:36:34] - Drop table
[15:36:38] - Move gobblin data
[15:36:46] (dropping incomplete first hour)
[15:36:54] - recreate table and partitions
[15:37:15] - Update gobblin job + deploy (mwarf)
[15:37:18] - restart job
[15:37:30] How does that look --^ ? Have I forgotten anything?
[15:37:32] to recreate table, do we do msck repair?
[15:38:04] we also need a puppet patch to remove camus job
[15:38:07] nope ottomata - We recreate schema, and manually create partition with explicit paths (we miss the webrequest_source= bit)
[15:38:14] ah right
[15:39:06] - And puppet patch to absent the camus job, right
[15:39:23] Happy with the plan ottomata? Action?
[15:39:24] ok going to stop puppet and the camus and gobblin jobs
[15:39:26] ya!
[15:39:28] ack!
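The "manually create partition with explicit paths" step from the plan above can be sketched as a generator of Hive ALTER TABLE statements. This is a rough illustration under stated assumptions, not the commands that were actually run: the table name wmf_raw.webrequest comes from the log, while the base path, the webrequest_source value, the presence of a year= level, and the example date are all hypothetical. The zero-padded directory names follow the remark later in the log that gobblin imports padded hours.

```python
from datetime import datetime, timedelta


def add_partition_statements(table, source, topic_dir, start, hours):
    """Yield one Hive ALTER TABLE statement per hourly partition (illustrative)."""
    for i in range(hours):
        ts = start + timedelta(hours=i)
        # The directory layout has no webrequest_source= level, so the
        # partition value goes in the spec and the path is spelled out.
        location = (
            f"{topic_dir}/year={ts.year}/month={ts.month:02d}"
            f"/day={ts.day:02d}/hour={ts.hour:02d}"
        )
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(webrequest_source='{source}', year={ts.year}, month={ts.month}, "
            f"day={ts.day}, hour={ts.hour}) LOCATION '{location}';"
        )


for stmt in add_partition_statements(
    "wmf_raw.webrequest",
    "test_text",  # assumed partition value
    "/hypothetical/raw/webrequest_test_text",  # assumed base path
    datetime(2021, 7, 6, 14),  # arbitrary example hour
    2,
):
    print(stmt)
```

Because the path on disk does not carry a webrequest_source= directory, msck repair cannot discover these partitions on its own, which is why each location is given explicitly.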
[15:40:21] done
[15:40:34] joal i'll prepare a puppet patch if you want to proceed with the data moves
[15:40:57] ottomata: currently preparing the refinery patch, will do the data move in a minute
[15:41:17] ok
[15:43:14] (PS1) Joal: Update webrequest_test gobblin job to prod folder [analytics/refinery] - https://gerrit.wikimedia.org/r/703456 (https://phabricator.wikimedia.org/T271232)
[15:43:18] ottomata: --^
[15:44:33] (CR) Ottomata: [C: +2] Update webrequest_test gobblin job to prod folder [analytics/refinery] - https://gerrit.wikimedia.org/r/703456 (https://phabricator.wikimedia.org/T271232) (owner: Joal)
[15:44:35] (CR) Ottomata: [V: +2 C: +2] Update webrequest_test gobblin job to prod folder [analytics/refinery] - https://gerrit.wikimedia.org/r/703456 (https://phabricator.wikimedia.org/T271232) (owner: Joal)
[15:44:59] joal you want to deploy or shall I?
[15:45:11] please go ottomata if you have bandwidth
[15:45:14] k
[15:46:04] ottomata: please wait
[15:46:12] ottomata: we also need to merge the patch for webrequest
[15:46:16] ok
[15:46:18] ?
[15:46:19] I thought it was
[15:46:24] oh the table?
[15:46:26] correct
[15:46:34] i guess that doesn't really matter when we merge it since it is manual
[15:46:35] but ya ok
[15:46:41] not only the table definition, also the dataset.xml file (path changes)
[15:46:53] OH
[15:46:54] ok
[15:46:56] hmmm
[15:47:01] won't this break regular?
[15:47:03] non test?
[15:47:23] we'll deploy to test only, and breaking regular would require a job restart - we're safe
[15:47:31] ok
[15:47:36] (CR) Ottomata: [V: +2 C: +2] Update webrequest for gobblin [analytics/refinery] - https://gerrit.wikimedia.org/r/702073 (https://phabricator.wikimedia.org/T271232) (owner: Joal)
[15:47:54] Thank you ottomata :)
[15:48:01] !log deploying refinery to test cluster for webrequest_test gobblin job
[15:48:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:51:50] ! log Moved camus and gobblin data
[15:52:02] !log Moved camus and gobblin data for webrequest on analytics-test-hadoop
[15:52:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:54:45] ok joal deploy finished
[15:55:00] !log Drop and recreate wmf_raw.webrequest table
[15:55:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:55:18] ack ottomata - doing some checks on data movement
[16:01:55] ottomata: all good on my side - Have you deployed refinery onto HDFS?
[16:01:59] oh no
[16:02:00] doing
[16:02:06] ottomata: in any case, you can restart gobblin :)
[16:02:30] ottomata: I can take care of the HDFS deploy if you wish
[16:02:40] am doing now
[16:02:43] ack
[16:02:58] ok going to merge the absent camus patch so I can start puppet
[16:03:21] !log Kill webrequest_test oozie job
[16:03:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:06:16] ok joal camus webrequest gone
[16:06:21] gobblin job should be restarted
[16:06:31] ok ottomata - let's wait and see :)
[16:06:37] Is refinery deployed onto HDFS?
[16:06:52] its doing....
[16:07:00] ack - It's always long :)
[16:07:59] ok its done joal
[16:08:18] ack ottomata - will restart oozie job with an hour to be redone
[16:08:22] use wmf;
[16:08:24] oops
[16:10:04] :q
[16:11:42] k!
[16:21:03] ottomata: solving test-related issues with oozie - hopefully done soon
[16:21:06] ok
[16:21:12] i gotta leave for dentist appt in 20 mins
[16:22:49] ack ottomata - will continue and let you know when you're back
[16:27:53] ok - problem found and solved
[16:28:47] what was it?
[16:29:20] ottomata: month=${"$"}{MONTH + 0}/day=${"$"}{DAY + 0}/hour=${"$"}{HOUR + 0} doesn't work, as gobblin imports padded hours - we need month=${MONTH}/day=${DAY}/hour=${HOUR}
[16:29:25] Sending a patch now
[16:29:53] oh nice
[16:30:00] even better
[16:31:37] ottomata: I had copied the pattern from webrequest without thinking (it was done the correct way for camus originally)
[16:32:17] (PS1) Joal: Fix webrequest datasets_raw.xml for gobblin [analytics/refinery] - https://gerrit.wikimedia.org/r/703465 (https://phabricator.wikimedia.org/T271232)
[16:32:32] ottomata: If you want, we can do webrequest and netflow in prod when you're back :)
[16:34:09] joal: i'm down!
[16:34:29] my dentist is very fast, i betcha i'll be back by :40 after the next hour
[16:38:00] ack - I'm gonna merge/deploy as needed, and start importing data
[16:38:24] +1 k!
[16:38:32] ok back shortly!
[16:38:36] later!
[16:39:17] (CR) Joal: [V: +2 C: +2] "Merge for deploy of gobblin" [analytics/refinery] - https://gerrit.wikimedia.org/r/703465 (https://phabricator.wikimedia.org/T271232) (owner: Joal)
[16:39:57] (CR) Joal: [V: +2 C: +2] "Merge for gobblin deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/702075 (https://phabricator.wikimedia.org/T271232) (owner: Joal)
[16:41:18] !log Deploy refinery for gobblin
[16:41:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:07:33] ottomata: I have an impromptu visit - Deploy is taking ages, will continue to make it work
[17:31:43] joal: back!
[17:31:54] wow - even faster than expected :)
[17:32:22] i know!
[17:32:44] i even got a couple of superficial fillings (not the kind you have to drill for, just a weird paste they put on top and then shine a UV light on)
[17:32:45] AND!
[17:32:54] my dentist gave me a cutting of a plant!
[17:33:03] :)
[17:33:14] Impressive dentist!
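The padding mismatch described at 16:29:20 can be illustrated with a few lines of Python. This is only a sketch of the idea, not the Oozie template evaluation itself: hour_path is a hypothetical helper, and the assumption is that an arithmetic expression such as MONTH + 0 yields a plain number (losing the leading zero), while the importer writes zero-padded directory names.

```python
def hour_path(month, day, hour, padded):
    """Build the month/day/hour part of a partition path (illustrative only)."""
    fmt = "{:02d}" if padded else "{:d}"
    return "month={}/day={}/hour={}".format(
        fmt.format(month), fmt.format(day), fmt.format(hour)
    )


# What the importer writes, per the log: zero-padded values.
written = hour_path(7, 6, 9, padded=True)
# What a template doing arithmetic on the values would look for:
# the result is a number, so the leading zero is gone.
looked_up = hour_path(7, 6, 9, padded=False)

print(written)               # month=07/day=06/hour=09
print(looked_up)             # month=7/day=6/hour=9
print(written == looked_up)  # False
```

The two forms only coincide when month, day and hour all have two digits, so a job using the unpadded pattern would find most hours missing rather than all of them.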
[17:33:21] !log Deploy refinery onto HDFS
[17:33:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:33:52] ok ottomata - test-cluster ok (I had some issues related to restarting the job, settings changes etc - nothing gobblin related)
[17:35:41] Currently deploying refinery onto hdfs, and also just launched a manual run of webrequest, will do netflow just after
[17:36:12] ottomata: if ok for you, you can deploy timers for both, we leave a couple hours and then switch
[17:38:02] on prod cluster?
[17:38:07] yeah!
[17:38:08] ok making patch gimme a few mins
[17:38:09] k
[17:38:23] ottomata: will be on/off, I have friends at home :)
[17:38:36] k!
[17:38:43] ottomata: initial jobs have run for both webrequest and netflow (sudo -u analytics PYTHONPATH=/srv/deployment/analytics/refinery/python:$PYTHONPATH kerberos-run-command analytics /srv/deployment/analytics/refinery/bin/gobblin /srv/deployment/analytics/refinery/gobblin/jobs/netflow.pull
[17:39:03] niiiice
[17:39:18] and sudo -u analytics PYTHONPATH=/srv/deployment/analytics/refinery/python:$PYTHONPATH kerberos-run-command analytics /srv/deployment/analytics/refinery/bin/gobblin \
[17:39:21] > /srv/deployment/analytics/refinery/gobblin/jobs/webrequest.pull
[17:52:31] ok joal jobs declared
[17:52:36] \o/
[17:52:51] i guess we have to wait a bit now right?
[17:53:16] ottomata: we have a perms issue, but as you noticed, having the parent folder perm setup correctly mitigates for now - I'll investigate soon
[17:53:20] ok
[20:23:30] ottomata: Hi! Is now the time to switch data and oozie, or do you prefer tomorrow?
[20:40:07] ok - let's make it tomorrow :)
[21:17:05] ah sorry joal!
[21:17:12] let's do tomorrow!
[21:17:17] i'll be on by 9 my time, maybe before?