[07:30:17] (CR) Gehel: [C: -1] "Clarification comment about static vs instance variables in WebRequest. Ping me for a more synchronous discussion if this isn't clear enou" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)
[07:41:55] (CR) Gehel: "See inline proposal to clean up some of the Maven configuration. It seems like a good time to do it (maybe as a preliminary patch), but it " (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/656897 (owner: Joal)
[07:42:15] joal: I left you a few comments on ^
[07:42:26] let me know if you want to discuss it in more detail
[07:46:16] (CR) Gehel: "Minor comments inline, feel free to ping me for more details." (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/686629 (https://phabricator.wikimedia.org/T280649) (owner: Milimetric)
[09:05:55] (CR) Joal: Add property disabling gobblin lock (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[09:07:41] (PS3) Joal: Add property disabling gobblin lock [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559)
[09:07:55] gehel: Will look into your comments when back from holidays :)
[09:08:15] right! sorry for the ping then. Enjoy the holiday!
[09:08:26] np gehel - thanks a lot for the reviews :)
[09:10:58] joal: byeeeeeeee
[09:11:19] elukey: Hey!
not gone yet, but packing up everything today :)
[09:16:48] Analytics: Check home/HDFS leftovers of jkatz - https://phabricator.wikimedia.org/T287235 (MoritzMuehlenhoff)
[12:24:20] (PS5) Joal: Load cassandra3 from spark [analytics/refinery/source] - https://gerrit.wikimedia.org/r/686629 (https://phabricator.wikimedia.org/T280649) (owner: Milimetric)
[13:05:34] (PS3) Joal: [WIP] Add cassandra3 to oozie loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649)
[13:21:15] joal: o/ have you tested those gobblin settings?
[13:25:40] Hi ottomata - I have not
[13:25:51] ottomata: shall we try them on the test-cluster first?
[13:26:11] yaaa let's do that, can you do that without a deploy?
[13:26:22] just to make sure the commented out lock.dir doesn't cause that error anyway
[13:27:48] ottomata: I think you need to stop puppet, and update the file manually (don't forget to add the new line :)
[13:28:04] hm ok ya i'll do that, then force a run
[13:28:04] ok
[13:28:31] actually don't need to stop puppet
[13:28:34] this is refinery
[13:29:46] Ah true!
[13:29:52] running now...
[13:30:11] job launched successfully, so i think it's working!
[13:30:14] merging and deploying
[13:30:16] \o/
[13:30:24] (CR) Ottomata: [V: +2 C: +2] Add property disabling gobblin lock [analytics/refinery] - https://gerrit.wikimedia.org/r/706696 (https://phabricator.wikimedia.org/T286559) (owner: Joal)
[13:52:11] Analytics, Analytics-Kanban, Product-Analytics: Investigate Hive & Hadoop permissions for users in same group - https://phabricator.wikimedia.org/T285503 (Ottomata) Open→Declined Ok, finally looked into this. Directories and files do inherit the group ownership of the parent directory, so...
[13:54:43] joal: still there?
[13:54:49] yessir
[13:54:51] https://phabricator.wikimedia.org/T280175
[13:54:56] looking at this
[13:54:59] right
[13:55:00] want to verify my plan with you
[13:55:45] hdfs dfs -ls -d /user/hive/warehouse
[13:55:45] drwxrwxrwt - hive hadoop 0 2021-07-13 15:42 /user/hive/warehouse
[13:56:13] so, i think i should chgrp that and all subfolders (managed db folders) to analytics-privatedata-users
[13:56:24] that'll make any new dbs and tables be properly group owned
[13:56:59] i kind of want to chgrp -R everything there
[13:57:07] but that sounds riskier
[13:57:47] ottomata: I think it should be done on a case-by-case basis - user data should not be visible to others by default, no?
[13:58:04] also, it would mean changing permissions to 750, right?
[13:58:31] with special ownership for special tables that need to be written by multiple people
[13:58:58] yeah.
[13:59:09] yeah
[13:59:32] hm, the current hdfs umask settings we have should work...but i do see files world readable in there now..
[13:59:33] hm
[13:59:40] will have to do some testing
[13:59:58] at the very least i guess we should chgrp and chmod the warehouse and .db folders ?
[14:00:05] so that new data is created correctly?
[14:01:50] ottomata: we can do that
[14:02:13] ottomata: this data being managed by hive, I think the ownership and perms are managed differently than on basic HDFS
[14:02:30] yeah i think so too, there are different modes with sticky bits set
[14:02:33] will investigate that
[14:02:35] ottomata: IIRC hive replicates parent folder ownership and perms (not taking umask into account)
[14:02:41] i think the ownership looks normal though
[14:03:22] ottomata: so we go for: USER:USER for user tables, and USER:analytics-privatedata-users for special tables?
[14:03:31] all with 750 or 640 as needed?
[14:03:36] hmmm
[14:03:41] yeah maybe user:user is right..
[14:04:05] although hm, why not USER:analytics-privatedata-users for all?
[14:04:12] actually, i think we have to do that
[14:04:19] otherwise we can't have a good default group ownership
[14:04:25] we need to chgrp the /user/hive/warehouse dir to something
[14:04:31] ottomata: to prevent user data from being readable by others?
[14:04:37] so that when new dbs are created, they don't get what they get now, which is 'hadoop' group ownership
[14:04:52] hm
[14:04:55] joal i think if user can read privatedata, reading each other's data is ok
[14:05:12] ok I think it's a safe assumption :)
[14:05:30] as for perms
[14:05:31] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cdh_ig_filesystem_perm.html
[14:05:40] ottomata: this type of discussion touches the discussion we were having yesterday - we'll need ranger soon :)
[14:05:45] i think we (or some default) just set /user/hive/warehouse to 1777 to start with
[14:05:50] i think we want to do differently
[14:06:03] so 750 on warehouse and also all user dbs
[14:06:09] indeed joal
[14:06:31] joal, i'll update with plan in ticket and we can discuss with team before i take action
[14:06:31] ty
[14:07:38] all good ottomata - thanks for taking care of that
[14:09:42] (PS4) Joal: [WIP] Add cassandra3 to oozie loading jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649)
[14:10:08] (CR) Joal: [V: +1] "Tested with an oozie job on cluster :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/686629 (https://phabricator.wikimedia.org/T280649) (owner: Milimetric)
[14:12:40] joal let me know when you want to catch up today, I'm all yours
[14:12:50] Analytics, Analytics-Kanban: Fix default ownership and permissions for Hive managed databases in /user/hive/warehouse - https://phabricator.wikimedia.org/T280175 (Ottomata) Just chatted with @JAllemandou, here's what we think we should do. Currently, /user/hive/warehouse has mode 1777 as [[ https://docs...
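[editor's note] The modes discussed above (1777 on the warehouse today, 750 or 640 proposed) may be easier to compare when rendered the way the `hdfs dfs -ls` listing in the log shows them. The following is only a rough sketch in Python using the standard `stat` module; it runs locally and does not talk to HDFS or Hive. The helper names `describe` and `apply_umask` are made up for illustration, and the umask value 027 is an assumption, since the log never states the cluster's actual umask. Also keep in mind the remark above that Hive may copy the parent folder's permissions rather than apply the umask.

```python
import stat


def describe(mode: int, is_dir: bool = True) -> str:
    """Render an octal mode as an ls-style string such as 'drwxr-x---'.

    Hypothetical helper for illustration; it only formats the number.
    """
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


def apply_umask(requested: int, umask: int) -> int:
    """Clear the permission bits that the umask masks out."""
    return requested & ~umask


# Mode currently on /user/hive/warehouse according to the log (sticky bit set).
current_warehouse = describe(0o1777)

# Modes proposed in the discussion: 750 for directories, 640 for files.
proposed_dir = describe(0o750)
proposed_file = describe(0o640, is_dir=False)

# ASSUMPTION: a umask of 027. The log does not give the real value.
ASSUMED_UMASK = 0o027
dir_from_umask = apply_umask(0o777, ASSUMED_UMASK)
file_from_umask = apply_umask(0o666, ASSUMED_UMASK)

print("current warehouse:", current_warehouse)
print("proposed dir     :", proposed_dir)
print("proposed file    :", proposed_file)
print("umask 027 on dir :", oct(dir_from_umask))
print("umask 027 on file:", oct(file_from_umask))
```

Under that assumed umask, new directories would come out as 750 and new files as 640, which lines up with the "750 or 640 as needed" proposal. Whether Hive-managed paths actually behave that way is exactly what the chat leaves open for testing.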
[14:21:32] (CR) Joal: "Tested on cluster with https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/686629." [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649) (owner: Joal)
[14:24:33] (CR) Joal: "> Patch Set 4:" [analytics/refinery] - https://gerrit.wikimedia.org/r/706605 (https://phabricator.wikimedia.org/T280649) (owner: Joal)
[14:30:07] Analytics: Purge gobblin files - https://phabricator.wikimedia.org/T287084 (Ottomata) Let's just hdfs-cleaner rm them with trash :)
[14:32:10] Ok team - I'm gooooOOOOOOnne :)
[14:32:21] * elukey hugs joal
[14:32:59] See you all folks in a few weeks
[14:34:02] btullis: o/ I think that https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/705869 is ready, if you want to merge and test it on aqs100x I am available
[14:43:40] Analytics, Analytics-Kanban: Fix gobblin not writing _IMPORTED flags when runs don't overlap hours - https://phabricator.wikimedia.org/T286343 (Ottomata) a:JAllemandou Assigning since this is in Kanban
[14:49:10] mforns: you there?
[14:49:57] heya ottomata yes
[14:50:02] want to do https://phabricator.wikimedia.org/T280293
[14:50:12] got a sec to be my second pair of eyes?
[14:50:43] ottomata: yes!
[14:50:48] gr8 bc
[14:50:52] omw
[15:39:06] Analytics, Better Use Of Data, Event-Platform, Metrics-Platform, Product-Data-Infrastructure: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (colewhite) Thinking back to the last discussion, I was under the imp...
[15:43:16] Analytics, Analytics-Kanban: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (Ottomata) Done with @mforns ` for d in $(sudo -u hdfs hdfs dfs -ls /wmf/data/event | awk '{print $8}' | grep -E '/wmf...
[16:03:22] (PS3) Fdans: Alter routing logic to allow value lists [analytics/wikistats2] - https://gerrit.wikimedia.org/r/694634 (https://phabricator.wikimedia.org/T283596)
[16:17:27] (CR) Fdans: [V: +2 C: +2] Alter routing logic to allow value lists [analytics/wikistats2] - https://gerrit.wikimedia.org/r/694634 (https://phabricator.wikimedia.org/T283596) (owner: Fdans)
[16:18:22] (PS2) Fdans: Change state to allow more than one project [analytics/wikistats2] - https://gerrit.wikimedia.org/r/697797 (https://phabricator.wikimedia.org/T283624)
[16:23:40] Analytics, Analytics-Kanban, Product-Analytics: Investigate Hive & Hadoop permissions for users in same group - https://phabricator.wikimedia.org/T285503 (mpopov) Thank you so much for looking into it @Ottomata! This is super helpful and I really appreciate the thoroughness.
[16:26:59] (PS2) Fdans: Update languages.json to include newly translated languages [analytics/wikistats2] - https://gerrit.wikimedia.org/r/705926 (owner: Mforns)
[16:34:34] (CR) Fdans: [C: +2] Update languages.json to include newly translated languages [analytics/wikistats2] - https://gerrit.wikimedia.org/r/705926 (owner: Mforns)
[16:55:15] Analytics-Clusters: Update ROCm version on GPU instances. - https://phabricator.wikimedia.org/T287267 (EBernhardson)
[18:06:40] Analytics, Better Use Of Data, Event-Platform, Metrics-Platform, Product-Data-Infrastructure: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (Ottomata) Right, I think the work on that just stalled and never got...
[18:08:06] Analytics, Better Use Of Data, Event-Platform, Metrics-Platform, Product-Data-Infrastructure: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (Ottomata) That patch will still have a conflict with the `http...
[18:57:31] (PS1) Mforns: Simplify RSVD anomaly detection job for Airflow POC [analytics/refinery/source] - https://gerrit.wikimedia.org/r/707517 (https://phabricator.wikimedia.org/T285692)
[19:42:58] Analytics, Analytics-Kanban: Deprecate profile::analytics::cluster::users - https://phabricator.wikimedia.org/T287063 (Ottomata) > Note: allocating users via data.yaml means that they will be deployed across all the hosts managed by puppet (even non analytics ones). This should be fine but it would be go...
[19:55:24] Analytics, Analytics-Kanban: Deprecate profile::analytics::cluster::users - https://phabricator.wikimedia.org/T287063 (Ottomata) ` 19:54:31 [@cumin1001:/home/otto] $ sudo cumin 'R:Class = profile::hadoop::common' 'for u in swift hdfs yarn mapred analytics druid hadoop analytics-privatedata analytics-prod...
[22:36:10] Analytics-Radar, Growth-Scaling, Growth-Team (Current Sprint), Patch-For-Review, Product-Analytics (Kanban): Growth: update welcome survey aggregation schedule - https://phabricator.wikimedia.org/T275172 (nettrom_WMF) Open→Resolved This work is completed and the notebook is on GitHub:...
[23:12:16] (PS3) Mholloway: Add Refine transform function to add normalized host [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320)
[23:12:23] (CR) Mholloway: Add Refine transform function to add normalized host (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)
[23:22:13] (CR) Mholloway: Add Refine transform function to add normalized host (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)
[23:26:17] (CR) Mholloway: Add Refine transform function to add normalized host (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/705021 (https://phabricator.wikimedia.org/T251320) (owner: Mholloway)