[07:23:26] btullis: o/
[07:23:54] I had some time to follow up on the weird zkCli.sh issue of the other day, and I found the problem
[07:24:17] on the druid nodes we have two openjdks, 11 and 8, and alternatives for /usr/bin/java points to 8
[07:24:30] Druid uses 8, but zookeeper uses 11
[07:24:53] there is a script called by zkCli.sh that hardcodes JAVA=/usr/bin/java
[07:25:46] so the zkCli.sh command uses 8 to run, but zookeeper's bytecode is 11, and a runtime error pops up when trying simple commands like 'ls /' in the zookeeper cli
[07:26:05] we don't have this problem elsewhere where zookeeper runs since java 11 is the only one deployed
[07:27:27] Druid in theory is not 100% ready for 11 (I am reading things like https://github.com/apache/druid/issues/5589) and also it fetches data from Hadoop that uses 8, so we have always tried to keep the same version everywhere
[07:27:47] for zookeeper is different since we use the Debian version, that gets built with 11
[08:01:34] Analytics: Move the Data Engineering infrastructure to Debian Bullseye - https://phabricator.wikimedia.org/T288804 (elukey)
[08:02:19] opened --^ for Bullseye, added some high level ideas
[08:31:33] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (elukey) Before proceeding with what it is indicated in the task's description, there are some extra steps to do: 1) run...
[09:03:29] elukey: Thanks. That zkCli research makes perfect sense. So perhaps we could add `/usr/share/zookeeper/bin/zkEnv.sh` to puppet and change the hardcoded JAVA location?
[09:32:39] btullis: (sorry I was afk) yes it could be an option, it should hold for the general use case.. it would be great if the script allowed for an override of $JAVA or similar, so that JAVA=blabla /usr/share/zookeeper/bin/etc.. may work
[10:03:01] (CR) Kosta Harlan: [C: +2] homepagevisit: add 'contributelist' to 'referer_route' value list [schemas/event/secondary] - https://gerrit.wikimedia.org/r/710414 (https://phabricator.wikimedia.org/T287926) (owner: Gergő Tisza)
[10:03:38] (Merged) jenkins-bot: homepagevisit: add 'contributelist' to 'referer_route' value list [schemas/event/secondary] - https://gerrit.wikimedia.org/r/710414 (https://phabricator.wikimedia.org/T287926) (owner: Gergő Tisza)
[11:32:03] Holaaa, I'm up early
[11:32:24] Hi razzi.
[11:32:33] Hi Ben, how goes it?
[11:34:13] OK, pretty well, thanks. multitasking a bit too much though. Lurking at the hackathon, listening to a video about alluxio and starting to write manifests to install it.
[11:36:32] I haven't done any work on the new hdfs worker nodes yet, so nothing to handover to you.
[11:37:37] Oh wow! I guess wikimania has started huh
[11:38:59] Yeah, well the hackathon bit anyway. I learnt quite a bit about PAWS: https://wikitech.wikimedia.org/wiki/PAWS/PAWS_examples_and_recipes
[11:40:42] that's cool
[12:01:48] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (hnowlan) Open→Resolved
[14:25:29] Analytics-Radar, SRE, Patch-For-Review, Services (watching), User-herron: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (elukey) @herron +1 for the new task, opening one
[14:44:37] razzi: o/ qq about topicmappr - when the json files are created, I see that they list all the partitions to move in the same file.. did you have to split them up manually?
[14:49:49] hm, looks like Hive metastore was down for a bit or something similar, four jobs failed with similar metastor-ey looking errors
[14:50:07] I'll restart one and see what happens
[14:50:52] I don't think I did anything.
[14:51:22] There are some firewall related Icinga errors on an-worker nodes. I wonder if it could be related to those.
[14:52:09] !log rerunning webrequest-druid-hourly-wf-2021-8-13-13 because of failure to connect to Hive metastore
[14:52:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:52:27] milimetric: o/ what was the full error msg?
[15:05:48] oof, I think I can't find it now since I reran it in place, but the email seemed detailed enough "SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://analytics-hive.eqiad.wmnet:10000/default;principal=hive/analytics-hive.eqiad.wmnet@WIKIMEDIA: java.net.UnknownHostException: analytics-hive.eqiad.wmnet"
[15:06:16] but don't worry, we got this :)
[15:06:53] it reran nicely, so I'm gonna try the other jobs too
[15:12:22] the learning features actor hourly job failed around the same time for probably the same reason, though the error is more hidden in the email via Hive2Main abstraction. The link for the failed workflow is https://hue.wikimedia.org/hue/jobbrowser/#!id=0056316-210701181527401-oozie-oozi-W (before I lose that one too :))
[15:15:23] milimetric: there was a network hiccup during the past hour, see #operations, it might be related
[15:15:34] elukey: yes, I split the partitions myself
[15:16:56] !log reran the other three failed jobs successfully
[15:16:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:17:05] makes sense, thx elukey
[15:41:41] elukey: hello! o/ Quick Puppet question for you about kerberos::systemd_timer – is there a preferred way to disable it besides removing it or commenting it out? asking for https://gerrit.wikimedia.org/r/c/operations/puppet/+/712422 I didn't see a way to specify ensure => 'absent'
[16:11:19] bearloga: hi! It should support ensure => absent
[16:11:48] yep I just checked
[16:12:26] so the preferred way for us would be to do it in two steps: 1) ensure => absent + puppet run, so everything gets cleaned up 2) code removal
[16:12:45] we can also remove everything manually if needed
[16:12:56] but it is easier if puppet does the clean up
[16:28:14] elukey: oh nice! thank you for clarifying, I'll update the patch accordingly
[16:35:18] elukey: fixed! I was also wondering: I can just add it to a backport window, right?
[16:35:52] instead of annoying you or andrew :)
[16:37:50] elukey: also I do remember you're not on data eng team anymore, just to be clear
[16:40:30] bearloga: it is fine to ping, no issue :)
[16:41:35] it is also good to remove special things from stat1007!
[16:42:09] running pcc
[16:42:36] bearloga: ready to go?
[16:42:39] if so I'll merge
[16:42:42] elukey: absolutely!
[16:43:16] elukey: and thank you!!! and yes, love decommissioning legacy stuff and cleaning up tech debt
[16:45:14] bearloga: just to confirm, ok to clean up
[16:45:16] elukey@stat1007:~$ ls -l /srv/discovery
[16:45:16] total 12
[16:45:16] drwxrwxr-x 2 analytics-search analytics-privatedata-users 4096 Aug 13 16:44 log
[16:45:19] drwxrwxr-x 136 analytics-search analytics-privatedata-users 4096 Apr 6 15:03 r-library
[16:45:22] drwxr-xr-x 6 analytics-search analytics-search-users 4096 Apr 13 14:37 venv
[16:45:46] elukey: yep!
[16:46:56] !log cleanup /srv/discovery on stat1007 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/712422
[16:47:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:48:22] bearloga: all cleaned! thanks!
[16:48:59] elukey: thank YOU!
[16:52:51] Quarry: Vizquery for quarry - https://phabricator.wikimedia.org/T288841 (Slowking4)
[18:38:11] Analytics, Product-Analytics: Server-side Event Platform events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (nettrom_WMF)
[18:39:34] Analytics, Product-Analytics: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (nettrom_WMF)
[18:52:22] Analytics, Growth-Team, Product-Analytics: Migrated Server-side EventLogging events recording http.client_ip as 127.0.0.1 - https://phabricator.wikimedia.org/T288853 (nettrom_WMF)
[20:47:09] (PS10) Michael DiPietro: add stop query function [analytics/quarry/web] - https://gerrit.wikimedia.org/r/710067 (https://phabricator.wikimedia.org/T71037)
[20:47:40] Analytics: Change /user/hive/warehouse/neilpquinn.db/editor_month ownership to iflorez - https://phabricator.wikimedia.org/T288864 (Iflorez)
[20:47:58] Analytics: Change /user/hive/warehouse/neilpquinn.db/editor_month ownership to iflorez - https://phabricator.wikimedia.org/T288864 (Iflorez) p:Triage→High
[21:50:23] Analytics, Product-Analytics: Change /user/hive/warehouse/neilpquinn.db/editor_month ownership to iflorez - https://phabricator.wikimedia.org/T288864 (Mayakp.wiki)
[21:50:38] Analytics, Product-Analytics: Change /user/hive/warehouse/neilpquinn.db/editor_month ownership to iflorez - https://phabricator.wikimedia.org/T288864 (Mayakp.wiki)
[21:51:36] Analytics, Analytics-Kanban, Product-Analytics: Change /user/hive/warehouse/wmf_product.db ownership to iflorez - https://phabricator.wikimedia.org/T288657 (Mayakp.wiki)
[21:51:41] (CR) Bstorm: tox: Add python to the allowlist_externals (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711134 (owner: David Caro)
[21:52:08] Analytics, Analytics-Kanban, Product-Analytics: Change /user/hive/warehouse/wmf_product.db ownership to iflorez - https://phabricator.wikimedia.org/T288657 (Mayakp.wiki)
[21:57:03] (CR) Bstorm: [C: +1] "Do you want to deploy this on 3.5 (obviously not on the weekend) and sync up the buster branch or move this to buster? I think it would be" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/711456 (https://phabricator.wikimedia.org/T287471) (owner: David Caro)
[21:59:10] (CR) Bstorm: upgrade quarry to python 3.7 (1 comment) [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/711208 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)
[22:16:57] (PS3) Bstorm: upgrade quarry to python 3.7 [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/711208 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)
[22:19:20] (CR) Bstorm: upgrade quarry to python 3.7 (1 comment) [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/711208 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)
[22:20:14] (PS4) Bstorm: upgrade quarry to python 3.7 [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/711208 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)
[22:22:32] (CR) Bstorm: "Naturally *I* think it's ready for merge now, but that seems unfair, so I'll wait for a +1. I only changed the tox.ini." [analytics/quarry/web] (buster) - https://gerrit.wikimedia.org/r/711208 (https://phabricator.wikimedia.org/T288528) (owner: Michael DiPietro)