[08:59:48] pfischer: o/, we have some settings for scala imports here: https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/discovery-maven-tool-configs/+/refs/heads/master/src/editors/intellij/codestyles/
[09:00:29] sorry I should have told you that before...
[10:02:41] dcausse: 1:1 time (but please finish your discussion in -serviceops first!)
[10:02:47] oops
[10:59:16] lunch
[11:03:03] Lunch
[14:11:46] o/
[14:19:54] school is cancelled today due to frozen roads, kids are at home
[14:21:28] same here but because of strikes against the pension reform :)
[14:27:31] good times ;)
[14:28:27] Was looking at https://phabricator.wikimedia.org/T304914 , are you OK with me deleting the `flink_ha_storage` in `rdf-streaming-updater-codfw` bucket yet?
[14:28:46] I think we were working off of staging last wk?
[14:29:52] inflatador: should be flink_ha_storage in rdf-streaming-updater-staging and rdf-streaming-updater-staging+segments (because of s3)
[14:30:07] rdf-streaming-updater-codfw should not be touched, it's currently running
[14:31:46] got it, will not touch rdf-streaming-updater-codfw . OK if I delete the `flink_ha_storage` folder from `rdf-streaming-updater-staging`>
[14:31:47] ?
[14:31:59] inflatador: yes
[14:40:54] dcausse OK, it's deleting. Are we ready to make a new image yet?
[14:42:20] inflatador: not yet, we need to update the values-staging.yaml to stop pointing at swift://flink_ha_storage but change this to s3://flink_ha_storage
[14:42:44] this is in the deployment-chart repo (under services/rdf-streaming-updater)
[14:42:53] ACK, will get to work on that
[14:43:28] helmfile.d/services/rdf-streaming-updater to be precise
[15:01:27] any idea what's going on with wdqs1012? guessing someone is testing alerts?
[15:02:10] inflatador: no I think the alert is legit, select wdqs1012 here: https://grafana-rw.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&viewPanel=32
[15:03:44] and it's not catching up (lag is reasonable on this machine), so most probably blazegraph misbehaving
[15:03:50] and needs to be restarted
[15:06:11] I didn't think wdqs1012 was in production, but looks like I was wrong
[15:06:17] will restart
[15:22:27] hmm, the alert cleared after restart, but no improvement in grafana yet, even after I restarted the exporter
[15:23:43] should take some time to be visible I think
[15:24:04] hm, I think I was fooled by the scale of the graph
[15:30:57] OK, PR to remove swift up at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885365
[15:35:00] inflatador: lgtm, did you cleanup the corresponding folder on the +segments swift container too?
[15:36:13] dcausse I didn't see a +segments folder, maybe cause it's not using S3 yet?
[15:38:29] inflatador: oh you're right!
[15:38:58] in the +segments container I can only see my test savepoints made with flink 1.16
[15:39:30] inflatador: let me know if you need help to deploy this deployment-charts patch, happy to jump in a meet with you
[15:51:36] \o
[15:52:19] o/
[15:52:58] dcausse I'm up at https://meet.google.com/tak-xney-yhx if you wanna join
[16:20:33] joining
[16:47:11] hmm, are you supposed to be able to run ./mvnw from a system without maven installed, or is `mvn wrapper:wrapper` intended to be run to download the jar and place it in .mvn/wrapper
[16:47:28] (system ~= docker image)
[16:51:12] ebernhardson: you are supposed to be able to run ./mvnw without Maven installed locally. Depending on the version of the wrapper that was initially installed, there are multiple mechanisms to download the binaries.
[16:51:18] i guess the jar is 60k so not the end of the world to include in the repo, but for some reason i thought the related shell script would parse and fetch the right thing from maven-wrapper.properties
[16:52:00] that **should** be the case. It might require wget to be available (or curl, not sure).
[16:52:39] for some versions of the wrapper, there was a fallback to a .java class, that would be compiled and used to download the binaries.
[16:52:50] oh, maybe it doesn't have curl or wget..lemme check the image
[16:52:54] Adding the 60k jar is the easy version!
[16:53:12] the image has curl, should be ok. hmm
[16:53:22] which project?
[16:53:26] mjolnir
[16:53:52] it has a small jvm component for the DBN and one aggregator which can't be implemented in pyspark side (i suppose i should check, maybe in spark 3 that piece can move)
[16:54:43] looks like it should be able to use both curl or wget
[16:55:09] yea i don't know why i didn't think to simply read the script. it would have been obvious it does have a download component :)
[16:55:31] and that version even has the Java fallback. So not sure what else could be failing. Is there a proxy in the way?
[16:55:43] the actual problem is some permission denied writing a temp file, which i can solve :)
[17:01:44] workout, back in ~40
[17:15:41] and the answer is ... the jvm ignores $HOME? -Duser.home=$HOME made things work and gets the jvm to stop trying to write files to places it isn't allowed
[17:19:23] ebernhardson: I'm not ready for our ITC. I'll reschedule. Sorry :/
[17:19:24] weird... from inside a docker image or locally? (don't remember having problems with maven wrapper locally)
[17:19:43] in the docker image, where /home/{username} doesn't exist and HOME=/tmp/home
[17:22:16] oh ok, the jvm is probably sourcing this from /etc/passwd?
[17:22:48] perhaps, i was trying to find some docs about how user.home is initialized but turning up a bunch of windows stuff. i suppose not super important, but was surprising
[17:23:07] sure
[17:23:41] gehel: no worries
[17:31:17] and yea it looks to be /etc/passwd it's reading, if i adjust /etc/passwd to not contain the current user it seems to use $HOME
[17:43:30] back
[18:04:35] small PR for the rdf streaming updater in staging https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/885394
[18:05:57] dinner
[18:24:31] o/ this spark upgrade is driving me nuts. I’m down to two failing tests (the last ones in SubgraphUtilsUnitTest.scala) but I can’t figure out why the data frames do not match up as expected.
[18:30:21] I redeployed rdf-streaming-updater in staging, this time a new error: `503 Server Error: Service Unavailable for url: https://staging.svc.eqiad.wmnet:4007/jobs/overview`
[18:30:36] that is output by the python job, there are no errors in the kubectl logs
[18:31:18] But I don't see any flink_ha_storage folder in rdf-streaming-updater-staging bucket yet
[18:32:35] FWiW, the staging environment is in CODFW, not sure if it can/should use an EQIAD endpoint
[18:32:53] anyway, lunch..back in ~45 or so
[19:17:22] Not feeling super well. I'll cancel the rest of my meetings for today (cc: ebernhardson, inflatador, ryankemper)
[19:17:50] ack. feel better gehel !
[19:19:28] inflatador: oh right endpoints used by the python script are hardcoded to staging@eqiad :(
[19:20:49] feel free to update the flink-job python script to support staging@codfw it's in the deploy repo if you want
[19:25:01] back, well wishes to gehel
[19:26:38] * ebernhardson is somewhat amazed to be getting a pass from the kokkuri image i setup to run mjolnir with updated spark and the conda env
[19:26:49] ryankemper ebernhardson I moved back the pairing session to 2 PM PST since MrG is out
[19:27:00] inflatador: kk
[19:27:07] sounds good
[19:46:56] such a useful type definition: Union[bool, Any]
[20:05:57] Mac is acting funky. Gonna reboot and see what happens
[20:15:31] back
[21:27:34] PR for flink deploy in staging if anyone has a chance to look: https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/885425/
[22:04:29] ryankemper we're in https://meet.google.com/eki-rafx-cxi if you want to join
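
A rough Python sketch of the workaround from the 17:15 to 17:31 exchange, where the JVM in the docker image seemed to take user.home from /etc/passwd instead of $HOME. It builds an environment for launching ./mvnw with user.home pinned to $HOME. Passing the flag through MAVEN_OPTS is an assumption (the log does not say how -Duser.home was supplied), and the helper name maven_env is hypothetical.

```python
import os


def maven_env(base_env=None):
    """Return a copy of the environment with -Duser.home=$HOME in MAVEN_OPTS.

    Assumption: the Maven launcher scripts hand MAVEN_OPTS to the JVM, so the
    property is set before anything tries to write under user.home.
    """
    env = dict(os.environ if base_env is None else base_env)
    home = env.get("HOME")
    if not home:
        # Nothing to pin; leave the environment untouched.
        return env
    existing = env.get("MAVEN_OPTS", "")
    if "-Duser.home=" not in existing:
        # Append so that any options already configured are preserved.
        env["MAVEN_OPTS"] = f"{existing} -Duser.home={home}".strip()
    return env
```

It could be used along the lines of `subprocess.run(["./mvnw", "package"], env=maven_env(), check=True)`, where the `package` goal is only an illustration.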