[05:39:03] ebernhardson: (for tomorrow) ^ That's what I was thinking, thus the failure to unfreeze the index (elasticsearch API can't be reached at all). However we did a rolling restart the day before trying the upgrade, and that worked fine. But it might be that it only manifests for either a full reboot or the attempt to upgrade (which would be a bit like a reboot temporally speaking)
[05:54:24] dcausse: I spoke with m.oritzm and it sounded like when you guys last left off you were trying to figure out if the test itself was broken or if it was actually a problem in the jvmquake application
[05:55:15] related to that, your patch for testing jvmquake on wdqs1010 is ready to merge once we get the build side of things figured out: https://gerrit.wikimedia.org/r/c/operations/puppet/+/770978 (but we didn't want to merge it before we figure that stuff out)
[07:14:28] hello folks
[07:15:19] I see that there is some discussion about relforge, but the cluster is yellow
[08:30:46] elukey: o/
[08:30:48] looking
[08:34:56] ryankemper: about jvmquake yes that's why I was hoping to merge the patch wdqs1010 (where it's installed manually) to test it before
[08:35:28] but it's not OK to merge this patch in puppet I think I might be able to manually restart blazegraph with the proper options
[08:36:11] s/but/but if/
[08:50:26] relforge has cluster.routing.allocation.enable=primaries set in its transient settings so it won't allocate replicas and will stay yellow til we remove that setting
[08:50:56] I guess this was done on purpose to debug something so I'll leave it that way (relforge is not actively used)
[09:30:35] gehel: you amended https://gerrit.wikimedia.org/r/c/operations/puppet/+/770978 with an ensure_packages requirement, since we want to test jvmquake first what approach should we take?
[09:31:26] If we test for just a few days, hacking the package installation is fine.
If we plan to test for more than a few days, we should properly manage that package
[09:31:39] I'm fine with you reverting that commit if you think it makes more sense
[09:32:15] I'm fine restarting blazegraph manually too, I guess that would do the same
[09:32:32] that's fine with me as well
[09:32:43] ok doing that so
[11:10:24] lunch
[11:11:18] lunch 2
[12:57:49] greetings
[12:57:54] o/
[12:58:02] o/
[13:52:40] inflatador: just to check, did you follow up with moritzm on jvmquake?
[13:55:04] gehel: I synced up with Ryan earlier this morning, David is currently testing the package (there was a failing test case and the tests should show whether the functionality itself is broken or whether it's just caused by different circumstances on the build hosts)
[14:00:26] building a stress test on wdqs1010 at the moment, hoping to trigger jvmquake behavior
[14:04:22] per yesterday's conversation, ryan-kemper was going to reach out to moritz-m. But if I can do anything for d-causse or moritz-m, let me know
[14:06:30] moritzm: thanks a lot for the help!
[14:07:45] inflatador: thx, I think at this point we only need to wait for wdqs1010 to bail :-)
[14:09:57] sounds good. I was going to ask, do we monitor upstream github repos somewhere? I doubt there will be much action on jvmquake, but just so we know when to rebuild
[14:11:19] [not sponsored] I personally use libraries.io with my personal GH account to get notified of releases of libraries I care about
[14:12:31] LOL @"not sponsored"
[14:12:56] we have a bot that can open phab tickets for upstream releases! https://www.mediawiki.org/wiki/Libraryupgrader#Upstream_release_monitoring
[14:13:40] Perfect! Thanks taavi
[14:15:39] looks like ryan-kemper's upstream PR might get merged soon https://github.com/Netflix-Skunkworks/jvmquake/pull/9/files
[14:39:28] FYI, I'm working on the rolling restart cookbook again, will be touching relforge.
Just put in a 24h alert suppression on both relforge hosts
[14:52:49] and of course when you want blazegraph to die it just stays up...
[15:01:07] SDAW Search Experimentation meeting is starting: https://meet.google.com/pxo-cgwu-nog
[15:01:36] dcausse, ebernhardson, inflatador, Trey314159 ^
[15:01:40] oops
[15:11:49] I'm in
[15:23:28] had to drop for doctor appointment, back in ~30
[16:38:56] back from doctor, but my Mac has taken 30m and counting to apply security updates. Hopefully will be back soon...
[16:45:33] back
[18:40:17] lunch/errands, back in ~45
[19:21:04] back
[19:36:29] lunch
[20:27:56] back
[21:35:02] ryankemper up at https://meet.google.com/ttj-hrua-uot whenever you wanna join
[21:35:30] inflatador: cool, there in 4-5
[21:35:46] ACK
[23:06:17] latersville!