[01:21:21] ebernhardson: okay, we're in `lvs_setup` now (w/ the rolling restarts of low-traffic pybal hosts done)
[01:22:08] I can't get `curl -v -k http://wcqs.svc.eqiad.wmnet` from an arbitrary host (cumin in this case) to succeed, but I think the dns discovery plumbing might be missing. Need to step out now but will take a look later
[03:30:59] Actually using https "works", I do get a 500 service error but at least it doesn't get connection refused the way the http does
[03:31:26] I checked the dns steps in the LVS docs and we've got all the stuff in place that we should for the lvs_setup so I think everything's working as intended
[10:38:09] lunch + errand (another trip to the hospital - still nothing bad)
[11:08:25] lunch
[11:37:05] ejoseph: I'm available for plugin work if you want
[12:01:03] gehel: I'm did want to talk about yesterday fresh out of the VP meeting, but I'd love a discussion today, before or after meeting with Maryana
[12:06:42] hmm, I'm not sure why, but after importing settings to intellij, it still sorts imports in an order that checkstyle doesn't accept
[12:44:13] Those settings were created a long time ago. The format might have changed
[12:44:56] The settings are versioned in https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/discovery/discovery-maven-tool-configs/+/refs/heads/master/src/editors/intellij
[12:45:15] If you find a new format that works better, feel free to submit a patch!
[12:45:42] I'll join the open hangout at 3pm CET, we can have a discussion there.
[12:45:51] Or jump in right now if someone is interested
[13:26:19] Need to make lunch, let's follow up on open hangout later
[13:59:55] open hangout is now open: https://meet.google.com/ugw-nsih-qyw
[14:09:13] I need to finish my lunch, I'll be there in 15min
[15:14:40] Car broke down, so I need to take my daughter to extra activities by bus and wait for her, will be gone for today
[15:41:10] zpapierski: good luck!
[15:41:41] Trey314159, ebernhardson, ryankemper, mpham and others: open hangout is open if you want to join: https://meet.google.com/ugw-nsih-qyw
[15:41:54] there might be more activity in there for the rest of the day
[15:53:45] \o
[15:57:59] o/
[16:01:33] * ebernhardson has no clue why :9999 on all the wcqs machines returns 503. But i guess i'll find out today :P
[16:09:21] 9999 is blazegraph iirc but 503 I'm not sure... I don't think it's some code we handle, might be jetty or blazegraph?
[16:13:27] it's gotta be the jetty level giving 503, i would have expected something different for throwing exceptions. Not sure what makes jetty give 503
[16:21:05] probably related: Wrapped by: org.openrdf.rio.RDFHandlerException: org.eclipse.jetty.io.EofException
[16:22:45] :/
[16:23:31] The EofException has the message of `null`, so that should help :)
[17:21:34] dcausse: re https://gerrit.wikimedia.org/r/c/operations/puppet/+/743216/, were we accidentally overwriting the `extra_jvm_opts` previously, resulting in us omitting `'-Dwdqs.throttling-filter.time-bucket-capacity-in-seconds=240', '-Dwdqs.throttling-filter.time-bucket-refill-amount-in-seconds=120', '-Dwdqs.throttling-filter.ban-duration-in-minutes=60'`?
[17:21:54] ie per pcc wdqs2008's `/etc/default/wdqs-blazegraph` is changing: https://puppet-compiler.wmflabs.org/compiler1002/1113/wdqs2008.codfw.wmnet/index.html
[17:22:29] ryankemper: I think I broke that a couple of months ago when switching to the streaming updater on this machine :/
[17:22:57] dcausse: makes sense, was just wondering what was going on there
[17:23:04] dcausse: patch looks ready to merge, any objection to me doing so?
[17:23:09] I think it's to relax the throttling mechanism on the internal cluster
[17:23:14] ryankemper: fine by me
[17:48:50] What would be querying wcqs with Twisted PageGetter as UA? At first i thought it would be the prometheus metric but those use a dedicated UA
[17:49:09] is that LVS?
[17:51:04] yes, lvs/pybal monitoring
[17:51:23] hmm, should fix up its UA to say so. will have to find where that is
[17:56:46] ebernhardson: it's probably somewhere around . It looks like there is a prop for changing it on the twisted side -- https://github.com/twisted/twisted/blob/trunk/src/twisted/web/client.py#L95
[17:58:44] bd808: thanks! Much faster than me...i only managed to open the .deb and see what's even in pybal :)
[18:00:58] yw. I was in a nerd snipe friendly place while checking backscroll ;)
[20:25:12] * ebernhardson goes back to guessing at blazegraph. i love that searching for any error it emits either returns the apidocs or the github source, no one else uses this :P
[20:29:34] the odd thing is i'm now getting success from localhost/readiness-probe, but the error log still keeps filling with EOF exceptions every 10s when lvs pings
[20:49:15] meh, maybe the EOFs are unrelated. prometheus metrics collection causes the same EOF errors, but :9195/metrics seems to "work" and config-master seems to think wcqs is fully pooled. But wcqs.svc.eqiad.wmnet:80 is still no-connect
[20:51:56] sigh, i'm not very smart :P https://wcqs.svc.eqiad.wmnet/ is fine. The problem is the auth nginx fronting it is returning 503s
[20:58:15] (the EofException is also a problem, wdqs doesn't do it, but appears unrelated)
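
Editor's note: much of the confusion above came from mixing up two different failures, "no-connect" on plain http (:80) versus an HTTP 5xx answered over https by the fronting nginx. A minimal Python sketch of a probe that reports the two cases separately might look like the following. The function name `probe`, its return strings and the User-Agent value are hypothetical and appear nowhere in the log; skipping certificate verification mirrors the `curl -k` used earlier.

```python
import ssl
import urllib.error
import urllib.request


def probe(url, timeout=5.0, verify_tls=False):
    """Return 'ok', 'http-<code>' or 'no-connect' for a single GET.

    Hypothetical helper: separates "nothing is listening / unreachable"
    from "something answered with an HTTP error status".
    """
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Same effect as `curl -k`: do not verify the server certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    # An empty ProxyHandler ignores any proxy settings from the environment,
    # so the request goes straight to the named host.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPSHandler(context=ctx),
    )
    # Assumed User-Agent, chosen only so the probe is identifiable in logs.
    req = urllib.request.Request(
        url, headers={"User-Agent": "wcqs-debug-probe (hypothetical)"}
    )
    try:
        with opener.open(req, timeout=timeout) as resp:
            return "ok" if resp.status < 400 else f"http-{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return f"http-{exc.code}"
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, TLS failure, ...
        return "no-connect"
```

Calling `probe("http://wcqs.svc.eqiad.wmnet/")` and `probe("https://wcqs.svc.eqiad.wmnet/")` side by side would then show at a glance whether a port is simply not listening or whether a layer in front of the service is answering with an error.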