[10:10:14] lunch
[10:32:00] lunch 2
[12:51:46] I tried a food yesterday that's not agreeing with my stomach; I still don't feel better
[12:56:56] good day
[12:57:14] o/
[12:57:27] bon jour
[13:04:38] Info: PDU maintenance happening again today (https://phabricator.wikimedia.org/T310145), we will be shutting down a few hosts in codfw
[13:28:58] gehel: restarting pc, will be 2 min late
[13:29:04] ack
[14:18:31] tanny411: my previous meeting was shorter than expected. I'm already in https://meet.google.com/ecf-yucv-icg if you have time to jump in
[15:02:04] \o
[15:04:07] o/
[15:18:01] ebernhardson we still haven't deployed changeprop unfortunately. No response in serviceops. Once we get done with the PDU maintenance, we'll try again
[15:20:26] asking for a friend: is it possible to point your local Special:Search to a production search API?
[15:20:32] Seddon: ^
[15:22:35] * Seddon appears
[15:30:05] depends what you mean, of course you could port forward things around such that your local development environment is connecting to production servers, but that's risky
[15:30:36] port forwarding into cloudelastic would be better, but cloudelastic is partially out of sync at the moment
[15:30:43] what's the goal?
[15:31:51] Seddon: ^
[15:31:53] To be able to leverage the full variety of content that is in production locally without having to faff with imports etc.
[15:33:04] the difficulty is cirrus doesn't have any concept of a "read-only" service, if you configure your development environment to talk to production it will end up writing to production at some point or another by accident
[15:33:49] I suspect that this is related to work on UI. In this context, is it possible to not connect to Elasticsearch, but to use the action API to get the results from production?
[15:34:07] Which would be a lot simpler (public API) and safer
[15:35:03] I barely remember Gergo making a SearchEngine wrapper around the action API, looking
[15:44:14] not finding it... I might be wrong
[15:45:19] any search SRE around? wdqs hosts are still up in C6 in codfw
[15:45:33] ryankemper: ^?
[15:46:35] people are on it
[15:47:25] Amir1 ryankemper I'm on it
[15:47:33] Thank you
[16:03:17] for those "flink job not running" alerts, we had to shut off a couple of wdqs servers in codfw for the PDU maintenance
[16:03:23] Are those alerts at all related?
[16:04:02] if there are k8s nodes down it might be the reason, looking
[16:07:20] that's probably it, been some chatter in serviceops about that
[16:10:44] I think it's fine to ignore the updater in codfw as I think I've seen joe switching wdqs traffic to eqiad only
[16:11:03] excellent, thank you for checking
[16:13:27] yes in yesterday's SAL: 11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
[16:15:11] we could silence these streaming updater@codfw alerts if they're causing noise, there's no user impact since no users should be hitting codfw
[16:54:56] it tricks the system into making the "wiki" index be seen as wiktionary and the cirrustestwiki index be seen as wikibooks
[16:55:20] sadly I can't test if it's working as my vagrant box no longer works
[17:00:34] Seddon: it most probably won't work directly, I'll try to test that on my setup tomorrow
[17:01:14] dinner
[17:02:14] dcausse: No worries. At some point it might be great to build in an external API provider configuration for local dev. Having that in mediasearch has been a godsend
[17:12:47] hmm, something about POSTing to commons-query.wikimedia.org gives 500s, while GET doesn't. Have no clue why the query service UI sometimes chooses to POST things
[17:13:45] also everything I can see so far suggests refreshing the page makes it work again. Which makes me dubious that people would have complained so much about breaking all their workflows, since refreshing a page is a fairly first-step debugging thing. Maybe the thing I've reproduced isn't really what their problem is :S
[17:19:54] lunch, back in ~45
[18:04:42] dcausse: I had to tweak slightly but got this working locally :D
[18:05:17] The setup worked
[18:13:24] yay
[18:21:37] should have been more obvious previously, it seems the UI tries a GET first, and if that fails it POSTs. So the GET runs, fails due to CORS, they retry as a POST, and that 500s for unknown reasons
[18:22:40] they interpret no response headers as 'browser did not send the request' when in this case it's 'browser refuses to show you the response due to security concerns'
[18:24:14] completely unrelatedly, with the number of problems we have in multiple repositories with 'change didn't deploy because a separate variable declaring the version number didn't change' it almost feels like actual version numbers might be an anti-pattern :P
[18:25:09] ^^ you read my mind on that one ;)
[18:25:33] also seems like it would be pretty easy to check for that in CI ;P
[18:26:51] inflatador: reminds me, https://gerrit.wikimedia.org/r/c/operations/software/elasticsearch/plugins/+/804004 needs to be merged, which doesn't exactly make CI check version numbers, but does make the makefile you should be running to prepare a commit fail
[18:27:22] I think it can simply be CR+2'd and jenkins will merge with no further action necessary
[18:27:57] hmm, actually it might be a manual merge repository
[18:28:12] so CR+2 and click the merge button :)
[18:28:59] * inflatador is possessed by the spirit of ebernhardson
[18:29:16] :P
[18:30:07] I did it!
[18:31:06] thanks, that's (hopefully) one less repo that will fail in the future due to version numbers...
[18:31:25] unclear if that's fixable in the helm chart, not sure how testing works there
[18:51:26] ebernhardson we're deploying changeprop in https://meet.google.com/eki-rafx-cxi if you feel like popping in
[18:51:37] sure
[20:07:55] ryankemper: a reminder that we should be making sure to include WCQS as well when putting together the SLI/SLO for WDQS
[20:11:05] ryankemper: you've lost audio in meet
[20:26:34] mpham: ack
[21:23:52] ryankemper: any luck shipping the eqiad changes?
[21:43:43] ebernhardson no luck last I checked
[22:09:55] ebernhardson: nah we got held up on the codfw changes, rollout failed because the pods couldn't be fully scheduled on the nodes, due to the amount of memory the pods request
[22:10:16] ebernhardson: I went ahead and asked for advice in service-ops
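
[Editor's note] The 18:25:33 remark about catching unbumped version numbers in CI could look roughly like the sketch below. This is a minimal, hypothetical illustration only: it assumes the version lives in a single VERSION file at the repo root and that CI diffs against origin/master, neither of which is confirmed by the discussion (the actual fix referenced in the Gerrit change instead makes the commit-preparation makefile fail).

#!/usr/bin/env python3
# Hypothetical CI guard: fail the build if source files changed but the
# version file did not. The VERSION file name and the origin/master base
# are illustrative assumptions, not the real layout of the plugins repo.
import subprocess
import sys

BASE = "origin/master"
VERSION_FILE = "VERSION"


def changed_files(base: str) -> set:
    """Return the set of paths touched between `base` and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def main() -> int:
    changed = changed_files(BASE)
    if not changed:
        # Nothing changed, nothing to deploy, nothing to check.
        return 0
    if VERSION_FILE not in changed:
        print(
            f"ERROR: {len(changed)} file(s) changed but {VERSION_FILE} was not "
            "bumped; the change would not deploy.",
            file=sys.stderr,
        )
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())

Run as an extra CI step after checkout (e.g. `python check_version_bump.py`); a non-zero exit fails the job before an unbumped change can be merged.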