[07:32:53] Will be ~30 mins late to weds mtg
[08:17:33] Goals for this quarter published to https://wikitech.wikimedia.org/wiki/Search_Platform/Goals/OKR_2024-2025_Q1
[08:17:50] Yes, I know, we're already halfway through the quarter :)
[11:51:28] lunch
[12:57:44] \o
[13:23:27] thx gehel for posting ⭐
[13:28:12] gehel i'm guessing that it would be later tomorrow (my afternoon) or sometime friday before we'd be confident about graph splits for prod hosts. it appeared that the two eqiad hosts were caught up, and Ryan was working on transferring the journal data to codfw. for me it's probably going to be tomorrow when i can look more closely with David's instructions. from there, then i think it would be connecting the domains and ...
[13:29:39] Ryan saw some places in deployment / testing tooling to tidy up - those shouldn't block exposing of the servers to internet mediated traffic and the external facing communication - but i suppose could be considered after actions (although I suppose if he gets some spare minutes they may be done before that)
[13:30:38] now, i think this means that the present week would be the better one for the Elasticsearch > OpenSearch email, which i still plan to draft
[13:31:27] dr0ptp4kt: sounds good!
[13:50:27] sigh..almost embarrassing to take 2 days to find this :P The problem with SUP<->private wiki comms is...`x-forwarded-for: 127.0.0.1`
[13:51:40] curiously, Special:MyContributions, which i expected to report the generic web req ip, gives the host ip addr, but the authentication scheme must be seeing 127.0.0.1
[13:52:07] fails with that header, works without it.
Probably can get away with 127.0.0.1 in the ip whitelist
[13:58:18] Hm, the only reason for that header was to mark our requests internal and enable envoy retry configuration (per additional headers)
[13:59:56] i can whitelist the address, so it's not a big deal, but it does suggest the ip whitelisting is mostly pointless because any request from the internal networks can claim any ip address it wants
[14:05:30] will poke around, maybe would be better for NetworkSession to ignore x-forwarded-for in a similar manner to how usernames do
[14:10:16] i suppose on the upside, learned more about k8s restrictions, and learned how to hax up envoy config to log requests to the filesystem inside the container...
[14:17:02] Local logging seems helpful. Do you have exec permissions to inspect the logs?
[14:17:21] ebernhardson: BTW: here’s the x-forwarded-for ticket: https://phabricator.wikimedia.org/T354853
[14:18:20] to exec requires setting the KUBECONFIG env var to the deploy config, essentially `kube_env ...` sets it to the normal one, you have to override it with the deploy one which is found in same directory but with -deploy- in the name
[14:19:23] getting the logs made was a bit tedious though, it's buried in our templated vendor bits, essentially needed to adjust them and deploy from a deployment-charts repo in my home dir since i don't think i'm supposed to change those in the real repo
[14:19:53] could maybe work up some way to make it configurable via mesh.* values, but not sure it's worthwhile
[14:31:22] updated the whitelist, deployed, and it looks like sup is now working with NetworkSession and should have private wiki access
[14:40:10] Awesome!
[14:59:38] P&T staff meeting is conflicting with Wednesday meeting. I propose to join the staff meeting first and come back to the Wednesday meeting after it (for those who don't have kids to feed)
[15:00:58] Weird, that P&T staff meeting is only in my calendar, not in the WMF Staff calendar.
[15:01:02] oh next week?
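Editor's sketch for the x-forwarded-for discussion (13:50 to 14:31): a minimal Python illustration of why an IP allowlist that trusts `x-forwarded-for` both rejected the SUP request carrying `127.0.0.1` and can be bypassed by any internal client. This is not the NetworkSession code; the function names, the allowlisted address `10.2.3.4`, and the other host `10.9.9.9` are all hypothetical.

```python
import ipaddress

# Hypothetical allowlist containing a single internal host.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.2.3.4/32")]


def effective_client_ip(peer_ip, headers, trust_forwarded_for):
    """Return the address that the allowlist check ends up seeing."""
    forwarded = headers.get("x-forwarded-for")
    if trust_forwarded_for and forwarded:
        # Assumption: the left-most entry is treated as the client address.
        return forwarded.split(",")[0].strip()
    return peer_ip


def is_allowed(ip):
    """Check an address against the hypothetical allowlist."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def check(peer_ip, headers, trust_forwarded_for=True):
    return is_allowed(effective_client_ip(peer_ip, headers, trust_forwarded_for))


# The allowlisted host is rejected once it sends x-forwarded-for: 127.0.0.1 ...
with_header = check("10.2.3.4", {"x-forwarded-for": "127.0.0.1"})
# ... and accepted when the header is absent.
without_header = check("10.2.3.4", {})
# Any other internal host can claim the allowlisted address via the header.
spoofed = check("10.9.9.9", {"x-forwarded-for": "10.2.3.4"})
# Ignoring the header closes that hole.
spoofed_ignored = check("10.9.9.9", {"x-forwarded-for": "10.2.3.4"}, trust_forwarded_for=False)
```

Under these assumptions, adding 127.0.0.1 to the allowlist unblocks the request, while ignoring the header for this check (the idea floated at 14:05:30) is what would remove the spoofing concern.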
[15:01:16] i'm going to hop onto the search call in a couple minutes, wrapping up talking about vector stuff
[15:01:52] Yeah, it looks like the P&T staff meeting is next week, but it shows up in my calendar for this week as well. Something is wrong.
[15:02:09] So let's keep the Wednesday meeting for this week, and reschedule it later for next week
[19:11:50] lunch
[19:45:19] * ebernhardson tries to remember how language selection worked in wikibase entity search, but currently failing :P
[20:28:04] inflatador: if you have a chance today the sup<->mwapi thing is working now, can inject the token to the helmfile private settings. Details at https://phabricator.wikimedia.org/P67273
[20:34:50] * dr0ptp4kt ebernhardson: i think he returns tomorrow
[20:44:03] no hurry i suppose, unless ryankemper feels like looking into it. It goes into the puppet private repo
[20:44:54] found the problem with entity search language overrides in wikibase. sad story :( The language handling was added in the same weeks that WikibaseCirrusSearch was being split out of the Wikibase repo. This code didn't make it somehow :(
[20:45:04] ebernhardson: I can do it. My lunch is running super long tho so I’m still 25ish mins from computer
[20:45:17] ryankemper: sure, i'll be around
[20:45:25] will i need to do any k8s stuff after merging the patch?
[20:46:00] ryankemper: a diff should be enough to see it's working, i'm intending to apply the change which will cause the SUP to start sending that token to mwapi
[20:46:10] staging is already sending it, so in theory staging shouldn't show a diff
[21:31:08] ebernhardson: back in action, looking at the phab now
[21:34:21] ebernhardson: so for the required helmfile config, that's supposed to go in /srv/private?
I'm having trouble finding the right directory
[21:36:56] Oh, it's probably `hieradata/role/common/deployment_server/kubernetes.yaml`
[21:47:37] sorry was distracted
[21:47:45] ryankemper: it probably goes in the same place the swift key is defined
[21:48:27] ebernhardson: that makes sense, that's what i was thinking. so just to confirm, do we still only have SUP in eqiad? because the swift key is only defined for eqiad
[21:48:47] ryankemper: at the end puppet puts it in /etc/helmfile-defaults/private/main_services/cirrus-streaming-updater/{eqiad,codfw,staging}.yaml
[21:49:18] ryankemper: hmm, the same swift key gets put in a different file for each env
[21:49:18] https://www.irccloud.com/pastebin/RZiRp7RD/what_it_currently_looks_like.txt
[21:49:50] ryankemper: so that basically copies the eqiad config to codfw and staging, we want the same here
[21:50:00] i think you can put the config at the same level as flink:
[21:50:03] ah, that's what the & does
[21:50:09] ok
[21:50:11] app: and flink: should be the same level
[21:50:57] the & basically saves the dict/hash/map to a name, and the * expands into the thing that was saved
[21:54:20] ebernhardson: alright how's the following look
[21:54:23] https://www.irccloud.com/pastebin/17Kj0p01/
[21:56:56] ryankemper: yea looks like i would expect
[21:57:20] then if we run puppet on deploy1003 should be able to see it in a helmfile diff
[21:57:25] ebernhardson: what phab ticket should i associate this commit to btw
[21:57:48] hmm, sec
[21:58:22] I guess i've been using T345185
[21:58:23] T345185: Provide a method for internal services to run api requests for private wikis - https://phabricator.wikimedia.org/T345185
[21:58:42] but the parent task, T341332, might be reasonable
[21:58:42] T341332: [EPIC] The CirrusSearch streaming updater should support private wikis - https://phabricator.wikimedia.org/T341332
[22:02:41] Committed, running puppet now on deploy1003
[22:05:22] staging diff looks right, i had only applied the
key to consumer-search, so doing an apply so it applies that to producer as well
[22:05:26] ebernhardson: okay, looks as expected. diffs for eqiad and codfw but not staging
[22:05:50] as far as the files on the deployserver are concerned
[22:07:21] ryankemper: yup looks good, thanks!
[22:28:35] * ebernhardson found the problem with reindex & backfill skipping the last backfill...and no idea how to add a test for it :P
[23:17:10] hmm, eqiad updater is fine but codfw having some problem
[23:24:57] how weird...it's complaining that it can't get metadata for codfw.mediawiki.cirrussearch.page_weighted_tags_change.rc0 because "This server does not host this topic-partition". It seems the mirroring didn't create the topics in codfw?
[23:25:14] running out of time today, will revert since the topics are empty and not being produced to anyways, can create the topics tomorrow
[23:38:22] fixed. also it's only the codfw producer, which doesn't really do anything since eqiad is primary right now
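
Editor's sketch for the YAML `&` / `*` explanation at 21:50:57: a rough Python analogy of anchor and alias behavior, where one mapping is saved under a name and then reused for each environment. The YAML in the comment and every key name and value (`sup_private`, `example_token`, `example_swift_key`) are placeholders, not the contents of the private repo or of either pastebin.

```python
# Hypothetical YAML with the shape described in the discussion:
#
#   eqiad: &sup_private
#     app:
#       example_token: "REDACTED"
#     flink:
#       example_swift_key: "REDACTED"
#   codfw: *sup_private
#   staging: *sup_private

# "&sup_private" saves this mapping under a name; app and flink sit at the same level.
sup_private = {
    "app": {"example_token": "REDACTED"},
    "flink": {"example_swift_key": "REDACTED"},
}

# "*sup_private" expands to the saved mapping wherever it is used.
config = {
    "eqiad": sup_private,
    "codfw": sup_private,
    "staging": sup_private,
}


def environments_with_key(cfg, section, key):
    """List the environments whose given section defines the given key."""
    return sorted(env for env, values in cfg.items() if key in values.get(section, {}))


envs_with_token = environments_with_key(config, "app", "example_token")
```

In this analogy, adding a key once under the anchored mapping makes it appear in all three environments, which matches the observation that the eqiad settings are effectively copied to codfw and staging.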