[03:30:04] serviceops, Observability-Metrics, SRE, Maps (Kartotherian): Get Kartotherian SLO metrics into Prometheus - https://phabricator.wikimedia.org/T320748 (lmata)
[03:32:20] serviceops, Maps, Observability-Metrics, SRE, and 2 others: SLO dashboards with N latency targets - https://phabricator.wikimedia.org/T320749 (lmata)
[07:38:46] <_joe_> mutante: there is documentation for what you want to do
[07:39:03] <_joe_> which is dismissing an LVS endpoint
[07:54:27] mornin'
[07:57:30] serviceops, Discovery-Search, SRE, serviceops-collab, Technical-Debt: Sunset search.wikimedia.org service - https://phabricator.wikimedia.org/T316296 (Clement_Goubert) @mpopov @Gehel This was clarified in our ServiceOps meeting. We are not touching the `search` and `search-https` services.
[08:18:24] serviceops, Kubernetes, Patch-For-Review: Remove kubeyaml from deployment-charts CI - https://phabricator.wikimedia.org/T316348 (Clement_Goubert) Thanks for the deployment @Jdforrester-WMF :)
[08:52:38] hi folks, can I stop apache on mw1339 temporarily? I'd like to test a thing for https://gerrit.wikimedia.org/r/c/operations/alerts/+/841905 (host is depooled)
[08:54:53] serviceops, SRE, Performance-Team (Radar): Remove nutcracker from mediawiki chart - https://phabricator.wikimedia.org/T321042 (Clement_Goubert) a:jijiki→Clement_Goubert
[08:57:33] sorry for the direct ping, maybe _joe_ or claime ^ ?
[08:58:05] <_joe_> godog: just depool it
[08:58:24] yep, agreed with _joe_ if it's depooled, go nuts
[08:58:28] _joe_: the host is depooled already yeah
[08:58:30] ok! thank you
[08:58:55] not going to touch its pooled/depooled status fwiw
[08:59:54] ack
[09:00:26] <_joe_> no idea why it's depooled...
[09:01:08] godog did you depool it?
[09:01:30] claime: no I didn't, it was already depooled
[09:01:35] hmm
[09:01:39] <_joe_> let me search SAL
[09:01:45] ack
[09:01:52] I'm done with my tests btw on mw1339
[09:02:05] That was fast lol
[09:05:00] heheh yeah I'm minimally invasive
[09:05:16] I have no idea, latest info I can find on mw1339 is from last february and it's actually pooling the host
[09:05:30] Deployment hiccup?
[09:05:43] fwiw mw1410 is also depooled, I'm looking at this
[09:05:44] puppetmaster1001:~$ grep -ir enabled.*false /srv/config-master/pybal | grep mw
[09:06:12] Same, last ref in SAL is from last feb
[09:06:14] anyways, I'll go back to the original problem
[09:06:22] wth
[09:28:48] serviceops, Discovery-Search, SRE, serviceops-collab, and 2 others: Sunset search.wikimedia.org service - https://phabricator.wikimedia.org/T316296 (Clement_Goubert) Disregard the above related patch, I fumbled the Bug id.
[09:54:27] serviceops, Discovery-Search, SRE, serviceops-collab, and 2 others: Sunset search.wikimedia.org service - https://phabricator.wikimedia.org/T316296 (Clement_Goubert) Restored the trafficserver search.wikimedia.org removal patch. As I understand it, removing this mapping will stop traffic to the...
[11:12:59] serviceops, SRE, Patch-For-Review, Performance-Team (Radar): Remove nutcracker from mediawiki chart - https://phabricator.wikimedia.org/T321042 (Clement_Goubert) ` root@deploy1002:/srv/deployment-charts/helmfile.d/services/mwdebug# kube_env mwdebug codfw root@deploy1002:/srv/deployment-charts/hel...
[11:19:30] serviceops, Observability-Tracing, Epic: OpenTelemetry Collector puppetized and able to be deployed easily to arbitrary roles - https://phabricator.wikimedia.org/T320565 (Clement_Goubert) a:Clement_Goubert
[11:20:03] serviceops, Observability-Tracing, Epic: Package OpenTelemetry Collector atop our own base Docker images - https://phabricator.wikimedia.org/T320552 (Clement_Goubert) a:Clement_Goubert
[11:25:00] serviceops, Observability-Tracing, Epic: OpenTelemetry Collector running as a DaemonSet on Wikikube - https://phabricator.wikimedia.org/T320564 (Clement_Goubert)
[13:11:14] serviceops, Observability-Tracing, Epic: Package OpenTelemetry Collector as a .deb - https://phabricator.wikimedia.org/T320551 (fgiunchedi) I took a quick look at [[ https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.62.1/otelcol-contrib_0.62.1_linux_amd64.deb | o...
[13:54:01] hi, the localhost:6500 proxy that we reference in deployment-charts, for proxying requests to MediaWiki –– does that block redirects, somehow? I'm trying to figure out why outbound requests from linkrecommendation service to MW for be-x-old.wikipedia.org don't redirect to be-tarask.wikipedia.org
[13:55:21] <_joe_> kostajh: uhhh not that I know of
[13:55:44] <_joe_> what do you mean by "blocking redirects"?
[13:56:35] i'll try to explain, give me a moment
[13:57:15] a POST request to linkrecommendation service has all the info needed to get a response, so we don't use the proxy to MW API at all
[13:57:38] for GET requests, we use the proxy to call the MW API to get things like revision ID for a title
[13:58:29] <_joe_> ok
[13:58:52] when I run the app locally, if I do a request like `curl "http://localhost:8080/v1/linkrecommendations/wikipedia/be_x_old/Foo"`, the app converts be_x_old to be-x-old and makes an HTTP request to be-x-old.wikipedia.org/w/api.php.
[13:59:16] <_joe_> ok
[13:59:17] be-x-old.wikipedia.org/w/api.php is a redirect to be-tarask.wikipedia.org, though
[13:59:28] <_joe_> what response does the app get back?
[13:59:53] when I run this locally, that works fine. in production, it hangs with no response
[14:01:03] e.g. try `curl -vvv "http://localhost:6029/v1/linkrecommendations/wikipedia/be_x_old/Foo"` from a deploy server
[14:01:14] HTTP/1.1 504 Gateway Timeout
[14:01:32] <_joe_> ok
[14:01:52] <_joe_> it would be ideal if such stuff was in a task
[14:01:59] <_joe_> this is only happening with redirects?
[14:02:21] Yes, I'll make a task now, just wondering if I was missing something obvious
[14:02:28] yes, only with the be-x-old -> be-tarask redirect
[14:02:48] <_joe_> and other redirects work?
[14:03:29] I'm not aware of any other wiki IDs we have that are using redirects
[14:05:45] <_joe_> I would bet the problem is the application logic somewhere
[14:05:54] <_joe_> envoy does what you'd expect
[14:06:31] I was suspicious of the application logic too, but why does it work locally and not in production?
[14:06:36] <_joe_> kubernetes1008:~# nsenter -t 61854 -n curl -H 'Host: be-x-old.wikipedia.org' localhost:6500/ -v 2>&1 | grep location:
[14:06:38] <_joe_> < location: https://be-tarask.wikipedia.org/
[14:07:08] <_joe_> because locally you're either a) not going through the service proxy or b) don't have egress rules blocking calls to the internet
[14:08:09] <_joe_> and in fact, from the main app container
[14:08:11] <_joe_> kubernetes1008:~# nsenter -t 61704 -n curl https://be-tarask.wikipedia.org/
[14:08:14] <_joe_> hangs :P
[14:08:26] <_joe_> so I think you don't rewrite the redirect to go through the same proxy
[14:09:10] T321082
[14:09:20] <_joe_> thanks, I'll comment there
[14:09:22] ...is the task I just filed about this
[14:09:41] https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/+/refs/heads/main/src/MediaWikiApi.py#68 is (probably) the relevant code
[14:13:00] <_joe_> heh yes, I think requests doesn't do the right thing here
[14:13:35] <_joe_> the nice thing is that nsenter is amazing and I can experiment with python code in production :P
[14:14:36] _joe_: thanks for your help!
[14:14:56] I posted a patch, not sure if it fixes the issue... I thought "allow_redirects=True" was the default
[14:15:30] hmm, it definitely is the default
[14:18:05] kostajh: can we set allow_redirects=False and follow the redirect manually? It feels like a huge hack, but it might work?
[14:18:48] <_joe_> urbanecm: it is most definitely not a hack :)
[14:19:15] <_joe_> kostajh: the problem is that it actually follows the redirect
[14:19:30] urbanecm: well, here I was pondering whether to just hardcode the exception for be-x-old -> be-tarask
[14:19:41] Or that.
[14:19:44] <_joe_> kostajh: that's also a possibility yes
[14:19:51] <_joe_> but ugh
[14:20:18] ugh
[14:20:25] urbanecm: yeah, let's set allow_redirects=False
[14:33:43] <_joe_> kostajh: https://phabricator.wikimedia.org/T321082#8325457 this snippet will work
[14:33:49] <_joe_> ofc you have to adapt it to your situation
[14:33:58] thank you! cc urbanecm
[14:34:00] <_joe_> and I suspect it should be possible to wire this into requests
[14:34:07] <_joe_> via a Session or something
[14:34:59] serviceops: Envoy can't connect to servers using TLS 1.3 (but can serve TLS 1.3 to clients) - https://phabricator.wikimedia.org/T246083 (bking) a:bking→None Unassigning, will talk to other ServiceOps members on possible next steps.
[15:10:47] serviceops, SRE, Thumbor, Thumbor Migration, and 2 others: Migrate thumbor to Kubernetes - https://phabricator.wikimedia.org/T233196 (VirginiaPoundstone) a:hnowlan
[15:32:14] serviceops, Prod-Kubernetes: sextant: tool for helm chart management - https://phabricator.wikimedia.org/T320793 (Joe) I created https://gitlab.wikimedia.org/repos/sre/sextant/-/merge_requests/1 for this. Already used to bundle the modules for scaffolding.
[15:48:14] serviceops, SRE: service implementation tracking: arclamp2001.codfw.wmnet - https://phabricator.wikimedia.org/T319429 (LSobanski) a:Dzahn→None Doesn't look like collab, unassigning from Daniel.
[15:48:58] serviceops, SRE: service implementation tracking: arclamp1001.eqiad.wmnet - https://phabricator.wikimedia.org/T319434 (LSobanski) a:Dzahn→None Doesn't look like collab, unassigning from Daniel.
[15:49:10] serviceops, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install arclamp1001.eqiad.wmnet - https://phabricator.wikimedia.org/T319433 (LSobanski)
[18:47:24] serviceops, RESTBase, Wikipedia-iOS-App-Backlog, iOS-app-feature-Performance, and 2 others: PCS caching and pregeneration when restbase is decommissioned - https://phabricator.wikimedia.org/T319365 (daniel) Tagging per conversation with Josh
[19:08:35] serviceops, Performance-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Krinkle) a:Krinkle
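
Editor's note on the 14:06-14:34 redirect thread: the approach agreed there (set allow_redirects=False and follow the redirect manually, sending the follow-up request through the same localhost:6500 proxy) can be sketched roughly as below. This is not the snippet linked from T321082 and not the mwaddlink code; `via_proxy`, `get_following_redirects`, and the injectable `send` callable are hypothetical names introduced for illustration, and the sketch assumes the proxy routes on the `Host` header, as the `curl -H 'Host: ...' localhost:6500/` test in the log suggests.

```python
from urllib.parse import urljoin, urlsplit

# Proxy address taken from the log; everything else here is an assumption.
PROXY_BASE = "http://localhost:6500"
REDIRECT_CODES = {301, 302, 303, 307, 308}


def via_proxy(url, proxy_base=PROXY_BASE):
    """Map a wiki URL to (proxied_url, headers).

    The path and query are kept, the scheme and host are replaced by the
    proxy, and the original host travels in the Host header.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return proxy_base.rstrip("/") + path, {"Host": parts.netloc}


def get_following_redirects(url, send, max_hops=5):
    """Fetch `url` through the proxy, re-proxying every redirect target.

    `send(proxied_url, headers)` is a hypothetical callable that performs
    one GET without following redirects and returns
    (status, response_headers, body), with lowercase header names.
    """
    for _ in range(max_hops):
        proxied, headers = via_proxy(url)
        status, resp_headers, body = send(proxied, headers)
        location = resp_headers.get("location")
        if status not in REDIRECT_CODES or location is None:
            return status, body
        # Resolve relative Location values against the current URL,
        # then loop so the next hop also goes through the proxy.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

With the requests library, `send` could wrap `requests.get(proxied_url, headers=headers, allow_redirects=False)` and return the response's `status_code`, a lowercased copy of its `headers`, and its `text`. Keeping the HTTP call injectable means the redirect handling can be exercised without network access.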