[08:38:01] 06serviceops, 10Deployments, 06Release-Engineering-Team: httpbb appserver test breaks deployment of the week due to a timeout parsing page - https://phabricator.wikimedia.org/T360867 (10hashar) 03NEW
[08:39:26] 06serviceops, 10Deployments, 06Release-Engineering-Team: httpbb appserver test breaks deployment of the week due to a timeout parsing page - https://phabricator.wikimedia.org/T360867#9656726 (10hashar) That got followed by a 503 which I haven't found the root cause for: ` 08:28:14 Executing check 'check_test...
[08:55:15] hi folks, I have re-deployed the prometheus patch to only fetch envoy in-use metrics, I've checked e.g. https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s for eqiad and things seem to be right, please double check
[08:56:05] the change being https://gerrit.wikimedia.org/r/c/operations/puppet/+/1013515?usp=dashboard
[08:57:08] <_joe_> godog: LGTM right now
[08:57:39] ack, thank you _joe_, I'll reenable puppet in codfw too
[09:00:22] <_joe_> I did check eqiad did I?
[09:43:21] LGTM godog
[10:41:55] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9656987 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2336.codfw.wmnet with OS bullseye
[10:42:23] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9656990 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2337.codfw.wmnet with OS bullseye
[10:42:51] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9656991 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2386.codfw.wmnet with OS bullseye
[10:43:20] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9656995 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2387.codfw.wmnet with OS bullseye
[10:43:48] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9656996 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2388.codfw.wmnet with OS bullseye
[10:44:26] 06serviceops, 10Deployments, 06Release-Engineering-Team: httpbb appserver test breaks deployment of the week due to a timeout parsing page - https://phabricator.wikimedia.org/T360867#9656998 (10hashar) >>! In T360867#9656730, @Joe wrote: > I don't think httpbb tests should really break deployment, but rather...
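For context on the kind of double check asked for above, a minimal sketch of an instant query against the Prometheus HTTP API is given below; the base URL and the envoy metric name are illustrative assumptions, not taken from the patch itself.

```python
# Sketch only: spot-check that an envoy metric is still being scraped after the
# config change. Base URL and metric name are assumed, not taken from the patch.
import json
import urllib.parse
import urllib.request

PROM_BASE = "https://prometheus-eqiad.wikimedia.org/k8s"  # assumed instance path
PROMQL = "count(envoy_cluster_upstream_rq_total)"         # assumed in-use envoy metric

def instant_query(base_url: str, promql: str) -> list:
    """Run an instant query against the Prometheus HTTP API and return the result list."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]

if __name__ == "__main__":
    result = instant_query(PROM_BASE, PROMQL)
    # An empty result would mean the metric disappeared after the change.
    print(result[0]["value"][1] if result else "no series found")
```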
[10:44:27] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657000 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1002 for host mw2389.codfw.wmnet with OS bullseye
[10:47:45] 06serviceops, 10MW-on-K8s, 06SRE, 06Traffic, and 2 others: Migrate changeprop to mw-api-int - https://phabricator.wikimedia.org/T360767#9657017 (10Clement_Goubert) 05Open→03In progress
[10:59:48] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657076 (10Jgiannelos)
[11:01:08] 06serviceops, 10Deployments, 06Release-Engineering-Team: httpbb appserver test breaks deployment of the week due to a timeout parsing page - https://phabricator.wikimedia.org/T360867#9657089 (10hashar) From the log server, the page routinely takes more than 10 seconds to parse :/ ` zgrep 'Parsing Barack Oba...
[11:02:06] to double check, is eqiad still depooled?
[11:08:08] <_joe_> Amir1: lol no?
[11:08:14] <_joe_> Amir1: you mean codfw?
[11:08:17] ah yeah
[11:08:21] <_joe_> we've moved to eqiad
[11:08:24] sorry, switchover confusion time
[11:08:29] <_joe_> so eqiad is very much not depooled
[11:08:34] <_joe_> codfw, OTOH, is
[11:08:38] <_joe_> for another day or two
[11:08:41] until when? Tuesday?
[11:08:56] good to know, I'll do some stuff today then
[11:08:59] <_joe_> Amir1: as long as you need, preferably no later than wednesday
[11:10:04] sure
[11:20:29] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657159 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2386.codfw.wmnet with OS bullseye completed: - mw23...
[11:21:56] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657162 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2388.codfw.wmnet with OS bullseye completed: - mw23...
[11:24:19] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657172 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2336.codfw.wmnet with OS bullseye completed: - mw23...
[11:25:50] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657178 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2387.codfw.wmnet with OS bullseye completed: - mw23...
[11:26:08] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657181 (10Jgiannelos) From logs I think there are 2 things to investigate: * What happened since ~10th March ? *...
[11:27:52] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657192 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2389.codfw.wmnet with OS bullseye completed: - mw23...
[11:30:31] 06serviceops, 10MW-on-K8s, 13Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9657205 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1002 for host mw2337.codfw.wmnet with OS bullseye completed: - mw23...
[11:36:48] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657229 (10hnowlan) Turnilo says that this is mostly being caused by clients using the mobile apps, various versi...
[11:40:38] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657241 (10hnowlan) >>! In T360597#9657181, @Jgiannelos wrote: > From logs I think there are 2 things to investig...
[11:45:24] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657257 (10Jgiannelos) I was trying to see if there is a correlation between this issue and switching over parsoi...
[12:14:55] 06serviceops, 10MW-on-K8s, 06SRE, 06Traffic, and 2 others: Migrate changeprop to mw-api-int - https://phabricator.wikimedia.org/T360767#9657385 (10Clement_Goubert) `mw-api-int` is now receiving all calls to `mwapi_uri` from changeprop {F43323601} There are still calls coming from the `ChangePropagation/WM...
[12:18:13] 06serviceops, 10MW-on-K8s, 06SRE, 06Traffic, and 2 others: 14Migrate changeprop to mw-api-int - 14https://phabricator.wikimedia.org/T360767#9657393 (10Clement_Goubert) 05In progress→03Resolved
[12:18:44] 06serviceops, 10MW-on-K8s, 10RESTBase, 06SRE, 13Patch-For-Review: Migrate restbase from mwapi-async to mw-api-int - https://phabricator.wikimedia.org/T358213#9657395 (10Clement_Goubert)
[12:23:56] 06serviceops, 10MW-on-K8s, 07Video: 14Create new flavour of shellbox for video transcoding - 14https://phabricator.wikimedia.org/T357296#9657406 (10kamila) 05Open→03Resolved a:03kamila 14Based on some quick tests the image seems to be working \o/
[12:25:07] 06serviceops, 10MW-on-K8s, 06SRE, 06Traffic, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120#9657411 (10Clement_Goubert)
[12:45:04] 06serviceops, 10Prod-Kubernetes, 10Data-Platform-SRE (2024.03.25 - 2024.04.14), 07Kubernetes, 13Patch-For-Review: 14Migrate an example chart to the Calico network policies template - 14https://phabricator.wikimedia.org/T359411#9657459 (10brouberol) 05Open→03Resolved 14Both `superset-staging` and...
[12:45:26] 06serviceops, 06Data-Platform-SRE, 10Prod-Kubernetes, 07Kubernetes: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9657463 (10brouberol)
[12:54:50] hnowlan: 👋 for T360597 i have this feeling that the problem is redirects again. All the failing `requests.url` from logstash are redirects to commons. Is there any way I can see internally (eg from the container) what this returns:
[12:54:59] curl http://localhost:6503/fr.wikipedia.org/v1/page/summary/Fichier%3ACleopatra_poster.jpg
[12:59:29] nemo-yiannis: https://phabricator.wikimedia.org/P58907
[13:00:20] yeah same problem: Wikifeed times out because the location is the public URL
[13:00:43] aha
[13:01:00] 06serviceops, 06Data Products: Service Ops Review of Metrics Platform Configuration Management UI - https://phabricator.wikimedia.org/T358577#9657504 (10MShilova_WMF) Thank you, @akosiaris. I've just added you as a subscriber to {T358115}. Let me know if it automatically granted you access.
[13:01:27] I dunno if this is the same as the issue causing the increased latency though
[13:01:52] the increased error rate/timeouts though are because of that
[13:02:59] i will update the ticket with the information and continue investigating what causes the increased latency
[13:05:10] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657528 (10Jgiannelos) From the URLs from logstash as @hnowlan pointed out it looks like the main cause of timeou...
[13:06:00] cool
[13:06:11] the timing of those restbase nodes being added is *very* fishy imo
[13:07:59] yeah i dont think this is the root cause for the increase on the 10th of march, but the fact we had a spike last weekend probably added to the latency and triggered the alerts
[13:08:24] (plus the switchover could have had an impact too)
[13:09:19] yeah switchover means that all errors are occurring in a single datacentre so they'll go above thresholds
[13:09:25] yeah
[13:15:13] hnowlan: I am not very familiar with the SAL logs for pooling/depooling nodes but FWIW scap targets were not updated so not sure what state of restbase the new nodes are serving
[13:15:30] i just did a git pull on restbase/deploy and the changes to targets were fetched
[13:16:22] me neither tbh - they're pooled and so will be receiving requests
[13:16:23] So this: https://gerrit.wikimedia.org/r/c/mediawiki/services/restbase/deploy/+/1009842 is not deployed
[13:16:42] aha
[13:16:46] (unless done in another manual way other than `scap deploy`)
[13:16:52] they'll have pulled whatever the master version was at the time of provisioning
[13:16:56] ok
[13:17:04] should we do a deploy?
[13:17:15] I can do a scap deploy
[13:17:29] but before that, can somebody check which hash of restbase we are running?
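A minimal sketch of the redirect check discussed above (the curl against the container's summary endpoint), using the same host, port, and path as the curl in the log; http.client never follows redirects, so a Location header pointing at a public URL shows up directly.

```python
# Sketch of the in-container check above: request the summary path and print the
# status plus Location header without following the redirect. Host and port are
# taken from the curl command in the log and are only reachable from the container.
import http.client

HOST, PORT = "localhost", 6503
PATH = "/fr.wikipedia.org/v1/page/summary/Fichier%3ACleopatra_poster.jpg"

conn = http.client.HTTPConnection(HOST, PORT, timeout=10)
conn.request("GET", PATH)
resp = conn.getresponse()
print(resp.status, resp.reason)
print("Location:", resp.getheader("Location"))  # a public URL here is the suspected problem
conn.close()
```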
[13:17:50] wary about that disabled puppet state on those hosts but it's been a few days
[13:18:16] on restbase1042 it's 7e5e72087d8331131669babfb8f40b269c024cd7
[13:19:08] either way scap overrides configurations so if scap is not running, config would be different
[13:19:24] *has not run
[13:22:45] ok i think thats even more interesting: https://phabricator.wikimedia.org/P58908
[13:22:58] check the differences between the nodes' redirect url
[13:23:32] We should definitely bring the nodes to the same state
[13:23:41] if you think its OK i can run a scap deploy on current master
[13:24:52] the responses are very problematic
[13:28:29] there's nothing in the diffs that should cause that kind of deviation in behaviour surely
[13:28:35] but I'd say go for it
[13:29:15] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657593 (10Jgiannelos) It looks like this path was not deployed using scap: https://gerrit.wikimedia.org/r/c/medi...
[13:30:39] should i try just one node of the failing ones ?
[13:31:01] to see if the response changes after the deployment?
[13:31:11] please do
[13:35:26] didn't help much
[13:36:06] hm :/ which host did you deploy to?
[13:36:29] 1034
[13:37:04] restbase1034.eqiad.wmnet
[13:39:38] Also with "cache-control: no-cache" restbase returns 404
[13:39:42] curl -v -o /dev/null "http://restbase1034.eqiad.wmnet:7233/fr.wikipedia.org/v1/page/summary/Fichier%3ACleopatra_poster.jpg" -H "Cache-control: no-cache"
[13:46:23] all hosts do the same it seems
[13:48:37] that 404 isn't coming from restbase though
[13:48:56] oh wait no, ignore
[13:50:34] yeah all of them are returning 404
[13:50:51] which means we are serving the latest good state in cassandra
[13:53:15] now we need to find why RB returns 404 and what 500 it hides :P
[13:57:50] wait so
[13:57:59] restbase1034 now returns the correct location header
[13:58:14] where it previously had an incorrect one?
[14:00:11] If that's correct then doing a scap deploy to get everyone on the same version will get things in better shape
[14:01:25] no it doesn't
[14:01:29] it returns the public URL
[14:01:40] (which I assume is the last known state for this node)
[14:01:56] because forcing purge returns 404
[14:05:14] (which i believe really is an RB error)
[14:45:41] claime: o/ I'd need to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1013541 to upgrade the eqiad docker registry nodes (the standby ones). Ok from your point of view or better to wait?
[14:48:48] <_joe_> elukey: I'd say go on, just make sure no mediawiki deployment is ongoing
[14:48:54] +1 thanks :)
[14:52:09] 06serviceops, 06Machine-Learning-Team, 13Patch-For-Review: Bump memory for registry[12]00[34] VMs - https://phabricator.wikimedia.org/T360637#9657834 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=78701a88-bd13-4896-9ad1-88076e82347e) set by elukey@cumin1002 for 1:00:00 on 1 host(s) and...
[14:52:31] 06serviceops, 06Machine-Learning-Team, 13Patch-For-Review: Bump memory for registry[12]00[34] VMs - https://phabricator.wikimedia.org/T360637#9657836 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=9cabb1e2-3230-40ba-8e89-bce14ddf9042) set by elukey@cumin1002 for 1:00:00 on 1 host(s) and...
[15:04:56] bumped both nodes, all good :)
[15:05:04] going to schedule the work for codfw then
[15:05:16] when would be the best time to do it?
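Returning to the restbase checks above: the per-host curl probes (with and without `Cache-control: no-cache`) can be repeated across several nodes to compare status codes and Location headers. A rough sketch follows; the host list is illustrative, not the set of nodes that were actually affected.

```python
# Rough sketch of the per-host comparison above: request the same summary path on
# a few restbase hosts, with and without "Cache-control: no-cache", and print the
# status code and Location header for each. The host list is illustrative only.
import http.client

HOSTS = ["restbase1034.eqiad.wmnet", "restbase1042.eqiad.wmnet"]  # assumed sample
PORT = 7233
PATH = "/fr.wikipedia.org/v1/page/summary/Fichier%3ACleopatra_poster.jpg"

def probe(host: str, no_cache: bool):
    """Return (status, location) for one request; http.client never follows redirects."""
    headers = {"Cache-control": "no-cache"} if no_cache else {}
    conn = http.client.HTTPConnection(host, PORT, timeout=10)
    try:
        conn.request("GET", PATH, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

for host in HOSTS:
    for no_cache in (False, True):
        status, location = probe(host, no_cache)
        label = "no-cache" if no_cache else "cached"
        print(f"{host:<28} {label:<8} {status} {location}")
```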
[15:05:32] surely far from mw deployments or busy deploy schedules
[15:20:51] elukey: not in mw-deploy windows. I think you could claim an mw infrastructure window for that
[15:21:11] makes sense, yes
[15:27:37] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9657965 (10Eevans) >>! In T360597#9657227, @hnowlan wrote: > Turnilo says that this is mostly being caused by cli...
[15:36:13] jayme: the new prometheus job for istio seems to work as expected, and the labels are dropped
[15:42:21] the change added about 16k samples/s more to the scraping, is that expected ?
[15:42:57] I'm looking at this for example
[15:42:59] https://grafana.wikimedia.org/d/GWvEXWDZk/prometheus-server?orgId=1&refresh=1m&var-Prometheus=prometheus1005%3A9906&var-RuleGroup=All&var-datasource=thanos&var-name=k8s&from=1711370571232&to=1711381371232&viewPanel=1
[15:44:26] k8s-pods-istio (4848/10668 up) mmhh I don't think that's expected, so many targets?
[15:44:51] I'm looking at this https://prometheus-eqiad.wikimedia.org/k8s/targets?search=&scrapePool=k8s-pods-istio
[15:44:54] elukey: ^
[15:46:22] godog: argh I was about to ask in #observability if there were metrics to check
[15:46:47] in theory no, I was spot checking on the thanos ui for before/after of some metrics
[15:47:22] we should be matching __meta_kubernetes_pod_annotation_sidecar_istio_io_inject to be true I think? otherwise the k8s-pods-istio job on prometheus k8s is trying to fetch from all targets
[15:48:07] godog: either true or false, since "true" is set on sidecars and "false" on gateways, but at this point the regex that I added (.*) is wrong?
[15:48:15] yeah that should be .+
[15:48:22] * elukey cries in a corner
[15:48:27] of course sorry :(
[15:48:28] fixing
[15:48:38] probably in both places
[15:49:07] godog: or (true|false), wdyt?
[15:49:24] sure that works too elukey
[15:53:21] godog: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1014035
[15:53:25] sorry again :(
[15:53:36] after the rollout is there any cleanup that I can do?
[15:54:32] elukey: I think the regex needs changing in the original job for symmetry
[15:55:07] true true
[15:55:49] fixed :)
[15:57:29] elukey: no cleanup, that's fine
[15:57:33] patch LGTM
[15:57:41] super, rolling out in a bit
[16:00:38] started now
[16:12:31] k8s-pods-istio (176/176 up)
[16:12:33] that's more like it
[16:14:08] godog: where do I see that info?
[16:14:14] anyway thanks a ton for checking
[16:14:30] elukey: https://prometheus-eqiad.wikimedia.org/k8s/targets?search=&scrapePool=k8s-pods-istio
[16:14:44] * elukey bookmarked
[16:39:38] 06serviceops, 10Prod-Kubernetes: PodSecurityPolicies will be deprecated with Kubernetes 1.21 - https://phabricator.wikimedia.org/T273507#9658418 (10elukey) @JMeybohm thanks a lot for the great wikipage, it explains the problem very well. The only thing that worries me is the maintenance of those extra policies...
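The relabel regex discussed above (`.*` vs `.+` vs `(true|false)`) matters because a pod without the sidecar annotation yields an empty label value, and the anchored `.*` in a keep rule still matches that empty string, so the job kept every pod. A small sketch of that anchored-match behaviour (an illustration only, not the actual puppet/Prometheus config):

```python
# Sketch of why the ".*" keep-regex matched far too many targets: a pod without
# the sidecar annotation yields an empty label value, and an anchored ".*" still
# matches it, while ".+" or "(true|false)" does not. Illustration only.
import re

annotation_values = ["true", "false", ""]  # "" stands for a pod with no annotation

for pattern in (".*", ".+", "(true|false)"):
    kept = [repr(v) for v in annotation_values if re.fullmatch(pattern, v)]
    print(f"regex {pattern!r:>15} keeps {kept}")
```

This lines up with the target counts in the log: 4848/10668 up before the fix, 176/176 up afterwards.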
[16:56:08] 06serviceops, 10Data Products (Data Products Sprint 11): Service Ops Review of Metrics Platform Configuration Management UI - https://phabricator.wikimedia.org/T358577#9658508 (10VirginiaPoundstone) a:03phuedx
[16:56:11] 06serviceops, 10Data Products (Data Products Sprint 11): Service Ops Review of Metrics Platform Configuration Management UI - https://phabricator.wikimedia.org/T358577#9658502 (10VirginiaPoundstone)
[17:18:34] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9658606 (10Jgiannelos) It looks like errors/latency are stabilized after depooling some nodes: {F43348837} {F433...
[17:18:53] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9658608 (10Eevans) The restbase deployments are somewhat out of sync, with HEAD (eqiad) looking like: ` restbase...
[18:01:18] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9658800 (10Eevans) After experimenting with targeted deployments, both to hosts that correlate to the correct beh...
[18:55:21] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9659020 (10Eevans) From IRC: ` 1:13 PM   so apparently the 404 is expected behaviour for "cache-con...
[18:56:39] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9659023 (10Jgiannelos) So the root cause looks to be the following: * Apparently the 404 is expected behaviour...
[18:57:38] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9659025 (10Eevans) >>! In T360597#9659020, @Eevans wrote: > From IRC: > > ` > 1:13 PM   so apparent...
[20:01:11] deploying a change to the prometheus-apache-exporter that will make it work on all distros, including bookworm
[20:01:43] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: Increased latency, timeouts from wikifeeds since march 10th - https://phabricator.wikimedia.org/T360597#9659275 (10Eevans) What changed here was that some hosts have been deployed with ipv6 dns records: ` eevans@rest...
[21:00:42] 06serviceops, 06Content-Transform-Team, 06Content-Transform-Team-WIP, 13Patch-For-Review: 14Increased latency, timeouts from wikifeeds since march 10th - 14https://phabricator.wikimedia.org/T360597#9659525 (10Eevans) 05Open→03Resolved a:03Eevans 14The ipv6 dns records for all restbase hosts have...
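The root cause above was that only some restbase hosts had been given ipv6 dns records. A quick way to see which hosts resolve to IPv6 is sketched below, using the system resolver; the host names are illustrative examples, not the full affected set.

```python
# Sketch: list A and AAAA results for some restbase hosts via the system resolver,
# to see which ones have ipv6 dns records. Host names are illustrative examples.
import socket

HOSTS = ["restbase1034.eqiad.wmnet", "restbase1042.eqiad.wmnet"]  # assumed sample

for host in HOSTS:
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as err:
        print(f"{host}: resolution failed ({err})")
        continue
    v4 = sorted({info[4][0] for info in infos if info[0] == socket.AF_INET})
    v6 = sorted({info[4][0] for info in infos if info[0] == socket.AF_INET6})
    print(f"{host}: A={v4 or 'none'} AAAA={v6 or 'none'}")
```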