[02:29:48] serviceops, MW-on-K8s, SRE: Create a mwdebug deployment for mediawiki on kubernetes - https://phabricator.wikimedia.org/T283056#10681399 (Krinkle)
[10:41:07] serviceops, Abstract Wikipedia team: Provide guidance on how to use apache bench to benchmark requests not through SSL for production services - https://phabricator.wikimedia.org/T390099#10682134 (Clement_Goubert) You can expose the http port of the mediawiki deployment, bypassing the TLS termination, by...
[10:51:50] serviceops, DC-Ops, ops-codfw, SRE, Patch-For-Review: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10682176 (Clement_Goubert) >>! In T384970#10681272, @Jhancock.wm wrote: > @Clement_Goubert hey i need a little favor. i noti...
[11:35:49] serviceops, Observability-Alerting, Kubernetes, Patch-For-Review, SRE Observability (FY2024/2025-Q3): mcrouter and thumbor declare unscrapable ports to prometheus - https://phabricator.wikimedia.org/T389480#10682366 (jijiki) https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1129...
[13:27:47] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10682959 (Jhancock.wm) all good. thank you for your help!
[13:32:22] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683027 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2300.codfw.wmnet with...
[13:32:34] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683030 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2301.codfw.wmnet with...
[13:32:45] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683032 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host wikikube-worker2302.codfw.wmnet with...
[13:45:57] serviceops, Image-Suggestions, Structured Data Engineering, Structured-Data-Backlog: Migrate data-engineering jobs to mw-cron - https://phabricator.wikimedia.org/T388537#10683099 (matthiasmullie) >>! In T388537#10668836, @Clement_Goubert wrote: > ... there are alert receivers for structured-data...
[14:02:07] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683179 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2302.codfw.wmnet with OS...
[14:06:33] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683183 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2301.codfw.wmnet with OS...
[14:08:41] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683188 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host wikikube-worker2300.codfw.wmnet with OS...
[14:09:05] serviceops, DC-Ops, ops-codfw, SRE: Q3:rack/setup/install wikikube-worker2248-2331, wikikube-ctrl2004-2005 - https://phabricator.wikimedia.org/T384970#10683191 (Jhancock.wm)
[14:19:12] serviceops, Traffic, Wikimedia Enterprise, Content-Transform-Team (Work In Progress), Patch-For-Review: Restbase API returns 404 for some articles with revision - https://phabricator.wikimedia.org/T389628#10683237 (SLopes-WMF)
[14:21:45] serviceops, Cassandra: restbase service crashing - https://phabricator.wikimedia.org/T389410#10683258 (SLopes-WMF)
[14:44:20] Heads up, i am planning to deploy restbase to deprecate some old codebase that hasn't accepted any traffic for quite some time now
[14:56:03] ack
[15:03:12] serviceops, Image-Suggestions, Structured Data Engineering, Structured-Data-Backlog: Migrate data-engineering jobs to mw-cron - https://phabricator.wikimedia.org/T388537#10683439 (Clement_Goubert) You're correct that you do not receive alerts from these jobs at the moment. I meant that there is a...
[15:29:35] hi! when attempting to helmfile apply in staging, I am getting:
[15:29:35] in ./helmfile.yaml: error during ../global.yaml.part.0 parsing: template: stringTemplate:1:49: executing "stringTemplate" at <.Values.kubernetesVersion>: map has no entry for key "kubernetesVersion"
[15:30:03] also when doing kube_env in staging I get:
[15:30:04] INFO: No matching kubectl version (1.31) for cluster staging (everything might still work)
[15:30:51] kube-env what, and which service?
[15:31:28] kube_env eventgate-logging-external staging
[15:31:36] /srv/deployment-charts/helmfile.d/services/eventgate-logging-external
[15:31:49] helmfile diff staging
[15:33:58] hmm that's related to the kubernetes 1.31 upgrade let me find the resources
[15:34:27] i see global.yaml is trying to vary the version of helm used
[15:34:49] https://phabricator.wikimedia.org/T388390
[15:35:32] some staging values file needs to set kubernetesVersion?
[15:35:51] i could work around atm by setting it on CLI. which k8s version is staging cluster
[15:36:18] 1.31
[15:39:34] hm, /etc/helmfile-defaults/general-staging.yaml has kubernetesVersion
[15:40:13] yeah I'm not sure what's going on
[15:40:42] kamila_: if you're around ^^
[15:41:04] the helmfile CLI opts look different than I remember? But reading them I think --state-values-set (or -string?) is the one to use? when I do I get:
[15:41:10] helmfile --state-values-set kubernetesVersion=1.31 diff staging
[15:41:14] in ./helmfile.yaml: failed executing release templates in "helmfile.yaml": failed executing templates in release "helmfile.yaml"."production": failed executing template expressions in release "production".version = "{{ if hasKey .Environment.Values "releases" }}{{ has .Release.Name .Environment.Values.releases }}{{ else }}{{ "no releases defined for this environment" | fail }}{{end}}": template: stringTemplate:1:156: executing
[15:41:14] "stringTemplate" at : error calling fail: no releases defined for this environment
[15:41:59] that's because you're missing the -e before staging
[15:42:08] oh duh.
[15:42:08] helmfile --state-values-set kubernetesVersion=1.31 diff -e staging --context 5 works fine
[15:42:09] ty
[15:42:23] it does indeed
[15:42:37] and it works without the --state-values-set as well
[15:42:46] I wish helmfile was more explicit about its failure modes
[15:43:08] ah, so i'm just a dumb dumb and all is well? the kubernetesVersion was not actually a problem!?
[15:43:17] but yeah basically what happened was, since you didn't give it an environment
[15:43:31] it couldn't find the kubernetesVersion and helmfile version to use for that environment
[15:43:44] okay welp, sorry for the noise, and much thanks for your help claime !
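The failure above came down to passing the environment as a bare argument instead of via `-e`. A minimal sketch of a guard against that, assuming a POSIX shell: `hf_diff` is a hypothetical helper name, not an existing tool, and `--context 5` is simply the value used in the conversation.

```shell
# Hypothetical wrapper (not part of helmfile or the deployment tooling):
# refuse to run a diff unless an environment name is given.
hf_diff() {
    if [ -z "${1:-}" ]; then
        echo "usage: hf_diff <environment>   e.g. hf_diff staging" >&2
        return 2
    fi
    # -e selects the helmfile environment. Per the discussion above, without it
    # the environment's values (such as kubernetesVersion) are not found.
    helmfile -e "$1" diff --context 5
}
```

Run from the service's helmfile directory, for example `hf_diff staging`. Calling it with no argument prints the usage line and returns 2 rather than producing a template error.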
[15:43:48] No problem
[15:44:03] (I really wish it were more explicit in how it fails though)
[15:44:45] about the kube-env warning, I don't think it's a problem since we're not yet using 1.31 features that wouldn't be supported by the 1.23 client
[15:44:51] Sorry, here now
[15:44:55] kamila_: all good
[15:45:10] correct, kubernetes version is not a problem
[15:45:26] the other thing is, looking...
[15:46:20] right, and i should have realized that the kube_env error was not related since it correctly figures out the correct kubernetesVersion to tell it isn't using the matching kubectl :p
[15:46:48] kamila_: the other thing is ok, just a missing flag that breaks non-obviously
[15:47:20] oh, right... that's still not ideal though
[15:47:31] thank you claime <3
[15:49:30] q: do we have any tricks to force a request to go to a particular k8s pod or release?
[15:49:30] To verify that my deploy doesn't break things, I need the request to go through varnish. Before when I did this I caused https://phabricator.wikimedia.org/T387850.
[15:49:48] I could deploy to just canary release, but I need varnish to set headers
[15:50:02] oh hm...i suppose i could fake this by setting the headers myself on the CLI...
[15:50:50] not from varnish (that would be a bad idea), but you can do kubectl get pods -o wide, grab the IP and curl to it
[15:51:42] but also you should be able to upgrade just the canaries and wait to see logs from them. Eventually a request will make it to them
[15:52:10] ya i'm curling to the pod now. Okay.
[15:52:16] i'll do my fake header test in staging first
[15:52:45] then do canary and see. hm.
I'm not sure if i can tell in the event whether or not the request went through the canary pod
[15:52:52] i'd need varnish -> canary -> event in kafka
[15:53:14] eh i could just curl the varnish url repeatedly and hope, but i don't want to inject too much fake data either
[15:53:33] I'll just point out that canaries aren't mean to be tests
[15:53:36] sure
[15:53:38] meant*
[15:53:49] if you want to test, there are other ways, curl being a very good one
[15:54:03] i'm hoping for a way to test that
[15:54:03] - varnish header setting works
[15:54:03] - eventgate does the right thing with varnish headers
[15:54:26] i can just work around varnish header setting though, there are no changes there so my fake headers should be good enough! I hope!
[15:57:19] okay cool, all looks good in staging.
[15:57:37] meetings starting, will proceed with eqiad and codfw this afternoon. thank you all!
[15:59:51] akosiaris: just curious, how does WikimediaDebug work for k8s?
[16:00:06] wdym?
[16:00:20] ottomata: it's a bit probabilistic, but it does have its own dedicated service to be routed to
[16:00:27] there are 2 pods
[16:00:34] any one of them can grab your request
[16:00:49] but it's just 2, so it's usually good enough
[16:01:05] ah! k ty
[16:01:40] dedicated service that is chosen by varnish based on the header then?
[16:01:40] actually, i remember you and i discussing this lOOng ago when we worked a bit on canary release stuff :)
[16:03:04] not varnish, ATS. And yes, have a look at https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/files/trafficserver/x-wikimedia-debug-routing.lua for how it works
[16:03:09] caution, it's Lua
[16:15:19] huh! for some reason I thought we dropped ATS! I guess cuz we decided not to do atskafka and do haproxykafka instead.
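For the suggestion earlier in this conversation of grabbing a pod IP and curling it directly with hand-set headers, a minimal sketch, assuming the current kubectl context already points at the right cluster and namespace. `curl_pod` is a hypothetical helper, and the pod name, port, path, scheme and header are placeholders that do not come from the log.

```shell
# Hypothetical helper: send one request straight to a pod, skipping the edge.
# Usage: curl_pod <pod-name> <port> <path> [extra curl args...]
curl_pod() {
    pod=$1; port=$2; path=$3
    shift 3
    # .status.podIP is the address shown in the IP column of
    # `kubectl get pods -o wide`.
    ip=$(kubectl get pod "$pod" -o jsonpath='{.status.podIP}') || return 1
    # Plain http is an assumption; use https if the pod terminates TLS itself.
    curl -sS "$@" "http://${ip}:${port}${path}"
}
```

Extra arguments are passed through to curl, so a faked edge header can be added with something like `curl_pod <pod-name> <port> <path> -H 'X-Example-Header: value'`, where the header name stands in for whatever the edge would normally set.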
[16:26:43] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress), and 2 others: Pregeneration rules don't pregenerate caches for the same cases restbase did - https://phabricator.wikimedia.org/T388214#10683772 (Jgiannelos) Open→Resolved
[16:26:47] serviceops, Page Content Service, RESTBase Sunsetting, Content-Transform-Team (Work In Progress), Essential-Work: Change changeprops rules to pre-generate/invalidate cache directly to PCS rather than in restbase - https://phabricator.wikimedia.org/T348996#10683774 (Jgiannelos) Open→...
[16:50:32] serviceops, Observability-Logging: Add canary release to api-gateway - https://phabricator.wikimedia.org/T390218 (Clement_Goubert) NEW
[16:53:15] serviceops: Update api-gateway ratelimit version - https://phabricator.wikimedia.org/T388804#10683932 (hnowlan) Related - it'd be nice if this work could get us some logging enhancements. We saw in T390215 that the current version's debug logging (which is the only log level that we can currently use to get...
[17:46:30] serviceops, Traffic, Wikimedia Enterprise, Content-Transform-Team (Work In Progress), Patch-For-Review: Restbase API returns 404 for some articles with revision - https://phabricator.wikimedia.org/T389628#10684130 (hnowlan) r/1130728 has been deployed and pages now appear to render correctly.
[17:49:51] serviceops, Traffic, Wikimedia Enterprise, Content-Transform-Team (Work In Progress), Patch-For-Review: Restbase API returns 404 for some articles with revision - https://phabricator.wikimedia.org/T389628#10684161 (ABreault-WMF) Open→Resolved a:ABreault-WMF @hnowlan Thanks for...
[17:53:34] serviceops, Traffic, Wikimedia Enterprise, Content-Transform-Team (Work In Progress), Patch-For-Review: Restbase API returns 404 for some articles with revision - https://phabricator.wikimedia.org/T389628#10684172 (ABreault-WMF) > Franklin D.
Roosevelt: https://en.wikipedia.org/api/rest_v...
[18:11:28] serviceops, Scap: Migrate scap's maintenance script invocations to PHP 8.1 - https://phabricator.wikimedia.org/T390225 (Scott_French) NEW
[18:11:43] serviceops, Scap: Migrate scap's maintenance script invocations to PHP 8.1 - https://phabricator.wikimedia.org/T390225#10684262 (Scott_French)
[18:11:55] serviceops, Data-Engineering, Data-Engineering-Radar, Dumps-Generation, and 2 others: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432#10684263 (Scott_French)
[18:14:33] serviceops, Scap: Migrate scap's maintenance script invocations to PHP 8.1 - https://phabricator.wikimedia.org/T390225#10684280 (Scott_French) @dduvall - When you get a chance, could you confirm that my understanding here is correct? If so, then //making// the change is in and of itself fairly simple (i...
[19:36:32] serviceops, Abstract Wikipedia team: Provide guidance on how to use apache bench to benchmark requests not through SSL for production services - https://phabricator.wikimedia.org/T390099#10684583 (ecarg) Thank you, everyone! to @Clement_Goubert ~ > What curl commands and ab configurations did you use? D...