[08:04:58] serviceops, SRE, good first task: Upgrade all deployment charts to use the latest version of common_templates - https://phabricator.wikimedia.org/T292390 (Joe) Open→Resolved We've since moved to using modules.
[08:32:31] serviceops, RESTBase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (daniel)
[08:32:59] serviceops, RESTBase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (daniel)
[08:35:21] serviceops, RESTBase Sunsetting, Epic, Platform Engineering Roadmap: Replace usage of RESTbase parsoid endpoints - https://phabricator.wikimedia.org/T328559 (daniel)
[08:37:51] serviceops, RESTBase Sunsetting, Parsoid (Tracking): Enable WarmParsoidParserCache on all wikis - https://phabricator.wikimedia.org/T329366 (daniel) Open→Resolved Parsoid cache warming has been enabled everywhere for a couple of months now.
[08:56:13] Hi all!
[08:56:31] Hi :)
[08:56:33] If I wanted to know all things that internally hit a certain RESTbase path, how would I do that?
[08:56:43] * claime runs away
[08:56:47] I am trying to survey all internal callers of parsoid
[08:57:04] Some will probably use service mesh... and some won't.
[09:00:01] claime: I suppose I could grep for discovery URLs and service ports. But what exactly do I grep for, and where do I grep?
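One way to approach the "what do I grep for" question is a small scanner run over checked-out service repositories. This is a minimal sketch, not an existing tool. The patterns are assumptions: the public `/api/rest_v1/` prefix, a hypothetical `restbase*.discovery.wmnet` host pattern, and the Parsoid-backed `page/html` and `transform` endpoints named later in the log. `scan_text` and `scan_tree` are hypothetical helper names.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions, not an authoritative list).
PATTERNS = {
    # Public RESTBase prefix; URLs built on it go through the CDN.
    "rest_v1_path": re.compile(r"/api/rest_v1/"),
    # Hypothetical discovery hostname pattern for RESTBase.
    "discovery_host": re.compile(r"restbase[\w-]*\.discovery\.wmnet"),
    # Parsoid-backed endpoints named in the discussion.
    "parsoid_endpoint": re.compile(r"page/html|transform/"),
}

SOURCE_SUFFIXES = {".js", ".ts", ".php", ".py", ".yaml", ".yml", ".json"}


def scan_text(text):
    """Return (line_number, pattern_name, stripped_line) for each match in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits


def scan_tree(root):
    """Yield (path, line_number, pattern_name, line) for source files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, name, line in scan_text(text):
                yield path, lineno, name, line
```

Usage might look like `for hit in scan_tree("path/to/checkout"): print(*hit)`. As the rest of the log shows, a text scan only finds callers that spell the URL out; code that assembles it at runtime can still slip through.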
[09:00:10] Hmmmm
[09:00:14] I honestly don't know
[09:28:58] <_joe_> duesen: everything uses the service mesh
[09:29:10] <_joe_> apart from the traffic layer
[09:29:21] <_joe_> but to your question, we don't keep per-endpoint stats
[09:29:37] <_joe_> and restbase doesn't record access logs anywhere
[09:29:44] <_joe_> what we can do is:
[09:30:16] <_joe_> change the envoy configuration on a restbase host to allow access logging on local requests
[09:30:22] <_joe_> and then you can grep a day of that
[09:30:47] <_joe_> ah wait, you want to know the *callers*
[09:30:57] that sounds great! What would that give me? The IPs of the services that make the requests?
[09:31:01] Yes.
[09:31:03] <_joe_> so we'll need the value of X-client-Ip too
[09:31:19] The question I need to answer is: "if I turn off parsoid in restbase, what is going to break"?
[09:31:19] <_joe_> duesen: which endpoint, btw?
[09:31:37] <_joe_> duesen: oh I see
[09:31:47] anything backed by parsoid. so page/html and transform, primarily
[09:32:32] <_joe_> duesen: I would assume "every service" is the right answer
[09:32:34] <_joe_> :)
[09:32:57] heh, maybe.
[09:33:13] <_joe_> do you have a 1:1 correspondence for page/html -> mw rest api urls?
[09:34:13] yes. well, kind of. the endpoints exposed by the parsoid extension are very very similar to the ones exposed by restbase.
[09:34:29] <_joe_> duesen: but, we can do as follows, given your actual goal
[09:34:30] We don't want the public to use them, we prefer the core html endpoints.
[09:34:40] but for internal calls, that would be ok, at least for now.
[09:34:58] <_joe_> duesen: uhm this should be enforced though
[09:35:06] <_joe_> anyways
[09:35:11] <_joe_> do you have a task?
[09:35:18] yes. it's not right now, but i am also not seeing external traffic to these endpoints.
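The access-log approach _joe_ outlines (turn on Envoy access logging for local requests on a RESTBase host, capture X-Client-IP, then grep a day of logs) could be summarized per caller with something like the sketch below. It assumes a hypothetical log format of one JSON object per line with `path` and `x_client_ip` keys; the real format depends entirely on how the Envoy access log is configured. `count_parsoid_callers` is a made-up name.

```python
import json
import re
from collections import Counter

# Parsoid-backed RESTBase endpoints named in the discussion.
PARSOID_PATH = re.compile(r"/page/html/|/transform/")


def count_parsoid_callers(log_lines):
    """Count requests to Parsoid-backed paths, keyed by X-Client-IP.

    Assumes a hypothetical log format: one JSON object per line with
    "path" and "x_client_ip" keys. A real Envoy access log uses whatever
    format its configuration specifies.
    """
    callers = Counter()
    for raw in log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        path = str(entry.get("path") or "")
        if PARSOID_PATH.search(path):
            # Requests without the header are grouped under "unknown".
            callers[entry.get("x_client_ip") or "unknown"] += 1
    return callers
```

Given a captured log file (the name is hypothetical), `count_parsoid_callers(open("access.log")).most_common()` would list caller IPs by request volume, which then still have to be mapped back to services.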
[09:35:39] https://phabricator.wikimedia.org/T333536
[09:35:53] <_joe_> I think I have an idea on how to find this
[09:36:01] It's a bit old, I need to update the description
[09:36:25] <_joe_> Oh I see I already made a suggestion
[09:36:27] <_joe_> https://phabricator.wikimedia.org/T333536#8741154
[09:37:41] Ha, I forgot about that...
[09:37:48] <_joe_> duesen: given it's so few services, it's maybe possible to check the code
[09:38:01] <_joe_> at least for mobileapps/PCS and mediawiki you should be able to go that way
[09:38:05] But... I think I saw other things using parsoid. Like wikifunctions.
[09:38:25] <_joe_> oh dear, wikifunctions? how?
[09:38:36] I have been grepping the code, but I want to be sure I didn't miss anything
[09:38:57] <_joe_> I think you're wrong
[09:39:01] <_joe_> about wikifunctions
[09:39:47] possibly. i just saw it in passing. checking again now
[09:40:41] <_joe_> duesen: let's put it this way: anything calling restbase NOT via the mesh is doing it out of prescription
[09:40:45] <_joe_> and if it breaks, so be it
[09:41:48] <_joe_> duesen: it's possible wikifunctions has some disabled functionality to call restbase
[09:42:26] what discovery url and service port can i grep for?
[09:42:38] these would be the same for all of restbase, right?
[09:42:52] It would be useful to have a separate service port for parsoid
[09:43:45] (I can't find the wikifunctions thing now, but I note that servicelib-node provides a function for fetching parsoid html from restbase)
[09:44:51] by the way... I would have expected to see changeprop on the list of things that call restbase... any idea why it doesn't show?
[09:49:23] <_joe_> duesen: I excluded it
[09:49:31] <_joe_> it calls a different endpoint
[09:49:55] <_joe_> duesen: as for changing the endpoint for parsoid... sure we can do it, but then you have to modify the code to split calls :)
[09:50:21] <_joe_> duesen: but I am thinking now
[09:51:19] <_joe_> we mostly want to know how many non-changeprop, non-edge calls we get to page/html internally. Then we can just change that method to call the rest api instead?
[09:51:48] <_joe_> duesen: can you show me that method in servicelib-node?
[09:52:04] <_joe_> I have a lingering doubt that it bypasses envoy by going to the CDN
[10:00:10] serviceops, CX-cxserver, RESTBase Sunsetting: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase - https://phabricator.wikimedia.org/T344982 (daniel)
[10:01:14] _joe_: the method is here: https://gerrit.wikimedia.org/g/mediawiki/services/servicelib-node/+/81e751c633302ae45c072f23491fc63f30eb814d/examples/api.js#81
[10:01:27] But I am not at all sure that all code that fetches html is actually using that method
[10:01:41] And stuff that uses the transform endpoint for sure does not.
[10:01:42] <_joe_> apiUtil.restApiGet
[10:01:47] <_joe_> ok let me look at that
[10:03:27] _joe_: a concrete focus might be to migrate cxserver: https://phabricator.wikimedia.org/T344982
[10:03:59] <_joe_> duesen: yeah I think we can remove pregeneration well before we can turn off parsoid-in-rb
[10:04:25] <_joe_> but let me check that apiUtil.restApiGet method
[10:04:52] I have a patch up for making storage in RB configurable per domain
[10:06:36] <_joe_> uri: 'http://{{domain}}/api/rest_v1/{+path}',
[10:06:38] <_joe_> SIGH
[10:06:44] <_joe_> it goes via the CDN
[10:06:45] <_joe_> of course
[10:07:30] <_joe_> I'm looking at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/servicelib-node/+/HEAD/utils/api-util.js#75
[10:08:02] <_joe_> this gets passed down to the request template
[10:08:26] <_joe_> also http? not sure what we do with this tbh. For sure we don't call restbase internally there
[10:08:48] <_joe_> this explains why so few services call restbase
[10:08:58] <_joe_> OTOH, the CDN urls won't change
[10:10:57] <_joe_> duesen: am I doing something wrong? https://codesearch.wmcloud.org/search/?q=apiUtil.restApiGet&files=&excludeFiles=&repos=&i=fosho
[10:11:05] <_joe_> it seems it's only used in that example
[10:13:47] _joe_: it's typically called on "this", I think: https://codesearch.wmcloud.org/search/?q=%5C.restApiGet%5C%28&files=&excludeFiles=&repos=&i=fosho
[10:14:22] But it's not used a lot. My guess is that it's relatively recent, and older stuff doesn't use it and does the call "by hand".
[10:14:25] <_joe_> duesen: the only call to .restApiGet is in cxserver and it's /not/ that function
[10:15:07] <_joe_> but yeah cxserver calls restbase via the CDN
[10:15:14] <_joe_> *sigh*
[10:15:29] sigh... I think I remember... I think servicelib-node is an attempt to replace service-template, which never happened.
[10:15:52] sorry, service-template-node, or whatever that is called
[10:15:57] the template is copied around...
[10:16:00] <_joe_> because priorities changed every week, yes
[10:16:05] hmhm
[10:16:40] yea, every service has its own copy of that method, various versions of it: https://codesearch.wmcloud.org/search/?q=%5C+restApiGet%5C%28&files=&excludeFiles=&repos=&i=fosho
[10:16:45] <_joe_> and yes, I'm looking at the whole thing, we clearly call rb from these applications mostly via the CDN
[10:16:57] <_joe_> duesen: yep, all so sustainable and organized
[10:17:25] <_joe_> the problem there is that the services team did a /lot/ of work for others that no one kept doing after that team was disbanded
[10:17:38] https://phabricator.wikimedia.org/T291843
[10:17:44] <_joe_> I would say it was a broken model, but this is the consequence of a non-proper substitution
[10:17:51] <_joe_> of the functions of that team
[10:18:01] yes.
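To see why calls built from the quoted template end up at the CDN rather than on the service mesh, here is a naive expansion of `http://{{domain}}/api/rest_v1/{+path}` plus a host check. This is a simplified sketch under stated assumptions. servicelib-node uses its own template engine, which is not reproduced here. The "internal host" heuristic (localhost or `*.discovery.wmnet`) is an assumption. `to_mw_rest_html` is a hypothetical illustration of _joe_'s idea of pointing a page/html call at the MediaWiki REST API instead, using core's `/w/rest.php/v1/page/{title}/html` route. It relies on a caller-supplied base URL, such as a local mesh listener whose address is not given in the log.

```python
from urllib.parse import quote, urlsplit

# URI template quoted by _joe_ from servicelib-node in the log above.
RESTBASE_TEMPLATE = "http://{{domain}}/api/rest_v1/{+path}"


def expand_restbase_uri(domain, path):
    """Naively fill the two placeholders (not the library's real template engine)."""
    return RESTBASE_TEMPLATE.replace("{{domain}}", domain).replace("{+path}", path)


def goes_via_cdn(uri):
    """Heuristic (assumption): anything not addressed to localhost or a
    *.discovery.wmnet host is treated as a public hostname served by the CDN."""
    host = urlsplit(uri).hostname or ""
    internal = host in ("localhost", "127.0.0.1") or host.endswith(".discovery.wmnet")
    return not internal


def to_mw_rest_html(base_url, title):
    """Hypothetical replacement for RESTBase page/html/{title}: MediaWiki core's
    /w/rest.php/v1/page/{title}/html, served from a caller-supplied base URL."""
    return f"{base_url.rstrip('/')}/w/rest.php/v1/page/{quote(title, safe='')}/html"
```

With `domain` set to a public wiki hostname such as `en.wikipedia.org`, the expanded URI names the public site. That matches _joe_'s reading that these requests leave through the CDN and never touch Envoy.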
[10:18:03] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (jijiki)
[10:18:17] i actually tried to push the scaffolding thing a bit further, as a code jam project
[10:18:33] anyway - there is no easy way to find out what calls parsoid by looking at code.
[10:18:55] I guess the list we have on that ticket is as good as it gets.
[10:19:28] We'll just fix the stuff we know of, and then we investigate whatever is left
[10:20:01] <_joe_> duesen: I might have another out, give me one sec
[10:23:07] serviceops, RESTBase Sunsetting, API Platform (RESTbase Deprecation Roadmap), Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (daniel)
[11:58:29] serviceops, MW-on-K8s, Observability-Logging, SRE: Apache logs get split across packets in MW-on-K8s - https://phabricator.wikimedia.org/T344991 (kamila)
[12:00:24] serviceops, MW-on-K8s, Observability-Logging, SRE: Keep calculating latencies for MediaWiki requests in the WikiKube environment - https://phabricator.wikimedia.org/T276095 (kamila) The errors are caused by T344991. Thus, the metrics produced by Benthos are not counting those requests.
[12:37:19] serviceops, MW-on-K8s, Observability-Logging, SRE: Apache logs get split across packets in MW-on-K8s - https://phabricator.wikimedia.org/T344991 (kamila)
[13:37:02] serviceops, SRE, ops-codfw: Move codfw thumbor hosts to kubernetes cluster - https://phabricator.wikimedia.org/T343996 (Jhancock.wm)
[13:37:25] serviceops, SRE, ops-codfw: Decommission thumbor200[34] - https://phabricator.wikimedia.org/T344597 (Jhancock.wm) Open→Resolved
[13:51:09] serviceops, MW-on-K8s, Observability-Logging, SRE: Apache logs get split across packets in MW-on-K8s - https://phabricator.wikimedia.org/T344991 (kamila) a: kamila
[14:37:03] serviceops, DC-Ops, SRE, ops-codfw: Q1:rack/setup/install kubernetes20[25-54] - https://phabricator.wikimedia.org/T342534 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host kubernetes2048.codfw.wmnet with OS bullseye
[16:27:45] serviceops, DC-Ops, SRE, ops-codfw: Q1:rack/setup/install kubernetes20[25-54] - https://phabricator.wikimedia.org/T342534 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host kubernetes2048.codfw.wmnet with OS bullseye completed: - kubernetes2048 (**WARN*...
[16:31:44] serviceops, DC-Ops, SRE, ops-codfw: Q1:rack/setup/install kubernetes20[25-54] - https://phabricator.wikimedia.org/T342534 (Papaul)
[17:59:39] Hey ServiceOpsen, we've got a prod outage for the Wikifunctions service https://phabricator.wikimedia.org/T344998
[17:59:56] I'm speculating that https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/950188 might have broken it, but only due to timing.
[18:59:46] serviceops, Abstract Wikipedia team, SRE, Wikifunctions, and 2 others: Wikifunctions functions that call the evaluator are all getting no response, UX instead showing 'http' - https://phabricator.wikimedia.org/T344998 (Jdforrester-WMF) Unfortunately at this point I'm out of ideas as to what's cau...