[09:23:36] godog: thanos is spamming with failed crons, is that something that is being looked at?
[09:24:40] dcaro: thank you for the heads up, looking into it
[09:25:16] np, let me know if I can help :)
[09:27:50] yeah I'm reimaging thanos-fe hosts, likely related to that
[09:32:34] FWIW I've stopped the crons for now, and opened a task to convert them to systemd timers to end the cronspam silliness
[09:33:33] nice, I'm curious though, how will you track them working or not then, with alerts?
[09:35:17] that's correct yeah, the standard systemd unit failed checks cover those too
[18:06:47] dancy: Afaik there isn't a reason it couldn't. We allow network from there afaik, as we need internal http calls between wikis for various other features.
[18:07:07] Hmm.
[18:07:10] commons api calls being the most prominent one.
[18:08:43] Do you know if details of the failure of $client->execute(); (favicon.php:43) would be logged somewhere?
[18:09:29] ... and it looks like commons api calls may be broken on k8s
[18:09:42] eg. view https://test.wikipedia.org/wiki/File:Example.png
[18:09:51] the whole description block is empty when viewed over k8s XWD
[18:10:22] Last I checked, Logstash was not wired up yet for k8s-mw
[18:10:29] so I have no telemetry on any of that indeed.
[18:10:45] the only thing that goes into logstash currently is an unsampled request log from some higher layer.
[18:10:56] nod. TODO items. :-)
[18:11:10] s/I/we
[18:11:54] I'll file a task about the commons issue.
[18:12:02] The favicon is likely the same root cause.
[18:12:13] is there a task for the favicon issue?
[18:12:38] Not yet. I was thinking of creating one.
[18:15:44] This is potentially a cache poisoning issue as well since file descriptions are memcached.
[18:15:52] anyway
[18:16:35] https://phabricator.wikimedia.org/T288848
[18:17:43] btw I'm looking at https://test.wikipedia.org/wiki/File:Example.png and I'm not seeing a difference between k8s/normal.
[18:18:19] (e.g., there is a bunch of text in the Description box).
[18:21:45] perhaps one of my accesses to the production URL populated memcache and then a subsequent request to k8s used that info?
[18:28:54] dancy: be sure to do a hard refresh, since you can hit http 304 between something you cached before
[18:29:28] hm.. I can't repro it now.
[18:29:33] yeah, I guess it got cached
[18:29:37] purge doesn't help either for this
[18:30:05] ok, I got it now by purging on commons
[18:30:11] https://commons.wikimedia.org/wiki/File:Example.png?action=purge
[18:30:16] then visiting the test side
[18:30:38] ok, I see it now.
[18:30:44] big section is gone.
[18:31:08] also a fatal error on k8s for pages like https://meta.wikimedia.org/wiki/User:Krinkle/global.js
[18:31:34] I
[18:31:43] I'm guessing syntax highlighter is missing
[18:32:36] I'll stop testing again for now, and wait for lego and I to write up the docs, and then we can "just" diff the provisioning more or less and annotate what's intentionally missing vs what's not
[18:33:59] Nod. Sounds good. Ideally T288848 would have a curl-like test case that could be added to the httpbb tests.
[18:34:00] T288848: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848
[18:57:34] That's a bit paradoxical unfortunately since almost by design any public url to a feature that makes an http call is very likely something we'd want to cache
[18:57:59] I'd consider it in a deeper category of integration tests, like editing and logging in, etc.
[19:40:26] > I'm guessing syntax highlighter is missing -- I believe that all shell access content is missing from the mw-on-k8s test nodes. As far as I understand the problem, things like syntaxhighlight will have to be available via shellbox or other service containers to work in the k8s setup.
[20:04:35] bd808: yeah, that makes sense. though afaik we haven't specifically blocked shell exec in wmf-config for k8s yet, and in some simple cases it might even be required still. and I thought the pygments binary was checked into mw, not something externally handled.
[20:04:56] but maybe we ended up changing that already, I know there was an idea to use the debian package instead
[20:05:50] I think the bundle that Ori made is still in there, but maybe there is not a configured python runtime to execute it? *shrug*
[20:06:12] highlightoid needs to happen :)
[20:30:40] oh mediawiki-on-k8s is live?
[20:31:37] ori: for some value of live... :P
[20:31:45] you can use it via wikimedia debug