[05:42:22] serviceops, MW-on-K8s, SRE, Patch-For-Review, and 2 others: The restricted/mediawiki-webserver image should include skins and resources - https://phabricator.wikimedia.org/T285232 (Joe) >>! In T285232#7443847, @Joe wrote: > Sadly I found a problem with our current approach: any file under static/...
[05:42:35] serviceops, Parsoid (Tracking): wtp1026 and wtp1042 continue to be depooled - https://phabricator.wikimedia.org/T294212 (Joe) p:Triage→High a:Joe
[05:47:26] serviceops, Parsoid (Tracking): wtp1026 and wtp1042 continue to be depooled - https://phabricator.wikimedia.org/T294212 (Joe) Open→Resolved @ssastry sorry for the inconvenience, I was assured the tests would be done and the servers would be repooled; apparently not!
[05:58:46] o/
[05:58:48] https://knative.dev/blog/articles/announcing-knative-1.0/
[06:06:55] <_joe_> I hope they changed everything and you have to start from scratch :P
[06:33:40] we are running 0.18 and upstream is at 0.26 right now (our version is the one upstream tags as compatible with k8s 1.16), so I am pretty sure a lot of things already changed :D
[06:34:07] for example we use two istio ingress gws, one for internal traffic and one for external traffic, which is not needed anymore after 0.18
[06:34:20] (apparently up to 0.18 knative needs the local one)
[06:43:06] \o/ let me know when you're done with the k8s 1.22 update :-p
[06:46:40] sure sure I am planning to do it today :D
[06:48:05] jokes aside, on the ml-serve front - my team is currently starting to deploy models for 3 out of 4 ORES categories (the various revscoring-blabla that you see flying by on IRC)
[08:57:10] serviceops, Release-Engineering-Team: Puppet failure on deploy-1002.devtools.eqiad1.wikimedia.cloud due to missing profile::kubernetes::deployment_server::user_defaults - https://phabricator.wikimedia.org/T294174 (hashar) #beta-cluster-infrastructure has it set via Horizon: ` profile::kubernetes::deploym...
[10:15:31] 'lo
[10:15:41] <_joe_> klausman: hi :)
[10:16:02] <_joe_> so, I wanted to ask you and elukey how useful the current chart scaffolding is for your charts
[10:16:11] <_joe_> I would expect none to very little
[10:16:44] I mean, it is clearly geared to what "came before us"
[10:16:59] <_joe_> I think we are at the point where we need to re-think it *and* common templates
[10:17:00] so there's overlap, but some things are tricky/messy to do
[10:17:45] <_joe_> I did write https://phabricator.wikimedia.org/T292818 for "a better scaffolding", which ofc could use your input as well
[10:18:21] takin' a look-see
[10:18:24] <_joe_> and more in general, I'd like to restructure common_templates to be a proper library of templates, namespaced and versioned
[10:18:43] <_joe_> yeah feel free to edit the task description to add your use-cases
[10:18:50] Yes, especially versioning (if it can be made to not hinder more than help ;)) would be neat
[10:19:22] <_joe_> klausman: we have a poor man's versioning in the common_templates directory, we create a new copy when we make breaking changes
[10:19:28] <_joe_> but that's for *all* templates
[10:19:37] <_joe_> which is a bit too much imho
[10:19:39] They're basically pure Go templates, right?
[10:19:43] <_joe_> yes
[10:19:59] Not my fave templating language, but at least it's complete.
[10:20:07] <_joe_> you tell me :D
[10:20:36] <_joe_> I think my "go text/template sucks" rants are one of the constants of this channel :D
[10:20:37] must... resist... urge to... template templates.
[10:20:48] <_joe_> yeah no.
[10:21:02] If anything, it'd make things *worse*, not better
[10:21:03] <_joe_> we actually already do it in helmfile.yaml
[10:21:26] <_joe_> but that's very limited in scope and I think is ok-ish in that context
[10:21:35] <_joe_> (it's how helmfile is designed after all)
[10:24:21] my main problem is that I am not quite sure yet that I understand the scope of what helmfiles (and their templating) are meant to be able to do.
[10:24:39] i.e. where does the scope creep set in
[10:25:15] <_joe_> ok, how familiar are you with puppet?
[10:25:45] <_joe_> I have a relatively easy equivalence with how puppet works.
[10:25:47] (I will note that I have gone through at least three different systems at the Goo that do similar things: borgcfg, piccolo and Boq)
[10:26:05] Puppet and I do not like each other, but I get its principle
[10:26:12] we use the common_templates a lot, not much of the scaffolding IIUC
[10:28:02] <_joe_> klausman: so you can think of your chart as a puppet profile; common_templates as base modules you use in your profiles; helmfile.d stuff as both the role definition and the hiera hierarchy to configure it
[10:28:11] <_joe_> elukey: yeah and you should have your own scaffolding!
[10:29:01] _joe_: all the services that we will provide will use the same chart, namely kserve-inference, since kserve works with InferenceService resources
[10:29:15] <_joe_> ohhhh
[10:29:27] <_joe_> ok so like shellbox, but with CRDs instead of PHP
[10:29:48] Yeah, I think one of the fundamental differences between ML stuff and what has been there is the level of granularity/abstraction where templating/refactoring is most useful
[10:29:50] yes exactly, this is the current thought, but it may change in the future
[10:29:55] <_joe_> then yes, you don't need scaffolding
[10:30:54] now that we are discussing it out loud, some new use cases will be presented by Chris today, for sure :D
[10:31:27] Of course.
[10:31:34] Murphy's eye sees all.
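One way to read the "helmfile.d as both the role definition and the hiera hierarchy" part of the puppet analogy is as layered value lookup: chart defaults at the bottom, progressively more specific layers on top. The sketch below assumes Helm-like semantics for combining several values files (the last file wins, nested maps merge key by key, lists and scalars are replaced wholesale); the layer names and keys are made up for illustration.

```python
"""Hypothetical sketch of hiera-style layering applied to chart values.

Assumption: layers are applied from most generic to most specific, later
layers override earlier ones, nested dicts merge recursively, and scalars
and lists are replaced (similar to how Helm combines multiple -f files).
"""
import copy
from typing import Any, Dict, List


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override's keys win and nested dicts merge."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Scalars and lists replace whatever the lower layer had.
            result[key] = copy.deepcopy(value)
    return result


def resolve(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold layers in order: chart defaults first, most specific last."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged = deep_merge(merged, layer)
    return merged


# Roughly: the chart's defaults play the "profile" role...
chart_defaults = {"resources": {"cpu": "1", "memory": "1Gi"}, "replicas": 1}
# ...and helmfile.d layers play the "hiera hierarchy" role.
cluster_values = {"replicas": 2}
service_values = {"resources": {"memory": "4Gi"}, "model": "example-model"}

values = resolve([chart_defaults, cluster_values, service_values])
# values == {"resources": {"cpu": "1", "memory": "4Gi"}, "replicas": 2, "model": "example-model"}
```

The inputs are deep-copied, so resolving one service's values never mutates the shared chart defaults.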
[10:32:52] <_joe_> I'm sure
[10:33:09] <_joe_> so, let's circle back tomorrow :D
[10:33:25] I just hope that whatever contortions we (as in: ML) put the tree through won't make it less useful for others.
[10:36:25] <_joe_> klausman: re: common templates; reality is I hate using a templating system for things that are usually done in software (sharing code, building logical gates, etc), and indeed I do think we should at some point free ourselves from helm and use a proper programming language to generate those damn kubernetes manifests
[10:37:34] So you're saying we make our own manifests, with templates and logic! oh.
[10:38:40] <_joe_> that part is already "solved" by various projects, I am in particular relatively fond of cdk8s from amazon
[10:39:45] Hmmm. that looks nice.
[10:39:50] <_joe_> https://github.com/lavagetto/cdk8s-stub/blob/master/blubberoid/main.py was generating a similar output to the blubberoid chart
[10:40:08] I mean, anything to not have to write 500kB YAML files, amirite?
[10:40:50] <_joe_> klausman: yes, and the alternative shouldn't be to have a 3MB go text/template of YAML files,
[10:41:30] <_joe_> also writing in an actual programming language allows you to do things like https://github.com/lavagetto/cdk8s-stub/blob/master/blubberoid/wmf_cdks/deployment.py#L44
[10:41:55] <_joe_> (as you can see, the code is a stub)
[10:42:15] Neat. Though I'd probably want to have type annotations :)
[10:42:17] <_joe_> but composition, inheritance, versioning are all "solved" problems
[10:42:24] Oh wait, nvm
[10:43:17] I mean, I'd say 90% of my career has been not solving entirely new problems, but re-applying (partial) existing solutions to new problem domains.
[10:45:06] <_joe_> indeed
[10:46:04] <_joe_> it's somewhat reassuring knowing google had to change course multiple times on this problem too
[10:46:15] <_joe_> tbh when we started, helm looked like "the right thing"
[10:47:25] πάντα ῥεῖ, things change.
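To illustrate the point that composition and inheritance come for free once manifests are generated by a real programming language, here is a dependency-free Python sketch. It does not use the cdk8s API and is not the code in the linked cdk8s-stub repository; the `Deployment`/`DeploymentWithSidecar` classes, the sidecar naming and the image names are hypothetical.

```python
"""Hypothetical sketch: Kubernetes manifests built from plain Python classes.

Not cdk8s: no cdk8s API is used. It only shows how a subclass can reuse a
base manifest and change one piece of it, which is awkward in text templates.
"""
import json
from typing import Any, Dict, List


class Deployment:
    """Base building block, roughly what a shared template would provide."""

    def __init__(self, name: str, image: str, replicas: int = 1) -> None:
        self.name = name
        self.image = image
        self.replicas = replicas

    def labels(self) -> Dict[str, str]:
        return {"app": self.name}

    def containers(self) -> List[Dict[str, Any]]:
        return [{"name": self.name, "image": self.image}]

    def to_manifest(self) -> Dict[str, Any]:
        labels = self.labels()
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": self.name, "labels": labels},
            "spec": {
                "replicas": self.replicas,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": self.containers()},
                },
            },
        }


class DeploymentWithSidecar(Deployment):
    """Inheritance: reuse the whole base manifest, append one extra container."""

    def __init__(self, name: str, image: str, sidecar_image: str, replicas: int = 1) -> None:
        super().__init__(name, image, replicas)
        self.sidecar_image = sidecar_image

    def containers(self) -> List[Dict[str, Any]]:
        sidecar = {"name": f"{self.name}-tls-proxy", "image": self.sidecar_image}
        return super().containers() + [sidecar]


if __name__ == "__main__":
    d = DeploymentWithSidecar("blubberoid", "example/blubberoid:latest", "example/proxy:latest", replicas=2)
    print(json.dumps(d.to_manifest(), indent=2))
```

Since kubectl accepts JSON as well as YAML manifests, the printed output could be piped to `kubectl apply -f -` for a quick test.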
[10:50:53] <_joe_> in particular if you never used a tool, or solved a class of problems before
[10:51:40] <_joe_> (I appreciate the accuracy of the ancient greek typing, btw)
[10:56:41] (you're welcome, though I do admit to some minor c&p there)
[10:59:12] <_joe_> I don't think I've seen the rough breathing on a rho since high school :)
[10:59:41] <_joe_> https://en.wikipedia.org/wiki/Rough_breathing took some time to get to the english term
[11:00:25] incidentally, the rho was what I needed to c&p as I don't have a compose sequence for that
[11:01:13] <_joe_> so I gather you're another unlucky soul who had the study of ancient greek imposed on them in school by an oppressive education regime
[11:01:16] (the Iota still needed like five tries)
[11:01:38] I have not had any formal education in languages other than German, English and French :)
[11:02:00] My knowledge of Latin and (Ancient) Greek is entirely osmotic/due to personal interest
[11:02:24] <_joe_> my formal education in languages was only Latin and ancient greek (plus Italian)
[11:03:02] (and me being overly curious about the differences between words like θύρα and πόρτα)
[11:04:07] Also: Topic drift, on my IRC? Unthinkable!
[11:04:12] <_joe_> ahahah
[11:04:14] <_joe_> yeah
[11:04:33] <_joe_> I'm also a specialist
[13:05:08] seeing greek chars in this channel is something I did not expect....
[13:18:32] <_joe_> akosiaris: greek chars we get all the time, when you or effie forget to switch keyboard layouts
[13:32:36] serviceops, MW-on-K8s, SRE, SRE Observability, Patch-For-Review: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (Joe) I have a (supposedly) working set of normalizing rules to interpret and structure both the php-fpm error log (mostly uninteresting) and the...
[17:01:10] serviceops, Release-Engineering-Team, SRE: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn)
[17:01:32] serviceops, Release-Engineering-Team, SRE: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn)
[17:04:02] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (hashar) contint2001.wikimedia.org is indeed the primary for CI (Jenkins and Zuul). We could switch over to the other host but the runboo...
[17:21:51] serviceops, Release-Engineering-Team: contint2001 hardware refresh - https://phabricator.wikimedia.org/T294276 (Dzahn)
[17:22:14] serviceops, Release-Engineering-Team: contint2001 hardware refresh? - https://phabricator.wikimedia.org/T294276 (Dzahn)
[17:25:39] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) < mutante> then let's just tell @Papaul what time is ok, basically < mutante> or a time where all can be around with him in DC <+...
[18:33:55] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-11), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (NRodriguez) Hello there @Legoktm we just resolved https://phabricator.wikimedia.org/T290731#745546...
[20:01:43] serviceops, SRE, wikidiff2, Community-Tech (CommTech-Sprint-12), Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (ldelench_wmf)
[20:13:17] serviceops: Migrate WMF Production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Jdforrester-WMF)
[20:21:38] serviceops, SRE: Package php 7.4 for wikimedia production - https://phabricator.wikimedia.org/T293449 (Legoktm) We needed `ast` and `imagick` for CI, so I've uploaded `php7.4-` versions of those too.
[20:57:02] serviceops, MW-on-K8s, SRE, SRE Observability, Patch-For-Review: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (colewhite) >>! In T288851#7454801, @Joe wrote: > * What topic should I use on kafka? We talked offline a bit. Although I could not find it in t...
[21:12:29] serviceops, Continuous-Integration-Infrastructure, Release-Engineering-Team, SRE: schedule downtime for contint2001 - https://phabricator.wikimedia.org/T294271 (Dzahn) also see T256422 - switch contint prod server back from contint2001 to contint1001
[21:18:16] serviceops, MW-on-K8s, SRE, SRE Observability, Patch-For-Review: Make logging work for mediawiki in k8s - https://phabricator.wikimedia.org/T288851 (Ottomata) As we make these decisions, I'd love if we could keep {T291645} in mind. > What topic should I use on kafka? I support a separate t...
[23:21:51] serviceops, MW-on-K8s, SRE, MW-1.37-notes (1.37.0-wmf.20; 2021-08-23), and 2 others: Make HTTP calls work within mediawiki on kubernetes - https://phabricator.wikimedia.org/T288848 (Legoktm) MultiHttpClient is more complicated than the previous part since it's in includes/libs/ and isn't supposed...