[10:12:43] <wikibugs>	 serviceops, Kubernetes: Show less diff context by default on helm apply - https://phabricator.wikimedia.org/T326205 (fgiunchedi)
[10:40:12] <wikibugs>	 serviceops, Kubernetes: Show less diff context by default on helm apply - https://phabricator.wikimedia.org/T326205 (Clement_Goubert) You can add `--args '--context n'` to any command that uses helm-diff (basically any command that can be run with `-i`, and `diff`)  I'll try and find if there's a way to...
[11:03:16] <wikibugs>	 serviceops, Kubernetes: Show less diff context by default on helm apply - https://phabricator.wikimedia.org/T326205 (Clement_Goubert) p:Triage→Low Ok, got it, it can be added to the helmfile in `helmDefaults['args']` I'll make a CR for it
[13:20:00] <wikibugs>	 serviceops, MediaWiki-Shell, SRE: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603 (LSobanski)
[13:52:55] <wikibugs>	 serviceops, MediaWiki-Shell, SRE: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603 (Joe) Open→Invalid Since then we've moved to using remote shellbox in production, so I'm not strictly interested anymore in any solution compatible with cgr...
[14:00:33] <wikibugs>	 serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (JMeybohm) >>! In T324576#8464284, @Ottomata wrote: > **Ingress**: I don't think we //need// an...
[14:12:59] <wikibugs>	 serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (TheDJ) Maybe we should add imposm to release monitoring ?  https://phabricator.wikimedia.org/diffusion/LLIC/browse/master/monitoring.json
[14:14:10] <wikibugs>	 serviceops, Content-Transform-Team-WIP, Maps: OSM import fails on both eqiad/codfw because of wrong data input - https://phabricator.wikimedia.org/T325293 (Jgiannelos) It looks like OSM syncing is catching up with all the old diffs on eqiad after bumping imposm version.
[14:34:36] <ottomata>	 jayme: ty for reviews and comments so far
[14:34:47] <ottomata>	 trying to understand how the  egress serviceproxy bit works here.
[14:35:01] <ottomata>	 the flink app isn't really a 'service' so it isn't going to have a 'public_port' (usually)
[14:35:24] <jayme>	 ottomata: you can ignore that side of the thing
[14:35:52] <jayme>	 the service-proxy has two jobs currently
[14:36:12] <jayme>	 one is tls-termination for its backend service (this is the part you can ignore)
[14:36:53] <jayme>	 the other is proxying, tls-termination etc. for connections to other services (from your/the backend service)
[14:36:54] <ottomata>	 hm okay, so I just remove  the public_port part in the chart values.yaml?
[14:36:59] <ottomata>	 right.
[14:37:40] <ottomata>	 or do I just make up a dummy port?
[14:38:04] <jayme>	 I think the module might require a value here so you'd have to go with a dummy I think.
[14:38:20] <jayme>	 but tbh. I'm not sure because this is a first :)
[14:38:39] <jayme>	 (ab)using the service-proxy for just the service-proxying part
[14:38:44] <jayme>	 ;)
[14:38:53] <ottomata>	 :)
[14:38:54] <ottomata>	 oiay
[14:38:56] <ottomata>	 okay
[14:45:32] <ottomata>	 jayme:  it looks like kafka clusters are not connected to via  service proxy, right?
[14:47:42] <ottomata>	 what is tcp_proxy.listeners?  I guess just manually defined proxy endpoints for the service proxy? vs discovery.listeners, whose values come from config management?
[14:54:04] <jayme>	 ottomata: yeah I think the connection to kafka brokers is a direct one
[14:55:00] <jayme>	 tcp_proxy probably came in for psql... effie?
[14:58:13] <jayme>	 for kafka direct connections are probably still required as we don't want load-balancing there I suppose :) but other stuff (like calling mw-api) should go via the proxy
[14:58:36] <ottomata>	 right, okay
[14:59:00] <effie>	 mmmmm I am not on my computer so I will have to come back to you on that, but  I take full responsibility whatever it ia
[14:59:02] <effie>	 is
[14:59:05] <ottomata>	 next q: the 'more complex' setup idea is nice for us here because then we don't need to deploy discovery endpoints for each flink-app?  
[14:59:29] <effie>	 unless it involves a dead body, you'd be on your own there
[14:59:52] <jayme>	 effie: too late...responsibility already taken :-)
[15:00:22] <effie>	 but there is a bullet wound and I am holding an axe, it is impossible
[15:01:25] <jayme>	 ottomata: compared to our usual ingress setup, yes
[15:03:19] <jayme>	 plus less TLS certificates (there might even be work needed to get that right with the Kubernetes native Ingress objects, I'm not sure)
[15:04:37] <ottomata>	 okay, i think i can maybe test this fancy ingress stuff locally...but i think I can't test the service proxy mesh container locally?
[15:04:39] <ottomata>	 or can I?
[15:04:45] <ottomata>	 i think it needs extra config from prod?
[15:06:13] <ottomata>	 actually, you know, maybe we can punt on the fancy ingress for the jobmanager for now?  it isn't a requirement.  as long as I can access the job manager port via an ssh tunnel, that's good enough for now
[15:09:33] <wikibugs>	 serviceops, Data-Engineering, Discovery-Search (Current work), Event-Platform Value Stream (Sprint 05), Patch-For-Review: Flink on Kubernetes Helm charts - https://phabricator.wikimedia.org/T324576 (Ottomata) Ingress: Okay, let's put off working on ingress for the jobmanager UI port until lat...
[15:10:05] <ottomata>	 jayme: okay, cool.  added mesh.deployment.container to https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/866510/.  Couple of outstanding comments there still.
[15:10:47] <ottomata>	 also responded to your comments on operator at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/865158/
[16:10:54] <_joe_>	 I strongly suggest NOT to use envoy's TCP proxies
[16:11:17] <_joe_>	 my experience with them is that they're ok for light loads, but horrible for anything else
[16:13:17] <ottomata>	 k, don't need them, was just wondering what they were for
[16:13:21] <_joe_>	 as for flink: yes if they're not exposing any port then you only need the service proxy, and we might need to tweak the module
[16:13:42] <_joe_>	 so that it's possible not to add the service and local TLS listener
[16:13:45] <ottomata>	 _joe_:  should I modify the copy of the module vendor file in my chart?
[16:13:51] <ottomata>	 or should I make a patch to make a new version of the module?
[16:13:52] <_joe_>	 ottomata: no!
[16:14:06] <_joe_>	 make a patch for that module 1.0.1 :)
[16:14:08] <ottomata>	 the dummy port is probably fine too
[16:14:13] <ottomata>	 okay if you would prefer that i can do it!
[16:14:30] <_joe_>	 <3
[16:14:47] <_joe_>	 I mean if you don't feel like it, just use the dummy port for now and wait for me to get around to implementing it
[16:14:53] <_joe_>	 it should work too
[16:16:21] <ottomata>	 i'm waiting for more reviews rn anyway...
[16:28:14] <jayme>	 ottomata: sorry, had to make a call. Yeah we can totally kick the ingress stuff down the road as people can do port-forward to the UI
[16:28:54] <ottomata>	 gr8
[16:30:08] <jayme>	 well..that's people with deployment access. But I think for now that's still okay
[16:31:48] <ottomata>	 yeah that's totally fine, it would only be those folks
[16:31:58] <ottomata>	 admins of the flink app
[16:32:20] <ottomata>	 _joe_: should I also conditionally include the tls-proxy-certs?  if no public_port i guess no certs too?
[16:32:38] <_joe_>	 ottomata: correct
[16:32:45] <ottomata>	 best practice for doing that?  I'd guess i'd wrap all the usages of that in
[16:32:47] <ottomata>	 {{- if or .Values.mesh.certs .Values.puppet_ca_crt }}
[16:33:19] <_joe_>	 I would rather add all the logic based on if a tls port is offered
[16:33:23] <ottomata>	 should I just repeat that everywhere, or make some define mesh.service.enabled or something?
[16:33:31] <ottomata>	 okay, so everywhere, if public_port
[16:33:38] <_joe_>	 if the port is nil or 0, something like that
[16:33:41] <ottomata>	 great
[16:33:48] <_joe_>	 if I don't have a public port, I don't have a tls terminator
[16:33:55] <_joe_>	 then I don't have certs either
[16:33:58] <ottomata>	 right cool
[16:34:07] <_joe_>	 but if I have a public port and no certs, I want the chart to fail
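The rule _joe_ describes in the lines above can be sketched in Python. This is only an illustration of the decision logic, not the mesh module's actual Helm template code; the function name `plan_mesh` and its arguments are hypothetical, standing in for the chart values `public_port` and the certs setting.

```python
from typing import Any, Dict, Optional


def plan_mesh(public_port: Optional[int], certs: Optional[Any]) -> Dict[str, bool]:
    """Decide which TLS pieces to render (hypothetical helper, not the real module).

    - public_port nil or 0: no TLS terminator, and therefore no certs either.
    - public_port set but no certs: fail, mirroring "I want the chart to fail".
    """
    if not public_port:
        # No public port offered: only the egress service-proxy part is used.
        return {"tls_terminator": False, "certs": False}
    if not certs:
        # A public port without certs is a configuration error.
        raise ValueError("public_port is set but no certs were provided")
    return {"tls_terminator": True, "certs": True}
```

Keying everything on the port (rather than on whether certs happen to be defined) keeps a single switch for the whole TLS listener, which is the approach suggested in the conversation.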
[16:38:27] <ottomata>	 _joe_:  do I copy the new tpl files to the scaffold?
[16:39:01] <_joe_>	 ottomata: it's enough for now that you just commit the module change, we'll apply to the charts in subsequent changes
[16:39:06] <ottomata>	 okay
[16:39:11] <ottomata>	 i wanted to add some fixtures or something
[16:39:16] <ottomata>	 that's just in the scaffold?
[16:40:10] <_joe_>	 uhm good idea
[16:40:27] <_joe_>	 fixtures are in all the charts but yes, updating the scaffold would be good too
[16:40:38] <_joe_>	 the best way to do it is to do as follows:
[16:40:47] <_joe_>	 * pip install sextant
[16:40:59] <_joe_>	 * sextant vendor --force _scaffold
[16:41:10] <ottomata>	 interesting!
[16:41:11] <ottomata>	 okay...
[16:41:15] <_joe_>	 (not sure if force is needed :P)
[16:42:16] <_joe_>	 yeah sorry, the documentation is wip :P
[16:42:17] <ottomata>	 _joe_:  should all  mesh tpls get a new version, even if no change?
[16:42:25] <ottomata>	 i don't have to change name_1.0.0.tpl
[16:42:33] <_joe_>	 no, just the one you changed
[16:42:34] <ottomata>	 but all the others have new version
[16:42:36] <ottomata>	 okay
[16:44:13] <_joe_>	 FTR, sextant is https://gitlab.wikimedia.org/repos/sre/sextant
[16:44:22] <bd808>	 sextant could use some docs on the pypi side -- https://pypi.org/project/sextant/
[16:44:44] <_joe_>	 bd808: yeah on both sides heh
[16:44:52] <_joe_>	 I only have so much time :/
[16:45:12] <_joe_>	 I am currently writing a mcrouter module, btw, expect a patch for toolhub to use it to land soon
[16:45:14] <_joe_>	 :)
[16:45:20] * bd808 orders a timespinner for _joe_ 
[16:45:34] <wikibugs>	 serviceops, Event-Platform Value Stream (Sprint 05): k8s deployment-charts mesh module should allow use of mesh without public_port Service - https://phabricator.wikimedia.org/T326252 (Ottomata)
[16:45:38] <bd808>	 *time turner
[16:45:51] <_joe_>	 it's a HP thing isn't it?
[16:46:33] <_joe_>	 I watched them all dubbed in italian; by the time Alice was able to understand the movies in english I could get away with not rewatching them
[16:46:41] <bd808>	 yeah and also jk rawling is trash.
[16:46:56] <ottomata>	 _joe_:  do I replace the existing modules in modules.json with new versions, or do I add new module entries
[16:47:08] <ottomata>	 e.g. do I add a new  {
[16:47:08] <ottomata>	             "name": "deployment",
[16:47:08] <ottomata>	             "version": "1.1",
[16:47:08] <ottomata>	 ...
[16:47:09] <ottomata>	 }
[16:47:21] <ottomata>	 or just bump the version in the existing one?
[16:47:29] <_joe_>	 ottomata: you should not bump a minor, you're only adding a new switch
[16:47:43] <_joe_>	 but if you do, just add it
[16:47:45] <ottomata>	 minor is for new feature, no?  patch is for bugfixes etc?
[16:47:58] <_joe_>	 right yes, so this is indeed more of a feature
[16:48:37] <_joe_>	 and it could be disruptive, potentially
[16:48:45] <ottomata>	 k i've added, but sextant didn't seem to find 1.1.0
[16:48:48] <ottomata>	 well it's backwards compatible :)
[16:48:56] <ottomata>	 as long as public_port is defined everywhere, which ...it should be?
[16:49:06] <_joe_>	 who knows!
[16:49:17] <_joe_>	 so, you should also update package.json in the chart
[16:49:26] <_joe_>	 it still points to 1.0 I guess?
[16:49:30] <ottomata>	 oh package.json, right
[16:49:30] <ottomata>	 ka
[16:50:25] <_joe_>	 in theory, you should be able to do $ sextant update _scaffold mesh.service
[16:50:31] <_joe_>	 but... try it :P
[16:51:24] <_joe_>	 I have a working implementation on my computer, but it needs a couple finishing touches
[16:51:57] <ottomata>	 hm, _joe_  i think i need to make a new name_1.1.0 after all, just to fix dependency issues
[16:52:03] <_joe_>	 basically it would go around a charts tree and find all charts using that module, and update the package.json and then vendor dependencies
[16:52:04] <ottomata>	 Module mesh.configuration:1.0.0 (required by mesh.name:1.0.0) is incompatible with module mesh.configuration:1.1.0
[16:52:13] <_joe_>	 ah uh, yes
[16:52:19] <ottomata>	 k
[16:52:48] <_joe_>	 this isn't great for code review, heh
[16:53:09] <ottomata>	 _joe_:  reminds me of our versioned event schema repos :p
[16:53:32] <_joe_>	 yeah whatever way you go, it's gonna suck
[16:53:34] <_joe_>	 but you know
[16:53:39] <ottomata>	 except instead of Jsonschema $ref pointers we've got tpl defines
[16:53:41] <_joe_>	 we can just download the patch and use diff
[16:54:08] <ottomata>	 hehe, at least with our eventschemas we only have one file  (current.yaml) to edit, and the rest is 'generated' from that?  :p
[16:54:10] <ottomata>	 but yeah.
[16:54:41] <_joe_>	 I'm open to ideas to improve upon this btw
[16:54:59] <_joe_>	 this is all an attempt to fit a square in a round peg
[16:55:08] <_joe_>	 use a templating language like it was software libraries
[16:55:13] <ottomata>	 which is the hole and which is the peg ? :p
[16:55:31] <_joe_>	 and the square is fighting back :P
[16:56:45] <ottomata>	 ooo, we got a circular dependency tho
[16:56:52] <ottomata>	 Module mesh.name:1.0.0 (required by base.meta:1.0.0) is incompatible with module mesh.name:1.1.0
[16:56:58] <ottomata>	 any way we can avoid using mesh from base?
[16:57:47] <_joe_>	 uhm wait
[16:58:26] <_joe_>	 why are you updating mesh.name though?
[16:59:05] <_joe_>	 and yes, probably we can move some functions to avoid that dependency
[17:00:10] <wikibugs>	 serviceops, DC-Ops, ops-eqiad, Patch-For-Review: hw troubleshooting:  CPU1 machine check error on parse1002.eqiad.wmnet - https://phabricator.wikimedia.org/T326119 (Clement_Goubert) >>! In T326119#8499305, @gerritbot wrote: > Change 875360 **merged** by Clément Goubert: > %%%[operations/puppet@pr...
[17:00:14] <_joe_>	 also, sorry, I really GTG
[17:00:24] <ottomata>	 if i don't update mesh.name
[17:00:25] <ottomata>	 Module mesh.configuration:1.0.0 (required by mesh.name:1.0.0) is incompatible with module mesh.configuration:1.1.0
[17:00:35] <ottomata>	 okay no worries!
[17:03:11] <_joe_>	 if you have something, I can take a look
[17:03:27] <ottomata>	 k will push with name 1.1.0
[17:03:29] <_joe_>	 tomorrow morning I mean
[17:03:32] <ottomata>	 yaya
[17:03:34] <ottomata>	 ty
[17:04:31] <wikibugs>	 serviceops, Data-Engineering-Radar, MW-on-K8s, Patch-For-Review: IPInfo MediaWiki extension depends on presence of maxmind db in the container/host - https://phabricator.wikimedia.org/T288375 (Clement_Goubert) PSP needs to be updated before we can deploy.
[21:15:41] <wikibugs>	 serviceops, SRE, Thumbor, Thumbor Migration, User-jijiki: Upgrade Thumbor to Bullseye - https://phabricator.wikimedia.org/T216815 (VirginiaPoundstone)
[21:59:46] <wikibugs>	 serviceops, GitLab, serviceops-collab, Kubernetes: Trusted gitlab runner containers need access to staging k8s cluster - https://phabricator.wikimedia.org/T325385 (dancy) I verified today that trusted runners can now complete a network connection to kubestagemaster.svc.eqiad.wmnet:6443 so that pa...