[09:02:00] Morning :)
[09:05:52] * Emperor grudgingly concedes that it is morning :)
[09:21:38] <_joe_> hnowlan: I was thinking we could convert thumbor to use modules now, given it's still not in production
[09:21:39] <_joe_> :)
[09:41:45] hello folks
[09:42:03] hey elukey :) what's up?
[09:43:05] I am checking https://github.com/kubernetes/kubernetes/tree/master/build/pause for the pause container's image, the idea is to have something in production-images to build on our own. The kubernetes repo is gigantic, so pulling it down only to build pause.c seems.. not great. Plus IIUC docker is used in the Makefile to build it, so my proposal would be to just pull pause.c and compile it in
[09:43:11] our build image.
[09:43:14] does it make sense?
[09:43:19] (and then the final image will run the binary)
[09:44:03] basically build https://github.com/kubernetes/kubernetes/blob/master/build/pause/linux/pause.c with gcc using the Makefile's flags etc..
[09:44:20] super light and it should work nicely
[09:48:25] Err yeah, that seems simpler than what they're doing, which is using a crossbuild image (and why they're using docker in the Makefile)
[09:48:34] s/and/hence/
[09:48:52] it seems a little overkill for our use case
[09:48:58] Agreed
[09:49:18] ack will try to work on a patch today :)
[09:49:21] I'll let the others chime in, but what you're proposing makes sense to me
[09:55:06] claime: Janis declared yesterday that I am not anymore in his attention set so I can write anything :D
[09:55:35] lol
[09:55:46] ejected from attention set
[09:56:57] :-p
[09:57:04] makes sense to me as well
[09:58:51] <_joe_> elukey: I assumed that was the way to go
[10:03:14] 4/win 11
[10:45:03] _joe_: by modules what do you mean?
[10:45:40] <_joe_> hnowlan: moving from the old common_templates to the new structure we created under modules
[10:45:55] <_joe_> I'd show you if we had gerrit :P
[10:48:37] is there a way to temporarily disable systemd targets in a way that puppet does not re-enable them? I tried masking, but that got fixed by puppet as well
[10:49:15] That's a good question.
[10:49:29] Mask, chattr +i the masking file ? :´)
[10:49:33] (don't)
[10:52:38] I don't really see a way to do it and not have the puppet run fail
[10:52:54] _joe_: ohhhh right, sorry I thought you meant reorganising the thumbor code somehow and panicked
[10:53:07] could be fun yeah
[10:53:07] <_joe_> ahah
[10:53:30] <_joe_> jayme: you want to disable puppet
[10:53:36] <_joe_> puppet is all or nothing, really
[10:54:27] yeah, I'm with _jo.e_ on this, best way is to disable puppet, because anything else will make it fail
[10:57:57] hmm...
[10:58:02] okaay
[11:17:30] <_joe_> jelto, arnoldokoth , sobanski we have a gerrit situation ongoing, your input is requested in #-operations
[11:18:16] Looking
[12:32:09] volans: if you're feeling adventurous, this is the error we were seeing :) https://phabricator.wikimedia.org/T323114 however it's mostly a question of debugging how thumbor's inheritance patterns work and tornado's own execution model so it's not a very worthwhile puzzle
[12:32:32] odds are there's a missing encode() or some automatic decoding of something urlencoded
[12:33:15] latin-1?
[12:34:05] I'm not sure I want to know why tornado is trying to encode in latin1 and is hardcoded
[12:36:23] :D
[12:36:29] from one issue: "any bytes can be considered latin1 so we can smuggle bytes through latin1"
[12:37:25] but I am fairly sure that we're triggering this as a result of some new behaviour rather than it being (specifically) a problem with tornado itself
[12:37:47] is this happening with all URLs with non-ascii chars?
[12:38:28] or specific to some subset of chars?
[12:39:53] that seems the url, if thumbor passes it starting from 'thumb/0/01/NLC403-3120....'
[12:40:13] most URLs with non-ascii afaics
[12:40:18] it clearly works in utf8
[12:40:24] I've seen it with chinese and cyrillic characters so far
[12:41:34] * volans checking RFCs
[12:42:19] I think you can %url-encode them: https://www.rfc-editor.org/rfc/rfc2616#section-3.2.3
[12:43:10] the thing that is most puzzling here is that this URL works fine with the python2-based thumbor
[12:43:43] given encoding and string/bytearray changes in python 3 it seems most likely this is an accidental new behaviour rather than a bug
[12:44:20] that URL can't be encoded in latin1 in py2 either (different error):
[12:44:23] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 37: ordinal not in range(128)
[12:44:46] but sure, the diffs for encoding in py2/3 are huge so it's very possible that the behaviour changed
[12:45:00] based on how the conversion both in thumbor and tornado has been done
[12:45:36] maybe before tornado was expecting bytes and not str?
[12:46:24] yeah :/ I really feel like it's gonna be a one-line fix once we find the line
[12:46:45] but getting a repro test case is also proving a bit annoying, working with S&F atm to try to get something offline
[12:47:05] the tornado trace isn't very useful because it doesn't even show *where* it's coming from :D
[12:48:13] you should probably log/print/inspect the headers inside _write_results_to_client
[12:48:24] before they are passed to tornado
[12:48:41] that would be my first guess with zero-context on the code
[15:09:59] <_joe_> I'm still looking at lines.extend(line.encode("latin1") for line in header_lines) with a lingering sense of doom
[15:10:09] <_joe_> do we REALLY want to use tornado?
[15:19:43] _joe_ can you add more info about the FROM ARCH?
[15:20:08] I don't get what I should change (also the original docker file states FROM BASE)
[15:20:16] <_joe_> look at the dockerfile in the k8s repo for pause
[15:20:53] yes but we don't cross compile, ARCH seemed pointless
[15:21:17] or you mean that I should use their base image?
[15:22:24] <_joe_> I mean use FROM scratch
[15:22:32] <_joe_> or one of the base images
[15:24:47] ok so FROM {{ "bullseye" | image_tag }} ?
[15:24:54] for example? (to understand)
[15:25:32] I thought that seed was ok, but it is my misunderstanding, I keep being confused about their differences
[15:40:23] <_joe_> elukey: I'll try to take a look once I'm done with what I'm busy with rn
[15:43:26] FROM {{ "bullseye" | image_tag }} seems the usual pattern for a build then run Dockerfile here, but idk the diff with FROM seed
[15:46:24] <_joe_> AFAICT the google way is to build a bare image with just the executable
[15:46:38] <_joe_> anyways
[15:46:56] Then FROM scratch makes sense
[15:47:07] <_joe_> I'll take a better look in a few
[15:59:11] I am not really sure what FROM scratch is in the production-images context
[16:01:26] <_joe_> same as for any docker daemon
[16:01:33] <_joe_> it's an empty container
[16:03:40] ok so I can use it directly in there okk
[16:03:47] never really used it
[16:07:22] <_joe_> elukey: let me do one test first
[16:08:06] _joe_ I am going to test the pause image with docker as well (sharing the namespaces with an echo server etc..)
[16:08:25] <_joe_> elukey: I just built using FROM scratch and it works
[16:08:39] <_joe_> if you prefer you can start from bullseye, but what's the point?
[16:09:47] yep yep I didn't know about scratch and how to use it, fine for me
[16:09:56] just tested it again, works fine afaics
[16:10:35] but I added some generic questions in the code review's log, I am not sure how this new version will play with the "latest" one that we currently have
[16:13:12] serviceops, DC-Ops, SRE, ops-eqiad: Decommission mw13[07-48] - https://phabricator.wikimedia.org/T306162 (ayounsi) I was wondering if there was any timeline for this, to unblock {T308339} Thanks!
[16:19:12] <_joe_> we push :latest with docker-pkg
[16:19:21] <_joe_> so you need first to create another tag of the old image
[16:49:56] serviceops, dev-images, Release-Engineering-Team (Seen): Sync node versions between docker dev and slim images - https://phabricator.wikimedia.org/T265554 (jijiki) Open→Resolved a:jijiki Bluntly closing this, please reopen if needed
[16:58:07] _joe_ ahh okok I didn't know the bit of pushing a tag for the old image, can I do it via docker-registryctl ?
[16:58:46] <_joe_> docker pull docker-registry.discovery.wmnet/pause:latest
[16:59:00] <_joe_> docker tag docker-registry.discovery.wmnet/pause:latest docker-registry.discovery.wmnet/pause:old
[16:59:13] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Create mw-web helmfile deployment - https://phabricator.wikimedia.org/T321900 (Clement_Goubert) In progress→Resolved
[16:59:15] <_joe_> docker push docker-registry.discovery.wmnet/pause:old
[16:59:22] <_joe_> on the build server
[16:59:25] <_joe_> should do the trick
[16:59:25] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[16:59:35] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Create mw-jobrunner helmfile deployment - https://phabricator.wikimedia.org/T321897 (Clement_Goubert) In progress→Resolved
[16:59:47] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[17:00:07] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[17:00:17] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Create mw-api-ext helmfile deployment - https://phabricator.wikimedia.org/T321896 (Clement_Goubert) In progress→Resolved
[17:00:27] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Create mw-api-int helmfile deployment - https://phabricator.wikimedia.org/T321895 (Clement_Goubert) In progress→Resolved
[17:00:40] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[17:00:56] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Stop spamming SAL with helmfile on scap deployments - https://phabricator.wikimedia.org/T323296 (Clement_Goubert)
[17:01:45] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Stop spamming SAL with helmfile on scap deployments - https://phabricator.wikimedia.org/T323296 (Clement_Goubert) Open→In progress p:Triage→Medium
[17:01:57] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Deploy mediawiki kubernetes services - https://phabricator.wikimedia.org/T321786 (Clement_Goubert)
[17:05:00] thanks, noted all in the task
[17:16:36] serviceops, SRE, Wikidata, wdwb-tech: Hourly read spikes against s8 resulting in occasional user-visible latency & error spikes - https://phabricator.wikimedia.org/T264821 (jijiki) Open→Resolved a:jijiki Bluntly closing
[17:50:40] I'm off, ttyt
[18:27:25] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Stop spamming SAL with helmfile on scap deployments - https://phabricator.wikimedia.org/T323296 (JMeybohm) helmfile_log_sal has support for that already: ` # Allow to explicitely suppress logging to SAL SUPPRESS_SAL=${SUPPRESS_SAL:-false} `
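
Note on the latin-1 error discussed between 12:32 and 15:10: the following is a minimal Python sketch of why header lines containing Chinese or Cyrillic characters fail the latin1 encoding step quoted from tornado, and of the percent-encoding workaround suggested at 12:42. It is not the actual thumbor or tornado fix. The header name `X-Example-Path`, the sample filename and both helper functions are hypothetical and only serve the illustration; the real path in the task is elided in the log.

```python
from urllib.parse import quote, unquote


def encode_header_lines(header_lines):
    # Same encoding step as the tornado line quoted in the log:
    # every header line is encoded as latin1 before being written out.
    return [line.encode("latin1") for line in header_lines]


def percent_encode_value(value):
    # Hypothetical helper: percent-encode the UTF-8 bytes of the value so
    # the result is plain ASCII, keeping "/" readable as a path separator.
    return quote(value, safe="/")


# Hypothetical value with characters outside latin1.
raw_value = "thumb/0/01/例子.jpg"

try:
    encode_header_lines(["X-Example-Path: " + raw_value])
    latin1_ok = True
except UnicodeEncodeError:
    # Characters above U+00FF have no latin1 representation.
    latin1_ok = False

safe_value = percent_encode_value(raw_value)
encoded = encode_header_lines(["X-Example-Path: " + safe_value])

print("raw value encodable as latin1:", latin1_ok)
print("percent-encoded header line:", encoded[0])
```

Whether percent-encoding is the right fix depends on which header carries the raw value, which the log leaves open; inspecting the headers before they reach tornado, as suggested at 12:48, would settle that.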