[16:21:58] what happened to cause this jump at 22:40 on Monday? https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red?orgId=1&refresh=1m&from=now-2d&to=now&viewPanel=39
[19:17:48] having an odd issue in the k8s jobrunners, our job that talks to https://cloudelastic.wikimedia.org:9243, via envoy on localhost:6105, is failing to connect with a TLS error. Using mw-debug-repl can reproduce. Curiously curl in php can talk to cloudelastic, but envoy is failing.
[19:47:25] hnowlan: any chance you know how to move the jobs in ^^ back to the normal job runners? This bug combines with another bug and ends up increasing cirrusSearchLinksUpdate jobs from ~400/s to 800-1k/s
[20:19:28] ebernhardson: a random theory is that cloudelastic uses acme-chief/LE certs but the envoy config is validating things against the wmf internal ca bundle
[20:23:13] taavi if the hosts are using Puppet 7, that means they use acmechief, right?
[20:23:43] taavi: yea seems possible, i haven't yet dug into how that envoy container works but it seems like it's the next step
[20:23:50] inflatador: this is k8s so no puppet
[20:24:54] ebernhardson let me verify, but I think the cloudelastic hosts are on puppet 7
[20:25:16] inflatador: oh, i was thinking the other side. Indeed cloudelastic hosts are puppet, the envoy side is k8s
[20:26:38] ebernhardson I was wrong, looks like the hosts are on puppet 5.5. hmm, I remember asking for Puppet 7 when I reimaged these guys
[20:28:14] how does the puppet version affect anything here?
[20:29:00] dwisehaupt: hi, you have a patch waiting
[20:29:17] let me know once it's merged in puppetmaster
[20:29:55] taavi one of the hiera values you have to set to use puppet7 is `acmechief_host: ${HOST}`, so I was thinking acmechief certs were somehow involved. LMK if not
[20:39:06] Amir1: sorry, i think that's the one that jeff +2'd then reverted since no one was around at the time
[20:39:18] if you are able to do it now, i could +2 it again.
[20:39:31] inflatador: no, it's just that puppet 7 talks to acme-chief in a slightly different way, so if a host on puppet7 wants to use acme-chief it needs to talk to a different host than if a puppet5 host wants to do that
[20:39:34] the revert didn't go in
[20:39:39] dwisehaupt: I merge it
[20:39:49] ok. cool. thanks.
[20:39:58] not sure how to clean that up if it needs cleaning.
[20:41:50] taavi ACK, thanks for clearing that up
[20:58:43] Yeah.. it's just a matter of trusting the right CA
[20:59:02] (Puppet 5 and puppet 7 don't use the same CA)
[22:13:33] Does envoy log more verbosely to logstash or something? I'm looking in `/var/log/envoy` on my host and not seeing anything useful
[22:58:44] "Warning: The directory '/srv/prometheus' contains 18196 entries, which exceeds the default soft limit 1000 and may cause excessive resource consumption and degraded performance."
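
Note on the CA theory from 20:19:28 and 20:58:43: one way to test whether the endpoint's certificate chains to a given CA bundle is to attempt a verified TLS handshake once with the system trust store and once with only that bundle. The following is a minimal sketch using Python's standard `ssl` module, not the check anyone in the channel actually ran. `build_context` and `probe` are hypothetical helper names, and the bundle path in the usage comment is a placeholder, since the log does not say where the envoy container keeps its CA bundle.

```python
import socket
import ssl
from typing import Optional, Tuple


def build_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying client context.

    cafile=None trusts the system default store; otherwise only the CAs
    in the given PEM bundle are trusted.
    """
    return ssl.create_default_context(cafile=cafile)


def probe(host: str, port: int, cafile: Optional[str] = None,
          timeout: float = 5.0) -> Tuple[str, str]:
    """Attempt a verified TLS handshake and report (status, detail)."""
    ctx = build_context(cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                # The issuer is a tuple of RDNs, each holding (name, value) pairs.
                issuer = dict(rdn[0] for rdn in tls.getpeercert()["issuer"])
                return "ok", issuer.get("organizationName", "unknown issuer")
    except ssl.SSLCertVerificationError as exc:
        # The chain did not validate against the trusted CAs (or hostname mismatch).
        return "verify_failed", exc.verify_message
    except ssl.SSLError as exc:
        return "tls_error", str(exc)
    except OSError as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return "unreachable", str(exc)


# Usage (placeholder bundle path, assumed for illustration only):
#   probe("cloudelastic.wikimedia.org", 9243)
#   probe("cloudelastic.wikimedia.org", 9243, cafile="/path/to/internal-ca-bundle.pem")
```

If the first call succeeds and the second reports `verify_failed`, that would be consistent with the theory that the client side trusts only the internal CA while the service presents a publicly issued certificate. It would not by itself show how envoy is configured.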