[07:59:57] hello on-callers
[08:00:26] I am going to restart kafka logging clusters to pick up some thread-setting changes, already applied to kafka main yesterday; it should only improve performance
[08:00:31] if you see anything weird lemme know
[08:13:40] ack, thanks
[10:54:28] update - I am doing a rolling restart of logging-eqiad (kafka), some brokers already restarted and all is looking good
[10:55:05] since the cookbook takes a lot of time to run (to be safe, etc.), I'll take lunch and get back later, but the cookbook is in a tmux on cumin1001 if you need to check it
[14:54:27] does anyone know if there's a way to do dynamic firewall config w/ our puppet config? Like "all servers with this role applied can talk to each other"?
[15:01:57] I think there are mostly a bunch of workarounds for that
[15:02:15] such as using resources or hiera keys
[15:02:29] yeah, usually a static set of hiera values is used
[15:02:47] you could probably write something with wmflib::puppetdb_query but you will encounter consistency problems
[15:03:18] when you're adding a new host D, you'll have to run puppet there to get the roles into puppetdb at all, and then once that is successful, re-run puppet on hosts A, B, C
[15:03:48] and if D needs A, B, C to be accepting connections from D to bootstrap itself or similar, then you'll need to accept that the first puppet run on D fails, and re-run it on D after you've re-run it on A, B, C
[15:03:49] yeah, I was looking through all the indirection happening and was hoping it led to something dynamic ;(
[15:04:03] hiera keys are not that bad, because depending 100% on roles can be an issue if you want things fully automated rather than fixed to the roles
[15:04:06] the problem is that doing it in the dynamic way results in more trouble than it creates
[15:04:20] ^what cdanis says
[15:04:32] er, more trouble than it solves
[15:04:42] if you end up automating that, the `wmflib::*::hosts` functions are likely a better option than raw puppetdb queries.
the same issues exist, but the code is a bit neater
[15:04:43] e.g. "this is a database host" vs "this is an active db host fully in production"
[15:05:04] role vs pooled difference, for example
[15:05:37] yeah, understood re: creating vs solving problems
[15:06:31] you still see it done ofc https://gerrit.wikimedia.org/g/operations/puppet/+/313ed990877f7cf1278d898f64708ae584728f50/modules/profile/manifests/netbox/scripts.pp#54
[15:06:41] but I think usually not for stuff that is in the 'same' cluster, only related ones
[15:11:09] sounds like it's not worth the effort for my use case... thanks for clearing that up though
[15:11:12] inflatador: Is this for zookeeper, by any chance?
[15:16:31] btullis Y, looks like the magic happens in hieradata/common.yaml ... if you have any further suggestions LMK
[15:22:51] Cool. Was just wondering. Each of the existing zookeeper clusters has a static set of hostnames in common.yaml, as you noticed, and a client srange that gets added with this hiera key: `profile::zookeeper::firewall::srange`
[15:23:44] I can't immediately see how the cluster members talk to each other though, now you mention it.
[15:27:39] has anyone used debian-glue for bookworm packages?
[15:27:53] as in, has anyone built a package for bookworm that calls debian-glue?
[15:28:03] we are seeing some weird build failures and wanted to check if it's just us
[15:31:32] * jhathaway googles for debian glue
[15:31:34] yeah, looks like you need entries in hieradata/common.yaml, modules/base/templates/firewall/defs.erb, and modules/profile/manifests/firewall.pp for any role you create. I should probably check docs or document this if it's not up somewhere
[15:34:03] inflatador: Agreed - looks like the docs are a bit on the short side.
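The static-hiera pattern described above — a fixed list of cluster members in hiera, rendered into a firewall srange, rather than anything discovered dynamically from PuppetDB — can be sketched roughly as below. This is a hypothetical sketch, not the production code: the class layout and the `profile::zookeeper::cluster_hosts` key are invented for illustration, and it assumes the `ferm::service` define and ferm's `@resolve()` function are available as in operations/puppet. The trade-off matches the discussion: adding a host means editing the list and re-running puppet on the existing peers, but you avoid the PuppetDB bootstrap-ordering problem entirely.

```puppet
# Hypothetical sketch of the static-hiera firewall approach.
class profile::zookeeper::firewall {
    # Static member list maintained by hand in hiera,
    # e.g. ['flink-zk1001.eqiad.wmnet', 'flink-zk1002.eqiad.wmnet']
    $cluster_hosts = lookup('profile::zookeeper::cluster_hosts')

    # Allow peer/election traffic only from the listed members.
    # ferm's @resolve() turns the hostnames into IPs at rule-load time.
    ferm::service { 'zookeeper-peers':
        proto  => 'tcp',
        port   => '2888:3888',
        srange => "@resolve((${cluster_hosts.join(' ')}))",
    }
}
```

Because the list is static, a new host D is added by editing the hiera value and running puppet on A, B, C first; D's own first run can then succeed, which is exactly the ordering that a `wmflib::puppetdb_query`-based version cannot guarantee.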
[15:37:45] btullis just got the patch up https://gerrit.wikimedia.org/r/c/operations/puppet/+/940243
[15:39:18] sukhe: huh, I'm a bit surprised you need special tooling to build inside jenkins [so, err, sorry, no I haven't]
[15:40:31] Emperor: regarding T211661, I'm confused by 149M being the number of thumb requests on all eqiad frontends in a 24-hour period. Is that cache misses only?
[15:40:33] T211661: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661
[15:40:37] Emperor: this is the "build a debian package" CI bit. https://wikitech.wikimedia.org/wiki/Debian_Glue
[15:41:26] we serve about 7M requests a minute at frontends, although that's from all PoPs, not one PoP. Still, 149M seems very low for a full day.
[15:50:19] Krinkle: yes, those are the ones that get to the swift frontends
[15:54:14] sukhe: TIL [I've only built things from gitlab, and I tend to basically just use dgit :) ]
[15:57:01] Emperor: do you have anything on the WMF gitlab? we built a custom thing for WMCS use cases, but it might be worth standardizing on something instead of everyone rolling their own solution
[15:57:34] Emperor: guess I will be reaching out to you soon for the gitlab stuff once we have to move our repos there :)
[16:02:08] hmm, looks like using the string "cluster" in my role makes puppet think it should be defined in 'wikimedia_clusters'
[16:04:20] inflatador: Ah, maybe just go back to `role::zookeeper::flink` then, as we talked about here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/940243/comments/a7eadad6_d3ee097b
[16:05:37] taavi / sukhe: I have some ad-hoc package building on gitlab; I have a KR this quarter to build some properly-production-quality packages on gitlab, and the gitlab folk also have a KR for this quarter to work on improving the automation of package-build workflows on gitlab.
So I'm hoping to have a neater story to tell about this soon :)
[16:05:48] :)
[16:16:51] Emperor: ah, so we're talking *swift* frontends in eqiad, not varnish frontends.
[16:17:33] I hadn't heard the word frontend used in that context before, thanks :)
[16:17:47] I guess every backend is just another one's frontend
[16:53:57] it's frontends all the way back :)
[17:11:52] btullis can you clarify https://gerrit.wikimedia.org/r/c/operations/puppet/+/940243/comments/3bc949de_907dffe9 ? Like, should I move `zookeeper_flink_cluster_hosts` after `druid_public_hosts` or something?
[17:17:23] btullis nm, now I get it
[17:17:58] put each host's IPs together instead of separating v4 vs v6
[17:33:40] Jenkins still bombing out https://gerrit.wikimedia.org/r/c/operations/puppet/+/940243 with `Cluster flink not defined in wikimedia_clusters` https://puppet-compiler.wmflabs.org/output/940243/2072/flink-zk1001.eqiad.wmnet/change.flink-zk1001.eqiad.wmnet.err . Anyone know why puppet thinks the cluster has to be defined in `wikimedia_clusters`?
[17:34:59] hold up, I may have missed a firewall def
[17:44:22] because you have `cluster: flink` set in the role hieradata
[17:46:56] thanks taavi ... let's see how the other ZKs are handling this. I must've missed something
[17:49:03] https://github.com/wikimedia/operations-puppet/blob/production/hieradata/role/common/analytics_cluster/zookeeper.yaml uses the same var, but the folder is deeper
[17:50:29] actually no, they are at the same folder depth... hmm
[17:51:26] analytics is an existing cluster: https://gerrit.wikimedia.org/g/operations/puppet/+/f199a3e74d2bd6ed17a264d2ee6fb6cff815cd4f/hieradata/common.yaml#253
[17:57:39] arnoldokoth: can I bribe you into a +2 for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941989 ?
[17:57:44] or any other SRE around
[18:45:05] RhinosF1: I'm all ears. :D
[18:45:50] arnoldokoth: thank you!
[18:49:56] arnoldokoth: you need to merge too, it's puppet
[18:50:23] Ooops. On it.
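The `Cluster flink not defined in wikimedia_clusters` compile error discussed above comes down to the `cluster:` hiera key: its value must name a cluster registered in the `wikimedia_clusters` map in `hieradata/common.yaml` (as the existing `analytics` cluster is). A rough sketch of the two pieces involved — note the fields inside the cluster entry are illustrative, not copied from production:

```yaml
# hieradata/common.yaml — the cluster name must be registered here
# (entry fields below are illustrative only)
wikimedia_clusters:
  flink:
    description: "ZooKeeper for Flink"

# hieradata/role/common/... (role hieradata) — this key is what triggered
# the error: its value must match a key under wikimedia_clusters, or you
# reuse an already-registered cluster name instead of introducing a new one
cluster: flink
```

Either registering the new name in `wikimedia_clusters` or setting `cluster:` to an existing cluster resolves the compiler failure; inventing a name in role hieradata alone does not.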
[18:52:17] RhinosF1: Done.
[18:54:29] arnoldokoth: i just did https://gerrit.wikimedia.org/r/c/operations/puppet/+/941991
[18:59:02] Says it's a work in progress?
[18:59:58] arnoldokoth: yes, i told mutante i'd help but I have no powers to break production puppet
[19:00:04] trying to get the role to apply
[19:02:19] I will need SRE to push the exciting buttons arnoldokoth
[19:05:52] arnoldokoth: don't forget to merge too
[19:06:28] done!
[19:06:33] let's see
[19:10:03] Yeah, all good now.
[19:10:16] arnoldokoth: https://gerrit.wikimedia.org/r/c/operations/puppet/+/941993 :)
[19:10:48] do feel free to ssh to wikistats-bookworm.wikistats.eqiad1.wikimedia.cloud so you can see the red
[19:17:30] Merged. :)
[19:18:46] arnoldokoth: that's more promising, it's not clean but i think it will work
[19:19:14] i think there's some missing requires that at some point should be cleaned up
[19:19:24] i will paste in phab for a future task
[19:33:20] arnoldokoth: i am very confused