[09:08:38] jbond, moritzm when you have the chance... https://gerrit.wikimedia.org/r/c/operations/puppet/+/977997
[09:11:40] having a look in a few
[09:21:04] jbond: puppet is disabled on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet due to the deployment of https://gerrit.wikimedia.org/r/c/operations/puppet/+/976273 (already reverted per gerrit, and the revert has also been reverted), so when could we enable puppet on those hosts? :_)
[09:21:31] (almost 24h without puppet on those hosts)
[09:21:38] who owns phabricator? is it releng still?
[09:21:41] hashar: ^
[09:23:19] Essentially I want to tag the right team at https://phabricator.wikimedia.org/T352149 just so they are aware that I will be putting phabricator in RO this week. I don't need them to be present, it's just a heads up
[09:24:42] marostegui: you might want to notify serviceops-collab too (from /etc/wikimedia/contacts.yaml ;) )
[09:24:53] wilco
[09:24:57] thanks volans
[09:28:16] marostegui: yes, Release Engineering owns Phabricator, with Andre handling the administrative process and Brennen the code base/deployment etc.
[09:28:33] SRE service collab provides support for the underlying infrastructure sustaining the app
[09:29:59] I don't know what kind of action has to be conducted on Phabricator while the DB is moved :-\
[09:36:48] hashar: Nothing, I will put it in RO for a few seconds and that's it
[09:45:41] BTW... am I the only one under the impression that puppet-merge is slower than before?
[09:46:38] a bit slower is expected
[09:46:56] a lot slower is not, and there is a patch ready to be merged to gather metrics for it into prometheus
[09:47:09] so that we'll be able to get actual data
[09:56:29] volans: maybe run `git gc --auto` on the local copy, and check whether it has git protocol version 2 enabled, which dramatically helps when fetching patches (`git config --get protocol.version`, see https://phabricator.wikimedia.org/J199)
[09:56:40] or whatever is causing the slowness really :-]
[09:58:37] it's a new way of deploying puppet code to avoid race conditions
[09:58:40] it's not git being slow :D
[09:58:46] we're doing more operations, hence it's slower
[10:14:06] !incidents
[10:14:06] 4284 (ACKED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[10:14:06] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[10:14:06] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[10:23:27] !incidents
[10:23:28] 4284 (RESOLVED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[10:23:28] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[10:23:28] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[10:34:03] vgutierrez: puppet is enabled now
[10:34:19] jbond: thx <3
[10:37:57] gotta love pwru
[10:38:01] 0xffff8bca25e37000 0 [pwru(327553)] kfree_skb_reason(SKB_DROP_REASON_IP_RPFILTER) 198.35.26.12:12250->198.35.26.98:80(tcp)
[11:06:32] <_joe_> jbond: for hosts running puppet 7, where do I find the volatile directory?
[11:07:36] _joe_: /srv/puppet_fileserver/volatile/
[11:07:40] are you seeing some issue?
[11:07:44] <_joe_> jbond: thanks :)
[11:07:50] <_joe_> jbond: not with puppet 7 itself
[11:07:52] <_joe_> :)
[11:08:09] ok well let me know if you want me to look at anything
[11:19:06] <_joe_> nah it's just that in september 2022 we introduced the use of a geoip database that was discontinued in april 2022...
[11:20:25] ahh i see :/
[11:22:19] <_joe_> so the old files are still in puppet5's volatile
[11:22:23] <_joe_> but not in puppet7's
[11:25:52] _joe_: that's possible, when i did the migration i looked for what was actually used in the puppet repo and only migrated those
[11:31:01] joe: actually both systems have the same config. however there is nothing that cleans up that dir, so i guess those files just exist from when puppet deployed them and they never got deleted
[11:31:52] _joe_: https://gerrit.wikimedia.org/r/c/operations/puppet/+/942453
[11:32:54] <_joe_> jbond: that's not it, those formats haven't been provided since april 2022
[11:33:02] <_joe_> so they're just lying around there, outdated
[11:59:02] sounds like me of an evening ;p
[14:19:43] hmmm, looking into why I need to explicitly disable rp_filter on our instances, I see it's been enabled by default in base::sysctl since some Ubuntu defaults were ported 10 years ago
[14:20:27] so rather than enabling it and then disabling it again by layering sysctl::parameters on top of sysctl::parameters, it makes sense to me to make it optional
[14:23:22] !incidents
[14:23:23] 4285 (ACKED) HaproxyUnavailable cache_text global sre ()
[14:23:23] 4284 (RESOLVED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[14:23:24] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:23:24] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[14:30:25] vgutierrez: +1 on making this a Hiera flag
[14:35:33] !incidents
[14:35:34] 4285 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:35:34] 4284 (RESOLVED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[14:35:34] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:35:34] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[14:47:27] vgutierrez: moritzm: fyi i did start work on a patch to make this a bit more configurable https://gerrit.wikimedia.org/r/c/operations/puppet/+/662932
[14:50:56] !incidents
[14:50:56] 4286 (UNACKED) HaproxyUnavailable cache_text global sre ()
[14:50:57] 4285 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:50:57] 4284 (RESOLVED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[14:50:57] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:50:57] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[14:51:01] !ack 4286
[14:51:02] 4286 (ACKED) HaproxyUnavailable cache_text global sre ()
[14:51:25] hehe, this looked vaguely familiar until I actually looked at the date :-)
[14:57:53] !incidents
[14:57:54] 4287 (ACKED) HaproxyUnavailable cache_text global sre ()
[14:57:54] 4286 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:57:54] 4285 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:57:54] 4284 (RESOLVED) ProbeDown sre (198.35.26.98 ip4 ncredir-https:443 probes/service http_ncredir-https_ip4 ulsfo)
[14:57:55] 4283 (RESOLVED) HaproxyUnavailable cache_text global sre ()
[14:57:55] 4282 (RESOLVED) [2x] ProbeDown sre (ncredir-https:443 probes/service ulsfo)
[14:57:57] !ack 4287
[14:57:58] 4287 (ACKED) HaproxyUnavailable cache_text global sre ()
[15:33:47] _joe_: fyi, if you need telemetry on k8s jobrunners, arclamp picked it up automatically: https://performance.wikimedia.org/php-profiling/ https://performance.wikimedia.org/arclamp/svgs/daily/2023-11-27.excimer-k8s-wall.RunSingleJob.svgz
[15:34:48] oh nice
[15:36:12] <_joe_> noice :)
[15:36:18] <_joe_> hnowlan: ^^
[15:37:01] oh rad!
[15:37:45] hnowlan: Today you get instrumentation apparently xD
[15:37:47] this will be particularly cool to see as we add more jobs
[15:51:15] jbond: your CR looks interesting but it isn't exactly the same scope?
[15:51:42] vgutierrez: i guess you are talking about the ssl cr?
[15:51:57] jbond: no sorry, the performance tweaks one
[15:52:14] ahh that one
[15:52:50] tbh it's been a while since i looked at that, but the idea was to make all of the sysctl settings we apply across the fleet configurable
[15:53:10] that said i also wondered if we could get a set of defaults that are sane everywhere
[15:53:37] so i was in two minds about it, but if it's interesting please comment on the cr and i can pick it up again
[16:36:20] jbond: I'm going small with https://gerrit.wikimedia.org/r/c/operations/puppet/+/978088/1
[16:37:19] vgutierrez: yes, +1, my patch is definitely way too much scope creep for this one toggle :)
[19:56:47] bblack / traffic: I'm about to update DNS for the first time if https://gerrit.wikimedia.org/r/c/operations/dns/+/978131 passes review; anything I need to know beyond https://wikitech.wikimedia.org/wiki/DNS#authdns-update ?
[20:05:09] ^^ went ahead and merged in the interest of time, everything looks OK so far
[21:13:34] inflatador: ack, looks fine anyways :)
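The rp_filter thread above (the pwru trace showing `SKB_DROP_REASON_IP_RPFILTER` drops, and the need to disable rp_filter explicitly on top of the base::sysctl default) follows from how the kernel combines its per-interface and `all` settings. Per the kernel's ip-sysctl documentation, the value used on an interface is the maximum of `net.ipv4.conf.all.rp_filter` and `net.ipv4.conf.<iface>.rp_filter`, where 0 means no source validation, 1 strict mode and 2 loose mode (RFC 3704). The following is a minimal Python sketch of that rule, assuming Linux hosts; it is not the base::sysctl code or the 978088 patch, and the helper names (`effective_rp_filter`, `read_effective_rp_filter`) are hypothetical.

```python
from pathlib import Path

# rp_filter values as documented in the kernel's ip-sysctl docs.
RP_FILTER_MODES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(all_value: int, iface_value: int) -> str:
    """Return the reverse-path filtering mode applied on one interface.

    The kernel uses max(conf/all/rp_filter, conf/<iface>/rp_filter), so
    setting the interface value to 0 does not disable filtering while
    conf/all is still 1 or 2.
    """
    for value in (all_value, iface_value):
        if value not in RP_FILTER_MODES:
            raise ValueError(f"invalid rp_filter value: {value!r}")
    return RP_FILTER_MODES[max(all_value, iface_value)]


def read_effective_rp_filter(iface: str, conf_dir: str = "/proc/sys/net/ipv4/conf") -> str:
    """Hypothetical helper: read both sysctls from /proc on a live Linux host."""
    base = Path(conf_dir)
    all_value = int((base / "all" / "rp_filter").read_text().strip())
    iface_value = int((base / iface / "rp_filter").read_text().strip())
    return effective_rp_filter(all_value, iface_value)
```

For example, `effective_rp_filter(1, 0)` evaluates to `"strict"`: if a fleet-wide default sets `all` to 1, turning rp_filter off for a single interface is not enough, which is one argument for making the default optional rather than overriding it per host. On a real host, `read_effective_rp_filter("eth0")` (interface name assumed) would check the live values.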