[07:13:25] inflatador: o/ the alternative is to test the new cfssl PKI infrastructure (as suggested by taa*vi), there should be a test intermediate ca in deployment-prep.
[07:14:33] it is more work for sure since some extra puppet code will likely be needed to request the certs etc..
[07:15:13] * elukey hopes for a future in which we don't use the Puppet CA anymore
[08:00:54] inflatador: if I understand correctly, cergen is about generating certs in a coherent way and publishing them in a standard place on the puppetmaster (modules/secrets/...). This is not coupled to how we then push those certs to the appropriate hosts. So you could tackle each problem separately and hack your way to a working solution on deployment-prep by manually generating the certs.
[08:01:17] I might have a completely wrong understanding here, I've never actually touched cergen.
[08:09:47] elukey: let's just drop TLS altogether, too much complexity there!
[08:11:38] gehel: sure seems legit :)
[09:19:47] I've added two prometheus hosts to conftool in https://gerrit.wikimedia.org/r/c/operations/puppet/+/757612 and puppet-merged the change, I still don't see the two hosts in https://config-master.wikimedia.org/pybal/codfw/prometheus have you run into this before?
[09:22:08] my expectation is to see the two nodes there (and depooled)
[09:23:17] godog: they won't show up on config-master if they have pooled=inactive (as opposed to pooled=no)
[09:26:38] taavi: thank you, that explains! the hosts are indeed inactive
[09:28:20] I'll update https://wikitech.wikimedia.org/wiki/Conftool once I'm done
[09:29:36] <_joe_> godog: you also need to set a weight for them
[09:30:14] yeah I did _joe_
[09:38:09] I haven't added a node to conftool in a while (clearly) and I must say I found the previous behaviour (i.e. the host has a default weight, and is ready to be pooled) to be more intuitive/less work
[09:38:18] (wikitech updated)
[09:45:35] <_joe_> godog: OTOH this avoids servers being added to a pool when they're not ready
[09:45:50] <_joe_> which has been a repeated issue
[09:50:36] In 10 minutes we are switching m1 master, a few seconds of RO are expected. Affected services at: https://phabricator.wikimedia.org/T299624
[09:50:58] double checked dbbackups db, no activity expected there
[09:52:03] _joe_: thank you for the context, I didn't know that
[09:52:04] I will shut down bacula dir just out of extreme precaution
[10:17:54] jbond moritzm can you confirm if cas and pki are working fine? we did a master swap on m1
[10:23:19] cas seems fine, I'll also quickly test a token re-registration just to be sure
[10:23:51] thanks!
[10:29:25] yeah, confirmed to work fine
[10:29:56] thanks!
[11:25:24] <_joe_> I would double-check the PKI works, I've seen the code for that thing
[11:25:39] <_joe_> jayme can confirm
[11:40:19] lol
[11:42:44] marostegui: sorry was in a meeting, confirmed the cfssl stuff is still working
[11:42:54] thanks jbond!
[11:48:15] hey Daimona__
[11:48:41] yo :)
[11:49:20] Daimona__: the emergency thresholds for abusefilter do not block 'warn' or 'disallow' right?
[11:49:33] only block, rangeblock, degroup, etc.
[11:50:46] there's a report at T299868 that it might not be working as expected
[11:50:46] T299868: Abusefilter throttle is too low on Incubator - https://phabricator.wikimedia.org/T299868
[11:50:57] Correct
[11:53:56] AF logs whenever an action is skipped due to throttling, and I can't see any such entry for incubatorwiki in the last 20 days
[13:04:31] hi folks, I have a potentially breaking change for rsyslog
[13:04:32] https://gerrit.wikimedia.org/r/c/operations/puppet/+/739463
[13:04:41] it has been tested in deployment-prep too
[13:05:07] it should work but it may raise some issues, due to how big it is I'd be inclined to merge and check as puppet rolls it out
[13:05:14] lemme know if you have concerns etc..
[13:05:31] (the idea is to swap the CA bundle to allow PKI-based Kafka brokers)
[13:18:24] elukey: i always dislike uncertainty. any way you can make it a guaranteed breaking change?
[13:19:59] kormat: the fact that I am merging it should be a solid proof of that
[13:20:07] :D
[13:21:18] <$
[13:21:20] <3
[13:40:00] merged, lemme know if you see issues later on
[13:40:12] I'll keep an eye on puppet failing and kafka logging's traffic
[13:41:31] taavi: o/ as FYI I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/739463 that should work in deployment-prep, but if you see anything breaking lemme know (thanks!)
[13:58:12] elukey: ack, thanks!
[14:56:30] <_joe_> elukey: can you take a particularly hard look on mw* ?
[14:56:51] <_joe_> rsyslog is how we relay all app-level logs to logstash/centrallog
[14:58:10] _joe_ so far I haven't seen any error rsyslog -> kafka in various syslogs, and the overall traffic towards the kafka clusters didn't vary
[14:58:31] I just checked an api appserver and it looks good
[14:58:40] I can check in logstash as well
[15:01:29] <_joe_> elukey: I'm sure mmkubernetes will fail silently just because it dislikes us
[15:02:26] _joe_ you can even say me I can take it, k8s hates me
[15:03:52] (checked the mediawiki logstash dashboard for the past 2h, number of msgs look ok)
[16:59:03] hey _joe_, can you have a look at my reply to your -1 at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/749734? Thanks!
[17:08:34] <_joe_> urbanecm: sigh I was sure I did remove it, sorry :((
[17:08:42] <_joe_> (in a meeting atm)
[17:09:26] thanks for the +1 _joe_ – enjoy the rest of your meeting :)
[17:09:39] <_joe_> "enjoy"
[17:09:46] well, yeah :)
[18:27:10] does anyone know if 'web*.example.com' is an acceptable subject or altname for an SSL cert? re: https://gerrit.wikimedia.org/r/c/labs/private/+/757699/comment/d1e89906_0dc57193/
[18:37:19] TIL. yeah, looks like it is valid: https://datatracker.ietf.org/doc/html/rfc2818#section-3.1
[19:30:29] dwisehaupt awesome, thanks for looking that up
[19:31:37] np. mostly i was curious. :)
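
Editor's note on the 'web*.example.com' question at 18:27: RFC 2818 section 3.1 says a `*` in a certificate name matches any single domain name component or component fragment, and gives `*.a.com` and `f*.com` as examples. Below is a minimal Python sketch of that rule. The function name `rfc2818_match` is hypothetical and not part of any library. It illustrates the RFC text only; it is not how any particular TLS client validates names, and later specifications and some clients are stricter about partial-label wildcards, so acceptance in practice may vary.

```python
import re


def rfc2818_match(pattern: str, hostname: str) -> bool:
    """Illustrative check of a cert name pattern against a hostname.

    Follows the wording of RFC 2818 section 3.1: '*' matches a single
    domain name component or component fragment, so it never spans a dot.
    Hypothetical helper for illustration, not a real library API.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")

    # A wildcard cannot absorb extra labels, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False

    for pat, host in zip(pattern_labels, host_labels):
        # Turn each '*' into "any run of non-dot characters" and escape the rest.
        regex = "[^.]*".join(re.escape(piece) for piece in pat.split("*"))
        if re.fullmatch(regex, host) is None:
            return False
    return True


if __name__ == "__main__":
    for name in ("web1.example.com", "db1.example.com", "web1.eqiad.example.com"):
        print(name, rfc2818_match("web*.example.com", name))
```

Under this reading, 'web*.example.com' covers web1.example.com but not db1.example.com or web1.eqiad.example.com, which is consistent with the "looks like it is valid" answer at 18:37.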