[07:43:13] o/
[07:43:28] fabfur: I noticed that there is an alert for pybal: https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DCheck%20if%20Pybal%20has%20been%20restarted%20after%20pybal.conf%20was%20changed
[07:43:52] is it related to the work on proxoid that we were doing with Effie?
[07:45:27] that has been ongoing since monday
[07:45:48] I asked about it yesterday
[07:47:39] last time I checked yes was for proxoid
[07:48:44] after the change because it didn't work I think pybal was not restarted in the other dc
[07:49:04] yeah I think only one pybal was restarted for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188309
[07:49:49] but I think it is good to have wrr instead of maglev for the current config, namely without ipip encapsulation
[07:50:13] alright, when fabfur is around, can sort it
[07:50:23] and in any case we should not leave pybal half-started to avoid the alert and also surprises in case of a restart for other reasons
[07:50:55] the other thing worth to note is https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=alertname%3DKafka%20broker%20TLS%20certificate%20validity
[07:51:13] I wrote in the serviceops chan as well yesterday, we'd need to roll restart the main kafka brokers
[07:52:00] worst case I'll do it during the MW window
[07:55:48] effie: the other remaining thing is to move to maglev and ipip_encapsulation.. I can already picture Valentin coming back from holidays, swearing in multiple languages :D
[07:56:17] elukey: oh valentin +1ed this LVS :p
[07:56:43] we can def suggest we move it, but he cant swear :p
[07:59:35] mmmm ok
[08:01:32] effie: all right if you know the risks I'll not say anything more :D
[08:01:49] hahaha
[08:02:24] I already owe him, so I will just add some more coffee bags :p
[08:24:27] I think we don't need to move to MH && ipip encapsulation RN
[08:24:41] but I can restart pybal on the remaining lvs hosts and clear the alert
[08:24:56] thx
[08:40:41] {{done}}
[09:20:03] oncallers: I am going to roll restart kafka main codfw to pick up the new certs
[09:22:27] thanks
[09:26:38] k
[09:43:40] kafka restarts done
[10:01:35] I am going to deploy mobile apps in eqiad to upgrade the statsd image
[10:02:00] already pinged Yiannis for guidance, there shouldn't be anything specific to do
[10:08:25] deployed, I am watching metrics and I don't see anything horrible. The only thing that popped up right after I deployed was https://grafana.wikimedia.org/d/000000208/edit-count?orgId=1&viewPanel=panel-13&from=now-3h&to=now&timezone=utc
[10:24:51] all good now