[07:54:39] <elukey>	 Hello folks
[07:55:06] <elukey>	 if you need Superset/Turnilo's webrequest live data please note that we had a problem in the past couple of days: https://phabricator.wikimedia.org/T331801
[07:55:24] <elukey>	 traffic data seems normal now, but if you look back to hours ago there is only upload traffic registered
[07:55:32] <elukey>	 keep it in mind if anything occurs :)
[07:55:41] <elukey>	 rzl: --^ :)
[07:55:58] <elukey>	 (we'll need to add some traffic volume alerts to Benthos, or something similar)
[08:27:33] <elukey>	 the traffic volume reported for upload/text on webrequest_live vs 128 is still not right
[08:27:48] <elukey>	 benthos didn't go back to the previous traffic volume
[08:52:43] <elukey>	 going to test a theory with https://gerrit.wikimedia.org/r/c/operations/puppet/+/896043
[08:53:42] <elukey>	 I have seen in the past clients getting stuck in a weird way (like varnishkafka) when tcp connections where left hanging (like a hard reboot for a cp node etc..)
[08:53:53] <elukey>	 the consumer seemed stuck in a weird state
[08:54:22] <elukey>	 now I am wondering if on the Kafka broker side, the leader of the consumer group still thinks or have assigned some partitions to centrallog1001
[08:58:06] <elukey>	 ok 1001 is back in the consumer group
[09:24:01] <elukey>	 very weird, traffic volume increased and we got back into the "only-upload-data" weird state
[09:31:28] <elukey>	 left a note in the task about a possible alternative step, but I'll wait for some feedback before proceeding
[10:06:34] <elukey>	 also created https://gerrit.wikimedia.org/r/c/operations/puppet/+/897063 
[10:45:47] <elukey>	 ok I went forward and resetted the offsets like indicated in the task, the status of webrequest live was broken anyway
[11:12:49] <elukey>	 the situation improved a little, but benthos is now handling 1/3 of the original traffic before the centrallog1001 -> 1002 switch
[11:12:52] <elukey>	 that is very weird
[11:20:30] <elukey>	 (need to step afk, will check later(
[15:10:12] <rzl>	 elukey: oh wow, thanks so much for looking on the weekend
[16:15:18] <elukey>	 rzl: np! Sadly still not working as before, really werid
[16:15:20] <elukey>	 *weird
[17:01:07] <denisse>	 Hi, I'm here 
[17:01:21] <denisse>	 Is there anything I could do to help??
[17:02:26] <denisse>	 I made a failover of centrallog1001 -> centrallog1002 last week.
[17:02:27] <denisse>	 Do you think it may be related to this issue??
[17:07:37] <denisse>	 Sorry in advance if I broke anything.
[17:07:37] <denisse>	 I'm digging into the issue to understand what happened.
[17:11:44] <elukey>	 denisse: o/ o/ it should be related to the move from 1001 to 1002 but it is a kafka weirdness, not your fault don't worry :)
[17:11:56] <elukey>	 I am testing a few things, and reporting in the task
[17:16:19] <elukey>	 (going afk)