[12:55:08] oncallers: I'm testing prometheus2007 as the only pybal-pooled host as part of https://phabricator.wikimedia.org/T383232 . note that this change does not affect queries going through thanos
[12:55:48] i.e. only the read path for prometheus-codfw.w.o and clients using prometheus.svc.codfw
[12:57:03] k
[12:57:04] thx
[14:40:57] `check_conntrack_table_size` alert is expected?
[14:42:25] vgutierrez: slyngs is working on removing the check from icinga afaict
[14:42:53] ack
[14:43:13] vgutierrez: Yeah, sorry, I got the patches submitted in the wrong order. It is fixed, but needs to rollout.
[14:58:47] slyngs: ok to merge your puppet patch?
[14:58:55] Yes, please do
[14:59:31] slyngs: ack, done
[14:59:39] Thanks
[15:03:51] vgutierrez: Again sorry, should be fixed now. I'll remove the monitoring service a little more carefully sometime tomorrow.
[16:02:34] moritzm: I have your change on puppet along with mine, going to merge, ok?
[16:03:03] fabfur: ack
[22:14:27] isaranto: any eta on the liftwing error rate?
[22:14:46] isaranto: wondering how long to set a silence for
[22:15:07] (also, is there a phab to reference?)
[22:27:30] urandom o/ let's set it for a day. the traffic is just too high to handle at the moment. there is this task that we can use for reference https://phabricator.wikimedia.org/T387019
[22:27:49] isaranto: is 12 hours ok?
[22:27:51] there was an issue already and it exploded when traffic increased
[22:28:03] or do we need longer?
[22:34:14] a little longer for sure (18h)
[22:35:57] thank you!
[22:42:02] ok