[07:02:50] reading backlog, the change was a noop, just a chart-bump for calico from 0.3.0 to 0.3.1.
[07:05:00] as to why it is failing, apparently calico-kube-controllers is failing its readiness probe, /me investigating
[07:07:04] thanks, Alex
[07:07:39] this is weird, this has worked just fine in staging-codfw
[07:11:37] ok, found it
[07:12:12] I was messing a bit yesterday with globalnetworkpolicies to verify the MTU change was ok and apparently I interfered with calico-kube-controllers ?
[08:47:50] writeup in https://phabricator.wikimedia.org/T352956#10851490
[09:32:47] Thanks also, and nice writeup.
[11:09:55] I'm staring at the FixedRandomDelay= systemd.timer(5) option
[11:10:28] it seems a nice|native replacement of fqdn_rand() for systemd timers
[11:10:47] and it should be available on >=bullseye
[11:46:49] could you help me with something? where are the thanos-sourced alerts defined? I was unable to grep them on puppet
[11:48:15] jynus: https://gerrit.wikimedia.org/r/c/operations/alerts/ ?
[11:48:31] https://gerrit.wikimedia.org/r/q/project:operations/alerts sorry
[11:48:51] thanks, I searched on codesearch and got no answers
[13:43:01] jynus we were looking at the same thing the other day, at least I think we were? Related to SLO stuff. one sec...
[13:45:03] T393966 has a couple of Puppet PRs that might be useful (not 100% sure on that, though). There's also https://thanos.wikimedia.org/rule/alerts which shows the alerts we were trying to change
[13:45:04] T393966: Update WDQS SLO lag queries to reflect graph split changes - https://phabricator.wikimedia.org/T393966
[13:46:03] not sure if "we" means I or obs team, but this was for a separate ticket
[13:47:05] Sorry, "we" as in DPE SRE. Our SLO alerts are defined in Thanos as well
[13:47:12] ah, I get it now
[13:47:20] sorry, had other things in mind
[13:48:08] I sent an email to obs team, hopefully they can design a global approach for non-SRE alerts
[13:48:19] but I need to run
[13:48:30] have a nice day
[13:48:34] .o/
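
Note on the [11:09:55] to [11:10:47] exchange: both fqdn_rand() and FixedRandomDelay= aim at the same thing, a per-host delay that looks random across the fleet but stays the same on every run for a given host. The sketch below only illustrates that idea. It is not Puppet's fqdn_rand() algorithm and not what systemd does internally; the function name, the SHA-256 choice, and the host and timer names are all assumptions made for the example.

```python
import hashlib


def stable_offset_seconds(host_id: str, timer_name: str, max_delay_s: int) -> int:
    """Map a (host, timer) pair to a fixed offset in [0, max_delay_s).

    Illustrative only: hashes the inputs and reduces the digest modulo the
    maximum delay, so the same inputs always give the same offset.
    """
    if max_delay_s <= 0:
        raise ValueError("max_delay_s must be positive")
    digest = hashlib.sha256(f"{host_id}:{timer_name}".encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then fold into the range.
    return int.from_bytes(digest[:8], "big") % max_delay_s


if __name__ == "__main__":
    # Hypothetical host and timer names, for illustration only.
    for host in ("host1001.example", "host1002.example", "host2001.example"):
        offset = stable_offset_seconds(host, "example-job.timer", 3600)
        print(f"{host}: fires {offset}s after the scheduled time")
```

Running it twice prints the same offsets, which is the property that keeps a timer from drifting between runs while still spreading load across hosts.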