[11:42:00] I've noticed that the top left graph on https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors?_g=h@42b0d52&_a=h@a12d892 has stopped displaying (unless you switch to 24 hours)
[11:42:58] (no idea if that's -sre, -releng or -operations, but..)
[11:50:33] jynus: quick check with you, are you ok if I close T251416? do you have any remaining concern?
[11:50:34] T251416: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416
[12:04:30] volans: checking
[12:05:23] "So I think that the original concern has been almost completely removed." I agree with that statement
[12:05:34] aka I would like to see more work on that, but the ticket can be closed
[12:06:26] great, thanks!
[12:06:29] volans: I commented with that on task
[12:07:05] thx
[12:07:06] thanks by the way by the step by step improvement of those and the acccomodating to our needs
[12:09:03] specially the main issue I think was BIOS resets defaulting to reimage
[12:09:11] and that is now fixed
[12:12:14] (ref my message earlier about phab, found T332273)
[12:12:14] T332273: mediawiki-errors logstash dashboard's "errors over time" panel broken - https://phabricator.wikimedia.org/T332273
[12:32:39] <_joe_> oncall people, heads up: I am adding a couple poolcounter alerts
[12:32:50] <_joe_> they should not page, if they do I messed up
[12:32:55] <_joe_> just ack the alert and ping me
[12:41:00] <_joe_> godog: what is the right way to alert on a value coming from statsd/graphite nowadays?
[12:46:06] _joe_: acl
[12:46:08] ack
[12:47:08] _joe_: icinga / check_graphite
[12:47:30] not the answer I like, but the answer I got
[12:54:23] sorry to trouble during SRE week; could someone please let me know if a patch to operations/deployment-charts changeprop service needs to be deployed manually, or does that happen automatically after +2? https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/896091/4#message-d18c69a92d440f028dcee9da801c36687336920f the jobs that this patch supports will start running later today.
[12:55:12] kostajh: needs to be done manually
[12:55:44] taavi: ack. could you please point me to documentation saying how to do that?
[12:56:13] https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments
[12:57:27] ah, right. I got confused b/c I'm used to bumping a new docker image version for linkrecommendation deployments, but here we don't have that
[13:01:34] <_joe_> kostajh: from the point of view of k8s, in both cases you're changing a yaml file :)
[13:01:54] <_joe_> godog: nor the answer I hoped for :D but thanks :)
[14:04:12] can anyone tell me why Jenkins bot is not running even after I commented "recheck"? Patch 900700
[14:06:57] JameelKaisar: which patch?
[14:09:57] patch #900700
[14:10:38] operations/puppet
[14:10:47] https://gerrit.wikimedia.org/r/c/operations/puppet/+/900700/
[14:11:05] your email is not on https://www.mediawiki.org/wiki/Continuous_integration/Allow_list
[14:14:07] Thanks for the information
[14:14:21] I'll talk to my team members regarding this
[14:14:23] JameelKaisar: since you used your gmail address and not the WMF one. if you want to add the Gmail address, please create a patch and someone from SRE can +1
[14:15:44] Thanks @sukhe ... I'll do that
[14:16:41] or use the wmf one for the patches ;)
[14:17:05] yep, that's probably the best
[14:38:11] !ack 3480
[14:38:11] 3480 (ACKED) es2029 (paged)/mysqld processes (paged)
[14:40:50] !resolve 3480 marostegui
[14:40:50] Attempt to ack incident 3480 failed.
[14:40:54] :(
[15:28:10] Thanks everyone
[16:12:34] <_joe_> welcome JameelKaisar :)