[08:20:13] hmmm it seems that the deployment-charts repo is dirty on the deployment host, is it ok to git reset?
[08:20:18] https://www.irccloud.com/pastebin/GhLS8EHH/
[08:22:48] ottomata: I'm guessing those changes were by you? ^--
[08:57:44] <_joe_> dpogorzelski: given the changes are just positional, I'd assume so
[08:59:01] <_joe_> but also: yes, unless the repo has been made dirty in the last few hours, assume it is in general :)
[09:32:41] This might be related to work done on T417407 by brouberol. I think he's afk for a little while.
[09:32:41] T417407: Evacuate all kafka-mirrormaker instances to Kubernetes - https://phabricator.wikimedia.org/T417407
[09:35:05] sorry. that was me, and shouldn't have stayed dirty except my daughter barged into my office
[09:35:17] temporary mayhem ensued and I forgot
[09:39:46] <_joe_> brouberol: good root cause analysis
[09:40:18] <_joe_> have you set up the proper failsafes for the future, like electrifying your offices' door?
[09:40:41] I was thinking about hiring a Real Chaos Gorilla as a bodyguard
[09:40:52] <_joe_> that would also work I guess
[09:41:03] or printing "YOU SHALL NOT PASS 🧙" on my door
[09:42:09] <_joe_> brouberol: you probably just need to master the "annoyed joe stare", though. It worked pretty well with my daughter when she was a kid
[09:42:40] <_joe_> it just takes some practice not to let the fact you're delighted to leak out
[09:42:56] gorillas? train a pool of blåhaj's and place lasers on their heads?
[09:43:15] or is it a pod when it comes to sharks?
[09:50:13] https://clip.cafe/beneath-the-planet-of-the-apes-1970/if-are-caught-by-the-gorillas-s2/
[10:18:35] I thought that after https://gitlab.wikimedia.org/repos/sre/vopsbot/-/merge_requests/20 was merged, saying !ack would ack all unacked incidents. But when we had 3 p.ages fire in quick succession this morning and I said !ack, sirenbot said 'Could not ack the alert. Please check the parameters.' ... ?
[13:44:12] Emperor: the pages of this morning are acknowledged but not resolved, should they be resolved?
[14:02:47] matthieulec: oh, good catch, yes, I'll do that
[14:02:49] !incidents
[14:02:49] 7784 (ACKED) db1253 (paged)/MariaDB Replica IO: s7 (paged)
[14:02:50] 7785 (ACKED) db1253 (paged)/MariaDB Replica Lag: s7 (paged)
[14:02:50] 7786 (ACKED) db1253 (paged)/MariaDB Replica SQL: s7 (paged)
[14:02:52] !resolve
[14:02:53] Could not resolve the alert. Please check the parameters.
[14:02:58] I'm sure that's meant to work now :(
[14:03:06] !resolve 7784-7786
[14:03:06] Incident id must be an integer
[14:03:10] !resolve 7784
[14:03:10] 7784 (RESOLVED) db1253 (paged)/MariaDB Replica IO: s7 (paged)
[14:03:12] !resolve 7785
[14:03:12] 7785 (RESOLVED) db1253 (paged)/MariaDB Replica Lag: s7 (paged)
[14:03:13] !resolve 7786
[14:03:14] 7786 (RESOLVED) db1253 (paged)/MariaDB Replica SQL: s7 (paged)
[14:03:45] 7784-7786 is an integer, it's -2 :-P
[14:11:32] Emperor: pretty sure it worked for a while, no idea what broke it
[14:37:04] nice to know it's not just me going mad :)
[14:40:05] looking at the logs that apparently fails with 'error="invalid character 's' looking for beginning of value"' so seems like a sirenbot bug?
[15:02:33] yeah I was looking at that on Friday, I got as far as tracking it down to "incidentIds is [], which should never happen" but not as far as why it does anyway :)
[16:48:54] sukhe: we can re-enable doh in magru and esams when you have a minute
[16:51:02] topranks: ok thank you
[16:51:08] I see the updated comment on the task
[16:51:15] I can re-enable it and we can observe for a few hours I guess?
[16:52:46] yep, worth taking a look at the conntrack numbers in a few hours
[16:53:16] I tested thoroughly so I'm pretty sure everything will be ok
[16:53:27] yep, enabling it now
[16:58:38] topranks: all done
[16:58:52] lgtm
[16:59:08] https://www.irccloud.com/pastebin/NrwA4HX2/
[16:59:21] yep
[16:59:24] https://www.irccloud.com/pastebin/bS1RQC5d/
[16:59:28] https://grafana.wikimedia.org/goto/ffgwib37fs16oc?orgId=1
[16:59:42] can't test magru from here but I need to still undo the outbound announce policy there
[17:00:25] ; NSID: 64 6f 68 37 30 30 33 ("doh7003")
[17:00:35] bast7003 fwiw :P
[17:00:49] cool
[17:01:02] one thing to watch out though - internet is not routing it to magru right now
[17:01:08] bast7003 will hide that
[17:01:18] useful nonetheless of course
[17:01:54] ja
[17:02:02] ha true
[17:47:58] Grafana will go down for a short reboot at 1800Z.
[18:17:28] opened https://phabricator.wikimedia.org/T420982 for that vopsbot bug
[19:31:14] inflatador: is your puppet patch good to merge?
[19:32:04] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1251117 specifically
[19:34:35] swfrench-wmf yes, please do
[19:34:42] sorry for the lag on that
[19:34:52] inflatador: ack, will do :)
[19:35:48] Unrelated, but it looks like one of our discovery certs will expire fairly soon: issuer `CN=Wikimedia_Internal_Root_CA,OU=Cloud Services,O=Wikimedia Foundation\, Inc,L=San Francisco,ST=California,C=US', serial 0x715331115b69e7112b0e3c7f8c89ce15c51a4639, EC/ECDSA key 528 bits, signed using ECDSA-SHA512, activated `2021-05-04 13:54:00 UTC', expires `2026-05-03 13:54:00 UTC', pin-sha256="PbgfDlEHVB4Zw0a42zNqqnEQbcYF9jYp/dbT4eSdOb8="`
[20:58:03] oops, was today 'no more apt mirrors' day?
[20:59:00] * inflatador noticed that too
[20:59:41] I knew it was coming and yet find myself unprepared
[21:11:27] Oh, were they supposed to go away? I just noticed I couldn't install a package on one of the cirrussearch hosts
[21:16:31] I don't know what's happening. https://phabricator.wikimedia.org/T416707 predicts future demise but implies that it's still in the future
[21:16:53] but meanwhile... no connectivity to mirrors.wikimedia.org
[21:17:08] so maybe something is broken by accident today and I mistook it for an early symptom of the future removal
[21:31:04] maybe so...regardless, I guess we would have to one-off the linked change onto our servers if we wanted to get a package
[22:23:51] ^ looking at this
[22:25:24] we haven't made any of the prerequisite changes yet (like https://gerrit.wikimedia.org/r/1256371, or updating modules/package_builder/manifests/pbuilder_base.pp or modules/docker/templates/images/sourceslist.base.erb) so this is an unrelated breakage
[22:42:41] great news, I didn't do anything and it came back on its own
[22:42:51] like and follow for more SRE life hacks
[23:56:10] rzl: I !bash'ed you at https://bash.toolforge.org/quip/gA8gHZ0Bvg159pQrJzGF