[07:33:00] I thought marostegui was the other on-call person this week?
[07:34:08] [and https://wikitech.wikimedia.org/w/index.php?title=SRE/Oncall/Schedule thinks jelto is next week]
[07:34:38] Victorops says it is me
[07:34:53] wtf?
[07:35:00] it was me yesterday
[07:37:36] there's no override scheduled for you today
[07:38:03] ...but the EMEA shift does now have j.elto down all week. WTF?
[07:38:08] it was definitely saying you yesterday
[07:39:25] this is so werid
[07:39:28] weird
[07:39:29] lmata: ^
[07:40:55] and the rotation schedule looks to have changed entirely - there's a change on Monday and then next week is wrong too
[07:41:19] I fear someone has tried to edit the schedule rather than putting in an override and that's confused the entire rotation.
[07:42:16] marostegui: lmata is OOO until Thursday, so I propose we put in an override for jelto through to Friday and thus put you back on-call and then email lmata so he can have a look on Thursday?
[07:44:15] jynus: can I just check: you are expecting to be oncall w/c 7 August and _not_ w/c 31 July?
[07:45:38] Emperor: yeah makes sense
[07:47:34] OK, done so, hopefully will get picked up at 08:00 UTC
[07:48:23] I'll email lmata
[08:01:58] \o/
[08:01:58] thanks Emperor
[08:06:11] YW :)
[08:17:38] marostegui: may the cumin be with you in this day of on-call
[08:18:27] elukey: I'll have spicerack handy!
[08:18:46] :D
[08:26:58] do you think `piuparts` could be useful on the build hosts? In case I can do a quick CR to install it...
[08:27:31] <_joe_> I have no idea what that is :)
[08:28:15] <_joe_> ah I see, I do that using docker more or less
[08:28:37] is usually part of (my) workflow
[08:29:11] it's usually able to find some bugs
[08:29:19] thanks for creating the override. I wasn't expecting on-call this week. I was scheduled for next week.
That's also why I missed the first hours of on-call
[08:36:31] marostegui, Emperor - I am merging and applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/941315 to kafka-main1001, it shouldn't cause any trouble but in case ping me
[08:37:45] ack
[09:06:16] We'll just do a quick restart of CAS / IDP to ensure that configurations are picked up. There will be a brief interruption, about a minute.
[09:09:34] And done
[10:11:12] was there a recent change to the ssh-known-hosts file that includes machine names without dots (i.e. not just stat1004.eqiad.wmnet, but also plain stat1004)? Or did my local stuff change. I notice that I have to "tab harder" as of recently, and I wonder if I broke something myself :)
[10:18:38] <_joe_> klausman: it's not you
[10:18:46] <_joe_> and yes it's annoying
[10:19:00] Ah, ok. What was the rationale for adding bare names?
[10:19:41] <_joe_> no idea
[10:20:40] The script hasn't changed in 5mo, so it must come from somewhere else. I presume the DNS server side updates
[10:21:49] <_joe_> I'm not sure how that would affect the output of this
[10:22:02] <_joe_> I'd rather imagine netbox changes could cause it
[10:22:08] well, the script curls the DNS data
[10:22:22] cf. https://gerrit.wikimedia.org/r/c/operations/debs/wmf-sre-laptop/+/893708
[10:22:22] <_joe_> from where?
[10:22:59] <_joe_> that is just the cnames, and that's the dns repo which hasn't changed much
[10:23:18] <_joe_> I would rather imagine it's a puppet7 related change
[10:23:24] <_joe_> jbond: ^^
[10:23:48] <_joe_> is it possible we switched to puppet7 or puppetdb7 somewhere and this is a ripple effect?
[10:41:40] likely a side effect of https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/9c50677a45dba0a3cc12b21ca640ca489f91713f%5E!
[10:41:59] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/9c50677a45dba0a3cc12b21ca640ca489f91713f
[11:09:11] <_joe_> yeah
[13:13:40] <_joe_> jhathaway: given john is out, can I bother you about https://gerrit.wikimedia.org/r/c/operations/puppet/+/936692 ?
[13:14:06] <_joe_> it had some unintended consequences, specifically the autocomplete from our known hosts file isn't working as well as it was before
[13:14:21] <_joe_> which was the whole point of the code removed with that patch
[13:14:39] <_joe_> I won't revert without asking about the consequences though
[13:22:49] yup I can look _joe_
[13:24:05] I think the primary fallout of a revert would be, https://phabricator.wikimedia.org/T340947
[13:25:13] I'll look at fixing, or reverting
[13:27:21] <_joe_> I guess fixing is probably less intrusive
[13:28:07] okay, i'll take a look after my meeting, and if I cry uncle, I'll revert to save everyone from developing RSI
[13:36:00] <_joe_> ahah no need :)
[13:55:29] andrewbogott: hi. heads-up, rolling out the pdns-upgrade
[13:57:01] rolling out to dns recs first, so will wait for your confirmation before hitting cloud
[14:07:20] https://debmonitor.wikimedia.org/packages/pdns-recursor cloudservices left, rest is done
[14:21:48] sukhe: I was afk but back now
[14:21:57] if everything is going ok, go ahead and upgrade cloudservices too
[14:22:16] We're about to build out a new one of those hosts, I assume we'll just get the latest version from apt?
[14:23:04] yep, it's in reprepro so you will get whatever is there
[14:23:46] great
[14:23:48] andrewbogott: so confirming, will upgrade cloudservices100[45] and 200[45]-dev
[14:27:59] sukhe: yep!
[14:28:15] thanks
[14:28:18] is this recursor + auth or just the recursor?
[14:28:21] just the recursor
[14:28:27] auth we will do once these are settled in a bit
[14:28:30] so as to not mix the two things
[14:28:37] if you mean gdnsd auth, not pdns-auth :)
[14:28:44] since I know you run pdns-auth unless I am mistaken
[14:29:21] yeah, we're using pdns auth
[14:29:26] because it integrates with Designate
[14:29:42] yep
[14:29:49] this is just for the recursor
[14:31:20] great
[14:33:48] andrewbogott: all done, no issues. thanks!
[14:36:35] lgtm, thanks for waiting until I was back
[15:27:12] per the earlier comment from _joe_ on our ssh known hosts, does anyone know why we add the hostname to our aliases? https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/ssh/manifests/server.pp#74
[15:27:20] rather than only the fqdn
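
The "tab harder" effect from the known-hosts discussion above can be sketched with a simplified model of prefix completion. This is an illustration only: `complete` is a hypothetical helper, not the actual shell or bash-completion logic, and the host lists are made up from the single example named in the log (stat1004). The assumption is that a completer fills in the longest common prefix of all matching entries and only finishes in one keypress when exactly one entry matches.

```python
import os


def complete(prefix, hosts):
    """Simplified model of tab completion over known-hosts entries.

    Returns (text_after_one_tab, finished), where finished is True only
    when exactly one entry matches the typed prefix.
    """
    matches = sorted(h for h in hosts if h.startswith(prefix))
    if not matches:
        # Nothing to complete: the typed text is left unchanged.
        return prefix, False
    # A completer can only safely fill in what all candidates share.
    common = os.path.commonprefix(matches)
    return common, len(matches) == 1


# Hypothetical known-hosts contents: FQDN only, then FQDN plus bare name.
fqdn_only = ["stat1004.eqiad.wmnet"]
with_bare = ["stat1004.eqiad.wmnet", "stat1004"]

print(complete("stat1004", fqdn_only))  # ('stat1004.eqiad.wmnet', True)
print(complete("stat1004", with_bare))  # ('stat1004', False)
```

With only the FQDN present, one tab after typing `stat1004` completes the full name. Once the bare name is also listed, it is both a complete candidate and a prefix of the FQDN, so the first tab adds nothing and a further keypress is needed to choose.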