[07:01:00] Gerrit needs a short restart in around one hour
[08:03:19] volans: I'm on linux :}, I'm testing it again this morning, and it seems to be working 🤞, will do more tests before removing the old keys
[08:07:50] Gerrit reboot happening now
[08:16:27] Gerrit should be back
[08:19:05] <_joe_> not really
[08:19:11] <_joe_> fatal: unable to access 'https://gerrit.wikimedia.org/r/p/operations/puppet.git/': Recv failure: Connection reset by peer
[08:19:58] <_joe_> it's possible the http clone url has changed?
[08:20:11] <_joe_> works now
[08:22:13] oh that's surprising. Cloning the puppet repo here works fine again too
[08:31:07] <_joe_> the url *has* changed btw but there's an automatic redirect at least
[08:34:00] but that's most likely unrelated to the maintenance reboot. That was just a kernel update
[09:11:28] I just seen Info: Applying configuration version '(462263005c) Klausman - chore: add dpogorzelski to ops-limited'
[09:11:40] please don't forget to follow the commit message conventions of the puppet repo
[09:11:48] ack
[09:12:12] (I only merged it, didn't write the commit msg, but will relay)
[09:12:37] shouldn't that change have had a corresponding sre-access-requests ticket?
[09:12:45] that too :(
[09:14:59] Dangit, I'll investigate
[09:15:43] actually.... I'm gonna revert that one
[09:15:47] I can't find a task
[09:17:41] yeah, the checklist we used conveniently skips over needing a ticket
[09:17:45] +1
[09:17:56] klausman: the approval for that group mentions question_mark
[09:18:01] where is his approval?
[09:18:35] https://office.wikimedia.org/wiki/Technology/Onboarding/Checklists/Dawid_Pogorzelski#Site_Reliability_Engineering
[09:18:49] sorry if i missed some steps and the confusion :)
[09:18:51] In this checklist, there is no mention of a ticket/approval needed
[09:19:58] klausman: approvals are defined on data.yaml itself
[09:20:20] regarding "please don't forget to follow the commit message conventions of the puppet repo" where can i find information about that?
[09:20:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1197978 is the group change, and it depends on the user change
[09:20:42] dpogorzelski: https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines
[09:23:03] 👍
[09:23:50] if there is a template for the onboarding guide i could update it with the extra info
[09:24:00] I've reverted the ops-limited CR, please seek approval and I'll be happy to approve/merge that
[09:24:12] sorry for all the fuzz dpogorzelski :)
[09:24:32] It's on me, really. I should have remembered the access ticket requirement
[09:24:54] tbh we should make the CI enforce a referenced ticket when changing access
[09:25:53] 👍
[09:26:14] We'll follow https://wikitech.wikimedia.org/wiki/SRE/Production_access#Access_Request_Process now
[09:26:24] thx <3
[09:36:23] phabricator down?
[09:36:48] not for me no
[09:37:28] also phab metrics look quite normal to me
[09:37:54] working here
[09:38:20] aol
[09:38:23] Seems to be back yep
[09:38:25] Thank you all
[11:05:01] GitLab needs a short maintenance break in one hour (12:00 UTC)
[12:13:10] GitLab maintenance done
[14:04:40] I'm running into partman issues with a config B 4-drive host and could really use a second server to see if I can reproduce there. Does anyone have an insetup Dell server with 4 drives I can use as a lab rat?
[15:20:26] should the access request be assigned to someone in specific or it will be picked up by someone from the project?
[15:23:28] the sre clinic duty person will deal with that, as explained on the #sre-access-requests project description
[15:54:02] <_joe_> dpogorzelski: hey welcome aboard!
[16:01:11] by any chance, does anyone have any vm with a spare/unused host with nftables ?
[16:01:21] I would like to do some production testing
[16:04:29] alternatively, I could ask for a temporary vm just for testing and remove it afterwards
[16:04:40] jynus: just some commands that dont change stuff permanently? you can use zuul2002.codfw.wmnet
[16:04:54] trixie with nft
[16:05:13] monitoring is downtimed. hf
[16:05:14] I will be just moving fake empty files around, but I would prefer it wasn't in use
[16:05:28] not in use
[16:05:32] ah
[16:05:35] thanks then
[16:05:51] yep.np
[16:06:17] I will give a ping here when I start using it (I intend to just write some files to $home and then delete them)
[16:06:24] and when I stop using it too
[16:06:38] sounds good!
[16:07:21] thank you a lot
[16:51:28] During reimage, I would like the debian installer to pause right before the reboot so I can see what it looks like before the (destined to fail) reboot. Is there any good way to do that?
[17:14:24] _joe_: thanks :)
[17:16:22] mutante: sadly zuul2002 won't work for me, as it is still using iptables
[17:22:03] andrewbogott: if I recall correctly, you can disable puppet on apt1002, comment out the two reboot_in_progress config options, then run your install
[17:22:14] jynus: ah, right! that's because of docker. I only checked if nft is installed. but dont worry.. I am changing that right now
[17:25:18] jhathaway: huh, presumably that will mess with anyone else who is also reimaging...
[17:26:10] But if I wait a few hours that will be noone
[17:43:51] andrewbogott: true, but with a little communication, it is usually okay
[18:07:56] jynus: I made some changes. zuul2002 uses nft now and has insetup role
[18:08:36] no worries, I think I will use another host, but thanks for offering
[18:09:06] is there a reason left to use another host now?
[18:09:13] I just did that so you can use it
[18:09:30] not really, just I didn't want to bother you with this
[18:09:56] I don't want to put the burden on you
[18:10:10] it's already done
[18:10:17] in any case, I am finishing my day
[18:10:21] so no more testing for now
[18:10:23] not using it means only reverting
[18:10:35] then let me use it tomorrow
[18:10:46] I just didn't want to make you work more
[18:11:02] 0:-)
[18:11:11] alright. use it tomorrow. enjoy the rest of the day. laters
[19:30:33] jhathaway: now that I'm following up... I don't understand what you mean by 'the two reboot_in_progress config options'. Where would those be? And is it really on the apt server and not on install100x ?
[19:31:14] andrewbogott: grep reboot /srv/autoinstall/common.cfg
[19:31:20] those two ^
[19:31:29] thx
[19:31:30] on apt1002
[19:31:43] if that doesn't work, I can try to find my notes and reproduce
[19:31:51] ok! trying...
[19:48:03] debug an issue, search on google for debian bug report, find bug report, read reply that Morit.z asked about this issue on IRC in **2017**, 🤦, at least I didn't find my own bug report, which has happened to me before.
[19:48:44] the install pause is working!
[19:49:11] andrewbogott: great!
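
Editorial note on the 09:24:54 suggestion ("make the CI enforce a referenced ticket when changing access"): a minimal sketch of what such a check could look like, not an existing Wikimedia CI job. It assumes the task reference is a `Bug: T<number>` footer line as described in the linked commit message guidelines, and that an access change can be recognised by a changed file named `data.yaml` (the file mentioned at 09:19:58). The function names and the path match are hypothetical.

```python
import re

# Assumption: a task reference is a footer line of the form "Bug: T123456".
BUG_TRAILER = re.compile(r"^Bug: T\d+$", re.MULTILINE)


def has_task_reference(commit_message: str) -> bool:
    """Return True if the commit message contains a 'Bug: T<number>' line."""
    return BUG_TRAILER.search(commit_message) is not None


def check_access_change(commit_message: str, changed_files: list[str]) -> list[str]:
    """Return a list of error strings; an empty list means the check passes.

    Hypothetical rule: any change touching a file named data.yaml is treated
    as an access change and must reference a task.
    """
    errors = []
    touches_access = any(
        path == "data.yaml" or path.endswith("/data.yaml") for path in changed_files
    )
    if touches_access and not has_task_reference(commit_message):
        errors.append("access change without a 'Bug: T<number>' task reference")
    return errors


if __name__ == "__main__":
    message = "chore: add dpogorzelski to ops-limited"
    for error in check_access_change(message, ["modules/admin/data/data.yaml"]):
        print(error)
```

A real job would read the message and file list from the change under review; the sketch keeps both as plain arguments so the rule itself stays easy to test.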