[09:25:11] Hi! I committed a change in the private puppet repo, with a simple git add + git commit -m "" and saw the following error message
[09:25:11] error: Your local changes to the following files would be overwritten by merge:
[09:25:11] requestctl/request-ipblocks/cloud/alibaba.yaml
[09:25:11] requestctl/request-ipblocks/cloud/aws.yaml
[09:25:11] Please commit your changes or stash them before you merge.
[09:25:12] Aborting
[09:25:12] ⚠️ Something went wrong! Maybe you attempted to rewrite history?
[09:25:28] these requestctl files have nothing to do with my change
[09:25:51] I just wanted to check whether there's something I should do, or if we somehow have local changes we need to clean up?
[09:25:53] hey brouberol !
[09:26:04] I think it is my fault, lemme check
[09:26:13] thanks!
[09:27:32] brouberol: do you have the entire log message to paste by any chance?
[09:28:17] https://phabricator.wikimedia.org/P66943
[09:32:38] yep I think it is my fault, still not sure why
[09:33:20] so the dump_cloud_ip_ranges timer runs on puppetserver1001 atm, not on puppetmaster, and it committed two things in the past 24 hours
[09:33:27] I'm not seeing any diff on puppetmaster1001, but there might have been on some other puppet replicas?
[09:33:28] afaics the commits propagated correctly
[09:36:12] I checked all puppet replicas and they don't have anything in their git status, but the yaml files mentioned above are the ones that dump_cloud_ip did
[09:37:16] should I attempt to re-run .git/hooks/post-commit ?
[09:37:44] nono please
[09:37:48] ack
[09:37:52] I think I found the problem
[09:38:21] for some reason, /var/lib/git/operations/private on puppetmaster1001 shows the diff
[09:38:28] basically the post-commit hook does the following:
[09:39:20] 1) a git pull on puppetmaster1001's /var/lib/git/operations/private, which is the canonical "read-only" copy for puppet masters to use (so staging a change in /srv/private doesn't trigger changes before an explicit commit)
[09:39:35] 2) pushes the update to all puppetmasters and puppetservers
[09:39:48] which will in turn do the same with their local "read-only" canonical copies
[09:43:44] now the files are dated some days ago
[09:43:45] -rw-r--r-- 1 gitpuppet gitpuppet 2468 Jul 23 15:28 requestctl/request-ipblocks/cloud/alibaba.yaml
[09:47:40] ok so I fixed the /var/lib/git/operations/private
[09:47:57] with git reset --hard HEAD^ + git pull (as done by the post-commit hook)
[09:48:27] now puppetmaster1001 is ahead of the rest by one commit, the one from brouberol
[09:48:38] nice, thanks! Do I still need to re-run the post-commit hook to propagate it?
[09:49:09] yeah this is the thing - to make things clean, I'd need to reset --hard your commit; is it ok if you re-stage and redo it?
[09:49:13] just to make sure that all works
[09:49:27] oh sure
[09:49:34] all right lemme do it
[09:49:58] it was a simple enough diff anyway
[09:54:12] brouberol: all good, you can retry
[09:54:20] at the moment I have no idea what happened
[09:55:31] thanks, this time the hook worked!
[09:56:06] nice!
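
The fix described above amounts to re-synchronising the stale "read-only" clone with the canonical history. A minimal sketch of those steps as a shell session, reconstructed from the conversation (the path is the one mentioned above; the post-commit hook's own internals are not shown, and the git status step is only for inspection):

    # on puppetmaster1001, in the read-only canonical clone that showed the stray diff
    cd /var/lib/git/operations/private
    git status              # confirm which files appear as unexpected local changes
    git reset --hard HEAD^  # discard the bad local state, dropping the dangling commit
    git pull                # re-fetch the canonical history, as the post-commit hook normally does
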
[09:56:34] I moved the dump_cloud_ip_ranges timer yesterday (via https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056508) so it seems a weird coincidence
[10:02:02] the idea, IIUC, is that all puppetmasters/puppetservers run the dump_cloud_ip_ranges timer to populate the volatile dir
[10:02:21] but only one, in this case puppetserver1001 (was puppetmaster1001), writes to /srv/private
[10:02:24] and to etcd
[10:02:45] the two recent commits from the timer were propagated everywhere
[10:02:56] but puppetmaster1001's /var/lib/git/operations/private
[10:03:03] was the only one in a weird state
[10:07:32] if anybody has a clue/suggestion/etc.. I am all ears
[10:09:22] the other reason could be that I had to "fix" puppetmaster's git repos a couple of times for failed commits propagated from puppetserver1001
[10:14:23] GitLab needs a short maintenance at 11:00 UTC (in 45 minutes)
[10:18:27] cc: jhathaway (if you have time later on to review my ramblings above lemme know what you think :)
[10:23:19] <_joe_> elukey: I think it's very risky to allow people to still use /srv/private on puppetmaster1001
[10:23:30] <_joe_> if you have moved stuff to write to puppetserver
[10:28:21] <_joe_> in theory it shouldn't, but I kinda have a memory of something similar happening the last time we did this move
[10:37:34] _joe_ exactly, I thought it was basically the same scenario as before, but I can definitely revert the code change and merge it only when we move
[10:37:56] I can't explain why there were unstaged files in /var/lib/git/operations/private though
[10:38:02] as if something added them
[10:56:42] need to run an errand for lunch, but when I am back I'll revert the timer change and carefully inspect all the git repo dirs
[10:56:56] then I'll roll it forward again only when we completely switch
[10:57:08] still not sure what happened, but better to be safe before the weekend
[10:57:11] thanks folks!
[11:06:06] GitLab upgrade done
[11:08:06] ❤️
[12:27:26] elukey _joe_ just to clarify: should anything having to do with the puppet private repo be done on puppetserver1001 instead of puppetmaster1001?
[12:35:00] brouberol: we are prepping a migration to puppetserver1001, since the puppet 5 infra (puppetmaster*) will be deprecated (hopefully) during the next months..
in theory committing from any puppetserver should be fine, but there may be some corner cases
[12:36:58] ack, thanks
[12:37:03] in this case the two commits done by the systemd timer were issued and propagated correctly, but for some reason only one of the puppetmaster1001 repos had that error
[12:37:07] still unclear why
[13:35:16] Added a write-up in https://phabricator.wikimedia.org/T368023#10017817
[13:42:45] elukey: I'll take a look this morning
[13:44:31] <3
[14:17:20] very weird, I was able to repro the issue again
[14:17:38] when I rolled out the revert for the timer on puppetserver1001
[14:17:50] the systemd timer changed, but afaics it wasn't executed
[14:18:28] ah snap, I may know what it could be
[14:18:58] so git diff doesn't show me any difference for the staged files
[14:20:14] no ok, I thought https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056201 but it was only for puppetmasters
[14:28:27] please don't commit any puppet private changes for the moment
[14:34:19] current status https://phabricator.wikimedia.org/T368023#10018036
[14:38:37] ok it seems as if the last dump_cloud_ip_ranges commit is staged for a revert when the systemd unit changes after the puppet run (to remove a flag)
[14:39:19] I am going to unstage the change, but this is really weird
[14:40:36] done, I am very puzzled
[14:42:40] going to roll out the revert also to puppetmaster* nodes
[14:45:52] elukey: what's really weird is systemd doesn't show the unit running since midnight
[14:46:01] exactly yes
[14:51:45] on puppetmasters all good
[14:52:04] I'm baffled
[14:52:23] it seems triggered when "-c" is removed from the unit
[14:52:43] could it be some horror related to ExecStart's parsing in the systemd unit?
[14:59:45] (need to run a quick errand, but all is clear now)
[15:01:03] so we know for sure it ran the `"systemd daemon-reload for ${unit_name} (${title})"`
[15:01:14] I was wondering about this:
[15:01:16] if $restart {
[15:01:18]     # Refresh the service if restarts are required
[15:01:20]     Exec[$exec_label] ~> Service[$unit]
[15:01:22] } else {
[15:01:24]     Exec[$exec_label] -> Service[$unit]
[15:01:26] }
[15:01:36] but I can't find anything that would set $restart=true in this case
[15:01:47] and I think we'd still see something more in the logs from either puppet or systemd
[15:04:41] it took me an embarrassing amount of seconds to see the difference between these two if branches
[15:04:55] I may need to change my font x)
[15:06:30] claime: I think Hack is pretty nice
[15:07:51] yeah, I've also confirmed that .parameters.restart is false for all the resources with titles matching dump_cloud_ip_ranges.* in puppetdb
[15:08:01] so I dunno
[15:23:59] anyway elukey I'm quite baffled and am giving up for now
[15:27:59] cdanis: thanks a lot for checking!
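
A note on the snippet pasted at 15:01: in Puppet, `~>` is the notify arrow (the Exec sends a refresh event to the Service, restarting it when the Exec runs), while `->` only enforces ordering, so with restart=false the daemon-reload Exec should never restart the timer's service. The `.parameters.restart` check mentioned at 15:07 can be reproduced against PuppetDB's v4 resources endpoint; a minimal sketch, assuming the query is run against the cleartext port on the PuppetDB host itself (host, port and the jq filter are illustrative, not necessarily how it was actually checked):

    # list every resource whose title matches dump_cloud_ip_ranges and show its restart parameter
    curl -sG 'http://localhost:8080/pdb/query/v4/resources' \
        --data-urlencode 'query=["~", "title", "dump_cloud_ip_ranges"]' \
      | jq '.[] | {certname: .certname, title: .title, restart: .parameters.restart}'
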
[15:29:41] might be controversial but I'm a big fan of the ligatures in Fira Code https://usercontent.irccloud-cdn.com/file/35nFvhGC/image.png
[15:33:40] that capital E though 😬
[15:34:34] * kamila_ doesn't think that this kind of syntax is a font problem
[15:36:58] cdanis: I tried to zoom way in on it to send you a huge picture but unfortunately it went away when I did
[15:37:17] rzl: out of personal curiosity, do you use an odd-numbered font point size
[15:37:29] I don't know, which is probably the real answer to your question
[15:37:34] haha
[15:38:24] NotLikeThis
[15:38:27] when I moved to this 4k monitor and set up 1.25 scaling, there's a bunch of stuff I tweaked because it looked bad, and a bunch of other stuff I didn't tweak because I didn't really care enough
[15:39:53] I really don't like the lowercase i in Hack >_>
[15:40:01] no never mind, it's 1x scaling after all, and that's 14pt, so who's to say
[16:15:22] 17:08:11 Are we supposed not to guess/tell in public who the colocation provider is? https://diff.wikimedia.org/2024/07/26/the-journey-to-open-our-first-data-center-in-south-america/
[16:15:39] XioNoX: copying from -tech as you're not there and you wrote the blog post ^
[16:16:12] my guess would be that there wasn't any reason to make a big deal of it in the post itself, might be construed as an endorsement
[16:17:16] cdanis: the wikitech DC page also doesn't mention it
[16:17:29] It does say it doesn't follow the normal naming convention
[16:17:40] I didn't write it, but I agree with cdanis
[16:17:50] Links to a private google doc
[16:17:51] well, the 'normal' naming convention is deprecated
[16:17:58] perhaps that's the part that needs updating
[16:18:01] So as far as I can see, it's never been publicly mentioned
[16:18:09] Who the colo provider is
[16:18:21] like we can't change the name of ulsfo, but the facility it's in hasn't been owned by United Layer for years, or something like that
[16:18:27] anyone interested knows where to find that info (peeringdb), but most people probably don't care
[16:18:30] Which is also unusual, for all the others it's public knowledge who it is
[16:18:32] I think that might be an oversight, but I don't think we are intentionally avoiding the name
[16:19:04] yeah
[16:19:11] probably also a side effect of having all that info internally in netbox
[16:20:09] anyone feel free to edit the wikitech page ofc :)
[16:21:15] XioNoX: peeringdb would tell me https://ascenty.com/data-centers/sao-paulo-capital/sao-paulo-3/
[16:22:13] that's right
[16:25:51] I went ahead and did https://wikitech.wikimedia.org/w/index.php?title=Magru_data_center&diff=prev&oldid=2209884
[16:37:45] https://www.mediawiki.org/wiki/Wikimedia_services_policy mentions "implementation guidelines" several times but doesn't link to them. Do they exist somewhere?