[09:25:11] <brouberol>	 Hi! I committed a change in the private puppet repo, with a simple git add + git commit -m "<msg>" and saw the following error message
[09:25:11] <brouberol>	 error: Your local changes to the following files would be overwritten by merge:
[09:25:11] <brouberol>	 	requestctl/request-ipblocks/cloud/alibaba.yaml
[09:25:11] <brouberol>	 	requestctl/request-ipblocks/cloud/aws.yaml
[09:25:11] <brouberol>	 Please commit your changes or stash them before you merge.
[09:25:12] <brouberol>	 Aborting
[09:25:12] <brouberol>	 ⚠️  Something went wrong!  Maybe you attempted to rewrite history? 
[09:25:28] <brouberol>	 these requestctl files have nothing to do with my change 
[09:25:51] <brouberol>	 I just wanted to check whether there's something I should do, or if we somehow have local changes we need to cleanup? 
[09:25:53] <elukey>	 hey brouberol !
[09:26:04] <elukey>	 I think it is my fault, lemme check
[09:26:13] <brouberol>	 thnks!
[09:27:32] <elukey>	 brouberol: do you have the entire log message to paste by any chance?
[09:28:17] <brouberol>	 https://phabricator.wikimedia.org/P66943
[09:32:38] <elukey>	 yep I think it is my fault, still not sure why
[09:33:20] <elukey>	 so the dump_cloud_ip_ranges runs on puppetserver1001 atm, not on puppetmaster, and it committed two things in the past 24 hours
[09:33:27] <brouberol>	 I'm not seeing any diff on puppetmaster1001, but there might have been on some other puppet replicas?
[09:33:28] <elukey>	 afaics the commits propagated correctly 
[09:36:12] <elukey>	 I checked all puppet replicas and they don't have anything in their git status, but the above yaml files mentioned are the ones that dump_cloud_ip did
[09:37:16] <brouberol>	 should I attempt to re-run .git/hooks/post-commit ?
[09:37:44] <elukey>	 nono please
[09:37:48] <brouberol>	 ack
[09:37:52] <elukey>	 I think I found the problem
[09:38:21] <elukey>	 for some reason, /var/lib/git/operations/private on puppetmaster1001 shows the diff
[09:38:28] <elukey>	 basically the post-commit does the following:
[09:39:20] <elukey>	 1) git pull on puppetmaster1001's /var/lib/git/operations/private, that is the canonical copy "read-only" for puppet masters to use (so staging a change in /srv/private doesn't trigger changes before an explicit commit)
[09:39:35] <elukey>	 2) pushes the update to all puppetmaster and puppetservers
[09:39:48] <elukey>	 that will in turn do the same with their local "read-only" canonical versions
[09:43:44] <elukey>	 now the files are dated some days ago
[09:43:45] <elukey>	 -rw-r--r-- 1 gitpuppet gitpuppet 2468 Jul 23 15:28 requestctl/request-ipblocks/cloud/alibaba.yaml
[09:47:40] <elukey>	 ok so I fixed the /var/lib/git/operations/private
[09:47:57] <elukey>	 with git reset --hard HEAD^ + git pull (as done by the post-commit hook)
[09:48:27] <elukey>	 now puppetmaster1001 is ahead of the rest for one commit, the one from brouberol 
[09:48:38] <brouberol>	 nice, thanks! Do I still need to re-run the post-commit hook to propagate it?
[09:49:09] <elukey>	 yeah this is the thing - to make things clean, I'd need to reset --hard your commit, is it ok if you have to re-stage and do it?
[09:49:13] <elukey>	 just to make sure that all works
[09:49:27] <brouberol>	 oh sure
[09:49:34] <elukey>	 all right lemme do it
[09:49:58] <brouberol>	 it was a simple enough diff anyway
[09:54:12] <elukey>	 brouberol: all good, you can retry
[09:54:20] <elukey>	 at the moment I have no idea what happened
[09:55:31] <brouberol>	 thanks, this time, the hook worked!
[09:56:06] <elukey>	 nice!
[09:56:34] <elukey>	 I moved the dump_cloud_ip_ranges timer yesterday (via https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056508) so it seems a weird coincidence
[10:02:02] <elukey>	 the idea, IIUC, is that all puppetmasters/puppetservers run the dump_cloud_ip_ranges timer to populate the volatile dir
[10:02:21] <elukey>	 but only one, in this case puppetserver1001 (was puppetmaster1001) writes to /srv/private
[10:02:24] <elukey>	 and to etcd
[10:02:45] <elukey>	 the recent two commits from the timer were propagated everywhere
[10:02:56] <elukey>	 but puppetmaster1001's /var/lib/git/operations/private
[10:03:03] <elukey>	 was the only one in a weird state
[10:07:32] <elukey>	 if anybody has a clue/suggestion/etc.. I am all ears
[10:09:22] <elukey>	 the other reason could be that I had to "fix" a couple of times puppetmaster's git repos for failed commits propagated from puppetserver1001
[10:14:23] <jelto>	 GitLab needs a short maintenance at 11:00 UTC (in 45 minutes)
[10:18:27] <elukey>	 cc: jhathaway (if you have time later on to review my ramblings above lemme know what you think :)
[10:23:19] <_joe_>	 elukey: I think it's very risky to allow people to still use /srv/private on puppetmaster1001
[10:23:30] <_joe_>	 if you have moved stuff to write to puppetserver
[10:28:21] <_joe_>	 in theory it shouldn't, but I kinda have a memory of something similar happening the last time we did this move
[10:37:34] <elukey>	 _joe_ exactly I thought it was basically the same scenario as before, but I can definitely revert the code change and merge it only when we move
[10:37:56] <elukey>	 I don't explain why there was unstaged files in /var/lib/git/operations/private though
[10:38:02] <elukey>	 as if something added them
[10:56:42] <elukey>	 need to run errand for lunch, but when I am back I'll revert the timer change and inspect carefully all git repo dir
[10:56:56] <elukey>	 then I'll roll it forward again only when we completely switch
[10:57:08] <elukey>	 still not sure what happened, but better to be safer before the weekend
[10:57:11] <elukey>	 thanks folks!
[11:06:06] <jelto>	 GitLab upgrade done
[11:08:06] <akosiaris>	 ❤️
[12:27:26] <brouberol>	 elukey _joe_ just to clarify: should anything having to do with the puppet private repo be done on puppetserver1001 instead of puppetmaster1001?
[12:35:00] <elukey>	 brouberol: we are prepping a migration to puppetserver1001, since the puppet 5 infra (puppetmaster*) will be deprecated (hopefully) during the next months.. in theory committing from any puppetserver should be fine, but there may be some corner cases
[12:36:58] <brouberol>	 ack, thanks
[12:37:03] <elukey>	 in this case the two commits done by the systemd timer were issued and propagated correctly, but for some reason only on one of the puppetmaster1001 repos there was that error
[12:37:07] <elukey>	 still unclear why
[13:35:16] <elukey>	 Added a write-up in https://phabricator.wikimedia.org/T368023#10017817
[13:42:45] <jhathaway>	 elukey: I'll take a look this morning
[13:44:31] <elukey>	 <3
[14:17:20] <elukey>	 very weird, I was able to repro the issue again
[14:17:38] <elukey>	 when I rolled out the revert for the timer on puppetserver1001
[14:17:50] <elukey>	 the systemd timer changed, but afaics it wasn't executed
[14:18:28] <elukey>	 ah snap I may know what could be
[14:18:58] <elukey>	 so git diff doesn't show me any difference for the staged files
[14:20:14] <elukey>	 no ok I thought https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056201 but it was only for puppetmasters
[14:28:27] <elukey>	 please don't commit any puppet private changes for the moment
[14:34:19] <elukey>	 current status https://phabricator.wikimedia.org/T368023#10018036
[14:38:37] <elukey>	 ok it seems as if the last dump_cloud_ip_ranges commit is staged for a revert, when the systemd unit changes after the puppet run (to remove a flag)
[14:39:19] <elukey>	 I am going to unstage the change, but this is really weird
[14:40:36] <elukey>	 done, I am very puzzled
[14:42:40] <elukey>	 going to rollout the revert also to puppetmaster* nodes
[14:45:52] <cdanis>	 elukey: what's really weird is systemd doesn't show the unit running since midnight
[14:46:01] <elukey>	 exactly yes
[14:51:45] <elukey>	 on puppetmasters all good
[14:52:04] <cdanis>	 I'm baffled
[14:52:23] <elukey>	 it seems triggered when "-c" is removed from the unit
[14:52:43] <elukey>	 could it be some horror related to ExecStart's parsing in the systemd unit?
[14:59:45] <elukey>	 (need to run a quick errand, but all is clear now)
[15:01:03] <cdanis>	 so we know for sure it ran the `"systemd daemon-reload for ${unit_name} (${title})"`
[15:01:14] <cdanis>	 I was wondering about this:
[15:01:16] <cdanis>	             if $restart {
[15:01:18] <cdanis>	                 # Refresh the service if restarts are required
[15:01:20] <cdanis>	                 Exec[$exec_label] ~> Service[$unit]
[15:01:22] <cdanis>	             } else {
[15:01:24] <cdanis>	                 Exec[$exec_label] -> Service[$unit]
[15:01:26] <cdanis>	             }
[15:01:36] <cdanis>	 but I can't find anything that would set $restart=true in this case
[15:01:47] <cdanis>	 and I think we'd still see something more in the logs from either puppet or systemd
[15:04:41] <claime>	 it took me an embarassing amount of seconds to see the difference between these two if branches
[15:04:55] <claime>	 I may need to change my font x)
[15:06:30] <cdanis>	 claime: I think Hack is pretty nice
[15:07:51] <cdanis>	 yeah I've also confirmed that .parameters.restart is false for all the resources with titles matching dump_cloud_ip_ranges.* in puppetdb
[15:08:01] <cdanis>	 so I dunno
[15:23:59] <cdanis>	 anyway elukey I'm quite baffled and am giving up for now
[15:27:59] <elukey>	 cdanis: thanks a lot for checking!
[15:29:41] <rzl>	 might be controversial but I'm a big fan of the ligatures in Fira Code https://usercontent.irccloud-cdn.com/file/35nFvhGC/image.png
[15:33:40] <cdanis>	 that capital E though 😬
[15:34:34] * kamila_ doesn't think that this kind of syntax is a font problem
[15:36:58] <rzl>	 cdanis: I tried to zoom way in on it to send you a huge picture but unfortunately it went away when I did
[15:37:17] <cdanis>	 rzl: out of personal curiosity, do you use an odd-numbered font points size
[15:37:29] <rzl>	 I don't know, which is probably the real answer to your question
[15:37:34] <sukhe>	 haha
[15:38:24] <cdanis>	 NotLikeThis
[15:38:27] <rzl>	 when I moved to this 4k monitor and set up 1.25 scaling, there's a bunch of stuff I tweaked because it looked bad, and a bunch of other stuff I didn't tweak because I didn't really care enough
[15:39:53] <claime>	 I really don't like the lowercase i in Hack >_>
[15:40:01] <rzl>	 no never mind it's 1x scaling after all, and that's 14pt, so who's to say
[16:15:22] <RhinosF1>	 17:08:11 <Nemo_bis> Are we supposed not to guess/tell in public who the colocation provider is? https://diff.wikimedia.org/2024/07/26/the-journey-to-open-our-first-data-center-in-south-america/
[16:15:39] <RhinosF1>	 XioNoX: copying from -tech as you're not there and you wrote the blog post ^
[16:16:12] <cdanis>	 my guess would be that there wasn't any reason to make a big deal of it in the post itself, might be construed as an endorsement
[16:17:16] <RhinosF1>	 cdanis: the wikitech DC page also doesn't mention it
[16:17:29] <RhinosF1>	 It does say it doesn't follow the normal naming convention
[16:17:40] <XioNoX>	 I didn't write it, but I agree with cdanis 
[16:17:50] <RhinosF1>	 Links to a private google doc
[16:17:51] <cdanis>	 well, the 'normal' naming convention is deprecated
[16:17:58] <cdanis>	 perhaps that's the part that needs updating
[16:18:01] <RhinosF1>	 So as far as I can see, it's never been publicly mentioned
[16:18:09] <RhinosF1>	 Who the colo provider is
[16:18:21] <cdanis>	 like we can't change the name of ulsfo, but the facility that's in hasn't been owned by United Layer for years, or something like that
[16:18:27] <XioNoX>	 anyone interested knows where to find that info (peeringdb), but most people probably don't care
[16:18:30] <RhinosF1>	 Which is also unusual, it's public knowledge for all the others who it is
[16:18:32] <sukhe>	 I think that might be an oversight but I don't think that is intentional that we are avoiding the name
[16:19:04] <cdanis>	 yeah
[16:19:11] <XioNoX>	 probably a side effects also of having all that info internally in netbox
[16:20:09] <cdanis>	 anyone feel free to edit the wikitech page ofc :)
[16:21:15] <RhinosF1>	 XioNoX: peering db would tell me https://ascenty.com/data-centers/sao-paulo-capital/sao-paulo-3/
[16:22:13] <cdanis>	 that's right
[16:25:51] <RhinosF1>	 I went ahead and did https://wikitech.wikimedia.org/w/index.php?title=Magru_data_center&diff=prev&oldid=2209884
[16:37:45] <cdanis>	 https://www.mediawiki.org/wiki/Wikimedia_services_policy mentions "implementation guidelines" several times but doesn't link to them.  Do they exist somewhere?