[08:32:45] TIL you can do e.g. TZ=UTC date -d 'TZ="America/New_York" 19:00'
[08:44:02] neat
[09:32:23] hello oncallers
[09:32:39] I am going to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054894 to create the new ops-limited group
[09:32:58] it is going to be deployed everywhere, so if you see failed puppet runs it might be me
[10:26:27] Scary! fingers crossed
[10:26:54] it is already half deployed, looks good
[10:27:02] fleet-wide sudoers edits are just.... *shudder*
[10:27:22] otoh, everybody knows they are high-risk so review is intense :)
[10:27:38] ?
[10:28:05] Well, doing a fleet-wide edit of sudoers makes everyone especially paranoid, so you get more/better review.
[10:28:36] whereas it's the simple "doesn't even touch anything critical" edits that then are the final domino in place that get ya
[10:29:14] we have solit puppet code for admin, plus the allowed commands are basically a little more than read-only
[10:29:32] (and the number of new users limited)
[10:29:53] the group will be used by dcops and new SREs
[10:29:57] ack.
[10:30:03] (tappo*f is already on it for example)
[10:30:17] I'll send an email later on
[10:30:25] (sre-admins is to be considered deprecated)
[10:31:16] Fortunately, puppet works without sudo, that was not the case in a place I used to work at. You can imagine how wonderfully that broke everything one day.
[10:34:30] IIRC we had a case once in which something broke puppet client in a way that wasn't fixable via puppet but luckily a sed run via cumin solved the issue :D
[10:38:41] always nice to have _several_ OOB accesses :)
[11:07:43] Who knows about Blubber? There's a question in talk_to_sre on slack about layers (and removing build artifacts from images)
[12:10:47] thanks s.lyngs :)
[13:18:21] I want to export nginx access logs into a prometheus metric. Before I start building a new thing, are we doing that in our infra already someplace?
[13:18:48] andrewbogott: not specifically nginx, but usually mtail or benthos is used for that kind of task
[13:19:32] ok, I will look for that in the puppet repo. It looks like there's a project specifically designed for nginx logs -> prometheus but it might be overkill for what I need
[13:42:44] folks please avoid any change to puppet private for the next 10/15 mins if possible, or ping me if you have to do it urgently
[13:46:31] nice, /var/lib/git/operations/private on puppetmaster1001 has unstaged content
[13:48:27] there are some files owned by root and not gitpuppet
[13:50:41] elukey: those are autogenerated
[13:50:43] by the timer
[13:51:10] that updates them every day
[13:53:41] volans: the context is in https://phabricator.wikimedia.org/T368023, something weird happens
[13:53:52] the timer that you refer to commits to /srv/private
[13:54:14] I keep finding staged (not committed) content under /var/lib/git/operations/private
[13:54:18] that shouldn't happen
[13:58:24] at this point I am wondering if this has been ongoing for a bit of time or not
[13:59:04] Balthazar failed to private-commit the last time, so I suspect that anything like that should cause issues
[14:50:47] akosiaris: you around? I have some questions about deploy1002/1003 and I've been told you're the person to contact :)
[14:53:16] tl;dr; (I think) deploy1002:/srv/deployment/netbox/deploy/.git/objects/info/packs is owned by `mwdeploy wikidev` but the same file on deploy1003 is owned by `debmonitor wikidev`, which causes the deploy cookbook to fail as it tries to access the files as `mwdeploy`
[14:54:42] it smells like UID mismatch and rsync that attempted to preserve UIDs
[14:54:47] that's one example file, but there are others in the same case
[14:55:08] akosiaris: there is a parsoid cleanup required in the DNS repo as well I think
[14:55:12] Jul 30 14:32:42 dns7001 gdnsd[1431224]: Name 'parsoid-php.discovery.wmnet.': resolver plugin 'metafo' rejected resource name 'disc-pa>
[14:55:24] and seems like the same happened in all the directories
[14:56:04] I am patching it
[15:09:34] You shouldn't be using deploy1002 anymore
[15:09:42] it's going to be decommissioned
[15:09:58] ah, the uid is wrong on deploy1003
[15:10:00] mb
[15:10:26] yeo
[15:10:28] yep
[15:11:10] yeah, uid mismatch
[15:11:15] cgoubert@deploy1002:~$ id mwdeploy
[15:11:16] uid=499(mwdeploy) gid=499(mwdeploy) groups=499(mwdeploy)
[15:11:26] cgoubert@deploy1003:~$ id debmonitor
[15:11:28] uid=499(debmonitor) gid=499(debmonitor) groups=499(debmonitor)
[15:11:45] o/
[15:12:19] ok 1st things first, let me fix the DNS part
[15:13:02] the other ones, I think I know what they are about, systemd::sysusers, the way we use it doesn't setup specific uids
[15:13:05] but rather it's a race
[15:13:18] which is also why the imagecatalog was failing after every switchover
[15:13:26] I have fixed that one, I didn't dare touch the rest though
[15:13:46] but apparently I have to sync them between the 2
[15:14:15] deploy1002 shouldn't be used btw, I haven't decommissioned it today due to some scap issues where we needed a guinea pig
[15:14:20] but otherwise stay clear ot if
[15:14:25] of it*
[15:15:03] sukhe: do you have a parsoid-php DNS change already? or should I craft one?
[15:15:19] akosiaris: https://gerrit.wikimedia.org/r/c/operations/dns/+/1058189
[15:15:23] thanks
[15:15:29] XioNoX: is the uid diff causing you issues?
[15:16:46] akosiaris: I fixed it manually for my repo, but it might impact others
[15:16:56] but yeah that was the issue
[15:18:50] akosiaris: do you want to run authdns-update or should I do it? happy either way
[15:19:00] sukhe: already done
[15:19:08] thanks <3
[15:19:30] looks good now
[15:24:41] every re-image is going to be a damn race isn't it..
[15:28:10] XioNoX: ok, so on deploy1002, why is /srv/deployment/netbox owned by mwdeploy but /srv/deployment/netbox-dev by trebuchet?
[15:28:51] I won't comment the fact that in 2024 I am still seeing trebuchet even though we got rid of it 5+ years ago
[15:29:41] akosiaris: probably leftover from an ancient time, the issue was the .git directory that didn't have the proper owner
[15:30:53] akosiaris: so I fixed the owner of /srv/deployment/netbox/deploy/.git/ on deploy1003, and now on deploy1002 it shows up as `keyholder`
[15:31:21] while it was `mwdeploy` on 1002 before I fixed it on 1003 :)
[15:31:36] so yeah, probably a uid sync going wrong :)
[15:31:42] keyholder is the wrong uid though
[15:32:02] but if you fix it on 1002 it's going to be wrong on 1003 the second after :)
[15:32:07] akosiaris: keyholder on deploy1002 has the same uid as mwdeploy on 1003
[15:32:11] no, it's the other way around
[15:32:23] 1003 is being synced to 1002
[15:32:50] if 1002 is going away, then maybe it's just fine to wait it out, I fixed my immediate issue
[15:41:08] apparently dockerhub is giving me the opportunity to go from 99 known vuln to "only" 51 vulnerabilities
[15:41:19] php:apache-bullseye https://hub.docker.com/layers/library/php/apache-bullseye/images/sha256-ee9228058b1efb077a6f175556c6ed7609234132d368a63b25ac3edec842094b?context=explore
[15:41:25] bookworm:
https://hub.docker.com/layers/library/php/apache-bookworm/images/sha256-9f9dfcdafdc40a38b0bbce87bea746250798c085e91f7fd3dda9c16a5610c41b?context=explore
[15:41:29] used by Codesearch frontend.
[18:15:56] urbanecm: doesn't IRCCloud support slack actually https://blog.irccloud.com/slack-integration/
[18:16:21] So maybe that could work for cscott
[18:16:33] I moved from -operations cause busy doing actual stuff
[18:17:08] Ye it's still there on desktop
[19:28:29] RhinosF1: the button is there. never used it though. i just have slack installed
[20:10:56] hnowlan, could you please respond on T361381? Even if it's just to say that you don't know/don't care?
[20:10:56] T361381: Replace deployment-maps-master01 with a Bullseye or Bookworm instance - https://phabricator.wikimedia.org/T361381
[20:13:58] similarly, btullis, can you respond to https://phabricator.wikimedia.org/T370465?
[20:14:19] * andrewbogott loves this skiptracer side gig
[20:17:34] I am out on leave this week, back next week. Please could you direct it to someone else in Data Platform Engineering SRE if it's urgent. Thanks.
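Note on the 15:11 uid mismatch (uid 499 is `mwdeploy` on deploy1002 but `debmonitor` on deploy1003): a quick comparison of two hosts' passwd data can surface this kind of clash. Below is a minimal Python sketch, assuming each host's `getent passwd` output has been saved as text. The function names and the sample passwd lines (home directories, shells, the root entry) are hypothetical; only the uid 499 pairing comes from the log.

```python
def parse_passwd(text):
    """Map uid -> user name from passwd-format text (name:x:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # skip malformed lines
        users[int(fields[2])] = fields[0]
    return users


def uid_mismatches(passwd_a, passwd_b):
    """Return {uid: (name_on_a, name_on_b)} for uids that name different users."""
    a, b = parse_passwd(passwd_a), parse_passwd(passwd_b)
    return {
        uid: (a[uid], b[uid])
        for uid in sorted(a.keys() & b.keys())
        if a[uid] != b[uid]
    }


# Hypothetical sample data: only the uid 499 names mirror the log above.
DEPLOY1002 = """\
root:x:0:0:root:/root:/bin/bash
mwdeploy:x:499:499::/var/lib/mwdeploy:/bin/bash
"""
DEPLOY1003 = """\
root:x:0:0:root:/root:/bin/bash
debmonitor:x:499:499::/nonexistent:/usr/sbin/nologin
"""

if __name__ == "__main__":
    for uid, (name_a, name_b) in uid_mismatches(DEPLOY1002, DEPLOY1003).items():
        print(f"uid {uid}: {name_a} on deploy1002, {name_b} on deploy1003")
```

If file ownership is copied between hosts by number rather than by name, any uid reported here will show up under a different owner on the other side, which matches what was seen on the netbox `.git` directory.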