[06:59:43] I tried today to use homer to finish adding parse1001 to the cluster for T359387, but met a huge diff in the firewall and bailed out. In the end, I just added the BGP neighbors manually, but someone with context for what all those firewall changes are probably wants to take a look
[06:59:48] T359387: Cleanup parsoid-php service - https://phabricator.wikimedia.org/T359387
[07:02:44] akosiaris: will do, thx
[07:43:18] hi, I have sent a couple patches to change the `git::clone` for homer public and private repos which change the permissions of the repos on disk: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056981 https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056985/
[07:43:18] Each commit has the rationale and PCC output to ease the review/deploy ;)
[07:43:36] I have done similar changes on other repos, the ultimate goal is to remove the `umask` parameter from `git::clone`
[08:09:02] hashar: hi! We'll review it asap :)
[08:13:13] :]
[08:13:39] I am pretty sure they are both fine but I don't know the cumin/homer contexts :/
[08:33:02] hashar: I am wondering why https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056981 stops you from removing the umask parameter ... I think that we wanted 440 to avoid any accidental changes to those files, it is public but we use it on cumin nodes for delicate things
[08:33:11] so if possible I'd not change its mode
[08:33:26] the other one for homer private looks fine, I can definitely merge it
[08:36:23] in the WIP change to remove umask, I am computing the umask based on the requested mode
[08:36:55] so my change removes the umask parameter from all definitions
[08:37:22] for homer, the requested mode is 0440 which results in my change to compute a umask of 337
[08:37:53] and since currently the mode is set but the umask is not set, git::clone defaults it to 022
[08:38:11] and on my wip change that shows up as a diff for the cumin host https://puppet-compiler.wmflabs.org/output/927986/3425/cumin1002.eqiad.wmnet/index.html ;)
[08:38:19] - umask => 022
[08:38:20] + umask => 337
[08:38:31] so the change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056981 is to eventually get a noop in the follow up change
[08:38:46] I have made a puppet change per diff to ultimately end up with a noop change ;)
[08:39:32] then for the change of mode, I have the clone of graphana-grizzly writable as well with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1054892
[08:39:47] cause given `git::clone` uses `ensure => latest` it would erase local modification on each puppet run
[08:40:09] it also drops the "ops" group so that the repo will be owned by root
[08:40:57] which I guess would prevent one from doing unexpected modifications (unless you are acting as root of course)
[08:41:44] eventually long term I'd like to remove `mode` from the `git::clone` in favor of having a parameter that defines what one wants (public, read-only, group-private, user-private) something like that
[08:44:27] hashar: sure what I am saying is that we can do a no-op simply setting umask for the homer public repo too, without changing its mode
[08:44:30] same thing as private
[08:44:44] no-op for the final change I mean
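(Editor's aside, not part of the log.) The mode-to-umask arithmetic mentioned at 08:37:22 can be checked with a minimal sketch. The helper names `umask_for_mode` and `apply_umask` are hypothetical and this is not how `git::clone` is actually implemented in Puppet; it only assumes the usual rule that a umask clears the permission bits it contains.

```python
def umask_for_mode(mode: int) -> int:
    """Return the umask that clears every permission bit absent from `mode`.

    Hypothetical helper for illustration only.
    """
    return 0o777 & ~mode


def apply_umask(requested: int, umask: int) -> int:
    """Return the permissions a new file gets when created under `umask`."""
    return requested & ~umask


# Requested mode 0440 gives umask 337, as quoted in the log.
print(f"{umask_for_mode(0o440):03o}")
# The default mode 0755 corresponds to the default umask 022.
print(f"{umask_for_mode(0o755):03o}")
# A regular file requested as 0666 under umask 337 ends up as 0440.
print(f"{apply_umask(0o666, 0o337):03o}")
```

This matches the `- umask => 022` / `+ umask => 337` diff quoted above: with a mode of 0440 and no explicit umask, the computed value differs from the 022 default.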
[08:45:08] it seems more consistent with what's already there
[08:45:13] to rephrase, the 0444 mode is not needed after changing the group from `ops` to `root` ;)
[08:46:59] but surely I could have set the umask to 337, but removing mode/group would still make the repo read-only for non root
[08:47:24] and that has the advantage of making the git::clone slightly simpler
[08:47:29] yeah but ops folks can read it without sudo, while with your change they will have to use it
[08:47:38] I think this was the original aim
[08:47:56] so let's use what's already there please :)
[08:48:35] ops folks will be able to read it since the mode changes to 0755
[08:48:38] (readable by others)
[08:49:10] sure, but we prefer 0440 :)
[08:50:02] XioNoX, topranks - do you have an idea where a string like "Generated successfully, see the output tab for result." could be generated between Homer/Netbox?
[08:50:25] context is https://github.com/netbox-community/pynetbox/pull/632 - upstream asked me what is that use case
[08:50:35] elukey: https://github.com/netbox-community/pynetbox/pull/632#issuecomment-2262383509 :)
[08:50:40] I don't see why you need to not make it readable by others given the repository IS public anyway ( https://gerrit.wikimedia.org/g/operations/homer/public/+/refs/heads/master )
[08:51:12] XioNoX: aahhh thanks!
[08:52:31] hashar: readable by others, like 444 would be fine, but 755 is different, and I am pretty sure whoever set those perms didn't want write capabilities for those files. The repo is public, but its use on cumin nodes is different from the fact that we want to set some file perms.
[08:53:23] and I am just suggesting to keep things as they are, since it will not block your work
[08:53:49] fixing the umask is fine, let's do it and move on :)
[08:56:46] netbox, Infrastructure-Foundations: Change icinga link to alerts.w.o in netbox device page - https://phabricator.wikimedia.org/T371079#10034638 (ayounsi) +1 to keep both for now. @fgiunchedi can you double check the link ? https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org&q=...
[09:00:18] elukey: it is not just about fixing the umask, ultimately I want to remove the mode parameter as well :D For the homer public repo, after my change it will still not be writable since it would then be owned by root:root
[09:00:45] hence why we can drop the 0444 mode, that is covered by changing the group ownership from ops to root :)
[09:00:56] (and that is one less mode I have to migrate later)
[09:01:46] hashar: and what mode parameters would you like to provide? This is a bigger change than the one you anticipated :)
[09:02:08] well in the case of the public repo, the default of 0755 and root:root would work
[09:02:15] aka the default
[09:02:32] netbox, Infrastructure-Foundations: Markdown bug in Netbox-next - https://phabricator.wikimedia.org/T340444#10034645 (ayounsi) Open→Resolved a:ayounsi Going to close that one as I can't reproduce it on Netbox 4. Please reopen if needed.
[09:04:38] or I guess I can do another change that simply fixes the umask to the correct 0337 and rephrase my change that is changing the mode/group to the default
[09:04:41] splitting the concerns
[09:06:06] I'd be grateful, and then we can think about the modes to offer by default for git::clone - I get what you are doing, but having repos used by tools with the writable flag is something that I have always been scared about, they are supposed to be read-only and it should be reflected in the file perms
[09:07:21] isn't the tool running as homer?
[09:07:51] but yeah I will split my change
[09:08:04] homer is run as our own users
[09:08:12] no sudo, just type `homer xxx`
[09:09:37] so having the public repo owned by root:root and 755 will keep it read-only
[09:09:41] but I will split that to another change ;)
[09:11:36] thanks :)
[09:12:05] then I am not sure what the extra steps bring in :/
[09:32:39] elukey: so I have set the umask for both the private and public homer repos in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056985 :]
[09:33:04] so it is now one simple change for both public repos, which is indeed easier to reason about!
[09:33:15] sorry I got lost in my thoughts earlier
[09:35:02] ooooooookk and please feel free to discuss anything in here when you want :) Thanks for the chat and the change in the code review <3
[09:35:13] yeah it is a mess sorry
[09:35:24] I have spent too much time trying to untangle git::clone :D
[09:35:42] so it took me a while to understand that we could just set the umask
[09:35:52] that is what I actually did for the Puppet masters
[09:36:19] and removing `mode` is another concern entirely indeed but I got lost
[09:42:04] thanks for the work :)
[09:42:47] * hashar lunches
[09:50:14] netops, Infrastructure-Foundations, SRE: Model GRE tunnels in Netbox - https://phabricator.wikimedia.org/T369351#10034735 (ayounsi)
[09:50:16] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10034736 (ayounsi)
[10:34:05] netbox, Infrastructure-Foundations: Netbox: capirca.getHosts script runs into timeout - https://phabricator.wikimedia.org/T358339#10034793 (ayounsi) `python Traceback (most recent call last): File \"/srv/deployment/netbox/venv/lib/python3.11/site-packages/django/db/models/fields/related_descriptors.py\...
[10:39:27] netbox, Infrastructure-Foundations: Change icinga link to alerts.w.o in netbox device page - https://phabricator.wikimedia.org/T371079#10034813 (fgiunchedi) Ah yes my bad @ayounsi the search link shouldn't include quotes, i.e. https://alerts.wikimedia.org/?q=%40state%3Dactive&q=%40cluster%3Dwikimedia.org...
[10:42:23] netbox, Infrastructure-Foundations: Change icinga link to alerts.w.o in netbox device page - https://phabricator.wikimedia.org/T371079#10034816 (ayounsi) Open→Resolved a:ayounsi No pb! Done. See for example https://netbox.wikimedia.org/dcim/devices/1969/
[12:04:50] back from lunch
[12:05:01] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056985 should be good to go ;)
[12:45:55] ack!
[12:48:13] SRE-tools, conftool, DBA, Infrastructure-Foundations, and 2 others: Spicerack support for dbctl - https://phabricator.wikimedia.org/T362893#10035208 (Volans) Status update: The conftool improvements ([[ https://gitlab.wikimedia.org/repos/sre/conftool/-/merge_requests/9 | here ]] and [[ https://gi...
[13:08:16] netops, Infrastructure-Foundations, SRE: Model GRE tunnels in Netbox - https://phabricator.wikimedia.org/T369351#10035314 (cmooney) I've been playing with this a little on Netbox-Next, you can see the data here covering our existing GRE tunnels: https://netbox-next.wikimedia.org/vpn/tunnels/ Initia...
[13:25:37] XioNoX: o/ I am a bit ignorant about what "databases" do in a Redis config
[13:25:56] trying to get it from the docs but it is not clear
[13:26:20] elukey: 16 namespaces
[13:26:24] hardcoded IIRC
[13:26:26] 0 to 15
[13:26:35] it's a tenancy mechanism
[13:27:31] it's rarely used as a mechanism, if you can avoid it, at least for production workloads, it's best that you do
[13:28:48] akosiaris: okok thanks! I think it is something used for netbox-dev with local redis.. I didn't know that one could have multiple databases in the same instance
[13:29:24] there's some cool tricks you can do btw with those, e.g. https://redis.io/docs/latest/commands/swapdb/
[13:29:43] yeah, netbox uses 2 DBs, one for queue, one for cache
[13:30:15] we shifted from using 0 and 1 (netbox 3) to 2 and 3 (netbox 4), to not collide
[13:31:34] btw, it's not really a multitenancy mechanism. e.g. you can easily run flushall and kill all data in all databases
[13:32:09] nor are there grants, users, privileges or anything like that. Just thinly siloed and separated data
[13:32:13] okok!
[14:20:37] Mail, Bitu, Infrastructure-Foundations: Don't get password reset emails for my alt through IDM - https://phabricator.wikimedia.org/T371612#10035614 (Aklapper)
[14:21:41] folks if you are ok I'd proceed with the debmonitor-server 0.5.0 upgrade
[14:21:52] first in codfw (not serving traffic)
[14:21:55] then in eqiad
[14:21:59] ok for you?
[14:22:20] and the upgrade would be an apt-get install debmonitor-server python3-debmonitor
[14:22:52] slyngs: o/ pinging you since you did the last upgrade, lemme know if I am missing anything big
[14:24:40] no objections here, I guess for testing the client updates we need to wait for the eqiad update right?
[14:26:21] in theory yes
[14:26:54] checking the config it goes to "DB_HOST": "m2-master.eqiad.wmnet", anyway, so we could test forcing it to the codfw host too I think
[14:27:17] seems good, upgrading codfw now
[14:28:31] <3
[14:53:32] XioNoX: FYI with dark mode the external links icinga, etc... are not visible in netbox
[14:53:49] kinda related, I see you added AlertManager, we could add it to debmonitor too (config change in puppet)
[14:54:46] volans: maybe the 4.1 redesign will fix it :)
[14:55:27] volans: not sure what you mean with debmonitor, I don't think it supports network equipment yet :)
[14:55:53] it supports external links like netbox and has the Icinga one
[14:56:30] ah ok
[14:57:19] sorry I forgot the keyword "link" above :D
[15:18:30] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10035781 (Papaul) @cmooney links removed. You can resolve the task if nothing else needs to be done.
[15:22:04] SRE-tools, Infrastructure-Foundations: Allow debmonitor to store the Debian version-id in the OS field - https://phabricator.wikimedia.org/T368744#10035803 (elukey) Tried to test the new debmonitor-server on debmonitor2003: * changed an-worker1080 (random host) /etc/hosts to point debmonitor.discovery.wm...
[15:23:33] added some info to --^ , rolled back debmonitor2003 since I wasn't able to test it properly
[16:03:58] netops, Infrastructure-Foundations, SRE: Model GRE tunnels in Netbox - https://phabricator.wikimedia.org/T369351#10036015 (cmooney) After discussing with @ayounsi on irc I've adjusted the approach: https://netbox-next.wikimedia.org/vpn/tunnels/ Principal decisions were: # We will use a group calle...
[16:23:49] * elukey afk! o/
[17:40:06] o/
[20:21:10] Mail, Infrastructure-Foundations, Patch-For-Review: postfix: set smtpd_forbid_bare_newline to normalize - https://phabricator.wikimedia.org/T370011#10036691 (jhathaway) Open→Resolved a:jhathaway
[23:19:07] Mail, Infrastructure-Foundations: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) - https://phabricator.wikimedia.org/T369253#10037154 (bcampbell) I don't see a record of this email in the Google logs. I see an email...