[00:19:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [00:24:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:09:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:14:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:25:39] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:29:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [01:59:21] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status 
- https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:04:21] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:05:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:15:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:19:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:49:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [02:50:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:04:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:09:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:39:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:40:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:55:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:59:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:29:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:30:39] 
FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:45:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:49:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:15:56] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate codfw row C & D database hosts to new Leaf switches - https://phabricator.wikimedia.org/T370852#10026182 (Marostegui) >>! In T370852#10010096, @Ladsgroup wrote: > This should have the map: https://fault-tolerance.toolforge.org...
[05:19:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:20:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:34:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:39:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [05:54:56] elukey: indeed! Too bad they didn't open a PR or at least an issue in 2023, it could have been fixed by now... They have a suggested fix, but it's more to import what's in __init__.py if I understand correctly : https://github.com/Omripresent/netbox/commit/f23d7e6facac899710505ba547a8fd6a360c5dd5 not other files on the side. 
[05:58:30] elukey: this looks interesting too: https://github.com/kkthxbye-code/netbox-script-manager but I don't think we should rely on a plugin that could too easily go unmaintained for such a critical feature
[06:18:43] netops, Infrastructure-Foundations, SRE: Netbox automation to move selected hosts from ASW to LSW - https://phabricator.wikimedia.org/T370846#10026230 (ayounsi) We can potentially re-use the `move_server.MoveServer` script but make the server selection a `MultiObjectVar` as input and make the rack U...
[06:24:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:29:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:40:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:44:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:59:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:00:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:14:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:19:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:30:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:34:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [07:48:43] XioNoX: yes yes similar but not the same :( IIUC upstream doesn't really have any willingness to fix this issue, I am going to check a little netbox's code and then I'll try to give my +1 to one of your solutions [07:52:26] thx [08:04:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:09:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:20:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:21:18] XioNoX: did you already silence/disable notifications for netbox1003? it has resumed its spam :D
[08:22:09] 700 just in July in this channel ;)
[08:24:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:54:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:59:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:10:39] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - 
https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:14:21] FIRING: [2x] SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:26:58] volans: it keeps expiring, downtimed for 2 more days
[09:27:08] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10026676 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2cf5df78-f1ca-4bf8-800b-9a731e1182f6) set by ayounsi@cumin1002 for 2 days, 0:00:00 on 2 host(s) and their...
[09:40:57] the ops-limited posix group is being rolled out now
[09:41:02] finally :)
[09:42:11] XioNoX: so the problem is your optimism :D
[09:43:07] things have been taking way too long
[10:39:12] the thing that I don't understand is why something like from ._commons import etc.. doesn't work
[10:39:20] they are in the same dir too
[10:42:10] elukey: it's from the PoV of the Netbox app as I understand it, so only netbox root or other system paths. Probably something to do with the fact that it imports it in DB during sync
[10:45:20] elukey: they dynamically load the scripts now and don't add the scripts root to the pythonpath
[10:45:25] XioNoX: another option could be to have all *.py files in customscripts prefixed with wmf_ or similar, and then we could simply add SCRIPTS_ROOT to the Python path
[10:45:33] should be simple and avoid any kind of collision
[10:45:59] no need to prefix each file, just have wmf_scripts and add that to the pythonpath
[10:46:14] pythonpath hack and scripts root in netbox config could differ
[10:46:56] volans: wmf_scripts as in a new dir called like that? 
With all the files inside
[10:47:01] I'll implement anything you think is best at this point
[10:48:00] elukey: either rename the existing one in extras or have a new one inside that with just _common.py
[10:48:09] so the import would be from wmf_script._common import Foo
[10:48:17] *wmf_scripts
[10:48:24] or anything similar
[10:48:42] another option that was not evaluated was to put the common stuff in its own package and add it as a dependency
[10:48:48] but that requires publishing it to pypi :D
[10:50:29] volans: IIUC "customscripts" is some sort of default for Netbox 4, but I could be wrong
[10:50:52] XioNoX: can we rename "customscripts" to something like "wmfscripts" ?
[10:51:08] if so we are good to go with option 2
[10:54:08] I've a quick question
[10:54:16] elukey: I don't think that rename would change anything, or I don't understand
[10:54:21] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:54:40] was the option to inject _common.py as wmf_scripts_common.py into the venv from the deploy repo being evaluated?
[10:55:01] we may have multiple files in the future
[10:55:08] multiple common?
[10:55:27] if _common gets horrendously big we should probably break it down
[10:55:46] that it can be just its own package :D
[10:55:48] *than
[10:55:59] you are fixated on packages :D
[10:57:54] XioNoX: the main idea is to use option 2 and avoid naming collisions.. so if we have a new dir containing what could generate a collision (like _common.py) then we should be good. I agree that renaming customscripts alone is not enough, we'd need to have its parent dir as SCRIPTS_ROOT probably.. 
so we could simply add a subdir called $something with _commons.py in it
[10:58:18] then we fix all the imports in custom scripts, and add the new subdir path to the Python path in settings.py
[10:58:40] if you want to add the subdir you need 2 nested levels
[10:59:06] customscripts/wmf_utils/_commons.py right?
[10:59:20] not enough, you'd have to add customscripts/ to the path
[10:59:28] with risk of collisions of the files inside
[10:59:45] I am not getting why customscripts though
[10:59:58] to be able to do from wmf_utils._common import
[11:00:13] if you add /path/foo/ to pythonpath, the imports look at what's inside foo/
[11:00:16] not at foo itself
[11:00:37] yeah you're right
[11:00:45] just print(sys.path) :D
[11:05:33] volans: then isn't it less invasive to just rename all the customscripts files with "wmf_" in front and add the dir to pythonpath?
[11:05:46] or simpler I'd say, not invasive
[11:06:18] so we don't have to think about it too much
[11:06:34] and _commons becomes wmf_commons.py
[11:07:01] (lunch, will read later!)
[11:07:48] I don't recall if the names that appear in netbox come from the filename too, it will be a bit annoying to see all the scripts called wmf_...
[11:08:19] I think it does
[11:08:58] I think we will need to rename the reports and their timers too
[11:09:23] even though the timers can be replaced by netbox 4 built in function now
[11:10:41] that's why I wanted to decouple _common from the rest of the scripts
[11:11:17] isn't its name unique enough?
[11:11:34] which name?
[11:12:21] _common
[11:12:44] please, re-read what you wrote... in the same line 'common' and 'unique'... 
:-P
[11:13:23] but the underscore :)
[11:13:32] I would probably do something like this:
[11:13:47] - leave customscripts as is (ideally rename it to scripts as I always hated that name)
[11:14:04] - create in the root of netbox-extras: scripts_imports/wmf_scripts_imports/common.py
[11:14:18] - add scripts_imports to the pythonpath
[11:14:27] - potentially in the future split common into multiple things
[11:15:57] then edit the scripts to say from common import Importer ?
[11:16:19] no, from wmf_scripts_imports.common import Importer
[11:16:25] from wmf_script...common import
[11:16:27] yeah
[11:16:42] wmf_scripts_imports is the unique part that ensures we avoid conflicts
[11:16:53] how do you add it to the python path?
[11:16:54] can be wmf_netbox_scripts or anything similar
[11:16:56] or where?
[11:17:15] that's a good question, the obvious one is making it a pypi package :D
[11:17:24] the other is to inject it somewhere in netbox
[11:17:24] how?
[11:17:37] so back to option 2? :)
[11:17:39] the third one is to see if netbox adds anything to pythonpath that we could leverage
[11:18:31] >>> print(sys.path)
[11:18:31] ['/srv/deployment/netbox/current/src/netbox', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '/srv/deployment/netbox/venv-1716810615/lib/python3.9/site-packages']
[11:18:41] so just the source code and the venv
[11:18:51] not sure if it adds anything at runtime though
[11:19:10] would be better to check with a script, adding a logging line to log the sys.path
[11:19:16] within the script context
[11:19:21] to see if there is anything we could use
[11:20:38] what's the downside of my option 4 btw?
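A minimal sketch of the pythonpath point made in the discussion above: adding a directory to `sys.path` makes what is inside it importable, not the directory itself. Only the layout `scripts_imports/wmf_scripts_imports/common.py` comes from the log; the temporary directory, the `__init__.py` and the body of `Importer` are placeholders invented for the illustration.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the netbox-extras checkout; only the layout
# scripts_imports/wmf_scripts_imports/common.py is taken from the discussion.
root = Path(tempfile.mkdtemp())
package_dir = root / "scripts_imports" / "wmf_scripts_imports"
package_dir.mkdir(parents=True)
(package_dir / "__init__.py").write_text("")
# Placeholder body: the real Importer class is not reproduced here.
(package_dir / "common.py").write_text("class Importer:\n    name = 'placeholder'\n")

# Add the parent directory (scripts_imports/), not the package itself.
sys.path.insert(0, str(root / "scripts_imports"))
importlib.invalidate_caches()

# The uniquely named package inside scripts_imports/ is now importable...
from wmf_scripts_imports.common import Importer

# ...but scripts_imports itself is not, because its own parent is not on sys.path.
dir_itself_importable = importlib.util.find_spec("scripts_imports") is not None

print(Importer.name, dir_itself_importable)
```

The unique package name is what guards against collisions: anything else placed in `scripts_imports/` would become a top-level module name too, which is why a dedicated directory holding only that package is the safer thing to put on the path.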
[11:21:44] looks like it's the same usecase with an extra step
[11:25:26] it's the same, but we need the above structure anyway
[11:25:35] and the deploy repo could create the symlink in the venv
[11:25:41] (preferred to the symlink in the source code)
[11:26:01] fewer issues, will be cleaned up by the venv rotation of the deploy code
[11:29:55] I didn't know we could link stuff directly in the venv
[11:30:12] does it need to have a specific structure?
[11:33:36] be inside /srv/deployment/netbox/venv-1716810615/lib/python3.9/site-packages/
[11:34:00] although it will still be a hack, to be clear :D
[11:35:07] as long as it's the least complex and easiest to maintain hack
[11:35:33] I'll send patches asap
[11:42:06] I'd like to have a look at netbox code to understand what they are doing
[11:42:13] not sure if I have the time or am allowed to do so :D
[11:42:23] s/allowed/authorized/ ;)
[11:42:59] volans: should I wait for you ?
[11:48:43] give me 5
[11:49:21] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:00:07] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate codfw row C & D database hosts to new Leaf switches - https://phabricator.wikimedia.org/T370852#10027115 (Ladsgroup) Yeah. If we can build a public API from zarcillo, it would make the whole easier.
[12:00:20] I'd like to see how netbox imports the scripts and if we could use the same to import common
[12:00:56] that said, another option, slightly more annoying, would be to add in each script the dynamic import of the common file based on the SCRIPTS_ROOT config with importlib
[12:01:28] netbox might also have a helper function for that, to be investigated
[12:01:30] a bit like option 1 ?
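A rough sketch of the "dynamic import with importlib" option mentioned above, using only the standard library. The helper name `load_common`, the module name `wmf_common` and the throwaway directory are assumptions made for the example; in a real script the root would presumably come from the Netbox `SCRIPTS_ROOT` setting, and this is not how Netbox itself loads scripts.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_common(scripts_root, filename="_common.py", module_name="wmf_common"):
    """Load a helper module from an explicit file path, leaving sys.path untouched."""
    path = Path(scripts_root) / filename
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway directory standing in for SCRIPTS_ROOT (hypothetical content).
scripts_root = tempfile.mkdtemp()
Path(scripts_root, "_common.py").write_text("GREETING = 'hello from common'\n")

common = load_common(scripts_root)
print(common.GREETING)
```

Every script would need those few loading lines at the top, which is the "slightly more annoying" part; the upside is that nothing is added to the Python path, so no name can collide.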
[12:01:53] volans: dunno if this can help btw https://github.com/Omripresent/netbox/commit/f23d7e6facac899710505ba547a8fd6a360c5dd5
[12:01:55] much less invasive
[12:02:09] we discussed it a bit earlier
[12:02:21] (this compared to option 1)
[12:09:28] so something like common = ScriptModule.objects.get(file_path="_common.py").get_module()
[12:09:42] should work, currently failing because of cannot import name 'ColorChoices' from 'utilities.choices'
[12:09:48] that got moved
[12:10:03] anyway, lunch time
[12:10:21] volans: https://github.com/wikimedia/operations-software-netbox-extras/commit/7bac70975fd1b8ba9b7c54d41303fcbad1587231
[12:10:40] where would you add that line though? in the script file?
[12:11:29] in each script yes
[12:12:00] how stable the solution is... to be seen
[12:12:20] also ScriptModule does a bit too many things imho but I'm not sure we can use one of its parent classes
[12:12:24] too deep into netbox code
[12:12:26] to know for sure
[12:12:44] yeah, no great option until an upstream fix
[12:25:30] I sent https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/1058146 and https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1058147
[12:27:52] er, it's incomplete, how do I figure out the python version in that part of the path? `lib/python3.9/site-packages` for the symlink?
[12:34:59] I also updated https://gerrit.wikimedia.org/g/operations/software/netbox to 4.0.8
[12:41:27] SRE-tools, Infrastructure-Foundations, SRE: Pairing tool for new SREs using sudo under supervision - https://phabricator.wikimedia.org/T299989#10027374 (elukey) The new ops-limited group is live, just sent an email to all SREs about it.
[12:51:47] I guess the alternative is to create the symlink into netbox's directory instead of the venv
[12:53:55] is that script even used?
[12:54:13] volans: which one?
[12:54:46] scap/checks/linkconf.sh
[12:56:26] volans: ah yeah, leftovers from the scap era..
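On the question above of how to work out the `lib/python3.9/site-packages` part of the path for the symlink, one possible approach is to build it from the interpreter version. This is a sketch only: the helper names `site_packages_for` and `link_package` are invented here, the layout assumed is the usual POSIX venv one, and the demo runs against temporary directories rather than the real deployment paths.

```python
import sys
import sysconfig
import tempfile
from pathlib import Path


def site_packages_for(venv, version=None):
    """Return <venv>/lib/pythonX.Y/site-packages (POSIX venv layout assumed)."""
    major, minor = version or sys.version_info[:2]
    return Path(venv) / "lib" / f"python{major}.{minor}" / "site-packages"


def link_package(source, venv, version=None):
    """Symlink a package directory into the venv's site-packages, replacing a stale link."""
    target = site_packages_for(venv, version) / Path(source).name
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():
        target.unlink()
    target.symlink_to(source, target_is_directory=True)
    return target


# Demo with throwaway directories standing in for the checkout and the venv.
work = Path(tempfile.mkdtemp())
package = work / "extras" / "wmf_scripts_imports"
package.mkdir(parents=True)
link = link_package(package, work / "venv", version=(3, 9))
print(link)

# For the interpreter that is currently running, the standard library
# can report its own site-packages directory directly.
print(sysconfig.get_paths()["purelib"])
```

Because the link lives inside the venv directory, it disappears together with the venv when the deploy tooling rotates it, which matches the cleanup argument made earlier in the log.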
[12:56:39] Makefile.deploy is where it needs to go and there is `PYTHON_VERSION`
[12:57:27] exactly
[12:57:37] you have the venv path easily there
[12:58:06] netops, Infrastructure-Foundations, SRE: Do we need prometheus-ethtool-exporter? - https://phabricator.wikimedia.org/T371375 (cmooney) NEW p:Triage→Low
[12:59:45] volans: not sure where `PYTHON_VERSION` is used, or if it's leftovers too
[12:59:50] but probably safe to re-use it
[13:00:59] possibly?
[13:05:56] I lost track of what was decided :D
[13:15:28] I was exploring options, Arzhel jump on all of them at once and produced some CRs :D
[13:15:34] *jumped
[13:17:03] volans, elukey, CR updated
[13:30:08] elukey: nice work & email :)
[13:30:27] indeed, nice email
[13:30:53] <3
[14:10:07] XioNoX: I like https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1058147 but I'd add a README explaining the rationale. We can ask others to chime in and vote, the preferred one will win
[14:11:21] elukey: they're complementary
[14:11:36] elukey: I replied on your comment too, but I prefer to have the doc in 1 location
[14:12:09] ah ok you added a symlink, I didn't see the other change since I thought it was the other solution
[14:12:27] I wish it was that simple :)
[14:12:52] re: docs - fine for wikitech, I always find it nice to have a reference in the repo that I checkout
[14:13:03] otherwise it is difficult to know where the canonical docs are
[14:13:29] if it's only in one location, it's easier, no?
[14:13:49] if you know that it is on wikitech yes
[14:15:36] lgtm anyway, feel free to proceed
[14:15:38] I hope people read the doc before messing with that repo :)
[14:16:10] my main point is that "the doc" should be close to a repo, but it is just my preference
[14:17:06] elukey: https://github.com/wikimedia/operations-software-netbox-extras/blob/master/README.md last touched, 5 years ago :) I'll add to my todo to clean it up
[14:18:09] I'll deploy netbox 4.0.8 on netbox-dev to test those changes too
[14:24:49] Mail, Infrastructure-Foundations, MediaWiki-Email, SRE: Old "Email this user" email is repeatedly resent - https://phabricator.wikimedia.org/T361860#10027930 (jhathaway) @Xover apologies for the radio silence on this issue, have you seen any new occurrences?
[14:36:02] XioNoX: do you have a moment to show me how to use the network provision script?
[14:36:24] on netbox-next, I would like to run it with some data but I am a little lost :)
[14:37:24] ah sorry you are deploying, after that np
[14:37:26] sure, as soon as I'm done with the "sre.deploy.python-code" error :)
[14:47:49] `sudo runuser -u mwdeploy -- /usr/bin/git -C "/srv/deployment/netbox/deploy" update-server-info` fails with `error: unable to update .git/info/refs: Permission denied`, the files are owned by `debmonitor wikidev`, the debmonitor seems a bit weird, no? 
And the same deploy command worked last week
[14:48:15] the deploy host changed though
[14:48:33] ping alex :D
[14:48:40] deploy2002 still says "do not use this"
[14:48:49] and deploy1003 doesn't say that
[14:48:59] XioNoX: new hosts, not switch datacenter
[14:49:14] 1002/1003
[14:49:32] yeah, 1002 says "do not use"
[14:50:17] and indeed on 1002 the files are owned by `mwdeploy wikidev`
[14:50:21] so something is funky
[15:09:16] netbox, Infrastructure-Foundations, Patch-For-Review: Upgrade Netbox to 4.x - https://phabricator.wikimedia.org/T336275#10028310 (ops-monitoring-bot) Deployed netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
[15:11:31] volans, elukey, netbox 4.0.8 deployed to next with that workaround and first tests show that it works :)
[15:17:42] netops, DBA, DC-Ops, Infrastructure-Foundations, and 2 others: Migrate codfw row C & D database hosts to new Leaf switches - https://phabricator.wikimedia.org/T370852#10028353 (Marostegui) @ABran-WMF please coordinate with @cmooney for this.
[15:18:13] elukey: I can help now with the network provisioning script
[15:35:55] back!
[15:36:00] I was in a meeting
[15:37:17] elukey: any specific question or you want a walk through?
[15:38:00] XioNoX: super simple - an example of inputs that I should give to the script to test it [15:38:18] I tried with netbox-next but for some reason I wasn't able to pick up the right data [15:40:15] like in https://netbox-next.wikimedia.org/extras/scripts/24/ - a triple device/switch-port/speed that I can use over and over to test [15:40:34] in the dropdown I see only a few of the nodes, like cp4036 [15:40:36] elukey: yeah, easiest is to find a planned device : https://netbox-next.wikimedia.org/dcim/devices/?status=planned [15:41:18] of course there are no good pending ones [15:41:19] and the switch port is the one related to the primary interface [15:41:30] like in https://netbox-next.wikimedia.org/dcim/devices/5250/interfaces/ [15:42:11] of course I'm finding more netbox 4 bugs [15:42:23] ahahhaah [15:42:28] this wasn't the aim [15:42:35] cp4033, switch port 10, speed 10G, cable id foo [15:44:39] perfect thanks :) [15:45:31] I'm sure I fixed that bug in the past too... did I screw up the rebases?! [15:47:53] slyngs: ok to merge https://gerrit.wikimedia.org/r/c/operations/software/debmonitor/+/1054879 when I have time? Or do you prefer to review it? [15:49:21] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:56:29] elukey: fixed manually on -dev, I'll send a patch but this worked https://netbox-next.wikimedia.org/extras/scripts/results/44712/ [16:02:03] the fix https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/1058208 [16:18:27] +1ed thanks, I'll cherry pick my change tomorrow and check! 
[16:18:34] (The mgmt mac address)
[17:18:09] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10028892 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=22b0edee-c7a6-4b0f-9fea-2095ec62...
[17:18:56] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10028893 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6ff7dee3-4248-4c63-812a-befb7aa3...
[17:55:32] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10028999 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=dd309020-6739-44e3-aae7-1db7e069...
[17:55:45] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10029001 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=e014b03e-5922-4caa-80c4-c950cc41...
[18:13:26] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10029064 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=a53d3f9e-80ae-429e-b814-01f035f8...
[18:29:35] elukey: Sorry, +1'ed.
[18:30:30] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10029125 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=428f84f9-4ca7-4d64-ba2f-941c3927...
[18:30:48] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10029126 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=fea7df87-a776-4ad1-b5ea-1c4c47a6... [18:30:59] If someone is brave enough to review and +1 https://gerrit.wikimedia.org/r/c/operations/dns/+/1057827 then I'd like to be able to switch the CAS clusters tomorrow [18:37:05] +1ed [18:38:00] jhathaway: Thank you, hopefully any major issues caused by the upgrade will have been resolved when you're back at the desk tomorrow :-) [18:38:10] ha! [18:38:20] It should "Just work"(tm) :-) [18:59:58] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: lvs2012: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370862#10029209 (cmooney) Work on this one is completed, all that remains is to remove the old cross-rack links which... [19:28:56] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2011: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370891#10029484 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=fdb9ae19-db19-42c1-a837-d30eff23... [19:29:28] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: lvs2011: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370891#10029498 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=0cfea209-8c6a-4d44-8fbf-96f5cd79...
[19:49:21] FIRING: SystemdUnitFailed: generate_os_reports.service on puppetdb2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [20:09:16] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: lvs2011: Move existing row C & D vlans to primary uplink and add new ones - https://phabricator.wikimedia.org/T370891#10029715 (cmooney) Work completed on this one on the network & LVS side. @papaul we can now remove the cross-... [20:27:20] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434 (RobH) NEW [20:27:23] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-eqiad: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435 (RobH) NEW [20:27:32] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-codfw: Q1:codfw:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371434#10029796 (RobH) [20:27:45] netops, DC-Ops, fundraising-tech-ops, Infrastructure-Foundations, ops-eqiad: Q1:eqiad:frack network upgrade tracking task - https://phabricator.wikimedia.org/T371435#10029801 (RobH) [23:04:21] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:59:21] FIRING: [2x] SystemdUnitFailed: generate_vrts_aliases.service on mx-in2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed