[10:25:01] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#10707237 (cmooney)
[10:25:05] netops, Infrastructure-Foundations, Observability-Alerting, Patch-For-Review: Migrate network icinga alerts to gNMI/prometheus - https://phabricator.wikimedia.org/T388641#10707238 (cmooney)
[10:35:03] netops, Infrastructure-Foundations, SRE: Create alerting for saturation on sub-rated interfaces - https://phabricator.wikimedia.org/T374614#10707267 (cmooney) >>! In T374614#10147994, @ayounsi wrote: > Short term I think if you add `[4Gbps]` to the interface description, LibreNMS will [[ https://docs...
[10:45:50] netops, Infrastructure-Foundations, Observability-Alerting, SRE: Improve port-utilisation alerting to take QoS into account - https://phabricator.wikimedia.org/T384052#10707299 (cmooney)
[12:45:31] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 23.4R2 - https://phabricator.wikimedia.org/T364092#10707705 (cmooney)
[13:49:12] SRE-tools, collaboration-services, Infrastructure-Foundations, serviceops, SRE: Create a cookbook to automate gerrit's switchover - https://phabricator.wikimedia.org/T260666#10708019 (ABran-WMF) a:ABran-WMF
[14:21:27] getting an error when running test-cookbook against a specific patchset, any suggestions?
https://phabricator.wikimedia.org/P74592
[14:22:05] nm...I just needed to use `--ps` instead of `-ps`
[14:22:20] I was about to ask that :D
[14:23:22] I'll push a patch to fix the help msg at some point, but yeah...pebkac ;P
[15:22:53] netops, Infrastructure-Foundations, Data-Engineering (Q3 2025 January 1st - March 31th): Update `netflow` retention strategy in Druid (too much data) - https://phabricator.wikimedia.org/T387839#10708693 (Ahoelzl) Open→Resolved
[20:03:58] Hey IF, I'm getting the `Nagios_host resource with title cirrussearch2056 not found yet` error when running a reimage cookbook again. This is a net-new role, more details here: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1134043 . 1)Does that nagios msg actually indicate a lack of Puppet 7? and 2) If so, where do we need to set the hiera?
[20:11:12] o/, looking
[20:12:36] {◕ ◡ ◕}
[20:14:04] the nagios configs rely on exported resources, which are populated in puppetdb, when a puppet run completes
[20:14:08] where do you see the error?
[20:14:34] it's coming from the reimage cookbook in progress...it's on a tmux on cumin2002 under my user if you wanna take a peek
[20:15:24] session is called 'rolling-operation'
[20:17:43] given the message about puppetdb, I would assume this check is used as proxy to determine if the puppet run has completed successfully
[20:18:13] but I would need to look at the re-image cookbook to confirm
[20:18:48] yeah, if I scroll up on my tmux I see it signed the CSR and then it says `100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'puppet agent -t --noop &> /dev/null'.
[20:18:48] 100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
[20:18:48] Run Puppet in NOOP mode to populate exported resources in PuppetDB`
[20:18:58] ah
[20:19:06] that's when it hangs with the Nagios error
[20:19:29] thanks, so then I would assume the noop is failing, i.e.
puppet is not able to compile the catalog
[20:19:44] or not able to apply the catalog in noop mode
[20:22:52] ah, puppetserver has the answer
[20:22:52] or at least a relevant error...missing profile::opensearch::rack
[20:22:52] I thought I fixed that yesterday, but I guess not ;( . I'll get a patch going, sorry to bug ya
[20:22:52] no worries at all, please but me again if that doesn't rectify the issue
[20:22:52] *bug
[20:41:59] cool, will do
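
For reference, the reimage thread above assumes the cookbook waits for the host's exported `Nagios_host` resource to show up in PuppetDB as a proxy for a completed Puppet run. A minimal sketch of that kind of wait loop follows. It is not the actual reimage cookbook code: `nagios_host_query`, `wait_for_nagios_host` and the `fetch` callable are hypothetical names, and `fetch` is assumed to send the query string to PuppetDB's resources query endpoint and return the matching resources as a list.

```python
import json
import time
from typing import Callable, List


def nagios_host_query(hostname: str) -> str:
    """Build a PuppetDB AST query (as JSON) for an exported Nagios_host resource."""
    return json.dumps(
        ["and",
         ["=", "type", "Nagios_host"],
         ["=", "title", hostname],
         ["=", "exported", True]]
    )


def wait_for_nagios_host(
    hostname: str,
    fetch: Callable[[str], List[dict]],  # hypothetical: runs the query, returns resources
    attempts: int = 5,
    delay: float = 10.0,
) -> bool:
    """Poll until the resource is found or the attempts run out."""
    query = nagios_host_query(hostname)
    for _ in range(attempts):
        if fetch(query):
            return True  # resource exists, so a Puppet run populated PuppetDB
        # An empty result matches the "not found yet" case in the log.
        time.sleep(delay)
    return False
```

With a loop like this, a catalog that fails to compile (as with the missing `profile::opensearch::rack` above) never exports the resource, so the wait only ends when the attempts run out.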