[12:52:18] folks if anyone was able to assist me with T408378 I would appreciate it
[12:52:18] T408378: Nokia OSPF alerts not working - https://phabricator.wikimedia.org/T408378
[12:52:40] for some reason the alerts don't fire when I want them and I can't really see why
[12:52:59] also the Lint alerts at sites where we don't have these metrics, I'm sure there is an easy fix for that
[12:53:41] not super urgent
[13:26:14] I'll take a look topranks
[13:33:45] topranks: could you tell me a time window when at least one alert should’ve fired?
[14:06:38] topranks: found a couple of alerts that fired successfully https://w.wiki/FqSi
[14:42:27] tappof: hey sorry I was on a call there for ages
[14:42:50] topranks: no problem :) I left you a comment on the task
[14:42:59] ok cool
[14:43:13] I can basically trigger the scenario in which an alert would fire any time
[14:43:33] right now two devices are in a state which should alert - but note i have downtime set using the cookbook for these devices
[14:43:40] will check the task now thanks!
[14:50:39] > right now two devices are in a state which should alert - but note i have downtime set using the cookbook for these devices
[14:50:52] I see one neighbor
[14:51:20] topranks:
[14:55:45] sry, I am on cellular internet in Italy here and it is acting up :(
[14:56:11] yes that makes sense only one alert should be present right now
[14:56:57] or perhaps two, I disabled the interface not the ospf so it should be two
[14:56:58] https://grafana.wikimedia.org/goto/smrMp8RvR
[14:57:28] this even: https://grafana.wikimedia.org/goto/smrMp8RvR?orgId=1
[14:57:57] huh why is the share link not working?
[14:58:00] https://grafana-rw.wikimedia.org/d/b77db156-d852-4601-acc5-4065b888e5fe/ospf-status-nokia
[14:58:06] If you disable the interface, the oper status reported by gNMI should be 1 (down), not 4 (point-to-point). Am I right?
[14:58:49] > huh why is the share link not working?
[14:59:07] I think it does not work from grafana-rw
[14:59:55] I wasn't 100% on that but just checked in the thanos ui
[15:00:08] correct, the oper-state is '1' because I have the interface shut down
[15:00:58] I have to jump on another call, but I will need to think about that, we might want to also alert if "oper_state==1"
[15:01:04] maybe a separate alert
[15:01:22] but yes right now the alert won't fire it doesn't match the status
[15:01:50] ack
[23:45:34] FIRING: DiskSpace: Disk space centrallog2002:9100:/srv 3.981% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=centrallog2002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace