[01:33:35] (SystemdUnitFailed) firing: (4) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:34:28] (SystemdUnitFailed) firing: (4) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:22:39] netops, Infrastructure-Foundations, Observability-Metrics, SRE, observability: Prometheus: ingest SONiC metrics - https://phabricator.wikimedia.org/T335027 (ayounsi) More details: LibreNMS (via SNMP) already collects data. There seems to be some minor bugs which will hopefully be fixed with t...
[08:35:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322 (fgiunchedi) >>! In T326322#9087109, @ayounsi wrote: > Next steps here: > * Decide which hosts will run gnmic, I can think of 4 option...
[09:38:35] (SystemdUnitFailed) firing: (4) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:28:35] (SystemdUnitFailed) firing: (4) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:39:28] (SystemdUnitFailed) firing: (5) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:44:29] (SystemdUnitFailed) firing: (5) debian-weekly-rebuild.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:29:28] (SystemdUnitFailed) firing: httpbb_hourly_appserver.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:46] I did that also: https://usercontent.irccloud-cdn.com/file/6YuSFeFx/1000010481.jpg
[14:09:51] What even am I looking at :D
[14:09:57] good stuff!!
[14:10:55] if we have trouble in Amsterdam I guess you can just fly up in that thing and save the day
[14:12:44] FYI jhathaway this is the task re puppetdb-api https://phabricator.wikimedia.org/T342458
[14:13:02] i had a very rough stab at things but this doesn't work https://gerrit.wikimedia.org/r/c/operations/puppet/+/940403/
[14:19:23] thanks
[14:29:28] (SystemdUnitFailed) resolved: httpbb_hourly_appserver.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:29:29] (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:29:29] (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-int_hourly.service Failed on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:34:37] hi folks
[20:34:51] anyone around that has some pointers on why the cookbook on a new host is stalling at
[20:34:55] [24/50, retrying in 72.00s] Attempt to run 'cookbooks.sre.hosts.reimage.ReimageRunner._populate_puppetdb..poll_puppetdb' raised: Nagios_host resource with title cp3081 not found yet
[20:36:14] I see an existing ticket that says that not reimaging to insetup first might be causing some issue here?
[20:40:10] I have an insetup role that matches this, so that's not the issue
[20:50:07] aah I see it https://puppetboard.wikimedia.org/report/cp3081.esams.wmnet/c08ace01a07c4925bbf28da8e05da23c170015cb
[22:27:26] FYI this was fixed by updating $site in realm.pp for the esams/knams public and private subnets
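Side note on the "[24/50, retrying in 72.00s]" line at 20:34:55: it has the shape of a bounded polling loop whose delay grows with each attempt. Below is a minimal Python sketch of that pattern. It is not the reimage cookbook's actual code; `poll_with_linear_backoff`, `NotReadyError`, and the 3-second base delay are all hypothetical names and values chosen for illustration. A linear backoff of 3s per attempt would give 72s at attempt 24, which matches the log line, but the log itself does not confirm how the delay is computed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotReadyError(Exception):
    """Hypothetical error: the polled resource does not exist yet."""


def poll_with_linear_backoff(
    check: Callable[[], T],
    tries: int = 50,
    base_delay: float = 3.0,  # assumed value, not taken from the cookbook
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it succeeds or the attempts run out.

    The wait after a failed attempt N is base_delay * N (linear backoff).
    The last NotReadyError is re-raised once all attempts are used up.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        try:
            return check()
        except NotReadyError as exc:
            if attempt == tries:
                raise  # out of attempts: surface the last failure
            delay = base_delay * attempt
            print(f"[{attempt}/{tries}, retrying in {delay:.2f}s] {exc}")
            sleep(delay)
    raise AssertionError("unreachable")  # loop always returns or raises
```

With the defaults, a check that keeps failing would only give up after 50 attempts, so a loop like this can sit for a long time when the underlying cause (here, per the 22:27:26 message, the $site setting in realm.pp) never lets the resource appear. Passing a no-op `sleep` makes the function easy to exercise without waiting.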