[00:05:13] (DiskSpace) resolved: Disk space puppetmaster1001:9100:/ 4.838% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[02:08:38] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:26:05] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (Papaul) @cmooney @Jhancock.wm checked the server, no IP address set on it and she did reset it but it didn't resolve the issue. I asked...
[06:08:38] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[07:28:36] jbond: hello, quick question about the netbox/hiera sync script
[07:29:19] does it sanitize the description field? Or is it needed? https://www.irccloud.com/pastebin/g0JDCmDx/
[07:31:29] netops, Infrastructure-Foundations, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (Marostegui)
[07:31:36] netops, Infrastructure-Foundations, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (Marostegui) p:Triage→High
[07:33:20] netops, Infrastructure-Foundations, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (jcrespo) @andrea.denisse This is something I mentioned to you some time ago and promised to raise it with your team.
[08:03:09] netops, Infrastructure-Foundations, SRE, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (ayounsi) Haha yeah indeed! In theory we should only keep 90 days of logs : https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modu...
[08:06:47] netops, Infrastructure-Foundations, SRE, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (Marostegui) This is the oldest row: ` root@db1164.eqiad.wmnet[librenms]> select timestamp from syslog order by timestamp asc limit 1; +---------------------+ | timestamp...
[08:12:45] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: Change cloud-instance-transport vlan subnets from /30 to /29 - https://phabricator.wikimedia.org/T348140 (dcaro) Open→In progress
[08:17:46] netops, Infrastructure-Foundations, SRE, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (jcrespo) Would it be possible to have it on filesystem/kibana only? I don't mind backing it up for persistence, but on db there is extra cost that wouldn't be on filesys...
[08:48:59] (PuppetZeroResources) firing: Puppet has failed generate resources on puppetdb1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:25:27] XioNoX: we don't do anything with the description field just copy it, afaik its not used by anything
[09:28:59] (PuppetZeroResources) resolved: Puppet has failed generate resources on puppetdb1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[09:34:56] jbond: for my change to pcc which targets `master` should I abandon it and do it on `2.x` then assumes you will later merge `2.x` into master?
[09:35:04] or do you want two independent changes?
[09:35:29] I am not sure which branching model you wanna follow :]
[09:48:03] hashar: if you just abandon and re target 2.x i will take care of merging to master
[09:48:26] excellent :)
[09:48:36] I will do that and add some fixes for mypy
[09:53:26] thanks
[09:54:20] netops, Infrastructure-Foundations, SRE, observability: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (ayounsi) We already have it in Kibana, but the LibreNMS UI is quite convenient and we send more verbose logs for alerting there. The solution is probably to reduce the r...
[10:08:43] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:03:38] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:03:39] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:07:31] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro) This unblocked this issue and made tox pass: https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/967244...
[14:37:25] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (fnegri) Unfortunately, I think this specific bug still exists, because there's no Python 3.11 wheel in PyPI: https:...
[14:57:33] SRE-tools, Infrastructure-Foundations, Spicerack, Patch-For-Review: spicerack: tox fails to install PyYAML using python 3.11 on bookworm - https://phabricator.wikimedia.org/T345337 (dcaro) yep, I tried a few combinations of elasticsearch-curator/pyyaml and such... it turns out that elasticseach-c...
[15:09:17] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Automate L3 Switch to Core Router BGP peerings (and remove OSPF on drmrs switches) - https://phabricator.wikimedia.org/T349125 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=b09e42f6-6ad2-4453-abab-27f0a3934508) set by...
[15:26:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Automate L3 Switch to Core Router BGP peerings (and remove OSPF on drmrs switches) - https://phabricator.wikimedia.org/T349125 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=6731cf5b-8a4f-4391-98fa-2900d5500bf5) set by...
[16:54:15] netops, Data-Engineering, Infrastructure-Foundations, SRE, and 2 others: [Maintenance] Netflow/pmacct: use forwardingStatus - https://phabricator.wikimedia.org/T331707 (Ahoelzl)
[18:04:27] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:04:28] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:03:13] (DiskSpace) firing: Disk space puppetmaster1001:9100:/ 5.94% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace