[00:03:13] (DiskSpace) resolved: Disk space puppetmaster1001:9100:/ 4.762% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[02:23:43] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[05:48:42] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:48:42] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:09:29] netops, Infrastructure-Foundations, ops-eqiad: Move asw2-c8-eqiad to spares - https://phabricator.wikimedia.org/T349798 (ayounsi) Open→Stalled
[08:09:48] netops, Infrastructure-Foundations, ops-eqiad: Move asw2-c8-eqiad to spares - https://phabricator.wikimedia.org/T349798 (ayounsi)
[08:13:58] do we have any insight on why cumin1001 crashed?
[08:16:39] volans: no
[08:17:46] it has also passed our usual EOL AFAICS
[08:18:32] I don't recall if we have a replacement planned
[08:24:10] no mention of cumin in the FY23-24 Budget: CapEx spreadsheet
[08:24:32] weird... it should have been part of the list of 5y old ones no?
[08:24:48] ah no...
[08:25:04] I think we decided to go VM in the end
[08:25:10] as we have the other one physical
[08:25:12] and try
[08:25:15] this way
[08:25:25] cool!
[08:25:42] my bad I had totally forgot, let me try to find the papertrail
[08:26:44] volans: https://phabricator.wikimedia.org/T334091#8819631
[08:29:54] yep that one
[08:29:55] thx
[08:49:31] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[09:19:44] netops, DBA, Infrastructure-Foundations, SRE, and 2 others: librenms.syslog table size - https://phabricator.wikimedia.org/T349362 (Marostegui) Open→Resolved a:Marostegui Table truncated: `root@db1164:/srv/sqldata/librenms# ls -lh syslog.ibd -rw-rw---- 1 mysql mysql 9.0M Oct 26 09:18...
[09:47:56] netops, Data-Platform-SRE, Infrastructure-Foundations, Product-Analytics, and 2 others: Maybe restrict domains accessible by webproxy - https://phabricator.wikimedia.org/T300977 (ayounsi)
[09:49:31] (SystemdUnitFailed) firing: (2) docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:45:04] (NodeTextfileStale) firing: Stale textfile for cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[10:47:00] (NodeTextfileStale) firing: Stale textfile for puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[11:07:59] (PuppetFailure) firing: Puppet has failed on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:32:59] (PuppetFailure) resolved: Puppet has failed on puppetserver1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:47:00] (NodeTextfileStale) resolved: Stale textfile for puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[13:24:59] (PuppetZeroResources) firing: Puppet has failed generate resources on netmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:34:59] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on netmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:39:59] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on netmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:44:59] (PuppetZeroResources) resolved: (2) Puppet has failed generate resources on netmon1003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[13:53:42] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:34:03] netops, Infrastructure-Foundations, SRE, ops-codfw: Connect two hosts in codfw row A/B for switch migration testing - https://phabricator.wikimedia.org/T345803 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest2003.codfw.wmnet with OS bullseye
[14:45:17] (NodeTextfileStale) firing: Stale textfile for cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[17:53:42] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:45:17] (NodeTextfileStale) firing: Stale textfile for cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
[21:53:42] (SystemdUnitFailed) firing: docker-reporter-k8s-images.service Failed on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[22:24:13] (DiskSpace) firing: Disk space puppetmaster1001:9100:/ 5.944% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=puppetmaster1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[22:45:17] (NodeTextfileStale) firing: Stale textfile for cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale