[02:09:13] (DiskSpace) firing: Disk space idp1002:9100:/ 3.171% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[06:09:13] (DiskSpace) firing: Disk space idp1002:9100:/ 0% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[06:42:07] looking at idp1002
[06:49:13] (DiskSpace) resolved: Disk space idp1002:9100:/ 5.138% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[07:10:40] SRE-tools, Infrastructure-Foundations, Spicerack: Merge reimaging cookbooks - https://phabricator.wikimedia.org/T336491 (SLyngshede-WMF) Open→Resolved
[07:58:26] I compressed some older CAS logs, but the TLDR is that with Gitlab/OIDC using SSO we end up with a lot more logs at our current log level and (lack of) rotation (cas.log for a single day is currently in the 300ish M space)
[07:59:05] I'll open a more structured task later to sort out how we best rotate/handle/archive/compress these going forward
[07:59:42] log handling is a little different since log4j is generally designed for folks who don't use a Linux distro with logrotate
[08:04:42] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:34:42] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:36:34] netbox, Infrastructure-Foundations: Netbox: get rid of WMF Production Patches - https://phabricator.wikimedia.org/T310717 (SLyngshede-WMF)
[08:36:36] CAS-SSO, netbox, Infrastructure-Foundations: Move Netbox authentication to python-social-auth - https://phabricator.wikimedia.org/T308002 (SLyngshede-WMF) Open→In progress Installed the now release social-core on netbox-dev2002 ` $ source /srv/deployment/netbox/venv/bin/activate $ pip insta...
[08:40:23] I am just going to quickly restart netbox-next to do a test
[08:49:40] CAS-SSO, netbox, Infrastructure-Foundations: Move Netbox authentication to python-social-auth - https://phabricator.wikimedia.org/T308002 (SLyngshede-WMF) CAS is presented correctly, but needs correct credentials: {F37085759} ` REMOTE_AUTH_BACKEND = 'social_core.backends.cas.CASOpenIdConnectAuth' S...
[12:00:43] CAS-SSO, netbox, Infrastructure-Foundations, Patch-For-Review: Move Netbox authentication to python-social-auth - https://phabricator.wikimedia.org/T308002 (SLyngshede-WMF) Configuration is going to become something like: ` REMOTE_AUTH_BACKEND = 'social_core.backends.cas.CASOpenIdConnectAuth' R...
[12:34:42] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:04:42] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:29:42] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:04:42] moritzm, jbond: I made the new spicerack release, if needed I can deploy it to the cumin hosts right now (but I can't test all the changes right now) or later
[15:04:47] however you prefer/need
[15:11:45] volans: i can wait till tomorrow to merge my changes
[15:12:13] ack
[15:16:01] the ganeti changes aren't time critical at all, whatever works best for you
[15:29:42] (SystemdUnitFailed) firing: (2) update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:29:42] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:29:42] (SystemdUnitFailed) resolved: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:02:13] (DiskSpace) firing: Disk space krb1001:9100:/ 5.973% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=krb1001 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace