[00:06:13] (DiskSpace) resolved: Disk space idp1002:9100:/ 5.865% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[00:08:48] (SystemdUnitFailed) firing: update-tails-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:20:14] (PuppetFailure) firing: Puppet has failed on cumin1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:08:00] (PuppetFailure) firing: (2) Puppet has failed on netflow1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[01:08:48] (SystemdUnitFailed) resolved: update-tails-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:20:14] (PuppetFailure) firing: Puppet has failed on cumin1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[05:08:15] (PuppetFailure) firing: (2) Puppet has failed on netflow1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[08:20:14] (PuppetFailure) firing: Puppet has failed on cumin1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[09:00:00] (PuppetFailure) resolved: Puppet has failed on cumin1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[09:08:15] (PuppetFailure) firing: (2) Puppet has failed on netflow1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[09:15:02] Puppet, SRE: Ensure filenames invalid in windows are not commited to operations/puppet - https://phabricator.wikimedia.org/T353487 (Peachey88)
[09:47:17] (SystemdUnitFailed) firing: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:17:17] (SystemdUnitFailed) resolved: netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:27:17] (SystemdUnitFailed) firing: (2) netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:28:48] (SystemdUnitFailed) resolved: (2) netbox_report_accounting_run.service Failed on netbox1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[10:50:22] SRE-tools, Infrastructure-Foundations: Decommission cookbook: lock per switch - https://phabricator.wikimedia.org/T353513 (ayounsi)
[11:03:00] (PuppetFailure) firing: (2) Puppet has failed on netflow1002:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[11:06:00] that one should recover soon -ish ^
[11:06:56] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[11:07:15] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[11:37:00] (PuppetFailure) resolved: Puppet has failed on netflow2003:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetFailure
[13:06:14] netbox, Infrastructure-Foundations, SRE, cloud-services-team, Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (cmooney) @ayounsi, @Volans I have uploaded the above patch to add the functionality as descr...
[13:17:16] netbox, Infrastructure-Foundations, SRE, cloud-services-team, Patch-For-Review: Netbox: Add support for our complex host network setups in provision script - https://phabricator.wikimedia.org/T346428 (cmooney) p:Triage→Medium
[14:51:00] SRE-tools, Infrastructure-Foundations: Decommission cookbook: lock per switch - https://phabricator.wikimedia.org/T353513 (ABran-WMF) To add a bit of contextual informations: this was triggered during T353449 and T353448 which were run seconds apart
[15:23:47] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 3 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jijiki) Prep work for memcached hosts is in place; those hosts are using each host's puppet certs for TLS, and migrating to puppet7 needs a minor tweak due...
[15:32:36] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jijiki)
[15:34:32] SRE-tools, Infrastructure-Foundations, Puppet-Core, SRE, and 4 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (jijiki)
[17:39:13] (DiskSpace) firing: Disk space idp1002:9100:/ 5.948% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[18:59:14] (DiskSpace) resolved: Disk space idp1002:9100:/ 5.829% free - https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&viewPanel=12&var-server=idp1002 - https://alerts.wikimedia.org/?q=alertname%3DDiskSpace
[19:33:26] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:37:51] jhathaway: oh jeez lol
[19:37:53] thanks for that patch
[19:38:24] so funny indeed, we knew naming it aux was going to cause us grief at some point!
[19:38:36] yeah I just could NOT have guessed it'd be that way
[19:38:50] me neither
[19:38:51] I think I remembered that fact about CON but not about AUX
[19:39:49] the number of restrictions has a wonderful windows quality to it, http://msdn.microsoft.com/en-us/library/aa365247.aspx
[19:50:46] honestly the best part of that is the offered example "NUL.tar.gz"
[19:51:31] :)
[21:32:08] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, SRE, Puppet (Puppet 7.0): Re-images sometimes fail as the cert request goes to the wrong puppet master - https://phabricator.wikimedia.org/T353558 (jhathaway)
[21:32:22] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, SRE, Puppet (Puppet 7.0): Re-images sometimes fail as the cert request goes to the wrong puppet master - https://phabricator.wikimedia.org/T353558 (jhathaway) p:Triage→Medium
[21:32:45] SRE-tools, Infrastructure-Foundations, Puppet-Infrastructure, SRE, Puppet (Puppet 7.0): Re-images sometimes fail as the cert request goes to the wrong puppet master - https://phabricator.wikimedia.org/T353558 (jhathaway) a:Volans→None
[23:33:26] (SystemdUnitFailed) firing: update-ubuntu-mirror.service Failed on mirror1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
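Note on the 19:37-19:50 exchange (T353487): the sketch below shows one way a check for Windows-reserved names could look. It is not the patch mentioned in the log, and the function name `reserved_components` is hypothetical. It assumes the reserved set is CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9, matched case-insensitively and regardless of extension, which is why "NUL.tar.gz" counts as NUL.

```python
from pathlib import PurePosixPath

# Device names Windows reserves (assumed list: CON, PRN, AUX, NUL, COM1-9, LPT1-9).
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_components(path: str) -> list[str]:
    """Return the components of a repo-relative path that collide with a reserved name."""
    hits = []
    for part in PurePosixPath(path).parts:
        # Everything after the first dot is ignored, so "NUL.tar.gz" is treated as "NUL".
        stem = part.split(".", 1)[0].rstrip(" ").upper()
        if stem in RESERVED:
            hits.append(part)
    return hits


if __name__ == "__main__":
    # Illustrative paths only; they are not taken from operations/puppet.
    for candidate in ("modules/aux/files/NUL.tar.gz", "modules/auxiliary/manifests/init.pp"):
        print(candidate, reserved_components(candidate))
```

A check like this could run over the output of `git ls-files` in a CI job or pre-commit hook; that wiring is left out here because the log does not say how the real check was implemented.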