[02:12:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:12:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:46:40] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9721680 (MoritzMuehlenhoff)
[09:08:53] Mail, Infrastructure-Foundations, MediaWiki-Email, SRE: Old "Email this user" email is repeatedly resent - https://phabricator.wikimedia.org/T361860#9721761 (Xover) And now I just got a resend of a different email to a different user, originally sent on April 11. That’s something like two out of...
[09:32:34] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#9721802 (cmooney) >>! In T362522#9717511, @cmooney wrote: > FWIW I changed the key-exchange algo configured on mr1-eqsin to see if it would make any difference CPU is roughly the same pattern sinc...
[10:04:37] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9721866 (MoritzMuehlenhoff)
[10:12:25] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[11:52:27] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722192 (MoritzMuehlenhoff)
[11:56:01] netops, Infrastructure-Foundations, SRE, Traffic: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772 (cmooney) NEW p:Triage→Medium
[13:44:16] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722479 (MoritzMuehlenhoff)
[14:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[14:38:48] (PuppetZeroResources) firing: Puppet has failed generate resources on install1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:43:48] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on install1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[14:44:20] SRE-tools, collaboration-services, Infrastructure-Foundations, Puppet-Core, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619#9722701 (MoritzMuehlenhoff)
[14:58:48] (PuppetZeroResources) firing: (2) Puppet has failed generate resources on install1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:23:48] (PuppetZeroResources) resolved: Puppet has failed generate resources on install1004:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources
[15:24:00] ^ this was https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/9760b837cf977fafa7208d95baa6ce9035c1fdbd
[15:27:09] Mail, Infrastructure-Foundations, MediaWiki-Email, SRE: Old "Email this user" email is repeatedly resent - https://phabricator.wikimedia.org/T361860#9722988 (jhathaway) Given that this has reoccurred and from the emails you provided looks to be duplication on the application layer I think we need...
[18:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:58:39] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772#9724034 (cmooney) I believe the two patches above, once merged, will add the required redundancy. Following option 1 above, creatin...
[19:02:46] netops, Infrastructure-Foundations, SRE, Traffic, Patch-For-Review: ASW single-point of failure for LVS VIPs at POPs - https://phabricator.wikimedia.org/T362772#9724049 (cmooney) Perhaps one option would be to ignore the puppet patch to change drmrs and esams for now - but merge the Homer one...
[22:13:50] (SystemdUnitFailed) firing: debian-weekly-rebuild.service on build2001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed