[00:28:44] (SystemdUnitFailed) firing: (9) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:28:44] (SystemdUnitFailed) firing: (9) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:11:57] netops, Infrastructure-Foundations, SRE: Juniper RA receive bug CVE-2023-28981 - https://phabricator.wikimedia.org/T334916 (ayounsi) Open→Resolved a:ayounsi Deployed
[08:18:44] (SystemdUnitFailed) firing: (13) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:18:49] (PuppetDisabled) firing: (2) Puppet disabled on puppetdb1002:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=puppet&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled
[09:01:28] netops, Infrastructure-Foundations, SRE: Juniper RA receive bug CVE-2023-28981 - https://phabricator.wikimedia.org/T334916 (ayounsi) This might need to be rolled back the day we start doing BGP unnumbered between spine and leaf as it seems to rely on it: https://www.theasciiconstruct.com/post/junos-b...
[09:43:27] SRE-tools, netbox, Infrastructure-Foundations, Puppet-Infrastructure, and 3 others: update systems to use new puppetdb instance - https://phabricator.wikimedia.org/T342214 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=73525cca-1535-4d44-89d8-fcd584ea67a9) set by jmm@cumin2002...
[09:43:53] SRE-tools, netbox, Infrastructure-Foundations, Puppet-Infrastructure, and 3 others: update systems to use new puppetdb instance - https://phabricator.wikimedia.org/T342214 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=69921077-8a56-48de-9905-0d3d1b91d292) set by jmm@cumin2002...
[12:18:44] (SystemdUnitFailed) firing: (9) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:58:46] netops, Infrastructure-Foundations, SRE: Juniper RA receive bug CVE-2023-28981 - https://phabricator.wikimedia.org/T334916 (cmooney) Hmm yeah good point. We can probably upgrade devices to a release with the fix in it before then.
[13:28:44] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:33:44] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:38:44] (SystemdUnitFailed) firing: (11) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:53:26] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (Jhancock.wm) @cmooney I haven't received it yet. I checked with the dock to make sure it hasn't arrived and we weren't notified but no luck. Is there a tracking number for the package?
[14:03:47] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[15:38:46] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:30:26] netops, Infrastructure-Foundations: Move cr1-esams<->cr2-esams link to QSFP port - https://phabricator.wikimedia.org/T347323 (ayounsi)
[16:34:16] netops, Infrastructure-Foundations: Move cr1-esams<->cr2-esams link to QSFP port - https://phabricator.wikimedia.org/T347323 (cmooney) There's no free QSFP port on cr1-esams, which was the reason we had to use the 3x10G. We probably need to channelize et-0/0/2 on cr2-esams and use breakout cables if we...
[19:38:46] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[23:38:47] (SystemdUnitFailed) firing: (10) httpbb_hourly_appserver.service Failed on cumin1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed