[00:50:45] (VarnishChildRestarted) firing: varnish-upload restarted on cp4052 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp4052&datasource=ulsfo%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[04:50:45] (VarnishChildRestarted) firing: varnish-upload restarted on cp4052 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp4052&datasource=ulsfo%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[07:43:04] Traffic, SRE: oom killed varnish on cp4052 - https://phabricator.wikimedia.org/T325797 (Vgutierrez)
[07:45:45] (VarnishChildRestarted) resolved: varnish-upload restarted on cp4052 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp4052&datasource=ulsfo%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[10:35:48] netops, Infrastructure-Foundations, SRE: Juniper QFC5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (cmooney) p:Triage→Low
[10:54:14] netops, Infrastructure-Foundations, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) p:Triage→High
[10:55:23] netops, Infrastructure-Foundations, SRE: Juniper QFX5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (ayounsi)
[11:11:26] netops, Infrastructure-Foundations, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (cmooney) For the record I had a quick look at the codfw / ulsfo / eqsin / esams virtual-chassis port stats and none of them are showing historical CRC errors.
[13:08:55] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) I had a closer look as there is support for this kind of graphing and alerting in LibreNMS since a while https://github.com/librenms/librenms/blame/258505ed4429050344...
[15:37:30] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (Jclark-ctr) @ayounsi I do have spare optics for connection. 1/5/23 is a good day to perform this maintenance
[16:02:06] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (cmooney) >>! In T325803#8486630, @ayounsi wrote: > For example `jnxVirtualChassisPortInCRCAlignErrors.5."vcp-255/1/3" = 42` while it has been cleared and should now be at 0....
[16:14:30] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) > jnxVirtualChassisPortInCRCAlignErrors is a COUNTER64, so I'm not sure that clearing the device counters should reset what SNMP reports. If it did then LibreNMS woul...
[16:59:43] netops, Infrastructure-Foundations, SRE: Juniper QFX5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (cmooney) Juniper have come back to say the message is harmless and can be ignored. > It’s a Harmless error message. > > I gone thr...