[06:51:26] I'm back1
[06:51:27] !
[08:15:03] SRE-tools, Infrastructure-Foundations, Patch-For-Review, cloud-services-team (Kanban): WMCS Cookbook Automation Q2 tracking task - https://phabricator.wikimedia.org/T319401 (dcaro)
[08:25:14] SRE-tools, Infrastructure-Foundations, cloud-services-team (Kanban): Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL - https://phabricator.wikimedia.org/T325756 (dcaro)
[10:35:46] netops, Infrastructure-Foundations, SRE: Juniper QFC5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (cmooney) p:Triage→Low
[10:54:13] netops, Infrastructure-Foundations, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) p:Triage→High
[10:55:21] netops, Infrastructure-Foundations, SRE: Juniper QFX5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (ayounsi)
[11:11:24] netops, Infrastructure-Foundations, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (cmooney) For the record I had a quick look at the codfw / ulsfo / eqsin / esams virtual-chassis port stats and none of them are showing historical CRC errors.
[13:08:53] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) I had a closer look as there is support for this kind of graphing and alerting in LibreNMS since a while https://github.com/librenms/librenms/blame/258505ed4429050344...
[15:37:29] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (Jclark-ctr) @ayounsi I do have spare optics for connection. 1/5/23 is a good day to perform this maintenance
[16:02:04] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (cmooney) >>! In T325803#8486630, @ayounsi wrote: > For example `jnxVirtualChassisPortInCRCAlignErrors.5."vcp-255/1/3" = 42` while it has been cleared and should now be at 0....
[16:14:28] netops, Infrastructure-Foundations, SRE, ops-eqiad: [eqiad] faulty VC optics - https://phabricator.wikimedia.org/T325803 (ayounsi) > jnxVirtualChassisPortInCRCAlignErrors is a COUNTER64, so I'm not sure that clearing the device counters should reset what SNMP reports. If it did then LibreNMS woul...
[16:25:43] SRE-tools, Infrastructure-Foundations, Prod-Kubernetes, SRE, and 3 others: Write a cookbook to set a k8s cluster in maintenance mode - https://phabricator.wikimedia.org/T277677 (elukey) Current status: * `sre.discovery.service-route` (used by `sre.k8s.pool-depool-cluster`) has been moved to the...
[16:59:41] netops, Infrastructure-Foundations, SRE: Juniper QFX5120 error logs on lsw1-e1 and lsw1-f1: Failed to get ifl for ifl index - https://phabricator.wikimedia.org/T325801 (cmooney) Juniper have come back to say the message is harmless and can be ignored. > It’s a Harmless error message. > > I gone thr...