[02:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 5d 11h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[04:57:22] netops, DC-Ops, Infrastructure-Foundations, ops-eqsin, SRE: EQSIN:New switch setup/configuration - https://phabricator.wikimedia.org/T418439#11864039 (Papaul)
[06:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 5d 7h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[06:07:24] netops, Infrastructure-Foundations: POPs - free up 2x100G ports - https://phabricator.wikimedia.org/T424611 (ayounsi) NEW p:Triage→High
[06:11:47] netops, Infrastructure-Foundations: POPs - free up 2x100G ports - https://phabricator.wikimedia.org/T424611#11864154 (ayounsi)
[06:36:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[06:41:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:08:15] netops, Infrastructure-Foundations, Observability-Metrics: gNMIc: investigate new "collector" command - https://phabricator.wikimedia.org/T416360#11864411 (ayounsi) All gnmic instances have been upgraded to 0.45.0
[10:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 5d 3h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[10:13:10] netops, Infrastructure-Foundations, SRE: Network QoS: expand support to Nokia switches - https://phabricator.wikimedia.org/T424639 (cmooney) NEW p:Triage→Medium
[10:34:27] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640 (cmooney) NEW p:Triage→Medium
[10:34:33] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11865094 (cmooney)
[10:34:35] netops, Infrastructure-Foundations, SRE: Network QoS: expand support to Nokia switches - https://phabricator.wikimedia.org/T424639#11865095 (cmooney)
[10:45:42] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11865146 (cmooney)
[10:48:13] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11865157 (cmooney)
[10:48:59] netops, Infrastructure-Foundations, SRE: Network QoS: use the 'CS1' DSCP code point for low-priority instead of AF41 - https://phabricator.wikimedia.org/T424640#11865159 (cmooney)
[10:53:54] netops, Infrastructure-Foundations, Observability-Metrics, Patch-For-Review: gNMIc: investigate new "collector" command - https://phabricator.wikimedia.org/T416360#11865201 (ayounsi) Manually tested on netflow4003 and works well. Two differences: The metrics/graph `sum by (source) (rate(gnmic_su...
[13:43:09] XioNoX: topranks: hi kind netops-ians. at some point, jaime and I will reach out about the offline backups planning and questions related to setting up a dedicated interconnect. just a heads-up!
[13:43:35] forwarding a document related to that so it's not a surprise.
[13:52:12] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:frack:rack/install/configuration new switches in rack F5 - https://phabricator.wikimedia.org/T405618#11866061 (Papaul)
[14:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 4d 23h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[14:50:04] sukhe: thx
[15:25:06] netops, Infrastructure-Foundations: POPs - free up 2x100G ports - https://phabricator.wikimedia.org/T424611#11866503 (cmooney) My basic thoughts on this are: * We create a new vlan on each top-of-rack switch at the POPs for the "core router transport" ** suggest cr-ibgp- for it * We allocate...
[15:26:29] netops, Infrastructure-Foundations: POPs - free up 2xQSFP ports - https://phabricator.wikimedia.org/T424611#11866517 (cmooney)
[15:50:43] netops, Infrastructure-Foundations, SRE: Network QoS: expand support to Nokia switches - https://phabricator.wikimedia.org/T424639#11866670 (cmooney)
[15:55:18] netops, Infrastructure-Foundations, SRE: Network QoS: expand support to Nokia switches - https://phabricator.wikimedia.org/T424639#11866721 (cmooney)
[16:29:02] Mail, collaboration-services, Infrastructure-Foundations, Phabricator, VPS-project-Phabricator: @wikimedia.org email addresses don't seem to be receiving emails sent by the test Phabricator instance - https://phabricator.wikimedia.org/T422559#11866948 (A_smart_kitten) From [[https://we.phorge...
[16:37:32] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO:Switch refresh diagram - https://phabricator.wikimedia.org/T408511#11867023 (Papaul)
[16:55:16] netops, DC-Ops, Infrastructure-Foundations, ops-ulsfo, SRE: ULSFO: New switch configuration - https://phabricator.wikimedia.org/T408892#11867079 (ssingh) >>! In T408892#11749076, @ayounsi wrote: > As a side note we will need to manually change the IPs of the routed ganeti nodes in rack 23 to...
[17:09:39] netops, Infrastructure-Foundations, SRE: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683 (cmooney) NEW p:Triage→Medium
[17:12:02] netops, Infrastructure-Foundations, SRE: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683#11867159 (cmooney)
[18:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 4d 19h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[18:19:43] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: Lumen 10G transport 442550293 disconnection - https://phabricator.wikimedia.org/T424758 (RobH) NEW
[18:26:47] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 3 others: Lumen 10G transport 442550293 disconnection - https://phabricator.wikimedia.org/T424758#11868220 (RobH) @Papaul: Please advise what the exact patch panel port https://netbox.wikimedia.org/circuits/circuits/103/ lands on before I...
[22:05:20] FIRING: [3x] PKICertificateExpiry: Intermediate certificate in the trust chain for discovery expires in 4d 15h 49m 25s - https://wikitech.wikimedia.org/wiki/PKI/CA_Operations - TODO - https://alerts.wikimedia.org/?q=alertname%3DPKICertificateExpiry
[22:49:10] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Network telemetry - collect device sub-interface statistics with gnmic - https://phabricator.wikimedia.org/T424683#11869006 (cmooney) I had a stab at this in the above patch. Some notes on the event processors added: |Name|Event Process...