[01:51:24] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (Papaul) I asked @Jclark-ctr to run the 40G fiber for row C and row D and he said he will get it done sometimes next week. Once the fiber in place I will update...
[08:18:38] (LVSHighCPU) firing: (8) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[08:23:38] (LVSHighCPU) resolved: (8) The host lvs5002:9100 has at least its CPU 0 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs5002 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[08:25:11] netops, Infrastructure-Foundations, SRE, netbox, Patch-For-Review: Netbox: use FHRP Groups feature - https://phabricator.wikimedia.org/T311218 (ayounsi) One off script for that, tested on netbox-next: https://netbox-next.wikimedia.org/ipam/fhrp-groups/ `lang=python,name=Move VRRP IPs to FHRP...
[10:04:51] netops, Cloud-Services, Infrastructure-Foundations, SRE, Patch-For-Review: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (cmooney) Added above patch to delegate this range to the WMCS name servers. I hadn't checked the naming convention previously, I do actually...
[10:35:59] Traffic, Phabricator: Phabricator was logging out users repeteadly - https://phabricator.wikimedia.org/T316337 (jcrespo) Invalid→Open
[10:44:49] Traffic: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (Vgutierrez)
[10:46:24] Traffic, Phabricator, SRE, Wikimedia-Incident: Phabricator was logging out users repeteadly - https://phabricator.wikimedia.org/T316337 (jcrespo)
[10:47:17] Traffic, SRE, Patch-For-Review: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (Vgutierrez)
[10:47:23] Traffic, Phabricator, SRE, Wikimedia-Incident: Phabricator was logging out users repeteadly - https://phabricator.wikimedia.org/T316337 (Vgutierrez)
[10:47:56] Traffic, SRE, Patch-For-Review: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (Vgutierrez) An initial test of https://gerrit.wikimedia.org/r/c/operations/puppet/+/826785/6/modules/profile/files/trafficserver/default.lua (PS6) in cp6016 triggered T316337
[10:51:12] Traffic, SRE, Patch-For-Review: strip non session cookies before cache lookup in ATS - https://phabricator.wikimedia.org/T316338 (Vgutierrez) Open→In progress p:Triage→Medium
[10:59:44] Traffic, Phabricator, SRE, Wikimedia-Incident: Phabricator was logging out users repeteadly - https://phabricator.wikimedia.org/T316337 (jcrespo) Preliminary working doc: https://docs.google.com/document/d/1Ka9MQB8OwdzAzJVfZuaIGo5VfnyRNRr_WxLPZ6YFMkE
[11:16:36] netops, Cloud-Services, Infrastructure-Foundations, SRE, Patch-For-Review: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (cmooney) Also just a note on the setup of the WMCS DNS in general. It seems BIND won't resolve any of these names because the CNAMEs on the...
[11:55:56] (HAProxyEdgeTrafficDrop) firing: 59% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[12:00:56] (HAProxyEdgeTrafficDrop) resolved: 58% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[12:19:20] netops, Infrastructure-Foundations, SRE, cloud-services-team (Kanban): Join ARIN waiting list to request additional IPv4 resources. - https://phabricator.wikimedia.org/T288342 (cmooney) a:cmooney
[12:20:42] netops, Infrastructure-Foundations, SRE: Return AS43821 to RIPE - https://phabricator.wikimedia.org/T314471 (cmooney) In progress→Resolved This has been completed and records cleared up.
[12:22:03] netops, Infrastructure-Foundations, SRE: Complete testing of SONiC NOS / Dell network gear and write up - https://phabricator.wikimedia.org/T310901 (cmooney) Open→Resolved I'm going to close this task for now. If, as seems likely, we wish to deploy Dell as an alternate vendor in production w...
[12:31:24] Traffic, Phabricator, SRE, Wikimedia-Incident: Phabricator was logging out users repeatedly (2022-08-26) - https://phabricator.wikimedia.org/T316337 (Aklapper)
[13:14:48] netops, Infrastructure-Foundations, SRE: Standardize VRRP group IDs - https://phabricator.wikimedia.org/T260363 (ayounsi)
[13:14:52] netops, Infrastructure-Foundations, SRE, netbox, Patch-For-Review: Netbox: use FHRP Groups feature - https://phabricator.wikimedia.org/T311218 (ayounsi)
[13:16:05] netops, Cloud-Services, Infrastructure-Foundations, SRE, Patch-For-Review: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (Andrew) >>! In T315955#8188444, @cmooney wrote: > Also just a note on the setup of the WMCS DNS in general. > > It seems BIND won't resolve...
[13:20:30] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 20+ - https://phabricator.wikimedia.org/T295690 (ayounsi)
[14:10:35] netops, Infrastructure-Foundations, SRE: Upgrade core routers to Junos 20+ - https://phabricator.wikimedia.org/T295690 (Volans)
[15:03:17] netops, Infrastructure-Foundations, SRE: Create Quality of Service design for WMF internal networks - https://phabricator.wikimedia.org/T316358 (cmooney) p:Triage→Medium