[02:08:42] netbox, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:cr* router power not balance on all 4 PEM's - https://phabricator.wikimedia.org/T401937#11160972 (Papaul) @ayounsi @cmooney can you do the test Juniper asked us to do tomorrow Sept. 9th after the meeting link around 11:15am CT?...
[02:10:26] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11160973 (Papaul) Tested all the cross cage links (7) only 2 links are not coming up. I will do more testing tomorrow.
[03:06:40] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[03:11:05] FIRING: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[04:11:05] RESOLVED: NetboxAccounting: Netbox - Accounting job failed - https://wikitech.wikimedia.org/wiki/Netbox#Report_Alert - https://netbox.wikimedia.org/core/jobs/ - https://alerts.wikimedia.org/?q=alertname%3DNetboxAccounting
[05:06:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[08:48:33] netops, SRE-tools, Infrastructure-Foundations: Evaluate automatic MAC-based DHCP for production servers - https://phabricator.wikimedia.org/T396712#11161490 (ayounsi) Open→Resolved a:ayounsi Evaluation is done and @jhathaway has rolled out UUID + MAC fallback DHCP (with the `--no82` c...
[08:56:25] RESOLVED: [2x] MirrorHighLag: Mirrors - /srv/mirrors/debian synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[08:56:35] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161546 (ayounsi)
[09:50:41] netops, Infrastructure-Foundations, SRE: Ganeti network config results in additional auto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#11161728 (cmooney) Open→Declined Gonna close this one. I suspect we may be hitting an occasional issue due to this, which is being paper...
[09:56:31] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161751 (ayounsi) a:Papaul @Papaul would you be ok to take care of that ?
[10:00:14] netops, Infrastructure-Foundations, SRE: gNMIc connection not working for cloudsw2-d5-eqiad - https://phabricator.wikimedia.org/T387018#11161778 (ayounsi) Open→Resolved a:ayounsi cloudsw2-d5-eqiad is now gone.
[10:01:41] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161786 (ayounsi)
[10:05:06] netops, Infrastructure-Foundations: Replace Rancid with Oxidized - https://phabricator.wikimedia.org/T361252#11161794 (ayounsi) Open→Declined Well, we managed to get Rancid to work with Nokia so that's not really needed.
[10:09:06] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161806 (ayounsi) @Vgutierrez @ssingh could that be a good opportunity to see how drmrs handles the loss of a switch/rack ? With the site depooled, and while one ToR switch is upgrading, maybe w...
[10:12:29] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#11161823 (ayounsi) Open→Resolved a:ayounsi Closing that never-ending tracking task to focus on more specific sub-tasks now that all the ground work is done.
[10:12:59] netops, Infrastructure-Foundations, SRE: Homer: redefine IBGP definitions to support both Unicast & EVPN clusters - https://phabricator.wikimedia.org/T394530#11161826 (cmooney) Open→Resolved Closing this one, current status is both the Juniper & Nokia device definitions are the same, and...
[10:14:08] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161830 (cmooney) Thanks @ayounsi. @papaul specifically the request relates to drmrs. Cloud services may need more planning with the WMCS team on scheduling so you can leave that to me for now,...
[10:17:39] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#11161837 (ayounsi) Open→Resolved a:ayounsi All good!
[10:24:41] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161877 (Vgutierrez) >>! In T390813#11161805, @ayounsi wrote: > @Vgutierrez @ssingh could that be a good opportunity to see how drmrs handles the loss of a switch/rack ? > > With the site depool...
[10:31:08] netops, Infrastructure-Foundations, SRE: Allow read-only users to view logs on Juniper devices - https://phabricator.wikimedia.org/T401378#11161907 (cmooney) Open→Resolved
[10:32:31] netops, DC-Ops, Infrastructure-Foundations, SRE: ssw1-f1-eqiad: Fan Spinning Upgraded - https://phabricator.wikimedia.org/T400783#11161919 (cmooney) >>! In T400783#11107455, @Jclark-ctr wrote: > @cmooney @ayounsi It looks like there’s nothing I or Juniper can do unless the OS is updated. A reboo...
[10:38:44] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#11161969 (ayounsi) Open→Resolved a:ayounsi Closing that parent task to focus on the r...
[10:47:38] netops, Infrastructure-Foundations, SRE, Traffic: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#11162004 (ayounsi) Open→Resolved a:ayounsi All the tooling, metrics and examples are there for the service owners to setup their alert...
[10:50:31] netops, Infrastructure-Foundations, SRE: Enable BFD on 'core' EBGP peerings from L3 switches to CRs - https://phabricator.wikimedia.org/T374452#11162024 (cmooney) Open→Declined Not gonna implement this one for now, we can revisit if needed.
[14:55:25] FIRING: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[16:23:41] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech exp[ansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163829 (cmooney)
[16:24:19] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech exp[ansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163842 (cmooney)
[16:24:35] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech exp[ansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163851 (cmooney) a:Jclark-ctr→None
[16:25:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163855 (cmooney)
[16:28:16] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163873 (cmooney)
[16:40:09] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163919 (cmooney)
[16:40:59] netops, Infrastructure-Foundations, Traffic: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11163921 (ssingh) Hi Netops folks. Thanks for suggesting the idea of testing `drmrs`. Since this requires some changes on our end as well (adjusting the depool policy //somehow//) an...
[16:54:03] Packaging, bacula, Data-Persistence-Backup, Infrastructure Security, Infrastructure-Foundations: Trixie bacula-fd package incompatible with our bacula installation - https://phabricator.wikimedia.org/T404114 (jcrespo) NEW
[16:55:00] Packaging, bacula, Data-Persistence-Backup, Infrastructure Security, Infrastructure-Foundations: Trixie bacula-fd package incompatible with our bacula installation - https://phabricator.wikimedia.org/T404114#11164009 (jcrespo) p:Triage→High High because it blocks many Debian upgrades.
[16:55:26] netops, Infrastructure-Foundations, SRE: codfw expansion: configure new Nokia switches in rows E/F - https://phabricator.wikimedia.org/T402590#11164013 (cmooney)
[17:34:26] netbox, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw:cr* router power not balance on all 4 PEM's - https://phabricator.wikimedia.org/T401937#11164212 (cmooney) Papaul did some testing today shuffling things around. **Test 1: Remove PEM 0 from router** We did this, after a few...
[17:45:25] RESOLVED: MirrorHighLag: Mirrors - /srv/mirrors/ubuntu synchronization lag - https://wikitech.wikimedia.org/wiki/Mirrors - https://grafana.wikimedia.org/d/dbd8a904-eab2-48d1-a3b9-fa1851ef3ed2/mirrors?orgId=1 - https://alerts.wikimedia.org/?q=alertname%3DMirrorHighLag
[20:25:33] netops, Infrastructure-Foundations, SRE: Netbox: Server provision script updates for Nokia switch support - https://phabricator.wikimedia.org/T404146 (cmooney) NEW p:Triage→Medium
[20:25:36] netops, Infrastructure-Foundations, SRE: Netbox: Server provision script updates for Nokia switch support - https://phabricator.wikimedia.org/T404146#11165106 (cmooney)
[20:27:51] netops, Infrastructure-Foundations, SRE: Netbox: Updates for Nokia switch support - https://phabricator.wikimedia.org/T404146#11165109 (cmooney)