[00:38:59] Traffic, DNS, SRE: Set mediawiki.gr, wikipedia.pt, and wiktionary.org.uk NS records to WMF - https://phabricator.wikimedia.org/T401438#11160824 (Alchimista) I'm sorry @BCornwall, but as Waldir mentioned, when responding to your email, it didn't include your email, sorry for that. In 2022 we were aske...
[02:10:26] netops, fundraising-tech-ops, Infrastructure-Foundations: Move pfw1b-codfw to rack F5 - https://phabricator.wikimedia.org/T401297#11160973 (Papaul) Tested all the cross cage links (7) only 2 links are not coming up. I will do more testing tomorrow.
[07:38:51] FIRING: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[07:43:51] RESOLVED: FermMSS: Unexpected MSS value on 10.2.1.27:80 @ ms-fe2015 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=4&var-site=codfw&var-cluster=swift - https://alerts.wikimedia.org/?q=alertname%3DFermMSS
[08:48:33] netops, Infrastructure-Foundations, SRE-tools: Evaluate automatic MAC-based DHCP for production servers - https://phabricator.wikimedia.org/T396712#11161490 (ayounsi) Open→Resolved a:ayounsi Evaluation is done and @jhathaway has rolled out UUID + MAC fallback DHCP (with the `--no82` c...
[08:56:35] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161546 (ayounsi)
[09:50:41] netops, Infrastructure-Foundations, SRE: Ganeti network config results in additional auto-conf IPv6 address - https://phabricator.wikimedia.org/T378335#11161728 (cmooney) Open→Declined Gonna close this one.
I suspect we may be hitting an occasional issue due to this, which is being paper...
[09:56:31] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161751 (ayounsi) a:Papaul @Papaul would you be ok to take care of that ?
[10:00:14] netops, Infrastructure-Foundations, SRE: gNMIc connection not working for cloudsw2-d5-eqiad - https://phabricator.wikimedia.org/T387018#11161778 (ayounsi) Open→Resolved a:ayounsi cloudsw2-d5-eqiad is now gone.
[10:01:41] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161786 (ayounsi)
[10:05:06] netops, Infrastructure-Foundations: Replace Rancid with Oxidized - https://phabricator.wikimedia.org/T361252#11161794 (ayounsi) Open→Declined Well, we managed to get Rancid to work with Nokia so that's not really needed.
[10:09:06] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161806 (ayounsi) @Vgutierrez @ssingh could that be a good opportunity to see how drmrs handles the loss of a switch/rack ? With the site depooled, and while one ToR switch is upgrading, maybe w...
[10:12:29] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#11161823 (ayounsi) Open→Resolved a:ayounsi Closing that never-ending tracking task to focus on more specific sub-tasks now that all the ground work is done.
[10:12:59] netops, Infrastructure-Foundations, SRE: Homer: redefine IBGP definitions to support both Unicast & EVPN clusters - https://phabricator.wikimedia.org/T394530#11161826 (cmooney) Open→Resolved Closing this one, current status is both the Juniper & Nokia device definitions are the same, and...
[10:14:08] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161830 (cmooney) Thanks @ayounsi. @papaul specifically the request relates to drmrs. Cloud services may need more planning with the WMCS team on scheduling so you can leave that to me for now,...
[10:17:39] netops, Infrastructure-Foundations: mr1-eqsin performance issue - https://phabricator.wikimedia.org/T362522#11161837 (ayounsi) Open→Resolved a:ayounsi All good!
[10:24:41] netops, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11161877 (Vgutierrez) >>! In T390813#11161805, @ayounsi wrote: > @Vgutierrez @ssingh could that be a good opportunity to see how drmrs handles the loss of a switch/rack ? > > With the site depool...
[10:31:08] netops, Infrastructure-Foundations, SRE: Allow read-only users to view logs on Juniper devices - https://phabricator.wikimedia.org/T401378#11161907 (cmooney) Open→Resolved
[10:32:31] netops, DC-Ops, Infrastructure-Foundations, SRE: ssw1-f1-eqiad: Fan Spinning Upgraded - https://phabricator.wikimedia.org/T400783#11161919 (cmooney) >>! In T400783#11107455, @Jclark-ctr wrote: > @cmooney @ayounsi It looks like there’s nothing I or Juniper can do unless the OS is updated. A reboo...
[10:38:44] netops, Infrastructure-Foundations, observability, Prod-Kubernetes, and 3 others: Prevent BGP alerts triggering when K8s host maintenance is being done - https://phabricator.wikimedia.org/T384731#11161969 (ayounsi) Open→Resolved a:ayounsi Closing that parent task to focus on the r...
[10:43:16] Traffic: Prepare/deploy new IPs for codfw cp nodes - https://phabricator.wikimedia.org/T377534#11161986 (ayounsi)
[10:47:38] netops, Traffic, Infrastructure-Foundations, SRE: Alert when anycast-healthchecker withdraws BGP route - https://phabricator.wikimedia.org/T374619#11162004 (ayounsi) Open→Resolved a:ayounsi All the tooling, metrics and examples are there for the service owners to setup their alert...
[10:50:31] netops, Infrastructure-Foundations, SRE: Enable BFD on 'core' EBGP peerings from L3 switches to CRs - https://phabricator.wikimedia.org/T374452#11162024 (cmooney) Open→Declined Not gonna implement this one for now, we can revisit if needed.
[12:59:38] Traffic, SRE: Setting up Wikimedia Trust and Safety Help Center with Zendesk product: Seeking Guidance on host mapping - https://phabricator.wikimedia.org/T400952#11162569 (JAbrams) @ssingh, thanks for your reply and for explaining. I understand now about the canonical domain and why we can’t redirect w...
[13:15:08] Traffic, DNS, SRE: Migrate PDNS recursor config to use /etc/powerdns/recursor.d ? - https://phabricator.wikimedia.org/T389333#11162671 (ssingh) >>! In T389333#11159569, @CDobbins wrote: > @ssingh: that's right. I thought about this a bit over the weekend, and I think the easiest approach is going to...
[15:33:36] Traffic, SRE: apt-staging: add headers to prevent CDN caching - https://phabricator.wikimedia.org/T402284#11163424 (fnegri) Open→Resolved a:Dzahn Nice, thank you! I will optimistically mark as Resolved.
[15:56:23] Traffic, Data-Engineering: Reduce noise from duplicate sequence-gap alerts on HaProxy-webrequests - https://phabricator.wikimedia.org/T401383#11163549 (Fabfur) I'm also taking care of this with some experiments to check when HAProxy (or HaproxyKafka) actually skips these messages
[16:23:41] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163829 (cmooney)
[16:24:19] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163842 (cmooney)
[16:24:35] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163851 (cmooney) a:Jclark-ctr→None
[16:25:01] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163855 (cmooney)
[16:28:17] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163873 (cmooney)
[16:40:09] netops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE: Eqiad: new structured cabling required for fr-tech expansion and row a/b switch refresh - https://phabricator.wikimedia.org/T402432#11163919 (cmooney)
[16:40:59] netops, Traffic, Infrastructure-Foundations: Upgrade End Of Support Junos - https://phabricator.wikimedia.org/T390813#11163921 (ssingh) Hi Netops folks. Thanks for suggesting the idea of testing `drmrs`.
Since this requires some changes on our end as well (adjusting the depool policy //somehow//) an...
[16:55:26] netops, Infrastructure-Foundations, SRE: codfw expansion: configure new Nokia switches in rows E/F - https://phabricator.wikimedia.org/T402590#11164013 (cmooney)
[18:27:42] \o got some patches up for LVS teardown of `wdqs` (the pybal pool still exists but no traffic is being routed to it): https://gerrit.wikimedia.org/r/c/operations/dns/+/1182976 && https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182978
[18:28:18] should I tag someone specific on the patches? maybe brett?
[19:06:36] ryankemper: thanks for checking. brett is out till Thursday
[19:06:40] when are you planning on doing this?
[19:06:52] if it is ASAP, please add
[19:07:00] er, enter too soon
[19:07:15] please add me and vgutierre[z] to it and one of us can take it tomorrow
[19:07:31] if it can wait, then yes, Brett can take it on
[19:07:50] Cool, with that context I think targeting Monday or Tuesday of next week sounds ideal
[19:08:00] cool, please add brett then :)
[19:08:03] ty!
[20:25:33] netops, Infrastructure-Foundations, SRE: Netbox: Server provision script updates for Nokia switch support - https://phabricator.wikimedia.org/T404146 (cmooney) NEW p:Triage→Medium
[20:25:36] netops, Infrastructure-Foundations, SRE: Netbox: Server provision script updates for Nokia switch support - https://phabricator.wikimedia.org/T404146#11165106 (cmooney)
[20:27:51] netops, Infrastructure-Foundations, SRE: Netbox: Updates for Nokia switch support - https://phabricator.wikimedia.org/T404146#11165109 (cmooney)