[08:04:04] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193#9849495 (ayounsi) That's quite interesting seeing the variation of tradeoffs, and can be quite (an important) rabbithole. Is the goal to figure it out before anycasting ns1, or first anycast ns1 from anywhere then figure...
[10:46:55] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[10:47:42] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[10:49:33] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9849768 (cmooney) So I tested pushing with Homer to the devices in row D and it was pretty much successful :) NOTE: As the devices need some additio...
[10:51:55] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[10:53:17] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:01:55] FIRING: [3x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:13:43] netops, Infrastructure-Foundations, SRE: Include vlans with defined IRB int in device vlans even if no port present - https://phabricator.wikimedia.org/T366348 (cmooney) NEW p:Triage→Low
[11:17:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Include vlans with defined IRB int in device vlans even if no port present - https://phabricator.wikimedia.org/T366348#9849823 (cmooney) Diff with this patch applied on one of the new codfw switches: ` cmooney@wikilap:~$ homer lsw1-d2-cod...
[11:31:55] FIRING: [3x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[11:33:06] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Include vlans with defined IRB int in device vlans even if no port present - https://phabricator.wikimedia.org/T366348#9849839 (cmooney)
[11:36:55] FIRING: [3x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:37:10] FIRING: [3x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:38:17] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:41:55] RESOLVED: [3x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[12:42:42] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[13:35:14] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, and 2 others: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9850037 (cmooney) @papaul @Jhancock.wm I noticed that the leaf in rack d8 is reporting one of it's power supplys down: ` cmooney@lsw1-d8-codfw> show...
[14:14:39] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193#9850138 (ssingh) >>! In T366193#9849495, @ayounsi wrote: > That's quite interesting seeing the variation of tradeoffs, and can be quite (an important) rabbithole. Is the goal to figure it out before anycasting ns1, or fir...
[14:19:46] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9850142 (cmooney) @ABran-WMF thanks for creating all the tasks! Really appreciated, I did not expect to come back and see that :) >>! In T348977#9837047, @MatthewVe...
[14:24:43] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f5-eqiad - https://phabricator.wikimedia.org/T365982#9850163 (cmooney)
[14:25:56] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f6-eqiad - https://phabricator.wikimedia.org/T365983#9850164 (cmooney)
[14:27:39] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f5-eqiad - https://phabricator.wikimedia.org/T365982#9850169 (cmooney) p:Triage→Medium a:MatthewVernon→cmooney
[14:27:53] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9850174 (cmooney) @ABran-WMF thanks for creating all the tasks! Really appreciated, I did not expect to come back and see that :) >>! In T348977#9837047, @MatthewVe...
[14:28:06] netops, Data-Persistence, DBA, Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f6-eqiad - https://phabricator.wikimedia.org/T365983#9850165 (cmooney) p:Triage→Medium a:MatthewVernon→cmooney
[15:30:13] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360 (ssingh) NEW
[15:30:16] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850347 (ssingh) p:Triage→Medium
[15:31:23] netops, Infrastructure-Foundations, SRE: Upgrade ssw1-e1-eqiad to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361 (cmooney) NEW p:Triage→Medium
[15:36:30] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850415 (ssingh) To clarify, there is //no change// to the configuration of the DNS hosts themselves and the peer list there. This is only for the consumers of `P:systemd::tim...
[15:49:40] FIRING: [4x] VarnishHighThreadCount: Varnish's thread count on cp1102:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:54:39] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850519 (ssingh)
[15:54:40] FIRING: [25x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:54:50] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850520 (cmooney) I suspect Brandon may be more versed in the ways of NTP than myself, and could advise if there are any pitfalls on the protocol side. But from my own unders...
[15:56:35] netops, Infrastructure-Foundations, SRE: Upgrade ssw1-e1-eqiad to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850549 (cmooney)
[15:57:19] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850550 (BBlack)
[15:58:51] netops, Infrastructure-Foundations, SRE: Upgrade ssw1-e1-eqiad to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850564 (cmooney)
[15:58:55] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9850565 (cmooney)
[15:59:40] FIRING: [30x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:59:50] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9850567 (cmooney)
[16:01:35] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9850574 (BBlack) Yeah, I've looked at this from the deep-ntp-details POV and it's all pretty sane. We're in alignment with the recommendations in https://www.rfc-editor.org/r...
[16:04:40] FIRING: [44x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:09:40] FIRING: [44x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:14:40] FIRING: [42x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:19:40] FIRING: [34x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:24:40] RESOLVED: [21x] VarnishHighThreadCount: Varnish's thread count on cp1100:0 is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:25:36] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850699 (cmooney)
[16:25:58] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850702 (cmooney)
[16:27:34] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850704 (cmooney)
[16:30:02] netops, Infrastructure-Foundations, SRE: Upgrade Eqiad row E-F Spines to JunOS 22.2R3 - https://phabricator.wikimedia.org/T366361#9850726 (cmooney)
[16:31:10] netops, Infrastructure-Foundations, SRE: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - https://phabricator.wikimedia.org/T348977#9850733 (cmooney)
[18:03:07] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193#9851085 (cmooney) >>! In T366193#9849495, @ayounsi wrote: > That's quite interesting seeing the variation of tradeoffs, and can be quite (an important) rabbithole. Is the goal to figure it out before anycasting ns1, or fi...
[19:16:56] Traffic, SRE: Anycast ns1.wikimedia.org - https://phabricator.wikimedia.org/T366193#9851431 (BBlack) Yes, from a resiliency POV, in some senses keeping unicasts in the mix is an answer (and it's the answer we currently rely on). In a world with only very smart and capable resolvers, the simplest answer...
[21:22:48] FIRING: PuppetZeroResources: Puppet has failed generate resources on ncmonitor1001:9100 - https://puppetboard.wikimedia.org/nodes?status=failed - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet - https://alerts.wikimedia.org/?q=alertname%3DPuppetZeroResources