[00:51:03] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[00:52:18] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[01:11:28] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919775 (10BCornwall)
[01:12:18] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[01:36:05] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919804 (10BCornwall)
[01:54:47] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919821 (10BCornwall)
[01:55:01] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9919822 (10BCornwall)
[03:49:53] 07HTTPS, 10MediaWiki-Action-API, 10MediaWiki-REST-API, 10RESTBase-API, 06Wikimedia Enterprise: Proposal: fail explicitly and revoke relevant API keys over plain-text HTTP connection for all Wikimedia APIs - https://phabricator.wikimedia.org/T368344 (10Diskdance) 03NEW
[03:52:42] 07HTTPS, 06Traffic, 10MediaWiki-Action-API, 10MediaWiki-REST-API, and 2 others: Proposal: fail explicitly and revoke relevant API keys over plain-text HTTP connection for all Wikimedia APIs - https://phabricator.wikimedia.org/T368344#9919994 (10Pppery)
[09:02:28] hey folks, I am going to roll out the glibc changes to all cp nodes
[09:09:50] elukey: no daemon restarts?
[09:18:02] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9920631 (10ABran-WMF)
[09:18:23] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9920630 (10Marostegui) >>! In T365995#9883497, @jcrespo wrote: > backup1009 is the main backup node for bacula on eq...
[09:19:07] vgutierrez: yep it can be picked up anytime, a lot of cp text nodes already run it and cp4052 was restarted yesterday, seems fine to avoid a complete roll restart (also there are some reboots planned afaics)
[09:19:31] the upgrade is mostly an exercise, no real security flaws to fix
[09:19:38] elukey: ack
[09:19:42] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9920643 (10Marostegui)
[09:20:26] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-f1-eqiad - https://phabricator.wikimedia.org/T365996#9920645 (10ABran-WMF)
[09:21:19] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f2-eqiad - https://phabricator.wikimedia.org/T365997#9920659 (10ABran-WMF)
[09:23:11] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 -lsw1-f3-eqiad - https://phabricator.wikimedia.org/T365998#9920670 (10ABran-WMF)
[09:50:30] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Turn down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#9920844 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7ca43ab0-579a-4f82-97aa-11720f300bd7) set by cgoubert@cumin1002 for 21 days, 0:00...
[09:54:13] 06Traffic, 10MW-on-K8s, 06serviceops, 06SRE, and 2 others: Turn down api_appserver and appserver clusters - https://phabricator.wikimedia.org/T367949#9920870 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=046a1781-9fad-454c-b26b-ad2c96d2d8b2) set by cgoubert@cumin1002 for 21 days, 0:00...
[09:55:25] 10netops, 06Infrastructure-Foundations, 06SRE, 13Patch-For-Review: Add per-output queue monitoring for Juniper network devices - https://phabricator.wikimedia.org/T326322#9920871 (10cmooney) >>! In T326322#9650260, @cmooney wrote: >>>! In T326322#9130092, @ayounsi wrote: >> @cmooney I came across https://w...
[10:50:39] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9921174 (10jcrespo) > Is there a procedure for that so we know how to do so? Sadly, there is not. The code changes...
[10:56:13] 10netops, 06Data-Persistence, 06DBA, 06Infrastructure-Foundations, and 2 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e3-eqiad - https://phabricator.wikimedia.org/T365995#9921190 (10Marostegui) I will try - but just in case @ABran-WMF please take some notes!
[12:32:41] claime: ack, ping me when needed!
[12:33:10] fabfur: great, thank you
[12:56:44] vgutierrez: hi, I'd a small question about Liberica?
[12:57:01] topranks: hi, go ahead please
[12:57:05] I see in the Katran source there is a parameter COPY_INNER_PACKET_TOS
[12:57:24] it defaults to on (or at least my assumption is that's what the 1 means)
[12:57:25] https://github.com/facebookincubator/katran/blob/13a651916ce5a182a047e64737f8415188c8e97b/katran/lib/bpf/balancer_consts.h#L304
[12:57:40] I assume we don't modify this setting?
[12:58:09] not during my initial tests, I haven't implemented Katran as a forwarding plane yet
[12:58:28] ok
[12:58:41] well that default I think makes sense to leave alone
[12:58:42] what's the desired value for you? :)
[12:58:54] in terms of our packet-prioritisation / qos on the network
[12:59:18] it's best if Katran copies the DSCP/TOS value from the original packet to the new IP header in the IPIP tunnelled one
[12:59:22] so the default is best for us
[13:00:08] we'll have already set that to "default" priority for external traffic from internet, but internal services SREs may have marked it to be considered "high" or "low" priority
[13:00:24] keeping the default setting preserves all that for the traffic forwarded by the LB
[13:00:56] which leads me to another question ;)
[13:01:04] I know for PyBal the nodes don't run iptables/nftables etc., for performance reasons I think?
[13:01:39] is that the same for Liberica? I'd have thought with eBPF pulling the traffic to be load-balanced out of the kernel pipeline we could allow nftables to filter traffic to/from the host itself?
[13:04:50] that would be feasible yes
[13:06:07] ok, I guess we can discuss again when closer to the time
[13:06:20] at least I'm happy to test nftables on katran based nodes
[13:08:11] but overall I think it would be good if we could do it, better security to protect the kernel / system IP itself
[13:08:14] it also would enable us to mark the DSCP/TOS bits in packets the system generates (and with the above setting Katran will do that for the traffic it forwards, to match the source)
[13:09:01] with the LVS currently that's a small gap in our end-to-end QoS. It's a minor thing that won't cause many problems, but it'd be nice if Liberica could use nft
[13:15:53] +1
[14:55:41] 10netops, 06Data-Persistence, 06Data-Platform-SRE, 06DBA, and 3 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad - https://phabricator.wikimedia.org/T365986#9922024 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=7a21c2a6-e267-4150-8111-b348788c4a9b) set by cmoo...
[14:58:37] 10netops, 06Data-Persistence, 06Data-Platform-SRE, 06DBA, and 3 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad - https://phabricator.wikimedia.org/T365986#9922051 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=01b84d43-d6d0-4f45-bc2e-375ff79e21f8) set by cmoo...
[14:59:05] 10netops, 06Data-Persistence, 06Data-Platform-SRE, 06DBA, and 3 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad - https://phabricator.wikimedia.org/T365986#9922053 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=65c438b1-9725-4de3-9a45-8318edea15f1) set by cmoo...
[16:26:11] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9922688 (10RobH)
[16:27:10] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9922684 (10RobH) a:05RobH→03None
[16:27:49] 06Traffic, 10Observability-Tracing: traceparent response headers are being emitted externally - https://phabricator.wikimedia.org/T368428 (10CDanis) 03NEW
[16:32:08] 06Traffic, 06DC-Ops, 10ops-codfw, 06serviceops, 06SRE: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165#9922720 (10Jhancock.wm) swapped DIMM_B1 for DIMM_B2 to test. error has cleared.
[16:32:37] vgutierrez: when you have a moment can I get a +1 on https://gerrit.wikimedia.org/r/1049603 ?
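Two asides on the 12:57-13:15 exchange above. First, the compile-time default under discussion: COPY_INNER_PACKET_TOS is a plain C define in the linked balancer_consts.h, and a quick way to confirm its value on a checkout is sketched below (the output shown is an approximation, not a verbatim quote of the file):

    git clone https://github.com/facebookincubator/katran.git && cd katran
    grep -n -B1 -A1 'define COPY_INNER_PACKET_TOS' katran/lib/bpf/balancer_consts.h
    # expected shape of the match, approximately:
    #   #ifndef COPY_INNER_PACKET_TOS
    #   #define COPY_INNER_PACKET_TOS 1
    #   #endif
    # i.e. unless the balancer is built with the define overridden to 0, the inner
    # packet's TOS/DSCP byte is copied into the outer IPIP header, which is the
    # behaviour topranks wants preserved.

Second, on running nftables on Liberica/Katran nodes: since Katran attaches at XDP, the traffic it forwards is redirected before it ever reaches netfilter, so a host firewall would only see (and only needs to cover) traffic to and from the host itself, and it could also set DSCP on locally generated packets. A minimal illustrative ruleset, assuming nothing about the eventual Liberica design; the table/chain names, the SSH-only input policy and the cs0 class are placeholders rather than a proposed production policy:

    nft add table inet host_fw
    nft add chain inet host_fw input '{ type filter hook input priority 0; policy drop; }'
    nft add rule inet host_fw input ct state established,related accept
    nft add rule inet host_fw input iif lo accept
    nft add rule inet host_fw input meta l4proto '{ icmp, ipv6-icmp }' accept
    nft add rule inet host_fw input tcp dport 22 accept
    # mark host-generated IPv4 traffic (an equivalent ip6 dscp rule would cover v6);
    # with COPY_INNER_PACKET_TOS left at 1, Katran keeps the client's marking on the
    # traffic it forwards, so the two stay consistent
    nft add chain inet host_fw output '{ type filter hook output priority -150; }'
    nft add rule inet host_fw output ip dscp set cs0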
[16:35:37] 06Traffic, 10Observability-Tracing, 13Patch-For-Review: traceparent response headers are being emitted externally - https://phabricator.wikimedia.org/T368428#9922746 (10CDanis)
[16:35:45] thanks <3
[16:35:57] as for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049604 do ATS Lua changes still require restarts?
[16:36:09] I'm not in a rush about that one ofc
[16:36:38] 10netops, 06Data-Persistence, 06Data-Platform-SRE, 06DBA, and 3 others: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e5-eqiad - https://phabricator.wikimedia.org/T365986#9922750 (10cmooney) 05Open→03Resolved
[16:50:56] topranks: we are seeing some issues with lvs2011 IPv6 traffic
[16:50:58] topranks: Heya! Mind joining us in this corner?
[16:51:19] we noticed because we need to use bast2003 to log via ssh into lvs2011, other bastions won't work
[16:51:33] https://www.irccloud.com/pastebin/WrZrqESz/
[16:51:47] MTR shows that bast6003 can't reach lvs2011 on port 22 TCP
[16:55:11] the host is sending RST back
[16:55:16] 16:54:34.804413 IP6 2620:0:861:4:208:80:155:110.43496 > 2620:0:860:113:10:192:23:9.22: Flags [S], seq 4090563902, win 43200, options [mss 1440,sackOK,TS val 1070070870 ecr 0,nop,wscale 9], length 0
[16:55:16] 16:54:34.804465 IP6 2620:0:860:113:10:192:23:9.22 > 2620:0:861:4:208:80:155:110.43496: Flags [R.], seq 0, ack 4090563903, win 0, length 0
[16:57:03] although it's funny cos the SSH connection (in this case from bast1003) doesn't immediately fail, which you'd expect if that RST went back
[16:57:32] I'm going to let JennH know that these issues don't require her presence in the DC any more
[16:59:30] Thanks for your work :)
[16:59:35] vgutierrez: something fishy
[16:59:54] 06Traffic, 06DC-Ops, 10ops-codfw, 06serviceops, 06SRE: lvs2011 Memory failure on slot B1 - https://phabricator.wikimedia.org/T368165#9922861 (10BCornwall) 05Open→03Resolved Linux is happy, too. Thank you, @Jhancock.wm!
[17:00:04] right now the default IPv6 route is using vlan2018
[17:00:32] which is fairly normal (should be fixed - see T358260)
[17:00:33] T358260: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260
[17:00:51] now the odd thing is the IPv6 default only gets added when it gets an RA on the interface
[17:01:10] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9922864 (10BCornwall) a:03BCornwall
[17:04:10] topranks: somehow that RST doesn't arrive at the client
[17:04:21] topranks: or mtr wouldn't show lost packets?
[17:04:29] yep you're correct
[17:04:45] the RST is odd, but it's the IPv6 default route not working for some reason
[17:04:59] we only noticed this after rebooting the host BTW
[17:06:01] to add to that, topranks, also as a refresher: we worked on this in https://phabricator.wikimedia.org/T352920 most recently but had not rebooted it since then
[17:06:03] I think I see what's wrong... but I'm scratching my head as to what could have changed
[17:06:06] (vlan missing on switch)
[17:06:20] ah I think I know
[17:07:28] so this host is in row A.... we only put the IP gateways for vlan2018, private1-b-codfw, on the switches in row B
[17:07:34] as that is the only place they are needed
[17:07:59] *but* the LVS can use any random vlan for IPv6 traffic because..... well T358260
[17:08:00] T358260: Disable acceptance of IPv6 router-advertisement on non-default LVS interface - https://phabricator.wikimedia.org/T358260
[17:08:43] being specific, in theory it should be able to hit one of the anycast gateway interfaces on one of the row B switches
[17:08:55] they are what are sending the RAs it is using for its default route
[17:12:54] yeah this is some quirk, the same thing happens in v4
[17:13:08] lvs2011 can't ping the anycast GW IP on the row B leaf switches on vlan 2018
[17:13:36] it can reach hosts on the vlan just fine though - just not the switch GW IP
[17:13:39] https://www.irccloud.com/pastebin/81RXiH7d/
[17:14:34] same with v6
[17:14:39] https://www.irccloud.com/pastebin/AiyFS5wv/
[17:16:00] So for now I fixed it by deleting the default route, letting lvs choose another one of the many RAs it's accepted instead
[17:16:14] root@lvs2011:~# ip route del default via fe80::2018:0:1
[17:16:14] root@lvs2011:~# ip -6 route get fibmatch ::
[17:16:14] default via fe80::1 dev vlan2019 proto ra metric 1024 expires 577sec hoplimit 64 pref medium
[17:16:24] yeah it works now
[17:16:40] so I guess we should revisit https://phabricator.wikimedia.org/T358260
[17:16:45] We could do some things we'd rather not, to fix this on the network side
[17:16:45] I do see your patch there, I know I know
[17:17:06] specifically extending the vlan to all switches in row A, C and D and adding a GW interface
[17:17:35] but tbh I don't think this is really an issue network-side, and it's a bad idea to add work-arounds on the network rather than fix the root cause
[17:19:52] thx topranks :)
[17:20:14] it makes sense it happened after a reboot I think, the behaviour is it uses one of the RAs at random, but then sticks with that as its default
[17:20:37] so after a reboot here it picked vlan2018, which is in row B, and is now an anycast gw in a vxlan
[17:21:02] there *is* some quirk there but I think the easier way to solve is to set the sysctls on the lvs
[17:21:55] we won't have any other hosts trying to use a switch in a remote row as its gateway
[17:22:25] so - despite the fact that in a regular ethernet that should work - the deficiency on the network is not gonna have an impact otherwise
[17:24:26] topranks: thanks <3
[17:24:39] brett: I think once you have verified everything else is fine, repool it IMO
[17:25:39] yeah - from experience it should be stable and keep using the current IPv6 default
[17:27:11] sukhe: Other than SSH I didn't really see anything wrong until traffic flowed. So I guess if everything seems right might as well open the faucet
[17:27:22] brett: yeah
[17:27:23] topranks: do you have any idea how the linux routing engine makes that decision? is it the same kind of thing as on junipers, where the longest-lived BGP session breaks a tie?
[17:28:38] cdanis: I actually don't, RAs are sent periodically so I expect it just picks the first that it gets after coming online
[17:28:45] yeah that makes sense
[17:29:53] it's sort of the same as what you mention on the Juniper
[17:30:15] as in, if it has a route in the table, and then learns another - with exactly equal attributes - it keeps what it has
[17:30:23] yep sure
[17:30:39] Seeing the elevated tcp/socket errors again in the host overview
[17:31:20] ssh is working from eqiad still, the v6 route on lvs2011 hasn't changed
[17:32:19] topranks: Is that to say that this should not be pooled?
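For context on the sysctl route mentioned at 17:21 (tracked in T358260): the box accepts RAs on every vlan interface and keeps whichever default it installed first, which is why a reboot can land it on a remote-row gateway. A hedged sketch of the kind of per-interface settings that task points at; the interface names are illustrative, and the primary interface uses accept_ra=2 so RAs are still honoured even if forwarding is enabled on the host (1 is enough otherwise):

    # keep RA-learnt defaults only on the primary interface (name is an assumption)
    sysctl -w net.ipv6.conf.eno1.accept_ra=2
    # ignore RAs on the secondary vlan interfaces (names are examples)
    for i in vlan2018 vlan2019 vlan2020; do
        sysctl -w "net.ipv6.conf.${i}.accept_ra=0"
    done
    # routes already learnt on a secondary interface age out with their RA lifetime,
    # or can be removed by hand as was done above with "ip route del default via fe80::2018:0:1"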
[17:32:58] no it still looks ok to me is what I was saying
[17:33:08] or at least the elevated socket errors are not due to the same thing
[17:33:26] but probably we should work out what they are before pooling
[17:33:37] sukhe: I'm gonna depool again
[17:34:10] brett: wait
[17:34:20] zoom out a bit and see if the errors stand out in any way
[17:34:44] They do stand out. Since the reboot it goes above a pretty flat plateua
[17:34:47] plateau
[17:35:15] tcp/inerrs
[17:35:20] compare it against the background rate of the other LVS while pooled
[17:35:40] wait tcp/inerrs? that's like, bad checksums, or the packet is being rejected for other reasons
[17:35:58] https://grafana-rw.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=lvs2011&var-datasource=thanos&var-cluster=lvs&from=now-24h&to=now&viewPanel=20
[17:36:15] brett: ok, go ahead
[17:36:22] tcp/attemptfails as well
[17:37:02] weird
[17:38:37] almost a similar rate (and worse on lvs2014 fwiw)
[17:39:06] oh yeah, so this might be an issue beyond lvs?
[17:40:26] I think you will see some tcp/inerrs on basically any host you pick. I don't think -- and I may be wrong -- that this is an alarming rate
[17:40:38] unless you find some other symptom that is
[17:41:15] compare the rates of lvs2011 for example with any other lvs, including 2014 for example since that is now the primary ht-1
[17:41:49] Yeah. Makes sense, I was alarmed by the difference before and after. But it seems to have happened independently
[17:42:00] soooooo you cool with re-re-pooling?
[17:42:14] unless you see anything else that is wrong, I am.
[17:43:12] pybal is going to generate some tcp/attemptfails as it healthchecks things that are not online right now
[17:43:31] (those are connection timeouts and i think perhaps also conn refused)
[17:44:37] yeah, as a baseline
[17:45:00] brett: keep an eye out on the LVS graphs/ipvsadm output to see it is picking up traffic fine and 2014 is draining
[17:45:27] don't worry if it pages :]
[17:45:46] FWIW on the original issue the problem is for some reason the VXLAN-based switches are struggling to properly deal with packets that are sent to them over an L2VNI with a destination MAC of its own local interface
[17:46:24] Juniper do have a design they call "centrally routed bridged overlay" which involves exactly that happening, but I suspect there are some config knobs we don't have that are required to make it work (ARP/ND snooping perhaps)
[17:47:34] But as I said I think we can hopefully resolve by adjusting the LVS to use its primary interface instead
[18:03:54] topranks: will discuss with Traffic more formally.
[18:03:56] thank you :)
[18:04:02] for the resolution for today as well
[18:04:05] brett: looks good I would say
[18:04:09] no probs
[18:04:22] yeah
[18:29:51] bblack: any chance you're around?
[18:31:19] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9923318 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5017.eqsin.wmnet with OS b...
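A small aside on the tcp/inerrs and tcp/attemptfails panels discussed between 17:30 and 17:45: the same counters can be read on the host itself, which makes it easy to compare lvs2011 against another LVS from a shell. The nstat counter names below are the standard kernel ones; the Prometheus names in the last comment are an assumption about what the host-overview panels graph:

    # deltas since the last nstat invocation (TcpInErrs: segments dropped for bad
    # checksums or similar; TcpAttemptFails: connection attempts that failed)
    nstat TcpInErrs TcpAttemptFails
    # absolute counters straight from the kernel
    grep '^Tcp:' /proc/net/snmp
    # roughly node_netstat_Tcp_InErrs / node_netstat_Tcp_AttemptFails in node_exporter terms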
[18:37:42] I'm having a "how did this ever work" moment about very old VCL
[18:42:17] cdanis: yeah
[18:42:42] (some of it probably didn't)
[18:50:18] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9923367 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5017.eqsin.wmnet with OS bulls...
[18:50:26] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9923368 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5017.eqsin.wmnet with OS b...
[18:56:00] 06Traffic, 10Observability-Tracing, 13Patch-For-Review: traceparent response headers are being emitted externally - https://phabricator.wikimedia.org/T368428#9923414 (10CDanis) 05Open→03Resolved
[19:52:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[19:57:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[19:59:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:03:24] Hi Traffic, DPE / the Search Team are continuing work on the WDQS graph split transition. Requests will be federated between the two different subgraphs. As a result of that our previous method of performing throttling (basically a token-bucket algorithm) won't work since a large number of requests will be coming internally rather than externally, making it challenging to track/attribute the external origin of federation requests
[20:03:29] 06Traffic, 06DC-Ops, 10ops-eqsin, 06SRE, 13Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9923652 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5017.eqsin.wmnet with OS bulls...
[20:04:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5017 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:05:34] Is someone available to discuss tomorrow with dcausse and me? We have a weekly search meeting beginning at 15:00 UTC tomorrow (weds) that would be a good venue to have the discussion, if someone from traffic is free to attend
[20:06:16] Meeting is 15:00-15:30 on the calendar but the true length of the meeting is more like 15:00-17:00 FYI
[20:12:28] ryankemper: I am not a good fit for that so will defer to others.
[20:12:55] also a bit late for most people so please email sre-traffic@
[20:14:10] ack. Will send out that e-mail in a bit, and also include a bit more context since I forgot to link a phab ticket
[20:14:29] thanks