[08:19:45] eh, the icinga dual setup has bitten us multiple times, the external monitoring was also born for that (checking that icinga works there)
[08:20:06] I'm pretty sure there is a puppet way to solve this problem, but nobody has wanted to look at it in ages AFAIK
[08:20:44] but I don't know if o11y has looked at it in recent times
[08:21:32] that said, one small thing that we could do is to expand the external monitoring a bit to also check the status of the passive host on itself
[08:21:52] but it might become a bit spammy in terms of emails.... thoughts? cdanis ^^^
[08:23:23] as for the thanos timeout, that's probably worth a task for o11y
[09:53:58] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (10cmooney) Change now made on relforge1003 also. During change I ran "sudo ip monitor" and netlink...
[10:05:32] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (10Volans) >>! In T290984#7354885, @cmooney wrote: > - Decide on a way to have this done at boot-time...
[10:12:04] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (10MoritzMuehlenhoff) >>! In T290984#7354885, @cmooney wrote: > - Decide on a way to have this done a...
[10:13:20] thanks moritzm ^^^ we were about to ping you on how to proceed
[10:13:43] I think we could do it for all "files" in /sys/kernel/debug/i40e/ as it's the same driver and should be safe
[10:15:12] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (10MoritzMuehlenhoff) > That page mentions that at least firmware version NVM 6.01 (for the NIC) and...
[10:15:57] yeah, that sounds good
[10:16:21] *dirs, as it's /sys/kernel/debug/i40e/XXX/command
[10:16:33] or check if a backported ethtool makes this configurable via ethtool (but then ethtool might be closely tied to a specific kernel version)
[10:17:53] in general, offloading LLDP to the onboard agent on the NIC instead of the Linux kernel sounds like a recipe for disaster :-)
[10:19:55] The smart NIC of the future is coming for your whole system, watch out!
[10:20:29] ehehe
[10:21:12] I'm wondering if it might be a driver issue instead of the ethtool version? Definitely worth running a test to see if a backported ethtool would expose it though.
[10:22:58] moritzm: as to the suggestion to manage it like a file, the written string doesn't appear if you look at the file contents afterwards, so I'm not sure if that'd cause a problem? Would puppet just constantly try to write it every time it sees it's not in the file?
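For reference, the disable operation being discussed is an echo into each port's debugfs command file; a minimal sketch, assuming the `lldp stop` verb and the /sys/kernel/debug/i40e/<pci-addr>/command path as documented for the i40e driver (needs root and a mounted debugfs):

```shell
#!/bin/sh
# Stop the NIC firmware's onboard LLDP agent on every i40e port, so that
# LLDP frames are handled by the kernel (and seen by lldpd) again.
n=0
for cmd in /sys/kernel/debug/i40e/*/command; do
    [ -e "$cmd" ] || continue        # glob didn't match: no i40e NICs here
    echo "lldp stop" > "$cmd"        # the opposite verb is "lldp start"
    n=$((n + 1))
done
echo "firmware LLDP agent stopped on $n port(s)"
```

The write is not persistent: the firmware agent comes back after a reboot, which is why the discussion then turns to how to re-apply it from Puppet.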
[10:30:47] agreed, it's probably more likely the case that ethtool simply exposes whatever the driver presents to it
[10:30:53] we have another option to test this:
[10:31:31] there's a 4.19 kernel for Stretch (also puppetised, since the k8s workers use it) and we could reboot one of the nodes to 4.19 to see if that makes a difference
[10:32:17] apt-get install -y linux-image-4.19-amd64
[10:32:24] as for the managing as a file
[10:33:12] that's a good point, I didn't think about that: if the setting won't be visible, Puppet will try to write to it in every Puppet run
[10:33:34] it's probably a NOP for the driver once it has disabled the LLDP mode, but still
[10:33:38] one other option:
[10:34:00] if we can detect the "LLDP is operated by the kernel" mode reliably
[10:34:15] we can use "onlyif" along with a Puppet exec
[10:34:53] the lldp_parent fact is empty in that case
[10:35:18] there's an example in the base::firewall class starting at line 41 which is pretty similar
[10:35:19] same for lldp_neighbors, and the lldp fact just has primary: null
[10:35:24] But it would also be empty if something was broken on the switch stopping it from sending LLDP frames.
[10:37:17] there is also the horrible way: write a file in /tmp that puppet checks, so after a reboot it will not be there :-P
[10:37:20] we could also simply combine the echo to sysfs with writing a file to tmpfs
[10:37:30] eheeh same :)
[10:37:47] haha :-)
[10:37:48] haha good stuff lads. never enough duct-tape :)
[10:38:18] I guess if ethtool could be made to report on the status that'd be the cleanest way.
[10:38:39] indeed, but your hunch that this will only work with an updated kernel seems likely
[10:39:41] I'd lean that way, but I'm also confused as to how the "echo" command works if the driver doesn't support it, though admittedly that's at the edge of my knowledge of these things. Perhaps we have an intermediate driver version that doesn't report the attribute but does allow it to be toggled that way.
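A sketch of the tmpfs-marker guard floated above: /run (a tmpfs) is empty after every boot, so the disable command fires at most once per boot and later Puppet runs are no-ops. The function name and marker path are made up for illustration; a Puppet exec would wrap the same command with something like `creates => '/run/i40e-fw-lldp-stopped'` (or an `onlyif` on the empty lldp_parent fact, with the caveat that an empty fact can also mean a broken switch):

```shell
#!/bin/sh
# Idempotent wrapper: disable the i40e firmware LLDP agent at most once
# per boot, tracked by a marker file on tmpfs (gone after every reboot).
disable_fw_lldp() {
    marker="$1"                   # e.g. /run/i40e-fw-lldp-stopped (hypothetical)
    [ -e "$marker" ] && return 0  # already done since the last boot
    for cmd in /sys/kernel/debug/i40e/*/command; do
        [ -e "$cmd" ] || continue # no i40e debugfs entries on this host
        echo "lldp stop" > "$cmd"
    done
    touch "$marker"
}

disable_fw_lldp /tmp/i40e-fw-lldp-stopped.demo  # demo path; use /run for real
```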
[10:40:11] or it might simply ignore it, dunno
[10:40:45] in any case we should guard any puppet code which modifies this with checks for the presence of the NIC model, then the impact is limited anyway
[10:41:03] and possibly the NIC firmware version also plays a role here
[10:41:32] typically these don't get upgraded after the version that was shipped by the vendor
[10:41:52] except a few cases where we needed to upgrade to fix some bugs we ran into
[10:42:08] yeah that makes sense
[10:43:51] Updating the kernel seems a relatively straightforward thing we could do? Worth doing to see if there is any difference in the ethtool output?
[10:48:14] In the meanwhile I'll merge the patch to fix the facter side of it
[10:48:46] I think so, rebooting one of the swift backends to 4.19 and then back to 4.9 should be fine; best to quickly sync up with Matthew as a heads-up, Filippo is on vacation currently
[10:49:15] the swift backends (ms-be*) can simply be rebooted one at a time, the Swift frontends automatically deal with unreachable backends
[10:49:16] we already asked him about the echo and he kinda 302'd us to c.danis
[10:49:33] as he's the backup person for swift apparently, until they get full control :D
[10:50:00] ah, ok
[10:50:35] so yes, double sync-up for now :D
[10:52:17] also I think there is some rebalance in progress
[10:52:24] so we might want to check that before rebooting hosts
[10:52:48] ack, the other option to test (relforge) is a little more involved since we need to drain the node at the elasticsearch level for every reboot, and AFAICT this only affects swift backends and relforge, so let's wait for Chris to be around
[10:53:17] ack
[10:59:10] 10Puppet, 10Infrastructure-Foundations, 10Patch-For-Review: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984 (10Volans) The puppet patch has been merged, so the error showing up in facter is now gone.
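If the 4.19 reboot test pans out, the cleanest interface would be the `disable-fw-lldp` private flag that newer i40e drivers expose through ethtool. A hedged sketch of probing for it (the flag name is from the upstream i40e driver; whether the 4.19 kernel on these hosts actually exposes it is exactly what the reboot test should tell us):

```shell
#!/bin/sh
# Probe whether the running driver exposes firmware-LLDP control via
# ethtool private flags; fall back to the debugfs echo otherwise.
IFACE="${IFACE:-eth0}"   # NIC under test; adjust as needed
if command -v ethtool >/dev/null 2>&1 &&
   ethtool --show-priv-flags "$IFACE" 2>/dev/null | grep -q disable-fw-lldp
then
    has_flag=yes
    ethtool --set-priv-flags "$IFACE" disable-fw-lldp on   # needs root
else
    has_flag=no
    echo "no disable-fw-lldp flag on $IFACE; use the debugfs method instead"
fi
```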
[16:37:31] Guys, talking to effi.e there about some problems she's seeing on MW servers talking to memcache servers.
[16:38:07] I believe the issue is probably being caused by outbound discards in eqiad from the switch layer to the CRs. However this is a tricky hypothesis to validate.
[16:39:10] One way to try to verify it is to change the gateway for a few MW servers from CR1 (the VRRP master) to CR2, causing the traffic to flow via the idle links from the row to CR2, and thus never need to be buffered / suffer from drops due to full buffers.
[16:40:07] Obviously it's trivial to change this on the server command line, but I'm wondering how I might achieve it in our environment? Or at least prevent puppet from reverting to the correct GW IP if it's changed manually.
[16:40:27] puppet is easy, just disable it if it's a quick test
[16:40:53] does it need to happen on a pooled server, or might a depooled or codfw one also show the issue?
[16:41:43] volans: I was thinking about depooling, running puppet to change the gw, and then leaving it for a few days
[16:41:49] and see if anything pops up in the graphs
[16:42:03] I forgot: then pool the server back
[16:42:05] ah, so change it from puppet as a more permanent solution, got it
[16:42:32] I think running it for 3-4 days would make sense, we can always leave puppet disabled for 3-4 days
[16:42:35] I have done worse :p
[16:42:44] topranks: what files do you need to modify?
[16:43:08] volans: hey, thanks for weighing in.
[16:43:13] just /e/n/i ?
[16:43:28] do we even need to change it instead of applying the change live?
[16:43:36] Em, the change isn't needed from puppet itself; if we can disable puppet for a few hosts that's probably the easiest.
[16:43:45] The actual change is to change the default gateway IP.
[16:43:59] I'd probably just do "ip route add" and "ip route del" to do that.
[16:44:09] But alternately you could change /etc/network/interfaces and then ifreload.
[16:44:13] I don't think that puppet touches that, but not 100% sure
[16:44:21] /etc/network/interfaces is not managed by puppet IIRC
[16:44:29] ok that is good news
[16:44:36] cumin 'R:File = /etc/network/interfaces' returns empty
[16:44:50] ok, then live it is then
[16:45:13] I seem to remember it is not in puppet, because when we have to move a server, Manuel edits it, shuts down the server, and on start nothing has to be done
[16:45:26] *edits that conf directly
[16:45:34] topranks: I can depool a server, change its default, run puppet, and if nothing happens
[16:45:42] as for the whole plan, I didn't read the whole backlog and I'm not the expert on that side, so I'll leave it to you to judge other possible impacts
[16:45:43] we are good to go
[16:45:48] hmm yeah ok.
[16:46:15] volans: we are trying to verify the root cause of some retransmissions we see on mw* and mc* servers
[16:46:21] its lldp fact will probably change, but that should change just the grouping of hostgroups in icinga
[16:46:22] This was the way I verified the cause of the drops for the backup servers. But in that case I could run iperf immediately after, so I was unconcerned if puppet reverted my manual change.
[16:47:05] jynus: thanks. I think that means it must not be configured by puppet? I did spend some time trying to find how we template that file before and didn't get very far, which suddenly now makes sense.
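The live change itself is just the two `ip route` commands topranks mentions. A sketch with hypothetical gateway addresses, printed rather than executed since this reroutes all of a host's outbound traffic; drop the `echo`s to run it for real (as root, on a depooled host, with Puppet disabled):

```shell
#!/bin/sh
# Swap the default route from the VRRP VIP on CR1 to CR2's own interface
# address, so outbound traffic uses the row's idle uplinks to CR2.
CR2_GW="10.64.0.3"    # illustrative; use the real CR2 IP in this row's subnet
VRRP_GW="10.64.0.1"   # illustrative; the normal VRRP gateway, for reverting

echo ip route del default
echo ip route add default via "$CR2_GW"
echo ip route show default             # verify the new next-hop

# to revert after the test (echo'd for the same reason):
echo ip route replace default via "$VRRP_GW"
```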
[16:47:20] topranks, I think it is handled on install only
[16:47:26] topranks: if a server is depooled, you can have your way with it too
[16:47:33] I can spare a few
[16:47:38] topranks: we set it in stone at d-i time
[16:48:04] see modules/install_server/files/autoinstall/scripts/late_command.sh
[16:48:11] printf '\tpre-up /sbin/ip token set ::%s dev %s\n' "${IP}" "${IFACE}" >> /target/etc/network/interfaces
[16:48:27] plus additional commands afterwards
[16:49:56] so, the verdict is
[16:50:33] depool, do whatever you need to do, and if all good repool, I guess
[16:50:37] we depool, manually change the gw, run iperf or repool
[16:50:47] +1
[16:51:20] topranks: we can coordinate it for tomorrow or whenever works for you
[16:51:31] volans: thanks for that, although I can only see some "additional" commands in there, not the config in the network file for the actual IP address or gateway.
[16:53:27] effie: yep sounds like a plan, let's look at it in the morning, it'll be quick enough to do.
[16:53:29] mmmh right, let me check better
[16:53:42] do you know what subset of servers might be suitable to do it on?
[16:53:50] volans: don't waste more of your time
[16:53:59] we can depool a server and see what happens
[16:54:21] it is safe enough
[16:54:39] yes, I can do 2 api and 2 app servers in row A
[16:54:44] or 1 and 1
[16:57:08] either one, and we can do it now if you've got time, or wait till the morning (aware it's getting late over there)
[16:57:38] topranks: I *think* d-i does the file generation by default with the IP and gateway of the installer environment
[16:57:45] and there we just append the pre-up stuff
[16:58:20] any other customization might go into interfaces.d/
[16:59:03] ok yeah. I've mostly found that dir empty when poking my nose about, seems mostly to just be in the "interfaces" file.
[16:59:10] "d-i" is what, the debian installer?
[16:59:58] yes yes