[10:49:21] Emperor: re https://gerrit.wikimedia.org/r/c/operations/puppet/+/1114015 maybe discussing it here would be faster? :)
[11:02:43] Yeah, sorry, I'm obviously confused, and some of the ms fe stuff is a bit ... legacy
[11:03:44] Emperor: the lvs service only cares about IP and port, so the DNS records are totally unrelated
[11:03:46] AFAICS (and ignoring the dnsdisc for a moment) confctl knows about nginx and swift-fe services for the frontends at the moment
[11:04:05] there is no `nginx` at the moment on ms-fe servers
[11:04:15] it's just the service name on conftool
[11:04:39] Mmm, I wonder if we should have renamed that when the TLS termination moved from nginx to envoy
[11:04:57] or something software agnostic like `tls`
[11:04:59] <_joe_> Emperor: we didn't on the appservers/etc pools either :)
[11:05:27] <_joe_> that patch should be very low risk
[11:05:38] this is not confusing at all :)
[11:05:43] <_joe_> worst that can happen, the scripts it will install to safely restart services won't work at first
[11:06:04] <_joe_> volans: I mean if you want to go through the hassle of renaming stuff in conftool, be my guest
[11:06:32] Emperor: so right now cluster=swift,service=swift-fe on conftool tells the LVS which backend servers should get the traffic for the swift VIP on port 80
[11:06:50] and cluster=swift,service=nginx does the same but for port 443
[11:07:15] Right, I think that's what I thought they did :)
[11:07:34] cluster=swift,service=nginx is tagged as `swift-https` and cluster=swift,service=swift-fe is tagged as `swift`
[11:08:02] so the realserver::pools mapping tells that if you restart swift-proxy.service or envoy.service, `swift-https` needs to be depooled on conftool
[11:08:21] wait, tagged where? Confctl thinks the tags are "tags": "dc=codfw,cluster=swift,service=nginx" or similar
[11:08:35] maybe tagged is the wrong verb here :)
[11:08:45] hieradata/common/service.yaml
[11:08:59] Sorry, I'm not trying to be difficult, I just am trying to unconfuse myself
[11:09:11] <_joe_> Emperor: we create a file called /etc/conftool/local_services.yaml that contains the mapping between service name and conftool tags
[11:09:18] so on service::catalog you got two keys `swift` and `swift-https`
[11:09:23] (and understand what, if anything, needs to be changed in how swift is managed when we merge your patch)
[11:09:32] Emperor: nope, nothing
[11:09:47] you'll have new scripts that you could ignore or benefit from them
[11:10:07] but the main goal of my patch is dropping the lvs::realserver include on site.pp and start using profile::lvs::realserver
[11:11:20] OK, so if we deploy this change, it'll still be swift-fe and nginx in confctl, the roll-restart cookbook won't need updating, but there will be new scripts on the frontends that do similar to my existing 'sudo depool && sleep 5 && sudo systemctl restart swift-proxy && sleep 5 && sudo pool' ?
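The three meanings of "service" untangled in the exchange above (a systemd unit, a service::catalog key, and a conftool object) can be sketched as a toy Python model. This is a hypothetical simplification: the dictionaries and the `conftool_tags`/`restart_plan` helpers are illustrative names, not the real hiera, `/etc/conftool/local_services.yaml`, or `realserver::pools` formats. The unit mapping holds only the entries quoted in the chat, so the real data may depool more services. The plan mirrors the manual depool, sleep, restart, sleep, pool sequence quoted at 11:11:20, but returns abstract steps rather than real `depool`/`pool` command lines.

```python
"""Toy model of the three meanings of "service" on the ms-fe hosts.

Hypothetical simplification: these dicts are NOT the real hiera,
local_services.yaml or realserver::pools formats.
"""

# service::catalog key -> conftool object it stands for
CATALOG_TO_CONFTOOL = {
    "swift": {"cluster": "swift", "service": "swift-fe"},     # VIP port 80
    "swift-https": {"cluster": "swift", "service": "nginx"},  # VIP port 443 (TLS now terminated by envoy)
}

# systemd unit -> catalog keys to depool while that unit restarts.
# Only the entries quoted in the chat; the real mapping may list more.
UNIT_TO_CATALOG = {
    "swift-proxy.service": ["swift-https"],
    "envoy.service": ["swift-https"],
}


def conftool_tags(unit: str) -> list[str]:
    """Return the conftool selectors affected by restarting `unit`."""
    selectors = []
    for key in UNIT_TO_CATALOG.get(unit, []):
        obj = CATALOG_TO_CONFTOOL[key]
        selectors.append(f"cluster={obj['cluster']},service={obj['service']}")
    return selectors


def restart_plan(unit: str, settle_s: int = 5) -> list[tuple[str, str]]:
    """Build abstract depool -> sleep -> restart -> sleep -> pool steps."""
    selectors = conftool_tags(unit)
    plan = [("depool", s) for s in selectors]
    plan += [("sleep", str(settle_s)), ("restart", unit), ("sleep", str(settle_s))]
    plan += [("pool", s) for s in selectors]
    return plan


if __name__ == "__main__":
    for step in restart_plan("envoy.service"):
        print(*step)
```

Keeping the two tables separate reflects the layering described in the chat: renaming a conftool service would only change `CATALOG_TO_CONFTOOL` in this model, while the systemd-facing table stays the same.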
[11:13:21] <_joe_> yes
[11:13:36] Grand, thank you all for answering my very stupid questions :)
[11:13:39] <_joe_> called restart-envoy and restart-swift-proxy IIRC
[11:15:03] I get a bit confused between what confctl thinks is a service, what systemd thinks is a service and what service::catalog thinks is a service
[11:15:34] (I thought I had achieved understanding when setting apus up, but evidently I hadn't, or I forgot it again)
[11:16:06] I think I now understand the distinction again, but will probably be confused the next time the various meanings of service come up /o\
[11:16:47] I'm a service, you're a service, everybody is a service :D
[11:21:02] * Emperor goes out to buy the disservice animal t-shirt
[11:21:46] ( https://www.threadless.com/shop/%40effinbirds/design/disservice-animal but beware of many of the designs in that shop being NWS )
[11:26:40] presumably trying to rename the confctl services would Just Be Pain?
[11:32:53] <_joe_> you need to create the new entries, populate them correctly, make the switch in service catalog
[11:41:54] Hm, Not Today :)
[13:43:48] What should we do with the alerts Not accepting/receiving prefixes from anycast BGP peer global noc ?
[13:43:59] They've been going on for a few days, should we silence them and create a task?
[13:45:02] marostegui: topranks already filed T384258
[13:45:03] T384258: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258
[13:45:08] but yes, we should probably silence them for now.
[13:45:13] topranks: ^ agreed?
[13:45:52] marostegui: yes, I disabled the alert in LibreNMS which I thought would have that effect already
[13:45:55] leave it with me
[13:46:02] topranks: <3
[13:46:02] topranks: thank you :)
[13:51:02] sorry it seems during some of my troubleshooting I accidentally re-enabled the stupid thing
[13:51:06] disabled again now sorry for the noise
[13:55:44] topranks: thank you
[23:00:32] no alerts today - nothing to report from on-call