[13:07:49] How does one move a ticket to be a "this might be a MW bug" ticket? T348586 is a failure of MW to write an object to both swift clusters (it only PUT to codfw). Monday's rclone job will tidy this up, but it might warrant a look further up the stack to see why only one write attempt was seen by swift...
[13:07:50] T348586: File not found: /v1/AUTH_mw/wikipedia-commons-local-public.7e/7/7e/EC02-0162-69_l_%2824374651802%29.jpg - https://phabricator.wikimedia.org/T348586
[13:24:39] on-call folks: we are rolling out a potentially big change, moving ns1 announcements to BGP from the current static routes
[13:25:01] we will be extra careful but if something breaks, please let us know and I promise to fix it :)
[13:26:21] XioNoX: ^ around? I am going ahead unless you stop me!
[13:26:30] sukhe: go for it
[13:26:35] plan is what we discussed yesterday: roll out change, check everything is OK, remove statics
[13:26:55] worst case, we put the statics back: set routing-options static route 208.80.153.231/32 next-hop [ 208.80.153.48 208.80.153.74 208.80.153.107 ];
[13:27:20] yeah
[13:47:17] sukhe: any luck? (keeping an eye on bgp)
[13:47:59] XioNoX: so far so good
[13:48:14] 2023-10-11 13:48:08,269 anycast-healthchecker[1628249] INFO hc-vip-ns1.wikimedia.org status UP
[13:48:26] which host?
[13:48:30] 2004
[13:48:40] checking a few other things before rolling out
[13:48:48] already rolled out to non-codfw DNS hosts, NOOP there
[13:48:58] (not all, a few, to test)
[13:49:10] XioNoX: please review as well
[13:49:15] "Hidden reason: Rejected by import policy"
[13:49:18] doesn't look good
[13:49:25] where is this? I see the IP being advertised
[13:49:41] > Hidden reason: Rejected by import policy
[13:49:44] OH
[13:49:46] I know...
[13:49:48] we are missing something on the CRs?
[13:49:49] not a big deal
[13:49:52] yeah
[13:50:06] we accept `from prefix-list-filter anycast4 longer`
[13:50:16] so only things that are longer than a /32
[13:50:19] so nothing...
[13:50:50] but what about the existing adverts for the WDNS /32s and such?
[13:50:54] as we set "prefix-list anycast4 208.80.153.231/32"
[13:51:04] ah in the wikidough case that must be a /24
[13:51:05] ok
[13:51:16] the others are "longer" than the prefix-list "anycast4 10.3.0.0/24"
[13:51:26] just need one small change
[13:51:41] thanks, I will wait for you to fix it then and meanwhile check other stuff
[13:54:45] that's a good reminder for the v6 /128 as well, given we set the same there
[13:55:04] https://gerrit.wikimedia.org/r/c/operations/homer/public/+/965169
[13:55:21] topranks: if you're around for an easy review
[13:55:30] orlonger is one word?
[13:55:33] yeah
[13:55:38] yeah, seems like it is per the docs
[13:57:10] ah cool
[13:57:19] "longer" vs "orlonger" heh :)
[13:57:49] what names
[13:58:28] orlonger-but-actually-shorter
[14:00:47] alright, now we're talking
[14:00:53] https://www.irccloud.com/pastebin/Qfp5LSYy/
[14:01:06] nice!
[14:01:18] https://www.irccloud.com/pastebin/XrRiI6Ss/
[14:01:33] so the static is preferred
[14:02:10] ospf should go away with the static
[14:02:16] yeah I guess one of the nice fallbacks for this is, in case something was broken during the rollout
[14:03:39] sukhe: let me know when I can pull the static and see what's up
[14:03:52] XioNoX: yeah, rolling out to all others shortly
[14:03:55] just double checking
[14:09:43] 2005 looks good, moving on to 2006
[14:09:58] confirmed
[14:12:31] 208.80.153.231/32 208.80.153.107 64605 I
[14:12:45] 2006 also looks OK
[14:13:47] yep
[14:14:17] I am going to roll it out to all P:bird::anycast
[14:14:19] then we can remove the statics
[14:14:21] and see how that goes
[14:14:29] ok!
[14:14:44] 🐦 is nice on irccloud
[14:14:54] ha yeah, even on phab
[14:25:43] XioNoX: agent enabled, rolled out to all hosts
[14:25:49] sukhe: cool
[14:25:54] time to rm the statics and see :)
[14:26:27] on it
[14:28:14] done
[14:28:16] bgp took over
[14:28:19] nice!
[14:28:21] no ping lost from home at least
[14:28:35] https://grafana.wikimedia.org/d/Jj8MztfZz/authoritative-dns?orgId=1&refresh=30s
[14:28:44] hitting the same server
[14:28:57] dunno if it's pure luck or if the lb uses the same info
[14:30:28] now eqiad?
[14:30:38] :)
[14:30:41] haha
[14:30:44] not today though
[14:30:51] I plan to do it tomorrow, just to let this bake a while
[14:30:53] sounds good?
[14:32:03] of course yeah
[14:32:25] might be worth testing for failures too
[14:32:38] shutting down recdns, shutting down authdns
[14:32:40] etc
[14:32:48] on 1 server I mean
[14:33:15] yeah, that should be a simple bird stop though for both things? there is NTP too with that
[14:34:31] I mean simulate gdnsd crash, to make sure the healthcheck works as expected and pulls the prefix
[14:34:59] ah in that sense ok
[14:35:01] ok will test
[14:36:07] I'm stepping away for an hour or so but I don't think you need me for that
[14:36:27] XioNoX: all good, thanks for the help!
[14:36:52] no pb! thx for the work! it's great to see it live
[14:37:20] yep, indeed, long time coming!
[15:26:29] * arturo leaving channel
[15:31:42] take care!
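
Note on the "longer" vs "orlonger" exchange above: the sketch below models, in Python, why a /32 listed in the prefix-list could never be accepted with `longer`. It is only an illustration of the match semantics as described in the chat, not the router's implementation. The function name `route_filter_match` is hypothetical, and 10.3.0.1/32 is an assumed example address inside the 10.3.0.0/24 entry mentioned in the log.

```python
import ipaddress

def route_filter_match(route, list_prefix, match_type):
    """Return True if `route` matches `list_prefix` under `match_type`.

    Hypothetical helper modelling three match types:
      exact    - same network and same prefix length
      orlonger - inside the list prefix, same length or more specific
      longer   - inside the list prefix and strictly more specific
    """
    route_net = ipaddress.ip_network(route)
    list_net = ipaddress.ip_network(list_prefix)
    # subnet_of() raises TypeError across IP versions, so check first.
    if route_net.version != list_net.version:
        return False
    inside = route_net.subnet_of(list_net)  # True for equal networks too
    if match_type == "exact":
        return route_net == list_net
    if match_type == "orlonger":
        return inside
    if match_type == "longer":
        return inside and route_net.prefixlen > list_net.prefixlen
    raise ValueError(f"unknown match type: {match_type}")

ns1 = "208.80.153.231/32"
# A /32 cannot be strictly longer than itself, so "longer" rejects it.
print(route_filter_match(ns1, ns1, "longer"))
# "orlonger" also accepts the equal-length case.
print(route_filter_match(ns1, ns1, "orlonger"))
# An assumed /32 inside the /24 entry is strictly longer, so it passed before.
print(route_filter_match("10.3.0.1/32", "10.3.0.0/24", "longer"))
```

The same reasoning applies to the v6 /128 mentioned in the log: nothing is longer than a /128, so a /128 entry would also need the or-equal match.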