[00:04:04] 10Traffic, 10SRE, 10vm-requests, 10Patch-For-Review: Please create two Ganeti VMs for Wikidough in eqsin - https://phabricator.wikimedia.org/T284246 (10Dzahn) @ssingh doh5001.wikimedia.org is ready for you now. doh5002 on hold for lack of IP in that subnet.
[02:16:40] 10Traffic, 10SRE, 10vm-requests, 10Patch-For-Review: Please create two Ganeti VMs for Wikidough in eqsin - https://phabricator.wikimedia.org/T284246 (10ssingh) >>! In T284246#7133295, @Dzahn wrote: > @ssingh doh5001.wikimedia.org is ready for you now. doh5002 on hold for lack of IP in that subnet. Thanks...
[07:19:13] mutante, sukhe, bblack, can one of the bast hosts be decom?
[07:39:39] which bast host, in which DC?
[07:45:40] moritzm: bast5001
[07:46:05] see https://netbox.wikimedia.org/ipam/prefixes/28/ip-addresses/
[07:46:53] 2nd option is to use one of the reserved IPs, but if bast5001 is redundant then it's cleaner to reclaim its public IP
[07:50:50] bast5001 (and also bast3004/4003) might get repurposed RSN, see https://phabricator.wikimedia.org/T243057#7130650, let's wait a few more days until some decision is made there
[07:51:37] if not, I agree that decommissioning them temporarily (until the eventual repurpose) is sensible
[09:19:40] 10Traffic, 10DNS, 10SRE, 10serviceops, and 2 others: DNS for GitLab - https://phabricator.wikimedia.org/T276170 (10jbond)
[09:19:54] 10Traffic, 10SRE, 10GitLab (Initialization), 10Patch-For-Review, 10User-brennen: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10jbond) a:05Dzahn→03jbond The SSH port has now been opened as well
[09:20:21] 10Traffic, 10SRE, 10GitLab (Initialization), 10Patch-For-Review, 10User-brennen: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10jbond) 05Resolved→03Open a:05jbond→03Dzahn
[09:21:19] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10ema) >>! In T278729#7132555, @Dzahn wrote: > https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=cp1087 Thanks Daniel, after rebooting the host all the alerts are now gone.
[09:27:24] moritzm: sounds good, thanks. for now, we will just go with one doh host and wait to see as the discussion progresses
[09:27:31] (thanks XioNoX!)
[09:50:45] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10ema) 05Open→03Resolved Tentatively closing.
[10:59:16] 10Traffic: ATS: origins server response data accounting issues - https://phabricator.wikimedia.org/T284290 (10ema)
[11:18:17] 10Traffic: Take response size into account in CDN HTTP requests throttling - https://phabricator.wikimedia.org/T284292 (10ema)
[11:30:53] 10Traffic, 10netops, 10SRE: Please configure the routers for Wikidough's anycasted IP - https://phabricator.wikimedia.org/T283503 (10ssingh) doh5001 is also up; from Mumbai, we are reaching eqsin as desired: ` $ kdig @wikimedia-dns.org +nsid +tls-ca wikipedia.org ;; TLS session (TLS1.3)-(ECDHE-SECP256R1)-(E...
[12:30:48] 10Traffic, 10netops, 10SRE: BGP Policy on aggregate routes prevents them being created in some circumstances. - https://phabricator.wikimedia.org/T283163 (10cmooney) After discussion with @ayounsi on IRC he suggested looking at the use of the following command to address this: ` set protocols bgp group
10Traffic, 10netops, 10SRE: BGP Policy on aggregate routes prevents them being created in some circumstances. - https://phabricator.wikimedia.org/T283163 (10ayounsi) That sounds great! Let's test it out next week. Thanks.
[14:21:57] 10Traffic: Create dashboard showing aggregate data transfer rates per DC/cluster - https://phabricator.wikimedia.org/T284304 (10ema)
[18:22:48] 10Traffic, 10SRE, 10GitLab (Initialization), 10Patch-For-Review, 10User-brennen: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10Dzahn) Thanks! I think the SSH part was T276148 (shouldn't that be closed now? We can't...
[18:24:24] 10Traffic, 10DNS, 10SRE, 10serviceops, and 2 others: DNS for GitLab - https://phabricator.wikimedia.org/T276170 (10Dzahn)
[18:25:04] 10Traffic, 10SRE, 10GitLab (Initialization), 10Patch-For-Review, 10User-brennen: open firewall ports on gitlab1001.wikimedia.org (was: Port map of how Gitlab is accessed) - https://phabricator.wikimedia.org/T276144 (10Dzahn) 05Open→03Resolved ` ACCEPT tcp -- anywhere gitlab.wikimedi...
[18:32:02] ACK, I wasn't sure why 2 bastions in eqsin, saw backlog now, just waiting for now
[18:41:09] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10Dzahn) 18:38 < icinga-wm> PROBLEM - Check systemd state on cp1087 is CRITICAL: CRITICAL - degraded: The following units failed: rsyslog.service,syslog.socket https://wikitech.wik...
[19:17:30] mutante: :) so we just did doh5001 for now... which is OK
[19:18:05] sukhe: ok!
[19:18:27] but we _can_ also decom that second bastion it sounds
[19:18:47] yeah
[19:20:50] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10BBlack) rsyslogd was down for repeatedly segfaulting on startup. I was able to strace the failure and see that it kept segfaulting while reading one of its own files in `/var/spool/rsyslog/` on sta...
[19:23:04] yeah we should really just re-purpose those, but I think we need a better plan before we go doing it (the bastions that were migrated into ganeti)
[19:23:53] the earlier plan was to make them ganeti nodes, but I think that depends on the outcome of the prometheus diskspace convo
[19:24:16] either way, the machine is likely to remain alive and need an IP, although possibly not on the public subnet
[19:24:40] 10Traffic, 10SRE, 10vm-requests: Please create two Ganeti VMs for Wikidough in eqiad - https://phabricator.wikimedia.org/T284348 (10ssingh)
[19:24:53] 10Traffic, 10SRE, 10vm-requests: Please create two Ganeti VMs for Wikidough in ulsfo - https://phabricator.wikimedia.org/T284349 (10ssingh)
[19:25:40] sukhe: ^ ulsfo is going to run into a similar problem as eqsin
[19:25:51] (there's only 1 actually-free public1 IP)
[19:25:57] oh! should have checked
[19:26:24] ok thanks. we can just do one for now. or should we completely skip it? either way is fine, we have eqiad and codfw anyway
[19:26:35] I'd say do one for now
[19:26:43] ok, updating ticket. thanks
[19:26:50] there's a couple conversations to have about fixing the IPs and/or re-purposing the bastions, which are now inter-related
[19:27:06] but the same convo applies to both ulsfo+eqsin, they're in the same basic state on all related things
[19:27:51] yeah
[19:28:20] I think for the purpose of testing -- which is what we are doing -- one per PoP is fine as long as we can test the traffic is going to where it should be
[19:28:28] right
[19:29:10] the only thing that doesn't give us, in theory, is proof that the traffic hashing for ECMP is working out ok at each DC. In theory the setting could be wrong somewhere.
[19:29:38] 10Traffic, 10SRE, 10vm-requests: Please create a Ganeti VM for Wikidough in ulsfo - https://phabricator.wikimedia.org/T284349 (10ssingh)
[19:29:39] but in practice, IIRC the setting is global for the whole router (not per route/IP/service/port/whatever), and was already set correctly for all cases to make anycast recdns work ok
[19:30:36] but is there any reason the hashing would differ per DC? in the sense that it seems to work fine for codfw, where we see IPs hitting both hosts
[19:30:50] ah
[19:30:53] what I mean is that's technically up to the router config at each site to get it right
[19:31:27] (if it's "wrong", it could just randomly break TCP connections by ignoring the TCP details and sending alternating packets to random doh hosts)
[19:31:41] 10Traffic, 10SRE: ATS: origins server response data accounting issues - https://phabricator.wikimedia.org/T284290 (10colewhite) p:05Triage→03Medium
[19:32:04] 10Traffic, 10SRE: Take response size into account in CDN HTTP requests throttling - https://phabricator.wikimedia.org/T284292 (10colewhite) p:05Triage→03Medium
[19:32:17] this was already looked at for anycast recdns though.
[19:32:37] I see! this is then the additional magic sauce I assumed would work :)
[19:32:51] 10Traffic, 10SRE: Create dashboard showing aggregate data transfer rates per DC/cluster - https://phabricator.wikimedia.org/T284304 (10colewhite) p:05Triage→03Medium
[19:33:27] technically even with the correct setting it's less-than-ideal
[19:33:49] because the routers aren't smart enough to correctly hash ICMP Packet Too Big messages to do path MTU discovery either
[19:34:28] that might be a detail we need to poke at before testing too broadly w/ the community
[19:34:43] (probably by clamping mss)
[19:35:49] maybe if we're lucky that can be done in dnsdist instead of the routers, or on the doh hosts
[19:36:40] noting it down; dnsdist, hmm...
[19:37:15] dnsdist doesn't seem to have the setting
[19:37:26] it can be fixed on the host default route though, or we can clamp it at the router
[19:37:49] something for later, unless some of the current initial test users complain of breakage when hitting a 2+ doh-node datacenter
[19:39:34] I have MTU 1500 at home, so I'd probably never notice such an issue, but many cable/dsl/etc solutions in the eyeball networks probably still have <1500 effective MTU
[19:40:28] (we also clamp it for CF tunneling as well, and it's likely our future L4LB solution will require caring about this stuff too)
[19:43:38] is there a preference of clamping the MSS at the routers, versus doing it on the hosts?
[19:44:42] I actually don't know which one is better and in what way. or is the answer simple enough that you just do it at the router if you need it at that layer?
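(For concreteness, a rough shell sketch of the two host-side options just mentioned; the interface name, gateway address and the 1436 value are illustrative assumptions, not anything that was actually deployed:)
```
# Option 1: set advmss on the host's default route, so the kernel advertises a
# smaller MSS for connections routed that way (names/values are examples only).
ip route change default via 198.51.100.1 dev ens5 advmss 1436

# Option 2: rewrite the MSS option on outgoing SYN/SYN-ACK packets instead.
iptables -t mangle -A POSTROUTING -o ens5 -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1436
```
(Either variant only changes the MSS the doh host itself advertises; neither does anything about how the routers hash ICMP.)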
[19:47:25] all good questions :)
[19:48:06] I tend to have a personal preference for adding to host config complexity over router config complexity, because the latter doesn't have as much mature tooling and practice around it.
[19:49:04] but there are arguments that it's just simpler and more-universal to do it at the routers I'm sure, vs trying to get it puppetized exactly where it's needed across all the N public service hosts eventually
[19:49:36] (class of them I mean, for everything that uses an L4LB or routes through tunnels for CF, etc, etc)
[19:52:59] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on cumin1001.eqiad.wmnet for hosts: ` cp1087.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202106041952_bblack_1...
[19:53:01] I see. I am surprised at the lack of tooling and practice as you mention, around the router configuration. I thought this was a fairly common issue and should be easy to undertake on the routers
[19:53:20] but I guess tooling may also mean a difference between a one-time manual configuration vs a more automated process?
[19:53:39] well we have stuff now that manages router config in a templated way from a git repo
[19:53:59] but it's just not at the level of maturity of e.g. puppet on the host side
[19:54:47] so if I had a virtual slider I could move, I would not choose to slide a bunch of complexity out of puppet and into the router config tooling
[19:54:57] (given no other factors in the choice!)
[19:57:35] that's fair!
[20:02:30] sukhe: for reference re: our router cfg mgmt: https://wikitech.wikimedia.org/wiki/Homer
[20:03:23] Juniper can only do clamping on router links so it's an "all site or nothing" kind of thing
[20:04:37] as there are only 2 hosts in the same vlan, pmtud might be a good option for now, but might not be in the near future
[20:06:23] easiest might be to lower the host's v6 MTU to something like 1400 (cf. https://blog.cloudflare.com/increasing-ipv6-mtu/)
[20:08:29] it's v4 we're worried about more, we don't even have v6 yet :)
[20:08:56] [in this case, at this time, for doh testing]
[20:09:47] correct, for Wikidough at least
[20:10:18] in the wikidough case, we don't have any tunnels involved (well, other than the separate issue of CF at some sites)
[20:10:29] bblack: yes, I made a homer change today! pretty trivial in the sense I just added doh5001's IP but I was hesitant to run the tool so cathal did that :)
[20:10:43] but we have the current bird-based anycast + ECMP + junipers not hashing ICMP PTB for MTU discovery to deal with
[20:11:42] pmtud uses multicast IIRC, so it's up to your pain thresholds at the netops level on whether that's a viable option :)
[20:12:03] broadcast it seems
[20:12:19] oh, even better! :)
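(The MTU-lowering option floated above would look roughly like this on a doh host; 1400 is just the ballpark figure from the conversation, the interface name and gateway are assumptions, and as noted the immediate concern for Wikidough is v4 rather than v6:)
```
# Illustrative only: lower the MTU on the VM's interface outright...
ip link set dev ens5 mtu 1400

# ...or only for traffic following the default route, leaving the link MTU alone.
ip route change default via 198.51.100.1 dev ens5 mtu 1400
```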
[20:12:21] and CF probably have all the servers of a given site in the same vlan
[20:12:40] in our case all the servers here are ganeti instances, so throw that complexity into the mix
[20:13:26] actually we could have an "internal to ganeti" vlan to exchange pmtud broadcast
[20:13:32] doesn't look clean though :)
[20:15:26] probably, eventually, we'll be on Katran or something like it and have to deal with this for a completely-different reason anyways
[20:16:16] MTUs are a pain in several possible different ways
[20:16:44] lowering it a bit might be a good first step
[20:17:16] yeah our best bet for now is probably to clamp it on the host, puppetizing the hack we did before for the cp nodes for CF, maybe
[20:18:25] https://phabricator.wikimedia.org/T232602 for reference on the details
[20:18:53] but we might want a different value here than 1436, not sure
[20:19:32] do the ganetis already have MTU issues due to some virtual networking detail?
[20:19:55] bblack: I'm wondering if we can graph PTB messages with prometheus
[20:20:37] and tune the MSS to have the optimal amount of PTB on both (or all doh) hosts combined
[20:21:22] well if PTB routing is borked (as it is in the current bird+juniper anycast solution stack), the optimal amount of PTBs is zero
[20:21:26] maybe the amount is so low that it wouldn't be worth lowering it much
[20:21:46] bblack: the risk is that it arrives on the wrong doh server
[20:21:51] of a given site
[20:21:52] but there's probably a pragmatic value that covers most real-world cases, yes
[20:22:02] not that it's lost
[20:22:12] yes, if we broadcast the rest
[20:22:37] no, I mean just to know how much the VIP gets PTB messages
[20:23:02] so even if the PTB arrive on the wrong host, we count them all, so we know if it's even a problem for us
[20:23:03] yes, the info is useful
[20:23:32] and if it's a problem, maybe there is a good compromise MTU
[20:23:45] (or MSS)
[20:24:31] well we know it's going to be a problem in the current state of things
[20:25:15] yeah, but not being blind would be useful
[20:25:20] yes
[20:26:10] but without a pmtud-like solution, we have to tune to get that rate as near to zero as is reasonable (and accept that some unreasonable cases are just-broken, if anyone has e.g. an MTU of 600 bytes on a real-world user link somewhere)
[20:26:54] whereas with a pmtud-like solution, the broken-ness is a solved problem, and we're just tuning any MSS clamp such that we avoid excessive broadcast from pmtud
[20:27:41] yep, both are needed
[20:27:46] data + solution
[20:28:34] I'd assume there's probably a trivial prom stat for PTB counts from sysfs somewhere (probably already logged, prom seems to grab everything by default!)
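(Before any exporter work, one quick way to get that visibility would be to watch a doh host directly for inbound IPv4 PTB, i.e. ICMP type 3 "destination unreachable" with code 4 "fragmentation needed"; the interface name here is an assumption:)
```
# Show incoming "fragmentation needed" messages arriving at this host.
tcpdump -ni ens5 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
```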
[20:30:12] if not, maybe set up an iptables exporter with an iptables counter
[20:35:43] https://github.com/cloudflare/pmtud/pull/3 would be useful too
[20:38:15] bblack: https://github.com/exaring/pmtud :)
[20:40:58] yeah I don't think that one's actually tied to tunnels even though the example config mentions them, but I could be wrong
[20:41:14] I think we could just list all the other cluster members and have it forward them directly
[20:41:38] re: stats, apparently the standard linux snmp stats counters only give the ICMP Type, but not the subcode
[20:42:03] IPv4 "PTB" are really type 3 "dest unreach" with a specific subcode for "fragmentation needed"
[20:42:14] so it doesn't separate them in stats from other kinds of dest unreach
[20:46:23] https://github.com/retailnext/iptables_exporter but might be overkill
[20:56:23] 10Traffic, 10SRE, 10ops-eqiad: cp1087 down with hardware issues - https://phabricator.wikimedia.org/T278729 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp1087.eqiad.wmnet'] ` and were **ALL** successful.
[21:49:34] 10Traffic, 10SRE, 10vm-requests: Please create a Ganeti VM for Wikidough in ulsfo - https://phabricator.wikimedia.org/T284349 (10colewhite) p:05Triage→03Medium a:03colewhite
[23:01:52] 10Traffic, 10SRE, 10vm-requests, 10Patch-For-Review: Please create a Ganeti VM for Wikidough in ulsfo - https://phabricator.wikimedia.org/T284349 (10colewhite) Cookbook ran successfully. Currently unprovisioned.
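(Since the standard kernel SNMP counters lump all type 3 codes together, as noted above, the iptables-counter idea might look like the following minimal sketch, which something like the linked iptables_exporter could then scrape; the chain placement and comment text are assumptions:)
```
# A rule with no jump target only counts matching packets; "fragmentation-needed"
# is ICMP type 3 code 4, i.e. the IPv4 PTB case discussed above.
iptables -I INPUT -p icmp --icmp-type fragmentation-needed \
  -m comment --comment "inbound PTB (dest-unreach/frag-needed)"

# Read the packet/byte counters back manually (or let an exporter do it).
iptables -vnL INPUT | grep 'inbound PTB'
```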