[06:47:07] Updated network map with Sao Paulo : https://upload.wikimedia.org/wikipedia/labs/5/5f/Wikimedia_network_overview.png
[06:59:42] Very nice, but can we get a direct connection between MAGRU and EQSIN, just for fun :-)
[07:04:56] slyngs: as long as you don't ask for an eqsin-drmrs, or eqord-magru that would mess up my diagram :)
[07:06:12] Just place an antenna icon on both and say it's wireless
[09:22:42] I'm seeking reviewers https://gerrit.wikimedia.org/r/c/operations/puppet/+/1032401 to spare graphite from a whole lot of metrics
[14:12:11] repeating what I just sent to the ops list but just a heads-up - we're about to start migration of commons traffic to k8s (only at 5% for now)
[14:12:32] ok thanks, gl
[14:15:46] hnowlan: awesome
[15:18:03] bblack: topranks: since there has been some discussion of this in different places
[15:18:20] we should as just discussed turn on Wikidough's /24 at least in magru
[15:18:40] and see how that works out with everything and then do ns2 when we want to
[15:18:52] any objections?
[15:19:13] that should also cover durum which is within the same /24
[15:19:44] yep
[15:20:28] I'm in favour of it. I think it's extremely unlikely there will be any issue, but I still think it's good to announce this range first as a sort of test before we pull the trigger for the range ns2 sits in
[15:20:46] thanks!
[15:23:23] when trying to reimage a host: "The file needed for preconfiguration could not be retrieved from http://apt.wikimedia.org/autoinstall/preseed.cfg"
[15:27:15] mutante: 'contint[12]00[12]':
[15:27:17] entry looks fine
[15:27:25] contint2002?
[15:28:13] correct
[15:29:32] when we had the case where there was just no match for the host name it would usually fail differently. like it would say it couldn't format the disks, but not that it can't download the file
[15:29:53] let me see if I can repeat it though
[15:30:19] yeah maybe a one-off
[15:30:57] it's nice that these k8s traffic moves don't involve traffic-level config anymore, like they did back when we did e.g. php->hhvm
[16:02:06] sukhe: next attempt it's sitting at "manually give me a DNS server address" ..sigh
[16:03:25] mutante: what's the exact output out of curiosity
[16:06:26] sukhe: https://phab.wmfusercontent.org/file/data/mmj6fvuscnzxezzdd2la/PHID-FILE-rmdkldofvtqmppe2l25q/Screenshot_from_2024-05-16_09-04-21.png
[16:06:39] ouch
[16:06:45] that's when I look at mgmt console of course
[16:06:58] is it still there?
[16:06:59] the cookbook is just at "couldn't detect reboot", which makes sense of course
[16:07:38] yeah it's there, checking something
[16:07:53] I was still trying to get pwstore file open:) ok
[16:07:58] thanks!
[16:14:25] mutante: ok so DHCP failing inside d-i. I also noticed that this is an R440 and first time upgrade to bullseye?
[16:14:32] which is why you probably didn't hit this before
[16:14:55] IIRC and I think check with papaul but worth upgrading the NIC firmware as we have seen weird issues with DHCP failing because of that
[16:15:21] contint2002 has 21.81.3. IMO, 21.85 works best and I *think* there might be other tasks related to this
[16:16:44] I just said "this might not be the same hardware platform and every once in a while it's the combo of distro version and installer and hardware" he
[16:20:16] ha!
[16:20:56] check with dc-ops about recommended NIC but we saw this in the cp hosts bullseye upgrades too fwiw https://phabricator.wikimedia.org/T321309
[16:21:58] thanks, will contact dcops
[16:58:37] mutante: took a quick look, that host is at 1G with the embedded NIC, usually we don't see issues there
[16:58:51] it is on the public vlan in row b, connected to one of the newer switches
[16:59:28] topranks: see -dcops, we are also getting DHCPACKs sent from installserver
[16:59:31] topranks: the DHCP server sends an ACK but the installer thinks it fails :p
[16:59:34] before we migrated I tested reimage on that vlan and all was good, but this may be one of the first attempts since then to perform a reimage in these particular circumstances
[16:59:40] now trying different Debian version
[16:59:47] ah ok, let me move to DC-ops
[16:59:51] dcops said I found a new error :)
[21:38:40] Snapshot1008 is throwing timeouts during scap (it is ready for decom per https://phabricator.wikimedia.org/T364455 I think) - can someone depool it?