[07:47:54] netbox, Infrastructure-Foundations, IPv6, User-jbond: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK) - https://phabricator.wikimedia.org/T253173 (fgiunchedi)
[09:36:14] hey teammates, given that jo.bo is out today and I'm not sure who's around I was wondering if we're doing the team meeting later or just skip it... not much happened since we last met in person :D
[09:38:41] agreed, let's cancel :-)
[09:51:47] Morning. I've got an odd case here. I thought I might share it in case it rings any bells. I've just added a new disk to matomo1002 and rebooted at the ganeti level with the cookbook.
[09:52:32] Upon startup the `ferm` service is failing to start, so I can only get in via the ganeti console. The ferm logs show the following.
[09:52:36] https://www.irccloud.com/pastebin/WEJBUlsm/
[09:54:04] The firewall failure is affecting piwik.wikimedia.org at the moment, which is causing some missing stats, particularly for the current sound logo campaign.
[09:54:10] btullis: is that a transient failure or persistent?
[09:54:16] if you restart ferm what does it do?
[09:54:29] also, weird that you can't ssh if ferm doesn't start, which rules got applied?
[09:54:32] Persistent. I've restarted the service once since boot with the same error logged.
[09:54:36] fyi there is no dns entry for backup1001.eqiad.wmnet
[09:54:55] https://www.irccloud.com/pastebin/WRgtA6Q8/
[09:55:13] jbond: ? it resolves to 10.64.48.36
[09:55:20] it's https://netbox.wikimedia.org/ipam/ip-addresses/3398/
[09:55:24] This server probably hasn't been rebooted in a while.
[09:55:47] btullis: so, if it's all accept, ferm is not the issue
[09:55:52] probably the VM doesn't have network
[09:56:03] that would also explain the resolution failure
[09:56:07] ignore me still need coffee, was testing from local machine (which should still work but different issue)
[09:56:12] can you ping the rest of the infra from inside the VM?
[09:56:51] random guess the interface has come up with a different name so didn't get the right config from /etc/network/interfaces
[09:57:34] No, can't resolve. I can check IP addresses, but I think you're probably right jbond.
[09:58:05] * volans meant pinging IPs directly, but yes, most likely it's without network currently
[09:58:49] this sounds like https://phabricator.wikimedia.org/T273026 ? (or is it only DNS that is failing?)
[09:59:45] Yes, I meant I was about to check IP addresses after checking resolution, but jbond nailed it. ens5 had been renamed to ens14 - I have updated /etc/network/interfaces and rebooted.
[10:01:00] That's fixed it. Running puppet now to make sure it's clean. Thanks all.
[10:02:31] btullis: if you're still running the reboot cookbook it should check that the puppet run @boot does complete successfully
[10:02:34] just FYI
[10:03:24] Oh yeah, thanks. Cookbook still running. 👍
[10:12:50] the root cause of 273026 isn't understood so far, for some machine changes qemu reorders the PCI "slots", but there's no clear pattern
[10:13:13] mid term I'm planning to switch away from the emulated pc-i440fx machine type
[10:13:37] it's the default since that chipset is so old that you could e.g. even run Windows XP on it
[10:13:58] and hopefully a more modern QEMU machine type will solve this
[11:26:04] so, in the we'll skip the meeting? I just saw mo.ritz's reply in agreement :)
[11:29:20] yeah probably makes sense to skip it
[11:35:11] I think only Joanna can really drop the event, but I instead simply nacked the invite
[11:43:48] yeah me too, and I'll send an email to the team for the others
[11:44:29] jhathaway, cdanis: FYI no meeting today (see backlog)
[11:45:13] {done}
[11:45:29] and I've also put a note in the etherpad, I guess we should be able to reach everyone this way :)
[11:47:16] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (taavi)
[11:59:16] Mail, Analytics, Infrastructure-Foundations: kerberos manage_principals.py emails go to spam - https://phabricator.wikimedia.org/T318155 (MoritzMuehlenhoff) Not sure what best to do here since we have no real insight why Gmail flagged it as such. We could maybe send these with a dedicated @wikimedia...
[12:11:35] Mail, Analytics, Infrastructure-Foundations: kerberos manage_principals.py emails go to spam - https://phabricator.wikimedia.org/T318155 (BTullis) I have seen other reports of this from end-users. e.g. T317545#8240269 so I think it would be a nice one to address. I don't see any reference to DKIM...
[13:41:40] volans: thanks
[14:23:22] FYI I've resumed work on https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/817739/ and I'm testing a new iteration of hiera_export.py on netbox-dev2002
[14:23:49] nice! cc jbond ^^^
[14:25:30] on this, I was wondering if there's a "shell" for netbox's context like a python repl I can interactively play with ?
[14:26:09] godog: https://wikitech.wikimedia.org/wiki/Netbox#nbshell
[14:26:30] nice, thank you jbond !
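
Note on the matomo1002 issue from this morning (ens5 coming back as ens14 while /etc/network/interfaces still named ens5): a mismatch like that could be spotted with a small check along these lines. This is only a rough sketch, not something from the cookbook or puppet; the function names are made up for illustration, the parsing only looks at `auto`, `allow-*` and `iface` stanzas, and it assumes a Linux host where /sys/class/net lists the interfaces the kernel currently knows about.

```python
import os


def configured_interfaces(config_text: str) -> set[str]:
    """Collect interface names mentioned in ifupdown-style config text.

    Only `auto`, `allow-*` and `iface` lines are considered; everything
    else (addresses, gateways, comments) is ignored.
    """
    names: set[str] = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        keyword = tokens[0]
        if keyword == "auto" or keyword.startswith("allow-"):
            # These stanzas may list several interfaces on one line.
            names.update(tokens[1:])
        elif keyword == "iface" and len(tokens) > 1:
            names.add(tokens[1])
    return names


def missing_interfaces(config_text: str, present) -> list[str]:
    """Return configured names that are not among the present interfaces."""
    return sorted(configured_interfaces(config_text) - set(present))


def present_interfaces(sysfs_dir: str = "/sys/class/net") -> list[str]:
    """List interface names currently known to the kernel (Linux only)."""
    return os.listdir(sysfs_dir)
```

On an affected VM, something like `missing_interfaces(open("/etc/network/interfaces").read(), present_interfaces())` would be expected to return the stale name, which is a quicker hint than working backwards from the ferm failure.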