[10:10:24] Is there anyone with homer experience wrt the new k8s hosts who can take a look at https://gerrit.wikimedia.org/r/c/operations/homer/public/+/945547 ?
[10:41:03] [out of scope] we should really get that data from netbox and not hardcode it IMHO
[10:41:35] claime: mmmh I'm not sure if the new layout switches need the data there, so far I've seen the k8s_neighbors being defined in other places of the repo
[10:41:43] I'd check with netops
[10:42:04] volans: I think these are the first kube nodes in row F
[10:42:18] and nothing in row E either I guess
[10:42:20] (wikikube)
[10:42:24] Nope
[10:42:28] Don't think so
[10:42:39] ok, hence I'm not 100% sure that's the right place
[10:43:18] I'll wait for netops review then
[10:43:21] but according to:
[10:43:24] templates/asw/bgp_overlay.conf:{% if device_bgp.k8s_neighbors | d({}) %}
[10:43:24] templates/asw/bgp_overlay.conf: {% set k8s_neighbors = device_bgp.k8s_neighbors -%}
[10:43:49] it seems to me that it should work
[12:40:32] volans: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/903174 waiting for ServiceOps review
[12:45:17] claime: lgtm but I'm on a train so I don't want to deploy it myself, I can assist if there is any issue
[12:46:27] XioNoX: ack, I will deploy it after lunch (so in an hour or so). Is there a specific risk I should be aware of when deploying it?
[12:46:55] claime: nah, worst case you get an error but no risk of breaking things
[12:47:07] awesome, thank you
[15:50:35] heads-up: cumin1001 will be rebooted tomorrow, if anyone has any long-running tmuxes/screens for which that would be an issue, let me know
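[Editor's note, not part of the log] For a Homer change like the one discussed at 10:10, the effect on the affected devices can be previewed with Homer's diff subcommand before pushing anything. This is a minimal sketch, assuming the usual `homer <query> diff` invocation run from a host with Homer installed (e.g. a cumin host) against a checkout that includes the change; the device pattern is hypothetical and should be replaced with the actual switch names for the new row:
    # Hypothetical device pattern -- substitute the real switch FQDNs/pattern.
    # Renders the config and shows what Homer would change, without committing.
    homer 'lsw1-f*' diff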
[16:06:37] What could cause profile::netbox::host::location to be empty for a host being reimaged after a rename?
[16:06:42] It has a location in netbox
[16:07:24] did the hiera cookbook run? it's triggered by the dns one
[16:07:38] volans: It should have yesterday
[16:07:40] I'll check logs
[16:08:05] hostname?
[16:08:09] START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
[16:08:11] No Changes to apply
[16:08:13] kubernetes1025
[16:08:24] (and most probably 1026 too)
[16:08:34] what's their status in netbox?
[16:08:42] Planned
[16:08:53] probably that, didn't the reimage flip that to staged?
[16:09:29] Apparently not, since it's the reimage cookbook that's stuck waiting for the exported resource in puppetdb
[16:09:46] And a puppet noop run on the host brings me to the profile::netbox::host::location empty error
[16:10:28] mmmh is there a chicken-and-egg problem? you can change the status in netbox and then re-run the netbox hiera cookbook
[16:10:42] but we might need a more permanent fix if this is a normal/valid workflow/use-case
[16:10:45] I'll do that yeah
[16:11:10] Invalid role/status (servers must not use STAGED)
[16:11:14] lol
[16:11:34] ah right... my bad
[16:11:47] active
[16:11:53] ack
[16:11:59] reimage flips planned|failed to active
[16:12:09] but only once done?
[16:12:25] what's currently running?
[16:12:43] reimage, at the cookbooks.sre.hosts.reimage.ReimageRunner._populate_puppetdb..poll_puppetdb step
[16:13:49] option 1) if you have enough retries left you can change netbox, run the cookbook, ssh to the host via install_console and run the same noop agent again, then let the cookbook resume once it finds the data
[16:14:18] option 2) ctrl+c (just once, let the cookbook do its rollback), fix netbox/run the cookbook and then reimage again
[16:14:19] yeah I should have enough retries
[16:14:38] I have 3, but with the backoff it should be good
[16:14:47] command is: puppet agent -t --noop &> /dev/null
[16:14:59] yep, ty
[16:15:06] running ok
[16:15:16] we'll see if the cookbook picks it up (I think it will)
[16:15:36] So when renaming we should set them to active before running the dns cookbook?
[16:17:18] I would check the puppetization instead, having the first puppet run depend on profile::netbox::host::location seems like the wrong approach for how things are done right now
[16:18:27] Fair enough
[16:18:45] Found Nagios_host resource for this host in PuppetDB
[16:18:47] \o/
[16:18:55] but ofc we could re-evaluate
[16:19:23] was profile::kubernetes::node the one failing?
[16:19:45] lol for the typo: # Get typology info from netbox data
[16:19:49] ugh, my backlog isn't big enough
[16:20:08] Checking my bash history, yeah
[16:20:16] modules/profile/manifests/kubernetes/node.pp L87
[16:20:30] yep that one
[18:57:21] Is there an easy way to use cumin to get all nodes belonging to a team? I'm trying `cumin 'P:contacts%role_contacts="Search Platform"' 'cat /etc/debian_version'` but I either have something wrong with quotes or with how we query attributes
[18:57:33] https://wikitech.wikimedia.org/wiki/Cumin isn't super clear on this
[19:06:26] gehel: try: cumin 'P:contacts%role_contacts~"Search Platform"' maybe
[19:06:38] yeah that seems to work
[19:07:21] sukhe: cool! that works!
[20:16:54] gehel, sukhe: we have an alias for each team
[20:16:58] use A:owner-search-platform
[20:17:18] volans: ha, TIL, thanks
[20:17:25] and if you want the debian version: 'A:owner-search-platform and A:bookworm'
[20:17:37] it already gives you the list without running any command
[20:17:44] sudo cumin 'A:owner-search-platform and A:bookworm'
[20:17:46] to be precise
[20:17:53] I never had a use for the team thing before but this is helpful
[20:18:06] made me wonder if we have the right key set for the traffic ones
[20:19:20] volans: even better! Thanks!
[20:21:05] yw :)
[20:52:01] the team associations should be pretty complete at this point, there are some remaining unclear ownerships (irc.wikimedia.org or mailman) and sometimes the annotations are not immediately applied when a new role is created, but it's working quite well at this point
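[Editor's note, not part of the log] The cumin queries from the 18:57-20:21 exchange, consolidated. Both forms appear verbatim in the conversation above; the team name and alias are just the example discussed there:
    # Match on the role_contacts key of the contacts profile
    # ('~' regex match worked where '=' did not); the trailing argument is the command to run:
    sudo cumin 'P:contacts%role_contacts~"Search Platform"' 'cat /etc/debian_version'
    # Preferred: the per-team ownership alias, optionally combined with an OS alias.
    # Without a command argument this just lists the matching hosts:
    sudo cumin 'A:owner-search-platform and A:bookworm'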