[00:05:25] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:05:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:05:40] FIRING: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:25:25] RESOLVED: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver2004:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:46:08] XioNoX, topranks o/ if any of you have some time to help me I'd need to move ml-serve1014 and ml-serve1015 out of the analytics vlan :(
[08:46:15] same issue as 1012 and 1013
[08:46:22] but I don't recall exactly the procedure
[08:46:30] (without decomming and reprovisioning I mean)
[08:46:35] :(
[08:47:18] maybe a chance for Arzhel’s new cookbook option?
[08:47:54] topranks: what's that?
[08:48:24] did you not add a flag to change the IP / move vlan without a reimage?
[08:48:34] maybe I imagined it :D
[08:49:00] topranks: only to move to the new per rack vlan, not arbitrary vlans
[08:49:43] would that not work here?
I guess it’ll select the analytics per-rack vlan
[08:51:24] yeah, and if it's already in the per-rack vlan it won't offer to move to the private vlan
[08:51:58] elukey: changing it manually on the host is possible, and we can update Netbox/DNS
[08:52:18] the bit I’m less sure about is keeping puppet happy
[08:53:02] a reimage is going to be cleaner I think
[08:53:19] just need to run puppet after and hope nothing relies on the host's IP definition in puppet
[08:53:36] elukey: why can't we decom/recom? and is there a tracking task?
[08:57:15] they were originally created in https://phabricator.wikimedia.org/T400626, and now ML is adding them to k8s
[08:57:24] I didn't check the analytics vlan bit, forgot about it
[08:57:36] dcops may have assumed that for the whole batch
[08:57:49] I can decom and recom, it seemed a bit cumbersome
[08:57:58] the last time it took no time with the ip flip
[08:58:27] anyway, what you suggest is to run decom first, and then do the network provision again in netbox?
[08:58:28] actually it might be better to do these the manual way
[08:59:10] provisioning of them is also a hassle as they are not in a rack with a switch, it requires some Netbox cheating to make it work
[08:59:25] :(
[08:59:59] elukey: if you did the others then hopefully it’ll be smooth, are we ok to downtime and reboot them?
[09:01:16] topranks: yep yep I am working with Tobias atm on them, they are not used at all
[09:01:24] ml-serve1014 and ml-serve1015
[09:01:51] can I help and/or do it if you are busy?
[09:01:55] didn't mean to dump that on you
[09:02:17] ok well I can help out; it’s as easy to do it manually as to decom/reprovision here I think
[09:03:07] elukey: let me take a quick look and get back to you
[09:05:05] thanks al ot
[09:05:08] *a lot
[09:14:50] elukey: I'm going to make the change on ml-serve1014 and reboot that ok?
[09:15:58] yes!
[09:16:23] ok!
[09:30:35] elukey: ok it is back up with the new IP, puppet run went ok
[09:31:32] \o/
[09:31:34] Icinga isn't happy - it can't ssh on
[09:32:02] I suspect that is to do with it trying the old IP, or the old IP being in the ssh "known hosts" for it not sure
[09:32:18] "socket timed out" - so I suspect trying the old IP
[09:32:23] that should be fixed by the next puppet run on it in theory
[09:32:37] yeah I think so
[09:33:07] ssh alert just cleared \o/
[09:33:09] and calico is now happy, bgp session established
[09:33:17] perfect
[09:33:23] do you hate me if I ask the same for 1015?
[09:33:26] ok I'll do ml-serve1015 the same way then I guess?
[09:33:28] nah it's fine
[09:34:26] yep yep anytime
[09:35:54] ok doing it now!
[09:40:33] elukey: did someone just reboot this server?
[09:41:19] I was in the middle of editing /etc/network/interfaces, didn't save though so I think should be ok :)
[09:42:30] I think it may have been Tobias, I told him to stop working on it :D
[09:42:38] yeah it's ok he pinged me afterwards :P
[10:08:43] elukey: ok ml-serve1015 is now done, calico is happy
[10:09:16] topranks: you rock thanks a lot!
[16:30:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:35:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:13:25] FIRING: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[19:18:25] RESOLVED: SystemdUnitFailed: check_netbox_uncommitted_dns_changes.service on netbox1003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed