[07:09:46] there are quite a few CRITICAL Icinga alerts, could people have a look when they start their day and clear the ones relevant to them? https://icinga.wikimedia.org/alerts
[10:35:28] apologies elukey, Emperor if you have received some noise regarding VO, I assigned some overrides that were missing to devnull during your vacations to avoid us being notified
[10:37:07] apparently you cannot set overrides for just business hours, you have to also set batphone to either yourself (if you swapped business hours with someone else) or to devnull if on vacation
[11:06:14] <_joe_> jbond: what's the best way to create a system user from a puppet class?
[11:07:16] <_joe_> systemd::sysuser I guess?
[11:07:17] _joe_: I can actually answer that as moritz has been converting some of that
[11:07:20] yes
[11:07:26] _joe_: yes systemd::sysuser
[11:08:49] _joe_: also see https://wikitech.wikimedia.org/wiki/UID
[11:10:01] specifically if it's a user that you want to exist everywhere then you can add it to data.yaml but if just for a few hosts/roles use systemd::sysuser
[11:10:24] <_joe_> yeah the latter
[11:10:38] ack
[11:30:21] XioNoX: topranks Hi! any ideas why we can't reach virt.cloudgw.eqiad1.wikimediacloud.org from prometheus1005 but we can from alert1001? (we want to monitor that vip)
[11:30:46] none!
[11:30:50] but let me have a look :)
[11:31:06] may be an acl restriction
[11:32:27] thanks! gtg, something came up, be back in a bit
[11:35:52] dcaro: So yeah alert1001 has a public / internet IP, and thus traffic to it is allowed in from the cloud realm (like any other internet host)
[11:36:30] Prometheus1005 is on private WMF space, and the ACLs in place block traffic from public cloud IP ranges from getting to it.
[11:42:02] Cloudgw VIP 185.15.56.244 (Vlan1120, handoff to cloud switches), is pingable from Prometheus1005
[11:42:40] What kind of traffic is required for the desired monitoring? Is it possible to use the Vlan1120 VIP as opposed to the inside one?
[12:11:59] hmm, I see, we want to monitor both to make sure both sides of the 'nat' are up and running, but if it's not possible to ping the internal we'll have to find a different way of monitoring that one
[12:59:47] dcaro: no we can make it work - is all that you need ICMP ping?
[13:45:25] topranks: yep, that'd be awesome :)
[13:54:14] dcaro: cool leave it with me should be easy enough to add that range to the existing acl.
[13:54:54] topranks: thanks! you can ping me or add a comment here T314775 if you need anything :)
[13:54:55] T314775: Unable to reach virt.cloudgw.eqiad1.wikimediacloud.org from prometheus1005 (through cr1) - https://phabricator.wikimedia.org/T314775
[16:52:13] I've had a couple of hosts (elastic1063 and 1065) fail reimage during the IPMI step. I verified that I can reach both hosts' mgmt interfaces with SSH, any idea what could cause this?
[16:54:20] inflatador: I'd recommend checking with DCops
[16:59:22] XioNoX ACK, will ask there
[18:32:43] As I'm on clinic duty this week, I'm going to need to get my phabricator permissions updated: It looks like I don't have access to edit L3 objects
[18:33:04] Would someone be willing to update my permissions or point me to the right person to ask?
[18:48:17] brett: can you try now please
[18:58:21] sukhe: All good! Thanks
[19:07:03] sukhe: can you change https://phabricator.wikimedia.org/project/manage/29/ so it doesn't default to the work order
[19:07:08] Workboard*
[19:08:42] Should be on https://phabricator.wikimedia.org/project/29/item/configure/