[06:54:48] going to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1146021 (no impact expected)
[06:55:18] ack
[08:10:19] oncaller: I just merged https://gerrit.wikimedia.org/r/c/operations/alerts/+/1136383 that introduces new alerts for haproxykafka similar to the ones for varnishkafka (low message rate from instances). Any (unwanted) alert is on me
[08:11:22] roger
[08:11:24] thanks
[08:16:18] ack
[08:20:59] marostegui: I'd like to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/1145902 would you be available to test it with a paging db host ?
[08:22:24] godog: Sure
[08:22:32] I can reboot a host without downtiming if that's what you mean?
[08:22:46] marostegui: yes exactly, ok I'll merge and let you know shortly
[08:22:51] cool
[08:23:40] volans: ^ expect pages
[08:24:17] k
[08:25:03] _and_ recoveries
[08:29:38] marostegui: ready when you are
[08:29:44] going for it with db1187
[08:30:33] ack
[08:30:42] mariadb stopped, rebooting
[08:31:05] godog: should we ack or not?
[08:31:31] volans: either way, incident should be resolved anyways
[08:32:05] May 15 08:31:55 alert1002 icinga[1502926]: HOST ALERT: db1187;DOWN;HARD;2;PING CRITICAL - Packet loss = 100%
[08:32:06] first one here
[08:32:07] acked
[08:32:59] ok standing by for the host to come back
[08:33:04] yep
[08:33:46] host back
[08:34:08] recovery in IRC
[08:34:16] and in victorops
[08:34:30] looks good to me
[08:34:54] [10:34:49] !incidents
[08:34:54] [10:34:50] 6124 (RESOLVED) Host db1187 (paged)
[08:35:11] So it looks fixed!
[08:35:19] yeah \o/
[08:35:19] nice
[08:35:25] Thanks godog
[08:35:40] sure np marostegui
[08:35:57] looking forward to no more pages sent via email and parsing subjects
[08:53:12] hi, anyone around that can help with a train issue? MW image build process keeps getting stuck and I can't roll back the train
[08:55:46] maybe someone from serviceops?
[09:00:43] will try there, thx
[09:06:46] image build process is working again
[10:21:43] I'm surely Doing It Rong, but a test cookbook just afiled with: 'spicerack.hosts.HostError: Unable to find host ms-fe2009.codfw.wmnet in Netbox'
[10:21:50] failed, even
[10:22:36] Emperor: just to exclude the most obvious thing, netbox has hostnames, not fqdns
[10:23:01] oh, I keep getting tripped up by which bits of spicerack need hostname and which need fqdn :( It's probably that.
[10:24:43] the spicerack.* accessors api IIRC the only one that wants ip/fqdn is the legacy IPMI module (just by memory)
[10:25:30] all the remote stuff works with fqdns, but that's implicit as a result of the queries
[15:53:51] oncaller: I'm about to repool cp3073 and cp3081, the two hosts with the experimental geoip script, I'll keep an eye on latencies and eventual errors (see mail sent to sre-at-large)
[16:18:06] ^^ haproxy configuration on the two hosts has been reverted and the two hosts repooled
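Note on the 10:21-10:25 exchange about the Netbox lookup: the error came from passing an FQDN where a short hostname was expected. Below is a minimal sketch of reducing an FQDN to its first DNS label before such a lookup. The helper name `short_hostname` is hypothetical and is not part of spicerack or Netbox; whether a given spicerack accessor wants a hostname or an FQDN still has to be checked against its documentation.

```python
def short_hostname(name: str) -> str:
    """Return the first DNS label of a hostname or FQDN.

    Hypothetical helper for illustration only; not a spicerack or Netbox API.
    """
    # Drop surrounding whitespace and a trailing root dot, then keep
    # everything before the first remaining dot.
    return name.strip().rstrip(".").split(".", 1)[0]


if __name__ == "__main__":
    # The FQDN from the error message quoted in the log.
    print(short_hostname("ms-fe2009.codfw.wmnet"))
    # A name that is already short is returned unchanged.
    print(short_hostname("ms-fe2009"))
```

Splitting on the first dot only is a deliberate simplification: it assumes the short hostname itself contains no dots, which holds for the names that appear in this log.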