[13:30:36] cwhite: Possibly. I'm not familiar with the machines that run the statsv daemon. Would they not try to resolve the hostname themselves?
[15:14:38] hey
[15:15:29] phuedx: are you familiar with the code path in mediawiki that emits those page previews metrics?
[15:26:59] I see nothing in statsv logs
[15:31:44] <_joe_> are the other statsd metrics from mediawiki still being received?
[15:33:34] the prometheus-statsd exporter on webperf1003 is trying to reach graphite1005:8125, but there's nothing on that port
[15:34:31] <_joe_> ok that sounds like an issue, is that related?
[15:34:36] the prometheus-statsd exporter on webperf1003 is trying to reach graphite1005:8125, but there's nothing on that port
[15:34:38] sorry
[15:34:47] statsd-proxy is running, but no logs
[15:35:28] Ah no, my bad, it's udp so it IS listening
[15:35:33] udp 0 0 0.0.0.0:8125 0.0.0.0:* 1010/statsd-proxy
[15:35:46] but not on v6
[15:35:48] But it seems to only be listening on udp4
[15:35:50] yep
[15:36:29] We can assume all prometheus-statsd-exporter services have the same issue
[15:36:36] T271138
[15:37:17] (taking a look too)
[15:37:53] the hosts entry was ipv4 yeah?
and dns is now returning ipv6
[15:37:55] <_joe_> claime: prometheus-statsd-exporter should not be connecting to statsd though
[15:38:10] <_joe_> so I would rather imagine it's some firewall/connectivity issue
[15:38:14] It's trying to connect to statsd-proxy
[15:38:28] It's the right IP, the right port
[15:38:44] --statsd.relay-address=statsd.eqiad.wmnet:8125
[15:39:00] <_joe_> ah I see ok, so it's another issue but I don't think it is the problem phuedx was showing us
[15:39:06] <_joe_> as his data comes from graphite
[15:39:08] <_joe_> not prometheus
[15:39:41] <_joe_> claime: I'd say rollback anyways your change
[15:39:45] ack
[15:39:50] <_joe_> we can keep investigating later
[15:40:04] SGTM
[15:40:20] <_joe_> but yeah we already found at least one thing that's broken
[15:40:41] <_joe_> I guess a similar problem could be there for statsd-proxy and graphite?
[15:40:54] That's what I'm thinking too
[15:45:03] Reverted
[15:45:15] Should I do a puppet run on some particular hosts?
[15:46:29] cdanis: Sorry. I was in a meeting. Yes. I'm familiar with the code path that emits those metrics :)
[15:46:31] mmhh maybe webperf to start with
[15:47:20] Done on webperf1003, and restarted the prometheus-statsd-exporter
[15:47:30] but yeah I think we're back
[15:47:35] for follow up we could update statsd.eqiad.wmnet to a v4-only A record (mirror the host entry), instead of the current CNAME
[15:47:54] I don't see anything about v6 support in statsd-proxy, not sure offhand there
[15:48:05] agree, I'm seeing metrics again in graphite
[15:48:21] Yeah we're getting data again
[15:49:35] phuedx: sorry for breaking it
[15:54:53] sorry for the false sense of security re: this change claime :(
[15:55:05] no worries, it happens
[15:55:26] I should have done a bit more due diligence on what metrics could be affected, I'd have caught it way earlier
[15:55:35] on the "upside" it is udp so nothing was immediately...
ONFIRE
[15:55:38] ᕕ( ᐛ )ᕗ
[15:55:46] Heh
[15:56:06] but yeah mw was fine since it uses the v4 anyways
[15:56:39] herron: yeah doesn't look like there's v6 support, as much as it pains me I think forcing the v4 address is the lesser evil
[16:06:11] claime: Not at all. Thank you for the quick fix!
[16:06:40] This is another reason to migrate dashboards to Prometheus :)
[16:11:49] indeed
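
Note on the failure mode discussed above: a client that resolves statsd.eqiad.wmnet through DNS may get an IPv6 address, while statsd-proxy only listens on udp4, and because it is UDP the datagrams are dropped without any error. The following is a minimal sketch of how one might check which address families a relay hostname resolves to. The helper names (split_by_family, udp_targets, may_hit_udp4_only_gap) are hypothetical and are not part of statsd-proxy or prometheus-statsd-exporter; it only uses Python's standard socket module.

```python
import socket


def split_by_family(addrinfos):
    """Group getaddrinfo()-style tuples into IPv4 and IPv6 address lists."""
    v4, v6 = [], []
    for family, _type, _proto, _canonname, sockaddr in addrinfos:
        addr = sockaddr[0]
        if family == socket.AF_INET and addr not in v4:
            v4.append(addr)
        elif family == socket.AF_INET6 and addr not in v6:
            v6.append(addr)
    return v4, v6


def udp_targets(host, port):
    """Resolve host for UDP and return (ipv4_addresses, ipv6_addresses)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    return split_by_family(infos)


def may_hit_udp4_only_gap(v4, v6):
    """True if any IPv6 address is returned.

    Assumption: the listener is bound to 0.0.0.0 only, so a client that
    picks one of the IPv6 addresses sends datagrams nobody receives.
    """
    return len(v6) > 0
```

Usage would be along the lines of `v4, v6 = udp_targets("statsd.eqiad.wmnet", 8125)` run from the sending host. A non-empty `v6` list there would be consistent with the behaviour seen on webperf1003, though which address a given client actually picks depends on its own resolver and address-selection settings.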