[00:05:36] netops, Continuous-Integration-Infrastructure, DC-Ops, SRE, ops-codfw: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))) - https://phabricator.wikimedia.org/T283582 (Papaul) @hashar since Monday is a Holiday, let's do this on the 18th a...
[08:31:57] (EdgeTrafficDrop) firing: 67% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[08:36:57] (EdgeTrafficDrop) resolved: 57% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[08:46:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6002:9331 is unreachable - https://alerts.wikimedia.org
[08:50:12] ^ expected... imaging the cp servers in drmrs
[08:51:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6002:9331 is unreachable - https://alerts.wikimedia.org
[09:16:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6010:9331 is unreachable - https://alerts.wikimedia.org
[09:21:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6010:9331 is unreachable - https://alerts.wikimedia.org
[09:30:56] (EdgeTrafficDrop) firing: 57% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[09:35:56] (EdgeTrafficDrop) resolved: 57% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[09:46:56] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6003:9331 is unreachable - https://alerts.wikimedia.org
[09:56:56] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6003:9331 is unreachable - https://alerts.wikimedia.org
[09:57:28] mmandere: cool :)
[10:15:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6011:9331 is unreachable - https://alerts.wikimedia.org
[10:20:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6011:9331 is unreachable - https://alerts.wikimedia.org
[10:24:56] (EdgeTrafficDrop) firing: 68% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[10:29:56] (EdgeTrafficDrop) resolved: 68% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[11:16:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6012:9331 is unreachable - https://alerts.wikimedia.org
[11:21:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6012:9331 is unreachable - https://alerts.wikimedia.org
[11:45:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp6005:9331 is unreachable - https://alerts.wikimedia.org
[12:00:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp6005:9331 is unreachable - https://alerts.wikimedia.org
[12:25:57] (EdgeTrafficDrop) firing: 68% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[12:30:57] (EdgeTrafficDrop) resolved: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[12:34:57] (EdgeTrafficDrop) firing: 62% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[12:39:57] (EdgeTrafficDrop) resolved: 62% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[13:14:57] (EdgeTrafficDrop) firing: 66% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[13:19:57] (EdgeTrafficDrop) resolved: 63% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[13:24:56] (EdgeTrafficDrop) firing: 60% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[13:34:56] (EdgeTrafficDrop) resolved: 69% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org
[13:37:31] can we silence this alert for drmrs for the time being? :)
[13:47:18] yeah it's interesting
[13:47:52] I was under the impression from the docs that jinxer runs from alertmanager, but I can't find the alert in the alertmanager UI, and the site= metadata only seems to have eqiad+codfw values there :P
[13:51:59] maybe it's non-native is the issue
[13:55:15] yeah the current state of affairs is definitely confusing!
[13:55:55] there are icinga-based versions of these traffic drop alerts, which are defined only for the 5 prod sites. It's not yet defined there for drmrs.
[13:56:07] so this has to be native in alertmanager
[13:56:37] which doesn't have any matches in the UI for site=drmrs (or ulsfo, or esams. It seems to think only the core sites exist in the query completions in the UI)
[13:56:52] checking the alertmanager repo...
[13:59:08] that also doesn't seem to contain any global config limiting to just core sites. The traffic drop AM is generically-templated to operate on all available $site values, so that's how it picked up drmrs I guess.
[13:59:16] but what's limiting the UI to the core sites?
[15:02:24] it's probably silenced now, assuming I understand what I think I understand (which is that if you're trying to build a silencing rule while alert isn't firing, you can't really see that what you're doing matches the alert you think it will later match!)
[16:07:31] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929 (Tks4Fish) @faidon are there any updates on this? We've been discussing tooling to help with steward workflow, but they are highly dependent on IPv6. If that is already a problem locally, globally...
[17:17:08] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929 (cmooney) @Tks4Fish I don't think there is any reason to worry in terms of availability of IPv6 address space. Is there a specific proposal on the table requiring additional IPv6 address space for...
[18:22:42] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929 (Tks4Fish) @cmooney Sorry, I think I ended up asking in the wrong place. My question comes from T37947, and after looking at the comments there, I got to this task, saw it as stalled and concluded...
[20:01:15] netops, Infrastructure-Foundations, SRE: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929 (cmooney) @Tks4Fish no problem at all! And certainly no need to apologize. This task more relates to allocating blocks of IPv6 for Toolforge/Cloud. As per the above discussion there are some sma...
[22:50:50] Traffic, MediaWiki-Uploading, SRE: ATS 502 on uploading non-small files - https://phabricator.wikimedia.org/T299160 (Josve05a)
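
Editor's note on the 15:02:24 remark about building a silence while the alert is not firing: one way to gain some confidence is to check the silence's matchers offline against the label set you expect the alert to carry. The sketch below is only an illustration under stated assumptions. The matcher dictionaries follow the shape used by Alertmanager's v2 silences API (`name`, `value`, `isRegex`, `isEqual`), but the concrete label names and values (`alertname`, `site`, `cache_type`) are guesses taken from the alert text and Grafana URL in this log, not from the real alert definition, and the function names are hypothetical. Python's `re` is used in place of Alertmanager's RE2 engine, so exotic patterns may behave differently.

```python
import re
from typing import Dict, List, TypedDict


class Matcher(TypedDict):
    """One silence matcher, in the shape used by Alertmanager's v2 API."""
    name: str
    value: str
    isRegex: bool
    isEqual: bool


def matcher_matches(matcher: Matcher, labels: Dict[str, str]) -> bool:
    """Return True if a single matcher accepts the given label set."""
    # A label that is absent is compared as the empty string.
    actual = labels.get(matcher["name"], "")
    if matcher["isRegex"]:
        # Regex matchers are anchored, so the whole value must match.
        hit = re.fullmatch(matcher["value"], actual) is not None
    else:
        hit = actual == matcher["value"]
    # isEqual=False negates the matcher (!= or !~).
    return hit if matcher["isEqual"] else not hit


def silence_matches(matchers: List[Matcher], labels: Dict[str, str]) -> bool:
    """A silence applies only when every one of its matchers accepts the labels."""
    return all(matcher_matches(m, labels) for m in matchers)


# Hypothetical silence for the drmrs traffic-drop alert (label names assumed).
drmrs_silence: List[Matcher] = [
    {"name": "alertname", "value": "EdgeTrafficDrop", "isRegex": False, "isEqual": True},
    {"name": "site", "value": "drmrs", "isRegex": False, "isEqual": True},
]

# Label sets we expect future alerts to carry (assumed, not read from Alertmanager).
expected_drmrs = {"alertname": "EdgeTrafficDrop", "site": "drmrs", "cache_type": "text"}
expected_eqiad = {"alertname": "EdgeTrafficDrop", "site": "eqiad", "cache_type": "text"}

if __name__ == "__main__":
    print("drmrs silenced:", silence_matches(drmrs_silence, expected_drmrs))
    print("eqiad silenced:", silence_matches(drmrs_silence, expected_eqiad))
```

This only checks the matchers against labels you supply yourself, so it is no substitute for confirming the real label set the next time the alert fires.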