[05:54:56] (HAProxyEdgeTrafficDrop) firing: 68% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:59:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:26:40] Traffic, DNS, Infrastructure-Foundations, SRE, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (ayounsi)
[07:31:56] (HAProxyEdgeTrafficDrop) firing: (2) 46% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:36:56] (HAProxyEdgeTrafficDrop) resolved: (2) 47% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:45:59] Traffic, DNS, Infrastructure-Foundations, SRE, and 2 others: sre.dns.netbox cookbook dosn't support period terminated domains - https://phabricator.wikimedia.org/T306809 (ayounsi) Stalled→Open
[08:01:56] (HAProxyEdgeTrafficDrop) firing: 66% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:06:57] (HAProxyEdgeTrafficDrop) resolved: 67% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:11:28] vgutierrez: brett: I notice that DNS CI is not passing, https://integration.wikimedia.org/ci/job/operations-dns-lint-docker/3948/console
[09:11:45] looks like it could be related to the work brett was doing yesterday re: https://netbox.wikimedia.org/search/?q=lvs4007.ulsfo.wmnet&obj_type=
[09:17:36] I created the following task https://phabricator.wikimedia.org/T311290
[09:18:57] the message seems wrong, I assume it should expect 2
[09:18:57] Traffic, MediaWiki-Debug-Logger, SRE, noc.wikimedia.org, and 2 others: noc.wikimedia.org with X-Wikimedia-Debug routes to mwdebug but host is not served there - https://phabricator.wikimedia.org/T245552 (Nintendofan885)
[09:19:20] https://github.com/wikimedia/operations-dns/blob/master/utils/zone_validator.py#L782 doesn't seem to account for more than 2 IPs where it's dual stack
[09:19:51] RhinosF1: indeed, I think these are improvements that are being made. It's possible it just needs a patch to the CI check
[09:20:33] Traffic, SRE, noc.wikimedia.org: noc.wikimedia.org consistently 503s in eqsin and sometimes 503s in esams - https://phabricator.wikimedia.org/T255368 (Nintendofan885)
[09:21:05] I don't think the fact lvs4007 has additional IP addresses is an issue. I marked the task as high, as failing CI prevents or adds barriers to making other DNS changes
[09:21:19] (but wait for Traffic for an authoritative answer on that)
[09:22:53] jbond: it has 3 IPs listed, not the 2 I assume it should (one v4 and one v6), but the script claims it expects 1. I don't think the script handles a dual stack with more than 1 IPv4 or IPv6 (the check seems only to account for 2 IPv4 (and maybe 2 IPv6))
[09:22:55] jbond: yeah we can set the public v6 FQDN to match the public v4
[09:23:52] it's also a public IP so it shouldn't be in the .wmnet realm
[09:25:50] jbond: running the DNS cookbook
[09:26:01] XioNoX: cheers
[09:29:58] jbond: should that have been caught sooner?
[09:30:18] jbond: alright the cookbook failed
[09:31:05] jbond: it's the same for lvs4005.ulsfo.wmnet. and lvs4006.ulsfo.wmnet. fixing
[09:31:58] XioNoX: I think that it should have been caught when people run the cookbook :) and failing to run the cookbook should trigger the uncommitted DNS changes alert (however I don't see that triggered, looking)
[09:35:10] https://netbox.wikimedia.org/ipam/ip-addresses/?q=vl1201
[09:36:20] it doesn't look right that they have the SLAAC address in netbox
[09:36:55] jbond: I don't like it either, but it's the current way...
[09:37:01] oh ok
[09:37:23] it's what's configured on the host, and imported by puppetDB
[09:37:30] ahh ack
[09:38:26] does that mean that interface::add_ip6_mapped didn't work properly?
[09:40:24] jbond: I think the LVS are a special case :)
[09:41:01] jbond: should be good now
[09:41:10] ack, indeed the vl1202 interface doesn't have an address that makes sense mapping
[09:41:17] thanks
[09:44:45] Traffic, DNS, SRE: DNS CI is broken - https://phabricator.wikimedia.org/T311290 (ayounsi) Open→Resolved a:ayounsi Error message can be a bit cryptic but if deployed to DNS this would have meant that: `lvs4005.ulsfo.wmnet` (and similar) pointed to both AAAA 2620:0:863:1:f6e9:d4ff:feba:f46...
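Note on the check discussed above: a minimal sketch, assuming zone records are available as (name, type, value) tuples, of how a lint step could flag names that carry more than one A or more than one AAAA record. This is not the logic of zone_validator.py; the function name, the record layout, the example host names, and the documentation-range addresses are all hypothetical.

```python
from collections import defaultdict


def find_multi_address_names(records):
    """Return {name: {rtype: [values]}} for names with >1 A or >1 AAAA record.

    `records` is an iterable of (name, rtype, value) tuples. This layout is an
    assumption made for the sketch, not the format used by the real validator.
    """
    by_name = defaultdict(lambda: defaultdict(list))
    for name, rtype, value in records:
        if rtype in ("A", "AAAA"):
            by_name[name][rtype].append(value)

    offenders = {}
    for name, types in by_name.items():
        # Keep only the record types that appear more than once for this name.
        duplicated = {t: vals for t, vals in types.items() if len(vals) > 1}
        if duplicated:
            offenders[name] = duplicated
    return offenders


if __name__ == "__main__":
    # Hypothetical records: host1 has two AAAA values (e.g. a mapped address
    # plus a SLAAC address), host2 is a clean dual-stack host.
    sample = [
        ("host1.example.", "A", "192.0.2.10"),
        ("host1.example.", "AAAA", "2001:db8::10"),
        ("host1.example.", "AAAA", "2001:db8::f00d"),
        ("host2.example.", "A", "192.0.2.11"),
        ("host2.example.", "AAAA", "2001:db8::11"),
    ]
    for name, dup in find_multi_address_names(sample).items():
        print(name, dup)
```

Counting per record type rather than per name keeps an ordinary dual-stack host (one A plus one AAAA) from being flagged, which is the distinction raised in the discussion.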
[10:06:07] Traffic, Ganeti, SRE, Patch-For-Review: Remove SLAAC IPs from Ganeti hosts - https://phabricator.wikimedia.org/T265904 (MoritzMuehlenhoff)
[12:37:59] Traffic, Data-Engineering, SRE, Patch-For-Review: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (zeljkofilipin)
[15:01:11] Traffic, SRE, ops-eqsin: SSH on cp5012.mgmt is flapping (CRITICAL) - https://phabricator.wikimedia.org/T311264 (ssingh) >>! In T311264#8024288, @RobH wrote: > So if the idrac is accessible, the firmware update isn't OS impacting. However, I cannot login to this idrac interface via HTTPS or SSH, so i...
[19:40:39] Traffic, DNS, SRE, WMF-Legal, and 2 others: Setup redirect of policy.wikimedia.org to Advocacy portal on Foundation website - https://phabricator.wikimedia.org/T310738 (Varnent) My meeting will happen before they shut things off - so there will likely be a slight delay into July - and I should ha...