[03:26:56] (HAProxyEdgeTrafficDrop) firing: (2) 33% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[03:31:56] (HAProxyEdgeTrafficDrop) resolved: (5) 66% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:56:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:01:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:01:47] Hello. I'm looking to add a trafficserver backend mapping rule. Would anyone be able to check it for me please? https://gerrit.wikimedia.org/r/c/operations/puppet/+/779840
[09:02:55] sure
[09:04:07] Many thanks.
[09:05:58] hmm are you sure that's the right port?
[09:06:10] https://www.irccloud.com/pastebin/4szfCc4p/
[09:06:37] TLS isn't working
[09:10:00] hmm it's a SNI only service :)
[09:14:07] vgutierrez: Thanks. I'm not sure that the discovery records are all linked up at the moment. I'm uncertain of whether I need to make changes to the service catalog or to DNS. Here's my cry for help.
https://phabricator.wikimedia.org/T303049#7851915
[09:14:28] nah, it's working as expected
[09:14:33] https://www.irccloud.com/pastebin/SVktfPGL/
[09:14:52] as soon as I provided the -servername datahub.wikimedia.org it worked
[09:14:52] Ah, cool. Thanks.
[09:15:02] I've +1ed the CR
[09:15:14] the only thing apparently missing is the public DNS record
[09:15:14] Great, thanks. <3
[09:15:30] Host datahub.wikimedia.org not found: 3(NXDOMAIN)
[09:15:31] That's here. https://gerrit.wikimedia.org/r/c/operations/dns/+/779839
[09:15:35] but that should be easy to fix :)
[09:16:31] btullis: great :)
[09:16:36] +1ed as well
[09:17:03] Excellent. Many thanks. Pressing the buttons now. :-)
[09:17:24] feel free to ping me directly for traffic related CRs
[09:18:53] Thanks. Will do.
[09:29:39] I've triggered a manual puppet run on cp6009 and tested the new rule
[09:29:59] https://www.irccloud.com/pastebin/jQ3Ry8rf/
[09:36:50] Nice. Thanks.
[09:47:56] (HAProxyEdgeTrafficDrop) firing: 54% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:52:56] (HAProxyEdgeTrafficDrop) resolved: (3) 59% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:58:20] btullis: BTW.. who is the person that I should reach on your team to ask some stuff about intake-analytics / eventgate-external?
[11:00:33] Well ottomat.a will very likely know the most, but you can feel free to ask me and I'll do my best to answer.
[11:07:32] ack, I'll subscribe you to a phab task this afternoon then
[11:07:47] Nice.
👍
[11:08:38] Traffic, Prod-Kubernetes, SRE, serviceops, and 2 others: service::catalog entries and dnsdisc for Kubernetes services under Ingress - https://phabricator.wikimedia.org/T305358 (akosiaris) > * The monitoring: stanza can't be added as having that without lvs: breaks icinga. Can potentially be ignor...
[11:28:31] Traffic, Prod-Kubernetes, SRE, serviceops, and 2 others: service::catalog entries and dnsdisc for Kubernetes services under Ingress - https://phabricator.wikimedia.org/T305358 (BTullis) >> The monitoring: stanza can't be added as having that without lvs: breaks icinga. Can potentially be ignored...
[12:16:44] Traffic, Analytics: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (Vgutierrez)
[12:28:38] Traffic, Prod-Kubernetes, SRE, serviceops, and 2 others: service::catalog entries and dnsdisc for Kubernetes services under Ingress - https://phabricator.wikimedia.org/T305358 (JMeybohm) >>! In T305358#7854870, @akosiaris wrote: >> * The monitoring: stanza can't be added as having that without lv...
[13:46:36] Traffic, Analytics, SRE: intake-analytics is responsible for up to a 85% of varnish backend fetch errors - https://phabricator.wikimedia.org/T306181 (Vgutierrez) p:Triage→Medium
[15:04:59] netops, Infrastructure-Foundations, SRE: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (ayounsi)
[20:29:50] netops, Infrastructure-Foundations, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (RobH) p:Triage→Medium
[20:30:03] netops, Infrastructure-Foundations, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (RobH)
[20:31:03] netops, Infrastructure-Foundations, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (RobH)
[20:32:04] netops, Infrastructure-Foundations, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (RobH) This was detailed on the procurement task, and I've migrated the testing to this onsite related task.
[20:51:10] netops, Infrastructure-Foundations, SRE: Cannot verify NTP status asw1-b12-drmrs - https://phabricator.wikimedia.org/T305840 (cmooney) I've opened a case with Juniper, let's see what they say.
[21:00:52] Traffic, SRE: per-backend-service concurrency limits in ATS-BE - https://phabricator.wikimedia.org/T306223 (CDanis)
[21:08:22] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Agree how to handle port-block speeds for QFX5120-48Y - https://phabricator.wikimedia.org/T303529 (cmooney) So I've been able to check the options here on the QFX5120 platform. It is **not** possible to mix 10G and 25G SFP modules in the...
[21:10:22] netops, Infrastructure-Foundations, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (cmooney) Open→Resolved Thanks for the help on this one @Jclark-ctr. All done with the testing you can remove those cables and leave them with our others.
thanks!
[21:20:34] netops, Infrastructure-Foundations, SRE, ops-eqiad: Test port-block constraints on QFX5120 devices - https://phabricator.wikimedia.org/T304934 (wiki_willy) a:Jclark-ctr
[21:27:17] netops, DC-Ops, Infrastructure-Foundations, netbox: Avoid ghost hosts on the network - https://phabricator.wikimedia.org/T306007 (wiki_willy) Hi @ayounsi - can you provide a few recent examples of when this has triggered alerts? We're trying to align and find some patterns, to tweak things proce...
[21:29:03] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Agree how to handle port-block speeds for QFX5120-48Y - https://phabricator.wikimedia.org/T303529 (cmooney) Actually I should clarify, it *may* be possible to use the channel-speed syntax to configure the switch in blocks of 2, it allows...
[21:38:17] netops, Infrastructure-Foundations, SRE, ops-eqiad: Test port-block constraints on QFX5120 devices - https://phabricator.wikimedia.org/T304934 (cmooney) Open→Resolved Closing ticket, duplicate. Results detailed in https://phabricator.wikimedia.org/T303529#7856797
[21:38:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Agree how to handle port-block speeds for QFX5120-48Y - https://phabricator.wikimedia.org/T303529 (cmooney)
[21:39:04] netops, Infrastructure-Foundations, SRE, ops-eqiad: 2M 25G DAC testing - https://phabricator.wikimedia.org/T306220 (cmooney) Resolved→Open Actually I spoke too soon, there is one other combination I want to check. @Jclark-ctr could you move the 10G cable in port xe-0/0/1 to port xe-0/0/2...
[23:07:56] (HAProxyEdgeTrafficDrop) firing: (3) 18% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:12:56] (HAProxyEdgeTrafficDrop) firing: (6) 35% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[23:17:56] (HAProxyEdgeTrafficDrop) resolved: (6) 64% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop