[04:37:35] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (Marostegui) p:Triage→Medium
[06:51:42] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) a:wiki_willy→ayounsi AMS-IX NOC emailed to schedule the change, with vlan 380 for IX and 381 for NaWas.
[07:08:54] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (jijiki) >>! In T291385#7365671, @cmooney wrote: We have been living with this for quite a long time, we can wait a little longer :) > Should we de-pool those two boxes...
[09:01:57] Traffic, Infrastructure-Foundations, SRE: OpenSSL < 1.1.0 compatibility issues with new LE issuance chain - https://phabricator.wikimedia.org/T283165 (Joe)
[09:58:06] Traffic pad is up for the meeting later today
[09:58:52] I hope to make it if I continue to feel ok, should that change please do meet and discuss Q2 without me :)
[10:25:10] question_mark: ack!
[13:32:14] netops, Infrastructure-Foundations, serviceops: TCP retransmissions in eqiad and codfw - https://phabricator.wikimedia.org/T291385 (joanna_borun) Open→In progress
[13:33:04] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (joanna_borun) Open→In progress
[13:56:48] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) Updated Netbox, resulting diff: `lang=diff [edit interfaces ae2] + flexible-vlan-tagging; + encapsulation flexible-ethern...
[14:36:17] netops, Infrastructure-Foundations, SRE, procurement: Move AMS-IX port to 802.1q tagged and get "private vlan" added - https://phabricator.wikimedia.org/T291407 (ayounsi) In progress→Resolved All done.
[15:19:50] Traffic, SRE, SRE Observability (FY2021/2022-Q2): VarnishTrafficDrop alert false positives due to DCs depooled - https://phabricator.wikimedia.org/T291148 (lmata)
[15:22:17] Traffic, DC-Ops, SRE, decommission-hardware, ops-eqiad: reclaim cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T275696 (Cmjohnson)
[15:22:30] Traffic, DC-Ops, SRE, decommission-hardware, ops-eqiad: reclaim cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T275696 (Cmjohnson) Open→Resolved
[16:00:16] netops, Infrastructure-Foundations, SRE: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (joanna_borun) Open→In progress
[16:00:33] netops, Infrastructure-Foundations, SRE, SRE-tools, and 3 others: Investigate Capirca - https://phabricator.wikimedia.org/T273865 (joanna_borun) Open→In progress
[16:01:57] netops, Infrastructure-Foundations, SRE: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (joanna_borun) Open→In progress
[16:06:39] Anyone willing to review patches for a new LVS service? No clue what i'm doing and the docs say that means i'll break everything :) https://gerrit.wikimedia.org/r/c/operations/puppet/+/713959
[17:11:47] Traffic, SRE, Patch-For-Review: Deploy durum: check service for Wikidough - https://phabricator.wikimedia.org/T289536 (cmooney) Thanks for all the background info here. Regarding the use cases for the manual entries, yes there are probably some (like wikimedia-dns.org) that we could adjust the scrip...
[18:09:46] quick sanity check on the steps for adding a new wcqs.svc.[codfw,eqiad].wmnet re https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only)
[18:09:48] docs say:
[18:09:49] > llocate an IP address per colo to serve your content on Netbox
[18:09:51] allocate*
[18:10:05] in this case colo means a dc, right? i.e. in this case, one IP for eqiad and one for codfw
[18:18:16] ebernhardson: looking
[18:20:27] ryankemper: yes, that's correct
[18:21:03] if you look at https://netbox.wikimedia.org/ipam/prefixes/93/ip-addresses/ (eqiad) and https://netbox.wikimedia.org/ipam/prefixes/92/ip-addresses/ (codfw) you'll see the two lists are identical
[18:26:55] ack. I created https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063/, would appreciate a spot check
[18:27:10] I changed the netmask to `/32`, set the role as `VIP` and left tenant blank
[18:28:15] ryankemper: lgtm, make sure you run the sre.dns.netbox cookbook too
[19:12:40] So is it correct to say that `sudo authdns-update` generates the new zonefiles and the `sre.dns.netbox` cookbook deploys those files? Or is there no ordering dependency between them (i.e. you don't have to run the authdns-update before the netbox cookbook)
[19:15:37] ryankemper: other way around mostly, the netbox cookbook takes the data from netbox and generates the zonefiles (see https://phabricator.wikimedia.org/source/netbox-exported-dns/history/master/). and it also runs `authdns-update` for you as well.
[19:16:21] legoktm: ah okay. I figured that the fact that the nameserver is queryable after the authdns-update meant something was off with my understanding
[19:16:37] legoktm: think it'd be worth me adding a note to https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile that those steps don't need to be done manually if running the cookbook?
[19:17:26] similarly, https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only) links to https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile and says to follow those steps and then afterwards says to run the cookbook (I imagine those sections might've been written before the netbox cookbook handled that?)
[19:53:02] ryankemper: for svc currently you need both authdns-update and the cookbook, they do two different things, the cookbook is a noop and the one "real" is the manual change in the dns repo, that's because of T270071 that is linked there too
[19:53:02] T270071: SVC DNS zonefiles and source of truth - https://phabricator.wikimedia.org/T270071
[19:53:42] the cookbook change is just because adding the ips to netbox will raise an alert if those changes will not get pushed to the generated files
[19:55:04] we should resume the work on the task and make a decision on what to do to the SVC zonefiles, and go in one direction or the other, we're currently a bit in a limbo.
[20:00:27] and to add some clarity, the manual authdns-update will take care of deploying any manual changes made in the operations/dns repo, while the cookbook takes care only of the auto-generated files from Netbox data. They are totally orthogonal one to the other.
[20:10:50] Traffic, Analytics, Analytics-Kanban: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (odimitrijevic) @elukey Thanks for reviewing the patch. Based on your question in the pr it is unclear that there is a specific issue that this PR addresses. > Trying to add some...
[21:10:57] (VarnishTrafficDrop) firing: 58% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[21:13:59] volans: thanks that was very helpful!
[21:15:57] (VarnishTrafficDrop) resolved: 64% GET drop in text@eqsin during the past 30 minutes - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[21:54:48] HTTPS, MediaWiki-General: Protocol-relative URLs are poorly supported or unsupported by a number of HTTP clients - https://phabricator.wikimedia.org/T54253 (Krinkle)