[01:17:51] Traffic, DC-Ops, ops-esams, SRE: cp3079 bios settings - https://phabricator.wikimedia.org/T349314#9728534 (ssingh) Open→Resolved a:ssingh We fixed it but forgot to close this task so resolving. Thanks @Dzahn!
[06:57:38] (LVSRealserverMSS) firing: Unexpected MSS value on [2a02:ec80:300:ed1a::3]:80 @ ncredir3004 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=esams&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[07:00:19] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9728741 (gmodena) Open→Resolved >>! In T351117#9726516, @Fabfur wrote: >> I'm afraid mixing varnishkafka and be...
[07:02:38] (LVSRealserverMSS) resolved: Unexpected MSS value on [2a02:ec80:300:ed1a::3]:80 @ ncredir3004 - TODO - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=esams&var-cluster=ncredir - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[08:03:01] (PurgedHighEventLag) firing: High event process lag with purged on cp5027:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5027 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[08:08:01] (PurgedHighEventLag) firing: (8) High event process lag with purged on cp5024:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[08:13:01] (PurgedHighEventLag) resolved: (10) High event process lag with purged on cp5024:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[08:44:01] (PurgedHighEventLag) firing: High event process lag with purged on cp5029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5029 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[08:49:01] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp5029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[09:16:01] (PurgedHighEventLag) firing: High event process lag with purged on cp5020:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5020 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[09:21:01] (PurgedHighEventLag) resolved: High event process lag with purged on cp5020:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5020 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[10:53:52] Traffic, Data Products, Data-Engineering, Observability-Logging, Patch-For-Review: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117#9729252 (Fabfur) Resolved→In progress The `haproxy_id` field has been added to messages. (PS. I'll keep this open...
[11:31:47] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9729282 (cmooney) @ssingh I've reserved the following addresses in Netbox for the LVS now, let me know if you need any more info or if I can hel...
[11:53:22] sukhe: (or anyone else from traffic!)
[11:53:37] Is there any source that allows doing an zone transfer from our dns servers?
[11:53:53] i.e. this: https://phabricator.wikimedia.org/P61016
[11:54:04] not super important more ways to check but figure I'd ask
[12:07:52] topranks: we reject AXFR requests (I think you should be getting a NOTIMP?) and since gdnsd does not have any notion of ACLs, there is no trusted source, AFAIK
[12:08:22] ok cool
[12:08:33] indeed correct to reject them in general
[12:08:38] but it’s only me guys :)
[12:09:43] topranks: grouchomarx.png
[13:04:26] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9729450 (cmooney) >>! In T362421#9710346, @ayounsi wrote: > Prefixes assigned in Netbox: https://netbox.wikimedia.org/ipam/prefixes/?site_id=11 Thanks! > Next step is to c...
[14:23:40] Traffic, Patch-For-Review: replace mtail with benthos on ncredir instances - https://phabricator.wikimedia.org/T362776#9729650 (Vgutierrez) Testing benthos on ncredir2001 shows some concerning results (TL;DR it looks like benthos drops some messages and metrics aren't as accurate as expected). nginx is...
[14:31:28] Traffic, Patch-For-Review: replace mtail with benthos on ncredir instances - https://phabricator.wikimedia.org/T362776#9729665 (Vgutierrez) running another test this time with 3x10k requests it looks like the culprit is the socket_server UDP input that drops packets: ` processor_latency_ns_count{label="s...
[16:11:52] Traffic, DNS, SRE: Authenticating wikimedia.org domain with MailChimp - https://phabricator.wikimedia.org/T362921#9729980 (ssingh) Update is that we will need to add a DKIM record for MailChimp so a patch will follow. Rest everything seems to be in order.
[17:55:20] netops, Traffic, DC-Ops, Infrastructure-Foundations, ops-magru: Q4:rack/setup/install magru misc servers - https://phabricator.wikimedia.org/T362730#9730356 (ssingh) Thanks @cmooney, looks good! One small update to the above since we will most likely transpose these to `hieradata/common/lvs/i...
[18:08:12] netops, Infrastructure-Foundations, SRE, Patch-For-Review: magru network setup - https://phabricator.wikimedia.org/T362421#9730419 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=2c797c95-485f-45b4-85c7-e8514173ae11) set by cmooney@cumin1002 for 0:20:00 on 4 host(s) and their se...