[00:49:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[00:54:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[01:56:22] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:06:14] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:11:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:12:03] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:16:14] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:21:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:26:17] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:26:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:27:03] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:31:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:36:17] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:36:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:37:49] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:41:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:47:51] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[02:57:49] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:02:51] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:03:32] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:06:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:08:32] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:10:15] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:19:59] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:20:15] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:21:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:22:09] FIRING: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[03:22:21] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:27:09] RESOLVED: LVSHighRX: Excessive RX traffic on lvs2013:9100 (eno12399np0) - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs2013 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[03:29:59] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[03:51:48] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:01:48] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:06:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:07:21] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:11:22] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:16:22] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:23:14] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:28:14] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:33:14] FIRING: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[04:38:14] RESOLVED: [2x] SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[10:38:31] netops, Infrastructure-Foundations, SRE: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10483284 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=fe2806ef-4f5c-4485-981c-52b89f9e3154) set by cmooney@cumin1002 for 2:00:00 on 1 host(s) and th...
[11:08:24] hello hello - I'd like to add a gateway rule for routing citoid traffic via the rest-gateway (as opposed to restbase) for testwiki only. Would that be okay? I'd disable puppet and do a single host rollout to test as usual https://gerrit.wikimedia.org/r/c/operations/puppet/+/1113178
[11:53:12] hnowlan: looking good from my PoV
[12:08:40] vgutierrez: thanks! mind if I do the rollout now-ish? disable puppet on A:cp, enable on one host (I usually do cp2037), etc
[12:09:35] fabfur: are you going to be around?
[12:09:56] I'll be away from my computer ~45 minutes
[12:10:10] hnowlan: but I think you can proceed
[12:12:05] Traffic, Patch-For-Review: issue unified cert using pki.goog - https://phabricator.wikimedia.org/T384195#10483697 (Vgutierrez) Open→Stalled This is currently blocked on pki.goog side: ` Problem for *.wikipedia.org: urn:ietf:params:acme:error:rejectedIdentifier :: The server will not issue certif...
[12:12:59] thanks - it's fairly low-risk hopefully.
[12:13:19] yeah I'm here
[12:34:03] oh dear, service owners most likely need to make some changes to have suitable headers
[12:34:37] I'll revert
[12:40:56] done. I'll come back to this one at some point, thanks! :)
[12:44:41] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 2 others: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10483801 (hnowlan) A test rollout of the routing for testwiki was successful when testing internally from a routing perspective, bu...
[12:56:07] ack
[13:06:06] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10483855 (cmooney) The above patch adds BGP stats collection to our current setup. Tested in Magru and working well, albeit with a few quirks disc...
[13:16:09] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10483868 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=ba072b6c-6957-428b-a932-dfcf0b3f8103) set by cmooney@cumin1002 for 2:00:...
[13:46:27] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 2 others: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10483956 (Mvolz) >>! In T361576#10483801, @hnowlan wrote: > A test rollout of the routing for testwiki was successful when testing...
[14:07:34] Hey Traffic! If I wanted to add a new domain name to an existing VIP, do I need to touch LVS? Or would that just be ATS and DNS (or something else)?
[14:08:46] load balancers aren't aware of DNS,
[14:09:37] assuming it's a public endpoint you need to create the DNS records and adjust the ATS backend rule to catch the new domain name as well
[14:15:26] inflatador: feel free to add us to reviews if that helps make it easier
[14:17:11] netops, Infrastructure-Foundations, SRE: Enable BGP multipath at internet edge - https://phabricator.wikimedia.org/T384473 (cmooney) NEW p:Triage→Low
[14:17:15] netops, Infrastructure-Foundations, Sustainability (Incident Followup): Optimise WMF WAN Network Configuration - https://phabricator.wikimedia.org/T297355#10484100 (cmooney)
[14:17:17] netops, Infrastructure-Foundations, SRE: Enable BGP multipath at internet edge - https://phabricator.wikimedia.org/T384473#10484099 (cmooney)
[14:17:24] vgutierrez sukhe Thanks, will do!
[14:17:35] netops, Infrastructure-Foundations, SRE: Enable BGP multipath at internet edge - https://phabricator.wikimedia.org/T384473#10484103 (cmooney)
[14:18:45] netops, Infrastructure-Foundations, SRE: Enable BGP multipath at internet edge - https://phabricator.wikimedia.org/T384473#10484104 (cmooney)
[14:19:16] netops, Infrastructure-Foundations, SRE: Enable BGP multipath at internet edge - https://phabricator.wikimedia.org/T384473#10484107 (cmooney)
[14:48:07] Traffic: Replace pybal with liberica on the PoPs - https://phabricator.wikimedia.org/T384477 (Vgutierrez) NEW
[15:24:34] Traffic: clean up testlb services - https://phabricator.wikimedia.org/T384486#10484507 (Vgutierrez) p:Triage→Medium
[15:29:14] Traffic, Citoid, Editing-team, RESTBase Sunsetting, and 2 others: Switchover plan from restbase to api gateway for Citoid - https://phabricator.wikimedia.org/T361576#10484518 (hnowlan) >>! In T361576#10483956, @Mvolz wrote: > >>>! In T361576#10483801, @hnowlan wrote: >> A test rollout of the rou...
[15:52:38] FIRING: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.225:443 @ cp7005 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[15:57:38] RESOLVED: [4x] LVSRealserverMSS: Unexpected MSS value on 195.200.68.225:443 @ cp7005 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=magru&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:02:08] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 185.15.58.225:443 @ cp6011 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:07:08] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 185.15.58.225:443 @ cp6011 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:12:53] FIRING: [12x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.225:443 @ cp5022 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:17:53] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.225:443 @ cp5022 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[16:31:47] side effect of removing the test IPs :)
[17:30:30] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10485502 (andrea.denisse) Looking at the changelog I wonder if this issue could be related to this [[ https://github...
[18:04:12] netops, Infrastructure-Foundations, observability, SRE: LibreNMS reporting no routes learnt from doh/durum Anycast peers at various POPs - https://phabricator.wikimedia.org/T384258#10485755 (cmooney) >>! In T384258#10485502, @andrea.denisse wrote: > Looking at the changelog I wonder if this issue...
[18:48:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10485972 (CDanis) > All of this does suggest we should probably look at running distributed collectors as we move to productionize this, potentiall...
[21:34:02] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10486590 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=fe40d399-fce9-41c4-b12a-4bcb36770f4b) set by cmooney@cumin1002 for 1:00:...
[21:47:18] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10486643 (cmooney) >>! In T369384#10485972, @CDanis wrote: > The aux clusters are waiting for us :D and we do have one in codfw as well now. Yep i...
[23:46:53] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Productionize gnmic network telemetry pipeline - https://phabricator.wikimedia.org/T369384#10487032 (cmooney)