[00:05:38] (LVSHighRX) firing: Excessive RX traffic on lvs1017:9100 (eno1np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1017 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[00:10:38] (LVSHighRX) resolved: Excessive RX traffic on lvs1017:9100 (eno1np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs1017 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[01:34:56] (HAProxyEdgeTrafficDrop) firing: 64% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[01:44:56] (HAProxyEdgeTrafficDrop) resolved: 66% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[07:47:55] hello traffic, I see there were a couple of paging alerts here tonight, but AFAIK they alerted only here and not in -ops and did not actually page. Is that expected?
[07:56:20] netops, Cloud-Services, Infrastructure-Foundations, SRE: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (cmooney) Ok cool well we can close this in that case I think. Cheers.
[08:01:49] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=1e573369-5fdd-4621-8ae7-786b5a67de04) set by cmooney@cumin1001 for 2:00:00 on 1 host(s) and th...
[08:05:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 64.61254561443596% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:05:56] (HAProxyEdgeTrafficDrop) firing: 59% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:10:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 24.225880470407866% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:40:00] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=57f0ae1d-0fa1-4b98-9454-bea638ac3971) set by cmooney@cumin1001 for 2:00:00 on 3 host(s) and th...
[08:40:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in esams has dropped 2.42266494483531% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[08:56:57] (PyBalBGPUnstable) firing: (3) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[08:58:49] XioNoX: ^^ some activity in esams from your side?
[08:59:31] volans: that seems like an issue triggered from moving alerts from icinga to prometheus
[09:02:40] vgutierrez: yep, maintenance /cc topranks
[09:03:08] XioNoX: thanks yes I'm upgrading cr3-esams. Site is depooled.
[09:18:26] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:46:07] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=39465e0b-b93d-45ba-b1d8-0c49dacc39fb) set by cmooney@cumin1001 for 2:00:00 on 3 host(s) and th...
[10:51:57] (PyBalBGPUnstable) resolved: (3) PyBal BGP sessions on instance lvs3005 are failing - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[11:11:16] (VarnishTrafficDrop) firing: Varnish traffic in drmrs has dropped 65.17392837332896% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[11:11:56] (HAProxyEdgeTrafficDrop) firing: 60% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[11:16:16] (VarnishTrafficDrop) firing: Varnish traffic in drmrs has dropped 36.924823384192365% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[11:46:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in drmrs has dropped 56.370850249337266% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[11:46:56] (HAProxyEdgeTrafficDrop) resolved: 63% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=drmrs&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[13:50:14] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney) Upgrade of cr3-esams went well earlier. Firmware upgrade works as per docs. I will put up more info on that later for our own reference.
[13:51:10] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[13:51:25] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (cmooney)
[14:18:47] netops, Infrastructure-Foundations: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[15:04:18] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[17:27:03] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen)
[18:06:56] (HAProxyEdgeTrafficDrop) firing: 58% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[18:11:57] (HAProxyEdgeTrafficDrop) resolved: 63% request drop in text@eqsin during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqsin&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[18:13:30] Traffic, SRE, Patch-For-Review: Varnish SLI is impacted by external components performance|behavior - https://phabricator.wikimedia.org/T317051 (Vgutierrez)
[18:26:26] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (cmooney) @Jgreen I believe I've done what's required now (not all that familiar with this workflow however). Both ports that are labelled for frdata100...
[18:42:57] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) @cmooney Both interfaces show no-carrier, can you confirm that the switch ports are enabled?
[18:44:23] Wikimedia-Apache-configuration, Security-Team, SecTeam-Processed, Security: Microsites respond with pseudo directory listing if sent an invalid Accept header - https://phabricator.wikimedia.org/T306516 (sbassett)
[20:28:56] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (cmooney) @Jgreen my bad yeah they were both still part of the disabled group. Both up/up now, hopefully looks better your side too. ` cmooney@fasw-c-eq...
[21:04:06] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) >>! In T317539#8230385, @cmooney wrote: > @Jgreen my bad yeah they were both still part of the disabled group. > > Both up/up now, hopefully lo...
[21:04:38] netops, Infrastructure-Foundations, SRE, ops-eqiad: Set frdata1001 switch ports to fundraising vlan - https://phabricator.wikimedia.org/T317539 (Jgreen) Open→Resolved a:Jgreen