[00:08:56] (HAProxyEdgeTrafficDrop) firing: (4) 32% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:13:56] (HAProxyEdgeTrafficDrop) resolved: (6) 38% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[01:49:57] (HAProxyEdgeTrafficDrop) firing: (2) 54% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[01:54:56] (HAProxyEdgeTrafficDrop) resolved: (2) 63% request drop in text@drmrs during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:49:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[05:54:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:00:56] (HAProxyEdgeTrafficDrop) firing: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:05:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:11:06] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi) This opened {T314998} automatically. Please sync up with Netops before doing the work as live traffic is using the port.
[07:15:13] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Upgrade core routers to Junos 21+ - https://phabricator.wikimedia.org/T295690 (ayounsi) I went through the useful https://apps.juniper.net/feature-explorer/select-software.html?typ=1&swName=Junos%20OS&rel=21.2R3&sid=1211&platform=MX204&pi...
[07:48:46] netops, Infrastructure-Foundations: Enable LLDP on SRX facing interfaces - https://phabricator.wikimedia.org/T320229 (ayounsi) p:Triage→Low
[08:02:56] (HAProxyEdgeTrafficDrop) firing: 68% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[08:07:51] netops, Infrastructure-Foundations: Use Junos BGP graceful-shutdown and shutdown features - https://phabricator.wikimedia.org/T320230 (ayounsi) p:Triage→Low
[08:07:56] (HAProxyEdgeTrafficDrop) resolved: 68% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:02:21] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (aborrero)
[09:11:21] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond) p:Lowest→Medium
[09:12:16] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond) I change the priority to medium,. The lack of a proper solution for network management causes period problems eno...
[09:23:24] netops, Infrastructure-Foundations, SRE, Patch-For-Review: IPv6 BFD Sessions Failing from Bird (Anycast VMs) to Juniper QFX in drmrs - https://phabricator.wikimedia.org/T304501 (cmooney) Open→Resolved Change applied across all routers now, so hopefully the last we see this kind of issue.
[09:25:20] Traffic, DC-Ops, SRE, ops-eqiad, Sustainability (Incident Followup): Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881 (Vgutierrez) @Jclark-ctr please let me or @BCornwall know when it would be a good time for you to perform the change
[10:00:53] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (ayounsi)
[10:41:56] (HAProxyEdgeTrafficDrop) firing: (3) 42% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[10:46:56] (HAProxyEdgeTrafficDrop) resolved: (5) 60% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[12:48:20] netops, Infrastructure-Foundations: Junos: send syslog through mgmt_junos - https://phabricator.wikimedia.org/T320244 (ayounsi) p:Triage→Low
[12:53:16] netops, Infrastructure-Foundations: Junos: use mgmt_junos for syslog and ntp - https://phabricator.wikimedia.org/T320244 (ayounsi)
[12:54:50] netops, Infrastructure-Foundations, SRE: Junos: resolve DNS through mgmt_junos - https://phabricator.wikimedia.org/T317175 (ayounsi)
[13:09:51] Traffic, DC-Ops, SRE, ops-eqiad, Sustainability (Incident Followup): Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881 (Jclark-ctr) @Vgutierrez will schedule for next week i will not be on site today unless @Cmjohnson is available today i will have to get wi...
[13:24:01] Traffic, DC-Ops, SRE, ops-eqiad, Sustainability (Incident Followup): Audit eqiad & codfw LVS network links - https://phabricator.wikimedia.org/T286881 (Papaul) @Jclark-ctr if you want, you can also ping me for the port configuration.
[13:44:19] Traffic, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (MoritzMuehlenhoff) I created a new netinst environment based on the latest buster plus the 5.10.136 Linux kernel under /var/lib/puppet/volatile/tftpboot/bu...
[13:54:23] Traffic, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: add HBA355i support to installer - https://phabricator.wikimedia.org/T319067 (ssingh) Thanks for the update and for working on this! >>! In T319067#8300254, @MoritzMuehlenhoff wrote: > I created a new netinst environment based on th...
[14:42:20] netops, Infrastructure-Foundations: Junos: investigate BGP rib sharding - https://phabricator.wikimedia.org/T320264 (ayounsi) p:Triage→Low
[14:42:51] XioNoX: that sounds scary :P
[14:43:01] sukhe: what does?
[14:43:07] rib sharding
[14:43:30] sukhe: oh (I mute various bots)
[14:43:36] aha
[14:44:00] sukhe: yeah it's not an easy thing to implement, so it depends on how much we trust Juniper :)
[14:44:27] and where is the risk/benefits tradeoff
[14:44:37] I am beginning to get a sense of the trust involved with Juniper :)
[14:45:34] hahaha
[16:25:30] netops, Infrastructure-Foundations, SRE: Default allowed SSH parameters on upgraded Juniper mgmt routers prevent some connections - https://phabricator.wikimedia.org/T320272 (cmooney) p:Triage→Low
[17:03:56] win 14
[17:42:16] (VarnishTrafficDrop) firing: Varnish traffic in eqsin has dropped 67.79043493216098% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[17:47:16] (VarnishTrafficDrop) resolved: Varnish traffic in eqsin has dropped 67.7993061189241% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[19:49:59] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: ulsfo refresh scheduling - https://phabricator.wikimedia.org/T317249 (ssingh) @RobH: ganeti4004 has been decommissioned and is ready for you. Thanks!