[06:51:56] (HAProxyEdgeTrafficDrop) firing: 58% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:52:04] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review, Sustainability (Incident Followup): Rate limiting for hotlinked images - https://phabricator.wikimedia.org/T317799 (ayounsi) [clinic duty] tagging the teams I think are relevant to this task, please change the tags as needed
[06:56:56] (HAProxyEdgeTrafficDrop) resolved: 69% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[09:57:24] netops, Infrastructure-Foundations, SRE: Default allowed SSH parameters on upgraded Juniper mgmt routers prevent some connections - https://phabricator.wikimedia.org/T320272 (jbond) > , as we can drop to a regular shell and specify the MAC code manually: FYI you can also use the .ssh/config file whic...
[11:12:49] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond)
[11:46:39] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (cmooney) Thanks for tracking all this John. As you know most of our hosts just have a single interface with single unica...
[11:56:48] netops, Infrastructure-Foundations, SRE: Default allowed SSH parameters on upgraded Juniper mgmt routers prevent some connections - https://phabricator.wikimedia.org/T320272 (cmooney) > AFAIK this configures the ssh daemon to accept connections using this protocol (possibly also configures outbound c...
[12:02:55] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond) >>! In T234207#8307389, @cmooney wrote: > I'm not sure if this task is the best place to discuss this but I'm of t...
[12:10:54] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (MoritzMuehlenhoff) >>! In T234207#8307389, @cmooney wrote: > Thanks for tracking all this John. > > So for instance we c...
[12:13:58] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (ayounsi)
[12:22:48] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (cmooney) >>! In T234207#8307423, @jbond wrote: > Perhaps from the netbox PoV but from any new (networkd) module should su...
[12:30:25] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond) >>! In T234207#8307431, @MoritzMuehlenhoff wrote: >>>! In T234207#8307389, @cmooney wrote: >> Thanks for tracking...
[13:00:36] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (cmooney) > One thing I forgot to highlight is that there is currently a bit of a chicken/egg issue of using interface_auto...
[13:19:41] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Jclark-ctr) Verified Netbox Thanks
[13:19:49] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad row C switch fabric recabling - https://phabricator.wikimedia.org/T313384 (Jclark-ctr) Open→Resolved
[13:19:59] netops, Infrastructure-Foundations, SRE, SRE-OnFire, and 2 others: asw2-c5-eqiad crash - https://phabricator.wikimedia.org/T313382 (Jclark-ctr)
[13:55:29] netops, Infrastructure-Foundations: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ayounsi) p:Triage→Low
[13:59:29] Traffic, SRE, decommission-hardware, ops-ulsfo, Patch-For-Review: decommission dns4002 - https://phabricator.wikimedia.org/T320440 (ssingh)
[14:23:20] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (ayounsi) For physical servers we indeed need to keep the whole lifecycle/provisioning process in mind (racking/provisioni...
[14:25:09] netops, Infrastructure-Foundations: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (cmooney) Agreed we should add it to the CRs, no reason I can think of not to. Also I'll think about it in terms of the l3_switch template consolidation. They should get the same...
[14:30:58] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (cmooney) > Which means being able to map the real world interface to the logical one, from previous conversations it's o...
[15:05:11] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS buster
[15:07:47] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission dns4002 - https://phabricator.wikimedia.org/T320440 (ssingh) @RobH: I think we can mark this as resolved as all the Puppet configuration has been removed and you already ran the decom cookbook. Deferring this to you in case something el...
[15:11:23] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission dns4002 - https://phabricator.wikimedia.org/T320440 (RobH) Open→Resolved
[15:11:27] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (RobH)
[15:13:14] ^ thanks rob!
[15:27:52] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (MoritzMuehlenhoff) I have setup ganeti4008 as a node in the ulsfo Ganeti cluster and moved a VM to it to confirm it works as expected.
[15:49:42] Traffic, InternetArchiveBot, SRE: IABot is encountering 429 on Wikimedia Production - https://phabricator.wikimedia.org/T318065 (Cyberpower678) Open→Resolved a:Cyberpower678
[15:50:42] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS buster completed: - dns4004 (...
[15:50:47] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host dns4004.wikimedia.org with OS buster executed with errors:...
[16:31:16] (VarnishTrafficDrop) firing: Varnish traffic in eqsin has dropped 65.21452830543775% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[16:36:16] (VarnishTrafficDrop) firing: (2) Varnish traffic in eqsin has dropped 59.01458126730489% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[16:41:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in eqsin has dropped 59.01458126730489% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[17:41:41] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install ulsfo misc class hosts - https://phabricator.wikimedia.org/T317247 (ssingh) dns4004 has been commissioned.
[18:09:35] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: ulsfo refresh scheduling - https://phabricator.wikimedia.org/T317249 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by sukhe@cumin2002 for hosts: `ganeti4001.ulsfo.wmnet` - ganeti4001.ulsfo.wmnet (**PASS**) - Downtimed host...
[18:11:49] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: ulsfo refresh scheduling - https://phabricator.wikimedia.org/T317249 (ssingh) @RobH: ganeti4001 has been decommissioned. Thanks!
[19:11:06] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney)
[19:11:28] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (cmooney)
[19:11:36] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney) Open→In progress p:Triage→High
[19:11:50] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney)
[19:15:34] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney)
[19:16:34] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (ayounsi) a:Jclark-ctr→None
[19:29:12] netops, Infrastructure-Foundations, SRE, Sustainability (Incident Followup): Cr1-eqiad comms problem when moving to 40G row D handoff - https://phabricator.wikimedia.org/T320566 (cmooney)