[07:48:26] Traffic, netops, SRE, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (jbond)
[08:37:56] (VarnishTrafficDrop) firing: (2) 65% GET drop in text@ulsfo during the past 30 minutes - - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[08:42:26] ema: as a side effect of moving the alerts to jinxer it looks like we moved them as well from -operations to -traffic
[08:44:31] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi)
[08:47:57] (VarnishTrafficDrop) resolved: (2) 67% GET drop in text@ulsfo during the past 30 minutes - - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org
[08:50:40] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi)
[08:54:17] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi)
[08:56:15] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi)
[09:08:19] Traffic, netops, SRE, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (fgiunchedi) In case it is useful: a lighter weight (but one that we have to maintain ourselves) solution for prometheus would be to use node-exporter's textfile collector...
[09:20:07] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi)
[09:20:47] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (ayounsi) a:ayounsi
[09:29:45] Traffic, netops, SRE, User-jbond: Anycast: consistent ICMP packet too big routing - https://phabricator.wikimedia.org/T253732 (cmooney) Thanks @ayounsi, That exaring project looks to be a fairly sensible approach alright, albeit fairly new. Might be worth testing out. I note our NS seem to use...
[09:32:39] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (cmooney)
[09:43:16] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (cmooney)
[09:55:37] netops, SRE: Increase Google IX sessions prefix-limit - https://phabricator.wikimedia.org/T284447 (cmooney) Open→Resolved
[10:04:52] vgutierrez: yeah for VarnishTrafficDrop that is by choice
[10:05:16] the netops alerts instead should still be routed to #-operations
[10:06:02] see https://gerrit.wikimedia.org/r/c/operations/alerts/+/697710/
[12:31:44] Traffic, netops, VPS-project-Codesearch, serviceops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ema)
[12:32:04] Traffic, netops, VPS-project-Codesearch, serviceops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ema) p:Triage→Low
[13:36:48] Traffic, netops, SRE, VPS-project-Codesearch, serviceops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (BBlack) When we looked into this for the Bird-based anycast stuff, we found that the combination you w...
[13:40:09] Traffic, SRE, VPS-project-Codesearch, serviceops: Consider using BindsTo instead of Requires to declare dependencies between systemd unit - https://phabricator.wikimedia.org/T284555 (ayounsi)
[14:00:00] HTTPS, Traffic, SRE, Performance-Team (Radar): Enable HTTP/3 (QUIC) support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (jbond) RFC assigned [[ https://datatracker.ietf.org/doc/html/rfc9000 | rfc9000 ]]
[14:09:34] HTTPS, Traffic, SRE, Performance-Team (Radar): Enable HTTP/3 (QUIC) support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (hashar)
[15:27:16] Traffic: Implement SLI measurement for Varnish Frontend - https://phabricator.wikimedia.org/T284576 (ema)
[15:28:09] Traffic, observability: Implement SLI measurement for Varnish Frontend - https://phabricator.wikimedia.org/T284576 (ema)
[16:45:10] netops, Data-Persistence-Backup, SRE, bacula: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (cmooney) I've been able to find the source of the dropped traffic between eqiad and codfw. T...
[16:53:32] netops, Data-Persistence-Backup, SRE, bacula: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (cmooney) I also researched / played with TCP tunings. I don't believe the current CUBIC algo...
[18:40:15] netops: Adjust egress buffer allocations on ToR switches - https://phabricator.wikimedia.org/T284592 (cmooney)
[18:56:48] netops: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (cmooney)
[19:02:49] netops: Create an alert for output discards on network devices - https://phabricator.wikimedia.org/T284593 (cmooney)
[21:14:20] Traffic, SRE, vm-requests: Please create a Ganeti VM for Wikidough in ulsfo - https://phabricator.wikimedia.org/T284349 (ssingh) >>! In T284349#7135796, @colewhite wrote: > Cookbook ran successfully. Currently unprovisioned. Thanks for creating the VM!