[05:02:42] XioNoX I guess you're already aware but just in case: https://buff.ly/455ltvV
[05:04:33] PoC for a fully automation for bgp peering powered by meta, Google and cloudflare
[06:06:26] thx, it's on my list of things to watch
[09:15:24] Traffic, MW-on-K8s, SRE, serviceops, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (JMeybohm)
[09:54:50] netops, Infrastructure-Foundations, SRE: cr2-esams:FPC0 Parity error - https://phabricator.wikimedia.org/T318783 (cmooney) @Jhancock.wm not 100%, I will try to chase on that.
[10:35:20] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[10:36:39] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate all eventgate installations to mw-api-int - https://phabricator.wikimedia.org/T346448 (Joe) Open→Resolved a:Joe
[10:36:50] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[12:45:28] Traffic, Abstract Wikipedia team, MW-on-K8s, SRE, and 4 others: Migrate functions-orchestrator service to mw-api-int - https://phabricator.wikimedia.org/T347397 (Jdforrester-WMF)
[13:14:26] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate wikifeeds to mw-api-int - https://phabricator.wikimedia.org/T346447 (Joe) Open→Resolved a:Joe
[13:14:44] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[13:15:10] Traffic, MW-on-K8s, SRE, serviceops, and 2 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Joe)
[13:18:58] Hello traffic folks :)
[13:19:04] I'd need a review for https://gerrit.wikimedia.org/r/c/operations/puppet/+/961106/ if you have time
[13:20:44] <3
[13:23:58] (PurgedHighEventLag) firing: (2) High event process lag with purged on cp4038:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:28:16] ^^ are you playing with cp4038 and purged fabfur?
[13:28:24] nope
[13:28:58] (PurgedHighEventLag) resolved: (3) High event process lag with purged on cp4038:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:29:06] We have an alert from VarnishKafka (too few messages) coming from cp1083
[13:29:15] btw all tests I run w/ pcc are towards cp4037, and shouldn't impact at all
[13:29:20] Anything happening with it at the moment?
[13:30:44] Oh, that alert went away. Must have just been a spike of some kind.
[13:32:15] btullis: hmmm eqiad is still depooled
[13:32:51] so if you got an spike there could be to some client with eqiad IP hardcoded
[13:32:56] *due to
[13:34:07] vgutierrez: Good point, thanks. We still get about 1 msg/s from varnishkafka, even when the data centre is depooled. https://grafana.wikimedia.org/d/000000253/varnishkafka?orgId=1&var-datasource=eqiad%20prometheus/ops&var-cp_cluster=cache_text&var-instance=cp1083%3A9132&viewPanel=14
[13:34:19] btullis: healthchecks? :)
[13:36:04] Yeah, something like that I think. We used to get a lot of false positives from this alert when cp servers or DCs were depooled, but now it's better at triggering only when there is a significant difference between requests in and messages out. Anyway, I think we can leave it now, I just wanted to check what might have caused it to alert, even if it was a false positive.
[13:37:26] netops, Infrastructure-Foundations, SRE: Move cr1-esams<->cr2-esams link to QSFP port - https://phabricator.wikimedia.org/T347323 (ayounsi) Thanks, I remembered there was a reason but forgot what it was! I guess it doesn't make much sens to buy a `MIC3-3D-2X40GE-QSFPP` seeing the [[ https://www.juni...
[13:38:21] netops, Infrastructure-Foundations, SRE: Add 4x10G breakout cable to cr2-esams - https://phabricator.wikimedia.org/T347323 (ayounsi)
[13:44:36] netops, Infrastructure-Foundations, SRE: Add 4x10G breakout cable to cr2-esams - https://phabricator.wikimedia.org/T347323 (cmooney) >>! In T347323#9199448, @ayounsi wrote: > Thanks, I remembered there was a reason but forgot what it was! Yeah it's a shame. I made the same mistake while planning th...
[14:17:21] netops, Infrastructure-Foundations, SRE: Add 4x10G breakout cable to cr2-esams - https://phabricator.wikimedia.org/T347323 (ayounsi) That's a great idea! Opened {T347403}
[14:48:04] netops, DC-Ops, Infrastructure-Foundations, SRE: Juniper network device audit - all sites - https://phabricator.wikimedia.org/T213843 (ayounsi) Open→Resolved a:RobH I think we can close that one. @RobH did the audit afaik.
[15:06:45] (VarnishHighThreadCount) firing: (3) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:09:18] sukhe: i just saw your ping in ops
[15:09:34] jbond: sorry I am in a meeting so couldn't look at it fully
[15:09:48] ack ill take a look
[15:09:52] thank you
[15:09:57] (PurgedHighEventLag) firing: (3) High event process lag with purged on cp4044:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:09:59] you can look the failure on dns2005
[15:11:45] (VarnishHighThreadCount) firing: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:14:57] (PurgedHighEventLag) firing: (16) High event process lag with purged on cp4037:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:26:45] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:31:45] (VarnishHighThreadCount) firing: (11) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:32:42] it'd be nice if the threadcount alert text above named the server :P
[15:36:45] (VarnishHighThreadCount) firing: (13) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[15:44:58] (PurgedHighEventLag) resolved: (32) High event process lag with purged on cp4037:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:46:45] (VarnishHighThreadCount) firing: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:01:45] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:01:57] (PurgedHighEventLag) firing: (5) High event process lag with purged on cp4039:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:06:45] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:06:57] (PurgedHighEventLag) firing: (16) High event process lag with purged on cp4037:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:11:45] (VarnishHighThreadCount) firing: (14) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:12:12] bblack: https://gerrit.wikimedia.org/r/c/operations/alerts/+/961148
[16:21:45] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:21:58] (PurgedHighEventLag) firing: (31) High event process lag with purged on cp4037:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:26:45] (VarnishHighThreadCount) firing: (7) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:26:57] (PurgedHighEventLag) resolved: (6) High event process lag with purged on cp4037:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:31:45] (VarnishHighThreadCount) firing: (8) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:36:45] (VarnishHighThreadCount) firing: (12) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:41:45] (VarnishHighThreadCount) firing: (12) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:51:45] (VarnishHighThreadCount) firing: (10) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[16:56:45] (VarnishHighThreadCount) resolved: (6) Varnish's thread count is high - https://wikitech.wikimedia.org/wiki/Varnish - https://alerts.wikimedia.org/?q=alertname%3DVarnishHighThreadCount
[17:29:50] netops, Infrastructure-Foundations, Puppet-Core, SRE, and 2 others: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond)