[00:33:57] (HAProxyEdgeTrafficDrop) firing: 58% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[00:35:16] (VarnishTrafficDrop) firing: Varnish traffic in esams has dropped 40.07227566549601% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:40:16] (VarnishTrafficDrop) resolved: (2) Varnish traffic in esams has dropped 39.78298011092882% - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000180/varnish-http-requests?viewPanel=6 - https://alerts.wikimedia.org/?q=alertname%3DVarnishTrafficDrop
[00:43:56] (HAProxyEdgeTrafficDrop) resolved: 65% request drop in text@esams during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=esams&var-cache_type=text - https://alerts.wikimedia.org/?q=alertname%3DHAProxyEdgeTrafficDrop
[06:24:46] netops, Infrastructure-Foundations, SRE: Detect IP address collisions - https://phabricator.wikimedia.org/T189522 (ayounsi) p:High→Triage
[06:50:15] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (ayounsi) Nice! let me know when we're ready to do the move.
[12:51:35] netops, Cloud-Services, Infrastructure-Foundations: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114 (ayounsi)
[12:52:41] netops, Cloud-VPS, Infrastructure-Foundations: Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114 (taavi)
[12:55:04] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team (Kanban): Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114 (taavi)
[12:57:44] netops, Cloud-VPS, Infrastructure-Foundations, cloud-services-team (Kanban): Use vlan trunking instead of multiple physical interfaces - https://phabricator.wikimedia.org/T316114 (taavi) > This has been discussed between Netops and WMCS but I couldn't find any existing tasks. Feel free to mark it...
[13:56:54] Hello. I have a new LVS low-traffic service that I'd like to enable soon. Would anyone be available to review it please and advise if today is a good time? https://gerrit.wikimedia.org/r/c/operations/puppet/+/826296
[14:13:11] * vgutierrez looking
[14:17:02] btullis: +1ed
[14:17:13] affected LVS are lvs1019 and lvs1020 FYI
[14:33:18] Thank you.
[14:54:40] vgutierrez: Are you happy for me to do a pybal restart now, as per: https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service
[14:55:10] btullis: yes.. after running puppet on lvs1020 of course :)
[14:55:18] lvs1020 first and lvs1019 afterwards
[14:55:25] Thanks. Will do.
[14:57:33] !log restarting pybal on lvs1020
[14:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:59:31] Looks OK to me, proceeding to update lvs1019
[15:01:24] !log restarting pybal on lvs1019
[15:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:50] btullis: great :)
[16:35:09] netops, Cloud-Services, Infrastructure-Foundations, SRE: Undocumented IP on WMCS network - https://phabricator.wikimedia.org/T315955 (Andrew) a:Andrew
[18:29:26] Are all of our ATS deployments on 9.x now?
[18:30:23] no, just 10
[18:30:29] hieradata/hosts/cp6016.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:32] hieradata/hosts/cp4026.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:35] hieradata/hosts/cp5014.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:38] hieradata/hosts/cp4032.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:42] hieradata/hosts/cp1090.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:45] hieradata/hosts/cp1089.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:47] hieradata/hosts/cp3065.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:51] hieradata/hosts/cp6008.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:54] hieradata/hosts/cp5016.yaml:profile::trafficserver::backend::is_ats9: true
[18:30:57] hieradata/hosts/cp3064.yaml:profile::trafficserver::backend::is_ats9: true
[18:31:00] these ones
[18:33:41] Traffic, SRE: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (BCornwall)
[18:35:34] sukhe: Thanks for answering that. I can't find the email mentioning the upgrade; Is there an ETA for all ATS servers to be on 9.x?
[18:36:05] https://phabricator.wikimedia.org/T292815 is made more difficult since one of the metrics we need to export was renamed between 8.x and 9.x
[18:36:16] we just have T309651 that tracks this task if that helps, but no formal email, no
[18:36:17] T309651: Package and deploy ATS 9.1.3 - https://phabricator.wikimedia.org/T309651
[18:36:40] the rollout ETA depends on how we resolve some of the ATS9 regressions that we noticed
[18:37:06] yeah, T292815 made it to our list of changes
[18:37:07] T292815: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815
[18:39:35] oh, I thought someone sent a nice email, but I must have mixed it up
[18:39:54] brett: that was probably the email to the Performance team we sent from Traffic :)
[18:41:13] for a conditional between 8.x and 9.x, you can use the boolean is_ats9 if that helps
[18:41:42] the specificity of which depends of course on where you are trying to make that comparison
[18:41:50] hth with that if I can
[18:42:17] Thanks, that's helpful. I might just make the patch for only ats 9 and put it on the backburner. There's no rush to get it out
[18:42:35] yep, fair
[19:16:31] brett: you can also use the `or` keyword in prometheus
[19:16:45] PromQL itself can combine metrics like that
[19:19:03] The issue is that I'm creating a metric for node exporter, not for querying, sadly.
[19:19:57] I guess I could export *both* but that just feels kinda messy
[19:20:16] and more work to reverse once the upgrade is complete :^)
[19:20:50] Traffic, SRE, Patch-For-Review: ATS should alert if the number of total or active connections reached maximum - https://phabricator.wikimedia.org/T292815 (BCornwall) Open→Stalled Change needs some testing but will be stalled until https://phabricator.wikimedia.org/T309651 is fixed
[19:23:08] Traffic, SRE: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has X seconds left - https://phabricator.wikimedia.org/T243948 (BCornwall)
[19:25:18] Traffic, SRE: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has X seconds left - https://phabricator.wikimedia.org/T243948 (BCornwall) Resolved→Open
[19:25:28] ^whoops, my bad, wrong ticket...
[19:26:15] Traffic, SRE: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has X seconds left - https://phabricator.wikimedia.org/T243948 (BCornwall) Open→Resolved
[19:30:46] Acme-chief, Cloud-VPS, SRE, Traffic-Icebox, and 2 others: acme-chief shouldn't try to perform OCSP stapling of expired certs - https://phabricator.wikimedia.org/T262251 (BCornwall) Open→In progress p:Medium→High
[19:53:36] ah okay fair enough :)
[20:55:32] Traffic, netops, Infrastructure-Foundations: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160 (CDanis)
[20:56:09] Traffic, netops, Infrastructure-Foundations: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160 (CDanis) p:Triage→Low
[20:59:36] Traffic, netops, Infrastructure-Foundations: improve GeoDNS-to-edge mapping - https://phabricator.wikimedia.org/T316160 (CDanis)