[02:55:45] (HAProxyRestarted) firing: (2) HAProxy server restarted on cp1077:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[06:55:45] (HAProxyRestarted) firing: (2) HAProxy server restarted on cp1077:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[10:55:45] (HAProxyRestarted) firing: (2) HAProxy server restarted on cp1077:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[12:22:43] netops, Infrastructure-Foundations, SRE: Plan codfw row A/B top-of-rack switch refresh - https://phabricator.wikimedia.org/T327938 (RobH)
[14:27:02] Traffic, netops, DBA, Data-Engineering, and 9 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (ssingh)
[14:46:21] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (Jclark-ctr) @BBlack dns1003 name is already in use. Should this be changed to dns100{4..6}
[14:48:23] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (ssingh) @Jclark-ctr: yes please, dns100[1-3] are currently in use, so we should do dns100[4-6].
[14:54:27] Traffic, DC-Ops, SRE, ops-codfw: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (ssingh) Hi @Papaul: We are ready to start working on this, sorry for the delay! The above plan sounds fine so let's coordinate when you plan to go in so that I ca...
[14:55:45] (HAProxyRestarted) firing: (2) HAProxy server restarted on cp1077:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[15:00:45] (HAProxyRestarted) firing: (2) HAProxy server restarted on cp1077:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[15:13:17] Traffic, DC-Ops, SRE, ops-codfw: Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014 - https://phabricator.wikimedia.org/T326767 (Papaul) @ssingh no need to be sorry and welcome back. You can decom the server you want first and once it's done just let me know which one. Thanks
[15:22:07] Traffic, Discovery-Search, SRE, API Platform (API Platform Roadmap): Generic strategy to deal with high volume / expensive traffic from cloud providers - https://phabricator.wikimedia.org/T326782 (Gehel)
[15:22:25] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (Jclark-ctr)
[15:28:22] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Consider confirming the hostname by user input when running the reimaging cookbook - https://phabricator.wikimedia.org/T332202 (BCornwall) > As a different approach, what if the reimaging cookbook printed out the role information from th...
[15:30:46] Traffic, Infrastructure-Foundations, SRE, Patch-For-Review: Consider confirming the hostname by user input when running the reimaging cookbook - https://phabricator.wikimedia.org/T332202 (BCornwall) Amir provided this on the ops mailing list: On 2023-04-29 14:12, Amir Sarabadani wrote: >Did we h...
[16:08:31] Traffic, PyBal: pybal's "can-depool" logic only takes downServers into account - https://phabricator.wikimedia.org/T184715 (BCornwall) In progress→Stalled
[16:11:34] Traffic, Infrastructure-Foundations, Performance-Team (Radar): Set cookie in Varnish to start a probe - https://phabricator.wikimedia.org/T335637 (BCornwall)
[16:21:31] Traffic, Infrastructure-Foundations: Set cookie in Varnish to start a probe - https://phabricator.wikimedia.org/T335637 (Krinkle)
[16:32:36] Traffic, Observability-Metrics, Patch-For-Review: Add prometheus-https load balancer - https://phabricator.wikimedia.org/T326657 (BCornwall) In progress→Stalled
[16:42:37] Traffic, Infrastructure-Foundations, netbox, Patch-For-Review: Issues converting services from active/passive to active/active - https://phabricator.wikimedia.org/T330084 (BCornwall) This ticket could do with a little more clarity: I'm going to Boldly assume this ticket is for identifying/fixing...
[16:45:48] Traffic, Infrastructure-Foundations, netbox, Patch-For-Review: gdnsd failures when converting services from active/passive to active/active - https://phabricator.wikimedia.org/T330084 (BCornwall)
[16:46:25] Traffic: Write a cookbook to handle upgrades of ATS - https://phabricator.wikimedia.org/T335531 (BCornwall)
[16:46:29] Traffic: Write a cookbook to handle restarts of Wikimedia DNS - https://phabricator.wikimedia.org/T335533 (BCornwall)
[16:47:04] Traffic: Let HAProxy handle port 80 - https://phabricator.wikimedia.org/T323557 (BCornwall)
[16:47:52] Traffic, Observability-Alerting, Patch-For-Review: Use DNS name instead of IP in PyBal alerts - https://phabricator.wikimedia.org/T322377 (BCornwall)
[16:50:20] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (Jclark-ctr) dns1004. A6. U.8 PORT. 11 CABLEID 1038 dns1005. B6 U.5 PORT. 0 CABLEID 1969 dns1006. C6 U27. PORT.27 CABLEID 3249
[16:54:22] Traffic, Infrastructure-Foundations, SRE-tools: Cookbook to depool a site in AuthDNS - https://phabricator.wikimedia.org/T334048 (BCornwall) I'm hesitant to the idea of creating an abstraction over an abstraction - I may be an outlier but my experience with depooling has been with confctl rather than...
[16:59:19] Traffic, Patch-For-Review: Update certspotter - https://phabricator.wikimedia.org/T204993 (BCornwall)
[16:59:48] Traffic, SRE, Patch-For-Review: increase of network errors on alert1001 after certspotter has been enabled - https://phabricator.wikimedia.org/T303593 (BCornwall) Open→Resolved Since the larger network issues have been fixed, I'm going to close this as resolved. Further improvements suggested...
[17:03:08] Traffic: anycast-healthchecker fails to start after a reboot and before a puppet run - https://phabricator.wikimedia.org/T314457 (BCornwall)
[17:03:24] Traffic, DNS, Fundraising-Backlog, Infrastructure-Foundations, and 2 others: Add support for Brand Indicators for Message Identification (BIMI) for wiki mail - https://phabricator.wikimedia.org/T311685 (BCornwall)
[17:03:38] Traffic: Deploy Wikidough: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) public resolver - https://phabricator.wikimedia.org/T252132 (BCornwall)
[17:14:52] Traffic: Write a cookbook to handle restarts of Wikimedia DNS - https://phabricator.wikimedia.org/T335533 (ssingh) Thanks for filing this task! The steps involved are essentially (in order): - disable Puppet - stop bird.service - restart pdns-recursor.service - restart dnsdist.service - start bird.service...
[17:31:29] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q1:(Need By: TBD) rack/setup/install cloudswift100[12] - https://phabricator.wikimedia.org/T289882 (Jclark-ctr) @Papaul Cables where connected to correct ports. i did swap cables while verifying Replaced Cable new cableid23030450029...
[19:00:45] (HAProxyRestarted) firing: HAProxy server restarted on cp1085:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1085&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[19:52:13] Traffic, DC-Ops, SRE, ops-eqiad: Q4:rack/setup/install dns100[345] - https://phabricator.wikimedia.org/T326685 (Jclark-ctr)
[20:39:39] Traffic, RESTBase-API, Documentation: I am hitting a rate limit on REST API endpoint - https://phabricator.wikimedia.org/T307610 (BCornwall) Hi, @Mitar! This ticket is pretty old at this point.... sorry! Hopefully you can understand that there are many tickets for the few of us to handle at once and...
[20:42:06] Traffic, SRE, Patch-For-Review: Incorrect X-Cache-Status reported by deployment-prep caches - https://phabricator.wikimedia.org/T269825 (BCornwall) Open→Stalled @bblack, @Vgutierrez is this patch by ema still something we'd like incorporated?
[20:55:27] Traffic, RESTBase-API, Documentation: I am hitting a rate limit on REST API endpoint - https://phabricator.wikimedia.org/T307610 (Mitar) To my knowledge it is. https://www.mediawiki.org/wiki/Wikimedia_REST_API#Terms_and_conditions still says that 200 requests/second per REST API endpoint is fine (unl...
[20:55:53] Traffic: Frequent server errors (503 and 502), happened several times in the last 2 days - https://phabricator.wikimedia.org/T297544 (BCornwall) Open→Stalled Hello! This is quite an old ticket. We're sorry that this fell through the cracks; the amount of tickets we receive can easily overwhelm our s...
[21:01:38] Traffic, SRE: Clean up Traffic Grafana dashboards to reflect HA-Proxy metrics - https://phabricator.wikimedia.org/T304153 (BCornwall) In progress→Invalid Marking as invalid as this is too vague to be actionable. Considering that we've been running haproxy for some time now and appear to have usef...
[21:01:46] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (BCornwall)
[21:10:07] Traffic, netops, DC-Ops, Infrastructure-Foundations, ops-knams: Q4/Q1:knams racking elevations & planning - https://phabricator.wikimedia.org/T331886 (BCornwall)
[21:16:51] Traffic, SRE: Create CI for latency-measurement - https://phabricator.wikimedia.org/T318288 (BCornwall) Open→Invalid Closing as invalid since this utility isn't used very often. Perhaps, at a later date, we can re-explore this.
[21:18:52] Traffic, netops, DBA, Data-Engineering, and 10 others: codfw row C switches upgrade - https://phabricator.wikimedia.org/T334049 (bking)
[21:19:04] Traffic, SRE: Varnish SLI is impacted by external components performance|behavior - https://phabricator.wikimedia.org/T317051 (BCornwall) Hi, @Vgutierrez, does this ticket still need any work done or can it be closed? Thanks!
[21:25:59] Traffic, SRE: Add DP cookie for pageview filtering - https://phabricator.wikimedia.org/T315676 (BCornwall) @Vgutierrez Would you consider this completed and ready to be closed?
[21:27:23] HTTPS, Traffic, SRE, serviceops, and 2 others: Get new edge & internal HTTPS certificates expanded to add wikifunctions.org and *.wikifunctions.org - https://phabricator.wikimedia.org/T313227 (BCornwall) @Vgutierrez Would you consider this completed and ready to close?
[21:29:44] Traffic, SRE, Patch-For-Review: per-backend-service concurrency limits in ATS-BE - https://phabricator.wikimedia.org/T306223 (BCornwall) Hi, @CDanis! Would you be so kind as to provide a description that helps describe the work to be done in this ticket? Thanks!
[21:33:14] Traffic, envoy, serviceops, Patch-For-Review: Refactor envoy access_log_path to access loggers - https://phabricator.wikimedia.org/T303231 (BCornwall)
[23:00:45] (HAProxyRestarted) firing: HAProxy server restarted on cp1085:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=eqiad%20prometheus/ops&var-instance=cp1085&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted