[01:59:45] (HAProxyRestarted) firing: (8) HAProxy server restarted on cp3066:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[02:42:00] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[05:59:45] (HAProxyRestarted) firing: (8) HAProxy server restarted on cp3066:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[06:42:00] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[07:07:04] Traffic, Observability-Metrics, Patch-For-Review: Add prometheus-https load balancer - https://phabricator.wikimedia.org/T326657 (fgiunchedi) >>! In T326657#9093733, @BCornwall wrote: > @fgiunchedi Now that this is merged, would you say that this is complete? Thanks for the feedback. Thank you for f...
[09:11:15] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Add Dell switches support to Homer/Cookbooks - https://phabricator.wikimedia.org/T320638 (ayounsi)
[09:42:01] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:42:38] ^^ that should be "normal"
[09:43:44] Traffic, Math, RESTbase Sunsetting: Determin the cause of a sudden 80% drop in requests to math endpoints - https://phabricator.wikimedia.org/T344329 (daniel)
[09:45:30] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[09:52:01] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[09:59:45] (HAProxyRestarted) firing: (8) HAProxy server restarted on cp3066:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[10:21:04] Traffic, SRE: acme-chief should support debian bookworm - https://phabricator.wikimedia.org/T344330 (Vgutierrez)
[10:29:58] Traffic, SRE, Patch-For-Review: acme-chief should support debian bookworm - https://phabricator.wikimedia.org/T344330 (Vgutierrez)
[10:35:10] Traffic, SRE, Patch-For-Review: acme-chief should support debian bookworm - https://phabricator.wikimedia.org/T344330 (Vgutierrez) p:Triage→Medium
[10:41:19] Traffic, Thumbor: Cannot download large (3GB) PDF files from commons - https://phabricator.wikimedia.org/T341755 (Clement_Goubert) Hi, I can't reproduce the download issue going through drmrs ` $ curl -v -o o.pdf https://upload.wikimedia.org/wikipedia/commons/9/9f/ZHSY000097_%E5%AE%8B%E6%9B%B8%E4%B8%80...
[10:49:13] Traffic, SRE, Patch-For-Review: acme-chief should support debian bookworm - https://phabricator.wikimedia.org/T344330 (Vgutierrez) @hashar could you clarify if T342346 would trigger having python 3.11 on CI with some kind of backport for bullseye or do you have another task tracking python 3.11 suppo...
[10:57:15] (HAProxyRestarted) resolved: HAProxy server restarted on cp3081:9100 - https://wikitech.wikimedia.org/wiki/HAProxy#HAProxy_for_edge_caching - https://grafana.wikimedia.org/d/gQblbjtnk/haproxy-drilldown?orgId=1&var-site=esams%20prometheus/ops&var-instance=cp3081&viewPanel=10 - https://alerts.wikimedia.org/?q=alertname%3DHAProxyRestarted
[12:48:57] Traffic, SRE, observability, Upstream: flapping icinga Letsencrypt TLS cert alerts around renewal time - https://phabricator.wikimedia.org/T293826 (RhinosF1) This happened again for lists1001. Requested (and it has been) restart in #wikimedia-sre
[13:52:01] (PyBalBGPUnstable) firing: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[15:01:55] Traffic, Math, RESTbase Sunsetting, SRE: Determin the cause of a sudden 80% drop in requests to math endpoints - https://phabricator.wikimedia.org/T344329 (daniel)
[15:03:08] Traffic, Math, RESTbase Sunsetting, SRE: Determin the cause of x8 increase in requests to math endpoints between july 6 and August 3 - https://phabricator.wikimedia.org/T344329 (daniel)
[15:28:21] Traffic, SRE: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (ssingh)
[15:52:01] (PyBalBGPUnstable) resolved: (6) PyBal BGP sessions on instance lvs3008 are failing - https://wikitech.wikimedia.org/wiki/PyBal#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPyBalBGPUnstable
[15:52:09] cool ^
[15:52:14] back in business with the new hosts
[16:39:27] Traffic, PyBal, SRE, Scap, and 3 others: High rate of errors and increased latency on uncached MediaWiki requests due to infrastructure outage - https://phabricator.wikimedia.org/T337497 (thcipriani)
[16:39:53] ^ old ticket
[16:46:21] Traffic, SRE, Patch-For-Review: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (Fabfur)
[17:43:20] Traffic, SRE, Patch-For-Review: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by brett@cumin2002 for hosts: `cp[3058-3061].esams.wmnet` - cp3058.esams.wmnet (**PASS**) - D...
[17:44:32] Traffic, Patch-For-Review: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (BCornwall) Open→In progress p:Triage→Medium
[17:59:01] Traffic, Patch-For-Review: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by brett@cumin2002 for hosts: `cp[3062-3065].esams.wmnet` - cp3062.esams.wmnet (**PASS**) - Downtimed...
[18:03:58] Traffic, Patch-For-Review: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (BCornwall)
[18:13:09] Traffic: Q1:unified decommission task for old esams hosts (knams migration) - https://phabricator.wikimedia.org/T344363 (ssingh)