[08:00:12] (LVSHighCPU) firing: (8) The host lvs6001:9100 has at least its CPU 1 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs6001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[08:05:12] (LVSHighCPU) resolved: (8) The host lvs6001:9100 has at least its CPU 1 saturated - https://bit.ly/wmf-lvscpu - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs6001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighCPU
[08:34:45] Traffic, SRE, envoy, serviceops: Set a limit to the number of allowed active connections via runtime key overload.global_downstream_max_connections - https://phabricator.wikimedia.org/T340955 (JMeybohm) a:JMeybohm
[09:54:42] Traffic, SRE, envoy, serviceops: Set a limit to the number of allowed active connections via runtime key overload.global_downstream_max_connections - https://phabricator.wikimedia.org/T340955 (JMeybohm) `max(sum by (instance) (envoy_http_downstream_cx_active))` over the last 30 days tops out at ~...
[10:13:05] Traffic, SRE, envoy, serviceops, Patch-For-Review: Upgrade Envoy to supported version - https://phabricator.wikimedia.org/T300324 (BTullis) >>! In T300324#8988266, @JMeybohm wrote: > ... as datahub (cc @BTullis ) which I did not deploy because it has a huge diff I'm not able to reason about....
[11:11:44] Traffic, MW-on-K8s, SRE, serviceops, and 3 others: Direct 0.5% of all traffic to mw-on-k8s - https://phabricator.wikimedia.org/T341078 (Clement_Goubert) Everything looks good. mw-api-ext: {F37129502} {F37129504} {F37129506} mw-web: {F37129508} {F37129510} {F37129512}
[13:32:45] Traffic, ops-codfw: lvs2013 ManagementSSHDown - https://phabricator.wikimedia.org/T340960 (ssingh)
[13:39:24] Traffic, ops-codfw: lvs2013 ManagementSSHDown - https://phabricator.wikimedia.org/T340960 (Jhancock.wm) I found the idrac light blinking rapidly in amber. Quick Sync is not responding. I tried rebooting just the idrac but it hasn't helped. The next troubleshooting step is to reboot the server. @ssingh...
[13:51:24] Traffic, ops-codfw: lvs2013 ManagementSSHDown - https://phabricator.wikimedia.org/T340960 (ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=f6099155-97b3-49c3-9c11-36962a3c834b) set by vgutierrez@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: mgmt interface issu...
[14:19:01] Traffic, SRE, envoy, serviceops, Patch-For-Review: Set a limit to the number of allowed active connections via runtime key overload.global_downstream_max_connections - https://phabricator.wikimedia.org/T340955 (akosiaris) >>! In T340955#8989979, @JMeybohm wrote: > `max(sum by (instance) (envo...
[14:35:38] Traffic, ops-codfw: lvs2013 ManagementSSHDown - https://phabricator.wikimedia.org/T340960 (ssingh) Open→Resolved a:ssingh Thanks to @Jhancock.wm for the quick resolution of this issue!
[14:41:52] Traffic, SRE, envoy, serviceops, Patch-For-Review: Set a limit to the number of allowed active connections via runtime key overload.global_downstream_max_connections - https://phabricator.wikimedia.org/T340955 (JMeybohm) I've reduced the limit to 50k (which is what https://www.envoyproxy.io/d...
[15:43:13] Domains, SRE: Mark Monitor administration panel (redirects for wikimedia.pl) - https://phabricator.wikimedia.org/T333827 (Dzahn) Pretty sure that SRE is needed to add this domain to DNS and create redirects. Whether control of a single domain in MarkMonitor can be handed over to another tenant, I am dou...
[18:42:46] Traffic, SRE: Reduce toil in provisioning and decommissioning of DNS/NTP servers by automating generation of resolv.conf and NTP peers - https://phabricator.wikimedia.org/T340479 (ssingh) Open→Resolved a:ssingh With the two commits above, this data is automatically generated instead of the ma...
[19:51:11] Acme-chief, SRE, Traffic-Icebox: Decide/document criteria needed to serve acme-chief LE issued unified certificate to end users - https://phabricator.wikimedia.org/T230687 (BCornwall) Stalled→Resolved For lack of a response, I'm going to close this. @Vgutierrez please do re-open if this isn't...