[00:31:33] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (RobH)
[00:34:37] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (RobH) Failed to run Homer on lsw1-f1-eqiad.mgmt.eqiad.wmnet: Command '['/usr/local/bin/homer', 'lsw1-f1-eqiad.mgmt.eqiad.wmnet', 'commit', 'Ho...
[00:39:22] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (Dzahn) When this host was installed and added to Icinga config by puppet, it broke Icinga config. The error was: ` Error: 'lsw1-f1-eqiad.mgmt.eqi...
[05:39:56] (EdgeTrafficDrop) firing: 66% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org
[05:44:56] (EdgeTrafficDrop) resolved: 66% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org
[06:10:56] (EdgeTrafficDrop) firing: 65% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[06:15:56] (EdgeTrafficDrop) resolved: 66% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[06:50:56] (EdgeTrafficDrop) firing: (2) 34% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org
[06:55:56] (EdgeTrafficDrop) resolved: (4) 52% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://alerts.wikimedia.org
[07:38:13] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster
[08:20:00] Traffic, SRE, Patch-For-Review, Performance-Team (Radar): Test haproxy as a WMF's CDN TLS terminator with real traffic - https://phabricator.wikimedia.org/T290005 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1001 for host cp4034.ulsfo.wmnet with OS buster c...
[09:36:00] netops, Infrastructure-Foundations, SRE: all network devices must run OpenSSH >= 7.2p1 but != 7.4p1 - https://phabricator.wikimedia.org/T254013 (ayounsi) Slightly related, as of today those devices don't support ssh-ed25519: (11) asw2-b-eqiad.mgmt.eqiad.wmnet,asw2-c-eqiad.mgmt.eqiad.wmnet,asw2-d-eqi...
[09:45:56] (EdgeTrafficDrop) firing: 67% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org
[09:50:56] (EdgeTrafficDrop) resolved: 67% request drop in text@ulsfo during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=ulsfo&var-cache_type=text - https://alerts.wikimedia.org
[10:41:03] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (ayounsi) https://gerrit.wikimedia.org/r/c/operations/puppet/+/764791 should fix the issue. About hostname vs. FQDN is because the devices use LLDP...
[10:41:56] (EdgeTrafficDrop) firing: 60% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[10:59:21] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (cmooney) @robh apologies for this, I was working on an improved version of the CR Arzhel lists above yesterday. But it should have occurred to me...
[11:16:56] (EdgeTrafficDrop) resolved: 69% request drop in text@codfw during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=codfw&var-cache_type=text - https://alerts.wikimedia.org
[13:57:41] Traffic, SRE, WMF-General-or-Unknown: Failure to produce an image at specified resolution - https://phabricator.wikimedia.org/T302979 (Zabe)
[13:58:08] Traffic, SRE, Thumbor, WMF-General-or-Unknown: Failure to produce an image at specified resolution - https://phabricator.wikimedia.org/T302979 (Zabe)
[15:12:36] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: (Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (RobH) p:Medium→High @nskaggs, These hosts were ordered without a fully filed racking task (I meant to do it before order), so...
[15:12:48] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: (Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (RobH)
[15:15:13] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (RobH)
[15:24:31] netops, DC-Ops, Infrastructure-Foundations, SRE, and 2 others: Q3:(Need By: TBD) rack/setup/install 2 new labstore hosts - https://phabricator.wikimedia.org/T302981 (RobH)
[15:59:17] Traffic, Observability-Metrics, SRE: Port Traffic dashboards to Thanos - https://phabricator.wikimedia.org/T302266 (MMandere)
[19:44:24] netops, DC-Ops, Infrastructure-Foundations, SRE, observability: icinga config error for new rows E/R - https://phabricator.wikimedia.org/T302940 (cmooney) Open→Resolved dumpsdata1007 looks good in Icinga now after being re-added, following the above patches being merged. Apologies fo...