[04:50:57] (EdgeTrafficDrop) firing: 64% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[04:55:57] (EdgeTrafficDrop) resolved: 68% request drop in text@eqiad during the past 30 minutes - https://wikitech.wikimedia.org/wiki/Monitoring/EdgeTrafficDrop - https://grafana.wikimedia.org/d/000000479/frontend-traffic?viewPanel=12&orgId=1&from=now-24h&to=now&var-site=eqiad&var-cache_type=text - https://alerts.wikimedia.org
[11:59:14] netops, Infrastructure-Foundations, SRE, Puppet, User-jbond: Investigate improvements to how puppet manages network interfaces - https://phabricator.wikimedia.org/T234207 (jbond)
[12:00:32] Traffic, SRE, Wikimedia Enterprise, Wikimedia Enterprise Discussion: Allow-Listing for Enterprise IPs - https://phabricator.wikimedia.org/T294798 (AnnaMikla)
[14:18:02] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) Hey Guys, The cabling plan for the switch->switch cabling in the new Eqiad cage should be as follows: ` LSW1-E1 Links: LSW1-E...
[16:14:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4033:9331 is unreachable - https://alerts.wikimedia.org
[16:18:09] netops, Infrastructure-Foundations: Add peering sessions on cr1-eqiad Equinix port - https://phabricator.wikimedia.org/T294948 (ayounsi) p:Triage→Medium
[16:19:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4033:9331 is unreachable - https://alerts.wikimedia.org
[16:20:27] ^^ we're reimaging the ulsfo instances again cp403[3-6] after discovering we'd not updated the underlying hardware category... no cause for alarm https://gerrit.wikimedia.org/r/c/operations/puppet/+/736475
[16:22:08] thanks!
[16:52:39] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4033.ulsfo.wmnet with OS buster completed: - cp4033 (**WARN**...
[17:05:37] the buster in the cookbook output reminds me: we're due to start our first bullseye upgrades sometime soon. Maybe next quarter, we can start looking at some "easier" clusters like the dns/ntp boxes or something.
[17:07:54] netops, Infrastructure-Foundations, SRE: Management routers: use BGP instead of OSPF - https://phabricator.wikimedia.org/T294845 (cmooney) Looks good. As discussed on irc I think the second term in "BGP_production" on mr1 isn't needed, although given how it works can hardly blame you for putting it...
[17:50:55] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by mmandere@cumin1001 for host cp4035.ulsfo.wmnet with OS buster
[18:00:57] (VarnishPrometheusExporterDown) firing: Varnish Exporter on instance cp4035:9331 is unreachable - https://alerts.wikimedia.org
[18:25:57] (VarnishPrometheusExporterDown) resolved: Varnish Exporter on instance cp4035:9331 is unreachable - https://alerts.wikimedia.org
[18:51:39] Traffic, DC-Ops, SRE, ops-ulsfo: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by mmandere@cumin1001 for host cp4035.ulsfo.wmnet with OS buster completed: - cp4035 (**WARN**...
[20:30:19] Traffic, serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dzahn)
[20:33:06] Traffic, serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dzahn) Hi Traffic, the question here is.. how come we get a redirect at the caching level from bare domai...
[20:35:11] Traffic, serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dbrant) (@Dzahn The `spec.yaml` file is also redirecting)
[22:05:58] bblack: any chance you're around? there's context in #wikimedia-sre but I could use some help understanding how to shut up `PyBal IPVS diff check`
[22:06:28] TL;DR is that the `wcqs*` hosts are in a bad state, and we want to do whatever's necessary to get monitoring to stop checking pybal / ipvs for wcqs
[22:06:55] To that end I removed the discovery DNS entry (incl the `authdns merge` step) and state changed the service from `production` to `lvs_setup`
[22:07:31] But the alerts are still there, and I'm not sure if it's because I didn't clear away the corresponding conftool entries that were originally added in this step: https://wikitech.wikimedia.org/wiki/LVS#For_active/active_services