[00:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[04:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[07:16:11] netops, Infrastructure-Foundations, conftool, ops-codfw, serviceops: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Joe) a:Joe
[07:19:29] netops, Infrastructure-Foundations, conftool, ops-codfw, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (ayounsi)
[07:50:21] netops, Infrastructure-Foundations, SRE, conftool, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Joe) p:Unbreak!→High The situation is as follows: * I depooled codfw from mediawiki; before repooling, we'll need to do a scap pull on...
[08:16:52] netops, Infrastructure-Foundations, SRE, conftool, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Joe) Confirmed that now scap works and we can do deployments normally. Please @papaul @ayounsi ping serviceops so that we can bring things back...
[08:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[12:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[12:41:45] netops, Infrastructure-Foundations, SRE, ops-eqiad, Sustainability (Incident Followup): eqiad: upgrade row C and D uplinks from 4x10G to 1x40G - https://phabricator.wikimedia.org/T313463 (Jclark-ctr) https://netbox.wikimedia.org/dcim/cables/5899 Cable is Ran and plugged in
[12:48:36] (PurgedHighEventLag) firing: High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:53:36] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5018 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:57:58] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (ayounsi)
[14:45:50] netops, Infrastructure-Foundations, SRE, ops-eqiad: eqiad: Move links to new MPC7E linecard - https://phabricator.wikimedia.org/T304712 (ayounsi) Thanks John and Papaul, as soon as Netbox is updated this can be closed!
[16:04:17] Traffic, API Platform, SRE, Discovery-Search (Current work): Generic strategy to deal with high volume / expensive traffic from cloud providers - https://phabricator.wikimedia.org/T326782 (Gehel)
[16:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[20:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown