[00:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[00:40:29] netops, Infrastructure-Foundations, SRE, conftool, and 2 others: Scap deploy failed to depool codfw servers - https://phabricator.wikimedia.org/T327041 (Papaul) @Joe will do
[04:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[05:25:46] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers racked in U27 in all racks in rowA and rowB - https://phabricator.wikimedia.org/T326564 (Papaul) @ayounsi since A1 and A8 are supposed to be our network racks I will prefer possible to put one spine in A1 and the other s...
[07:26:52] netops, Infrastructure-Foundations, SRE, ops-codfw: codfw: Relocate servers racked in U27 in all racks in rowA and rowB - https://phabricator.wikimedia.org/T326564 (ayounsi) I see, what would be the best later on for the rows C and D spines? C1/C8 or C1/`D1` ? Is using A1/A8 better for eqiad as w...
[08:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[12:22:50] Acme-chief, Traffic, SRE: Ci check for acme-chief changes - https://phabricator.wikimedia.org/T326942 (LSobanski)
[12:24:16] (VarnishPrometheusExporterDown) firing: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[12:55:36] (PurgedHighEventLag) firing: High event process lag with purged on cp4043:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=ulsfo%20prometheus/ops&var-instance=cp4043 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[12:59:16] (VarnishPrometheusExporterDown) firing: (4) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[13:00:20] (PurgedHighEventLag) firing: (9) High event process lag with purged on cp2029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:37:00] (VarnishPrometheusExporterDown) firing: (16) Varnish Exporter on instance cp2027:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[13:38:17] (PurgedHighEventLag) resolved: (18) High event process lag with purged on cp2029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[14:02:47] (PurgedHighEventLag) firing: (29) High event process lag with purged on cp2029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[14:05:21] (PurgedHighEventLag) firing: (33) High event process lag with purged on cp2029:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[14:08:18] (PurgedHighEventLag) resolved: (20) High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:34:16] (VarnishPrometheusExporterDown) resolved: (2) Varnish Exporter on instance cp2031:9331 is unreachable - TODO - TODO - https://alerts.wikimedia.org/?q=alertname%3DVarnishPrometheusExporterDown
[15:35:14] (PurgedHighEventLag) firing: (2) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:36:27] (PurgedHighBacklogQueue) firing: Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[16:07:24] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Move WMCS servers to 1 NIC - https://phabricator.wikimedia.org/T319184 (aborrero) In progress→Stalled a:aborrero→None
[16:15:06] Traffic, SRE, Patch-For-Review, Upstream: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) Thanks to Faidon's suggestion of building against 0.44.0 and not 0.46.0, we have a working cadvisor 0.44.0 build for bullseye/sid, which has been merged above.
[16:19:35] (PurgedHighEventLag) resolved: (4) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:19:50] (PurgedHighEventLag) firing: (4) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:20:05] (PurgedHighEventLag) resolved: High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[16:20:35] (PurgedHighBacklogQueue) resolved: (2) Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[18:44:00] sukhe: bblack ping me when we are out of the woods so I can give a quick show over of what will change
[18:46:09] thanks! let's just wait for this to get over I guess
[18:46:14] meanwhile I will continue to play with ja & fr to get a better idea of what's actually new
[18:57:35] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[18:57:35] (PurgedHighBacklogQueue) resolved: (2) Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[18:59:50] (TrafficServerRestarted) firing: ATS backend server restarted on cp2031:9122 - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server - https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=codfw&var-instance=cp2031&var-layer=backend - https://alerts.wikimedia.org/?q=alertname%3DTrafficServerRestarted
[19:03:24] Traffic, SRE: Start warning and deprecation process for all legacy TLS - https://phabricator.wikimedia.org/T238038 (BCornwall)
[19:07:04] Traffic, SRE: Start warning and deprecation process for all legacy TLS - https://phabricator.wikimedia.org/T238038 (BCornwall) Looks like this can be closed, right @Vgutierrez?
[19:09:46] Traffic, SRE: Start warning and deprecation process for all legacy TLS - https://phabricator.wikimedia.org/T238038 (BCornwall)
[20:55:35] (PurgedHighEventLag) firing: High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[20:55:35] (PurgedHighBacklogQueue) firing: Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[21:40:35] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[21:40:35] (PurgedHighBacklogQueue) resolved: (2) Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue
[22:29:50] (TrafficServerRestarted) resolved: ATS backend server restarted on cp2031:9122 - https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server - https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=codfw&var-instance=cp2031&var-layer=backend - https://alerts.wikimedia.org/?q=alertname%3DTrafficServerRestarted
[22:30:07] (LVSHighRX) firing: Excessive RX traffic on lvs6001:9100 (ens3f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs6001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[22:35:07] (LVSHighRX) resolved: Excessive RX traffic on lvs6001:9100 (ens3f0np0) #page - https://bit.ly/wmf-lvsrx - https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs6001 - https://alerts.wikimedia.org/?q=alertname%3DLVSHighRX
[22:35:26] ^ https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=lvs6001&from=1673993973835&to=1673994748076&viewPanel=8
[23:31:35] (PurgedHighEventLag) firing: High event process lag with purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[23:31:35] (PurgedHighBacklogQueue) firing: Large backlog queue for purged on cp2031:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=codfw%20prometheus/ops&var-instance=cp2031 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighBacklogQueue