[01:03:16] (VarnishChildRestarted) firing: varnish-upload restarted on cp5032 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5032&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[05:03:16] (VarnishChildRestarted) firing: varnish-upload restarted on cp5032 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5032&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[09:03:16] (VarnishChildRestarted) firing: varnish-upload restarted on cp5032 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5032&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[09:26:47] Traffic, SRE: oom killed varnish on cp4052 - https://phabricator.wikimedia.org/T325797 (Vgutierrez)
[09:33:16] (VarnishChildRestarted) resolved: varnish-upload restarted on cp5032 - https://wikitech.wikimedia.org/wiki/Varnish - https://grafana.wikimedia.org/d/000000330/varnish-machine-stats?orgId=1&viewPanel=66&var-server=cp5032&datasource=eqsin%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DVarnishChildRestarted
[12:59:35] (PurgedHighEventLag) firing: High event process lag with purged on cp5020:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://grafana.wikimedia.org/d/RvscY1CZk/purged?var-datasource=eqsin%20prometheus/ops&var-instance=cp5020 - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:04:35] (PurgedHighEventLag) resolved: (2) High event process lag with purged on cp5020:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[13:43:27] uh?
[13:49:16] connectivity issue between eqsin and kafka cluster in eqiad
[18:58:22] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) >>! In T325557#8478613, @Vgutierrez wrote: > * Monitoring issue: CPU seconds for haproxy, varnish and ATS is reported as 0 on bullseye hosts: https://grafana.wikimedia.org/goto/eCGKNUc4k?orgId=1,...
[19:33:43] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission cp4029 - https://phabricator.wikimedia.org/T321340 (wiki_willy) a:RobH
[19:34:33] netops, Infrastructure-Foundations, SRE, ops-eqsin, Wikimedia-Incident: asw1-eqsin: VC mastership change - https://phabricator.wikimedia.org/T323094 (wiki_willy) a:RobH
[19:35:39] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission cp4029 - https://phabricator.wikimedia.org/T321340 (RobH)
[19:35:49] Traffic, SRE, decommission-hardware, ops-ulsfo: decommission cp4029 - https://phabricator.wikimedia.org/T321340 (RobH) Open→Resolved This was taken care of during the recycling, when I completed the remaining checklist steps in unison with the other decom cp hosts there. resolving.
[19:35:52] Traffic, DC-Ops, SRE, ops-ulsfo, Patch-For-Review: Q1:rack/setup/install cp40[37-51] - https://phabricator.wikimedia.org/T317244 (RobH)
[19:39:44] netops, Infrastructure-Foundations, SRE, ops-eqsin, Wikimedia-Incident: asw1-eqsin: VC mastership change - https://phabricator.wikimedia.org/T323094 (RobH) a:RobH→ayounsi I think this actually should be over to Arzhel, as the mastership change is something they're fixing with the upgr...
[20:08:33] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) After an hour of writing the above and digging around some more, I think I am even less convinced about my own theory about the cause of this issue. One of the main reasons being that I haven't fo...
[20:12:46] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) >>! In T325557#8496155, @ssingh wrote: > After an hour of writing the above and digging around some more, I think I am even less convinced now about my own theory about the cause of this issue. On...
[21:23:20] netops, Infrastructure-Foundations, ops-drmrs: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (RobH)