[07:54:53] netops, Infrastructure-Foundations, SRE, ops-drmrs: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (ayounsi) @RobH I don't think there is a need to depool the site as optic are hot-swappable the risk of killing the router is quite low (unless they start hammering at th...
[09:45:35] (PurgedHighEventLag) firing: (5) High event process lag with purged on cp5021:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[09:50:35] (PurgedHighEventLag) firing: (18) High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[09:55:35] (PurgedHighEventLag) resolved: (13) High event process lag with purged on cp5018:2112 - https://wikitech.wikimedia.org/wiki/Purged#Alerts - https://alerts.wikimedia.org/?q=alertname%3DPurgedHighEventLag
[15:12:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Set consistent MTUs - https://phabricator.wikimedia.org/T315838 (ayounsi) a:cmooney→ayounsi Last ones are the Fundraising Infrastructure related links (between cr, pfw and fasw). As most of them are not managed by Netbox, I ignore...
[16:42:29] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Set consistent MTUs - https://phabricator.wikimedia.org/T315838 (Dwisehaupt) @ayounsi That window is perfect. I'll add it to our list for the week to make sure we don't forget it.
[19:08:31] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) OK, I think I finally found the issue and also confirmed the fix on `traffic-cache-bullseye.traffic.eqiad1.wikimedia.cloud`. The TL;DR is that Debian bullseye supports cgroup v2 by default whereas...
[19:24:26] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) Metrics (default on bullseye, cgroup v2): ` container_cpu_system_seconds_total{id="/system.slice/varnish-frontend-fetcherr.service"} 0 1672859971947 container_cpu_system_seconds_total{id="/system...
[19:34:13] netops, Infrastructure-Foundations, SRE, ops-drmrs: cr2-drmrs:xe-0/1/1 stuck optic - https://phabricator.wikimedia.org/T324555 (RobH) Well, its pretty much the same on all SFP and XFP optics, looking at the one on my desk right now. The lever engages a very, very small metal plate that engages t...
[19:44:22] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) https://salsa.debian.org/systemd-team/systemd/-/commit/170fb124a32884bd9975ee4ea9e1ffbbc2ee26b4 ` - -Ddefault-hierarchy=hybrid \ + -Ddefault-hierarchy=unified \ `
[19:47:42] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (BCornwall) `systemd.unified_cgroup_hierarchy=0` enables systemd's "hybrid" mode, meaning that both v1 and v2 are enabled. Systemd makes its opinion very clear at https://systemd.io/CGROUP_DELEGATION/: >...
[19:53:58] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) >>! In T325557#8499916, @BCornwall wrote: > `systemd.unified_cgroup_hierarchy=0` enables systemd's "hybrid" mode, meaning that both v1 and v2 are enabled. Systemd makes its opinion very clear at h...
[20:05:00] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (BCornwall) Another data point: It seems that some metrics are lost when moving to v2: https://github.com/google/cadvisor/issues/3062 > On nodes running cgroup v1 the following metrics such as container_...
[21:32:19] Traffic, SRE: Review cp2041 and cp2042 running bullseye - https://phabricator.wikimedia.org/T325557 (ssingh) >>! In T325557#8499972, @BCornwall wrote: > Another data point: It seems that some metrics are lost when moving to v2: > > https://github.com/google/cadvisor/issues/3062 > >> On nodes running cg...