[08:16:04] hi folks, I'm working on deploying cadvisor fleetwide as part of T108027, to that end I've send out https://gerrit.wikimedia.org/r/c/operations/puppet/+/920661 to disable high cardinality metrics, who's best to review ?
[08:16:05] T108027: Collect per-cgroup cpu/mem and other system level metrics - https://phabricator.wikimedia.org/T108027
[09:26:14] netops, Infrastructure-Foundations, SRE, netbox: Represent sub-interface and bridge device assocations in Netbox - https://phabricator.wikimedia.org/T296832 (cmooney) The updated PuppetDB -> Netbox import script has now been merged, and I've run it against all servers in Netbox in state 'active'...
[11:11:11] netops, Infrastructure-Foundations, SRE, cloud-services-team: Configure cloudsw1-b1-codfw and migrate cloud hosts in codfw B1 to it - https://phabricator.wikimedia.org/T327919 (cmooney)
[11:11:21] netops, Cloud-VPS, Infrastructure-Foundations, SRE, cloud-services-team: Move cloud vps ns-recursor IPs to host/row-independent addressing - https://phabricator.wikimedia.org/T307357 (cmooney)
[11:12:09] netops, Cloud-VPS, Infrastructure-Foundations, SRE, and 2 others: cloudservices[2004/2005]-dev & cloudweb2002-dev: connect them to cloudsw so they can have cloud-private vlan - https://phabricator.wikimedia.org/T336587 (cmooney) Open→Resolved Couple of niggles getting this going on the ho...
[11:39:30] godog: I believe sukhe will take a look at this in his morning...
[11:48:57] kwakuofori: thank you! appreciate it
[12:12:17] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:17:34] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[12:19:24] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:24:17] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[12:24:41] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:44:15] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[12:44:37] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:51:11] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[12:51:32] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:56:55] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[12:57:11] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[12:57:32] godog: thanks, will look shortly!
[13:00:30] sukhe: cheers!
[13:02:51] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[13:20:03] godog: looks OK to me but will check with valentin once in case we are using this metric somewhere
[13:20:06] (will comment on the task0
[13:20:09] s/0/)
[13:20:24] :)
[13:20:37] https://gerrit.wikimedia.org/r/c/operations/puppet/+/920661/3/modules/prometheus/manifests/cadvisor.pp
[13:20:45] vgutierrez: anything you see here that we care about?
[13:21:08] let me double check
[13:21:20] thank you <3
[13:21:51] so regarding CPU we render container_cpu_system_seconds_total on several dashboards
[13:22:20] but we don't split it by CPU
[13:22:31] yeah
[13:22:34] so AFAIK is ok for us
[13:22:43] that was my understanding as well but wanted to be sure
[13:23:55] godog: I am sure you know but we (Traffic) worked on cadvisor 0.44 and that's what we are running on the cp hosts
[13:24:04] oh wait, you are talking about that too, ok :)
[13:24:49] I was uncertain because T108027 seems to be an old ticket but I think the description was edited to reflect 0.44
[13:24:49] T108027: Collect per-cgroup cpu/mem and other system level metrics - https://phabricator.wikimedia.org/T108027
[13:32:21] sukhe: yes that's correct!
[13:32:26] thank you for checking cc vgutierrez
[13:32:39] godog: +1ed as in for traffic looks good
[13:33:38] \o/ cheers
[14:30:22] Traffic, Patch-For-Review: Write a cookbook to handle restarts of Wikimedia DNS - https://phabricator.wikimedia.org/T335533 (ssingh) >>! In T335533#8860861, @BCornwall wrote: > I don't agree with abstracting the After=/BindsTo= and would rather just hard-code it. But since bird did it that way... I gues...
[14:31:27] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
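The reasoning in the cadvisor review above (13:21 to 13:22: the dashboards use container_cpu_system_seconds_total but never split it by CPU, so dropping the high cardinality metrics is fine) can be illustrated with a minimal Python sketch. Everything here is made up for illustration: the container names, the numbers, and the assumption that the per-CPU breakdown arrives as a `cpu` label whose values sum to the per-container total. It shows that a per-CPU label multiplies the series count by the number of CPUs, while a panel that sums the label away sees the same totals either way.

```python
"""Hypothetical sketch: why dropping a per-CPU label cuts cardinality
but leaves per-container totals (what a dashboard plots) unchanged."""
from collections import defaultdict

# Made-up sample: CPU seconds per (container, cpu) pair on one host.
# The container names and the "cpu" label values are assumptions.
per_cpu_samples = {
    ("nginx", "cpu00"): 12.0,
    ("nginx", "cpu01"): 8.0,
    ("varnish", "cpu00"): 30.0,
    ("varnish", "cpu01"): 25.0,
}


def series_count(samples):
    """Each distinct label combination is one Prometheus time series."""
    return len(samples)


def sum_without_cpu(samples):
    """Rough equivalent of PromQL `sum without (cpu) (...)`: collapse the cpu label."""
    totals = defaultdict(float)
    for (container, _cpu), value in samples.items():
        totals[container] += value
    return dict(totals)


totals = sum_without_cpu(per_cpu_samples)
print(series_count(per_cpu_samples), series_count(totals))  # 4 2
print(totals)  # {'nginx': 20.0, 'varnish': 55.0}
```

With two CPUs the per-CPU variant already doubles the series count; on a host with dozens of cores the multiplier grows accordingly, which is the cardinality the patch is trying to avoid.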
[14:31:50] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm
[14:41:12] Traffic, SRE: Add systemd-level service bindings for Wikimedia DNS - https://phabricator.wikimedia.org/T336792 (ssingh)
[15:18:31] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (cmooney) >>! In T320508#8488549, @ayounsi wrote: > Marking this task dependent on DHCP option 97 to reduce the risk of DHCP oddities related to Option 82. Ironic I hadn't...
[15:19:16] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (cmooney) Open→Resolved
[15:20:27] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Consolidate Automation Templates for DC Switches - https://phabricator.wikimedia.org/T312635 (cmooney)
[15:20:50] netops, Infrastructure-Foundations, SRE: Allow managing drmrs DHCP settings with Homer - https://phabricator.wikimedia.org/T328737 (cmooney) Open→Resolved Complete now after merging above patch.
[15:25:41] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bookworm executed with errors: - sretest1002...
[15:37:41] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye
[15:46:29] Traffic, Infrastructure-Foundations, SRE, Performance-Team (Radar): Mapping Client IPs to Resolver IPs - https://phabricator.wikimedia.org/T336947 (JameelKaisar)
[16:10:10] netops, Infrastructure-Foundations, SRE: Core routers: replace bootp with dhcp-relay - https://phabricator.wikimedia.org/T320508 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye completed: - sretest1002 (**PASS**)...
[18:19:44] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Update network SSH keys to ssh-ed25519 - https://phabricator.wikimedia.org/T336769 (ayounsi)
[18:50:59] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Migrate row E/F network aggregation to dedicated Spine switches - https://phabricator.wikimedia.org/T322937 (cmooney) lvs1020 is currently the "secondary" lvs in eqiad, so I'd propose we start with trying to do that one if we can. It's c...
[18:55:23] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) @Jclark-ctr hey. It's taken a bit of time to line this up, hit a few bumps in the road with the Juniper config. As detailed in T3...
[18:56:18] Traffic, SRE: Add systemd-level service bindings for Wikimedia DNS - https://phabricator.wikimedia.org/T336792 (ssingh)
[19:01:45] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (Jclark-ctr) @cmooney i am available tomorrow if you would like to address it that quickly. otherwise monday
[19:54:55] netops, DC-Ops, Infrastructure-Foundations, SRE, ops-eqiad: Q2:(Need By: TBD) Rows E/F network racking task - https://phabricator.wikimedia.org/T292095 (cmooney) @Jclark-ctr thanks yeah I just had a word with @ssingh and I think tomorrow if probably possible. What time suits you to be on site?
[20:18:24] Traffic: dnsbox: Add gdnsd to bird's BindsTo systemd service - https://phabricator.wikimedia.org/T336973 (BCornwall)
[20:19:10] Traffic, Patch-For-Review: dnsbox: Add gdnsd to bird's BindsTo systemd service - https://phabricator.wikimedia.org/T336973 (BCornwall) Open→In progress p:Triage→Low
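T335533 and T336973 discuss pairing After= with BindsTo= so that bird is tied to gdnsd, with BCornwall preferring to hard-code the dependency rather than abstract it. The following minimal Python sketch renders such a hard-coded `[Unit]` stanza with the standard-library `configparser`. The unit name `gdnsd.service` and the idea of shipping this as a drop-in under a hypothetical `bird.service.d/` directory are assumptions, not taken from the puppet repo. Per systemd's documented semantics, `BindsTo=` stops bird whenever gdnsd stops, and adding `After=` orders bird's start after gdnsd's.

```python
"""Hypothetical sketch: render a systemd [Unit] stanza that hard-codes
bird's dependency on gdnsd via an After=/BindsTo= pair."""
import configparser
import io


def render_bird_dropin(dns_unit="gdnsd.service"):
    """Return drop-in text for bird; the unit name is an assumption."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep systemd's CamelCase keys (default lowercases them)
    parser["Unit"] = {
        # Ordering: start bird only after the DNS daemon has been started.
        "After": dns_unit,
        # Binding: if the DNS daemon stops, systemd stops bird too.
        "BindsTo": dns_unit,
    }
    buf = io.StringIO()
    # systemd expects "Key=value" without spaces around the delimiter.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_bird_dropin())
```

Hard-coding both keys keeps the dependency visible in one place, which matches the preference quoted in T335533; a real deployment would still need a `systemctl daemon-reload` after installing the file.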