[06:28:15] Traffic, Maps, SRE: Allow Wikimedia Maps usage on Wikimedia Commons Android app - https://phabricator.wikimedia.org/T349280 (Nicolas_Raoul) Any blocker for approval? We have already implemented the code in the app (switched from Mapbox to osmdroid library), but are waiting for your approval here bef...
[08:31:12] Acme-chief, Traffic: Provide second acmechief server configured for Puppet 7 in eqiad - https://phabricator.wikimedia.org/T352242 (MoritzMuehlenhoff) There's one active alert, is that known/expected? FILE_AGE CRITICAL: File not found - /var/lib/acme-chief/certs/.rsync.status
[09:02:31] Acme-chief, Traffic: Provide second acmechief server configured for Puppet 7 in eqiad - https://phabricator.wikimedia.org/T352242 (Vgutierrez) Resolved→Open >>! In T352242#9412109, @MoritzMuehlenhoff wrote: > There's one active alert, is that known/expected? > > FILE_AGE CRITICAL: File not found...
[09:16:41] Acme-chief, Traffic, Patch-For-Review: Provide second acmechief server configured for Puppet 7 in eqiad - https://phabricator.wikimedia.org/T352242 (Vgutierrez) Open→Resolved @MoritzMuehlenhoff Sorry about that, acmechief1002 is now ready for service :)
[10:09:29] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Fabfur) I'm reimaging cp4037 today
[10:17:46] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye
[11:33:34] don't know if it's related to the acmechief work but first puppetization on cp4037 (reimaged w/ puppet 5) fails with
[11:33:35] Dec 18 10:55:34 cp4037 puppet-agent[2357]: (/Stage[main]/Profile::Cache::Haproxy/Acme_chief::Cert[unified]/File[/etc/acmecerts/unified]) Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate): [unable to get local issuer certificate for /CN=acmechief2002.codfw.wmnet]
[11:35:38] fabfur: the cp and lvs hosts in ulsfo have been converted to Puppet 7 (as canaries before doing the rest in January), as such you'll need to reimage cp4037 with Puppet 7
[11:35:50] ack!
[11:35:55] thanks
[11:36:11] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye executed with errors: - cp4037 (**FAIL**) - Removed from...
[11:36:42] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye
[11:37:07] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye executed with errors: - cp4037 (**FAIL**) - Downtimed on...
[11:38:05] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye
[11:45:43] but the reimage takes the value automatically from hiera. fabfur how did you get it installed with puppet5?
[11:45:56] unless it's a new host
[11:46:15] I passed the `--new` option
[11:46:40] because it's a new host or failed a previous reimage?
[11:47:46] nope I passed the `--new` knowing that in case will be removed by the cookbook
[11:48:29] but in this case I think the cookbook assumed that was "really" new and didn't read the hiera for puppet version
[11:48:30] lol, I guess that was a corner case not well thought out with the puppet5/7 stuff
[11:48:33] my bad
[11:49:04] well, corner case maybe is too much, if the host is not new, I shouldn't have passed the `--new` flag :)
[11:49:44] anyway I'll check it later, thx for the clarification
[11:51:23] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye executed with errors: - cp4037 (**FAIL**) - Removed from...
[11:51:35] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye
[12:19:53] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye executed with errors: - cp4037 (**FAIL**) - Removed from...
[12:20:22] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye
[13:08:11] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by fabfur@cumin1002 for host cp4037.ulsfo.wmnet with OS bullseye completed: - cp4037 (**PASS**) - Removed from Puppet and...
[15:12:37] Traffic, SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744 (jhathaway) wolfssl is packaged in Debian, so that may be a possible option longer term, https://tracker.debian.org/pkg/wolfssl.
[15:14:06] XS patch to remove an unused CNAME from DNS repo if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/dns/+/983725
[15:18:30] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Fabfur) cp4037 repooled
[15:43:41] Traffic, SRE-swift-storage: OpenSSL 3.x performance issues - https://phabricator.wikimedia.org/T352744 (MoritzMuehlenhoff) >>! In T352744#9413140, @jhathaway wrote: > wolfssl is packaged in Debian, so that may be a possible option longer term, https://tracker.debian.org/pkg/wolfssl. wolfssl isn't fully...
[15:53:18] moritzm: debian provides minimal versions of their linux kernels? something enough to boot on qemu?
[15:55:29] there is a cloud image, which has fewer drivers enabled compared to the default image, but it's not a totally reduced minimal image
[15:56:05] moritzm: I wanna try the approach used by Brad F. on https://www.youtube.com/watch?app=desktop&v=69Zy77O-BUM
[15:56:27] basically using qemu to run integration tests for eBPF programs
[15:56:44] so I can target several kernels
[16:00:26] I dig his quick pace. No reading off slides
[16:03:23] Traffic: tcp-mss-clamper doesn't work on bullseye / kernel 5.10 - https://phabricator.wikimedia.org/T353657 (Vgutierrez)
[16:03:42] Traffic: tcp-mss-clamper doesn't work on bullseye / kernel 5.10 - https://phabricator.wikimedia.org/T353657 (Vgutierrez) p:Triage→Medium
[16:13:17] Traffic, DC-Ops: cp4037 reimage for cookbook getting stuck at PXE boot - https://phabricator.wikimedia.org/T352876 (Vgutierrez) Open→Resolved a:Fabfur
[16:20:08] https://www.irccloud.com/pastebin/R4pl2zqr/
[16:21:06] moritzm: hmm following https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.3 yields that error on a bullseye VM... bullseye kernel can't be built on bullseye?
[17:35:44] Traffic, Data-Engineering, Observability-Logging: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Ottomata)
[17:41:18] Traffic, Data-Engineering, Observability-Logging: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Ottomata) Hey all, whatever the chosen solution for producing logs, we need a nice migration plan. I just talked with @Ahoelzl, and we realized that it would prob...
[17:42:19] Traffic, Data-Engineering, Observability-Logging: Move analytics log from Varnish to HAProxy - https://phabricator.wikimedia.org/T351117 (Ottomata) I'm going to work on T314956 now, before I go on sabbatical + parental leave in January. (I'll be out Jan 8 - April 12 2024.) I hope to get the new stream...
[21:17:11] Traffic, GitLab (Project Migration), Patch-For-Review: Migrate Traffic repositories from Gerrit to Gitlab - https://phabricator.wikimedia.org/T347623 (BCornwall)