[07:41:29] Traffic, SRE, observability: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (Vgutierrez) Regarding HAProxy reload process, basically HAProxy spawns a new process and hands over all the file descriptors to the new process (that's been started with the new configur...
[07:54:07] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (MatthewVernon) Here are slightly nicer figures (more sf, which means the lines are rather more accurate) - the frequency distribu...
[11:10:09] Traffic, SRE, observability, Patch-For-Review: HAProxy metrics go down on config reload - https://phabricator.wikimedia.org/T343000 (Vgutierrez) Open→Stalled After disabling KA, `haproxy_frontend_connections_total{proxy="stats"}` starts to increase as expected: {F37156880} Let's wait 24h...
[12:46:07] Traffic, Performance-Team, SRE, SRE-swift-storage, Patch-For-Review: Automatically clean up unused thumbnails in Swift - https://phabricator.wikimedia.org/T211661 (ori) >>! In T211661#9054485, @MatthewVernon wrote: > The other thing I can't quite leave alone is - why are we being asked for so...
[13:08:20] Traffic, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur)
[13:08:26] Traffic, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Fabfur) We finally managed to reinstall lvs1016, thanks for all the support!
[13:08:55] Traffic, ops-eqiad: Relocate lvs1013-lvs1016 to rows E & F - https://phabricator.wikimedia.org/T341992 (Vgutierrez) Open→Resolved
[13:08:59] Traffic: Replace current L4LB with with Katran-based alternative - https://phabricator.wikimedia.org/T332027 (Vgutierrez)
[13:30:10] XioNoX, topranks: what would be the best way of monitoring the health of irb-1031.lsw1-e1-eqiad.eqiad.wmnet? librenms?
[13:34:15] vgutierrez: that is a good question, what kind of monitoring are you thinking?
[13:34:38] off the top of my head adding a ping check in Icinga similar to how we do for our OOB IPs might be an idea:
[13:34:48] https://gerrit.wikimedia.org/g/operations/puppet/+/refs/changes/61/921261/2/modules/netops/manifests/monitoring.pp#62
[13:35:04] https://librenms.wikimedia.org/device/device=225/tab=port/port=26685/ --> this seems enough for monitoring while hammering it
[13:35:29] that creates a simple ping check in Icinga. We could also look at some of the black box checks and see if they could be used for it
[13:36:16] vgutierrez: those graphs are for the port connecting lvs1013, is that what you need to monitor?
[13:36:36] yep.. see how much love is lvs1013 getting
[13:36:48] ok yep in that case those graphs are definitely best
[13:37:07] that and of course overall health for the switch, but I doubt I can make the device sweat using just a 10G NIC
[13:37:15] irb.1031 is a virtual interface inside the switch, but I don't think you'd need to monitor that for anything
[13:37:19] (open to correction)
[13:37:53] yeah, I got that FQDN after checking lvs1013 routing table, but I was talking about the device itself, not the specific interface
[13:38:00] vgutierrez: the switches should switch and route at line level even with 64-byte frames, so yeah hard to make it sweat
[13:38:17] good :D
[13:38:22] what may come in to it is a bottle-neck, like multiple sources sending 10G worth of traffic towards a single destination etc.
[13:38:51] those have 100G uplinks, port 54 and 55
[13:39:15] https://librenms.wikimedia.org/device/device=225/tab=port/port=22471/
[13:39:22] https://librenms.wikimedia.org/device/device=225/tab=port/port=22472/
[13:39:45] So you can also keep an eye there, we will get alerts at 80% though and a single server is not likely to push it that far
[13:41:53] vgutierrez: for the device itself you can check the "health" tab in LibreNMS to get CPU usage, memory etc.
[13:42:12] but for regular traffic it is switched in hardware and the amount won't affect the usage of those
[13:44:58] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[14:08:39] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (ssingh)
[14:21:37] topranks: hmm interesting
[14:29:43] hi, could someone please review/merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/942383
[14:32:53] * vgutierrez looking
[14:35:01] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[14:35:54] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur) Just a reminder: for `file-read-backwards` package always build with the `-sa` option. ex. ` GIT_PBUILDER_AUTOCONF=no WIKIMEDIA=yes ARCH=amd64 GBP_PBUILDER_DIST=bookworm DIST=bookw...
[14:47:24] Traffic, SRE, Patch-For-Review: Upgrade Traffic hosts to bookworm - https://phabricator.wikimedia.org/T342154 (Fabfur)
[16:02:07] vgutierrez: can you merge as well? I don't have rights to do that myself
[16:02:27] sure, sorry :)
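
Note on the numbers in the 13:38:00 and 13:39:45 messages: a rough sketch of the arithmetic behind "64-byte frames" and "alerts at 80%". It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame on the wire) and uses hypothetical helper names; it does not reflect how LibreNMS or the switches actually compute anything.

```python
# Per-frame overhead on the wire for standard Ethernet (assumption stated above):
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Alert threshold for the uplinks, as mentioned in the chat (80%).
ALERT_THRESHOLD = 0.80


def line_rate_pps(link_bps: int, frame_bytes: int = 64) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


def uplink_utilization(traffic_bps: int, uplink_bps: int) -> float:
    """Fraction of an uplink consumed by a given amount of traffic."""
    return traffic_bps / uplink_bps


if __name__ == "__main__":
    ten_gig = 10 * 10**9
    hundred_gig = 100 * 10**9

    # Worst case for a forwarding engine: minimum-size frames at full 10G.
    print(f"10G at 64-byte frames: {line_rate_pps(ten_gig):,.0f} frames/s")

    # A single 10G server saturating its NIC, measured against a 100G uplink.
    share = uplink_utilization(ten_gig, hundred_gig)
    print(f"share of 100G uplink: {share:.0%}, alert: {share >= ALERT_THRESHOLD}")
```

Under those assumptions one 10G NIC tops out at about 14.88 million minimum-size frames per second and 10% of a 100G uplink, which matches the point that a single server is unlikely to reach the alert threshold on its own.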