[00:11:55] just refreshed my memory of how the report consumed by the `PyBal backends health check` is formed, and tl;dr - it's expected that the check would not fail in this case :)
[00:11:56] longer version:
[00:11:56] * my (incorrect) recollection was that a persistently down-but-enabled host (note, "enabled" in pybal means "pooled" in the confctl sense) was sufficient to trigger this
[00:11:56] * however, the report is looking at down-but-pooled (note, "pooled" in pybal means active for load balancing), which this was clearly not (i.e., the failing host was internally depooled, as the service remained above the depool threshold)
[08:37:18] if anyone in SRE has ever used dstat on our servers: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1145100
[08:57:45] dstat, wow, that brings back memories
[09:11:51] depending on how fond those memories are, it seems the upstream maintenance is also up for grabs :-)
[11:18:13] <_joe_> I haven't used dstat in 8 years I think
[12:06:29] https://www.youtube.com/watch?v=EJizB6hFON8 Interesting talk on how Booking moved from blade servers to VM-like partitioning of hosts
[12:10:24] * Emperor has "fond" memories of blade servers from the Sanger
[12:11:52] klausman: I worked at booking.com and just handling the purchases of them was a nightmare
[12:18:28] I can imagine (but don't wanna :))
[12:19:22] "Only five blade servers left at this price on our site"
[12:21:30] lol
[12:21:55] marostegui: did you know Andrew Mobbs by any chance? He was a DBA for booking.com for a bit
[12:22:26] Emperor: Nope, he wasn't in the team when I joined/left, do you know which years he was there?
[12:22:52] not beyond "donkeys years ago", I'm afraid.
[12:22:57] haha
[12:44:28] bblack, marostegui, arnoldokoth, volans, fyi, I'm in the process of upgrading cr3-eqsin, eqsin is depooled
[12:44:58] XioNoX: ack, thanks for the heads up, anything to expect? need a hand for silencing alerts?
[12:45:53] volans: I downtimed cr3-eqsin, there will be some alerts for bgp sessions going down for example, so maybe a bit of noise but nothing more
[12:46:06] ack
[12:49:01] ack :)
[13:13:51] interesting, thanks for sharing klausman
[13:19:36] XioNoX: Ack.
[13:33:17] XioNoX: let us know once completed, so far so good :)
[13:33:26] volans: completed :)
[13:33:32] nice, thx
[13:36:06] I'll wait a bit before repooling just in case, but it can be done anytime now that both routers are 100% operational
[13:41:23] ack
[13:47:20] volans: and oncall, I'm repooling eqsin
[13:47:28] great, thx
[13:55:04] inflatador: when you get a moment, an easy one for your eyes https://gerrit.wikimedia.org/r/c/operations/puppet/+/1145138
[13:59:05] godog 👀
[14:01:48] +1'd, thanks for spotting this
[14:02:13] general heads-up: we're just finalising the migration of mobileapps/pcs APIs out of restbase at the moment - we've been doing it in chunks and the increase in load hasn't been anything alarming so far but this will be a bit of a jump
[14:03:19] inflatador: sure no worries, cheers for taking a look
[15:55:59] I still use dstat 😢
[16:40:14] brett: is it alright if I pick up your admin patch?
[16:40:34] yep!
[16:41:04] brett: ah, it seems you have the lock :)
[16:41:18] you're good to go on mine
[16:41:36] oh, derp, sorry!
[16:41:40] doing now
[16:41:49] thanks!