[07:37:02] I'm moving the apt servers to new hosts, repository imports will be unavailable for a while (but installations from apt.w.o should continue to work fine)
[08:15:18] reprepro can be used again (but now on apt1002)
[09:53:19] is wikibugs behind by ~1h? I just saw the notification on IRC for the patch I sent 1h ago
[09:54:58] T357729, I restarted it and now it's working through the backlog
[09:54:58] T357729: wikibugs having a hard time staying connected to libera.chat IRC network - https://phabricator.wikimedia.org/T357729
[09:59:53] ack, thx
[14:59:47] Hey all!
[15:00:05] ** DC switchover testing is happening now **
[15:00:16] ^ let me know if you see something wrong
[15:00:42] effie: dry-run or live-test?
[15:00:51] live test
[15:01:19] break a leg!
[15:01:21] * volans hides
[15:03:23] effie: aren't you missing one of the pending cookbooks patches that depends on me making a spicerack release? or will that be tested later on in isolation?
[15:04:40] I think we do
[15:06:41] urm
[15:06:42] no
[15:06:54] sorry!
[15:07:02] * volans lost
[15:38:30] * bblack too!
[15:39:52] what's a "live test" of DC switchover that isn't an actual switchover?
[15:41:05] https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Weeks_in_advance_preparation_and_communication :)
[15:41:08] bblack: that's something that has been done for many years in the switchdc preparation: after testing the cookbooks in dry-run mode, we run them with the --live-test flag
[15:49:54] makes sense, the naming layers are just confusing :)
[22:28:25] cwhite: I've cc-ed you at https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMaintenance/+/1008839 for input. I think gauges work basically the same way in Prometheus as in Graphite, but I'm curious if I'm missing something. Basically: how reliable is sum(my_metric) as a way to sum up all label combos when the label combos are emitted periodically, but not instantly/exactly at the same time? I.e. I want to avoid getting only the sum() of those emitted around the same time, and especially an incomplete sum() at the end, yet when a label value is removed, it should stop adding to the total. This is akin to something like sum(memory_avail) across a number of hosts, with a server going away. How does the sum stay stable normally, but also go down correctly when it should?
[22:28:51] My workaround is to emit a separate total (which works for this case because it's actually one source of truth, unlike with memory_avail)
[22:29:01] Curious if there's a better way :)
[22:38:27] Krinkle: thanks for the ping and for the detailed explanation. It doesn't look like an easy problem at first glance. I want to think on it.
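For context on the sum() question, here is a minimal, simplified model of how Prometheus evaluates an instant vector such as sum(my_metric). It assumes the default 5-minute lookback delta (configurable via --query.lookback-delta). It ignores staleness markers, which Prometheus writes when a scraped target stops exposing a series, and it simplifies the exact window-boundary rules. The series names and sample data are hypothetical. This is a sketch of the lookback behaviour only, not the actual engine.

```python
from typing import Dict, List, Optional, Tuple

# Each series (one label combination) maps to its raw samples: (unix_seconds, value).
Series = List[Tuple[float, float]]

# Assumption: Prometheus' default lookback delta of 5 minutes.
LOOKBACK_SECONDS = 300.0


def instant_value(points: Series, eval_time: float,
                  lookback: float = LOOKBACK_SECONDS) -> Optional[float]:
    """Latest sample value in (eval_time - lookback, eval_time], or None if there is none."""
    in_window = [(ts, v) for ts, v in points if eval_time - lookback < ts <= eval_time]
    if not in_window:
        return None  # the series is absent from the instant vector at this time
    return max(in_window)[1]  # max() on (ts, value) tuples picks the newest timestamp


def sum_at(series: Dict[str, Series], eval_time: float,
           lookback: float = LOOKBACK_SECONDS) -> float:
    """Simplified model of sum(my_metric): add the latest value of every series still in the window."""
    total = 0.0
    for points in series.values():
        value = instant_value(points, eval_time, lookback)
        if value is not None:
            total += value
    return total


# Hypothetical data: two label combos emitted every 60s at staggered offsets;
# host="b" stops reporting after t=150.
samples: Dict[str, Series] = {
    'host="a"': [(float(t), 10.0) for t in range(0, 601, 60)],
    'host="b"': [(30.0, 5.0), (90.0, 5.0), (150.0, 5.0)],
}

for t in (140.0, 440.0, 460.0):
    print(t, sum_at(samples, t))
```

In this model the staggered emission times do not shrink the sum, because each series contributes its most recent value from anywhere in the window. The sum stays at 15.0 at t=140 and t=440. It drops to 10.0 at t=460, once host="b"'s last sample is more than 300 s old. The trade-off is that a removed series keeps counting for up to the lookback window unless a staleness marker ends it sooner.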