[02:03:29] * bd808 off
[09:06:09] morning
[09:06:27] o/
[09:40:49] o/
[10:45:56] there has been an increase in the response time for nova_api starting earlier today, has anyone changed anything?
[10:46:37] not me!
[10:50:04] https://usercontent.irccloud-cdn.com/file/WckBcq1g/image.png
[10:51:16] definitely an increase
[11:23:49] * arturo sorry for not being more helpful, I'm multitasking on other stuff at the moment.
[11:29:13] it's ok, I'm not focused myself either
[11:30:34] there are a lot of errors trying to connect to mysql on openstack.codfw1dev.wikimediacloud.org for backups
[11:38:53] dcaro: on cloudbackup100[12]-dev? or somewhere else?
[11:39:03] cloudcontrol1005
[11:39:08] let me double check
[11:39:18] why is cloudcontrol1005 trying to connect to the codfw1dev database?
[11:39:28] oh yes cloudbackup1001-dev
[11:39:40] I was wondering that myself
[11:40:12] I was wondering about those hosts earlier today myself, as according to T344065 those should not exist
[11:40:13] the openstack logstash dashboard also shows the 100* backup nodes (though technically they are from the codfw setup), note taken
[11:40:13] T344065: Replace cinder-backup process with backy2 - https://phabricator.wikimedia.org/T344065
[11:40:42] and we indeed don't have any hosts running cinder-backups for the eqiad1 deployment, which matches that task
[11:41:13] so I wonder if those two VMs (in eqiad, but for the codfw1dev deployment) were just forgotten, or what's going on with them
[11:41:37] I think they might be forgotten yep
[11:46:19] ok, filed T358855 so we don't forget. I'm happy to decom those once andrewbogott confirms they should not exist
[11:46:20] T358855: Maybe decom cloudbackup100[12]-dev - https://phabricator.wikimedia.org/T358855
[11:46:30] where did you see the mysql errors?
[11:46:42] ah cloudcontrol1005 sorry
[11:47:20] no, the mysql errors are on cloudbackup100[12]-dev
[11:47:47] if there are also mysql errors on cloudcontrol1005, that is both news to me and also much more worrying than the mysql errors on cloudbackup100[12]-dev
[11:48:19] only on the backup nodes yes
[11:48:52] however, if I understand things correctly, the nova api on cloudcontrol1005 is currently slow to respond?
[11:50:09] yep, it started to get slow ~5am UTC today
[11:50:18] that's why I'm looking at logs around that time
[11:50:51] there are some sqlalchemy errors on cloudcontrol1005 but they're related to keystone and not too frequent
[11:51:39] dcaro: were you looking at logs across all instances? I'm curious to understand how you spotted the backup errors
[11:52:03] yep, cluster-wide
[11:52:16] (eqiad cluster, though it's the eqiad site, not cluster)
[11:52:23] in logstash
[11:53:29] yes
[11:53:34] sorry for the lack of context
[11:55:00] np, thanks for clarifying :)
[13:59:02] * dcaro off for a bit
[16:06:55] dcaro: when you are around, could you please help me sort out this python type checking problem? https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1006529
[16:11:20] on my way back from the airport
[16:12:59] no rush!
[16:15:36] arturo: so the thing it's complaining about is that the output type of `KubernetesController.get_object` depends on the value of the `missing_ok` parameter, so the use of a parameter as `missing_ok` in `is_pod_running` (line 395) makes it confused
[16:16:43] yes, ok
[16:17:12] might be missing an overload on one of the get_object defs
[16:22:24] https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1007942 seems to fix it
[16:25:30] thanks, yes, I was writing the exact same thing here
[16:25:37] let's merge yours
[16:30:09] also left a comment on your patch, once that's fixed I'm happy to +1
[16:32:07] the blank color and "false" username thing in etherpad is apparently https://github.com/ether/etherpad-lite/issues/5401. Somebody in -sre said they will look into patching.
[16:39:01] bd808: I'm glad things are a bit better wrt. etherpad maintenance nowadays
[16:40:13] * taavi off
[17:13:08] * arturo off
[18:36:32] * bd808 lunch
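
(Editor's note: a minimal sketch of the kind of `@overload` fix discussed around 16:17:12. The names and signatures below are simplified placeholders, not the actual wmcs-cookbooks API; the real fix is the linked Gerrit patch. The idea is that `Literal[True]`/`Literal[False]` overloads alone don't cover a call site that passes a plain `bool` through, as `is_pod_running` does, so a `bool` fallback overload is needed.)

```python
# Sketch only: simplified stand-in for KubernetesController, not the real class.
from typing import Any, Literal, Optional, overload


class KubernetesController:
    @overload
    def get_object(self, kind: str, name: str, *, missing_ok: Literal[False] = False) -> dict[str, Any]: ...

    @overload
    def get_object(self, kind: str, name: str, *, missing_ok: Literal[True]) -> Optional[dict[str, Any]]: ...

    # Fallback overload: without this, passing a plain `bool` variable as
    # missing_ok (as a caller like is_pod_running does) confuses the checker,
    # since the value matches neither Literal[True] nor Literal[False].
    @overload
    def get_object(self, kind: str, name: str, *, missing_ok: bool) -> Optional[dict[str, Any]]: ...

    def get_object(self, kind: str, name: str, *, missing_ok: bool = False) -> Optional[dict[str, Any]]:
        # Placeholder implementation; the real code would query the Kubernetes API.
        obj: Optional[dict[str, Any]] = None
        if obj is None and not missing_ok:
            raise KeyError(f"{kind}/{name} not found")
        return obj

    def is_pod_running(self, name: str, missing_ok: bool = False) -> bool:
        # The bool fallback overload applies here, so the return type is
        # Optional[...] and the None case must be handled explicitly.
        pod = self.get_object("pods", name, missing_ok=missing_ok)
        return bool(pod and pod.get("status", {}).get("phase") == "Running")
```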