[00:08:06] * bd808 off
[07:41:43] morning
[07:42:11] o/
[08:14:02] morning!
[08:18:55] what's up with this alert?
[08:18:57] HAProxy service nova-api_backend backend cloudcontrol1005.private.eqiad.wikimedia.cloud is down
[08:19:16] is this part of any work by you folks?
[08:20:04] I noticed yesterday we already had alerts about openstack API being slow
[08:20:14] not sure if it's related
[08:20:29] not me
[08:22:39] maybe we can try restarting the control plane as described in https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Runbooks/OpenstackAPIResponse
[08:24:22] let me poke by hand first
[08:25:33] also, I think the alert shows up as duplicated in alertmanager
[08:25:36] https://usercontent.irccloud-cdn.com/file/GgQdsDv8/image.png
[08:25:42] maybe because of the different @receiver
[08:26:08] Fri Apr 19 08:26:01 2024 - *** uWSGI listen queue of socket ":18774" (fd: 3) full !!! (101/100) ***
[08:29:48] dhinus: I just restarted the service by hand, and it is back to working normally
[08:30:03] will open a phab ticket, to see if we can tune this listen queue somehow
[08:30:17] thanks
[08:30:30] the alert duplication is an old issue, it happens with other alerts as well
[08:31:00] where did you find the uWSGI error?
[08:31:13] in /var/log/nova-api.log
[08:31:23] sorry /var/log/nova/nova-api.log
[08:31:34] it is weird that journalctl doesn't show anything BTW
[08:33:11] re: alert duplication T353457
[08:33:12] T353457: Karma UI shows duplicate alerts - https://phabricator.wikimedia.org/T353457
[08:33:42] T362956
[08:33:43] T362956: nova-api can get the listen queue of socket full - https://phabricator.wikimedia.org/T362956
[11:06:02] my openvswitch test vm just got an ip assigned via DHCP!!!! https://phabricator.wikimedia.org/P61012
[11:40:17] taavi: that's really nice progress
[11:47:59] are the hypervisor AND the cloudnet both running OVS?
[11:48:18] do they communicate via vxlan?
[12:15:59] arturo: yes, both are based on OVS and they communicate on VXLAN over cloud-private :-P
[12:17:18] neat
[12:21:21] what do you think? https://gerrit.wikimedia.org/r/c/operations/alerts/+/1021909
[14:46:00] even more ovs good news: it's totally possible to have a single hypervisor with a single OVS agent running that can talk to both the existing vlan-based provider network and a vxlan based network: https://phabricator.wikimedia.org/P61024
[14:51:45] taavi: that should help make the transition from one model to the other very smooth
[15:14:29] do buildpack-based jobs also mount /var/lib/sss/pipes/ ?
[15:14:35] and nsswitch?
[15:14:47] * arturo goes to check himself
[15:19:01] well, the code is the same, so I guess they do
[15:19:31] they won't have the libnss-sss library though
[15:19:58] so I guess the actual question I have is: what is the uid/gid buildpack-based jobs should be running as?
[15:20:17] taavi: ^^^ ?
[15:20:51] the ones that have NFS mounted must run as the tool user/group. not sure about others
[15:22:24] ok
[15:48:46] * arturo offline
[18:55:55] * bd808 lunch
[23:50:27] * bd808 off
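
Note on the uWSGI message quoted at 08:26:08: since journalctl showed nothing and the error only appeared in /var/log/nova/nova-api.log, one way to spot it is to scan that file for the "listen queue ... full" line. The following is only a minimal sketch, assuming the log line format is exactly the one pasted in the channel; the function names `parse_listen_queue_full` and `count_listen_queue_full` are hypothetical and are not part of nova, uWSGI, or any existing WMCS tooling.

```python
import re

# Pattern built from the single log line quoted in the channel; other uWSGI
# versions or log configurations may format this message differently.
LISTEN_QUEUE_FULL = re.compile(
    r'^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) - '
    r'\*\*\* uWSGI listen queue of socket "(?P<socket>[^"]*)" '
    r'\(fd: (?P<fd>\d+)\) full !!! '
    r'\((?P<queued>\d+)/(?P<limit>\d+)\) \*\*\*'
)


def parse_listen_queue_full(line):
    """Return a dict describing a 'listen queue full' event, or None."""
    match = LISTEN_QUEUE_FULL.search(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),  # kept as the raw string
        "socket": match.group("socket"),
        "fd": int(match.group("fd")),
        "queued": int(match.group("queued")),
        "limit": int(match.group("limit")),
    }


def count_listen_queue_full(lines):
    """Count 'listen queue full' events per socket in an iterable of lines."""
    counts = {}
    for line in lines:
        event = parse_listen_queue_full(line)
        if event is not None:
            counts[event["socket"]] = counts.get(event["socket"], 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        'Fri Apr 19 08:26:01 2024 - *** uWSGI listen queue of socket '
        '":18774" (fd: 3) full !!! (101/100) ***'
    )
    print(parse_listen_queue_full(sample))
```

To run it against the real file, something like `count_listen_queue_full(open("/var/log/nova/nova-api.log"))` would do. The 100 in "(101/100)" is the configured queue size; uWSGI exposes it through its `listen` option, which is presumably what the tuning discussed in T362956 would touch.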