[07:31:14] wdqs2022 seems a lot slower than other nodes, took 4 hours more to catch up
[07:33:25] also not sure I understand why we named it 2022 (jumping from wdqs2012 to wdqs2022)
[10:44:50] resumed the wdqs updater job from k8s@codfw, it has completed a few checkpoints so far
[10:58:11] lunch
[12:53:10] o/
[13:02:08] dcausse interesting that wdqs2022 took so long. It is our only bullseye host
[13:03:05] oh indeed, I forgot that we're moving to bullseye
[13:04:53] in https://grafana-rw.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1&var-site=codfw&var-k8sds=codfw%20prometheus%2Fk8s&var-opsds=codfw%20prometheus%2Fops&var-cluster_name=wdqs&from=1683651995441&to=1683675992002&viewPanel=6 you can see that wdqs2022 ingests fewer triples/sec than the other wdqs2* hosts
[13:05:15] unless the host was busy with something else, not sure how to explain the diff
[13:13:54] We may want to image another bullseye server and see what happens
[13:15:38] re: 2022 vs 2012, we have 2013-2021 as well, just picked 2022 for bullseye as it's the last host we have
[13:21:38] inflatador: thanks for the info!
[14:52:51] o/ I do have limited connectivity (traveling), don’t know how well that will work throughout the Wednesday meeting
[16:05:09] workout, back in ~40
[16:51:45] back
[17:12:33] dinner
[17:46:32] lunch/errands, back in 45m-1h
[18:46:51] back
[19:47:29] just repooled wdqs in codfw...sorry for not doing this earlier
[20:07:10] reimaging wdqs2021 to bullseye, we need another host to determine whether or not bullseye has anything to do with wdqs2022's slowness, see ^^ grafana
[20:24:55] The newer hosts' CPUs have lower clock speed and BogoMIPS. Not sure if that counts for much, but could possibly explain it: https://phabricator.wikimedia.org/P48178
[20:25:44] I think we should try running the WDQS stack on the new chassis WITHOUT reimaging to bullseye, maybe that will clear up hardware vs OS
[20:54:36] ^ sounds like a good idea
[21:12:22] I also noticed that our CPU frequency governors are set to 'powersave', which I think is sub-optimal. SREs gave us some links for reference: https://phabricator.wikimedia.org/T315398 https://phabricator.wikimedia.org/T328957 https://phabricator.wikimedia.org/T225713
[21:12:30] FWIW I don't think this is the key to our problem, the old chassis are using 'powersave' as well
[21:42:00] OK, ticket created: https://phabricator.wikimedia.org/T336443
[21:56:07] inflatador: I would guess something with the disks; during that backfill most servers are reporting ~10% disk utilization spread between multiple disks, 2022 is showing 25-50%
[21:56:28] it's choking on IO
[22:22:48] not clear what though, it's doing some reads other instances aren't, which is curious, but they aren't big enough (~10MB/s) to be too relevant. otherwise iops and throughput are similar, wdqs2022 is just working harder to do the same IO. curious...
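
For context on the 21:12 comment about governors: a minimal sketch (not a command anyone ran in the log) of how the configured CPU frequency governor could be checked on a host, assuming the standard Linux cpufreq sysfs layout.

```python
#!/usr/bin/env python3
# Summarise the cpufreq governor configured on every core, assuming the
# standard Linux sysfs layout (/sys/devices/system/cpu/cpu*/cpufreq/).
# This only reports what is set; changing the governor would need root
# and a write to the same file (or cpupower), which is out of scope here.
from collections import Counter
from pathlib import Path

def governor_summary() -> Counter:
    """Return a count of governors across all CPUs, e.g. {'powersave': 48}."""
    counts = Counter()
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path("/sys/devices/system/cpu").glob(pattern)):
        counts[path.read_text().strip()] += 1
    return counts

if __name__ == "__main__":
    for governor, ncpus in governor_summary().items():
        print(f"{governor}: {ncpus} CPUs")
```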
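
For context on the 21:56 disk-utilization numbers: a rough sketch of how a per-disk busy percentage (comparable to iostat's %util) can be estimated from /proc/diskstats by sampling the "time spent doing I/Os" counter twice. The device names and the 5-second interval are illustrative, not what the wdqs hosts necessarily use.

```python
#!/usr/bin/env python3
# Approximate per-disk %util from /proc/diskstats: sample the milliseconds
# spent doing I/Os (field 10 after the device name in the documented
# diskstats format) twice and divide the delta by the wall-clock interval.
import time

def io_ms(devices: set[str]) -> dict[str, int]:
    """Milliseconds spent doing I/O, per device, from /proc/diskstats."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] in devices:
                stats[parts[2]] = int(parts[12])  # field 10: ms doing I/Os
    return stats

def utilization(devices: set[str], interval: float = 5.0) -> dict[str, float]:
    """Approximate %util for each device over the given interval."""
    before = io_ms(devices)
    time.sleep(interval)
    after = io_ms(devices)
    return {
        dev: 100.0 * (after[dev] - before[dev]) / (interval * 1000.0)
        for dev in devices
    }

if __name__ == "__main__":
    # 'sda' / 'sdb' are example device names only.
    for dev, util in sorted(utilization({"sda", "sdb"}).items()):
        print(f"{dev}: ~{util:.1f}% busy")
```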