[10:07:02] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10elukey) The disk-template change didn't really stop the slow-cpu-leak in eqiad, so I assume that it wasn't the culprit. At this point we should go one level deepe...
[10:09:38] 10Lift-Wing, 10serviceops, 10Kubernetes, 10Machine-Learning-Team (Active Tasks): Discussion: dedicated directory in the deployment-chart repository for ML services - https://phabricator.wikimedia.org/T286791 (10elukey) It turns out that even for the `admin_ng` dir it is a problem, see for example early att...
[14:33:59] ok this is weird
[14:33:59] elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -6
[14:34:00] PING ml-etcd2001.codfw.wmnet(ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44)) 56 data bytes
[14:34:03] 64 bytes from ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44): icmp_seq=1 ttl=63 time=0.662 ms
[14:34:06] 64 bytes from ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44): icmp_seq=2 ttl=63 time=0.395 ms
[14:34:09] but
[14:34:22] elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -4
[14:34:22] PING ml-etcd2001.codfw.wmnet (10.192.16.44) 56(84) bytes of data.
[14:34:22] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=1 ttl=63 time=15.1 ms
[14:34:25] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=2 ttl=63 time=7.54 ms
[14:34:28] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=3 ttl=63 time=5.36 ms
[14:35:03] I think it is great that ipv6 is so performant but it smells weird :D
[14:35:46] ipv6 is the future, haven't you heard? :D
[14:37:01] majavah: ahahah yes
[14:40:41] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10elukey) ` elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -6 PING ml-etcd2001.codfw.wmnet(ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44)) 56 data...
[17:41:40] Morning all!
[17:43:23] going afk, ttl!
[18:46:07] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10cmooney) Indeed that is odd. I see in a traceroute the latency is high even to the first hop (cr2-codfw Juniper router,) when testing with v4: ` cmooney@ml-serve...
[19:12:47] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10cmooney) Sorry to clog this up with junk, but this test is relevant. When pinging from the Ganeti host to the VM, if you do a TCPdump on the VM, you can see it i...
[19:13:25] 10artificial-intelligence, 10revscoring, 10Machine-Learning-Team (Active Tasks): Move CJK segmentation features to a branch and revert revscoring - https://phabricator.wikimedia.org/T287021 (10calbon) Thanks for making a Phab issue, I know it can feel trivial but it keeps things organized on our end. I just...
[19:13:29] 10artificial-intelligence, 10revscoring, 10Machine-Learning-Team (Active Tasks): Move CJK segmentation features to a branch and revert revscoring - https://phabricator.wikimedia.org/T287021 (10calbon) 05Open→03Resolved
[19:56:45] More and more I appreciate wikibugs
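
(Editor's note: for anyone retracing the v4-vs-v6 latency gap elukey spotted above, this is a minimal sketch of the comparison. The hostname comes from the log; the sample counts and traceroute flags are assumptions about how one might rerun it, not the exact commands that were typed.)

```
# Hedged sketch: compare RTTs per address family from the controller VM.
# Hostname is from the log; -c counts are assumed for a bounded run.
ping -6 -c 10 ml-etcd2001.codfw.wmnet   # log showed ~0.4-0.7 ms
ping -4 -c 10 ml-etcd2001.codfw.wmnet   # log showed ~5-15 ms

# Check whether the extra v4 latency already appears at the first hop,
# as cmooney's traceroute did (cr2-codfw); -n skips reverse DNS lookups.
traceroute -4 -n ml-etcd2001.codfw.wmnet
traceroute -6 -n ml-etcd2001.codfw.wmnet
```

Running both families back to back rules out resolver preference as the variable and isolates the per-protocol forwarding path.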
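(Editor's note: cmooney's truncated 19:12 comment describes pinging from the Ganeti host to the VM while capturing on the VM. A sketch of that capture follows; the interface name is an assumption, not taken from the log.)

```
# On the VM: capture ICMP echo request/reply with kernel timestamps.
# NOTE: the interface name (ens13) is an assumption -- list interfaces
# with `ip -br link` and substitute the correct one.
sudo tcpdump -ni ens13 icmp
```

Comparing the capture timestamps against the sender's reported RTTs shows whether the delay is accumulating on the wire or inside the VM's own network stack.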