[10:07:02] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10elukey) The disk-template change didn't really stop the slow-cpu-leak in eqiad, so I assume that it wasn't the culprit. At this point we should go one level deepe...
[10:09:38] 10Lift-Wing, 10serviceops, 10Kubernetes, 10Machine-Learning-Team (Active Tasks): Discussion: dedicated directory in the deployment-chart repository for ML services - https://phabricator.wikimedia.org/T286791 (10elukey) It turns out that even for the `admin_ng` dir it is a problem, see for example early att...
[14:33:59] ok this is weird
[14:33:59] elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -6
[14:34:00] PING ml-etcd2001.codfw.wmnet(ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44)) 56 data bytes
[14:34:03] 64 bytes from ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44): icmp_seq=1 ttl=63 time=0.662 ms
[14:34:06] 64 bytes from ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44): icmp_seq=2 ttl=63 time=0.395 ms
[14:34:09] but
[14:34:22] elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -4
[14:34:22] PING ml-etcd2001.codfw.wmnet (10.192.16.44) 56(84) bytes of data.
[14:34:22] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=1 ttl=63 time=15.1 ms
[14:34:25] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=2 ttl=63 time=7.54 ms
[14:34:28] 64 bytes from ml-etcd2001.codfw.wmnet (10.192.16.44): icmp_seq=3 ttl=63 time=5.36 ms
[14:35:03] I think it is great that ipv6 is so performant but it smells weird :D
[14:35:46] ipv6 is the future, haven't you heard? :D
[14:37:01] majavah: ahahah yes
[14:40:41] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10elukey) ` elukey@ml-serve-ctrl2001:~$ ping ml-etcd2001.codfw.wmnet -6 PING ml-etcd2001.codfw.wmnet(ml-etcd2001.codfw.wmnet (2620:0:860:102:10:192:16:44)) 56 data...
[17:41:40] Morning all!
[17:43:23] going afk, ttl!
[18:46:07] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10cmooney) Indeed that is odd. I see in a traceroute the latency is high even to the first hop (cr2-codfw Juniper router,) when testing with v4: ` cmooney@ml-serve...
[19:12:47] 10Machine-Learning-Team: ML Serve controller vms show a slowly increasing resource usage leak over time - https://phabricator.wikimedia.org/T287238 (10cmooney) Sorry to clog this up with junk, but this test is relevant. When pinging from the Ganeti host to the VM, if you do a TCPdump on the VM, you can see it i...
[19:13:25] 10artificial-intelligence, 10revscoring, 10Machine-Learning-Team (Active Tasks): Move CJK segmentation features to a branch and revert revscoring - https://phabricator.wikimedia.org/T287021 (10calbon) Thanks for making a Phab issue, I know it can feel trivial but it keeps things organized on our end. I just...
[19:13:29] 10artificial-intelligence, 10revscoring, 10Machine-Learning-Team (Active Tasks): Move CJK segmentation features to a branch and revert revscoring - https://phabricator.wikimedia.org/T287021 (10calbon) 05Open→03Resolved
[19:56:45] More and more I appreciate wikibugs
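
(Editor's note: for anyone retracing the v4-vs-v6 latency gap elukey spotted above, this is a minimal sketch of the comparison. The hostname comes from the log; the sample counts and traceroute flags are assumptions about how one might rerun it, not the exact commands that were typed.)

```
# Hedged sketch: compare RTTs per address family from the controller VM.
# Hostname is from the log; -c counts are assumed for a bounded run.
ping -6 -c 10 ml-etcd2001.codfw.wmnet   # log showed ~0.4-0.7 ms
ping -4 -c 10 ml-etcd2001.codfw.wmnet   # log showed ~5-15 ms

# Check whether the extra v4 latency already appears at the first hop,
# as cmooney's traceroute did (cr2-codfw); -n skips reverse DNS lookups.
traceroute -4 -n ml-etcd2001.codfw.wmnet
traceroute -6 -n ml-etcd2001.codfw.wmnet
```

Running both families back to back rules out resolver preference as the variable and isolates the per-protocol forwarding path.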
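(Editor's note: cmooney's truncated 19:12 comment describes pinging from the Ganeti host to the VM while capturing on the VM. A sketch of that capture follows; the interface name is an assumption, not taken from the log.)

```
# On the VM: capture ICMP echo request/reply with kernel timestamps.
# NOTE: the interface name (ens13) is an assumption -- list interfaces
# with `ip -br link` and substitute the correct one.
sudo tcpdump -ni ens13 icmp
```

Comparing the capture timestamps against the sender's reported RTTs shows whether the delay is accumulating on the wire or inside the VM's own network stack.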