[15:58:09] Lift-Wing: Bootstrap the ml-serve-codfw cluster - https://phabricator.wikimedia.org/T294412 (elukey) Did a quick os update + upgrade + dist-upgrade on codfw nodes (it wasn't really needed) and rebooted all nodes to get the latest kernel running. The partitions and lvm volumes look ok :)
[16:28:22] I am doing the first tests with siege, to get a sense of how a single pod handles traffic and if our metrics are good enough etc..
[16:29:43] o/
[16:29:52] elukey: that sounds cool!
[16:30:09] o/
[16:30:18] maybe next week we can try doing changeprop stream too
[16:36:11] it may be a little early for it, I'd do some load tests before.. plus we'd need moar models deployed :)
[16:36:48] good point :)
[16:43:58] with one pod and 25 concurrent users, latencies goes up to seconds but it seems to work without failures
[16:44:09] (this is a quick test but it is promising)
[16:44:58] all metrics in https://grafana-rw.wikimedia.org/d/Rvs1p4K7k/kserve?orgId=1&from=now-3h&to=now
[16:45:42] niiiice
[16:46:05] I still need to figure out when/if the knative activator works, it load balances traffic (in theory only up to a certain point)
[16:46:15] see metrics in https://grafana-rw.wikimedia.org/d/c6GYmqdnz/knative-serving?orgId=1&from=now-3h&to=now
[17:37:59] going afk for the weekend!
[17:38:11] have a good rest of the day and weekend folks :)
[17:38:11] have a good one elukey!
[17:38:12] o/
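
Note on the load test mentioned at 16:43 (one pod, 25 concurrent users): the log does not show the siege command or the endpoint that was hit, so the following is only a rough Python sketch of the same idea, not what was actually run. It starts 25 simulated users in parallel, times each request, and counts failures. `fake_inference_request`, `simulate_user`, `run_load_test`, the 10 ms sleep, and the 4 requests per user are all hypothetical stand-ins; a real test would replace the fake sender with an HTTP call to the model endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_inference_request() -> bool:
    """Hypothetical stand-in for one HTTP call to a model pod."""
    time.sleep(0.01)  # assumption: pretend the pod answers in about 10 ms
    return True


def simulate_user(send_request, requests_per_user):
    """One simulated user: send requests back to back, timing each one."""
    latencies, failures = [], 0
    for _ in range(requests_per_user):
        start = time.perf_counter()
        try:
            ok = send_request()
        except Exception:
            ok = False  # an exception counts as a failed request
        latencies.append(time.perf_counter() - start)
        if not ok:
            failures += 1
    return latencies, failures


def run_load_test(send_request, concurrency=25, requests_per_user=4):
    """Run `concurrency` simulated users in parallel and summarize results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(simulate_user, send_request, requests_per_user)
            for _ in range(concurrency)
        ]
        results = [f.result() for f in futures]

    latencies = sorted(t for lats, _ in results for t in lats)
    failures = sum(failed for _, failed in results)
    return {
        "requests": len(latencies),
        "failures": failures,
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


summary = run_load_test(fake_inference_request)
print(summary)
```

Comparing `median_s` and `max_s` against the failure count gives the same kind of reading as in the chat: latencies can grow under concurrency while the failure count stays at zero.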