[06:33:13] morning! Have to run an errand for a bit, bbiab!
[08:05:43] checking latencies again
[08:05:44] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=ml-serve-ctrl1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=ml_serve&from=now-2d&to=now
[08:06:14] definitely better, but it seems to be slowly getting worse
[08:24:16] I am going to remove istio from the cluster to isolate the problem better
[08:34:54] done, we'll see how it goes
[08:35:04] it may be a sneaky problem to solve, sigh
[08:36:06] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Inference Clients - https://phabricator.wikimedia.org/T287051 (10kevinbazira) @ACraze, thank you for suggesting that we add `inference-client.sh` scripts to the repo. Instead of creating a `client/` directory, could we instead add `inference-client.sh` to t...
[08:37:31] the operations landing on the k8s API seem to be patch/list/etc.; I am wondering if istiod continuously tries to verify something about the webhooks, leading to a performance degradation on the API servers
[08:38:14] the regression in CPU started when we deployed the kubelet though, but istiod started to work right after it (since routing between master and workers started working, istiod started to contact the k8s API)
[09:37:06] from a quick check I see CPU usage still increasing, but to be sure we'll need some extra hours of datapoints
[09:37:26] if the trend continues we can rule out istio (good) and concentrate on the kubelets
[09:37:35] maybe there is a setting that I missed while setting them up
[10:34:15] * elukey lunch!
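(The istiod-vs-apiserver hypothesis above could be sanity-checked against Prometheus before removing istio. A sketch of such a query, assuming the standard kube-apiserver metric `apiserver_request_total` is scraped on this cluster; the exact metric and label names may differ by Kubernetes version:)

```promql
# Rate of requests hitting the k8s API, broken down by verb.
# A sustained high PATCH/LIST/WATCH rate appearing right after istiod
# came up would support the webhook-verification theory.
sum by (verb) (rate(apiserver_request_total{verb=~"PATCH|LIST|WATCH"}[5m]))
```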
[11:40:16] 10Machine-Learning-Team, 10Release Pipeline, 10Wikidata, 10Wikilabels, and 2 others: Stretch in docker registry forces ascii encoding - https://phabricator.wikimedia.org/T210260 (10hashar)
[16:41:45] FYI, I'm doing a livestreamed office hours on Twitch in 1 hour and 20 minutes https://www.twitch.tv/WikimediaML
[18:02:39] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks), 10Patch-For-Review: Production images for ORES/revscoring models - https://phabricator.wikimedia.org/T279004 (10ACraze) It seems I was thinking about this backwards re: monorepo. I believe we should instead be using PipelineLib to define pipelines for each...
[19:01:15] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Configure tox tests for inference service pipelines - https://phabricator.wikimedia.org/T287053 (10ACraze) I found a tox plugin that should let us do this in a monorepo: https://pypi.org/project/tox-monorepo/
[19:30:48] 10Lift-Wing, 10Machine-Learning-Team (Active Tasks): Inference Clients - https://phabricator.wikimedia.org/T287051 (10ACraze) > Instead of creating a client/ directory, could we add inference-client.sh to the same directory as service.yaml and input.json? IMHO this would help make navigating the repo a little...
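(For the monorepo tox setup mentioned in T287053, a per-service env can be expressed with standard tox options, independent of the plugin's specifics. A minimal sketch, with hypothetical directory and env names not taken from the repo:)

```ini
# Hypothetical tox.ini: one env per inference-service directory,
# each running its own test suite via changedir.
[tox]
envlist = revscoring
skipsdist = True

[testenv:revscoring]
deps =
    pytest
changedir = revscoring_model
commands = pytest tests/
```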