[07:23:44] good morning :)
[07:27:04] Machine-Learning-Team, artificial-intelligence, SRE, Service-deployment-requests: New Service Request 'open_nsfw' - https://phabricator.wikimedia.org/T250110 (Aklapper) Adding #Machine-Learning-Team per my last question
[08:17:38] so I found out the metrics that envoy, on istio sidecars, produces:
[08:17:39] envoy_cluster_discovery_wmnet_upstream_cx_length_ms_bucket{cluster_name="outbound|443||api-ro",le="3600000"} 305
[08:17:58] the discovery_wmnet bit is probably off, not sure why it is there, sigh
[08:54:44] I checked envoy's config and
[08:54:45] "cluster": {
[08:54:45] "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
[08:54:45] "name": "outbound|443||api-ro.discovery.wmnet",
[08:55:12] so maybe it is envoy's metrics rewrite logic that turns dots into something different?
[08:56:49] https://github.com/envoyproxy/envoy/issues/4357 sigh
[09:04:56] morning :)
[09:20:35] hello :)
[09:23:10] Lift-Wing, Machine-Learning-Team, Epic, Research (FY2022-23-Research-July-September): Create a language agnostic model to predict reverts on Wikipedia - https://phabricator.wikimedia.org/T314385 (achou)
[09:49:21] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move revscoring isvcs to async architecture - https://phabricator.wikimedia.org/T313915 (elukey) I discovered https://github.com/envoyproxy/envoy/issues/4357 today, which causes the following metric-weirdness to be published: ` envoy_...
[10:27:01] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Support pre-transformed inputs for Outlink topic model - https://phabricator.wikimedia.org/T315998 (achou) Got it! ML team will monitor metrics on Grafana: https://grafana.wikimedia.org/d/Rvs1p4K7k/kserve?orgId=1&var-cluster=eqiad%20p...
[10:43:48] * elukey lunch
[13:54:31] so, back to the istio/envoy sidecar problem - there seems to be no easy solution :D
[13:54:53] we could try some hacks but it is definitely adding some tech debt
[13:55:17] for example, metrics like envoy_cluster_discovery_wmnet_etc.. could be rewritten, removing the discovery_wmnet bit
[13:55:38] the label already contains a "good-enough" value, like cluster_name="outbound|443||api-ro"
[13:55:56] it is the best compromise I can think of
[14:02:02] going to ask observability if the idea is doable/sane
[16:04:37] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Move revscoring isvcs to async architecture - https://phabricator.wikimedia.org/T313915 (elukey) The SRE team uses `^envoy_(http_down|cluster_up)stream_(rq|cx).*$` as regex for the tls-proxies, so I tried to add something similar for...
[16:04:47] I added all my discoveries to --^
[16:04:55] I'll restart tomorrow :)
[16:05:02] have a nice rest of the day folks!
[17:52:31] Lift-Wing, Machine-Learning-Team (Active Tasks), Patch-For-Review: Connect Outlink topic model to eventgate - https://phabricator.wikimedia.org/T315994 (Isaac) > For Outlink topic model, one question is: do we want to follow the current revision-score schema or create a new schema? @AikoChou thanks f...
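
Note on the rename idea from 13:55: this is a minimal sketch of what "removing the discovery_wmnet bit" could look like, and how the regex quoted in the 16:04:37 task comment behaves before and after the rename. It is plain Python for illustration only, not the actual Prometheus or istio configuration. The function names and the LEAKED_FRAGMENT constant are hypothetical; only the metric name and the regex come from the log.

```python
import re

# Regex quoted in the 16:04:37 task comment (used by SRE for the tls-proxies).
TLS_PROXY_METRIC_RE = re.compile(r"^envoy_(http_down|cluster_up)stream_(rq|cx).*$")

# Assumption: the piece of the cluster name that ends up inside the metric
# name, as seen in the 08:17:39 sample. Hypothetical constant for illustration.
CLUSTER_PREFIX = "envoy_cluster_"
LEAKED_FRAGMENT = "discovery_wmnet_"


def strip_leaked_fragment(metric_name: str) -> str:
    """Remove 'discovery_wmnet_' when it directly follows 'envoy_cluster_'."""
    leaked = CLUSTER_PREFIX + LEAKED_FRAGMENT
    if metric_name.startswith(leaked):
        return CLUSTER_PREFIX + metric_name[len(leaked):]
    return metric_name


def matches_tls_proxy_regex(metric_name: str) -> bool:
    """True if the metric name would be kept by the tls-proxies regex."""
    return TLS_PROXY_METRIC_RE.match(metric_name) is not None


if __name__ == "__main__":
    raw = "envoy_cluster_discovery_wmnet_upstream_cx_length_ms_bucket"
    cleaned = strip_leaked_fragment(raw)
    print(raw, matches_tls_proxy_regex(raw))
    print(cleaned, matches_tls_proxy_regex(cleaned))
```

With the sample from 08:17:39, the raw name is not matched by the tls-proxies regex, while the renamed one is. The cluster_name label is untouched by the rename, so it still carries the "good-enough" value mentioned at 13:55:38.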