[07:29:23] pfischer: I think that T386406 and T386056 can be closed. Could you confirm?
[07:29:24] T386406: Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406
[07:29:24] T386056: Resolve conflict between GitLab CI automated package deployment token variable names - https://phabricator.wikimedia.org/T386056
[07:31:50] gehel: looking, I’ll update my standup notes too
[07:38:58] gehel: T386056 is concerned with workflow_utils, a repo maintained by DPE, that provides GitLab CI snippets for building pipelines. It is related to my Maven CI Components but covers more than that (and other build systems/processes as well). I had a discussion with ottomata that we should consolidate those repositories at some point, but for now, this ticket is not solved by my efforts (they still use a dedicated variable
[07:38:58] holding the access token instead of the group-provided one).
[07:38:58] T386056: Resolve conflict between GitLab CI automated package deployment token variable names - https://phabricator.wikimedia.org/T386056
[07:43:59] gehel: regarding T386406, yes I’d say it can be closed, I left a final comment regarding a maven-test-project
[07:44:00] T386406: Create Gitlab CI templates for JVM packages - https://phabricator.wikimedia.org/T386406
[13:02:35] \o
[14:38:36] .o/
[14:39:38] going to kick off full-cluster reindexes of cirrussearch
[14:47:47] ebernhardson SGTM
[14:49:02] err, sigh...had to exclude commonswiki and testcommonswiki. They throw an exception when fetching the expected indices because the `dnsdisc` cluster isn't set up for that
[14:49:20] (it's trying to look up shard counts for the _file index, but we didn't configure those for dnsdisc)
[14:56:04] i'm actually not sure what the right way to solve that is...we need another state or flag for connections.
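[Editor's note: the exclusion above could be handled by skipping wikis whose target cluster has no shard configuration instead of letting the lookup throw. A minimal sketch under assumed names; `expected_indices`, `shard_counts`, and the config shape are illustrative, not CirrusSearch's actual API.]

```python
# Hypothetical sketch: skip a wiki when its cluster config lacks shard
# counts for one of its indices, rather than raising mid-reindex.
# Function name and config layout are assumptions for illustration.
def expected_indices(wiki, cluster_config):
    """Return (index, shards) pairs for this wiki, or None to skip it."""
    indices = []
    for suffix in ("content", "general", "file"):
        index = f"{wiki}_{suffix}"
        shards = cluster_config.get("shard_counts", {}).get(index)
        if shards is None:
            # e.g. the `dnsdisc` cluster has no shard count for commonswiki_file
            return None
        indices.append((index, shards))
    return indices

dnsdisc = {"shard_counts": {"commonswiki_content": 8, "commonswiki_general": 4}}
```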
[14:56:04] The thing is we need a list of writable clusters to limit the query to, but in cirrus a writable cluster means it would actually send writes
[14:56:18] so we need a writable cluster tag, but without sending writes....needs some naming bike shedding :P
[14:57:18] Serious (but also possibly dumb) question: should we consider rolling back the changes and going back to the bad old way of doing things?
[14:57:28] inflatador: which changes?
[14:57:54] ebernhardson rip out all the dns discovery stuff and just use mwconfig
[14:58:13] inflatador: nah, this is totally fine. The only hard part about fixing this bit is an appropriate name that doesn't mislead people in the future
[14:58:29] if i called it `frobnating` or some other arbitrary word i could have it done in an hour or two
[15:00:54] * inflatador needs to think about the DC-specific metrics collection stuff more
[15:06:09] ahh, right the metrics are another annoyance there
[15:06:34] i suspect the answer that other people are using is envoy on the servers terminating the requests and providing metrics
[15:06:55] but that also sounds like a couple weeks of work :S
[15:08:47] maybe for the cirrus side i add a silly `pseudo => true` flag to the dnsdisc connection
[15:10:07] ebernhardson ACK, I'll ask around next week. We should be using envoy for TLS termination anyway, so if this is what pushes us there I'm fine w/that
[15:10:39] i'll try and check to see if i can find anyone else doing that, basically find some proof we would get the metrics we want if we switched
[15:12:06] Cool.
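[Editor's note: the naming problem discussed above — a connection that should count as a cluster for query scoping but never receive writes — can be sketched as below. The `pseudo` flag mirrors the `pseudo => true` idea from the chat; the class and function names are assumptions, not CirrusSearch code.]

```python
# Sketch: "writable" means cirrus actually sends writes there; the assumed
# `pseudo` flag marks a connection (e.g. dnsdisc) that should scope queries
# without ever being a write target.
from dataclasses import dataclass

@dataclass
class Connection:
    name: str
    writable: bool = False  # cirrus will actually send writes here
    pseudo: bool = False    # counts for query scoping, but never written to

def write_targets(connections):
    return [c.name for c in connections if c.writable and not c.pseudo]

def query_clusters(connections):
    # both real writable clusters and pseudo ones limit the query scope
    return [c.name for c in connections if c.writable or c.pseudo]

conns = [
    Connection("eqiad", writable=True),
    Connection("codfw", writable=True),
    Connection("dnsdisc", pseudo=True),
]
```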
[15:12:06] I think we are the last ones using nginx for TLS termination anyway, so we'll need to pay off that tech debt at some point
[16:16:55] \o
[16:24:07] really not sure...for example we have envoy on wdqs hosts, but not finding the metrics i would want
[16:24:47] i think it does, probably, just not finding it yet :P
[16:27:36] yeah, the answer is probably in some existing dashboard but I haven't found it yet ;P
[16:29:18] oh duh...i'm going about this completely backwards. Instead of poking around in grafana, should be asking the prom exporter on a wdqs host what metrics it exposes
[16:30:27] I know you've looked at the trafficserver metrics before, but any chance something like the query in this panel would work? https://grafana.wikimedia.org/goto/6hNXXuENR?orgId=1
[16:31:39] inflatador: i doubt it, the requests won't flow through trafficserver iiuc
[16:32:04] the envoy prom exporter on wdqs exports stats about it talking to the local instance, i guess that's close enough
[16:32:06] ex: envoy_cluster_external_upstream_rq_time_bucket{envoy_cluster_name="local_port_80",le="5"}
[16:32:37] it would be nice if it had a better name than "local_port_80", but i suppose for search our port numbers will be unique enough
[16:32:59] NICE
[16:33:36] I'll get a ticket started for the envoy TLS stuff
[16:35:24] i suppose one thing we would miss with those stats would be the latency bump of going cross-dc, since the metrics would be the metrics between envoy and opensearch over localhost, but that's probably a wash
[16:35:28] better sometimes, worse others
[16:37:45] Yeah, I bet there is a way to get that info too, although I'm already out of my depth here ;). I can ask ServiceOps if they have any ideas
[16:38:46] we would still have the latency bump in the top level dnsdisc stats.
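[Editor's note: `envoy_cluster_external_upstream_rq_time_bucket` above is a cumulative Prometheus histogram; in practice you'd read it with PromQL's `histogram_quantile()`. A sketch of the same interpolation in Python, with made-up bucket bounds and counts:]

```python
# Estimate a latency quantile from cumulative histogram buckets, in the
# style of PromQL's histogram_quantile() (linear interpolation within the
# bucket containing the rank). Bucket data below is illustrative only.
def histogram_quantile(q, buckets):
    """buckets: sorted list of (upper_bound_ms, cumulative_count)."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # interpolate between the previous and current bucket bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# e.g. counts observed for le="5", le="10", le="25", le="50" (in ms)
buckets = [(5, 60), (10, 90), (25, 99), (50, 100)]
```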
[16:38:46] I guess i'll have to ponder, might want both the dnsdisc stats from the cirrus side, and the per-cluster stats from envoy on the servers to know per-cluster health
[16:45:19] I created T398070 for switching to envoy TLS termination, feel free to add/change anything I missed
[16:45:20] T398070: Migrate CirrusSearch TLS termination from nginx to envoy - https://phabricator.wikimedia.org/T398070
[16:47:16] looks good, thanks!
[17:37:44] `cirrus-streaming-updater job in eqiad (k8s) is running without any taskmanagers `...checking on this now
[17:39:57] ebernhardson ^^ I think this is because the `flink-app-consumer-search-backfill` release doesn't have a task manager associated. Is this OK or should I take a closer look?
[17:48:28] inflatador: that's all fine, it should probably ignore the backfill releases
[17:49:51] ebernhardson NP, I'll make a note (and another one to enable that kind of monitoring for rdf-streaming-updater if we don't have it already)
[19:02:08] Lunch
[19:02:22] Planning on taking a look at the elasticsearch spicerack dependency stuff when I’m back
[19:04:31] ryankemper ACK, thanks
[21:04:27] back
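[Editor's note: the "ignore the backfill releases" fix above amounts to filtering the alert, not removing it. A minimal sketch; the data shape and release names are illustrative assumptions:]

```python
# Sketch: alert on flink releases running with zero taskmanagers, but skip
# backfill releases, which are expected to sit idle. Input shape is assumed.
def releases_to_alert(releases):
    """releases: dict of release name -> taskmanager count."""
    return sorted(
        name
        for name, taskmanagers in releases.items()
        if taskmanagers == 0 and "backfill" not in name
    )

releases = {
    "flink-app-consumer-search": 2,
    "flink-app-consumer-cloudelastic": 0,        # would alert
    "flink-app-consumer-search-backfill": 0,     # expected: ignored
}
```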