[06:12:04] <_joe_> bd808: yes, but tbh that job runs in 2 minutes on my desktop, which is now 5.5 years old
[06:12:50] <_joe_> so I was trying to avoid going through a costly process of creating heuristics, which usually comes with bugs
[06:13:07] <_joe_> to give you an example: charts can depend on each other
[06:13:33] <_joe_> and we already have some heinous hacks to be able to determine that
[06:17:20] serviceops, SRE, Patch-For-Review: Move Kafka main to the new intermediate PKI CA - https://phabricator.wikimedia.org/T319372 (elukey) All brokers have the new truststore, so they can validate certs emitted by PKI. Next steps: 1) Upgrade kafka-main1001 to PKI, and monitor if any client fails to conn...
[06:58:14] serviceops, Patch-For-Review: Install wmf-certificates on the envoy docker image - https://phabricator.wikimedia.org/T333551 (JMeybohm) Open→Resolved I've bumped the default envoy version to 1.18.3-2 for all clusters, you should be good to go.
[13:05:36] serviceops, RESTbase Sunsetting, Epic, Platform Engineering Roadmap: Survey RESTBase services and find which ones accesses Parsoid via RESTBase - https://phabricator.wikimedia.org/T333536 (DAlangi_WMF)
[13:30:53] serviceops, Machine-Learning-Team, SRE, Language-Team (Language-2023-April-June ), Service-deployment-requests: New Service Deployment Request: NNLB-200 for machine translation - https://phabricator.wikimedia.org/T329971 (Pginer-WMF)
[14:41:08] serviceops, Data-Engineering, Data Pipelines (Sprint 11), Epic, Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JArguello-WMF)
[14:41:11] serviceops, Data-Engineering-Planning, Data Pipelines (Sprint 11), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JArguello-WMF)
[14:42:52] serviceops, Data-Engineering, Epic, Event-Platform Value Stream (Sprint 11), Patch-For-Review: New Service Request: flink-kubernetes-operator - https://phabricator.wikimedia.org/T333464 (JArguello-WMF)
[14:43:09] serviceops, Data-Engineering-Planning, Event-Platform Value Stream (Sprint 11), Patch-For-Review, Service-deployment-requests: New Service Request mediawiki-page-content-change-enrichment - https://phabricator.wikimedia.org/T330507 (JArguello-WMF)
[15:06:43] anyone got a hint for me as to why I can't curl the prom port 9999 in dse-k8s-eqiad flink-operator namespace?
[15:07:06] PodSelector: app.kubernetes.io/name=flink-kubernetes-operator
[15:07:06] Allowing ingress traffic:
[15:07:07] To Port: 9999/TCP
[15:07:07] From: (traffic not restricted by source)
[15:11:07] hm maybe it's just my app not listening...it should be...but maybe that is it.
[15:17:05] i'm also trying to figure out why flink-operator namespace doesn't show up in the list of k8s namespaces in the Kubernetes Pods dashboard? https://grafana-rw.wikimedia.org/d/000000473/kubernetes-pods?orgId=1&var-cluster=eqiad+prometheus%2Fk8s-dse&from=1680270857261&to=1680274457261
[15:17:05] 10:54:56
[15:17:05] it is def a namespace in dse-k8s-eqiad:
[15:17:05] 10:55:33
[15:17:05] https://www.irccloud.com/pastebin/s33GTltO/
[15:25:16] ottomata: o/ have you tried with nsenter to see if that port returns metrics?
[15:25:45] also, the pods need to have the right annotations to be discoverable
[15:26:31] ok you have those
[15:26:31] prometheus.io/port: 9999
[15:26:31] prometheus.io/scrape: true
[15:28:17] afaics on the pod this is the only port listening
[15:28:18] tcp6 0 0 :::8085 :::* LISTEN 3548838/java
[15:28:21] ottomata: --^
[16:26:13] serviceops, Thumbor, Kubernetes: Investigate whether configuring hardware P-states would help with performance on k8s - https://phabricator.wikimedia.org/T333317 (kamila) **TL;DR: "It's probably fine."** We currently have the following processors on our k8s nodes: ` kubernetes[2018-2024].codfw.wmnet...
[16:31:57] elukey: thanks okay. 8085??
[16:32:22] elukey: how did you get that?
[16:33:41] elukey: i think i'm trying to solve two problems. 1: where are my app prometheus metrics (from 9999). and 2. where are the usual k8s metrics for this pod?
[19:44:36] _joe_: I'm pretty sure our 5+ year old laptops have more IOPS and CPU than any CI worker, but I understand why getting tricky with the dependency tree to speed things up is a fragile solution.
[19:46:03] <_joe_> bd808: I have done it for the puppet repo and I'm resisting the urge to make the same mistake again, that's all :P
[19:48:40] Part of my frustration is that we have to wait for the same slow tests twice, once for the v+2 to unlock cr+2 and then the exact same wait again before the merge happens. This frustrates me in every repo, but especially in this one.
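
Note on the 15:25 to 15:28 exchange: the pod is annotated with prometheus.io/port 9999, but the only socket reported as listening is 8085. A minimal sketch of that comparison in Python follows. The function names `listening_ports` and `annotated_port_is_listening` are hypothetical, the annotations are passed in as a plain dict of strings, and the parser assumes netstat-style columns like the line quoted in the log (proto, recv-q, send-q, local address, foreign address, state). It is an illustration of the check, not the tooling anyone in the channel actually used.

```python
from __future__ import annotations


def listening_ports(netstat_output: str) -> set[int]:
    """Collect local ports from netstat-style lines whose state is LISTEN."""
    ports: set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed column layout: proto, recv-q, send-q, local, foreign, state, ...
        if len(fields) >= 6 and fields[5] == "LISTEN":
            # The port is whatever follows the last colon, e.g. ":::8085" -> "8085".
            port = fields[3].rsplit(":", 1)[-1]
            if port.isdigit():
                ports.add(int(port))
    return ports


def annotated_port_is_listening(annotations: dict[str, str], netstat_output: str) -> bool:
    """Return True only if scraping is enabled and the annotated port is open."""
    if annotations.get("prometheus.io/scrape") != "true":
        return False
    port = annotations.get("prometheus.io/port", "")
    return port.isdigit() and int(port) in listening_ports(netstat_output)


if __name__ == "__main__":
    # Values copied from the log above.
    pod_annotations = {"prometheus.io/port": "9999", "prometheus.io/scrape": "true"}
    netstat_line = "tcp6 0 0 :::8085 :::* LISTEN 3548838/java"
    print(listening_ports(netstat_line))
    print(annotated_port_is_listening(pod_annotations, netstat_line))
```

With the values from the log, the annotated port does not appear among the listening ports, which matches the "maybe it's just my app not listening" guess at 15:11.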