[06:38:56] good morning, eventstreams fully on node18 :)
[06:40:48] from the codfw metrics (deployed yesterday) I see only a little bump in the node's eventrouter reported latency, I'll follow up with the event platform folks
[07:29:26] serviceops, MW-on-K8s: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (JMeybohm) k8s-controller-sidecars uses exec to send SIGTERM to PID 1 of the containers (it executes "sh -c 'kill -s TERM 1'"). So this ultimately requires all pods to have a Debian bas...
[08:19:50] serviceops, docker-pkg: Attach git info metadata to docker images - https://phabricator.wikimedia.org/T345070 (JMeybohm) /link T287130
[08:24:55] serviceops, docker-pkg: Attach git info metadata to docker images - https://phabricator.wikimedia.org/T345070 (JMeybohm)
[08:38:45] serviceops, Release-Engineering-Team, docker-pkg: Attach opencontainers image metadata to docker images - https://phabricator.wikimedia.org/T345070 (JMeybohm) p:Triage→Medium
[08:40:43] serviceops, Release-Engineering-Team, docker-pkg: Attach opencontainers image metadata to docker images - https://phabricator.wikimedia.org/T345070 (JMeybohm)
[08:53:25] serviceops, Content-Transform-Team-WIP, RESTBase Sunsetting, Wikifeeds, and 2 others: Switchover outgoing wikifeeds parsoid requests - https://phabricator.wikimedia.org/T347027 (daniel) >>! In T347027#9230743, @Jgiannelos wrote: > I tried the workaround but this gets even more problematic when th...
[09:28:54] serviceops, Abstract Wikipedia team, function-evaluator: Split the monolithic function-evaluator service up in production so we have differently-scalable pods for python 3.7 vs. python 3.8 vs. … - https://phabricator.wikimedia.org/T343389 (JMeybohm)
[09:29:12] serviceops, Abstract Wikipedia team, function-evaluator: Split the monolithic function-evaluator service up in production so we have differently-scalable pods for python vs. node - https://phabricator.wikimedia.org/T343388 (JMeybohm) Resolved→Open Copying stuff from slack so we don't forget:...
[11:14:44] serviceops, Abstract Wikipedia team, function-evaluator, Patch-For-Review: Split the monolithic function-evaluator service up in production so we have differently-scalable pods for python vs. node - https://phabricator.wikimedia.org/T343388 (JMeybohm) >>! In T343388#9249170, @JMeybohm wrote: > Co...
[11:33:16] serviceops, Abstract Wikipedia team, function-evaluator, Patch-For-Review: Split the monolithic function-evaluator service up in production so we have differently-scalable pods for python vs. node - https://phabricator.wikimedia.org/T343388 (JMeybohm) I've added some patches that should make hand...
[12:27:32] serviceops, Prod-Kubernetes, Kubernetes: Remove the use of :latest image tags in production - https://phabricator.wikimedia.org/T348856 (JMeybohm) p:Triage→Medium
[12:27:58] serviceops, CX-deployments, MinT, Prod-Kubernetes, Kubernetes: Remove the use of :latest image tags in production - https://phabricator.wikimedia.org/T348856 (JMeybohm)
[14:55:19] elukey: it was you that proposed to test using NVMEs with memcached right?
[14:55:50] joe: maybe, a long time ago :)
[14:56:11] https://www.usenix.org/system/files/srecon23emea-slides_kopping.pdf grafana did it with great profit!
[14:56:36] The speaker offered his help if we get into doing it; I think it's worth it
[14:59:29] yeah extstore is very appealing
[14:59:34] and neat
[15:29:29] serviceops, Infrastructure-Foundations: Container image reports in debmonitor are broken - https://phabricator.wikimedia.org/T348876 (JMeybohm)
[15:29:50] serviceops, Infrastructure-Foundations: Container image reports in debmonitor are broken - https://phabricator.wikimedia.org/T348876 (JMeybohm) p:Triage→High
[15:43:31] serviceops, Infrastructure-Foundations: Container image reports in debmonitor are broken - https://phabricator.wikimedia.org/T348876 (JMeybohm)
[17:11:19] ^^^^^^^ wow, that's cool
[17:45:02] serviceops, AQS2.0, Cassandra, SRE, Service-deployment-requests: AQS 2.0 differentially private pageviews deploy API - https://phabricator.wikimedia.org/T343855 (VirginiaPoundstone) Had a quick chat with @jAllemandou to think through some open questions we need to answer across teams. ====O...
[18:22:23] you could try the same nvme we've been using lately on the cache nodes, since it's operationally-familiar and a standard dell offering
[18:22:52] in our R450 nodes, we can fit 2x 6.4TB ones in a machine (they each take a PCIe slot)
[18:23:27] they also come in a 1.6TB model if that's all that's needed
[18:25:46] https://phabricator.wikimedia.org/T341588 was the most-recent US dell quote that had them
[18:29:19] (there are of course smaller and simpler nvmes too, including plain sata-plugged drives. these are very fast+large models for mixed-use workload)
[18:32:12] under the hood of dell's ordering system, what the drives really are is these samsungs: https://semiconductor.samsung.com/us/ssd/enterprise-ssd/pm1733-pm1735/