[06:24:22] hnowlan: interesting, it would match with what I am seeing in the profiling. I think that the amount of cpu used is not dramatic and it shouldn't worry us too much, but I'll try to profile (if possible, not sure if nodejs 10 allows it) the current changeprop to see differences (as Janis suggested). Then we will surely have to take a decision, namely either drilling down the code or accept/test
[06:24:28] the new performances.
[06:25:01] the other dimension that we should consider is that we are using a more up-to-date version of librdkafka/noderdkafka, f
[06:25:15] (in the new image)
[06:33:55] serviceops, Observability-Metrics, Patch-For-Review, SRE Observability (FY2023/2024-Q2): Identify path forward for k8s deployment of prometheus-statsd-exporter - https://phabricator.wikimedia.org/T343025 (Joe) In progress→Resolved @colewhite prometheus statsd exporter can be now installed...
[09:23:34] serviceops, MW-on-K8s, Observability-Logging, Patch-For-Review: Some apache access logs are invalid json - https://phabricator.wikimedia.org/T340935 (CodeReviewBot) oblivian merged https://gitlab.wikimedia.org/repos/sre/glogger/-/merge_requests/2 Fix problems with non-utf8 encoded escape sequences
[09:46:15] serviceops, Patch-For-Review: [Fallback Task] Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm) a:akosiaris→JMeybohm
[10:15:55] serviceops, Patch-For-Review: [Fallback Task] Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[10:24:10] hnowlan: o/ - something interesting: in https://phabricator.wikimedia.org/T347477#9279129 I noticed a similar cpu load increase for eventgate, and the tools are similar (webapps using node-rdkafka etc..)
[10:28:13] elukey: oof, those are pretty spicy jumps. looks like there's a little bit of a drop in throughput as well? hard to tell
[10:31:36] yes something weird is going on as well
[10:31:51] but we are jumping years of versions in one go :D
[10:32:27] <_joe_> so things we could try is installing an older librdkafka version to see if the problem is there?
[10:32:38] <_joe_> but it could also be we need to allow more cpu for nodejs itself
[10:33:14] <_joe_> I also don't think the model of service-template-node with the controller process etc etc really works well with the model of modern nodejs
[10:33:55] I wouldn't be surprised :/
[10:47:19] _joe_ I thought about downgrading librdkafka but its version is tight with the node-rdkafka one, I'd keep what upstream suggests..
[10:59:42] <_joe_> elukey: oh yeah you'd need to downgrade that as well
[12:00:21] serviceops, Thumbor, Patch-For-Review: Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881 (hnowlan)
[12:33:55] serviceops, Patch-For-Review: [Fallback Task] Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[12:34:14] serviceops, Patch-For-Review: [Fallback Task] Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[12:59:54] serviceops, GrowthExperiments-Homepage, GrowthExperiments-ImpactModule, SRE, and 3 others: RefreshUserImpactJob consumes too many file descriptors - https://phabricator.wikimedia.org/T344428 (Urbanecm_WMF) Thanks @joe for the FD limits change! All tests I did so far suggest that the errors tracke...
[13:06:59] serviceops, Patch-For-Review: [Fallback Task] Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[13:14:26] serviceops, Observability-Metrics, Patch-For-Review, SRE Observability (FY2023/2024-Q2): Identify path forward for k8s deployment of prometheus-statsd-exporter - https://phabricator.wikimedia.org/T343025 (lmata) Thank you @Joe !
[13:44:39] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (JMeybohm)
[13:45:01] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm) p:Triage→High
[14:03:32] serviceops, MW-on-K8s, MediaWiki-Configuration, MediaWiki-Engineering, and 4 others: Uncaught ConfigException: Failed to load configuration from etcd - https://phabricator.wikimedia.org/T346971 (SLopes-WMF)
[14:07:13] serviceops, SRE, ops-eqiad: deploy1002 lost connectivity - https://phabricator.wikimedia.org/T349587 (Jclark-ctr) a:Jclark-ctr
[14:07:30] serviceops, SRE, ops-eqiad: deploy1002 lost connectivity - https://phabricator.wikimedia.org/T349587 (Jclark-ctr) It is reachable and Taavi took care of switch interface it was missed by Valery i will work with her and remind her of it do you still see any other issue @ayounsi prior to me closing...
[14:50:28] serviceops, Data-Engineering, Event-Platform: Upgrade change propagation to nodejs18 - https://phabricator.wikimedia.org/T348950 (elukey) Profiled changeprop on nodejs 10 in staging as well: https://phabricator.wikimedia.org/P53054 One thing that I noticed is this: node 18: ` ticks total nonl...
[15:03:50] serviceops, Data-Engineering, Machine-Learning-Team: URI to use when hitting the Pageviews API on rest-gateway - https://phabricator.wikimedia.org/T349722 (elukey)
[15:18:32] serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 67 - https://phabricator.wikimedia.org/T345561 (JMeybohm)
[15:43:17] serviceops, Data-Engineering, Machine-Learning-Team: URI to use when hitting the Pageviews API on rest-gateway - https://phabricator.wikimedia.org/T349722 (hnowlan) Documentation fail on my part - this endpoint requires the host header of "wikimedia.org" be set. This is to force clients at the edge t...
[15:44:14] serviceops, Data-Engineering, Machine-Learning-Team: URI to use when hitting the Pageviews API on rest-gateway - https://phabricator.wikimedia.org/T349722 (elukey) Open→Resolved a:elukey Right this works! ` curl https://rest-gateway.discovery.wmnet:4113/wikimedia.org/v1/metrics/pageviews...
[16:16:36] serviceops, SRE, ops-eqiad: deploy1002 lost connectivity - https://phabricator.wikimedia.org/T349587 (ayounsi) Open→Resolved All good!
[16:36:10] serviceops, Abstract Wikipedia team, function-evaluator: Explore providing a writable RAM disk / etc. for the function-evaluator instances in k8s so they can write cache and transient operational material there - https://phabricator.wikimedia.org/T349738 (Jdforrester-WMF)
[17:04:09] serviceops, CirrusSearch, MediaWiki-Configuration, MediaWiki-Engineering, Discovery-Search (Current work): Provide a method for internal services to run api requests for private wikis - https://phabricator.wikimedia.org/T345185 (aaron)
[17:09:44] serviceops, CirrusSearch, MediaWiki-Configuration, MediaWiki-Engineering, Discovery-Search (Current work): Provide a method for internal services to run api requests for private wikis - https://phabricator.wikimedia.org/T345185 (aaron) a:aaron
[22:52:46] serviceops, Commons, Traffic, Wikimedia-Site-requests: Enforce upload rate limits for bots on commons - https://phabricator.wikimedia.org/T248177 (Pppery)