[07:28:47] hey folks
[07:29:06] I worked with eventplatform folks to upgrade eventstreams to nodejs18 and bookworm (!!)
[07:29:35] https://stream-beta.wmflabs.org/v2/ui/#/ has been running for a while with it
[07:31:01] so I'd deploy es-internal and monitor it
[07:38:24] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/964848/
[07:49:05] elukey: uh do we have a base node18 image?
[07:49:07] TIL
[07:49:21] if you did it, wikilove :P
[07:49:29] I did it yes :)
[07:50:02] I am also talking with event platform to onboard changeprop for base maintenance
[07:50:20] now that they have a dedicated team it may be a good fit
[07:51:08] maybe in the near future we'll run only node18 in production :D
[07:51:23] (they are also working to move eventgate to node18)
[08:07:50] yes
[08:07:56] that's amazing
[10:28:39] \o We (ml-team) are working on running recommendation-api on LiftWing. One of the services it uses is mw-api-int-async-ro (from hieradata/common/profile/services_proxy/envoy.yaml). Before we start sending traffic, I wanted to check with you if that is ok. We expect it to be low-traffic, nothing special.
[10:35:26] serviceops, Prod-Kubernetes: Rethink kubernetes etcd storage - https://phabricator.wikimedia.org/T348466 (JMeybohm)
[10:40:06] klausman: probably depends on the definition of "low-traffic" :) but overall I don't see an issue
[10:40:21] recommendation-api-ng that is, right?
[10:40:25] yes
[10:41:36] so around the same api request volume as the recommendation-api-lg?
[10:44:05] Yes, though initially lower, of course.
[10:51:11] it's linkrecommendation?
[10:51:21] (the og service)
[10:52:23] serviceops, PageViewInfo: Daily pageview/PageViewInfo errors on jobrunners - https://phabricator.wikimedia.org/T348517 (hnowlan)
[10:54:01] According to my metrics linkrecommendation does about 0 rps to mw-api-int-async so if you do less than that it'll obviously be fine x)
[10:54:09] i think linkrecommendation is something different :D
[10:54:20] Ah no, recommendation-api it is
[10:54:31] Well that's 5rps peak
[10:54:36] We'll be ok :')
[11:05:14] serviceops, PageViewInfo: Daily pageview/PageViewInfo errors on jobrunners - https://phabricator.wikimedia.org/T348517 (Clement_Goubert) According to logstash, this started July 12th https://logstash.wikimedia.org/goto/026a95fbb8c9ae9e313b41440745d607 {F38186503}
[11:32:55] serviceops, Infrastructure-Foundations, SRE: etcd increased QGET traffic since January 2023 - https://phabricator.wikimedia.org/T348525 (Volans) p:Triage→Medium
[14:37:28] serviceops, Content-Transform-Team-WIP, Maintenance-Worktype, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324 (Jgiannelos) a:Jgiannelos
[14:51:51] serviceops, Prod-Kubernetes, Kubernetes: Create kube-state-metrics docker image - https://phabricator.wikimedia.org/T343801 (kamila) Open→In progress
[14:51:53] serviceops, Prod-Kubernetes, Kubernetes, User-jijiki: Deploy kube-state-metrics - https://phabricator.wikimedia.org/T264625 (kamila)
[14:57:40] serviceops, Content-Transform-Team-WIP, Maintenance-Worktype, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324 (Jgiannelos) Major upgrades of tegola + dependencies + base debian image have been live for quite some...
[22:23:19] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Move 25% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T348122 (matmarex) The Kubernetes work so far has caused problems with cross-wiki Echo notifications (see T223413, T342201). Please help resolve this before...