[08:07:16] hi folks!
[08:07:19] morning :)
[08:07:41] going to deploy changeprop and changeprop job queue eqiad (low volume instances) to pick up the CPU limits increase
[08:07:53] (follow up after the last deployments in codfw)
[08:10:07] aaand done
[08:16:54] the last step is deploy to Beta
[08:22:02] <_joe_> lol :)
[08:28:07] I know it is a little backward but I didn't know we had it :D
[08:45:44] <_joe_> eheh
[08:59:43] filed also https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/975738
[09:17:44] ok tried to run the container on the beta instance and it fails :D
[09:20:07] serviceops, MW-on-K8s, Observability-Logging, Developer Productivity, MediaWiki-Platform-Team (Radar): php-fpm logs from Kubernetes lack 'message' and 'normalized_message' - https://phabricator.wikimedia.org/T350430 (Clement_Goubert) Open→Resolved
[10:15:52] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (JMeybohm)
[11:25:13] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Introduce PCS cache management layer - https://phabricator.wikimedia.org/T348995 (Jgiannelos) @Eevans i cant see the file you attached in your comment.
[14:09:33] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Migrate mobileapps to k8s - https://phabricator.wikimedia.org/T350846 (Joe) As you might have noticed by the patches here, we've pivoted as traffic splitting to the canaries via kube-proxy converges over hours, not seconds which is what we'll nee...
[14:21:02] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Introduce PCS cache management layer - https://phabricator.wikimedia.org/T348995 (Eevans) >>! In T348995#9344181, @Jgiannelos wrote: > @Eevans i cant see the file you attached in your comment. That's strange, it'...
[14:22:45] serviceops, Content-Transform-Team-WIP, Page Content Service, RESTBase Sunsetting: Introduce PCS cache management layer - https://phabricator.wikimedia.org/T348995 (Jgiannelos) Works thanks! The previous file showed up as "restricted file".
[15:12:49] serviceops, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (Gehel)
[16:17:43] serviceops, MediaWiki-Engineering: Fold services recommendations into Standards for services RfC - https://phabricator.wikimedia.org/T239856 (Krinkle) I was asked to review this task, and in particular to review Eric's proposal at
serviceops, Content-Transform-Team, Maintenance-Worktype, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324 (MSantos)
[16:26:07] serviceops, Data-Platform-SRE, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (Gehel)
[16:28:02] serviceops, Data-Platform-SRE, Discovery-Search (Current work): Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis - https://phabricator.wikimedia.org/T351503 (Gehel)
[16:45:15] need to make new instances
[16:48:35] ottomata: o/
[16:48:37] re: https://gerrit.wikimedia.org/r/c/mediawiki/services/change-propagation/+/974267
[16:48:44] ya
[16:49:19] I tried to upgrade beta today but failed, if you want to try it should be sufficient to upgrade the docker image version in horizon hiera (the cp vm specific puppet config)
[16:49:29] i tried that
[16:49:34] ah nice
[16:49:36] getting Nov 20 16:49:32 deployment-docker-cpjobqueue01 docker-cpjobqueue[21261]: node[1]: ../src/node_platform.cc:68:std::unique_ptr node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
[16:50:02] do I need to maybe run a bookworm host instance...? hmm, maybe not?
[16:52:57] we run on bullseye for wikikube, not sure if it is maybe the docker version or something else config-related
[16:53:01] didn't dig into it
[16:53:45] bullseye is host on wikikube? hm, okay, well i'll try bookworm first cuz I already launched an instance...
[16:59:08] yes the k8s workers are on bullseye
[16:59:48] and they run docker.io version 20.10.5+dfsg1-1+deb11u2, meanwhile we have 18.x on buster (the current cp vm)
[17:00:43] (going afk, lemme know how it goes! I can keep working on it tomorrow in case)
[17:07:10] elukey: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/975862
[17:42:22] I dunno elukey i'm failing, it does look like things are working better on my new node; deployment-changeprop-1, and changeprop is running. but i'm having trouble getting cpjobqueue running.
[17:42:28] was trying to run them both on the same node cuz why not?
[17:42:37] got past the port collision issue
[17:42:48] now no idea, some error in compiling hyperswitch templates? off
[17:43:00] the method for applying changeprop configs in beta is pretty convoluted
[17:45:07] giving up on running cpjobqueue on same node, will just try changeprop for now...
[17:52:43] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 4 parsoid hosts - https://phabricator.wikimedia.org/T349873 (Jhancock.wm)
[19:20:12] In case anyone has the desire to understand change-propagation a little bit better, I just posted https://wikitech.wikimedia.org/wiki/Changeprop/Memorandum-2023-11 , my own journey trying to understand it. I probably have things wrong there! Let me know and/or make some corrections!
[20:29:55] serviceops, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install 3 sessionstore hosts - https://phabricator.wikimedia.org/T349875 (Eevans)
[20:30:42] serviceops, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install 3 sessionstore hosts - https://phabricator.wikimedia.org/T349875 (Eevans) >>! In T349875#9286181, @RobH wrote: >>>! In T348021#9281147, @Kappakayala wrote: >> @Clement_Goubert / @Joe could one of you help with the racking details? > > I'v...
[20:31:47] serviceops, DC-Ops, SRE, ops-eqiad: Q2:rack/setup/install 3 sessionstore hosts (eqiad) - https://phabricator.wikimedia.org/T349875 (Eevans)
[20:32:53] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (Eevans)
[20:36:40] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (Eevans)
[20:47:52] serviceops, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (eqiad) - https://phabricator.wikimedia.org/T349875 (Eevans)
[20:48:06] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (Eevans)
[21:50:08] serviceops, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (eqiad) - https://phabricator.wikimedia.org/T349875 (RobH) a:Clement_Goubert→None
[21:51:12] serviceops, DC-Ops, SRE, ops-eqiad, Patch-For-Review: Q2:rack/setup/install 3 sessionstore hosts (eqiad) - https://phabricator.wikimedia.org/T349875 (RobH)
[22:35:15] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (RobH)
[22:36:02] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (RobH) a:Clement_Goubert→None