[07:50:09] jayme: o/
[07:50:22] going to rollout the new istio to dse and ml-serve
[08:01:50] serviceops, Machine-Learning-Team, SRE: Import and deploy istio 1.15.7 - https://phabricator.wikimedia.org/T334068 (elukey)
[08:02:49] serviceops, Machine-Learning-Team, SRE: Import and deploy istio 1.15.7 - https://phabricator.wikimedia.org/T334068 (elukey) Rollout to ml-serve/aux/dse completed. To keep archives happy, I used `istioctl-1.15.7 upgrade -f config.yaml` Last step: rollout to wikikube clusters
[08:03:06] serviceops, Machine-Learning-Team, SRE: Import and deploy istio 1.15.7 - https://phabricator.wikimedia.org/T334068 (elukey) a:elukey→JMeybohm
[08:42:43] serviceops, MW-on-K8s, Patch-For-Review: Monitor all mw-on-k8s deployments with httpbb - https://phabricator.wikimedia.org/T334456 (Clement_Goubert) In progress→Resolved
[08:42:53] serviceops, MW-on-K8s, SRE, Traffic, and 3 others: Migrate internal traffic to k8s - https://phabricator.wikimedia.org/T333120 (Clement_Goubert)
[08:47:50] elukey: cool. will do the same on wikikube now
[08:48:30] super
[08:48:38] (also done aux and dse)
[08:53:46] <3
[08:59:28] serviceops, Machine-Learning-Team, SRE: Import and deploy istio 1.15.7 - https://phabricator.wikimedia.org/T334068 (JMeybohm) Open→Resolved Thanks! Wikikube is done as well
[08:59:38] serviceops, Machine-Learning-Team, SRE: Import and deploy istio 1.15.7 - https://phabricator.wikimedia.org/T334068 (JMeybohm)
[09:00:35] Can I get a +1 for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/905941 ?
[09:04:38] claime: does the listener have the same port as mwapi-async?
[09:04:46] jayme: yes
[09:04:49] (on purpose)
[09:04:59] yes, yes.
sure
[09:05:10] +1ed
[09:05:24] thx <3
[09:05:45] I'm going to migrate the two for which I have actual test instructions (linkrec and cxserver)
[09:05:55] cool
[09:06:24] I'm curious as to when the first person will include two conflicting listeners in their config :D
[09:07:26] >_>
[09:08:31] (tbh I anticipate it's going to happen, it's a matter of moving traffic quickly enough towards mw-on-k8s that we can deprecate the mwapi endpoints :P)
[09:26:40] Hmmm >_>
[09:27:15] Not exactly going as planned, I think I'm missing SAN for mw-api-int.discovery.wmnet
[09:31:54] that sounds reasonable
[09:34:10] serviceops, MW-on-K8s: Add mw-on-k8s deployments to appservers-rw.discovery.wmnet certificate SAN - https://phabricator.wikimedia.org/T334561 (Clement_Goubert)
[09:34:25] serviceops, MW-on-K8s: Add mw-on-k8s deployments to appservers-rw.discovery.wmnet certificate SAN - https://phabricator.wikimedia.org/T334561 (Clement_Goubert) p:Triage→High
[10:02:51] hi, I have a few Puppet changes for the deployment server which are to populate scap dsh files from a Puppet query instead of a manually crafted list of hosts. They should be entirely noop
[10:02:57] ex https://gerrit.wikimedia.org/r/c/operations/puppet/+/893483
[10:03:16] may someone pair with me this week to apply them on the deployment server?
:]
[10:04:38] hashar: be with you as soon as I'm done with j.nuche's change
[10:05:09] \o/
[10:12:41] ok I'm done :)
[10:14:37] hashar: Running PCC for deployment servers
[10:15:31] ah yeah
[10:15:46] well I don't know whether the PCC is smart enough to handle puppet db queries
[10:16:09] there are two other child changes, I can't remember why I have split them
[10:16:44] Compilation fails, I think it's just a matter of quoting the resource name
[10:16:53] grr
[10:17:03] I have it pulled, I'll change it
[10:17:21] hieradata/common/scap/dsh.yaml: pdb_query: "Class[Profile::Kubernetes::Mediawiki_runner] and User[mwdeploy]{ensure=present}"
[10:17:24] that is the previous usage
[10:17:56] then I don't know anything about puppet db queries :/
[10:20:16] serviceops, Platform Team Workboards (Platform Engineering Reliability): Replace Nutcracker - https://phabricator.wikimedia.org/T333019 (hnowlan) >>! In T333019#8772531, @kamila wrote: > I am inclined to go with Envoy: it supports our use cases, has good performance (esp. with TLS), seems to have the bes...
[10:20:37] Hmm
[10:22:04] claime: if the query is wrong, just give up
[10:22:08] I will dig more, sorry :]
[10:22:25] hashar: I'm not sure it's wrong
[10:22:36] We'll see if it breaks again :D
[10:23:02] it should work, it may need the {ensure=present} selector, maybe
[10:24:24] single quotes for the win!
[10:24:28] Yep
[10:24:35] Does the PCC change look good to you?
[10:24:45] checking
[10:25:09] yeah looks good
[10:25:22] there are some extra hosts added which are being prepared
[10:25:33] and if they have the scap::target applied they have everything needed
[10:25:44] once merged, I will try a scap deploy on them
[10:25:48] ack
[10:25:50] merging
[10:26:14] that will first need a puppet run on the deployment server though
[10:26:22] ( deploy2002.codfw.wmnet currently)
[10:26:24] yep
[10:28:05] hashar: merged, puppet run done on deploy2002
[10:28:09] You can test
[10:28:12] testing
[10:29:57] all the new hosts work :]
[10:31:20] I am rebasing the other changes
[10:31:31] ack
[10:32:37] serviceops, DC-Ops, SRE, ops-codfw: hw troubleshooting: CPU error for mw2448.codfw.wmnet - https://phabricator.wikimedia.org/T334429 (Clement_Goubert) Starting 10 min cpu stress test: ` cgoubert@mw2448:~$ stress -c 48 --timeout 600s...
[10:34:09] this time I added a `Hosts:` header and running them through the pcc
[10:34:25] https://gerrit.wikimedia.org/r/c/operations/puppet/+/893485/ will break, missing closing bracket
[10:35:16] and https://gerrit.wikimedia.org/r/c/operations/puppet/+/893484/ removes the hosts ...
[10:38:10] cgoubert@cumin1001:~$ sudo cumin "P{R:Class = Profile::Ci::Jenkins}"
[10:38:12] 3 hosts will be targeted:
[10:38:14] contint[1002,2001-2002].wikimedia.org
[10:38:16] DRY-RUN mode enabled, aborting
[10:38:18] cgoubert@cumin1001:~$ sudo cumin "P{R:Scap::Target = 'releng/jenkins-deploy'}"
[10:38:20] 2 hosts will be targeted:
[10:38:22] releases2002.codfw.wmnet,releases1002.eqiad.wmnet
[10:38:24] DRY-RUN mode enabled, aborting
[10:38:26] The two sets are different
[10:38:28] Did you want 'or' instead of 'and' maybe
[10:38:39] yeah I screwed it up :/
[10:39:03] I think cause the jenkins-ci host are not a scap target yet
[10:39:18] sorry :/
[10:39:21] No worries
[10:41:30] I have cherry picked the other one and fixed the missing bracket https://gerrit.wikimedia.org/r/c/operations/puppet/+/893485 it is in PCC
[10:41:40] pro-tip: 'P{R:Class = Profile::Ci::Jenkins}' can be simplified with 'P:Ci::Jenkins' ;)
[10:41:48] any case works
[10:42:37] can the puppet query be all lower case as well? aka `class[profile::ci::jenkins]`?
[10:43:42] you mean in puppet code?
[10:43:52] In puppet code I don't think so
[10:43:58] the normalization of the cumin query is done by cumin to simplify the user's life ;)
[10:44:02] it is from a Puppet DB query
[10:44:05] ah yeah
[10:44:10] so different entry point I guess
[10:44:12] yep
[10:44:53] pbd_query vs pdb_query, I feel like I am 7 years old again sometimes
[10:45:36] lmao
[10:45:49] :b
[10:45:53] (my youngest kid had a very hard time to differentiate the four letters from the set `(b, d, p, q)` and I can not blame her for it)
[10:47:37] we thus had this bird http://chdecole.ch/wordpress/wp-content/uploads/2012/01/poussin-dbqp-300x211.jpg all over the house as a helper :)
[10:47:52] Oh un boussin
[10:48:12] (sorry)
[10:49:48] https://gerrit.wikimedia.org/r/c/operations/puppet/+/893485 is good to go https://puppet-compiler.wmflabs.org/output/893485/1727/deploy1002.eqiad.wmnet/index.html shows it is a noop
[10:50:25] ack merging
[10:51:08] merged, running puppet on deploy2002
[10:51:44] lovely thank you
[10:52:09] the last one I have to dig into it further
[10:52:25] it is less of an issue, we don't use scap yet for that one ( https://gerrit.wikimedia.org/r/c/operations/puppet/+/893484 )
[10:53:18] confirmed noop for 893485 on deploy2002
[10:53:34] hit me up when you need to merge 893484
[10:56:34] claime: I am set. I need to verify what is up with the jenkins-ci and that will take a few more hours I am afraid :]
[10:56:36] I gotta dig
[10:57:04] at least the other use cases are now covered by Puppet DB Query which will make it waayyyyy easier to add new hosts
[10:57:18] thanks for the puppet merge and the query fix up!
[10:57:25] My pleasure
[10:58:10] serviceops, DC-Ops, SRE, ops-codfw: hw troubleshooting: CPU error for mw2448.codfw.wmnet - https://phabricator.wikimedia.org/T334429 (Clement_Goubert) Stress test went without issue, removing downtime and repooling host.
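The 10:38 cumin dry-runs suggest why joining the two conditions with 'and' appeared to remove the hosts: 'and' keeps only hosts matched by both conditions, and the two result sets share no host. A minimal Python sketch of that set logic, using the host names from the cumin output with the `contint[1002,2001-2002]` range expanded by hand. The variable names are illustrative, and this models only the set arithmetic, not how cumin or PuppetDB actually evaluate a query:

```python
# Hosts reported by the first dry-run (Class = Profile::Ci::Jenkins).
jenkins_class_hosts = {
    "contint1002.wikimedia.org",
    "contint2001.wikimedia.org",
    "contint2002.wikimedia.org",
}

# Hosts reported by the second dry-run (Scap::Target = 'releng/jenkins-deploy').
scap_target_hosts = {
    "releases1002.eqiad.wmnet",
    "releases2002.codfw.wmnet",
}

# 'and' behaves like a set intersection: a host must satisfy both conditions.
matched_with_and = jenkins_class_hosts & scap_target_hosts

# 'or' behaves like a set union: a host needs to satisfy only one condition.
matched_with_or = jenkins_class_hosts | scap_target_hosts

print("and:", sorted(matched_with_and))
print("or: ", sorted(matched_with_or))
```

With these inputs the 'and' result is empty and the 'or' result holds all five hosts, which matches the "Did you want 'or' instead of 'and'" question in the log.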
[11:04:45] more or less related, but all those dsh groups can certainly be converted to use puppet db queries
[11:05:01] so as long as a host has the related scap::target, it will eventually show up on the deployment server
[11:06:13] we could even drop that `scap::dsh::groups` map entirely and have it entirely generated from a Puppet query for hosts having Scap::Target
[11:06:19] or it is over engineering I don't know
[11:08:23] I don't have an opinion, but transitioning to scap::target queries for hosts: stanzas would be a good first step to avoid forgetting to add hosts when they're not in conftool
[11:08:35] * claime lunch
[11:18:31] ditto
[12:31:12] serviceops, Data-Persistence, SRE, Datacenter-Switchover, and 2 others: March 2023 Datacenter Switchover - https://phabricator.wikimedia.org/T327920 (ayounsi)
[12:32:10] serviceops, DBA, Data-Engineering-Planning, Data-Persistence, and 11 others: eqiad row A switches upgrade - https://phabricator.wikimedia.org/T329073 (ayounsi)
[12:39:34] serviceops, MW-on-K8s: Add mw-on-k8s deployments to mediawiki certificates - https://phabricator.wikimedia.org/T334561 (Clement_Goubert)
[15:17:58] Huh more weird stuff
[17:22:07] serviceops, Shellbox, SyntaxHighlight, Patch-For-Review, User-bd808: Install pygments in Shellbox container with pip, not a Debian package - https://phabricator.wikimedia.org/T320848 (Legoktm) Could we get a +1 from someone in serviceops on the general approach of installing via pip instead o...
[19:01:41] serviceops, Arc-Lamp, Performance-Team, WikimediaDebug, Patch-For-Review: Add per-request flamegraph option to WikimediaDebug - https://phabricator.wikimedia.org/T291015 (jijiki)
[19:09:43] serviceops, Arc-Lamp, Performance-Team, WikimediaDebug, Patch-For-Review: Add per-request flamegraph option to WikimediaDebug - https://phabricator.wikimedia.org/T291015 (Krinkle) @jijiki (Capturing here from last month's Perf:SvcOps meeting) As part of this goal, we've developed a few small...
[19:30:24] serviceops, Infrastructure-Foundations, SRE, ARM support: Adoption of aarch64 (aka arm64) in WMF production? (SRE Summit 2022 Session) - https://phabricator.wikimedia.org/T320811 (Ladsgroup) This might be interesting, especially in choosing a manufacturer: https://www.hetzner.com/press-release/arm...
[19:31:59] win 24
[21:23:55] serviceops, Shellbox, SyntaxHighlight, Patch-For-Review, User-bd808: Install pygments in Shellbox container with pip, not a Debian package - https://phabricator.wikimedia.org/T320848 (bd808) >>! In T320848#8776934, @Legoktm wrote: > Could we get a +1 from someone in serviceops on the general...