[07:42:21] 10serviceops, 10serviceops-collab, 10Release-Engineering-Team (Radar): give releng access to logs to debug buildkit-to-wmf-registry publishing - https://phabricator.wikimedia.org/T322579 (10Joe) >>! In T322579#8377279, @dduvall wrote: > Thanks for filing this! > > This is what would be helpful for us in deb...
[08:12:21] headsup: I'm going to temporarily (less than 30 mins each) sequentially switch kubetcd100[46] to DRBD to migrate them off their current nodes (for the Bullseye reimages)
[08:16:35] moritzm: thanks. have fun!
[08:20:17] 10serviceops, 10MW-on-K8s, 10Patch-For-Review: Deploy new mw-debug service - https://phabricator.wikimedia.org/T321201 (10JMeybohm) >>! In T321201#8366250, @Clement_Goubert wrote: > [...] > @JMeybohm @Joe @akosiaris If this seems like the right way, I will start writing the "Kubernetes/Remove_a_service" wiki...
[08:53:10] Heya
[08:54:27] jayme: It did, although I had to run it twice for codfw, for some reason. Maybe a helmfile sync would have helped.
[08:54:49] hm, that's odd
[08:54:57] so the first run did not change anything?
[08:55:16] _joe_: jayme: https://gerrit.wikimedia.org/r/c/operations/puppet/+/854059 < Wondering about this approach or just marking the service directory recurse+force
[08:56:51] jayme: COMBINED OUTPUT:
[08:56:53] Error: UPGRADE FAILED: an error occurred while rolling back the release. original upgrade error: failed to refresh resource information: Get "https://kubemaster.svc.codfw.wmnet:6443/apis/rbac.authorization.k8s.io/v1/namespaces/eventgate-analytics/rolebindings/deploy": dial tcp 10.2.1.8:6443: connect: connection refused: no Namespace with the name "mwdebug" found
[08:57:35] May have just been a blip though
[08:57:51] connection refused sounds like a blip indeed
[08:57:54] <_joe_> claime: I guess related to moritzm's work
[08:58:09] _joe_: That was last night
[08:58:29] <_joe_> ah wait
[08:58:43] <_joe_> no namespace with the name "mwdebug" found
[08:58:56] <_joe_> that's before you removed it right
[08:59:10] <_joe_> I thought you were trying to apply something now
[08:59:53] _joe_: It was removed from staging-codfw and staging-eqiad
[09:00:06] I was trying to remove it from codfw
[09:00:48] <_joe_> then yeah, network blip I guess
[09:01:01] <_joe_> or kube master blip
[09:01:25] That's what I figured because a retry 2 minutes later worked
[09:03:02] <_joe_> https://grafana.wikimedia.org/d/000000435/kubernetes-api?orgId=1&var-datasource=thanos&var-site=codfw&var-cluster=k8s&from=now-24h&to=now&viewPanel=6 something happened around 6 pm
[09:03:33] <_joe_> 6 pm our TZ I mean
[09:03:49] <_joe_> err 5 pm, DST is over
[09:05:09] not completely uncommon. That pattern is seen on changes in primary scheduler/kube-controller-manager
[09:05:24] That was right around when I did the applies, 1600UTC
[09:05:37] (which might be the result of the blip)
[09:06:02] <_joe_> brb
[09:16:26] kubetcd100[46] are back on "plain" disks
[09:34:06] 10serviceops, 10Dumps-Generation, 10SRE, 10MW-1.39-notes, and 2 others: conf* hosts ran out of disk space due to log spam - https://phabricator.wikimedia.org/T322360 (10ArielGlenn)
[09:47:51] effie: Sorry I was wrong yesterday about T322360, it was incident related, but not to the swift one we discussed yesterday
[09:48:35] alright
[09:58:04] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): WMF container registry does not accept a manifest list (aka OCI manifest index, or "fat" manifest) - https://phabricator.wikimedia.org/T322453 (10JMeybohm) I took a quick look and AIUI our registry does support `appl...
[10:05:41] 10serviceops, 10MW-on-K8s, 10Observability-Logging, 10SRE: Keep calculating latencies for MediaWiki requests that happen k8s - https://phabricator.wikimedia.org/T276095 (10Joe) After a discussion with @fgiunchedi - given we're going to stream apache logs in json format to kafka, we can just use benthos to...
[11:19:46] 10serviceops, 10SRE, 10Thumbor, 10Security: Filter potentially harmful PostScript commands in Commons upload/thumbor - https://phabricator.wikimedia.org/T210833 (10jijiki)
[11:20:59] 10serviceops, 10Patch-For-Review, 10Performance-Team (Radar), 10User-jijiki: Roll out remote-DC gutter pool for /*/mw-wan/ - https://phabricator.wikimedia.org/T258779 (10jijiki) p:05Triage→03High
[11:44:30] <_joe_> hnowlan, jayme https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/837495 has all your comments addressed as far as I can tell
[14:01:18] elukey: calico docs are a pain sometimes... https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/854520
[14:02:35] I wonder if we should move to the operator at some point as the manifest installation is kind of unsupported it seems
[14:06:19] jayme: o/ I see some diffs for admin though, are they expected? (outside fixtures)
[14:08:06] hmm, no. That's unexpected
[14:08:45] maybe the version pinning thing does not work in CI...
[14:10:18] at least we have not yet silently upgraded calico :)
[14:15:25] jayme: Mind if I disable puppet on deploy hosts so I can merge this https://gerrit.wikimedia.org/r/c/operations/puppet/+/854059 as safely as possible?
[14:15:34] Actually I'll wait until the end of the deploy window
[14:15:49] yeah, looks like the version pinning does not work in CI. hmpf
[14:16:10] claime: yeah, fine by me after the window
[14:16:15] ack
[14:17:47] ...there are no words for how eager I am to go down the deployment-charts CI rabbithole (again)
[14:18:14] You know you want to :p
[14:19:34] you better prepare as someone has to review the mess I'm about to make :-)
[14:20:02] 10serviceops, 10Shellbox, 10SyntaxHighlight: Install pygments in Shellbox container with pip, not a Debian package - https://phabricator.wikimedia.org/T320848 (10akosiaris) @Legoktm, technically speaking, are we talking about `pip install --no-binary pygments` ? Not sure this is supported in blubber, at leas...
[14:20:03] * claime braces
[14:33:47] 10serviceops, 10MW-on-K8s, 10Observability-Logging, 10SRE: Keep calculating latencies for MediaWiki requests in the WikiKube environment - https://phabricator.wikimedia.org/T276095 (10akosiaris)
[14:43:42] <_joe_> jayme: what doesn't work?
[14:44:20] _joe_: pinning of specific chart versions in admin_ng - something like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/838134
[14:44:33] so https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/838137/1 should not produce a diff
[14:44:42] (for admin_ng deployments)
[14:45:06] <_joe_> ugh
[14:45:12] <_joe_> you're templating out charts
[14:45:16] <_joe_> yes ofc it won't work
[14:45:24] <_joe_> we pattern-extract the chart IIRC
[14:45:50] I'm templating in versions - not templating out charts
[14:45:52] <_joe_> uhm
[14:45:57] <_joe_> actually it should work
[14:46:05] <_joe_> what "doesn't work" then?
[14:46:08] yes. I do think it should
[14:46:46] if you look at the output of https://integration.wikimedia.org/ci/job/helm-lint/8245/console
[14:46:58] there should not be a diff for admin AIUI
[14:47:08] (and there is none in prod)
[14:47:08] <_joe_> there is none in the gate and submit job
[14:47:18] <_joe_> https://integration.wikimedia.org/ci/job/helm-lint/8190/console
[14:47:22] <_joe_> it's a rebase issue
[14:48:14] <_joe_> uh wait
[14:48:25] <_joe_> that job you linked is not for the change you just showed me
[14:48:35] <_joe_> ah the one after
[14:48:50] that is correct and I was not implying so :)
[14:49:32] <_joe_> why are you committing the tgz of the chart to the repo, btw?
[14:49:43] veeerry different story
[14:49:59] because https://gerrit.wikimedia.org/r/c/operations/docker-images/docker-report/+/826859
[14:50:39] <_joe_> ok, why not merge that first?
[14:50:51] <_joe_> ah the current chart is broken?
[14:50:59] yes, kinda
[14:51:03] <_joe_> can't we just remove it from chartmuseum
[14:51:47] as said, different story. I'm not sure (it was some time ago) if CI is even ready for dependencies
[14:52:29] because we use the local versions, e.g. we would need to rewrite dependencies from https URIs to file URIs etc
[14:53:01] pushing the tar was the easy way around at the time
[14:53:21] <_joe_> I don't have the headspace to dive into this, but I'm pretty convinced that tar is the source of the issue somewhat
[14:53:37] <_joe_> there's 5 levels of helm chart caching involved
[14:55:01] I'm not so sure. The tar contains just the CRDs. If that were to create a strange diff, it would be a diff in just the CRDs
[14:55:49] there are other changes (that don't include a tar) that also create this kind of diff (like https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/854520)
[14:57:41] <_joe_> so, this seems simpler to check
[14:57:42] I guess the clue is somewhere in the patching charts area
[14:58:11] rephrase: patching the "chart:" key in helmfiles
[14:58:31] <_joe_> sorry, what is wrong with this second change?
[14:58:38] <_joe_> the diff seems reasonable to me
[14:58:57] <_joe_> you're bumping the calico chart
[14:59:12] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/854520 you mean? Same thing. calico and calico-crds are pinned the same way for admin deployments
[14:59:16] <_joe_> and the diffs seem a bit awkward as in it's mixing stuff
[14:59:33] <_joe_> what is changing and shouldn't?
[14:59:49] there should not be a diff in admin
[14:59:49] <_joe_> the diffs I see in the console log seem to check out with the changes you made
[14:59:55] <_joe_> uh
[15:00:08] because the version is pinned
[15:00:11] <_joe_> lol ok, this is your fault actually
[15:00:21] yes, it probably is :)
[15:00:24] <_joe_> you did change behaviour to always use the local chart in the path
[15:00:32] <_joe_> and not a remote one
[15:00:37] yes ...said so a minute ago :)
[15:00:39] content.gsub!(%r{^(\s*chart:\s+["']{0,1})wmf-stable/}, "\\1#{charts_dir.chomp('/').concat('/')}")
[15:00:41] <_joe_> you can't have both
[15:01:11] <_joe_> else you might have to have two versions of the chart in our charts repo
[15:01:20] <_joe_> or, you can check if there is a pinned version
[15:01:24] <_joe_> and skip the gsub
[15:01:30] <_joe_> that's probably easiest?
[15:01:58] IIRC the gsub was invented for services
[15:02:10] maybe it's not wise to reuse it for admin_ng...
[15:02:15] not sure
[15:02:19] 10serviceops, 10Arc-Lamp, 10Performance-Team (Radar): Expand RAM on arclamp hosts and move them to baremetal - https://phabricator.wikimedia.org/T316223 (10akosiaris)
[15:02:21] <_joe_> no it was invented so that we would never miss a chart change in our diffs
[15:02:32] <_joe_> which applies to admin too
[15:02:40] hm..indeed
[15:02:50] <_joe_> and the issue of the version pinning would be there in services as well one day
[15:02:56] <_joe_> I can look into this later
[15:03:06] <_joe_> as in, in like half an hour
[15:03:09] funny/sad that helmfile does simply ignore the "version:" parameter in that case
[15:03:25] <_joe_> "hey I'll get what I find, sorry"
[15:03:53] yeah...I think it should probably fail then
[15:04:51] <_joe_> as a once brilliant engineer turned excel wrangler once said, helm is "gen 0 tooling"
[15:04:53] <_joe_> and it shows
[15:05:39] 10serviceops, 10Service-Architecture: Standards and health score for existing and new services - https://phabricator.wikimedia.org/T88643 (10jijiki) 05Open→03Declined Given that we are actively doing adding SLOs to services (which is indeed addressing some things as already stated), I am bluntly closing th...
[15:07:26] <_joe_> damn you and your templating version, it's harder than I hoped :P
[15:07:36] yeah...
[15:07:47] <_joe_> I could in theory just get "version: ..." and that would be ok
[15:08:11] but there might be multiple releases
[15:08:47] <_joe_> I'm not saying you shouldn't do this
[15:09:08] I did not understand that...also I kind of have to :)
[15:10:37] <_joe_> I'm just cursing how complex this becomes
[15:10:43] I would also say it's mainly Lucas fault because he noticed :D
[15:11:58] <_joe_> nope, it's fully up your alley dude
[15:12:25] worth a try 🤷
[15:16:49] as the helmfile is not valid YAML at patch point..could we "helmfile build" it, then patch, then lint/template?
[15:17:27] in that case we could at least properly parse it as it's YAML (at least I think it is)
[15:18:28] 10serviceops, 10Observability-Metrics, 10SRE, 10Patch-For-Review: Strongswan Icinga check: do not report issues about depooled hosts - https://phabricator.wikimedia.org/T148976 (10jijiki) 05Open→03Declined Strongswan is going away because we do not need it anymore. We were using it for redis_sessions T...
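A minimal sketch of the "check if there is a pinned version and skip the gsub" idea from the exchange above, assuming a helper that receives the raw helmfile text and the local charts directory. The method name, guard regex and structure are illustrative, not the actual deployment-charts CI code:

```ruby
# Minimal sketch, not the real CI code: keep the existing chart-path rewrite,
# but leave a helmfile untouched when it pins a chart version, since helmfile
# silently ignores "version:" once the chart points at a local directory.
def patch_chart_paths(content, charts_dir)
  # Coarse, file-level guard: any "version:" line disables patching for the
  # whole file, which also skips unpinned releases in the same helmfile --
  # the "strange edge case" mentioned a bit further down.
  return content if content =~ /^\s*version:\s+\S/

  local_prefix = charts_dir.chomp('/').concat('/')
  content.gsub(%r{^(\s*chart:\s+["']{0,1})wmf-stable/}, "\\1#{local_prefix}")
end
```

The real code uses gsub! in place; the sketch returns a new string only to stay side-effect free.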
[15:19:01] <_joe_> jayme: yeah but sadly we don't know what values to feed it
[15:19:04] <_joe_> basically
[15:19:34] <_joe_> I'm looking specifically to admin, we don't even pass it the calico helmfile directly, so not sure it will even be patched
[15:19:57] helmfile_glob = File.join(dir, '**/helmfile*.yaml')
[15:20:36] AIUI that should do the trick for all helmfiles, no?
[15:21:03] <_joe_> ah, right
[15:21:10] <_joe_> ok, but when we run patch_helmfile
[15:21:32] <_joe_> do we know which values files to feed each helmfile?
[15:21:47] <_joe_> that's what I was trying to find and I think the answer is "no"
[15:21:52] <_joe_> so some refactoring will be needed
[15:24:04] that should be just passing environment down to patch_helmfile or am I wrong?
[15:25:12] <_joe_> I'm not sure it's enough
[15:25:38] <_joe_> ~/Code/WMF/operations/deployment-charts/helmfile.d/admin_ng/calico (master =)$ helmfile -e codfw build
[15:25:39] <_joe_> in ./helmfile.yaml: error during helmfile.yaml.part.0 parsing: template: stringTemplate:5:27: executing "stringTemplate" at <.Values.chartVersions>: map has no entry for key "chartVersions"
[15:25:57] <_joe_> just to make an example
[15:26:05] but that is not what happens
[15:26:16] in admin_ng we only use the "master" helmfile
[15:26:36] ~/Code/WMF/operations/deployment-charts/helmfile.d/admin_ng$ helmfile -e codfw build
[15:26:56] <_joe_> ok
[15:27:30] 10serviceops, 10Beta-Cluster-Infrastructure, 10Beta-Cluster-reproducible: Thumbnails on beta cluster return 503 Service Unavailable - https://phabricator.wikimedia.org/T321654 (10Vgutierrez)
[15:27:34] <_joe_> so how do you propose we find that a specific helmfile has a version then?
[15:28:40] <_joe_> given how we're patching dependent helmfiles
[15:30:02] <_joe_> tbh, I think if we just check for "version: {{ $version }}" and in that case don't patch the chart path
[15:30:15] <_joe_> will cover 99.9% of cases and be less error prone
[15:30:50] <_joe_> the complexity added by trying to do that heuristics correctly would probably mean we should just ditch how that whole thing works
[15:31:15] re: 99.9%: I agree. But it will also create a strange edge case
[15:31:25] <_joe_> which we can document?
[15:31:30] <_joe_> but yeah the alternative is
[15:31:41] <_joe_> we run helmfile -e build
[15:31:53] <_joe_> for each chart there, check if the version is pinned
[15:32:00] <_joe_> keep a list of such charts
[15:32:10] <_joe_> and then skip the patching for those
[15:33:21] I'm gonna merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/854059 so I'll be disabling puppet on deploy hosts for a bit
[15:33:42] Just so I can test safely on deploy2002
[15:34:23] <_joe_> ack
[15:34:38] <_joe_> claime: as long as the disable message has your signature, just do it
[15:34:46] ack
[15:34:49] <_joe_> if anyone will need puppet to run they will contact you
[15:35:21] <_joe_> jayme: I'm trying to take a stab at it
[15:35:41] _joe_: not sure the plan is correct
[15:37:19] <_joe_> jayme: why?
[15:38:42] 10serviceops: Add IRC SRE bot for SAL !log actions to #wikimedia-serviceops - https://phabricator.wikimedia.org/T213196 (10jijiki) 05Open→03Declined Given that there has not been any update in this task from #serviceops, I will bluntly close it. We'll reopen if there is interest :)
[15:38:56] if we "helmfile -e build" a helmfile, we're getting a complete artefact for that which we can YAML.safe_load. Then "for release in yaml.get('releases',[])" and patch the "chart:" key if the version is not pinned
[15:39:23] <_joe_> yes?
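A rough sketch of the "helmfile -e build, then inspect" half of the proposal above, assuming a hypothetical pinned_releases helper, Open3 for running helmfile, and that the build output parses as one or more YAML documents carrying a releases list; none of this is the real CI code. As the discussion goes on to point out, the build output is a state file and can't simply be fed back into helmfile template, so this covers only the detection of pinned releases:

```ruby
require 'open3'
require 'yaml'

# Rough sketch: render the merged helmfile state for one environment and list
# the names of releases that pin an explicit chart version.
def pinned_releases(helmfile_dir, env)
  out, status = Open3.capture2('helmfile', '-e', env, 'build', chdir: helmfile_dir)
  raise "helmfile build failed for #{env}" unless status.success?

  # `helmfile build` can emit several YAML documents (one per sub-helmfile),
  # so walk all of them and collect releases that carry a "version:" key.
  YAML.load_stream(out)
      .flat_map { |doc| (doc || {}).fetch('releases', []) }
      .select   { |release| release['version'] }
      .map      { |release| release['name'] }
end

# Hypothetical usage, mirroring the command shown above:
#   pinned_releases('helmfile.d/admin_ng', 'codfw')
```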
[15:39:30] <_joe_> (that's not what I said, btw)
[15:40:01] <_joe_> also I love that you write python and this is ruby lol
[15:40:22] then write the patched artifact back as helmfile.yaml and run "helmfile -e template -f helmfile.yaml"
[15:40:53] <_joe_> that's not what I said
[15:41:02] wasn't trying to say this is what you proposed. I think that would be enough and we don't need to keep a list or something
[15:41:18] <_joe_> the artifact is not a valid helmfile
[15:41:23] <_joe_> it's a helmfile status file
[15:41:29] fuu...it's not?
[15:41:33] <_joe_> I don't think you can feed it directly to template
[15:41:37] <_joe_> yeah, nope :P
[15:41:43] oh hell
[15:41:51] <_joe_> yes
[15:45:38] <_joe_> well turns out with some time to think it's simpler than we thought
[15:45:40] 10serviceops, 10Parsoid (Third-party): parsoid apt repo rolled back breaks updates - https://phabricator.wikimedia.org/T264546 (10jijiki) 05Open→03Declined This is for parsoidJS I reckon, which we have moved away from.
[15:46:21] it is?
[15:46:35] <_joe_> yes
[15:47:02] uh
[15:47:12] read the values files instead
[15:47:22] <_joe_> no
[15:47:29] <_joe_> how do you know what the values file is?
[15:47:39] <_joe_> you need to build helmfile first :P
[15:47:45] hrhr, indeed
[15:47:58] <_joe_> don't worry, I found a good way to wire this in
[15:48:39] sure, but I obviously want to know *now* :)
[15:49:20] <_joe_> oh simply @fixtures.values.each |env| helmfile build, find pinned charts, stash in a dict
[15:49:53] 10serviceops, 10Parsoid, 10RESTBase: Decommission Parsoid/JS from the Wikimedia cluster - https://phabricator.wikimedia.org/T241207 (10jijiki) @Dzahn If there is anything left here, I reckon we can mark it as resolved? Thank you!
[15:53:16] okay, cool. AIUI that would then still pin all releases of said chart (assuming we have >1) even if only one is pinned, right?
[15:53:37] (not saying this is a problem we have right now)
[16:00:19] <_joe_> can we have multiple versions of the same chart under the same helmfile?
[16:00:27] <_joe_> that doesn't seem probable?
[16:00:46] absolutely
[16:00:55] different releases
[16:01:21] All done, we can now absent services in hieradata/common/profile/kubernetes/deployment_server.yaml
[16:02:32] 10serviceops, 10MW-on-K8s, 10Patch-For-Review: Allow absenting profile::kubernetes::deployment_server::services - https://phabricator.wikimedia.org/T322298 (10Clement_Goubert) 05In progress→03Resolved
[16:02:34] 10serviceops, 10MW-on-K8s, 10Patch-For-Review: Deploy new mw-debug service - https://phabricator.wikimedia.org/T321201 (10Clement_Goubert)
[16:06:10] 10serviceops, 10Foundational Technology Requests, 10Prod-Kubernetes, 10Shared-Data-Infrastructure, and 2 others: Update Kubernetes clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (10JMeybohm)
[16:08:30] <_joe_> jayme: I think we can outmaneuver that too
[16:30:26] that would be nice ofc
[16:52:28] https://wikitech.wikimedia.org/wiki/Kubernetes/Remove_a_service feedback and additions welcome
[16:59:35] 10serviceops, 10MW-on-K8s, 10Patch-For-Review: Deploy new mw-debug service - https://phabricator.wikimedia.org/T321201 (10Clement_Goubert) 05In progress→03Resolved All cleaned up.
[17:04:15] 10serviceops, 10MW-on-K8s, 10Patch-For-Review: Deploy new mw-debug service - https://phabricator.wikimedia.org/T321201 (10Clement_Goubert) I lied.
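A sketch of the "@fixtures.values.each |env| helmfile build, find pinned charts, stash in a dict" plan above, reusing the hypothetical pinned_releases helper from the earlier sketch. Keying the result per environment and per release name (rather than per chart) is one way to address the multiple-releases concern raised above, so only the release that actually pins a version is skipped while other releases of the same chart still get their "chart:" path patched. Again an assumption-laden sketch, not the real deployment-charts CI code:

```ruby
# Sketch: build each fixture environment once and remember which releases pin
# a chart version, e.g. { "codfw" => ["calico", "calico-crds"], ... }
# (example output, assuming those releases pin a version).
def pinned_releases_by_env(helmfile_dir, environments)
  environments.each_with_object({}) do |env, pinned|
    pinned[env] = pinned_releases(helmfile_dir, env)
  end
end

# Later, while patching chart paths for a given environment, something like
#   next if pinned.fetch(env, []).include?(release_name)
# would leave the pinned releases' "chart:" entries untouched.
```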
[17:21:09] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): WMF container registry does not accept a manifest list (aka OCI manifest index, or "fat" manifest) - https://phabricator.wikimedia.org/T322453 (10dduvall) Thanks for debugging this further, @JMeybohm and @hashar. In...
[17:22:12] 10serviceops, 10MW-on-K8s, 10SRE: Sandbox/limit child processes within a container runtime - https://phabricator.wikimedia.org/T252745 (10Joe) 05Open→03Resolved a:03Joe This task can be considered resolved given we've deployed shellbox.
[17:26:15] <_joe_> jayme: I hate you
[17:28:13] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push - https://phabricator.wikimedia.org/T322453 (10dduvall)
[17:31:37] I'm off, see you tomorrow
[17:50:20] 10serviceops, 10Continuous-Integration-Infrastructure, 10Datacenter-Switchover, 10Release-Engineering-Team (Priority Backlog 📥): Create a runbook for switching CI master - https://phabricator.wikimedia.org/T256396 (10jijiki)
[17:55:14] * _joe_ too
[17:56:21] 10serviceops, 10Parsoid, 10RESTBase: Decommission Parsoid/JS from the Wikimedia cluster - https://phabricator.wikimedia.org/T241207 (10Dzahn) 05Open→03Resolved a:03Dzahn @jijiki There is a small detail left but I am not going to work on it. It requires changes to scap config first. I don't mind if it...
[18:12:41] 10serviceops, 10Prod-Kubernetes, 10Kubernetes: Drop the use of nonexisting groups in kubernetes infrastructure_users - https://phabricator.wikimedia.org/T290963 (10jijiki)
[18:12:45] 10serviceops, 10Infrastructure-Foundations, 10Prod-Kubernetes, 10SRE, 10netops: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (10jijiki)
[18:13:15] 10serviceops, 10Prod-Kubernetes, 10Kubernetes, 10Patch-For-Review: Import istio 1.1x (k8s 1.23 dependency) - https://phabricator.wikimedia.org/T322193 (10jijiki)
[18:46:51] 10serviceops, 10GitLab (Infrastructure), 10Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (10jijiki) p:05High→03Unbreak!
[18:47:05] 10serviceops, 10GitLab (Infrastructure), 10Patch-For-Review: bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (10jijiki) p:05Unbreak!→03High
[19:42:46] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push - https://phabricator.wikimedia.org/T322453 (10dduvall) @JMeybohm can you provide the nginx access log entries from that ti...
[20:15:52] 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Priority Backlog 📥): Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push - https://phabricator.wikimedia.org/T322453 (10dduvall) I enabled debug logging for buildkitd on the gitlab-runner hosts an...
[20:38:58] 10serviceops, 10GitLab, 10Release-Engineering-Team (Priority Backlog 📥): Build and import new release of jwt-authorizer (1.1.0) - https://phabricator.wikimedia.org/T322691 (10dduvall)