[00:56:39] serviceops, Patch-For-Review: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#11547377 (Scott_French)
[02:18:58] serviceops, Patch-For-Review: docker-registry.wikimedia.org keeps serving bad blobs - https://phabricator.wikimedia.org/T390251#11547596 (Scott_French) I've now also merged T406392, for the same reason. One key point of note from that task is that buildkit, as used in the GitLab CI image build / push jo...
[07:48:44] nemo-yiannis: Ping me when you get in please?
[08:21:21] Hello! IPoid is being tested with a new OpenSearch backend on testwiki. There have been a few communications from inflatador, so this is probably not a surprise. Product Safety & Integrity is ready to roll this out to production, after the SRE summit. Should there be a more formal sign-off before moving forward?
[08:22:11] More context about the OpenSearch backend in https://docs.google.com/document/d/1G7OTmBmzl5GVwoanCrzQNHMMXIrAvROpmKumdWtZfuo/edit. kostajh probably has similar docs for IPoid itself.
[09:29:34] claime: 👋
[09:35:02] nemo-yiannis: Hey :D https://phabricator.wikimedia.org/T410296#11547120 I think we need to implement one of scott's two proposals
[09:36:09] ok, i am setting up a patch for --max-semi-space-size=16 and if things don't work out, we revert and revisit it on Monday so we don't have incidents this weekend
[09:36:50] serviceops, MW-on-K8s, SRE: Migrate MW appservers' base images to bullseye - https://phabricator.wikimedia.org/T356293#11547964 (MoritzMuehlenhoff) Stalled→Resolved a:MoritzMuehlenhoff This is long done
[09:37:36] nemo-yiannis: I don't think we'll be able to assist next week, we're all offsite unfortunately
[09:38:23] ok, i don't think we are in a rush to bump the node version next week, it can wait until SREs are back
[09:44:09] Great, thank you <3
[10:04:43] claime: I am deploying now
[10:07:00] Thanks a bunch
[10:09:02] Ok, done. I am keeping an eye on this: https://grafana.wikimedia.org/goto/Aq0oCyIDR?orgId=1 If in the next ~4-5 hours things don't stabilize, I will revert to the previous working state
[10:10:41] nemo-yiannis: Ack, thanks for being so reactive <3
[10:33:23] gehel: thanks for reaching out! We have questions on failure recovery & SLOs. As the team is traveling for the SRE summit from today onwards, could we schedule a quick review the week after the summit? We can also do part of it offline if you prefer
[10:34:31] matthieulec: It probably makes sense to start offline, at least to know what kind of questions need to be addressed. I suspect that the main contact point to answer them is going to be kostajh, so let's make sure he is on board with this.
[10:44:16] sure, adding now in the doc and tagging Kosta, hope that's ok (feel free to move the discussion somewhere else if you prefer)
[12:49:17] claime: latency is still not stable, i am gonna revert
[12:49:27] ack, thanks
[13:28:57] claime: wanna take a look at the patch?
[13:29:28] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1230917