[08:21:37] serviceops, Commons, SRE-swift-storage: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9644839 (MatthewVernon) Yes, we don't replicate thumbnails between DCs any more (and this has been the case since July 2022 cf. T313102)
[09:00:24] serviceops, Commons, SRE-swift-storage: Commons thumbnails are broken for certain large sizes of thumbnail images - https://phabricator.wikimedia.org/T358738#9644903 (akosiaris) >>! In T358738#9644283, @tstarling wrote: > I thought there was no cross-DC replication of thumbnails. T299125#8221206 seem...
[09:01:53] serviceops, Prod-Kubernetes, Data-Platform-SRE (2024.03.04 - 2024.03.24), Kubernetes, Patch-For-Review: Improve how we address outside k8s infrastructure from within charts (e.g. network policies) - https://phabricator.wikimedia.org/T331894#9644918 (brouberol) @JMeybohm Thanks for the thoroug...
[09:05:00] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9644938 (Gehel) p:Triage→Medium
[09:12:12] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9644965 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1368.eqiad.wmnet with OS bullseye
[09:12:34] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9644966 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1369.eqiad.wmnet with OS bullseye
[09:13:01] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9644967 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1370.eqiad.wmnet with OS bullseye
[09:13:32] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9644969 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1478.eqiad.wmnet with OS bullseye
[09:13:57] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9644970 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1479.eqiad.wmnet with OS bullseye
[09:48:06] serviceops, Prod-Kubernetes, Data-Platform-SRE (2024.03.04 - 2024.03.24), Kubernetes, Patch-For-Review: Improve how we address outside k8s infrastructure from within charts (e.g. network policies) - https://phabricator.wikimedia.org/T331894#9645068 (brouberol) The only change I've made to the...
[09:48:31] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9645069 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1368.eqiad.wmnet with OS bullseye completed: - mw13...
[09:50:59] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9645073 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1370.eqiad.wmnet with OS bullseye completed: - mw13...
[09:52:11] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9645079 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1479.eqiad.wmnet with OS bullseye completed: - mw14...
[09:53:40] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9645081 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1478.eqiad.wmnet with OS bullseye completed: - mw14...
[09:56:55] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074#9645098 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1369.eqiad.wmnet with OS bullseye completed: - mw13...
[09:59:05] serviceops, Prod-Kubernetes, Data-Platform-SRE (2024.03.04 - 2024.03.24), Kubernetes: Create the required namespaces within each Kubernetes cluster - https://phabricator.wikimedia.org/T360508 (brouberol) NEW
[12:35:36] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.03.04 - 2024.03.24): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794#9645485 (brouberol)
[13:44:52] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9645651 (MatthewVernon) Noting here for future reference - we found that thumbor was incorrectly using the global discovery recor...
[15:07:25] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9646008 (Marostegui)
[15:52:07] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9646169 (jijiki)
[15:57:13] I know it's a busy time, but would anyone be up for a +1 on https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1009494 please? I've made the changes suggested in review...
[16:03:20] Emperor: I am late to the party so feel free to discard this, but it would be way clearer to COPY the lvm.conf config instead of doing all the seds.
[16:03:30] (Joe suggested it as I read)
[16:04:44] if it is exactly as upstream does it we can probably skip
[16:04:59] (I see the comment on line 4)
[16:05:58] it is exactly as upstream does (well, not with more robust seddery, but)
[16:08:36] ack okok
[16:08:41] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9646285 (Marostegui)
[16:08:46] and docker-pkg locally built the image correctly etc.. right?
[16:08:59] (just double checking, I didn't see the output in gerrit)
[16:11:12] Emperor: --^
[16:11:57] I've not tried poking it on a real build host as yet... (I thought CI might do that?)
[16:13:01] nope if it didn't change when I was on paternity leave it is all a manual process
[16:13:59] lemme check
[16:15:42] what I usually do is to create a py venv and pip install docker-pkg
[16:16:09] then something like `docker-pkg build images/ --select *ceph*` should build the image locally
[16:16:18] it is currently failing for me and the error is cryptic
[16:19:26] checking
[16:20:16] also there are some warnings for the changelo
[16:20:18] *changelog
[16:29:10] serviceops, MoveComms-Support, User-notice: MoveComms support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9646399 (Trizek-WMF) Debrief with @jijiki: >The read-only time can happen between 14:00 and 14:30 - the time window allocated by SRE - depending o...
[16:29:20] serviceops, MoveComms-Support, User-notice: MoveComms support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9646401 (Trizek-WMF)
[16:29:25] serviceops, MoveComms-Support, User-notice: MoveComms support for Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T358233#9646402 (Trizek-WMF) In progress→Resolved
[16:29:34] serviceops, collaboration-services, Data-Persistence, DC-Ops, and 5 others: ☂️ Northward Datacentre Switchover (March 2024) - https://phabricator.wikimedia.org/T357547#9646403 (Trizek-WMF)
[16:30:02] ok so the changelog wasn't liking the ceph/ceph
[16:30:13] doesn't solve the weird 400 I'm getting
[16:30:38] hmm bet that's the ~ it isn't liking either
[16:31:06] claime: same 400 that I get, so it is not my local env messed up
[16:31:14] yep, it's the ~
[16:31:30] commenting on CR to have a written reference
[16:40:10] I currently can't get anything to build, so I'm clearing doing something wrong
[16:41:08] requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.41/images/docker-registry.wikimedia.org/None:None/json
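The thread pins the failure on the `~` in the changelog version. One plausible reason, assumed here rather than stated in the log, is that the changelog version ends up as the image tag: Debian versions routinely contain `~` and `+`, while Docker's reference grammar only allows tags of the form `[\w][\w.-]{0,127}`. A minimal Python sketch of that check follows; the function names and sample versions are hypothetical and not taken from docker-pkg or from the change under review.

```python
from __future__ import annotations

import re
import string

# Tag grammar from the Docker/distribution image-reference spec:
#   tag := /[\w][\w.-]{0,127}/
# Written with explicit ASCII classes because Python's \w is Unicode-aware.
DOCKER_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
ALLOWED_TAG_CHARS = frozenset(string.ascii_letters + string.digits + "_.-")


def is_valid_docker_tag(tag: str) -> bool:
    """True if `tag` matches the Docker tag grammar (at most 128 characters)."""
    return DOCKER_TAG_RE.fullmatch(tag) is not None


def invalid_tag_chars(tag: str) -> set[str]:
    """Characters in `tag` that may never appear in a Docker tag."""
    return set(tag) - ALLOWED_TAG_CHARS


if __name__ == "__main__":
    # Hypothetical Debian-style versions, for illustration only.
    for version in ("1.2.3-1", "1.2.3-1~bpo12+1"):
        print(version, is_valid_docker_tag(version), sorted(invalid_tag_chars(version)))
```

Whether a tool should reject such versions outright or map them to tag-safe characters is a design choice; the sketch only detects the problem early, before it reaches the Docker daemon.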
[16:42:27] (I did python3 setup.py install --user in my docker-pkg checkout, then ran ~/.local/bin/docker-pkg build images/ --select 'wmf-debci/*' to try and build some known-to-work images)
[16:44:50] I don't know why it's talking to localhost at all, config.yaml (untouched) has registry: docker-registry.wikimedia.org
[16:44:58] serviceops, Thumbor, Patch-For-Review, Structured-Data-Backlog (Current Work): [XL] Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881#9646477 (MarkTraceur)
[16:45:01] Emperor: claime and I left two comments, I think that the changelog entry messes up the build, plus there is a missing } in the dockerfile
[16:45:07] but I've not tried building docker images on my laptop before, so maybe there are unrelated yaks to shave
[16:45:23] nono same on my laptop
[16:46:10] serviceops, Thumbor, Patch-For-Review, Structured-Data-Backlog (Current Work): [XL] Upgrade Thumbor to bullseye - https://phabricator.wikimedia.org/T336881#9646483 (MarkTraceur) n.b. the Structured Content team has estimated this as XL for our purposes (code review only), and from our perspective...
[16:46:35] I was able to start a build, but it fails when running the RUN command
[16:47:02] to check the errors you'll have docker-pkg-build.log where you run docker-pkg
[16:48:09] ah yeah it actually doesn´t build, I read the log wrong
[16:48:17] but it clears the 400 :p
[16:48:30] yes definitely!
[16:51:51] well, it's currently saying "== Step 1: building images ==" so hopefully something is happening...
[16:52:09] exactly yes, if you tail the build.log you should see some progress
[16:52:20] the 400 error was, um, not very helpful at saying what the problem was
[16:54:15] it is probably worth to create a task explaining the issue, the fix shouldn't be difficult
[16:54:57] * Emperor adds that to their yak-heap
[16:55:53] The 400 error was a typical parsing issue making its way down the stack
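On the "why localhost?" question: `http+docker://localhost` is how the Docker SDK for Python (which the `requests` traceback suggests is in use) addresses the local Docker daemon over its Unix socket, so the failing request went to the daemon, not to docker-registry.wikimedia.org. `GET /v1.41/images/{name}/json` is the Engine API's image-inspect call, and the `None:None` in the reference is consistent with the closing remark that a parsing failure made its way down the stack: a parser that signals failure with `None` gets formatted straight into a URL. The log does not show which docker-pkg parser returned nothing, so the sketch below only illustrates the mechanism; `parse_header`, `image_ref_lenient`, `image_ref_strict`, the header format and the sample lines are all hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

# Registry value quoted from config.yaml in the log above.
REGISTRY = "docker-registry.wikimedia.org"

# Hypothetical header format, modelled on Debian changelogs: "name (version) ..."
HEADER_RE = re.compile(r"^(?P<name>\S+) \((?P<version>[^)\s]+)\)")


def parse_header(line: str) -> tuple[Optional[str], Optional[str]]:
    """Lenient parser: signals failure with (None, None) instead of raising."""
    match = HEADER_RE.match(line)
    if match is None:
        return None, None
    return match.group("name"), match.group("version")


def image_ref_lenient(line: str) -> str:
    """Builds a reference without checking; f-strings render None as 'None'."""
    name, version = parse_header(line)
    return f"{REGISTRY}/{name}:{version}"


def image_ref_strict(line: str) -> str:
    """Same reference, but fails loudly where the parse actually failed."""
    name, version = parse_header(line)
    if name is None or version is None:
        raise ValueError(f"cannot parse changelog header: {line!r}")
    return f"{REGISTRY}/{name}:{version}"


def inspect_path(ref: str, api_version: str = "1.41") -> str:
    """Docker Engine API image-inspect path, as it appears in the traceback."""
    return f"/v{api_version}/images/{ref}/json"


if __name__ == "__main__":
    good = "example-image (1.0.0-1) wikimedia; urgency=medium"
    bad = "not a changelog header"
    print(inspect_path(image_ref_lenient(good)))
    print(inspect_path(image_ref_lenient(bad)))  # reproduces the None:None path
    try:
        image_ref_strict(bad)
    except ValueError as err:
        print("strict:", err)
```

Raising at the parse site, as `image_ref_strict` does, turns an opaque daemon-side 400 into an error that names the offending input, which is roughly the kind of fix the thread suggests filing a task for.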