[08:00:44] am I ok to re-merge this change with the fix that broke the registry? I'll try on 1 depooled node first https://gerrit.wikimedia.org/r/c/operations/puppet/+/1226204
[08:30:01] dpogorzelski: o/ I am reviewing the change, I think it is ready to go. Quick note - https://wikitech.wikimedia.org/wiki/Deployments shows the scheduled deployment windows, and now there seem to be one (if you connect to #wikimedia-operations and type jouncebot: nowandnext you'll see a similar thing). If possible we should use the MediaWiki infra window (you can list your patch there)
[08:30:11] so we won't impact anything in flight
[08:32:17] dpogorzelski: https://puppet-compiler.wmflabs.org/output/1226204/5607/registry2005.codfw.wmnet/change.registry2005.codfw.wmnet.err seems to fail, the profile::docker_registry::ml_build_user_password pass is probably not present in labs private (better to cover this bit for others that run pcc in the future)
[09:04:05] where is the labs private located?
[09:10:07] i assume it's https://gerrit.wikimedia.org/r/admin/repos/labs/private,general
[09:12:09] yep exactly
[09:12:31] it is used by our cloud instances, and also to add "fake" private data for our pcc runs
[09:20:31] https://gerrit.wikimedia.org/r/c/labs/private/+/1226768
[09:34:49] perfect, there is no CI so you can +2/V+2 and then puppet-merge as a regular puppet patch
[09:35:08] after this 'check experimental' in the other code review should work
[09:36:42] kk
[11:07:27] i'm deploying the registry changes but just fyi there are tests that fail even without my changes:
[11:07:27] ```https://docker-registry.wikimedia.org/v2/nonexistent/blobs/upload (/srv/deployment/httpbb-tests/docker-registry/test_docker-registry.yaml:28)
[11:07:27] Status code: expected 401, got 403.
[11:07:27] WWW-Authenticate header: expected 'Basic realm="docker-registry (push)"', was missing.
[11:07:27] https://docker-registry.wikimedia.org/v2/nonexistent/blobs/upload (/srv/deployment/httpbb-tests/docker-registry/test_docker-registry.yaml:33)
[11:07:27] Status code: expected 404, got 403.
[11:07:27] https://docker-registry.wikimedia.org/v2/wikimedia/mediawiki-multiversion/manifests/sha256:bcb74d22d6fe40def32cbc39166889ac6114fcc2810e8a622a4c4983edeb89a7 (/srv/deployment/httpbb-tests/docker-registry/test_docker-registry.yaml:56)
[11:07:28] X-Cache-Status header: expected to match /^(MISS|BYPASS|EXPIRED|STALE|UPDATING|REVALIDATED|HIT)$/, was missing.
[11:07:28] https://docker-registry.wikimedia.org/v2/wikimedia/mediawiki-multiversion/blobs/sha256:b7213227cfef387a937968df3b41e0938b593ca4353044adcfd2828ec0167206 (/srv/deployment/httpbb-tests/docker-registry/test_docker-registry.yaml:69)
[11:07:29] X-Cache-Status header: expected to match /^(MISS|BYPASS|EXPIRED|STALE|UPDATING|REVALIDATED|HIT)$/, was missing.
[11:07:29] https://docker-registry.wikimedia.org/v2/restricted/nonexistent/blobs/upload (/srv/deployment/httpbb-tests/docker-registry/test_docker-registry.yaml:85)
[11:07:30] Status code: expected 401, got 403.
[11:07:54] https://www.irccloud.com/pastebin/ZHTipGyR/
[11:08:11] it's a 403 instead of 401 or 404 so not the end of the world
[11:09:29] the changes look good on 1004 so i'll roll them out on other nodes too
[11:11:19] tests related to ml changes are green btw
[11:15:14] checked all 4 nodes and all seem to serve 200 responses under typical request urls
[12:42:59] okok nice!
[12:43:15] Do you mind to open a task about the above failures? So we can fix them asap
[13:13:12] https://phabricator.wikimedia.org/T414576
[13:14:22] ok so i guess the next question is: how do i test pushing out images from ml-build if pushing testing images is discouraged due to difficulties with image deletion?
[13:19:17] I think that we could use docker-pkg and try to push the vllm image
[13:20:32] once the ml-build node is finished we can just run it and check how it goes
[14:33:20] https://gerrit.wikimedia.org/r/c/operations/docker-images/production-images/+/1146891 will have to be merged then i can try
[14:50:20] dpogorzelski: not in that state, since vllm is not under /ml
[14:50:55] that is the only bit that ml-build should push in theory, to avoid overlapping with the build prod nodes
[15:52:04] do we have to move just that vllm image under the /ml folder or the whole dependency chain ?
[16:27:22] dpogorzelski: just the vllm image, the deps should be fine
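
For reference, the kind of assertion behind the 11:07 failures (expected status code, X-Cache-Status matching a fixed pattern) can be sketched in Python roughly as below. This is only an illustration and is not how httpbb itself is implemented: the function name check_response and its parameters are hypothetical, it makes no network calls, and only the regex and the "Status code" message wording are copied from the log.

```python
import re
from typing import Mapping, Optional

# Pattern copied from the failure messages in the log above.
X_CACHE_STATUS = re.compile(r"^(MISS|BYPASS|EXPIRED|STALE|UPDATING|REVALIDATED|HIT)$")


def check_response(
    status: int,
    headers: Mapping[str, str],
    expected_status: Optional[int] = None,
    expect_cache_header: bool = False,
) -> list[str]:
    """Hypothetical helper: return failure messages, empty if the check passes."""
    failures = []
    if expected_status is not None and status != expected_status:
        failures.append(f"Status code: expected {expected_status}, got {status}.")
    if expect_cache_header:
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        value = lowered.get("x-cache-status")
        if value is None:
            failures.append("X-Cache-Status header: was missing.")
        elif not X_CACHE_STATUS.match(value):
            failures.append(f"X-Cache-Status header: unexpected value {value!r}.")
    return failures


# A 403 where a 401 was expected, as in the first reported failure.
print(check_response(403, {}, expected_status=401))
# A 200 response that lacks the cache header, as in the manifests/blobs failures.
print(check_response(200, {}, expect_cache_header=True))
```

Returning a list of messages rather than raising on the first mismatch mirrors the log, where a single URL can report several problems at once (status code and a missing header).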