[06:54:54] Hello hello! In ~1h we're going to start the eqiad router upgrades. No impact expected, but as it's a high-risk maintenance please refrain from making significant changes, and let us know if something is acting up. Thanks!
[06:56:02] Good luck!
[09:50:00] claime: `diff -u <(tar -tzf <(git show HEAD^:artifacts/artifacts.buster.tar.gz)) <(tar -tzf artifacts/artifacts.buster.tar.gz)`
[09:50:00] :D
[09:50:14] brutal
[09:58:23] <_joe_> add that to the makefile :P
[10:10:00] you're keeping a tarball in git?
[10:10:10] of course!
[10:10:14] shh
[10:10:17] :p
[10:10:37] we could move it to LFS but we might need an S3 backend to store all those tarballs :D
[10:11:03] * Emperor weeps
[10:11:16] * claime hands Emperor tissues
[10:11:18] There there
[10:12:53] claime and I had a very productive session this morning! \o/
[10:13:12] Yes!
[10:13:40] Now I have documentation to write
[10:14:59] Also, I should try and do the make build locally just to see how long it takes on the M1
[10:14:59] (and also I'm a bit cold and don't want to turn the heating on, so 2 birds 1 stone)
[10:14:59] M1: MediaWiki Userpage - https://phabricator.wikimedia.org/M1
[10:20:17] <_joe_> Emperor: given the release rate, we chose practicality over "cleanliness" in the absence of an artifact repository of practical use
[10:20:31] <_joe_> we might reconsider now that gitlab offers such facilities
[10:24:34] what I am wondering is why the various wheels are packaged in an artifacts.tar.gz; I would rather store the .whl files directly
[10:25:11] <_joe_> hashar: because it's more practical for deployments tbh
[10:25:31] <_joe_> but yeah it could easily be a directory full of binaries instead of a single one :)
[10:25:48] it would make the diff easier to review and, for pure-python wheels, the dupe between buster/bullseye would be the same blob object in git, saving space
[10:26:05] <_joe_> that is a fair point
[10:26:30] <_joe_> it's also not that much space
[10:26:45] <_joe_> for larger projects it might make
more sense indeed
[10:27:24] <_joe_> but also, I really want to get to the point where we upload these tarballs from CI to an artifact repo upon push of a tag
[10:27:41] <_joe_> I set it up for golang binaries already
[10:28:11] any hint as to how a tarball simplifies deployment? I could not find a standout reason but I'm surely missing something
[10:28:14] <_joe_> I will remark that it wouldn't work well with scap3 atm, we'd need to get creative
[10:29:01] <_joe_> hashar: mostly that you untar a pack of wheels where you want instead of cp-ing or moving directories around
[10:29:04] +1 on having CI do all of that for us, that was Clément's first question :)
[10:29:29] <_joe_> yeah so, this is an interesting side quest
[10:29:40] <_joe_> how to move stuff we deploy with scap3 to gitlab
[10:30:24] <_joe_> I would assume that when you push a well-formatted tag, CI would run the makefile, build the artifacts for the distro the CI job runs on
[10:30:38] <_joe_> and upload them as "packages" linked to a release on gitlab
[10:30:48] (this assumes VM CI runners)
[10:30:55] (or a change in the build process)
[10:31:06] <_joe_> oh yes
[10:31:08] <_joe_> ofc
[10:31:09] oh, like the release artifacts on GitHub?
[10:31:18] <_joe_> hashar: let me show you
[10:31:38] <_joe_> https://gitlab.wikimedia.org/repos/sre/vopsbot/-/releases
[10:31:49] <_joe_> there is a link to a binary under "other"
[10:32:04] <_joe_> this is generated with https://gitlab.wikimedia.org/repos/sre/vopsbot/-/blob/main/.gitlab-ci.yml
[10:32:30] so one can download the binary directly. Nice
[10:32:30] <_joe_> specifically https://gitlab.wikimedia.org/repos/sre/vopsbot/-/blob/main/.gitlab-ci.yml#L24-31 is how we upload the artifact
[10:35:12] neat. So scap would have to retrieve the artifacts of a given version from that release page?
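The upload step linked above follows the pattern GitLab documents for its generic package registry: on a tag push, a CI job uploads the built binary to the project's package API with the job token, and the release entry then links to it. A minimal sketch of such a job, assuming a tag-triggered pipeline (the job layout and the `mytool` name are illustrative placeholders, not the actual vopsbot configuration):

```yaml
# Hypothetical tag-triggered upload job; "mytool" and the stage name
# are placeholders, not the real vopsbot .gitlab-ci.yml.
upload:
  stage: upload
  rules:
    - if: $CI_COMMIT_TAG
  script:
    # PUT the built binary into the project's generic package registry,
    # versioned by the tag that triggered the pipeline.
    - |
      curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
        --upload-file mytool \
        "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/mytool/${CI_COMMIT_TAG}/mytool"
```

A deployment tool (scap, puppet, or a cookbook) can then fetch the same URL anonymously or with a read token, which is the retrieval path being discussed here.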
[10:35:50] or we replace scap with whatever gitlab's deployment system is
[10:35:58] Or puppet goes and gets it
[10:36:06] There's multiple ways to skin a deployment
[10:37:14] <_joe_> I already have some shameful code to do simple deployments of such things with puppet
[10:37:27] and we do also have a cookbook fwiw :D
[10:37:54] <_joe_> but yeah, I would like for us (SREs, more or less) to converge on a single tool
[10:37:57] it was made to deploy python code to bullseye when scap couldn't run on py3
[10:38:15] indeed
[10:38:26] <_joe_> and it needs to be declared via puppet
[10:38:30] <_joe_> which we support for scap2
[10:38:35] <_joe_> err scap3
[10:38:47] <_joe_> so I would still err on the side of scap, mostly
[10:39:02] <_joe_> or, you know
[10:39:30] <_joe_> we just build debs on tagging, and automatically upload them to the repos, and we use debdeploy to deploy them all
[10:39:44] <_joe_> that in many cases is probably the best solution
[10:39:48] for anything that can be a deb, sure
[10:39:53] remember mortals don't have access to cookbooks/puppet
[10:39:58] the whole wheels thing was for projects that couldn't be debianized
[10:40:10] <_joe_> volans: right
[10:40:39] hashar: that's T244840... at some point it will be done...
don't have an ETA yet though
[10:40:40] T244840: Evaluate options for non-root operations with cumin and spicerack cookbooks - https://phabricator.wikimedia.org/T244840
[10:40:41] <_joe_> so ok, for anything else, uhm, we might find a way to trick scap3 into DTRT with frozen wheels from gitlab
[10:43:11] <_joe_> now the first thing to explore would be how to build wheels for various distros and upload them to https://docs.gitlab.com/ee/user/packages/pypi_repository/
[10:43:55] hashar: told you there was PyPI in gitlab :p
[10:46:09] or we could pip install on the target hosts
[10:46:23] anyway, lunch time, be back in an hour or so
[10:48:34] All done for eqiad's router upgrades, I'll do cr2-eqord shortly, but that's much less critical
[10:48:54] XioNoX: good!
[10:49:05] So we can proceed with other risky operations? XD
[10:49:14] marostegui: yep :)
[10:49:19] \o/
[13:18:01] godog: puppet is failing to run on codfw prometheus hosts after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/835691. Will revert soon unless you would like to take a look first
[13:19:40] inflatador: ack! I'll take a quick look, thanks for the heads up
[13:19:50] godog: NM, looks like it went thru on the 2nd pass
[13:21:27] inflatador: ok! I'm looking on prometheus2005 though I'm not seeing puppet itself having failed, how did the failure manifest?
[13:22:02] godog: just a failure running puppet agent on all prom hosts via cumin. All others completed on first pass, codfw failed tho
[13:22:45] inflatador: odd, but all good it seems, even better
[13:23:26] inflatador: on which host did it fail?
[13:23:28] it worked on the 2nd pass for codfw. Unfortunately it doesn't look like the patch did what we want, let me paste the cmd I'm running to verify, it could be wrong
[13:24:13] there are no failures on puppetboard... are you sure it was a failure and not just a non-zero return code? how did you run puppet?
[13:24:47] `sudo cumin "P{prometheus*.wmnet}" "run-puppet-agent"`
[13:25:40] run-puppet-agent returns the return code of puppet agent
[13:26:32] no worries, I'm more concerned about the patch possibly not doing anything
[13:26:45] godog: here is the cmd I'm running along with my assumption: https://phabricator.wikimedia.org/P35177
[13:27:26] pinging jayme as well in case you can provide further context
[13:27:59] 👀
[13:27:59] [protip] you can use A:prometheus to target all prometheus::ops hosts
[13:29:16] inflatador: thank you for calling out your assumption too! the way the patch works is to stop prometheus from collecting those metrics; from envoy's perspective nothing changes (i.e. all metrics continue to be incremented/exported)
[13:29:26] inflatador: the metrics won't disappear from the endpoint you curl. Your patch will just configure prometheus to not store them
[13:30:04] * jayme fades back into the hedgerow
[13:30:20] Awesome, thanks for clearing that up. So what is the best way to check if it worked? Grafana?
[13:30:49] yeah. check one of the metrics we dug out yesterday to see if they still increment
[13:31:22] indeed, grafana and/or thanos.w.o
[13:37:48] OK, looks to me like it worked. I pasted the query I'm using at https://phabricator.wikimedia.org/P35177#145844 if anyone wants to double-check
[13:40:58] yeah, that looks about right
[13:41:34] {◕ ◡ ◕}
[13:42:06] on to the next ticket, then! Thanks y'all
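The mechanism described above — Prometheus dropping the metrics at collection time while envoy keeps exporting them — corresponds to a `metric_relabel_configs` drop rule, which Prometheus applies after scraping but before writing to storage. A generic sketch, assuming a plain scrape job (the job name, target port, and metric regex are placeholder examples, not the actual change in 835691):

```yaml
# Hypothetical scrape job; names and the regex are placeholders.
scrape_configs:
  - job_name: envoy
    static_configs:
      - targets: ['localhost:9631']   # placeholder envoy metrics port
    metric_relabel_configs:
      # Drop matching series before they reach the TSDB; the exporter
      # still serves them, so curling the endpoint shows no change.
      - source_labels: [__name__]
        regex: 'envoy_http_downstream_rq_time_bucket'   # placeholder metric
        action: drop
```

Because the rule acts on ingestion rather than on the exporter, the verification approach used here (querying Grafana/Thanos to see whether the series stopped updating) is the right one — curling the `/metrics` endpoint will keep showing the metric.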