[06:36:45] we are down
[06:37:05] marostegui: glad typing it worked, wasn't sure how much that would do
[06:37:21] english wikipedia still works for me, for what it's worth
[06:37:59] I can't get on from UK
[06:38:11] Must depend on where you go through
[06:40:41] marostegui, _joe_: recovered here
[06:41:13] marostegui: you're going to drmrs though
[06:41:22] volans: true
[06:42:07] * RhinosF1 is esams
[06:44:10] First user report was #wikipedia-en @ 07:34 from what I saw
[09:46:42] Reedy: ping, we are having issues logging into horizon, might be related to https://github.com/wikimedia/puppet/commit/b45eae92f159c545e573f7944314310ac604f121
[12:18:10] moritzm: thanks for the response on the ticket.
[12:18:31] i can make that work on nodes then, but what about in deployment pipeline blubber files?
[12:19:13] my main reason for making this a .deb was so that I could use blubber to install conda in a docker image easily
[12:19:46] ottomata: I think I know this one. I added a repo in one of the datahub pipelinelib files. One sec...
[12:19:51] didn't realize I'd need extra apt config setup to install it. I'm not sure if blubber will let me do that before I specify apt packages to install
[12:20:11] oh okay!
[12:22:13] ottomata: --> https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/datahub/+/refs/heads/wmf/.pipeline/kafka-setup/blubber.yaml#20
[12:22:41] ! nice!
[12:22:45] how'd you find that out? reading source?
[12:22:47] https://wikitech.wikimedia.org/wiki/Blubber/User_Guide#Common_Config doesn't have it
[12:24:59] I think I remember asking in #wikimedia-releng and then saying something like: "I'll definitely be adding that to the user guide page on Wikitech :+1"
[12:29:24] :)
[12:47:48] Now added to the page: https://wikitech.wikimedia.org/w/index.php?diff=1961098&oldid=1919134&title=Blubber/User_Guide
[12:57:41] ty!!!!
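
(For context, the "extra apt config" being referenced works roughly like the Blubber snippet below. This is a sketch modeled on the linked datahub blubber.yaml and the apt sources option added to the user guide page; the base image, component, and package names are placeholders, not the exact values from that file.)

    version: v4
    base: docker-registry.wikimedia.org/bullseye:latest
    apt:
      sources:
        # add the apt.wikimedia.org component before resolving the packages below
        - url: https://apt.wikimedia.org/wikimedia
          distribution: bullseye-wikimedia
          components: [thirdparty/conda]
      packages: [conda]
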
[13:46:29] _joe_: o/ i'm making a docker image for use in gitlab-ci
[13:47:06] it is intended to be a base image from which we can easily use conda environments to install dependencies, and generate 'conda dist' envs for running code in hadoop yarn
[13:47:13] https://phabricator.wikimedia.org/T304450
[13:47:40] i had started using Deployment Pipeline tools to make this image, but hashar just informed me that maybe production-images would be a better place for this.
[13:48:22] <_joe_> ottomata: are you going to run images based on this in production, or just in CI?
[13:49:33] just in CI
[13:49:37] i think....
[13:49:40] yeah
[13:49:46] yeah just in CI
[13:50:01] if it was prod we wouldn't need all this conda stuff... because we have docker :p
[13:50:03] <_joe_> ok so what I think hashar wanted to say is, you need to use docker-pkg, and the right repo is integration/config
[13:50:31] ah! dockerfiles there
[13:50:39] <_joe_> so not production-images
[13:50:41] got it
[13:50:42] <_joe_> yeah
[13:50:43] that makes sense
[13:50:46] hashar: that okay?
[13:50:55] <_joe_> it uses the same tool as production-images to build the images
[13:52:55] ottomata: or you get Gitlab CI to install the conda package and you are set?
[13:53:27] hashar: i'm trying to make that part easier
[13:53:39] we're trying to automate a build process using conda
[13:53:43] the output is a tgz file
[13:53:51] it is probably easier than having to manually maintain an image with docker-pkg or using an empty repo that triggers Pipeline lib
[13:53:52] but, there are several repos (and teams?) that will do this
[13:54:26] so instead of saying to each one "make sure you paste this snippet correctly into your gitlab-ci", and then also tracking them down if we need to make changes or add packages
[13:54:34] having one image for them to start with is much easier
[13:54:49] the thing is
[13:54:55] Pipelinelib is not meant for that
[13:55:01] sure sure
[13:55:10] and it will be dismantled eventually (though we have yet to port it to gitlab)
[13:55:14] but... docker-pkg + integration/config ?
[13:55:18] nope
[13:55:21] no?
[13:55:24] that one is legacy as well
[13:55:30] :-\
[13:55:38] okay, but i'm sure you will still need lots of base docker images like that for CI in gitlab
[13:56:17] what I don't know though is whether our Gitlab CI has support to build images
[13:56:45] but I know gitlab CI has support to include a definition from another repo / a shared repo
[13:57:00] so you could have some CI definition shared between multiple repositories
[13:57:11] hm.
[13:57:32] i could maybe make gitlab build a docker image, but i wouldn't have a docker registry to put it in.
[13:57:45] but interesting idea about just including a CI def
[13:57:48] investigating...
[13:58:06] but I don't think our Gitlab lets you build images
[13:58:11] so back to step #1 :-\
[14:00:38] ah
[14:00:39] https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-framework-api/-/blob/main/.gitlab-ci.yml
[14:00:51] for the include statement to share a definition between repos
[14:01:25] that one got created by arturo and relies on an image hosted on the toolforge docker registry
[14:01:59] so I guess you can adjust https://gitlab.wikimedia.org/repos/data-engineering/conda_dist_testing/-/blob/a40b8247c7eab8df562321108e08d73ac5d894b4/.gitlab-ci.yml to use the thirdparty/conda repo
[14:02:16] and `include` it in the repos that need conda?
[14:02:37] this way you no longer rely on any of the legacy stuff and can adjust the image by editing the shared .gitlab-ci.yml
[14:09:56] looking
[14:11:17] interesting
[14:16:04] ottomata: yup I think it is worth investigating a shared .gitlab-ci.yml with a `before_script` which injects our apt.wikimedia.org component thirdparty/conda and installs from there
[14:16:36] then that recipe can be `include`d in the multiple repos that would need conda
[14:20:08] i think you are right
[14:20:13] thanks for the idea
[14:20:17] am investigating
[15:00:37] ottomata: see case 3 in T304845
[15:00:38] T304845: gitlab: consider enabling docker container registry - https://phabricator.wikimedia.org/T304845
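
(The shared-definition approach hashar describes could look something like this; the project path and file name are invented for illustration, but the include/extends mechanics are standard GitLab CI.)

    # .gitlab-ci-conda.yml, hosted in a shared repo
    .conda:
      before_script:
        - echo 'deb http://apt.wikimedia.org/wikimedia bullseye-wikimedia thirdparty/conda' > /etc/apt/sources.list.d/conda.list
        - apt-get update && apt-get install -y conda

    # a consuming repo's .gitlab-ci.yml
    include:
      - project: 'repos/data-engineering/shared-ci'  # hypothetical shared repo
        ref: main
        file: '.gitlab-ci-conda.yml'

    build_conda_dist:
      extends: .conda
      script:
        - make conda-dist  # whatever produces the conda dist tgz

(Editing the shared file then updates every repo that includes it, which addresses the "tracking them down" problem, at the cost _joe_ raises below: conda gets reinstalled on every CI run.)
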
[15:13:46] <_joe_> hashar: wait what do you mean "docker-pkg is legacy"
[15:14:04] <_joe_> if you're planning to change things on how we build images
[15:14:12] <_joe_> I'd like to be informed, and moritzm too
[15:15:54] <_joe_> ottomata: integration/config as a place to build base images to use for CI is the right place IMHO.
[15:16:24] <_joe_> I think repeating the same operation in a docker container 1000 times for 1000 CI runs instead of once when you build the image is a waste of time and resources
[15:16:32] <_joe_> more importantly, of time for every CI run
[15:16:36] <_joe_> but YMMV
[15:19:41] arturo: yup that is exactly what i want
[15:20:05] _joe_: i will do whatever you and hashar recommend, but I think the gitlab-ci template solution will work for me
[15:20:12] indeed less efficient since conda has to be installed every time
[15:20:27] <_joe_> it makes zero sense not to create your own base image.
[15:20:38] <_joe_> it takes what, 30 minutes to do?
[15:21:00] if you make hashar also recommend that i will do it! :)
[15:53:59] _joe_: integration/config releng images are legacy, not docker-pkg :)
[15:54:24] <_joe_> hashar: so what repo are you using now to store docker-pkg definitions?
[15:56:06] integration/config, which is legacy
[15:56:56] <_joe_> ok, so where do you take the images to run CI on gitlab from?
[15:57:07] <_joe_> the base images, say one with tox in it
[15:57:25] ottomata: cool, then perhaps I may suggest writing something in phab so others are aware as well
[15:57:29] <_joe_> please don't tell me we rebuild the images at every CI run.
[15:58:24] _joe_: see our experiment here:
[15:58:25] https://gitlab.wikimedia.org/repos/cloud/cicd/gitlab-ci
[15:58:33] https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-framework-api/-/blob/main/.gitlab-ci.yml
[15:58:56] <_joe_> arturo: ok, what i feared
[15:59:12] <_joe_> how do we keep track of such images?
[15:59:21] <_joe_> how do we ensure they're properly updated?
[15:59:35] arturo: done, thank you
[15:59:40] <_joe_> it's ok for toolforge ofc
[15:59:47] <_joe_> it has a separate image management
[16:00:31] <_joe_> but for the prod-related stuff, that's quite the step back in terms of image management
[16:06:55] so um, maybe we need https://phabricator.wikimedia.org/T304845 for CI, and the ability to use docker-pkg to build and then gitlab CI to publish them somewhere?
[16:12:48] <_joe_> ottomata: docker-pkg can publish to our main docker registry if needed, but yes all these things can be discussed
[16:47:23] hmm.. puppet compiler claims "noop" https://puppet-compiler.wmflabs.org/pcc-worker1002/34616/ but clicking on puppetmaster1001 in that list... a whole bunch of changes: https://puppet-compiler.wmflabs.org/pcc-worker1002/34616/puppetmaster1001.eqiad.wmnet/index.html
[16:47:40] maybe because it's on the master itself.. not sure
[16:47:56] but now I have trust issues :)
[16:51:34] <_joe_> where does it claim "noop"?
[16:52:43] <_joe_> ah I see
[16:52:53] <_joe_> uh that looks like a threading kerfuffle
[16:53:05] <_joe_> also you can now mix labs and prod hosts?
[16:53:09] <_joe_> I didn't know
[16:53:31] <_joe_> mutante: you'll have to ping david or john about this, they've been working on that
[16:53:36] that's something recent IIRC, but I didn't follow closely the latest on PCC
[16:54:07] <_joe_> volans: if I had to bet, they're synchronizing data from various threads and there are some race conditions
[16:54:09] feel free to 302 john's ping to me, as he's out, but I don't have a ready answer
[16:55:36] _joe_: volans: ah, thanks. I will do that
[16:55:47] glad to know it should be new
[16:56:15] <_joe_> yes, clearly there is some error in slotting runs in this case
[16:56:31] <_joe_> in fact it says it has differences for two hosts that have none AFAICT
[16:59:02] also you made me look up https://en.wiktionary.org/wiki/kerfuffle :)
[16:59:28] Similar to modern Welsh cythrwfl ("uproar, trouble, agitation") :)
[18:02:32] TIL
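
(For readers unfamiliar with docker-pkg, which both integration/config and production-images use: each image is a directory containing a Jinja2-templated Dockerfile plus a Debian-style changelog whose top entry determines the image tag. A made-up sketch of a conda base image follows, written from memory of the template helpers, so double-check names and filters against the docker-pkg docs.)

    images/conda-base/changelog:

        conda-base (0.0.1) wikimedia; urgency=medium

          * Initial conda base image for CI.

         -- An Example <example@wikimedia.org>  Thu, 31 Mar 2022 12:00:00 +0000

    images/conda-base/Dockerfile.template:

        FROM {{ registry }}/bullseye:latest
        RUN {{ "conda" | apt_install }}
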
[21:20:15] has anyone noticed LVM hanging during shutdown on their hosts? It's happening on our wdqs hosts https://phabricator.wikimedia.org/T274270
[21:27:26] no, but I see chatter of similar issues with blkdeactivate hanging on reboot / not unmounting in 4.15 kernels. maybe another one still around in 4.19.x
[21:28:47] interesting. ryankemper found this thread on systemd/lvm settings https://github.com/systemd/systemd/issues/11821
[21:29:01] eh, lvm2 when blkdeactivate tries to deactivate the device
[21:33:53] -e|--errors Show errors reported from tools called by blkdeactivate. Without this option, any error messages from these external tools are suppressed and the blkdeactivate itself provides only a summary message to indicate the device was skipped.
[21:34:02] ^ maybe you can get the -e in there somehow?
[21:34:05] interesting
[21:35:31] wonder what would happen if you ran "blkdeactivate -u -e" manually, only on swap first, like it does swap first but then skips it in your output
[21:35:32] yeah, we are going to bring out all the debug flags on this one
[21:35:36] ;)
[21:35:52] -r|--mdraidoptions mdraid_options Comma-separated list of MD RAID specific options:
[21:35:57] wait Wait MD device's resync, recovery or reshape action to complete before deactivation.
[21:36:07] ^ make it not wait?
[21:37:00] this seems like the "kill -9" variety: blkdeactivate -d force,retry
[21:38:04] https://github.com/systemd/systemd/issues/11821#issuecomment-477545885 this suggestion looks promising too
[21:38:58] ah, it does
[21:40:14] manually stop/mask lvmetad and then cookbook again as before?
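
(Pulling the above together into the debugging sequence being proposed: the blkdeactivate flags come straight from the man page excerpts quoted in the log, while the lvmetad unit names are a guess at what "stop/mask lvmetad" would mean on these Debian hosts.)

    # show errors from the tools blkdeactivate calls (swapoff, umount, dmsetup, ...)
    blkdeactivate -e -u

    # the "kill -9" variety: pass force,retry through to dmsetup on deactivation
    blkdeactivate -e -u -d force,retry

    # per the systemd issue comment: stop and mask lvmetad, then re-run the reboot cookbook
    systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
    systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket
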