[06:05:28] dcausse: bonjour, if you want we can merge your change for the elastic alerts?
[06:05:35] err s/?/:)
[06:30:14] other thing - puppet on wdqs nodes is broken
[06:30:15] Sep 15 06:23:42 wdqs1010 puppet-agent[5231]: Found 1 dependency cycle:
[06:30:16] Sep 15 06:23:42 wdqs1010 puppet-agent[5231]: (File[/etc/wdqs/vars.yaml] => Scap::Target[wdqs/wdqs] => Group[deploy-service] => File[/etc/wdqs] => File[/etc/wdqs/vars.yaml])
[06:30:31] probably https://gerrit.wikimedia.org/r/c/operations/puppet/+/721099
[06:38:36] I'd revert it for the moment to find a better solution, wdyt?
[06:42:26] elukey: o/
[06:42:47] yes that'd be great if we could merge the patch
[06:43:07] o/
[06:43:16] I also created a revert for wdqs https://gerrit.wikimedia.org/r/c/operations/puppet/+/721073
[06:43:31] dcausse: merging yours first
[06:43:35] thanks!
[06:46:25] I'm fine reverting https://gerrit.wikimedia.org/r/c/operations/puppet/+/721073 but looks like it was meant to fix another issue
[06:46:27] dcausse: done! The alerts should start to clear as soon as puppet runs (I verified on one node)
[06:46:28] We're hoping this solves this error:
[06:46:30] 22:32:15 deploy-local failed: [Errno 2] No such file or
[06:46:32] directory: '/etc/wdqs/vars.yaml'
[06:46:42] elukey: thanks!
[06:46:45] yeah I saw the commit msg, but there is a puppet loop atm :(
[06:47:41] there is probably some weird dep that needs to be analyzed first
[06:48:29] elukey: is this problem affecting machines other than wdqs*?
[06:48:40] dcausse: nope, only wdqs nodes
[06:48:56] but it seems an easy revert, it is only a puppet require
[06:49:26] elukey: sure, let's revert then
[06:52:38] wondering if we could break the tie of deps though
[07:01:07] dcausse: reverted as well, puppet unblocked
[07:01:13] elukey: thanks!
[07:08:36] zpapierski: we might have to delay the updater rollout by one week, alerts are not yet merged/tested, not sure about the data-transfer/reload cookbook but it seems unlikely it'll be merged by the end of the week, thoughts?
[07:10:06] I might merge the scripts, but I doubt the whole process with cookbooks will be ready
[07:10:24] I think it makes sense to give ourselves a week more
[07:13:09] ok
[07:13:27] did you have a conversation with Guillaume about spicerack/cookbooks?
[07:16:57] Sorry, not yet, I barely got to work on the script yesterday, but I will have it done today and talk about spicerack/cookbooks
[07:19:06] in fact there won't need to be a script per se, the code should be part of spicerack, which is a lib that cookbooks use, all that to say that e.g. things like params handling are already taken care of
[07:40:56] I'm not sure I know what you mean - I don't need the python script I'm developing, or it's needed somewhere else?
[07:47:14] zpapierski: my understanding is that there's a python module (or simply functions if the module exists) to write in spicerack, but not a script called from the command-line
[07:47:29] e.g. this is how we control the elastic cluster from the cookbooks https://github.com/wikimedia/operations-software-spicerack/blob/master/spicerack/elasticsearch_cluster.py
[07:48:08] ah, I see
[07:48:25] just to say that some boiler-plate code might not be needed
[07:50:37] I'll need to modify the script a bit, but that's ok
[07:50:48] in any case, I wanted to test that manually on stat anyway
[07:51:43] sure
[07:54:33] btw - should we assume the same partition setup cross-DC, or should I somehow manually retrieve those on the other DC?
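For reference, a minimal Puppet sketch of the shape of the dependency cycle reported above at 06:30. The resource names come from the agent output, but the edges and parameters here are assumptions for illustration, not the contents of the actual gerrit change:

    # Schematic only: each metaparameter below reproduces one edge of the cycle.
    group { 'deploy-service':
      ensure => present,
      before => File['/etc/wdqs'],             # Group[deploy-service] => File[/etc/wdqs]
    }

    file { '/etc/wdqs':
      ensure => directory,
      group  => 'deploy-service',
      # Puppet auto-requires the parent directory of managed files, adding
      # File[/etc/wdqs] => File[/etc/wdqs/vars.yaml]
    }

    file { '/etc/wdqs/vars.yaml':
      ensure => file,
      before => Scap::Target['wdqs/wdqs'],     # File[vars.yaml] => Scap::Target[wdqs/wdqs]
    }

    scap::target { 'wdqs/wdqs':
      deploy_user => 'deploy-service',          # assumed parameter for the sketch
      before      => Group['deploy-service'],   # Scap::Target[wdqs/wdqs] => Group[deploy-service]
    }

Removing any single edge is enough to break the cycle, which is what the revert effectively did.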
[07:55:09] we should assume same
[08:01:47] that makes things easier
[08:07:51] Errand
[09:25:45] zpapierski: ping me if you want to chat about cookbooks vs spicerack
[09:25:52] * gehel is still around for the next 30'
[09:39:39] Folks, we want to make the same change on relforge1004 as was done on relforge1003 yesterday evening.
[09:39:52] Disabling the NICs' hardware LLDP processing so the kernel gets LLDP messages.
[09:40:20] T290984 for context :)
[09:40:20] T290984: error while resolving custom fact "lldp_neighbors" on ms-be105[1-9], ms-be205[1-6] and relforge100[3-4] - https://phabricator.wikimedia.org/T290984
[09:40:23] Any issue with going ahead? There wasn't any impact yesterday that we could detect.
[09:44:10] topranks: the cluster seems fine, haven't seen any alerts related to this cluster
[09:44:29] you mean you want to go ahead with relforge1004?
[09:44:31] topranks: and breaking relforge is not a big deal, so please go ahead!
[09:45:04] ok will do! and thanks, shouldn't have any impact.
[09:45:26] I'm out for lunch in 15' and dcausse probably as well, we might not be around to help if things go wrong (it's also not an issue to wait a few hours to fix issues if any)
[09:46:48] actually I'm out now :)
[09:46:51] lunch
[09:47:11] Ok done.
[09:47:44] Didn't drop a ping or report any change in link status.
[09:47:45] Command seems to just "work". Go figure ;)
[10:01:29] gehel: and now I'm here - let me know when we can talk about spicerack/cookbooks, I should be ready by that time
[10:07:21] zpapierski: I'm out, I'll ping you when back
[10:07:59] cool
[12:31:56] zpapierski: ping
[12:32:47] I was just leaving to eat, but should be back in no more than 20min
[12:32:59] ack
[12:57:35] and I'm back, can we proceed?
[12:57:42] (damn, more than 20min)
[12:57:55] gehel: ^^
[12:58:15] meet.google.com/gxb-hpyh-jpx
[13:43:50] zpapierski: https://gerrit.wikimedia.org/r/admin/repos/operations/software/spicerack
[13:44:06] https://gerrit.wikimedia.org/r/admin/repos/operations/cookbooks
[15:09:26] ryankemper: can you join the Wednesday meeting? We have some questions around the streaming updater go-live: https://meet.google.com/yau-mkip-tqg
[15:16:03] gehel: yup, getting back from appt now, 2 mins
[15:17:58] I think I know the answer to this question, but if we could come up with a way to create "fresh" RDF dumps of wikidata in less than a day (maybe a few hours), would that be nice?
[15:27:37] And does flink use the mediawiki revision create events right now?
[15:28:07] addshore: yes, both the old and flink based updater are using revision create events
[15:28:23] ack, so I guess they still potentially have the issue of T120242 ?
[15:28:24] T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242
[15:28:51] addshore: yes
[15:29:22] it's not new tho, this issue has been causing missing updates since Stas wrote the kafka poller
[15:29:48] yarp!
[15:29:57] and previously it was relying on RecentChange which had a whole different set of issues
[15:30:05] you're looking for T215001 but ya :)
[15:30:05] T215001: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001
[15:31:12] ottomata: you stalker!
[15:31:59] :)
[15:33:22] ottomata: https://phabricator.wikimedia.org/T291089 might interest you
[15:33:37] I probably chatted with you about it before, but only just wrote it into a ticket
[15:35:31] addshore: nice!
we should coordinate on this, I'm sure we might have tools useful for this (esp if you plan to do consumer stream -> fetch data out of wikibase API)
[15:36:54] yeah, i didn't really consider just using recent changes until just now, which is what pushed me to write the ticket
[15:37:01] as that works around the "reliable" problem
[15:37:14] as long as that service in itself is reliable enough
[15:38:17] but that's the biggest difference between the streaming updater and this idea right now I guess
[15:38:35] which all comes down to those tickets about events missing, or reliable events etc.
[15:47:58] dcausse: zpapierski: I just remembered that 11 Oct is a US holiday. If y'all are still working, then starting the streaming updater data transfer that day still seems fine to me
[17:45:45] addshore: so, status on the event reliability
[17:45:53] i'm working on a draft DSO for the tech decision forum to address it
[17:46:01] that will just be a problem statement
[17:46:11] exciting
[17:46:12] i hope to submit it early next quarter, hopefully october
[17:46:20] after that, dunno what is supposed to happen
[17:46:32] but, i can use your ticket as yet another reason why it is important :)
[17:46:34] thanks to Rosta and Nathan for the presentations, and to everyone for the lively discussion (mainly on YT)
[17:46:37] oops, sorry
[17:47:05] the problem you wrote there is the same as almost every other data integration problem outside of MW right now
[17:47:14] yup
[17:47:32] and I think this wikidata case is probably one of the easier ones to think about and see the benefits of
[17:47:37] yeah
[17:47:50] at least, easier to see the benefits of for sure
[17:47:53] the data is already structured
[17:48:03] and if we can get to reliable events, the rest is "trivial"ish, as long as there is space for it all
[17:48:12] a lot of the other use cases are around parsing and structuring wikitext/html content
[17:48:46] yeah, at least theoretically trivial aka possible
[17:48:52] right now, as you say, it's not really possible
[17:49:20] related addshore:
[17:49:24] shared-data platform idea
[17:49:24] https://docs.google.com/document/d/15QqLTsKIrUCfhGPHIkl6OKeh2S1NZeEe4h0O9yTm7Fo/edit
[17:49:37] *needs access*
[17:51:11] try now
[17:51:28] I'm in!
[17:54:30] looks nice
[17:57:09] 'big ideas'
[18:32:59] addshore: I'm not convinced that RecentChanges is more reliable than the revision-create stream, using this stream did improve consistency of wdqs IIRC
[18:33:20] can you change a directory that currently exists in puppet into a symlink without a bunch of evil hacks?
I'm pondering how to rename /etc/wdqs into /etc/query_service and leave a symlink behind, but i realized puppet is just going to complain that /etc/wdqs already exists
[18:34:07] (the reasoning behind the rename is that /etc/wdqs/vars.yaml is hardcoded into the wikidata/query/deploy repo scap config, and we can't really parameterize it, so instead use a single expected path)
[18:38:02] ebernhardson: worst case, just have someone cumin the correct state that you want (rm the directory and ln the symlink in) and then merge the puppet change to enforce it
[18:39:03] legoktm: yea, i suppose taking a manual approach makes the most sense for things that don't really fit
[18:42:51] dcausse: yeah, I guess RC is still a secondary updater, really a stream of ids from the revision table would be the dream
[20:25:03] ryankemper: i think, based on talk this morning, that this should let puppet run (or at least continue past the current failure): https://gerrit.wikimedia.org/r/c/operations/puppet/+/721382
[20:25:31] ended up avoiding the rename, instead we just symlink the new location to the old dir, instead of moving the old dir to the new location. Some day when all paths go through the right way we could flip it
[20:28:25] ebernhardson: got it, so that should avoid the need for any manual steps when deploying this, right?
[20:28:39] kicked off pcc again since the pcc for PS5 failed (I imagine PS6 fixes it), then will merge
[20:34:30] ryankemper: i think so, because the only change to wdqs should be adding a new symlink that it doesn't reference
[20:35:38] ebernhardson: No dependency cycle, still seeing that error we were seeing before though
[20:35:41] https://www.irccloud.com/pastebin/VDcQUvUk/
[20:35:58] oh, we need to un-revert the deploy patch too. sec
[20:36:08] ah
[20:39:34] hmm, reviewing this though, we may have one more problem of the same kind. It's also going to `ln -sf /var/log/wdqs/rules.log /srv/deployment/wdqs/wdqs/rules.log` (from scap/checks.yaml).
[20:39:41] * ebernhardson wonders why
[20:57:51] so https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/721314 should be the revert (and adjustments for /var/log) to the deploy repo. That will need one more puppet patch to add the /var/log/query_service symlink: https://gerrit.wikimedia.org/r/c/operations/puppet/+/721394
[20:58:07] ryankemper: ^
[21:01:29] I wonder why pcc is taking so long on https://gerrit.wikimedia.org/r/c/operations/puppet/+/721394
[21:03:07] sometimes i double check that i spelled experimental right :)
[21:04:23] stuck in the queue :S
[21:04:29] per https://integration.wikimedia.org/zuul/
[21:04:45] running it locally rn
[21:05:11] https://puppet-compiler.wmflabs.org/compiler1001/31095/
[21:05:39] yup, looks expected. wcqs does nothing because the log_dir is already set to that, and wdqs adds 1 new symlink
[21:08:02] same error
[21:08:28] hmm, same path?
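A minimal sketch of the symlink-instead-of-rename approach described at 20:25 above (assuming the new path simply points at the existing directory; the actual change is the gerrit patch linked there):

    # Keep /etc/wdqs as the real directory and point the new path at it.
    # Going the other way (turning the existing /etc/wdqs directory into a link)
    # would need force => true, or the manual rm + ln that was suggested earlier.
    file { '/etc/query_service':
      ensure => link,
      target => '/etc/wdqs',
    }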
[21:08:32] I think we haven't added back a fix for /etc/wdqs/vars.yaml... unless one of the symlinks did that, I'm losing track of what changes we've made, one sec
[21:08:38] meanwhile https://www.irccloud.com/pastebin/itDvaVAL/
[21:09:05] hmm, https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/721314/2/scap/config-files.yaml should have removed the /etc/wdqs/vars.yaml reference :S
[21:09:08] ebernhardson: so I think the problem is we're linking /etc/$DEPLOY_NAME but scap is still looking for `/etc/wdqs`
[21:09:22] ah
[21:09:23] right
[21:09:27] ryankemper: oh, we probably have to pull to the deployment host
[21:09:41] uhh... is it possible that we're not on the latest wikidata_query_deploy because it runs later in puppet?
[21:09:49] ah yeah, your guess is slightly different but along the same lines
[21:09:57] ryankemper: because it doesn't clone from gerrit, it clones from deployment.eqiad.wmnet
[21:10:07] oh right
[21:10:17] * ebernhardson should remember these things much sooner ...
[21:10:19] i'll pull, sec
[21:11:01] yea, so we never actually deployed that patch yesterday. The patch and revert are still pending to pull :)
[21:11:31] ryankemper: pulled, we'll have to wait to `scap deploy` that out until after puppet has run to create symlinks across wdqs hosts
[21:11:51] in theory (man, i've said that too many times recently) this will get a step further now...
[21:11:58] ebernhardson: ah, I just did a git fetch / rebase / git fat pull as well :P we should be in the right state regardless
[21:12:04] yea :)
[21:12:11] running puppet again
[21:14:27] same? :S
[21:15:11] ebernhardson: yeah, lemme see what running the scap deploy on a wcqs host locally again says
[21:15:33] ebernhardson: same, it's still looking for `/etc/wdqs/vars.yaml`
[21:15:42] at the risk of jinxing us, I think we're close, there's gotta be some dangling reference we're not aware of
[21:15:51] actually, let me check the local repo on the wcqs host to make sure it is on the right SHA too
[21:15:57] what git hash is it giving?
[21:16:02] yea
[21:16:38] random guess, it already pulled and isn't pulling again, instead just attempting to promote what's already cloned?
[21:17:04] * ebernhardson now randomly realizes the log symlink might need an ordering dependency against Package['wdqs/wdqs'] as well
[21:17:06] https://www.irccloud.com/pastebin/ioqzTDvH/
[21:17:12] ebernhardson: yeah, still old commit
[21:17:25] ryankemper: hmm, delete /srv/deployment/wdqs/wdqs across wcqs instances?
[21:17:38] yeah, i'm gonna pull latest on this host just to verify the command will work after
[21:17:44] and then we'll do that to get it properly working
[21:17:50] makes sense
[21:20:03] Very confused, it only pulls up to this commit
[21:20:05] https://www.irccloud.com/pastebin/AtZRE7I8/
[21:20:20] here's the remotes
[21:20:22] https://www.irccloud.com/pastebin/o6yc8txq/
[21:20:46] I'll just go ahead and delete wdqs/wdqs across all the wcqs hosts and run puppet again :)
[21:21:05] yea, i glanced a little, but scap is doing some special sauce with their directory management, easier to let it start over :)
[21:21:36] er, `/srv/deployment/wdqs` not `/srv/deployment/wdqs/wdqs` (there is no such directory)
[21:22:26] directories nuked, running puppet again
[21:22:38] :S
[21:23:41] ebernhardson: same result; on second thought, because a git pull origin master only brought up to that commit I mentioned, I bet we are on that same commit now
[21:23:42] checking...
[21:23:55] yup, of course
[21:24:01] okay, I need to figure out what's actually happening in gitland here
[21:24:37] hmm, is it pulling by tags perhaps instead of by master? We could `scap sync` the repo out to hosts, i suppose i was worried that wouldn't work right because the wcqs instances aren't set up, and the wdqs might not have run puppet to create the symlinks
[21:24:47] err, `scap deploy` i mean
[21:25:58] ebernhardson: of course, it's using a submodule to set what it should be pulling up to. until we do an actual scap deploy we won't be pulling the latest
[21:26:23] ryankemper: ok, in that case i suppose we have to make sure all the wdqs instances have run puppet to create the symlinks and run a deploy
[21:26:25] the only weird thing is I literally did a `git pull origin master`... so I'd have thought that it would work when I did it manually (but fail when done via puppet)
[21:27:46] ebernhardson: sounds reasonable... if we managed to actually break anything on wdqs then the canary will catch that
[21:28:01] let me make sure there's no important dc switchover stuff happening real quick
[21:28:45] ryankemper: also checking wcqs1001:/var/log/query_service, we do need the extra ordering requirement for that too, patch in a sec, just re-reviewing puppet's docs on ordering
[21:29:01] sounds good
[21:29:22] skies are clear as far as the dc switchover / other deployments, so we can roll the deploy once we get that last puppet patch ironed out
[21:37:10] ryankemper: ok, i think https://gerrit.wikimedia.org/r/c/operations/puppet/+/721405 does it, i ended up changing from the syntax we were using to the `before => ...` syntax, which should be the same but generally seems to be preferred since it's actually part of a resource definition
[21:37:28] pcc https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/31096/console
[21:38:30] pcc seems to verify the only diff for /etc/wdqs/vars.yaml is syntax, the resulting resource is the same
[21:41:08] ebernhardson: indeed, shipping it
[21:44:41] Running puppet one last time just to make sure everything doesn't explode, then rolling the deploy
[21:45:12] ryankemper: actually, one more patch :) I just can't get things right .. a _ needs to be a /, sec
[21:45:29] ok
[21:45:58] * ebernhardson needs ways to test these things locally instead of just install and repeat :P
[21:46:41] ryankemper: https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/721406
[21:49:03] ah, last thing is I'll need to run `deploy-prepare.sh` to turn this into an actual version number
[21:50:18] ah, but first I need to kick off jenkins of course
[21:50:23] ok
[21:50:57] ryankemper: hmm, do we only need that when the rdf service has a deploy?
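The `before => ...` change mentioned at 21:37 is only a syntax move: the same ordering edge can be written as a standalone chain or as a metaparameter inside the resource definition, and pcc compiles both to the same resource relationships. A sketch, with illustrative resources rather than the actual patch:

    # a) standalone chaining arrow:
    File['/etc/wdqs/vars.yaml'] -> Package['wdqs/wdqs']

    # b) the same edge declared on the resource itself (roughly the form the
    #    patch switched to):
    file { '/etc/wdqs/vars.yaml':
      ensure => file,
      before => Package['wdqs/wdqs'],
    }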
[21:51:08] it probably works w/o needing a new jenkins run if I feed it the latest successful version number, since I believe it basically takes the current state of the repo then fetches some stuff
[21:51:18] ryankemper: mostly i'm looking at the line in deploy-prepare.sh that fetches from https://archiva.wikimedia.org/repository/releases/org/wikidata/query/rdf/service/${CURRENT_VERSION_NUMBER}/service-${CURRENT_VERSION_NUMBER}-dist.tar.gz
[21:51:41] Hmm
[21:51:51] i don't really know what that is though, only that it's a .tar.gz :)
[21:52:07] ebernhardson: the only hangup I have is that on the actual deployment host come deploy-time, we deploy like so: `scap deploy '0.3.84'`
[21:52:20] ryankemper: the last part is only a comment
[21:52:28] ryankemper: it's included in logs and such
[21:52:41] Oh
[21:52:44] Okay, we're fine then
[21:52:53] And yeah, that makes sense since the deploy process should just take the latest commit on the deploy repo
[21:53:19] Okay, will roll the deploy now with `0.3.85`
[21:54:00] mostly i remember by having typed scap deploy way too many times :) For extra funsies it's `scap deploy` for almost everything, but mediawiki gets special sauce and uses `scap sync`.
[21:55:38] ebernhardson: scap sync-file or scap sync-world (depending on which kind of a deploy you're doing)
[21:56:19] urbanecm: oh right, enough of us tried to sync when it was supposed to be sync-file that we turned sync into an error and use sync-world now
[21:56:28] Ux improvements :)
[21:56:38] yeah :)
[21:57:00] haha classic
[21:57:47] (deploy ongoing, canary was fine so it's rolling out to the rest of the fleet, then it will take a good chunk of time (30-45 mins) to restart `wdqs-categories` across the fleet
[21:57:50] )
[21:58:22] i have a meeting at 3:30, should be 30min or less
[21:59:32] just to say, i won't be available when that's done but back about 4
[22:03:45] sounds good
[22:04:51] Hmm, I tried `deploy-service@wcqs1001:/srv/deployment/wdqs/wdqs-cache/cache$ /usr/bin/scap deploy-local --repo wdqs/wdqs -D log_json:False` again and the wdqs repo is still on `8361ac9c87a2a9bcf9fa7b847244b357a948dafa` rather than the commit we want, weird
[22:05:02] oh, not weird, puppet probably hasn't run yet
[22:07:09] hmm, i'm not really sure how that part of scap works sadly :S I suppose my naive assumption was that it would clone whatever is master on the deploy host but clearly it's not that easy :) I wonder if some strace magic could find out what exactly scap is doing there
[22:07:41] something like `strace -e trace=process scap deploy-local ...`
[22:08:04] would verbosely tell all the exec calls it makes, which probably invoke git
[22:09:29] Running the scap deploy-local keeps putting us back on the same commit, even if I do a `git pull origin master` which now does pull up to the latest commit (but gets reverted from that deploy-local)
[22:09:57] Interestingly this output would make me guess it's using a submodule, but the `.gitmodules` file is empty in this directory
[22:09:59] totally guessing, when `scap deploy ...` is almost done it marks some metadata or some such that tells things what to clone?
not sure
[22:10:00] https://www.irccloud.com/pastebin/5dXuEDUD/
[22:10:13] going to give the strace a try
[22:10:37] ebernhardson: it's just weird cause the scap side of the actual wdqs deploy is done, so whatever scap is supposed to do there, it's already done that
[22:10:37] it might complain that deploy-user doesn't have tracing abilities, which makes for annoying command lines :)
[22:10:58] ryankemper: oh, the scap deploy part is done? Hmm, i would fully expect it to be better then... hmm
[22:12:04] ebernhardson: ^ https://www.irccloud.com/pastebin/89YDLPQa/
[22:12:12] oops, that arrow is pointing the wrong way
[22:12:16] not sure the strace tells us much
[22:12:44] ebernhardson: so previously `git pull origin master` wasn't even changing the commit, now it is but it's just getting reverted by the scap deploy-local, so something did change but not what we wanted
[22:13:03] ryankemper: add `-f` to the strace flags, it looks like scap is forking a new process that does things
[22:13:12] -f tells it to follow and report created subprocesses
[22:13:18] Oh neat, I was gonna ask if there was a way to do that
[22:14:19] ebernhardson: v https://www.irccloud.com/pastebin/ENW61BA5/
[22:15:35] hmm, so it does say something :) Now to decrypt
[22:19:26] ebernhardson: so I think the actual problem we need to solve is figuring out why scap is setting it to that one commit, and how to get it to the commit we want
[22:19:48] indeed
[22:19:57] i don't see where it's pulling that commit in the strace output though :S
[22:19:59] I believe if it were setting it to what's currently the tip of master it would just work, since it wouldn't be looking for `/etc/wdqs/vars.yaml`
[22:20:14] so I'm looking at `scap/scap/deploy.py`
[22:20:25] line 142 does `self.rev = self.config["git_rev"]`
[22:22:57] bleh, I wanted to add some print statements into `deploy-service@wcqs1001:/usr/lib/python2.7/dist-packages/scap/deploy.py` but there's a sudo password
[22:23:14] oh wait, i'm dumb, I just need to become root
[22:23:16] that seems set on line 729, self.config["git_rev"] = commit. The commit comes from a few lines earlier, commit = git.sha(location=self.context.root, rev=rev)
[22:23:49] ebernhardson: wait, so is that last line saying that it's just reading the current SHA the repo's already set to?
[22:23:58] if so, that means even earlier somewhere else is where the rev is getting changed
[22:24:18] ryankemper: oddly, yes :S doesn't really make sense yet
[22:24:47] there are remote overrides it looks like, i wonder if somehow a revision is set there
[22:25:11] I don't know whether the chicken comes first or the egg, but there's `/srv/deployment/wdqs/wdqs-cache/revs/8361ac9c87a2a9bcf9fa7b847244b357a948dafa`
[22:26:25] hmm, so the remote overrides come from http://deploy1002.eqiad.wmnet/wdqs/wdqs/.git/DEPLOY_HEAD
[22:26:28] right, that's from the python code rendering to there
[22:26:31] but that has your latest sync, so should be fine :S
[22:26:45] hmmm
[22:27:28] ebernhardson: I don't quite get what that `DEPLOY_HEAD` is
[22:27:55] ah nevermind
[22:28:02] that file literally has `commit: 902529b4d54adef7d59f138607cdc08d23d25f7e` now
[22:28:04] ryankemper: from the DeployLocal python impl
[22:28:12] yea, that file looks to have what we expect
[22:28:20] i don't get it :S
[22:28:31] maybe we are missing some critical config somewhere that ties it together... but not sure what
[22:28:49] for half a second I was thinking nuking the directories again and running scap would do it
[22:29:00] couldn't hurt :)
[22:29:16] but I already tried locally pulling and running the scap deploy right after, and that, starting from SHA 9025, ends up reverting to 836
[22:29:26] so I will give it a try but expect it to fail :P
[22:29:46] could try asking in releng, i think twentyafterfour worked on this stuff before
[22:29:55] but they would of course take time to get up to speed
[22:30:13] (not this particular repo, i mean they worked on scap)
[22:30:29] yeah, that's a good idea
[22:30:34] ebernhardson: I'll do that while you're in your meeting
[22:30:50] will have to step out for a bit soon is, but not for another 20 mins or so
[22:30:53] soonish*
[22:32:31] Okay, after nuking and running puppet, on a wcqs host I see `902529b4d54adef7d59f138607cdc08d23d25f7e` under `ls /srv/deployment/wdqs/wdqs-cache/revs`
[22:32:48] The scap command still failed as part of the puppet run, running it manually again now..
[22:33:20] it just worked!
[22:33:26] https://www.irccloud.com/pastebin/d6bIw9R4/
[22:34:23] Must be a puppet order of operations thing (for why the puppet run still had that error but then running the command manually didn't), running puppet again and expecting to see it work
[22:35:28] ebernhardson: it works!
we don't have any errors whatsoever from the puppet run :) and all it took was like 7 patches, 3 reverts, a wdqs deploy, and the combined intelligence of at least 3 engineers :P
[22:36:05] So... this did reveal that we are missing another ordering, where we need to tell it to set up that `/srv/deployment/wdqs...` directory before trying to actually do the `scap deploy-local`
[22:36:45] The issue will only crop up in situations like this where a "future" `wikidata_query_deploy` commit works but not the current one, and just running puppet twice gets thru it, but if it's simple we should tell puppet about the ordering
[22:50:23] spent a few minutes looking at it and I don't understand things well enough; I think that https://github.com/wikimedia/puppet/blob/280ac57fded9eb5028e36fffbe24049f357461ff/modules/scap/manifests/target.pp#L191-L200 should require whatever piece clones `/srv/deployment/wdqs/wdqs-cache` to be run first
[22:50:53] but I might be misunderstanding and that `package` block *is* what creates `/srv/deployment/wdqs/wdqs-cache`... that does seem kind of likely... but I don't feel like spelunking through the scap3 provider code more
[22:51:09] meh, anyway stuff's working, that would just be a nice-to-have but isn't blocking anything, so not worth me spending any more time on :P
[22:59:50] \o/
[22:59:58] eventually worked :)
[23:01:32] the readiness probe fails, since we haven't loaded any data. I wonder what we do about that ... we have to have a working readiness probe before the LVS config can be deployed. Will have to ask the EU side if we load data, or create an empty-ish namespace for now that's enough to make it happy
[23:57:11] Hello ebernhardson, Guest71 from yesterday here. I have requested your review at https://gerrit.wikimedia.org/r/721413. I assume there's no emergency for the review though, as I see the task is medium priority.
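On the leftover ordering nice-to-have from 22:36/22:50: the shape of the fix would be to make the deploy-local step depend on whatever resource first creates /srv/deployment/wdqs/wdqs-cache. A rough sketch only; the resource title and the require target are assumptions, not the actual contents of modules/scap/manifests/target.pp:

    # Hypothetical ordering, not an actual patch: only run scap deploy-local once
    # the resource that creates/clones the cache directory has been applied.
    exec { 'wdqs-deploy-local':
      command => '/usr/bin/scap deploy-local --repo wdqs/wdqs -D log_json:False',
      user    => 'deploy-service',
      require => File['/srv/deployment/wdqs/wdqs-cache'],  # or whichever resource clones it
    }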