[08:29:48] gehel: the wcqs cookbook seems to have failed, do you have access to the logs?
[08:30:02] dcausse: looking
[08:31:27] looks like it failed to stop the updater
[08:31:43] https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/wdqs/data-reload.py#L151
[08:32:05] ah indeed...
[08:32:06] yep, so the data load has not started
[08:32:12] hm...
[08:32:39] we could make the cookbook smarter, if the updater is already stopped this should not be an issue
[08:32:52] the systemd unit is not even there
[08:33:16] ah yes, that's a bigger issue, the cookbook should probably not be robust to that
[08:34:11] I wonder if we should add a parameter for this - doesn't sound terribly useful in the future and adds to code complexity
[08:35:07] I don't think we should, this is most probably the only time we'll have this issue
[08:36:12] systemctl list-unit-files | grep -E ^wcqs-updater > /dev/null && echo systemctl stop wcqs-updater ?
[08:36:29] or we enable the updater...
[08:37:33] can't we just modify the script to disable updater changes now ?
[08:37:43] (I know it's a hack)
[08:38:12] or we finish this run manually
[08:39:08] if we restart the cookbook (after fixing), we can skip the download, but I don't think we have an option to skip the munging
[08:39:58] if the munging is done then we can prep a small set of command lines to continue the import
[08:40:27] the data load is trivial
[08:41:02] not sure about the kafka magic
[08:41:20] there should be no kafka magic
[08:41:30] there is, actually
[08:41:48] but it's weird, I don't remember that either
[08:43:02] that cookbook should be simplified, too many levels of indirection
[08:43:24] we SRE are simple people, all those function handles will confuse us :)
[08:44:11] should be something along those lines: https://etherpad.wikimedia.org/p/import_wcqs
[08:44:35] looks ok
[08:44:51] that Kafka thing is bothering me, I'm checking it out
[08:44:56] agreed
[08:44:59] I can start on those
[08:45:13] while zpapierski figures out the kafka thingy
[08:45:30] gehel: wait the last touch is wrong
[08:45:53] btw it seems to be missing on the commons reload in the cookbook
[08:46:12] I'm not going to script it, just send the commands one after the other, you have time
[08:46:34] ok
[08:47:20] huh, interesting
[08:47:21] the cookbook is missing some stuff post-reload
[08:48:01] obviously, I was the one to throw in that kafka stuff, since we discussed that we should set the kafka offset after being done, otherwise the updater might fail
[08:48:53] yes or it might reset to earliest, checking
[08:48:55] but we didn't use the --kafka-timestamp parameter, and I don't know what that means - default value? would it fail, because it's none
[08:48:56] ?
[08:49:17] com.bigdata.rdf.sail.webapp.DatasetNotFoundException: namespace=wcqs
[08:49:37] where do we create that namespace...
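
A minimal sketch of the guard floated at 08:36:12, expanded into a conditional: only the unit name wcqs-updater comes from the discussion above, the rest is illustrative and would need adapting to however the data-reload cookbook actually shells out.

    # Stop the updater only if its systemd unit actually exists on the host,
    # so a reload on a host without the unit doesn't abort the whole run.
    if systemctl list-unit-files | grep -qE '^wcqs-updater'; then
        systemctl stop wcqs-updater
    else
        echo "wcqs-updater unit not present, nothing to stop"
    fi
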
[08:50:02] those were adapted from wdqs, maybe we don't
[08:50:16] but there should be data already, probably the namespace name is wrong
[08:50:27] gehel: "wcq" perhaps
[08:50:52] my bad :/
[08:51:00] yep
[08:51:03] that's the one
[08:51:34] data load in progress
[08:52:12] anyway, we don't need to care about that kafka thing, we wanted to take care of that ourselves after initiating the updater consumer on those
[08:52:46] this timestamp param should be mandatory in case of reloads now
[08:54:04] yeah, we could only get away with that because the producer hasn't started yet
[09:01:41] it seems that import_commons_ttl was completed
[09:07:07] nice
[09:07:18] I'm downloading the rev map
[09:08:02] where are we deploying the jobs from? deploy hosts?
[09:08:31] https://www.irccloud.com/pastebin/JF248v9X/
[09:08:46] I see that the fix worked, no need to sed/awk the output :)
[09:09:15] cool
[09:27:25] ok, it seems I can create a savepoint now, I'll do so
[09:33:39] dcausse: do I need to bootstrap that state into any specific path on swift?
[09:33:50] zpapierski: yes
[09:34:08] lemme check
[09:35:24] something like swift://rdf-streaming-updater-eqiad.thanos-swift/commons/savepoints/bootstrap_$DUMP_DATE
[09:35:40] and then same for rdf-streaming-updater-codfw.thanos-swift
[09:36:10] got it, starting it now
[09:40:36] ok, submitted, let's see if it fails
[09:56:58] huh, weird - I can't see that savestate on swift, even though the job finished without exceptions
[09:57:36] there's some from a week ago, strangely
[09:58:39] hm.. strange
[09:58:47] let's see if I get the same thing on codfw
[10:02:32] I probably messed up some parameters
[10:02:52] wrong swift container perhaps?
[10:03:14] I'm checking
[10:03:24] should be rdf-streaming-updater-eqiad or rdf-streaming-updater-codfw
[10:04:46] arghh
[10:04:54] SAVEPOINT_DIR= swift://rdf-streaming-updater-codfw.thanos-swift/commons/savepoints/bootstrap_20220109
[10:05:01] you see what's wrong :) ?
[10:05:35] hm the space?
[10:05:39] yep
[10:05:46] it's bash, after all
[10:05:53] I wonder where it went then
[10:06:36] no idea, but it definitely did something
[10:06:40] nvm, I'm running again
[10:17:59] and now it's a different story, savepoint is there
[10:18:05] now codfw
[10:47:22] ok, got both
[10:49:49] dcausse: will the query from the bootstrap instructions work for commons?
[10:49:52] https://www.irccloud.com/pastebin/qd8BeXqj/
[10:50:00] ah, wiki=commons?
[10:50:14] yes and the date
[10:50:22] yep
[10:50:23] ok, thx
[10:53:40] lunch
[10:54:30] WCQS data load progress: 84/724
[10:54:56] and I got the start date for consumers
[11:01:37] zpapierski: can we continue
[11:01:48] The build failed yesterday
[11:03:11] ejoseph: sure, there's no rush with streaming updater, I'll get back to that later on
[11:04:26] https://meet.google.com/tzp-bave-spz
[11:13:39] errand+lunch
[11:15:08] ejoseph: I've lost you, unfortunately
[11:15:22] I need to consume, but I'll be back in few minures
[11:15:25] s/minures/minutes
[11:19:22] Sorry my network
[11:35:55] no worries
[11:35:57] I'm back
[11:36:58] 5 minutes pls
[11:37:03] ok
[12:05:17] ltr plugin is proving difficult to build - dcausse: did you have any issues in the past? I see that you built (or at least stored) the last version
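
On the 10:04:54 paste: the stray space after the '=' makes bash treat the swift URL as a command to run, with SAVEPOINT_DIR set to the empty string only for that command, so the variable is never set for the later job submission. A small illustration (path and date copied from the chat):

    # Broken: bash tries to execute the URL and SAVEPOINT_DIR stays unset afterwards
    #   SAVEPOINT_DIR= swift://rdf-streaming-updater-codfw.thanos-swift/commons/savepoints/bootstrap_20220109
    # Fixed: no whitespace around '=' in a shell assignment
    SAVEPOINT_DIR="swift://rdf-streaming-updater-codfw.thanos-swift/commons/savepoints/bootstrap_20220109"
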
[12:56:10] apparently it requires precisely java version 12 to compile
[13:00:59] lunch
[13:06:49] WCQS data load progress: 144/724
[13:45:53] zpapierski: yes the elastic gradle build system requires a specific java version for building
[14:05:39] ryankemper: for when you're around, can you confirm (here or email) your availability for the ES training (https://www.elastic.co/training/elasticsearch-engineer/6342)
[14:08:19] greetings!
[14:08:26] o/
[14:08:41] gehel you'll have to show me how you calculated progress on that data load!
[14:11:25] WCQS data load progress: 171/724
[14:11:35] I'll show you in 20' in our 1-on-1
[14:14:57] o/
[14:33:23] o/
[14:37:38] how do I upload to that people.wikimedia.org?
[14:44:18] zpapierski: scp file people.eqiad.wmnet:~/public_html
[14:44:25] thanks!
[14:44:51] well create the public_html folder first :)
[14:55:16] that's sound advice
[14:55:50] apparently I already had one
[15:00:47] dcausse: the last one we need to check is analysis-hebrew and it's shared from your dir - is this the plugin - https://github.com/synhershko/elasticsearch-analysis-hebrew?
[15:02:52] zpapierski: yes, we need to check if an official version exists or we still have to build it on our own
[15:03:12] yeah, we've been following that process so far
[15:03:27] (only needed for two, and one was ours, fortunately)
[16:00:27] It's time for office hours
[16:04:24] zpapierski dcausse office hours?
[16:04:36] we are presenting on all tech today
[16:04:42] gehel: confirming avail for https://www.elastic.co/training/elasticsearch-engineer/6342
[16:41:07] WCQS data load progress: ~227/724
[16:51:06] speed demon, that one is
[17:08:44] ┗(-_- )┓┗(-_-)┛┏( -_-)┛ ┗(-_- )┓┗(-_-)┛┏( -_-)┛
[17:33:19] family emergency, back as soon as I can
[17:40:46] zpapierski: i have managed to ruin my java setup
[17:40:56] that happens :)
[17:41:10] I'm working on fixing it
[17:41:18] I'm guessing installing JDK 12 didn't work out well?
[17:41:31] Yh
[17:41:43] btw - in my experience, sdkman is a better tool than jenv
[17:41:54] and it handles other SDKs than Java as well
[18:02:45] unmeeting anyone?
[18:09:31] Got some bad news, taking the rest of the day off. See you tomorrow!
[21:41:58] * ebernhardson fights with cindy/mwv to turn display_errors off and keep it off
[22:13:30] meh, also fun: grunt on cirrus-integ.eqiad.wmflabs (runs cindy) gets stuck at the end of a run, but doesn't exit. poked a bit but it's non-obvious, and while poking built a new env locally and it seems to work. Going to bring up a new instance instead of trying to fix the old one
[22:14:22] also, lunch
[22:46:21] back
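
The people.wikimedia.org upload discussed at 14:37-14:44, combined into its two steps; the filename here is a placeholder.

    # ~/public_html is not created automatically on the people host, so make it first
    ssh people.eqiad.wmnet 'mkdir -p ~/public_html'
    # then copy the file into it (served under https://people.wikimedia.org/~<user>/)
    scp somefile.txt people.eqiad.wmnet:~/public_html/
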
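
For the Java 12 requirement mentioned at 12:56:10 and the sdkman suggestion at 17:41:43, a hedged sketch of switching JDKs with sdkman; the candidate identifier 12.0.2-open is only an example, check sdk list java for what is actually available.

    # install sdkman if not already present, then load it into the current shell
    curl -s "https://get.sdkman.io" | bash
    source "$HOME/.sdkman/bin/sdkman-init.sh"
    # list available JDKs and pick a Java 12 candidate (identifier below is an example)
    sdk list java
    sdk install java 12.0.2-open
    sdk use java 12.0.2-open
    java -version   # should now report a 12.x runtime for the gradle build
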