[07:21:53] dcausse: I want to make sure I didn't misunderstand yesterday - it seems to me that resetting to a time is actually used even now (at least in the patch), if you provide an ISO-8601 formatted date as initialOffset when you start StreamingUpdate (via the method parseInitialOffset) - am I correct?
[07:39:21] zpapierski: yes, that is true in the code, but it was never actually used
[07:39:55] isn't this just a case of setting a specific option? I mean, the code path isn't unreachable, right?
[07:44:19] zpapierski: I think this option exists, but the point is to make this controlled by the content of the offsets file instead of a command-line option
[07:44:47] I know, I was simply making sure I understand how it works
[08:05:40] sure, tbh I don't know what I had in mind with this command-line option - looking at it now it seems pretty unusable in our setup
[08:17:22] because of the format?
[08:19:04] no, because we have to provide the command-line options in puppet, and adding something as specific as a start date in puppet sounds weird since it's "instance specific"
[08:19:59] in other words, using this option manually sounds ok, but it's when you look at how this has to be set up in puppet that it becomes weird
[08:20:23] it's part of the state of the instance
[08:22:23] yeah, makes sense
[08:25:23] something I had not thought about: another strategy would perhaps be to control offsets outside the app itself, e.g. use the cookbook to set up the consumer offsets in kafka
[08:29:32] is it better than doing it through the app?
[08:37:52] given the actions we have to make: 1/ data import, 2/ data transfer intra-DC, 3/ data transfer cross-DC
[08:38:31] I think with offsets managed in a file we cover 2 & 3 with minimal adaptation of the data-transfer cookbook
[08:38:36] but 1 is not covered
[08:40:23] after the data import the updater does not know where to start from (unless the topic is empty and it starts from the beginning)
[08:41:17] "is empty" means it does not contain data from the prior run of the flink pipeline
[08:43:47] the cookbook approach would be: 1/ the data-import takes a param with the timestamp of when the flink pipeline started and would position the offset in kafka for the consumer named after the hostname of the machine it's reloading
[08:44:27] 2/ the cookbook would simply read the offset from kafka for the source host and set it for the target host
[08:44:48] 3/ is similar but using timestamps instead of plain offsets
[08:45:14] the updater java code would be told to fail if it has no offset pre-positioned
[08:47:59] couldn't the cookbook set the offset file for the 1st case?
[08:48:07] (or timestamp, in this case)
[08:48:17] it could indeed
[08:49:13] I'm just a little cautious about that "fail if no offset is set" part - we could have an offset set from one of the old runs, before the data reload
[08:49:44] obviously, the file can have incorrect content as well
[08:49:50] so it's probably the same
[08:50:18] I'm probably only arguing here because I have very little knowledge of how to adapt and operate cookbooks
[08:50:25] with the file I think I worry about having two sources for the offsets and making weird assumptions, while with the cookbook approach the java code must trust the offsets in kafka
[08:51:15] can you replace all of the operations from OffsetRepo with kafka tooling?
[08:51:29] as in, retrieve both offsets and timestamps?
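As a rough illustration of step 1/ of the cookbook approach described above (positioning a consumer group's offsets in kafka from a timestamp), a minimal sketch using the python kafka client could look like the following. This is not the actual cookbook or the script linked later in the log; it assumes kafka-python 2.0.x, and the broker, topic and consumer-group names are made up for illustration.

#!/usr/bin/env python3
# Hypothetical sketch: commit offsets for a consumer group so that consumption
# starts at the first message at/after a given timestamp.
from datetime import datetime, timezone

from kafka import KafkaConsumer, TopicPartition
from kafka.structs import OffsetAndMetadata

BROKERS = "kafka1001.example.org:9092"        # assumption, not a real broker
TOPIC = "rdf-streaming-updater.mutation"      # assumption
GROUP = "wdqs_streaming_updater_wdqs1009"     # consumer named after the host (assumption)


def set_offsets_from_timestamp(start: datetime) -> None:
    consumer = KafkaConsumer(
        bootstrap_servers=BROKERS,
        group_id=GROUP,
        enable_auto_commit=False,
    )
    partitions = [TopicPartition(TOPIC, p)
                  for p in consumer.partitions_for_topic(TOPIC)]
    consumer.assign(partitions)
    # offsets_for_times() expects timestamps in milliseconds since the epoch
    ts_ms = int(start.timestamp() * 1000)
    found = consumer.offsets_for_times({tp: ts_ms for tp in partitions})
    # Commit the first offset at/after the timestamp; partitions with no
    # message after that time come back as None and are skipped here.
    consumer.commit({tp: OffsetAndMetadata(ot.offset, "")
                     for tp, ot in found.items() if ot is not None})
    consumer.close()


if __name__ == "__main__":
    set_offsets_from_timestamp(datetime(2021, 9, 10, tzinfo=timezone.utc))

The updater would then simply resume from the offsets committed for its group, which is what "the java code must trust offsets in kafka" amounts to.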
[08:51:53] There are metrics, but I would not trust them
[08:52:09] so we'd still need java code for that part
[08:52:15] You have to subscribe to the topic to obtain the truth
[08:52:27] python can do that
[08:52:33] in the same consumer group?
[08:53:24] yes, this is why I don't like this approach either, it feels weird to have an app impersonating another
[08:54:06] what will happen if we basically sigkill Flink during commit?
[08:54:50] from what I understand we might end up in a situation where we have kafka committed, but the file not reflecting that commit
[08:55:06] I know it's not bad, because we don't really care about duplicated patches
[08:55:10] you mean sigkill the updater-consumer?
[08:55:14] yep
[08:55:21] sorry, I don't know why I wrote flink
[08:55:54] anyway, if we go with the cookbook approach and python "impersonating" a consumer group to read offsets and timestamps
[08:56:10] we wouldn't actually need to write down offsets and timestamps every commit, right?
[08:56:13] yes, there is no guarantee that the file is in line with kafka-maintained offsets
[08:56:41] the consumer will fail, but the latest commit will always be visible in Kafka, if I understand correctly
[08:57:43] (the irony of me working towards dropping some of the code I just wrote isn't lost on me)
[08:58:56] did you start to write the code that has to decide what to start from: offsets from file, timestamp from file, or offsets from kafka?
[08:59:19] I'm almost done with it, actually
[08:59:25] oh sorry
[08:59:26] not that part
[08:59:32] only writing down the timestamps
[08:59:57] I see
[09:00:00] and reading them, of course
[09:00:14] tbh I'm not sold on any approach
[09:00:38] I look at it this way - an async process is a bother
[09:01:18] it might affect performance and it's additional I/O
[09:01:48] if we can replace that with a simple script run when needed (instead of writing down a file we might never need) - I'm all for it
[09:02:29] makes sense
[09:03:34] ok, I might've exaggerated the IO, since it's done on close, but that makes it worse if closed unexpectedly
[09:03:47] also - memory consumption is somewhat unknown
[09:04:27] I need to go eat my meal, but we can discuss it further (e.g. on meet) if you want - I might not fully understand the benefits of the file approach
[09:04:38] sure
[09:05:58] * dcausse now wonders if cumin can import the kafka python client
[09:08:16] dcausse: you probably mean spicerack, but yes, it should be able to import anything
[09:09:28] gehel: thanks! was looking at cumin and was a bit lost not finding all the elastic code mat wrote :)
[09:19:19] with the python kafka client it's relatively straightforward: https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/+/refs/heads/master/streaming-updater-producer/scripts/set_initial_offsets.py
[09:19:50] this script only manages timestamps; hopefully managing plain offsets is even easier
[09:20:29] let's discuss that this afternoon and list all the pros & cons
[09:49:24] lunch
[09:53:32] lunch 2
[10:20:45] gehel, dcausse: shout out when you're ready to talk about the way we approach all of this, I'll look into how python handles it in the meantime
[10:33:11] ok, apparently it's very easy to retrieve offsets and timestamps as well
[11:14:59] stat1004 doesn't seem to have kafka-python installed :(
[11:39:51] meal + errand break
[12:49:34] dcausse, gehel: are you available?
[12:49:44] zpapierski: I'm around
[12:49:59] In 2 minutes
[12:50:17] https://meet.google.com/off-oywd-hhr
[12:50:22] gehel: ok
[16:46:28] ebernhardson: hi, we broke the gui build :D https://integration.wikimedia.org/ci/job/wikidata-query-gui-build/35/console
[16:46:42] > 14:01:11 [2021-09-10T12:01:11.196Z] "GET /custom-config.json" Error (404): "Not found"
[16:47:01] let me see if I can find a fix, otherwise I will bother you more
[16:55:02] Amir1: :S
[16:55:27] Amir1: looks to be missing a trailing \ based on the error message
[16:56:34] I'm missing it
[16:56:37] let me see
[16:56:57] Amir1: https://gerrit.wikimedia.org/r/c/integration/config/+/720345
[16:57:13] oh, that
[16:57:19] okay, I can deploy it
[16:57:54] i wonder if some sort of shell linter could check shell definitions in ci jobs
[16:59:04] pasting that into https://www.shellcheck.net/ gives an error, so it would have worked. hmm
[17:00:16] that part of zuul could use a lot of love
[17:00:21] possibly a big overhaul
[17:00:38] s/zuul/integration config
[17:01:01] indeed, i imagine there is at least consideration of gitlab ci stuff?
[17:01:11] * ebernhardson knows nothing about it, other than it exists :)
[17:03:10] I hope so
[17:03:40] wohhooo https://integration.wikimedia.org/ci/job/wikidata-query-gui-build/36/console
[17:03:54] excellent, only broken briefly :)
[17:04:13] well https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/720349
[17:04:17] :(((
[17:04:47] oh. of course. It deletes the .gitignore
[17:05:29] should .gitreview be excluded as well?
[17:07:33] it also adds the logos
[17:07:45] I think so
[17:07:46] yea, the logos get added because the .gitignore was deleted, so it didn't know not to add them
[17:08:01] ah, okay, that's not that bad
[17:09:25] just testing find a sec, will have a patch
[17:12:05] all good
[17:12:13] Amir1: https://gerrit.wikimedia.org/r/c/integration/config/+/720351 should be appropriate then, it keeps anything matching .git*
[17:13:03] ebernhardson: do you want me to deploy it first so we can be sure?
[17:13:04] i also put the full paths, i don't really know why it was initially doing that but it seemed odd to me to have it two different ways. Basically just made it consistent
[17:13:10] Amir1: sure
[18:00:38] lol, just about as expected. Wrote something to extract all the shell commands from jjb output and run shellcheck over them. It hates everything :P
[19:52:25] * ebernhardson wonders how worthwhile it is to make puppet work for wcqs-beta and wcqs-prod at the same time... tempted to turn off puppet on beta and call it good enough
[19:54:50] ebernhardson: where is beta hosted?
[20:49:59] RhinosF1: https://wcqs-beta.wmflabs.org/
[20:58:13] ebernhardson: cloud vps admins will not like you
[20:58:25] They like puppet on
[20:58:38] it would only be temporary, i just can't delete the instance until prod is live
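For the 18:00:38 idea above ("extract all the shell commands from jjb output and run shellcheck over them"), a hypothetical sketch of such a check is below. It is not the script that was actually written; it assumes the Jenkins Job Builder definitions have already been rendered to XML files (for example with "jenkins-jobs test -o out/ <config>"), and the output directory layout is an assumption.

#!/usr/bin/env python3
# Hypothetical sketch: pull shell build steps out of rendered Jenkins job XML
# and lint each one with shellcheck via stdin.
import subprocess
import sys
from pathlib import Path
from xml.etree import ElementTree


def shell_commands(job_xml: Path):
    """Yield the text of every <command> element in a rendered job definition."""
    root = ElementTree.parse(job_xml).getroot()
    # Shell build steps serialize as <hudson.tasks.Shell><command>...</command>
    for node in root.iter("command"):
        if node.text:
            yield node.text


def main(out_dir: str) -> int:
    failures = 0
    for job_xml in sorted(Path(out_dir).glob("*")):
        if not job_xml.is_file():
            continue
        for i, script in enumerate(shell_commands(job_xml)):
            # shellcheck reads the script from stdin when given "-"
            result = subprocess.run(
                ["shellcheck", "--shell=bash", "-"],
                input=script, text=True, capture_output=True)
            if result.returncode != 0:
                failures += 1
                print(f"{job_xml.name} (step {i}):\n{result.stdout}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "out"))

Running it over a full set of rendered jobs would likely produce plenty of warnings, consistent with the "it hates everything" remark above.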