[00:05:55] not sure who to poke with T324525, basically WikibaseLexeme master branch is currently failing CI which blocks WikibaseCirrusSearch
[00:05:55] T324525: The module 'wikibase.lexeme.lexemeview' must not have target 'mobile' because its dependency 'jquery.ui.languagesuggester' does not have it - https://phabricator.wikimedia.org/T324525
[10:00:05] pfischer: a gerrit tip, when amending https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/864733 you "broke" the patch chain, to avoid this it's generally better to always be on the top of the patch chain and use an interactive rebase: git rebase -i @~2 (and mark the patch you want to change as "edit"), update the code, git add changed files, git rebase --continue and then
[10:00:07] "git review"
[10:00:39] that way the chain of patches remains connected
[10:01:27] a broken patch chain is visible when gerrit says things like "Indirect ancestor or Not current"
[10:09:08] Some doc here: https://www.mediawiki.org/wiki/User:Aude/Git#Amending (we should copy some of this content to https://www.mediawiki.org/wiki/Gerrit/Tutorial I guess)
[10:10:16] Yes, I noticed that. Thank you. I was about to work on the dependent CR so I didn’t want to push an (empty) rebase-only CR. But I can do that.
[10:12:17] Talking of the dependent CR Hi! dcausse: https://gerrit.wikimedia.org/r/c/search/cirrus-streaming-updater/+/864788/comments/cdc9d7e5_f58bab2f - How do we want to deploy the application in the end? Does the flink-application-deployment run as a k8s job? I’m trying to figure out what would be the most comfortable/easy to maintain way of passing config to the application at deployment time.
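Note on the gerrit tip from 10:00: the interactive-rebase workflow can be tried out locally with a rough sketch like the one below. Everything in it is an assumption made for illustration: the throwaway repository, the file names, and the commit subjects are hypothetical, `sed` stands in for manually changing "pick" to "edit" in the editor, and an explicit `git commit --amend` is used to record the change before continuing. `git review` is only mentioned in a comment because it needs git-review and a Gerrit remote.

```shell
#!/usr/bin/env bash
# Sketch: amend the first of two chained commits without breaking the chain.
set -eu

# Throwaway repository and identity, purely for illustration.
repo="$(mktemp -d)"
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"
# Never open an interactive editor in this demo.
export GIT_EDITOR=true

# A base commit plus a chain of two dependent patches (hypothetical names).
echo "base" > base.txt && git add base.txt && git commit -q -m "base"
echo "v1" > a.txt && git add a.txt && git commit -q -m "patch A"
echo "v1" > b.txt && git add b.txt && git commit -q -m "patch B"

# Rebase the last two commits interactively from the top of the chain.
# A person would change "pick" to "edit" on patch A in the todo list;
# here sed does it so the demo runs unattended.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i '@~2'

# The rebase is now stopped at patch A: update the code and amend it.
echo "v2" > a.txt
git add a.txt
git commit -q --amend --no-edit

# Replay patch B on top of the amended patch A.
git rebase --continue

# Both patches are still stacked on base, in the same order.
git log --format=%s

# From here, running `git review` at the top of the chain would upload
# both patches to Gerrit (not run in this sketch).
```

Because the amended patch keeps its commit message, a Change-Id footer in that message would be preserved too, which is how Gerrit would match it to the existing change.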
[10:13:05] Ignore the „Hi!“
[10:13:19] :)
[10:14:23] I think a single property file for the whole application is the easiest for deployment
[10:14:53] in yarn when submitting the job the property file will be read while constructing the job graph on the stat machine
[10:15:26] in k8s the property file will be provided as a configmap accessible in the container
[10:16:25] having a mix of command line args and several property files is possible but this seems to be a bit tedious to manage
[10:19:20] Alright. I’ll modify it so we expect only a file.
[10:20:03] this is what we have for the WDQS updater: https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/+/refs/heads/master/streaming-updater-producer/src/main/scala/org/wikidata/query/rdf/updater/config/BaseConfig.scala#77
[10:20:24] basically it allows either command line only args or a single property file
[10:53:04] lunch
[13:26:56] > having a mix of command line args and several property files is possible but this seems to be a bit tedious to manage
[13:28:13] it might be, but being able to override things via CLI might be useful for development and troubleshooting. maybe not in k8s?
for our yarn jobs, we made/adapted a ConfigHelper class that works with properties files, but also allows opts and overrides from the cli: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-tools/src/main/scala/org/wikimedia/analytics/refinery/tools/config/ConfigHelper.scala
[13:28:28] I guess Flink has its own app options parser, but i forget how it works :)
[13:29:32] example usage: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-tools/src/main/scala/org/wikimedia/analytics/refinery/tools/config/ConfigHelper.scala
[13:29:36] oops
[13:29:39] https://github.com/wikimedia/analytics-refinery-source/blob/592c163c591f135450d223ba5c431f3d29c774a7/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/Refine.scala#L28
[13:31:21] flink has ParameterTool that can accept a Map so I guess we could have that, something that would take in a file and argv and merge all this
[13:33:21] the "having a mix of command line args and several property files is possible but this seems to be a bit tedious to manage" comment was regarding something like --kafka-consumer-options consumer_option.properties --kafka-producer-options producer_options.properties
[13:33:39] that'd be tedious to manage I think
[13:34:54] ah yes that would be tedious to manage
[13:35:00] oh 'several' properties files, yeah
[13:37:17] but something like run job.jar ./config.properties --option1 "overridden_value" sounds useful indeed
[13:58:49] o/
[14:05:14] dcausse, pfischer: maybe a stupid question, but why do we materialize the Elasticsearch update documents in a Kafka queue? Why don't we call CirrusSearch each time instead?
[14:06:11] gehel: to be able to replay the updates and write to multiple clusters independently
[14:06:23] I've been wondering about that for a while (the usual notification vs messaging)
[14:07:14] replaying updates requires having a stream of notifications, not necessarily a stream of actual documents, no?
We could still recreate the update doc each time (by calling Cirrus at this point).
[14:07:39] That would raise the compute cost, but lower the storage cost (and complexity)
[14:08:48] we might want to replay a single wiki not the whole edit stream
[14:09:41] how is that different if we have a stream of update docs? We still have a single stream? Or do we have a different stream for each wiki?
[14:09:58] So that's still a filtering step for the ingestion job?
[14:10:04] we will filter the updates per wiki
[14:10:42] do you mind jumping into a meet? (cc pfischer) meet.google.com/mcy-soqj-cxn
[14:10:56] I have a meeting in 5 min
[14:11:22] maybe later then. Or next week. No emergency there.
[15:00:40] gehel: I'm around
[15:03:49] interested in ^ too :)
[15:37:41] my time to be busy. I'll schedule some time for next week
[16:11:37] still WIP, but I added some stuff that will hopefully help us fix WDQS more quickly if we get a naughty user agent again: https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Identifying_the_user_agent
[16:12:02] thanks!
[16:12:28] Thanks! That's super helpful!
[16:13:55] np, will probably want to talk to you (David that is) about the logstash method a bit more at our pairing this Thurs if that is OK (will be out most of the rest of the day w/training)
[16:16:08] inflatador: I'll be out thursday but happy to talk about that when you want next week
[16:18:02] dcausse great, will hit you up then
[16:18:15] working out, then puppet training starting at the top of the hour so might not be available much today
[16:28:42] \o
[16:33:30] o/
[16:34:26] ebernhardson: any objections to dropping https://gerrit.wikimedia.org/r/c/operations/puppet/+/865072? quickly checked airflow and I believe this is handled by it now
[16:35:29] dcausse: looking
[16:36:18] (it's failing apparently)
[16:36:23] dcausse: yea, lgtm
[16:36:26] thanks!
[17:34:37] meh, i should have looked closer at latest elastic-ltr.
I had looked over the closed issues but not the actual code, they now add 0.0f to everything when ctx != null. I suppose I will just build a custom version with that patch for 7.10
[17:35:01] should we make a repo somewhere (gitlab?)
[17:35:31] ebernhardson: I think we have a 7.10 branch (Emmanuel created it IIRC)
[17:37:20] dcausse: hmm, where?
[17:37:40] hm... can't find it but I'm pretty sure Emmanuel pushed a patch there...
[17:39:29] not seeing in the gerrit list of patches, there were only 3 pages, not seeing any gitlab patches for him either. hmm
[17:40:01] looking at the 7.10.2 release
[17:41:03] we seem to download the ltr plugin from o19s github releases
[17:43:12] Release v1.5.4-es7.10.2 Mar 9, 2021
[17:44:04] so my memory is failing :(
[17:44:10] https://github.com/o19s/elasticsearch-learning-to-rank/commit/a6eee49d9fd4b20befdd6d36ee1cbd1fcf84740d
[17:45:07] I think they would not mind making a fix on their repo on an es_7_10 branch and making a release
[17:45:51] hmm, i suppose worth asking. I'm pretty sure i've seen other bits where they didn't want to re-release old versions of the plugin, but maybe they can make an exception :)
[17:47:39] if it's a burden your personal github fork is totally acceptable
[17:49:02] well, i'll ask in the slack and prep something in my github for now
[18:01:33] ebernhardson: meeting?
[18:02:39] omw
[19:06:22] sheesh, now i realize why i didn't notice the changes. my local copy of ltr has a master branch, last updated 2018. The branch was renamed to main
[19:32:43] ebernhardson: 5’ late to pairing
[19:33:31] kk
[21:33:41] * ebernhardson forgot how tedious it was to get all the right versions of jvm in place for an elasticsearch compile
[21:33:44] they want all the jvm's :P
[22:08:17] meh, github doesn't like my 2FA for some reason :S
[22:11:27] ahh, seems the clock on my phone wasn't accurate enough