[06:58:51] o/
[07:14:42] cormacparle: it should be CirrusSearchNamespaceWeights, but tuning this might have undesired effects on other endpoints; perhaps a phab task would be ideal to discuss further and possibly test its impact?
[08:06:08] o/ dcausse: welcome back!
[08:06:27] thanks :)
[10:13:50] lunch
[12:13:55] dcausse: If you feel like wrapping your head around Airflow DAGs already, here's a PR for writing image recommendations as weighted tags via kafka: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1514 - while working on that, I was wondering how you got that to work (for RDF?), since I ended up building a new version of the event utilities that is properly shaded (IIRC you ran into a guava-related JAR-hell issue earlier this year).
[12:14:52] pfischer: looking, indeed I remember having jar issues there, trying to remember what I've done
[12:25:11] pfischer: the jar issue I got was that RateLimiter was pulled from the guava jar coming from hadoop/spark; I solved my problem with https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/1120205
[12:25:25] perhaps it had undesired consequences on something else?
[12:27:29] I ran spark with --packages org.wikimedia:eventutilities-spark:1.4.3 and it seemed to have worked well IIRC
[12:28:08] but perhaps it's not possible to use --packages with airflow?
[12:29:24] for context, the only time I used your kafka datasource is from a custom script run manually from stat1009:/home/dcausse/articlecountry
[12:29:59] for RDF I never used eventutilities-spark
[12:30:29] (looking at whether the rdf-spark submodule is using eventutilities)
[12:36:34] the wdqs rdf project relies on eventutilities, eventutilities-flink & eventutilities-shaded on version 1.3.7; eventutilities-shaded is used by the spark jobs (rdf-spark-tools) and non-shaded versions of eventutilities & eventutilities-shaded are used by the flink job
[12:38:12] ah I see https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/1167297
[12:43:44] so I bet I never ended up using the jar-with-dependencies eventutilities-spark artifact
[12:43:46] dcausse: okay, if you just ran that as a custom script, that explains why the other classpath issues did not come up back then. --packages is not an option; IIUC, every jar has to be explicitly declared (which in turn requires a properly shaded uber-jar)
[12:44:35] seems cleaner to not depend on -shaded between modules of the same project indeed
[12:45:56] I wish hadoop/spark would provide better user class loader isolation at some point :/
[13:20:17] \o
[13:20:25] o/
[14:01:11] o/
[14:29:18] fyi I kicked off full-cluster reindexes on all clusters, please don't run rolling restarts :)
[14:40:16] d'oh.. Brian gave me the all-clear to do that last week and I plumb forgot... so many side quests.....
[14:40:49] no worries, hopefully it simply runs in the background for a few days and completes
[16:45:36] dinner
[21:33:48] hmm, weird... the reindex failed on all three clusters. The reindex itself is still running in the cluster tasks, but the reindex orchestrator failed. And it turns out that if you return from a `finally:`, that overrides exception propagation, so we didn't print the exception :S
[21:49:38] was able to find them in the kubectl pods list, so it seems the container was still running but the subprocess somehow failed... but without an exception, who knows.
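(Aside, not part of the log: a minimal Python sketch of the `finally:` gotcha mentioned at 21:33:48 above. A `return` inside a `finally:` block replaces any exception that is propagating, so the caller never sees it; the function and error names here are illustrative, not the orchestrator's actual code.)

    def run_reindex():
        try:
            # Stand-in for the subprocess call that actually failed.
            raise RuntimeError("mwscript subprocess failed")
        finally:
            # Returning here discards the in-flight RuntimeError,
            # which is why no traceback was printed.
            return "done"

    print(run_reindex())  # prints "done"; the error is silently swallowed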
[22:01:52] * ebernhardson realizes after canceling it all that if mediawiki was still running...could have deleted the `reindex_complete` action from the end of state.json and let it check live indices after the mwscript had finally completed...oh well
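(Aside, not part of the log: a purely hypothetical sketch of the recovery idea in the 22:01:52 message, assuming state.json holds a list of actions ending with "reindex_complete"; the real file layout and key names are not shown in this log.)

    import json

    # Load the orchestrator state (structure assumed for illustration only).
    with open("state.json") as f:
        state = json.load(f)

    # Drop the trailing "reindex_complete" action so the orchestrator would
    # re-check the live indices once the mwscript run eventually finished.
    if state.get("actions", [])[-1:] == ["reindex_complete"]:
        state["actions"].pop()

    with open("state.json", "w") as f:
        json.dump(state, f, indent=2)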