[10:27:49] Early lunch
[10:46:36] lunch
[12:04:20] hey! By eye we *think* this is wrong (or we screwed something up?) https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/REL1_39/README#L6 we think probably a 7.x version is required here. We think using version 6.8 is the cause of us getting errors like:
[12:04:22] `Indexing namespaces...Elastica\Exception\ResponseException from line 178 of /var/www/html/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php: Validation Failed: 1: type is missing;2: type is missing;3: type is missing;4: type is missing;5: type is missing;6: type is missing;7: type is missing;8: type is missing;9: type is missing;10: type is missing;11: type is missing;12: type is missing;13: type is missing;14: type is
[12:04:22] missing;15: type is missing;16: type is missing;17: type is missing;`
[12:04:55] does that sound likely? Do you think there was a switch to a newer version of ES since 1.39 versions of wikis + extensions?
[12:19:54] tarrow: yes with 1.39 you should use elastic 7.10.2, but we have a small compat layer that you can use to help the transition: https://www.mediawiki.org/wiki/Extension:CirrusSearch
[12:20:15] README in the codebase is probably obsolete :/
[12:21:07] the compat layer can be enabled with https://www.mediawiki.org/wiki/Extension:CirrusSearch/ES6CompatTransportWrapper
[12:21:47] dcausse: wonderful! I think we'll make good use of that!
[12:21:57] this will fix the missing type problem but you should definitely upgrade to es7.10
[13:32:24] we also need to upgrade to 1.39 :/
[13:35:39] tarrow: I thought the error you pasted was because you upgraded to 1.39?
[13:36:17] nah: part of us mucking about locally / testing stuff to imagine what life would be like when we do
[13:36:30] oh I see
[13:45:54] we're currently more "slowly and be nervous" than "fast and break things" in our style of movement ;)
[14:09:43] o/
[15:29:16] \o
[15:30:22] ryankemper / inflatador: I've just forwarded you the Incident Review meeting that's happening after our retro. I won't be able to be there (dinner time), but it might be interesting for you to be there.
[15:30:38] pfischer: Spark 3 meeting: https://meet.google.com/crs-aqyq-ssd
[15:32:25] dcausse are you at the spark 3 meeting?
[15:32:43] inflatador: I don't think so, checking
[15:33:03] dcausse NM, I have a pairing session w/you on my calendar that I apparently never sent to you ;(
[15:33:20] We can skip it for now
[15:33:30] nope, that meeting is just between Erik, Peter and myself (not that we want to exclude anyone, but no need to FoMO)
[15:33:31] inflatador: indeed I don't see anything on my calendar
[15:38:31] inflatador: if you have time next week I'd love to pair with you on T304914, we did it on codfw a couple of months ago but we still need to do that for eqiad
[15:38:32] T304914: Remove the presto client for swift from the flink image - https://phabricator.wikimedia.org/T304914
[15:39:23] dcausse just fixed the invite to make it recurring every 2 wks starting 02 Feb. We can do next week though too
[15:39:33] inflatador: thanks!
[15:39:51] ^^ would you all want to punt on that and instead (eventually) moe to flink-app native?
[15:39:57] move*
[15:40:47] ottomata: we're still running on flink 1.12 so we need to migrate first :/
[15:41:58] gehel I don't think I got the incident review invite?
[15:42:11] and moving away from swift is part of this migration
[15:47:01] inflatador: I can see it in your calendar. 5pm UTC today. I've forwarded that invite not long ago.
[15:47:37] gehel OK, I see it now. Not enough caffeine this morning
[15:49:24] inflatador: the pairing meeting you created is conflicting with the retro for the last half
[15:50:08] dcausse oh yeah ;( Will move shortly
[15:50:16] pfischer: meeting sent for 9am CET tomorrow (with joal)
[15:56:18] gehel: thanks!
[15:59:31] SparkUtils in the rdf-spark-tools does a matching against HiveString, a type that apparently no longer exists in any of the dependencies. Do we work with hive? For spark 3.0.2 there still is javadoc but I do not know what happened afterwards. Is it deprecated? Was it moved?
[16:00:33] pfischer: that might be a good question to ask in #wikimedia-analytics
[16:01:27] dcausse, ryankemper: retrospective in https://meet.google.com/eki-rafx-cxi
[17:07:58] I jumped on the very tail end of the VP office hours and they were discussing this: https://www.perplexity.ai/
[17:08:00] It is a cross between a search engine and a large language model. It does not do well with insufficiently famous people, merging info on different people, with the now familiar ChatGPT trait of being very confidently incorrect.
[17:09:28] It also suggests "Related" searches that it then doesn't give very good answers for.
[17:23:46] suggested searches that are not great? where have i seen that before :)
[17:36:10] the hard questions...what do we name the gitlab repo containing pyspark code
[17:37:26] could be search-platform/pyspark, but that seems overly specific since it won't contain everything pyspark. It would mostly be the simpler projects that don't need anything complicated (mjolnir would be the obvious missing piece, but who knows maybe other things later)
[17:39:57] workout, back in ~40
[17:40:05] what would it contain? everything in airflow/spark ?
[17:40:51] dcausse: yea mostly, we need to turn our spark code into a python module that is installable into a conda environment, then CI pipelines would install the module and declared dependencies and offer up an artifact that will be used at runtime
[17:42:38] it would then be invoked something like this: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/platform_eng/dags/image_suggestions_dag.py#L148-154
[17:43:14] with the way they've set up airflow, all the dags live in a repo shared between teams and the code to run lives elsewhere
[17:43:17] this lib will be meant to be always run from airflow?
[17:43:35] "bellows": it's all about the airflow and the spark
[17:44:05] in theory there should be nothing airflow specific about the module, it should be installable and runnable by anything. airflow just happens to invoke it
[17:46:04] you said you use "discolytics" locally perhaps something around python-discolytics ?
[17:46:43] well... I don't have great ideas... sorry! :)
[17:46:45] hmm, i had forgotten about that. search-platform/python-discolytics would probably be fine, with the python package named discolytics
[17:46:57] bellows isn't terrible either, although it implies a tighter coupling with airflow than should exist
[17:47:22] mostly i called it discolytics because wikimedia-discovery-analytics was too long :)
[17:58:46] pfischer: re HiveStringType, it seems to have been removed here: https://github.com/apache/spark/commit/5cfbdddefe0753c3aff03f326b31c0ba8882b3a9 It looks like HiveStringType was split into VarcharType and CharType
[17:59:51] (i only gave the patch a cursory look, but that is turned up by `git log -SHiveStringType` as the most recent commit referencing it from the v3.1.2 tag of apache/spark)
[18:26:29] dinner
[18:29:16] back, but taking lunch. Will be back in time for SRE pairing
[18:42:20] gehel: inflatador: got a bit of a headache, gonna go lie down. may or may not be around for pairing
[19:01:43] back
[19:27:50] hope ya feel better ryan-kemper!
[19:30:27] ryankemper: good luck !
[20:27:55] ebernhardson, I have an amusing but unhelpful suggestion: "disc🪩lytics"
[20:28:04] * Trey314159 is amused, at least
[20:33:30] :)
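
Note on the 17:40-17:44 packaging discussion: the idea of a Spark job module that contains nothing Airflow-specific, and that Airflow merely invokes, might look roughly like the sketch below. This is only an illustration under assumptions, not the team's actual code: the `--input`/`--output` flags, the `transform` placeholder, and the `discolytics-example` program name are all hypothetical, and the real Spark work is stubbed out so the example runs with the standard library alone.

```python
import argparse
from typing import Callable, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI; the flag names are illustrative only.
    parser = argparse.ArgumentParser(prog="discolytics-example")
    parser.add_argument("--input", required=True, help="source table or path")
    parser.add_argument("--output", required=True, help="destination table or path")
    return parser


def transform(input_path: str, output_path: str) -> dict:
    # Placeholder for the real job. In an actual module this is where a
    # SparkSession would be created and the data read, transformed and written.
    return {"input": input_path, "output": output_path}


def main(
    argv: Optional[Sequence[str]] = None,
    job: Callable[[str, str], dict] = transform,
) -> int:
    # Nothing here imports or references Airflow: any scheduler, shell
    # script or test can call main() with a list of arguments.
    args = build_parser().parse_args(argv)
    job(args.input, args.output)
    return 0
```

A scheduler (Airflow or anything else) would then only need to run the installed module with arguments, for example `main(["--input", "some_table", "--output", "other_table"])`, and the `job` parameter lets tests substitute a stub for the Spark work.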