[11:19:20] Hi. Do I need to set $wgUseCdn = true; if I use the TrustedXFF extension, as mentioned in https://www.mediawiki.org/wiki/Manual:Cloudflare#Configure_Cloudflare_IP_addresses_directly_on_MediaWiki
[11:20:37] and also $wgCdnServersNoPurge = []; with the list of Cloudflare IPs that I also put in trusted-proxies.txt (for the TrustedXFF extension)?
[11:54:44] I tried it like this: https://gist.tnonline.net/HU
[12:07:23] Ciao
[17:06:35] Hi all, got a question about CirrusSearch. I was able to install it along with Elasticsearch, but it doesn't seem to work well for Chinese, so I installed IK Analyzer (here: https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.10.2.zip). The analyzer itself seems to work fine, because if I run "curl -XGET 'http://localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d' {"analyzer": "ik_smart","text": "昨天去意大利旅游很开心"}'" on the command line, I get the expected results. However, it doesn't seem to work when I do an actual search on the wiki; it feels like it is not using the IK analyzer. Is there any config I need to add to LocalSettings.php to make it trigger the IK analyzer? Thank you for your help!
[17:17:45] or is switching to a custom analyzer something that should be configured somewhere in Elasticsearch itself and has nothing to do with LocalSettings.php?
[17:26:32] but wait.. if this is something that needs to be done in Elasticsearch itself and has not been done, this should probably not work in the first place? "curl -XGET 'http://localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d' {"analyzer": "ik_smart","text": "昨天去意大利旅游很开心"}'"..
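A minimal sketch, not from the conversation, of why the curl test can succeed while wiki search is unchanged: calling `/_analyze` with a named analyzer only exercises what the plugin provides on the node, whereas an index-level `/_analyze` call with a `field` uses whatever analyzer the wiki's index was actually built with. The index name `mywiki_content`, the field name `text`, and the `RUN_CHECKS` switch are assumptions for illustration; check the real index names with `curl localhost:9200/_cat/indices`.

```shell
#!/bin/sh
# Assumptions (not from the log): ES_URL and INDEX are placeholders to adjust.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-mywiki_content}"   # hypothetical index name

# Build the JSON body for a node-level _analyze call with a named analyzer.
# Note: no JSON escaping is done, so the text must not contain double quotes.
named_analyzer_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Build the JSON body for an index-level _analyze call that uses whatever
# analyzer the index mapping assigns to the given field.
field_analyzer_body() {
  printf '{"field": "%s", "text": "%s"}' "$1" "$2"
}

# Send a JSON body to an _analyze endpoint and pretty-print the tokens.
# $1 = URL path (for example /_analyze), $2 = JSON body
analyze() {
  curl -s -XGET "${ES_URL}$1?pretty" -H 'Content-Type: application/json' -d "$2"
}

# Only contact Elasticsearch when explicitly asked to.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  TEXT='昨天去意大利旅游很开心'
  # What the plugin's analyzer produces on its own:
  analyze "/_analyze" "$(named_analyzer_body ik_smart "$TEXT")"
  # What the wiki's index actually does with the same text:
  analyze "/${INDEX}/_analyze" "$(field_analyzer_body text "$TEXT")"
fi
```

Run it with `RUN_CHECKS=1 INDEX=<your index> sh check_analyzers.sh` (the file name is arbitrary). If the two token lists differ, the index is not using the IK tokenizer yet.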
[17:34:25] Guest68: adding support for an analyzer in CirrusSearch requires some effort and is not something you can simply change in the config file, you would have to hack https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/refs/heads/master/includes/Maintenance/AnalysisConfigBuilder.php for this
[17:47:30] oops.. that is quite an advanced change for me
[17:53:19] but I will be glad to try, any idea where in the code I should make the change & roughly what changes I need to make? dcausse
[18:03:28] Guest68: for a quick&dirty hack I would: 1/ add a new entry in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/refs/heads/master/includes/Maintenance/AnalysisConfigBuilder.php#2132 with 'analysis-ik' => [ 'zh' => 'chinese' ] (assuming the plugin name is analysis-ik, but check the plugin name with curl /_cat/plugins)
[18:04:19] 2/ change the case at https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/refs/heads/master/includes/Maintenance/AnalysisConfigBuilder.php#986
[18:04:55] 3/ remove references to stconvert & smartcn*
[18:06:47] 4/ change the withTokenizer to use the one from your plugin, which is either 'ik_smart' or 'ik_max_word' (no clue what the difference between those is)
[18:07:21] 5/ iterate by reindexing & testing
[18:08:26] (make sure also that your wiki is configured with $wgLanguageCode = 'zh')
[18:14:08] a cleaner change might be, instead of overriding 'chinese', making the plugin point to 'zh' => 'chinese-ik' and adding a new case after the 'chinese' one at line 986. If you get something working please feel free to send a contribution! :)
[18:31:12] dcausse let me try, if I get it to work I will definitely contribute back! :)
[18:34:05] just to confirm, this is the script to edit, right?
/extensions/CirrusSearch/includes/Maintenance/AnalysisConfigBuilder.php
[18:40:14] Guest68: yes, the line numbers I shared might not match exactly but the overall shape should not have changed much
[18:43:46] Sounds good.
[19:04:14] dcausse I made the change and put it on my wiki (test instance), can you take a quick look here? http://44.199.64.14/w/index.php?title=TestPage&diff=154859&oldid=154858
[19:04:44] and the plugin name is indeed analysis-ik
[19:05:26] I changed "withTokenizer( 'smartcn_tokenizer' )->" to "withTokenizer( 'ik_smart' )->"
[19:05:57] "withStop( [ ',' ], 'smartcn_stop' )->" to "withStop( [ ',' ], 'analysis_ik_stop' )->" (not sure if this is right?)
[19:06:35] and changed "withFilters( [ 'smartcn_stop', 'lowercase' ] )->" to "withFilters( [ 'analysis_ik_stop', 'lowercase' ] )->" (also not sure if this is right)
[19:06:50] Guest68: do you have the stconvert plugin installed? if not you might need to remove tsconvert and its references
[19:07:57] otherwise this looks good, you can easily test by running UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
[19:08:12] it should fail rapidly if you attempt to use something that's not installed
[19:09:03] ok, stconvert is to convert Simplified Chinese to Traditional, right? if so I will need that too, but I don't think I have installed it.
[19:09:24] because the content on the wiki is a mix of Simplified & Traditional Chinese.
[19:12:02] Guest68: this is the list of plugins we installed while running elastic 7.10.2: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/elasticsearch/plugins/+/b39cf71d8c9d8c0c0a9326eedeabbc5003f4ee60/debian/plugin_urls.lst
[19:12:06] stconvert is there
[19:13:21] does that list mean I have to install all of them when setting up CirrusSearch?
[19:13:26] have not seen it before
[19:15:21] Guest68: no, the ones I'd suggest are the language analyzers you need and analysis-icu; if you want regex support, the "extra" & experimental-highlighter-elasticsearch-plugin are nice
[19:16:08] nice.. I think I need the highlighter one too
[19:16:45] some might need some tweaking in your wiki settings to be fully activated
[19:17:39] got it, my primary goal is to get the IK analyzer working now, will look at it later.
[19:17:56] and I just ran UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now, here is the result: http://44.199.64.14/wiki/TestPage
[19:19:38] seems like your change was not detected, I could see one reason, your wiki language is not set to "zh" perhaps?
[19:20:16] ah, I have not uploaded the updated php to the server yet, lol! give me a few seconds.
[19:20:53] BTW this is already set: $wgLanguageCode = "zh";
[19:24:07] ok got this error "Unknown char_filter type [stconvert] for [tsconvert]"
[19:24:49] I guess that means I need to install the stconvert plugin?
[19:30:39] Guest68: yes exactly, you can either drop that from your code or install it
[19:30:55] ok let me install it now
[19:32:00] if you install it you can add it as a required plugin in the first map with 'analysis-ik,analysis-stconvert' => [ 'zh' => 'chinese-ik' ]
[19:34:35] heading out, Guest68: I'll check your progress tomorrow but feel free to post questions at https://www.mediawiki.org/wiki/Extension_talk:CirrusSearch (irc might not be great for async communication)
[19:35:04] will do, thank you! dcausse
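The "iterate by reindexing & testing" loop discussed above could be sketched roughly as follows. This is an illustration under assumptions, not something posted in the channel: `MW_PATH` is a hypothetical install path, `RUN_CHECKS` is a made-up switch, and the script assumes each line of `_cat/plugins` output has the form `<node> <plugin-name> <version>`. The plugin names and the `UpdateSearchIndexConfig.php` flags are the ones quoted in the log.

```shell
#!/bin/sh
# Assumptions (not from the log): ES_URL and MW_PATH are placeholders to adjust.
ES_URL="${ES_URL:-http://localhost:9200}"
MW_PATH="${MW_PATH:-/var/www/mediawiki}"   # hypothetical install path

# Succeeds if the plugin name ($1) appears in a `_cat/plugins` listing ($2).
# Assumes each listing line looks like: "<node> <plugin-name> <version>".
has_plugin() {
  printf '%s\n' "$2" | awk -v p="$1" '$2 == p { found = 1 } END { exit !found }'
}

# Print every name from the space-separated list ($1) missing from the listing ($2).
missing_plugins() {
  for plugin in $1; do
    has_plugin "$plugin" "$2" || printf '%s\n' "$plugin"
  done
}

# Only contact Elasticsearch and reindex when explicitly asked to.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  listing="$(curl -s "${ES_URL}/_cat/plugins")"
  missing="$(missing_plugins 'analysis-ik analysis-stconvert' "$listing")"
  if [ -n "$missing" ]; then
    echo "Missing Elasticsearch plugins: $missing" >&2
    exit 1
  fi
  # Rebuild the index with the new analysis config; per the log, this should
  # fail quickly if the config references something that is not installed.
  php "${MW_PATH}/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php" \
    --reindexAndRemoveOk --indexIdentifier now
fi
```

Checking the plugin list first turns the "Unknown char_filter type [stconvert]" failure into an explicit message before any reindexing starts.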