[06:42:56] What do you think about my idea with Levenshtein distance instead of language style?
[06:42:57] It would mean that we trust the Wikipedia/Wikidata community to set the best label for concepts, and we stand on their shoulders.
[06:42:58] We could make slang NLG functions using your function instead 🤪
[06:43:00] -> best slang lexeme for Wikidata item (re @dvd_ccc27919: I already tried to implement Z27327, based mainly on looking at properties like P6191, but it certainly can be improved)
[07:32:21] I think the best approach is a combined one (re @Npriskorn: What do you think about my idea with levenshtein distance instead of language style? It would mean that we trust the Wikipedia/W...)
[10:29:13] let's agree on an algorithm then
[10:29:13] ) fetch lexeme references via a property
[10:29:15] if any:
[10:29:16] ) filter according to language style, removing slang, etc.
[10:29:18] if any left:
[10:29:19] ) choose the lexeme reference with the shortest distance between the singular form and the label; throw an error if there is no label or no singular form
[10:29:21] -> return lexeme reference (re @dvd_ccc27919: I think the best approach is a combined one)
[10:31:00] besides slang, what do we want to exclude?
[10:31:01] vulgarity?
[10:31:03] offensive?
[10:36:36] the following helpers are needed:
[10:36:37] * typed pair Levenshtein function: lexeme reference, Levenshtein distance natural number (WIP)
[10:36:39] * filter list of lexeme references based on language style
[10:36:40] * enum for excluded language style item references
[10:40:42] It's already implemented in Z27336 (re @Npriskorn: besides slang, what do we want to exclude? vulgar and offensive?)
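As a side note, the agreed algorithm above (fetch, filter by language style, pick the shortest Levenshtein distance) could be sketched in plain JavaScript roughly as follows. The lexeme-reference shape `{ lexeme, singular, styles }` and the exclusion set are illustrative assumptions, not the actual Wikifunctions types:

```javascript
// Rough sketch of the agreed algorithm, outside Wikifunctions.
// The reference shape { lexeme, singular, styles } is hypothetical.

// Classic dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Illustrative exclusion set (style QIDs to filter out).
const EXCLUDED_STYLES = new Set(["Q184439", "Q1521634"]);

function bestLexemeReference(refs, label) {
  if (!label) throw new Error("no label");
  // Filter out references carrying an excluded language style.
  const kept = refs.filter(r => r.styles.every(s => !EXCLUDED_STYLES.has(s)));
  if (kept.length === 0) throw new Error("no lexeme reference left after filtering");
  for (const r of kept) {
    if (!r.singular) throw new Error("no singular form");
  }
  // Choose the reference whose singular form is closest to the label.
  return kept.reduce((best, r) =>
    levenshtein(r.singular, label) < levenshtein(best.singular, label) ? r : best);
}
```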
[10:42:23] ```
const styleRank = [
  "Q184439",    // profanity
  "Q1521634",   // vulgarism
  "Q545779",    // pejorative
  "Q83464",     // euphemism
  "Q58233068",  // humorous
  "Q58157328",  // rare
  "Q901711",    // colloquial
  "Q181970",    // archaism
  "Q57495609",  // outdated
  "Q110983878", // idiomatic
  "Q130989",    // neologism
];
```
[10:42:42] This is my rank (from worst to best). What do you think?
[10:42:43] oh, I see you are considering the rank of the statement also.
[10:42:45] I would like to implement it all using compose. (re @dvd_ccc27919: It's already implemented in Z27336)
[10:43:21] The problem with composition is that it's very slow. But you are free to add another implementation (re @Npriskorn: oh, I see you are considering rank of statement also. I would like to implement it all using compose.)
[10:44:20] Yeah, slow is not ideal. But I'll test the limits of the system then 🤪 (re @dvd_ccc27919: The problem with composition is that it's very slow. But you are free to add another implementation)
[10:44:46] Compositions are so nice, I cannot resist using them 😍
[11:16:27] I find it useful to have one of each. (re @Npriskorn: Yeah, slow is not ideal. But I'll test the limits of the system then 🤪)
[11:35:49] @dvd_ccc27919 see https://www.wikifunctions.org/wiki/Talk:Z27327?uselang=en#Change_output_to_lexeme_reference?
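The worst-to-best ranking above could be turned into a plain-JS sort. The real Z27336 also weighs statement rank, which this toy version ignores, and the `{ lexeme, styles }` reference shape is a hypothetical stand-in:

```javascript
// Toy sketch: sort lexeme references by the styleRank list (worst to best).
// Statement rank, which Z27336 also considers, is omitted here.
const styleRank = [
  "Q184439",    // profanity
  "Q1521634",   // vulgarism
  "Q545779",    // pejorative
  "Q83464",     // euphemism
  "Q58233068",  // humorous
  "Q58157328",  // rare
  "Q901711",    // colloquial
  "Q181970",    // archaism
  "Q57495609",  // outdated
  "Q110983878", // idiomatic
  "Q130989",    // neologism
];

// Score a reference by its worst (lowest-index) listed style;
// a reference with no listed style scores best of all.
function styleScore(ref) {
  const indices = ref.styles
    .map(q => styleRank.indexOf(q))
    .filter(i => i !== -1);
  return indices.length === 0 ? styleRank.length : Math.min(...indices);
}

// Sort best-first (highest score first), without mutating the input.
function rankByStyle(refs) {
  return [...refs].sort((a, b) => styleScore(b) - styleScore(a));
}
```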
[12:57:16] @vrandecic I created this definite function which works for basically all Swedish nouns
[20:17:37] I failed getting this to pass Z30537
[20:17:59] it's the only test of a built-in impl
[20:35:52] so maybe the helper should rank instead of filter:
[20:35:52] the following helpers are needed:
[20:35:54] * typed pair Levenshtein function: lexeme reference, Levenshtein distance natural number (WIP) -> calculate Levenshtein distance between label and list of lexeme references
[20:35:55] * rank list of lexeme references based on language style -> typed list of lexeme references sorted by style rank, best first
[20:35:57] * enum for excluded language style item references (re @dvd_ccc27919: const styleRank=[ "Q184439", //profanity "Q1521634", //vulgarism "Q545779", //pejorative "Q83464", //euphemism "Q5...)
[21:08:44] Am I allowed to create an enum?
[21:08:45] I guess not, because using the Zeditor I cannot create an object with the enum type.
[21:08:46] Interestingly, I may have found a bug, because the API returns 400 but should probably return 403 instead.
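The "typed pair" helper described above (lexeme reference paired with its Levenshtein distance to the label) might look like this in plain JS, with an illustrative `{ lexeme, singular }` shape rather than the Wikifunctions typed pair:

```javascript
// Sketch of the "typed pair" helper: pair each lexeme reference with its
// Levenshtein distance to a label, sorted closest-first.
// The { lexeme, singular } shape is an illustrative assumption.

// Memory-lean two-row Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                  // deletion
        cur[j - 1] + 1,                               // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
    prev = cur;
  }
  return prev[b.length];
}

// Returns a list of { ref, distance } pairs, smallest distance first.
function pairWithDistance(label, refs) {
  return refs
    .map(ref => ({ ref, distance: levenshtein(ref.singular, label) }))
    .sort((a, b) => a.distance - b.distance);
}
```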
[21:08:48] ```
https://www.wikifunctions.org/w/api.php
Status: 400
Version: HTTP/2
Transferred: 1.61 kB (956 B size)
Referrer Policy: origin-when-cross-origin
Request Priority: Highest
DNS Resolution: DNS over HTTPS

age: 1
cache-control: private, must-revalidate, max-age=0
content-disposition: inline; filename=api-result.json
content-encoding: gzip
content-length: 466
content-security-policy: default-src 'self'; script-src 'none'; object-src 'none'
content-type: application/json; charset=utf-8
date: Thu, 11 Dec 2025 20:56:09 GMT
mediawiki-api-error: wikilambda-zerror
```
[21:14:02] Creating enums is currently locked down to staff. You can make proposals at [[Wikifunctions:Type proposals#Lightweight Wikidata enumerations]]. (re @Npriskorn: Am I allowed to create an enum? I guess not, because using the Zeditor I cannot create an object with the enum type. Interesting...)
[21:52:52] I added a new section on https://www.wikifunctions.org/wiki/Wikifunctions:Catalogue/Natural_language_operations
[21:52:59] https://tools-static.wmflabs.org/bridgebot/dbb929fb/file_75986.jpg
[22:04:32] @Sannita do you know if there is an API for any of these:
[22:04:33] * for a function, whether any connected test fails?
[22:04:34] * nudge a function to rerun tests?
[22:04:36] * check the fail/pass status of any test?
[22:48:47] Thanks for mentioning. Weird; I have no idea why they are disconnected. I've just connected the passing tests, and I'm going to ask within the team why they were all disconnected. Also curious to know what motivates you to have them connected (given that this is a built-in function with only one connected implementation and, as far as I know, no work happening recently)?
(re @Npriskorn: David the tests on Z811 are all disconnected and I'm not allowed to connect them 😅)
[22:49:20] Maybe @jdforrester knows the answer to this (re @Npriskorn: @Sannita do you know if there is an API for any of these: * for a function if any connected test fail? * nudge a function to reru...)
[22:50:16] Can we rephrase to not have double negatives? ("Don't avoid lexeme references?") (re @Npriskorn: )
[23:01:39] @jdforrester may not be able to reply for some time, due to holiday travel. But no, it's not possible to do any of those things via an API. (re @Sannita: Maybe @jdforrester knows the answer to this)
[23:12:47] Thanks David :) (re @David: @jdforrester may not be able to reply for some time, due to holiday travel. But no, it's not possible to do any of those things...)
[23:17:32] No, I'm afraid not, not at this time. Getting blocked from using Wikidata updates is a concern the team has discussed quite a bit. Purging or "busting" the cache has been mentioned recently as possible work for the coming quarter, but I'm not sure if it will be prioritized at this time. I will try to push for it. Thanks for mentioning. Wikidata entities in the orchestrator cache time out after 24 hours.
[23:17:34] I'm not aware of a task for purge/bust, but there is a ticket for adding change notifications, which has been investigated informally and is nontrivial: T401086. Purge/bust, I believe, would be a much smaller task. (re @Npriskorn: David Al do we currently have a way to purge the cache of a specific Wikidata lexeme/item in the backend? If not is there a task...)