[02:27:39] Have you tried Tabernacle? Depending on your appetite for SPARQL, such a grouping is certainly possible. I'd be happy to help further if SPARQL is not your thing. But here is an example of all Properties missing Hebrew labels sorted according to their P31 (so Wikidata property to identify libraries shows up first). I picked other languages to show as "en/de/ar", but this is also [02:27:39] customisable. Finally Tabernacle doesn't require you to open and close the pages. It may not be as good as the others you mention, but I think you'd find it a big step up from what you tried before. https://tabernacle.toolforge.org/?#/tab/sparql/SELECT%20%3Fproperty%20%3FpropertyType%20%3FpropertyLabel_en%20%3FpropertyLabel_he%20%3Fgroup%20WHERE%20%7B%0A%20%20%20%20%3Fproperty%2 [02:27:41] 0wikibase%3ApropertyType%20%3FpropertyType%20.%0A%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_en%20.%20%0A%20%20%20%20FILTER(lang(%3FpropertyLabel_en)%3D'en')%0A%20%20%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_he%20.%20%0A%20%20%20%20%20%20FILTER(lang(%3FpropertyLabel_he)%3D'he')%0A%20%20%20%20%7D%0A%20%20%20%20FILTER(!BOUN [02:27:42] D(%3FpropertyLabel_he))%0A%20%20%20%20%3Fproperty%20wdt%3AP31%20%3Fgroup%20.%0A%7D%0AORDER%20BY%20ASC(%3Fgroup)%0ALIMIT%2010000%0A/Lhe%3BDhe%3BLen%2Cde%2Car%3BDen%2Cde%2Car%3BP31 (re @amire80: If there was grouping by topic, I could ask a friend who knows Hebrew and who knows this topic well to complete a translation, b...) [02:28:43] https://tabernacle.toolforge.org/?#/tab/sparql/SELECT%20%3Fproperty%20%3FpropertyType%20%3FpropertyLabel_en%20%3FpropertyLabel_he%20%3Fgroup%20WHERE%20%7B%0A%20%20%20%20%3Fproperty%20wikibase%3ApropertyType%20%3FpropertyType%20.%0A%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_en%20.%20%0A%20%20%20%20FILTER(lang(%3FpropertyLabel_en)%3D'en')%0A%20%20%20%20OPTIONAL%20%7B [02:28:44] %0A%20%20%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_he%20.%20%0A%20%20%20%20%20%20FILTER(lang(%3FpropertyLabel_he)%3D'he')%0A%20%20%20%20%7D%0A%20%20%20%20FILTER(!BOUND(%3FpropertyLabel_he))%0A%20%20%20%20%3Fproperty%20wdt%3AP31%20%3Fgroup%20.%0A%7D%0AORDER%20BY%20ASC(%3Fgroup)%0ALIMIT%2010000%0A/Lhe%3BDhe%3BLen%2Cde%2Car%3BDen%2Cde%2Car%3BP31 [02:29:08] (you'll have to copy the whole link, because Telegram broke it) [02:32:19] I tried it a couple of times, and couldn't do anything useful. I don't remember why exactly. But I can try again. (re @Toby: Have you tried Tabernacle? Depending on your appetite for SPARQL, such a grouping is certainly possible. I'd be happy to help f...) [02:37:40] And a more concrete manageable group. Here is what I would send to your astronomical friend. 
Just properties relating to Astronomy or Astronomical Objects: [02:37:42] https://tabernacle.toolforge.org/?#/tab/sparql/SELECT%20%3Fproperty%20%3FpropertyType%20%3FpropertyLabel_en%20%3FpropertyLabel_he%20%3Fgroup%20WHERE%20%7B%0A%20%20%20%20%3Fproperty%20wikibase%3ApropertyType%20%3FpropertyType%20.%0A%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_en%20.%20%0A%20%20%20%20FILTER(lang(%3FpropertyLabel_en)%3D'en')%0A%20%20%20%20OPTIONAL%20%7B [02:37:42] %0A%20%20%20%20%20%20%3Fproperty%20rdfs%3Alabel%20%3FpropertyLabel_he%20.%20%0A%20%20%20%20%20%20FILTER(lang(%3FpropertyLabel_he)%3D'he')%0A%20%20%20%20%7D%0A%20%20%20%20FILTER(!BOUND(%3FpropertyLabel_he))%0A%20%20%20%20VALUES%20%3Fgroup%20%7Bwd%3AQ41799791%20wd%3AQ21451142%7D%0A%20%20%20%20%3Fproperty%20wdt%3AP31%20%3Fgroup%20.%0A%7D%0AORDER%20BY%20ASC(%3Fgroup)%0ALIMIT%2010000% [02:37:44] 0A/Lhe%3BDhe%3BLen%2Cde%2Car%3BDen%2Cde%2Car%3BP31 [02:56:59] OK, it kind of works in theory, but, there's no right-to-left support, so it's a non-starter for Hebrew. [02:59:49] In the Wikidata world, lots of people think that Wikidata is supposed to be just the internal store of data, and everything else is supposed to be done by other tools. There are lots of problems with this attitude, and right to left support is one of them. The MediaWiki platform has had good built-in RTL support since 2010, so if you do something upon this platform, it mostly-aut [02:59:50] omatically gives you good support for RTL and lots of other internationalization features. [03:01:03] If you do something with external tools built using other frameworks (or no frameworks at all), you get something that may work well in English or maybe German (I'm not even sure about that), but most likely broken in other languages. [03:03:24] In a tool whose whole point is its being massively multilingual, an _efficient_ translation workflow is supposed to be a core feature and not an afterthought. [03:06:21] The whole point of Wikidata and Wikidata Lexeme is that they are massively multilingual, but after ten years, they still don't have an _efficient_ translation workflow. The same seems to be happening with Abstract Wikipedia :( [03:12:56] what's the issue with lexemes exactly? (also Abstract Wikipedia doesn't exist yet) [03:15:20] Indeed, that's why I wrote "seems to be happening". [03:17:28] And the issue with Lexemes is that the last time I checked, there was no efficient way to get a list of lexemes that don't have a translation into a language, and submit translations for them. [03:17:58] I'm only in charge of one multilingual tool myself, and am as monolingual as we come. But based on this discussion, I'll commit to not release version 1.0 of Entity Explosion until right-to-left script is supported. https://www.wikidata.org/w/index.php?title=Wikidata%3AEntity_Explosion&diff=2087418412&oldid=1985649876 (re @amire80: If you do something with external tools built us [03:17:59] ing other frameworks (or no frameworks at all), you get something that may work ...) [03:18:51] Every time I mention this, people send me links to tools, and I can't for the life of me, figure out how do they do the thing I actually asked for. (re @amire80: And the issue with Lexemes is that the last time I checked, there was no efficient way to get a list of lexemes that don't have ...) 
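Decoded (with comments added), the SPARQL behind the last Tabernacle link above is the query below; the earlier links carry the same query without the VALUES line, and the two VALUES items correspond to the "astronomy" and "astronomical objects" property classes mentioned above. The trailing /Lhe;Dhe;Len,de,ar;Den,de,ar;P31 part of the link appears to tell Tabernacle which columns to display (Hebrew labels and descriptions, then en/de/ar, then P31).

    SELECT ?property ?propertyType ?propertyLabel_en ?propertyLabel_he ?group WHERE {
        # every property and its datatype
        ?property wikibase:propertyType ?propertyType .
        # keep only properties that have an English label ...
        ?property rdfs:label ?propertyLabel_en .
        FILTER(lang(?propertyLabel_en)='en')
        # ... and no Hebrew label
        OPTIONAL {
            ?property rdfs:label ?propertyLabel_he .
            FILTER(lang(?propertyLabel_he)='he')
        }
        FILTER(!BOUND(?propertyLabel_he))
        # restrict to the two "astronomy" property classes
        VALUES ?group { wd:Q41799791 wd:Q21451142 }
        ?property wdt:P31 ?group .
    }
    ORDER BY ASC(?group)
    LIMIT 10000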
[03:22:05] orthohin.toolforge.org: [03:22:06] 1) click "log in" on the top right (might require clicking on a menu at the top on mobile) [03:22:08] 2) log in [03:22:09] 3) click "Hebrew" [03:22:11] 4) if you can define the word that shows up, type a definition in Hebrew for it, then click "Add" [03:22:12] 5) if you cannot define the word that shows up, click "Give me another lemma" [03:22:14] 6) repeat ad infinitum (re @amire80: And the issue with Lexemes is that the last time I checked, there was no efficient way to get a list of lexemes that don't have ...) [03:36:40] Can it give me a *list* of lexemes? [03:37:00] (I realize that list was more targeted at the substring "I can't for the life of me, figure out how do they do the thing", but) as my efforts are starting to demonstrate, not every lexeme in a language like English must have an (in)direct translation into every other language--what is more important is that the lexemes that do exist in a language have meanings, so that we can use [03:37:02] those meanings to represent other meanings from other languages using a slowly increasing number of strategies (re @mahir256: orthohin.toolforge.org: [03:37:03] 1) click "log in" on the top right (might require clicking on a menu at the top on mobile) [03:37:05] 2) log in [03:37:06] 3) ...) [03:37:38] A list of all the missing lexemes is not actually practical, but still relatively more useful than random lexemes. (re @amire80: Can it give me a list of lexemes?) [03:38:02] Even better would be a list of lexemes with a unifying theme. [03:38:28] A list of lexemes with missing meanings on a certain topic would be quite useful, too. [03:40:24] ok, and how do you define 'missing'? where do you expect a 'unifying theme' and a 'certain topic' to come from when most of Uziel's Hebrew import lacks senses or even topic-related statements? (re @amire80: A list of all the missing lexemes is not actually practical, but still relatively more useful than random lexemes.) [03:41:25] "Words without senses that begin with the letter כ" would be a good beginning. [03:41:29] if you want to go through elemwala.toolforge.org/static/indices/P11280.txt and make sure every entry in Ma'agarim is covered by at least one lexeme, that's one way to fill in 'missing' lexemes, but I doubt that's what you're intending [03:43:55] w.wiki/9Hdu (re @amire80: "Words without senses that begin with the letter כ" would be a good beginning.) [03:46:22] This goes back to what I wrote earlier: "opening one property page or item page, translating the label, closing that page, then opening the next one, and so on, is a very inefficient workflow". (re @mahir256: w.wiki/9Hdu) [03:54:28] I also have a vague recollection of a tool that gets a list of common words that appear in Wikipedia and don't have a lexeme at all. That would be very useful, but in Hebrew, it mostly consisted of words with particles, and there was no clear way to record in the database that they are a compound of a particle and a word. [03:54:57] Hebrew is, of course, not the only language that has this problem. [03:55:21] It's a somewhat different problem because it's about lexemes that don't yet exist at all, but it's kind of related because it's about a list of stuff that has a topic. [03:55:49] that's not a tool, that's a list: [[:d:Wikidata:Lexicographical_coverage/he/Missing]] (re @amire80: I also have a vague recollection of a tool that gets a list of common words that appear in Wikipedia and don't have a lexeme at ...)
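The w.wiki/9Hdu short link above is left as-is; purely as an illustration of such a worklist (not necessarily the exact content of that link), a query for Hebrew lexemes that still have no sense and whose lemma starts with כ could look roughly like this on the Wikidata Query Service, using its predefined prefixes:

    # Hebrew lexemes without any sense whose lemma begins with כ (a sketch)
    SELECT ?lexeme ?lemma WHERE {
      ?lexeme a ontolex:LexicalEntry ;
              dct:language wd:Q9288 ;          # Hebrew
              wikibase:lemma ?lemma .
      FILTER(STRSTARTS(STR(?lemma), "כ"))
      FILTER NOT EXISTS { ?lexeme ontolex:sense ?sense . }
    }
    ORDER BY ?lemma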
[04:05:18] Cool, but the actual problem is that it's unclear how one makes #14 in that list, "הברית", which very roughly translates to "the alliance" (as in "the Allies" of the Second World War, for example), disappear from it. [04:05:45] To make Abstract Wikipedia useful in a language, that language would need a lot of lexemes defined. [04:06:57] This list would be a good way to find what the useful lexemes are and make sure they all appear in the database, if this were possible. Last time I checked, it was not. [04:08:39] (And again, Hebrew is far from the only language with this issue. Arabic and Turkish have it, too, and probably many more languages.) [04:17:02] as someone who has an interest in adding Arabic lexemes to improve etymological paths between Turkish and Indo-Aryan languages, I see less usefulness in simply string-matching space-delimited strings in articles to lexeme forms than I see in filling in well-defined fixed worklists like [[User:Mahir256/routledge-buckwalterparkinson]] (or elemwala.toolforge.org/static/indices/P11038.txt, or www.wikidata.org/w/index.php?curid=118658451 for Perso-Arabic loans, or [[User:Mahir256/routledge-aamd]] for Turkish) [04:17:05] [04:17:06] and as someone who is actively producing sentences using Turkish lexemes, that language's lexemes are generally in much much better shape for producing arbitrary sentences than Hebrew may ever become within the coming year (re @amire80: (And again, Hebrew is far from the only language with this issue. Arabic and Turkish have it, too, and probably many more langua...) [04:17:51] (those links should have been [[:d:User:Mahir256/routledge-buckwalterparkinson]] and [[:d:User:Mahir256/routledge-aamd]]) [04:41:04] Several questions about that: [04:43:28] 1. A list of space-delimited words from Wikipedia is good at showing which words are actually used in a Wikipedia. If one wants Abstract Wikipedia to produce something comparable to manually-written Wikipedia, isn't it a good idea to make sure it includes the words that are actually used in it? [04:44:13] 2. "filling in well-defined fixed worklists" - how do you fill them? Open a tab with a lexeme page for each of them and fill the whole form? [04:44:44] 3. "generally in much much better shape" - how do you measure that it's in a better shape? [05:15:19] For this problem, perhaps Wikifunctions could be the start of the solution. We (you?) could write a function that takes a long Hebrew string such as a Wikipedia paragraph, and produces a list of the expected lexemes within it (and their frequency?). If we had this for each language, that may help improve the listing tool. (re @amire80: I also have a vague recollection of a tool that gets a list of common words that appear in Wikipedia and don't have a lexeme at ...) [05:26:50] That's tokenization. I don't think that Wikifunctions should be used for that. There is existing code that does it for Hebrew and for many other languages. In Hebrew, it's not perfect, but I'm pretty sure that existing tokenizers would identify the parts of the word I gave as an example earlier. [05:26:51] [05:26:53] Perhaps the code that generates the list of missing lexemes could try sending them to a tokenizer, and add them to the list only if the tokenizer really doesn't know what to do with them. [05:27:51] Tokenizers won't identify all the missing senses, but it would still be an improvement. [05:32:45] I imagine they get the list from a (simplistically space-punctuation-tokenized) tokenizer already.
They might just need to use a better tokenizer. Is the existing code readily available across all scripts? If not, why shouldn't WF be used for that? [05:35:14] Is WF good for analyzing lots of existing wiki pages? :) [05:35:34] I haven't tried, but I suspect that it's not quite what WF is for. [05:36:42] As for how that code works, perhaps Nikki knows. [05:40:23] Not yet! Partly because we don't have a tokenizer function :) (re @amire80: Is WF good for analyzing lots of existing wiki pages? :)) [05:41:45] More seriously, the biggest batches I've seen done are tests against all WD Lexemes of a certain type in a certain language. [05:42:16] But I understand we will later have a way of using the functions on external servers. [06:06:19] Want to throw in a few RTL test cases? Z13402 [08:57:44] Hello everyone, I am Nimish, a new contributor at Wikimedia. I've set up MediaWiki in WSL and also installed WikiLambda. I wanted to ask how I can open Wikifunctions on localhost. Is there an installation guide? I was not able to find one. Thanks for the help! [10:54:09] “*Implementation metadata [10:54:10] Implementation type [10:54:12] *The type (BuiltIn, Evaluated, or Composition) of the implementation used to run the function. See the Function model (https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model#Z14/Implementations) for more about implementation types.” [10:54:13] [10:54:15] (According to https://www.mediawiki.org/wiki/Help:Wikifunctions/Function_call_metadata) 🤷‍♂️ (re @vrandecic: That's weird. I thought the value would be key label for Z14K2, Z14K3 and Z14K4. I think that might be a bug.) [11:08:30] I'm getting this on editing https://www.wikifunctions.org/wiki/Z13390?action=edit&uselang=en while having admin permission, is it normal? : https://tools-static.wmflabs.org/bridgebot/82c57aa6/file_57864.jpg [11:10:55] Okay, so I think I was mistaken (user error). It is a bit of a usability issue or concern: if you change the text within a selector but fail to click the new object (clicking away from the selector), it still looks like you’ve changed it. In this case, intending to change from “string equality” to “list equality” (re @Al: Thought so… I tried changing an element within [11:10:55] one of the lists in Z13236 and the test still passed. I’ll set it up as a formal ...) [11:40:15] Hi, Ebrahim. You don’t appear to be in the “functioneers” user group. Perhaps your temporary right expired? If so, you can request “functioneer” permission at https://www.wikifunctions.org/wiki/Wikifunctions:Requests_for_user_groups. (re @ebraminio: I'm getting this on editing https://www.wikifunctions.org/wiki/Z13390?action=edit&uselang=en while having admin permission, is i...) [12:31:45] Does Wikifunctions have RTL frontend tests? (re @Toby: Want to throw in a few RTL test cases? Z13402) [13:18:21] I think it's capable, but we don't have many testers yet. Here's an example: Z13140. (re @amire80: Does Wikifunctions have RTL frontend tests?) [13:22:26] perhaps https://www.wikifunctions.org/view/he/Z10991 is an even better example [15:51:22] Having the ability to express words on a given Wikipedia with that language's lexemes should be a *side effect* of introducing enough lexemes to cover much of what that language can convey; this is why I find (especially lists of) external identifiers and published lists of most frequent words much more helpful.
The coverage pages, as much as one may treat them as lists to be filled in, are better treated as guides on which people may account for entries with lexeme forms (think placing a checkmark next to, rather than outright removing, a form which exists on a lexeme on that list) either manually or semi-automatically. [15:51:24] [15:51:25] (In addition to the unnecessary complexity that comes with manually trying to insert every inflection of a term in an agglutinating/polysynthetic language as its own form, space delimitation is not a meaningful criterion for Sinitic languages, Japanese, or Thai, which do not use spaces, and it is insufficient for Vietnamese, where all syllables are space-separated, making useful lexical units difficult to discern this way.) (re @amire80: 1. A list of space-delimited words from Wikipedia is good at showing which words are actually used in a Wikipedia. If one wants ...) [15:51:30] Yes, those pages are helpful to keep track of progress as users substitute in lexeme IDs when they create lexemes for the entries in the list. (If your objection is that the workflow you ask about is inefficient, the lexicographical coverage pages don't necessarily make this easier.) (re @amire80: 2. "filling in well-defined fixed worklists" - how do you fill them? Open a tab with a lexeme page for each of them and fill the...) [15:52:26] This to me is dependent on a combination of [15:52:27] (1) each lexeme having a sense; [15:52:28] (2) either (a) passing (1), as many lexemes as possible having a link to a resource describing them; [15:52:30] or (b) failing (1), each senseless lexeme having a link to a resource describing it; [15:52:31] (3) closed word classes in that language all being appropriately sensed for the uses they exhibit (usually satisfied when (1) is satisfied); [15:52:33] (4) grammatical rules being formulated into programmatic code that uses (at least some of) the lexemes in those closed word classes; [15:52:34] (5) people that are actively working on (1) or (2b); [15:52:36] (6) people that are actively working on (4); and [15:52:37] (7) people that are readily available to judge the application of (4) using existing lexemes in closed and open word classes. [15:52:39] Turkish currently satisfies all 7; [15:52:40] other languages that are closer to being ready satisfy (1) or (2b) and both (5) and (7); [15:52:42] Hebrew satisfies none of them. (re @amire80: 3. "generally in much much better shape" - how do you measure that it's in a better shape?) [16:35:50] Where do I see those measurements? Is there a chart, or a table, or a progress bar? (re @mahir256: This to me is dependent on a combination of [16:35:51] (1) each lexeme having a sense; [16:35:52] (2) either (a) passing (1), as many lexemes as possi...)
[17:53:32] Adaptations of the queries below to other languages will require changes based on the word classes pertinent to the language and the external identifiers available for it (what works for one language may not work for another!), but the list below should not be taken to diminish in any way the validity of the list I posted or the degree to which the readiness of a single language may be measured: [17:53:33] (1): [[:d:Wikidata:Lexicographical_data/Statistics/Count_of_lexemes_without_senses]] (https://w.wiki/9J9F is the query for Hebrew lexemes without senses) [17:53:34] (2a): (https://w.wiki/9J9u is the query for Bokmål—a language where each lexeme has a sense) [17:53:36] (2b): (https://w.wiki/9J9d is the query for German—a language where not each lexeme has a sense) [17:53:37] (3): (https://w.wiki/9JAz is the query for Hebrew lexemes not satisfying (3)) [17:53:39] (4): this can be quantitative but is often qualitative—whether through observing activity on Wikifunctions (either [[Special:RecentChanges]] or [[Wikifunctions:Catalogue]]) or on https://gitlab.com/mahir256/ninai and https://gitlab.com/mahir256/udiron [17:53:40] (5): more qualitative than quantitative—whether through active membership in appropriate discussion (sub)groups, regular activity in https://w.wiki/9JB5, or active contributions to pages like www.wikidata.org/w/index.php?curid=108388486 or discussion pages, or other observations [17:53:42] (6): more qualitative than quantitative—see (4) [17:53:43] (7): more qualitative than quantitative—but observable through who is generally consistently available in discussion fora and actually willing to perform the necessary judgments (re @amire80: Where do I see those measurements? Is there a chart, or a table, or a progress bar?) [18:08:37] How do I see the difference between 1, 2a, and 2b? These are three queries that produce a lot of results. Some produce more results, some produce fewer. 2a actually produces more results than 2b, and from your explanation I'd think that 2a is supposed to produce zero. [18:10:54] (2a) states "as many lexemes as possible", not "all lexemes": given that there are around 36,000 Bokmål lexemes, I'd say @jhsoby has been doing a damn good job (and perhaps in his efforts to add more from Bokmålsordboka the (2a) list will clear out) (re @amire80: How do I see the difference between 1, 2a, and 2b? These are three queries that produce a lot of results. Some produce more resu...) [18:13:20] Comparing strings is not frontend testing. Frontend testing is comparing screenshots. Comparing strings is about how things are stored in memory. RTL problems are usually about incorrect display: a string can be stored correctly, but displayed incorrectly. (re @Toby: I think it's capable, but we don't have many testers yet. Here's an example: Z13140.) [18:13:59] (I don't quite understand what this function is supposed to do, and the test case has incorrect Hebrew text. It's a bit weird, because the user who created it knows Hebrew well. But it's not really related to RTL support.) (re @Toby: perhaps https://www.wikifunctions.org/view/he/Z10991 is an even better example) [18:16:08] 2a in your latest example says "a language where each lexeme has a sense". (re @mahir256: (2a) states "as many lexemes as possible", not "all lexemes": given that there are around 36,000 Bokmål lexemes, I'd say @jhsoby...) [18:17:02] https://w.wiki/9JBk is a (not applicable) (2a) query for Hebrew, and https://w.wiki/9JBm is the (2b) query for Hebrew (re @amire80: How do I see the difference between 1, 2a, and 2b?
These are three queries that produce a lot of results. Some produce more resu...) [18:17:48] let me clarify, (2a) says "passing (1), as many lexemes as possible having a link to a resource describing them;"; Bokmål passes (1), so (2a) is applicable; Hebrew clearly does not pass (1), so (2b) is applicable to it (re @amire80: 2a in your latest example says "a language where each lexeme has a sense".) [18:22:53] I don't understand a thing. [18:29:55] let's combine (1) and (2) above as follows: [18:29:55] (1) one of the following is true: [18:29:57] (a) each lexeme has a sense, and as many lexemes as possible have a link to a resource describing them; or [18:29:58] (b) not all lexemes have a sense, but each senseless lexeme has a link to a resource describing it [18:30:00] now there are six criteria instead of seven (a language can only fall under one of the two subpoints above), and the response to your "where do I see those measurements" question is revised as follows: [18:30:01] (for both 1a and 1b) [[:d:Wikidata:Lexicographical_data/Statistics/Count_of_lexemes_without_senses]] (https://w.wiki/9J9F is the query for Hebrew lexemes without senses) [18:30:03] (for 1a specifically) since all Bokmål lexemes have senses, https://w.wiki/9J9u is the query for lexemes in Bokmål without links to resources describing them [18:30:04] (for 1b specifically) since not all Hebrew lexemes have senses, https://w.wiki/9JBm is the query for lexemes without senses in Hebrew without links to resources describing them [18:40:40] If it is helpful to put it this way, there are two important tasks for Hebrew. First, adding all the grammatical/function words manually, based on a reference grammar. Then, adding senses to lexemes which don't have them, which is about 80% of them according to that table above. [18:55:39] those which have an entry in P11416 would be a more useful list than a Wikipedia one in my opinion. Lexemes which have several meanings and grammatical nuances make sense to prioritize, and Wikipedia concentrates a lot of specialized vocabulary at high frequencies. It is useful for example to be able to distinguish the uses of "recall," "recollect," and "remember" even if words like "football" or "box-office" might be more frequent [19:47:08] I completely fail to understand everything you say, which is extremely frustrating, because I've been involved with Wikimedia since 2004, and I've loved dictionaries since 1985, and I have a linguistics degree, so it feels like I'm supposed to understand what's going on here, and I don't, even though I really want to. [19:48:31] No one has to do it, but if anyone wants to try to explain any of this to me yet again, please try to do it without arbitrary numbers and letters like "2b". [19:51:59] Also, "Senselesses" is not a word.
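The w.wiki links above are short links and are left unexpanded; as a hedged sketch of the kind of query behind a "lexemes without senses" count (not necessarily what those links or the statistics page actually run), the Hebrew numbers could be computed like this on the Wikidata Query Service:

    # Count Hebrew lexemes with at least one sense and without any sense (a sketch)
    SELECT (COUNT(?lexeme) AS ?total)
           (SUM(IF(?senses > 0, 1, 0)) AS ?withSenses)
           (SUM(IF(?senses = 0, 1, 0)) AS ?withoutSenses)
    WHERE {
      {
        # one row per lexeme with its number of senses (0 if none)
        SELECT ?lexeme (COUNT(?sense) AS ?senses) WHERE {
          ?lexeme a ontolex:LexicalEntry ;
                  dct:language wd:Q9288 .      # Hebrew
          OPTIONAL { ?lexeme ontolex:sense ?sense . }
        }
        GROUP BY ?lexeme
      }
    }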
[19:52:08] you asked how to measure something, [19:52:09] I gave you seven criteria which I numbered, [19:52:10] you asked where those measurements may be seen, [19:52:12] I responded with answers matching each of the seven criteria that I numbered, [19:52:13] you asked for clarifications on some of these queries, [19:52:15] which I gave you with reference to the seven criteria that I numbered, [19:52:16] but I'm truly sorry to have still confused the hell out of you [19:53:44] I mean, I think I actually understand what it means here, but if anyone wants to make a table that anyone has a chance to understand without extra explanations, it would be good to use standard language instead of getting people to wonder what this weird word is. [19:55:25] there, I changed it, ya happy (re @amire80: Also, "Senselesses" is not a word.) [19:56:25] Better. [21:11:44] What I'm really trying to say is this: Wikipedia has many thousands of contributors. There should be even more, but the current number is quite large. Those thousands of contributors somehow figured out how to do it, and it's great. [21:11:45] [21:11:46] Wikidata Lexeme doesn't have nearly as many contributors. In theory, it could have a lot, because lots of people know words, translations, and definitions, and could contribute them if the workflow was usable. [21:12:24] What actually happens is that @mahir256 and a few more people figured out how Wikidata Lexeme works, built their own tools and terminology around it, and work there happily. But after trying to understand for several years, I find that world impenetrable, which is weird, because given my interests, I'd expect this to be the most natural place for me to contribute to. And if I find it impenetrable, people who are less motivated than I am will find it even more impenetrable. [21:12:25] [21:12:27] And if Lexeme can't be a successful project that attracts the masses, neither can Abstract Wikipedia. [21:12:28] [21:12:30] It may sound like I'm blaming one particular person, but honestly, I'm not. I'm just describing how I feel about it. I don't know, maybe it's just me, but I suspect that it's not just me. [21:23:12] I feel it's a systemic, built-in problem in how the Lexemes product was built. It's a super-generic data store, and to do anything useful, you need to use other tools. If the Abstract Wikipedia product tries to build something upon the Lexemes product without fixing this problem, it will not succeed. [21:25:10] If a few people built those tools, figured out how Lexemes work, and did something useful (although I don't entirely understand how useful), then maybe it's a bit better than if no one used it at all. But it doesn't look like the current way the Lexemes product works can scale. [21:25:51] The difference between Wikipedia and Lexemes is that after a fairly short while, there won't be much more to add. All common words will be done and there will only be an (albeit endless) pile of corner cases to fix. [21:26:10] "short" :) [21:26:53] If the workflow were good, then maybe it would actually be short for the top 30 languages. [21:26:59] The first language-related project that I've found popular with people so far is Lingua Libre (a tool so people can record words easily; by Wikimédia France), then Wiktionary. Lexemes don't seem to be presented in an accessible way when you enter Wikidata (and to be honest, the projects' main pages in general are too text-heavy, with fixed structures and rarely pictures).
(re @amire80: What I'm really trying to say is this: Wikipedia has many thousands of contributors. There should be even more, but the current ...) [21:29:05] But why not try to fill it in 2000 languages, or 6000? That would take much more time, but that's ok. (re @amire80: If the workflow were good, then maybe it would actually be short for the top 30 languages.) [21:29:24] But without a workflow that the masses can easily understand, that won't happen. [21:29:32] I think it would take decades just for the languages spoken in France (and even French has an estimated 2+ million lexemes scattered in many dictionaries and specialized fields) (re @Jan_ainali: The difference between Wikipedia and Lexemes is that after a fairly short while, there won't be much more to add. All common wor...) [21:34:25] Text, by itself, is not a problem. I love text. (re @JN_Squire: The first language-related project that I've found popular with people so far is Lingua Libre (a tool so people can record words easi...) [21:36:05] If it was text that actually tells me what I can do, I wouldn't complain. [21:36:49] But when I look at https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/How_to_help , I don't understand how I can actually help. [21:37:10] Evidently, some people figured it out, but not me, and not for lack of trying. [22:00:09] When I asked at the top if you wanted to pitch in some RTL test cases, I meant more Z20 tests using RTL strings attached to functions. If instead you want to comment on the interface and take screenshots and file phabricator tasks, I'm sure there is value in that too, but I'm not the right person to answer. (re @amire80: Comparing strings is not frontend testing. Frontend testing is comparing screenshots. Comparing strings is about how things are ...) [22:04:56] The time (in person hours) is probably more or less the same for most languages. The hard part is that just speaking a language doesn't make you a linguist. I am clearly underqualified for many tasks even in my native language just because I don't know grammar well enough. (re @amire80: If the workflow were good, then maybe it would actually be short for the top 30 languages.) [22:07:11] You can probably translate a lot of nouns. [22:07:12] Or upload images that illustrate them. [22:08:20] Ah well, I'm exaggerating with images. I wouldn't do it very well myself. [22:08:38] I think we are more or less done with adding nouns for Swedish already. (Obviously not completely, but the most common ones.) [22:08:39] But maybe you would. [22:09:00] And personally, I don't see the point with images on lexemes. [22:09:15] Cool! But in most languages, it's far from complete. (re @Jan_ainali: I think we are more or less done with adding nouns for Swedish already. (Obviously not completely, but the most common ones.)) [22:10:26] I'll offer an alternative perspective here and say that working on lexemes has got me more interested in languages than I ever thought I would be, and indirectly that has been a vehicle for connecting to people who speak languages other than English. [22:10:27] I agree there are aspects that are impenetrable (especially the mobile interface) but the underlying idea is one I find very promising (re @amire80: What actually happens is that @mahir256 and a few more people figured out how Wikidata Lexeme works, built their own tools and t...) [22:10:30] Yeah, and that's the second problem. Most people don't know most languages. :) (re @amire80: Cool! But in most languages, it's far from complete.)
[22:11:09] (I barely know one.) [22:12:07] part of it is also that the existing resources for the languages I am interested in are so unsatisfactory that it doesn't take much for lexemes to be useful for things which haven't been done before [22:14:24] Language documentation. Swedish has lots of dictionaries already, as do the rest of the top 30 languages. [22:14:24] [22:14:25] But most languages have no dictionaries at all. So: [22:14:27] 1. Upload images for common nouns. [22:14:28] 2. Take a computer that is somehow connected to Lexemes to someone who knows a poorly documented language. [22:14:30] 3. Show them a picture from Lexemes, and ask them to type that thing's name. Then show them another thing. If they know another language, show them that thing's name in another language. [22:14:31] [22:14:33] Result: the first dictionary of that language. [22:14:34] [22:14:36] It can work without images, but it's often easier with images. (re @Jan_ainali: And personally, I don't see the point with images on lexemes.) [22:14:48] You know at least two, that's the whole point. (re @Jan_ainali: (I barely know one.)) [22:16:09] I also find it promising, but every time I try actually doing anything, I get stuck, and when I ask for help, I cannot understand the explanations. Something, somewhere is wrong. (re @bgo_eiu: I'll offer an alternative perspective here and say that working on lexemes has got me more interested in languages than I ever t...) [22:16:12] I meant, I'd rather add images on the Wikidata items, and then they'll be connected through the sense. It is double storage to do it on both. (re @amire80: Language documentation. Swedish has lots of dictionaries already, as do the rest of the top 30 languages. [22:16:13] [22:16:15] But most languages ha...) [22:17:42] That's a possibility, too. (re @Jan_ainali: I meant, I'd rather add images on the Wikidata items, and then they'll be connected through the sense. It is double storage to do i...) [22:17:51] pnbwikipedia has had at most 3 active editors per year (who mostly copy-paste from Urdu Wikipedia) despite representing the most spoken language of a major country. [22:17:52] the potential for lexemes to be useful is huge there [22:20:37] the fact that Pakistan and India have the same official language and that both urwiki and hiwiki are largely incomprehensible to many native speakers of that language is also a big problem [22:22:02] Of course, but how does one even start doing anything useful? Honestly, I don't know. I attended talks. I watched videos. I read documentation pages. I asked on Telegram. And I don't understand a damn thing. (re @bgo_eiu: pnbwikipedia has had at most 3 active editors per year (who mostly copy-paste from Urdu Wikipedia) despite representing the mos...) [22:22:13] Am I stupid? I mean, I wrote lots of Wikipedia articles, I wrote several MediaWiki extensions, and I've been maintaining the Hebrew translation of MediaWiki and all of its extensions, with about 40,000 messages, at 100% localization, since 2010. Evidently, I understand some things about how the Wikimedia world works. [22:22:13] [22:22:15] But I can't understand Lexemes. [22:23:01] The Lexemes videos I have made, have you missed them, or are they too incomprehensible? (re @amire80: Am I stupid? I mean, I wrote lots of Wikipedia articles, I wrote several MediaWiki extensions, and I've been maintaining the Heb...) [22:25:44] Can you send a link? [22:27:25] But generally, if it's about editing one lexeme, I probably know what to do. That's not a big deal.
This whole discussion started from pointing out a problem: one lexeme is a drop in an ocean for any language; how can one plan to work on many Lexemes in a language they know? Is there a good methodology for that? [22:28:26] If you do something with many, many drops, eventually an ocean comes together. But how do I build a strategy toward that? [22:28:27] [22:28:28] In other words, is there a progress bar anywhere? :) [22:28:57] There are 12 videos keyworded with lexemes here [[m:Wikipedia Weekly Network/Live Wikidata Editing]] (re @amire80: Can you send a link?) [22:29:57] Is there a particular problem you have tried to approach it with? When I started, the problem I was looking to address was simply that sources in the two countries Punjab is split between are not linked to each other anywhere else [22:30:00] My point is that it isn't a drop in an ocean. It's a drop in a bucket. And you actually don't need that many drops to fill a bucket. (re @amire80: But generally, if it's about editing one lexeme, I probably know what to do. That's not a big deal. This whole discussion starte...) [22:30:35] I did give a specific suggestion I find compelling with respect to Hebrew further up in the chat today (re @amire80: If you do something with many, many drops, eventually an ocean comes together. But how do I build a strategy toward that? [22:30:36] [22:30:37] In ot...) [22:31:34] I saw that, but where's the progress bar? (re @bgo_eiu: I did give a specific suggestion I find compelling with respect to Hebrew further up in the chat today) [22:31:53] in the link right above that comment (re @amire80: I saw that, but where's the progress bar?) [22:32:23] Can you please send it again? (re @bgo_eiu: in the link right above that comment) [22:32:46] I think I looked at those links and I didn't see a progress bar, but maybe I missed something. [22:34:12] https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Statistics/Count_of_lexemes_without_senses [22:34:13] [22:34:15] it's a table, not a bar [22:34:16] [22:34:18] a bar would not be useful here, because for languages like Punjabi it would always be at 100%. there are a handful of specific languages, of which Hebrew is one, where thousands of blank lexemes have been imported [22:42:07] I couldn't understand that table. Is there a percentage? [22:42:22] And why would it always be 100% for Punjabi? [22:51:35] You can do percentages, but the goal is not 100% (it's more, since a word can have multiple meanings); that's why the last column is a ratio (that should ideally be above 1) (re @amire80: I couldn't understand that table. Is there a percentage?) [22:52:24] And if I run the "Senseless lexemes" query, what do I do? Just open the query results, click a lexeme, and type a sense, and publish? And then do another one? [22:52:25] My point was, the attractiveness of a project is made up of a lot of things, and the Lexemes project suffers from at least a lack of immediate readability compared to most current UI practices. (re @amire80: Text, by itself, is not a problem. I love text.) [22:52:49] yup (re @amire80: And if I run the "Senseless lexemes" query, what do I do? Just open the query results, click a lexeme, and type a sense, and pub...) [22:53:27] Sorry, but it's an awful workflow. Am I really the only one complaining about it? (re @jhsoby: yup) [22:53:57] What workflow would you like? (re @amire80: Sorry, but it's an awful workflow. Am I really the only one complaining about it?) [22:54:08] What is your workflow for editing Wikipedia articles?
(re @amire80: Sorry, but it's an awful workflow. Am I really the only one complaining about it?) [22:54:27] I respect, and even admire, the people who contribute their knowledge using it, but that's not a way to find more people to do efficient things. [22:55:05] A Wikipedia article is long. A lexeme is very short. Its translation is also very short. The sense is a bit longer. You could fit many of them on one page. (re @Jan_ainali: What is your workflow for editing Wikipedia articles?) [22:55:44] I usually spend a good 5-10 minutes on each lexeme before I consider it done. [22:56:18] It's still much less than a Wikipedia article. (re @Jan_ainali: I usually spend a good 5-10 minutes on each lexeme before I consider it done.) [22:56:37] I usually also need at least two or three other tabs for one lexeme, for references and examples. Cramming several into one page would be messy [22:57:02] Ideally, lexemes should be longer; currently most lexemes are way too short (re @amire80: A Wikipedia article is long. A lexeme is very short. Its translation is also very short. The sense is a bit longer. You could fi...) [22:58:08] My point is that it isn't translating like translatewiki. Perhaps your mental model of a finished lexeme is too similar to a translation of a single word. (re @amire80: It's still much less than a Wikipedia article.) [22:58:31] I'm still teaching my parents how to read their native language. The idea that they can even read and write in their language is new for them, so my expectations for efficiency here are in a completely different place (re @amire80: And why would it always be 100% for Punjabi?) [22:59:45] Is that a response to why the progress bar would always be 100% for Punjabi? (re @bgo_eiu: I'm still teaching my parents how to read their native language. The idea that they can even read and write in their language is...) [23:00:17] yes, it is not possible to import data from elsewhere because suitable data for that does not exist [23:00:33] +1, for me, ideally there should be 0 translations on a lexeme (re @Jan_ainali: My point is that it isn't translating like translatewiki. Perhaps your mental model of a finished lexeme is too similar to a tr...) [23:01:05] translation comes afterwards in tools reusing lexemes [23:01:31] (obviously that's the ideal; reality is way more complex) [23:02:09] So that's the thing I totally fail to understand. What good is a dictionary if you're not able to translate? (re @Nicolas: translation comes afterwards in tools reusing lexemes) [23:02:42] most speakers of under-resourced languages are already bilingual [23:02:51] It isn't a dictionary. It is the data to build a dictionary from. (re @amire80: So that's the thing I totally fail to understand. What good is a dictionary if you're not able to translate?) [23:03:24] the tools can translate based on lexeme senses, but in most cases you don't need to store the translation, you can infer it (re @amire80: So that's the thing I totally fail to understand. What good is a dictionary if you're not able to translate?) [23:03:39] And how does that work? (re @Jan_ainali: It isn't a dictionary. It is the data to build a dictionary from.) [23:03:57] Like, if I want to say that кот is the Russian word for cat, where do I do it? [23:04:16] And can I get a list of all the animals in English, and submit their translations into Russian? [23:04:19] 20–30 seconds here 😜 (re @Jan_ainali: I usually spend a good 5-10 minutes on each lexeme before I consider it done.)
[23:04:46] In the sense, you link to the item for cat (re @amire80: Like, if I want to say that кот is the Russian word for cat, where do I do it?) [23:05:24] And if there's no item? There are words for which there's no item. [23:06:51] That's the complex part of reality. (Unless it is just a missing item, then we create it.) [23:06:53] Here you indeed might need to store the translation (hence my caveat with "ideally"), but counter-intuitively it's quite exceptional (re @amire80: And if there's no item? There are words for which there's no item.) [23:07:18] Here's a complete set of progress bars for Property labels. https://w.wiki/5$Gh Apart from English, the other languages doing well generally have just one or two editors committed to putting a label on every new property. A single person can make a big difference to their language here. (re @amire80: If you do something with many, many drops, eventually an ocean comes together. But how do I build a strategy toward that? [23:07:21] [23:07:22] In ot...) [23:07:27] an item isn't a requirement, the gloss on a sense can stand on its own [23:07:54] creating an item and essentially putting the same information on it that is in the gloss is not that useful on its own [23:08:06] Yes, in Wikidata. But it might be hard for software tools to reuse if that is missing. (re @bgo_eiu: an item isn't a requirement, the gloss on a sense can stand on its own) [23:08:51] True. Ideally, the item should be usable as well with references and usage be regular w (re @bgo_eiu: creating an item and essentially putting the same information on it that is in the gloss is not that useful on its own) [23:12:15] yes, I made a key of equivalent grammatical words between close cognate languages of Punjabi [23:12:16] I don't think there is a way around that task [23:29:37] Going back to this, it can be useful to spend a long time on an item, but it can also be useful to do one thing for lots of items. (re @Jan_ainali: I usually spend a good 5-10 minutes on each lexeme before I consider it done.) [23:32:10] Yes. And the same applies to Wikipedia articles, I guess. (re @amire80: Going back to this, it can be useful to spend a long time on an item, but it can also be useful to do one thing for lots of item...)
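To make the "you can infer it" point above concrete: a hedged sketch (with illustrative lexemes and language items, not anyone's production tool) of how an English-to-Russian pairing falls out of both senses pointing at the same concept item via "item for this sense" (P5137), with no stored translation, runnable on the Wikidata Query Service with its predefined prefixes:

    # Russian lexemes whose sense points to the same concept item (P5137)
    # as a sense of the English lexeme "cat"; no stored translation needed
    SELECT ?ruLexeme ?ruLemma ?concept WHERE {
      ?enLexeme a ontolex:LexicalEntry ;
                dct:language wd:Q1860 ;        # English
                wikibase:lemma ?enLemma ;
                ontolex:sense/wdt:P5137 ?concept .
      FILTER(STR(?enLemma) = "cat")
      ?ruLexeme a ontolex:LexicalEntry ;
                dct:language wd:Q7737 ;        # Russian
                wikibase:lemma ?ruLemma ;
                ontolex:sense/wdt:P5137 ?concept .
    }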