[02:37:03] Will it be useful for Wikidata lexemes? (re @dpriskorn: Wow! This would be nice to include also. But the sentences in Wikipedia should probably not be used as usage examples on lexemes...)
[06:26:31] >should probably not be used as usage examples on lexemes
[06:26:32] so no
[06:28:31] Are you replying to me? (re @mahir256: >should probably not be used as usage examples on lexemes [06:28:32] so no)
[06:46:12] yes
[06:52:47] Are you fed up with waiting for things to unfold? Seize control of your life and approach your dreams! 'DanWhale24' on Telegram has completely transformed my approach. This individual truly knows their stuff and provides excellent trading signals.
[06:55:05] Well, spam!
[07:36:30] Great, did you see/contact the similar content on Hugging Face?
[07:36:30] Could you do the same for Wikisource? (a bit more tricky because of the namespaces) (re @sthottingal: Hi, In case this is useful - I have been preparing a huge sentence dataset extracted from Wikipedia for 300+ languages. The data...)
[07:39:17] +1, not as examples, but maybe for other uses (comparing the list of words to the list of Lexemes to find the missing ones, inferring some grammatical traits from collocation and comparing them to Lexemes, etc.) (re @mahir256: >should probably not be used as usage examples on lexemes [07:39:18] so no)
[09:21:58] I very much hope so. E.g. it can be used to see which forms appear in the wild and improve the lexemes.
[09:21:59] Or it can be used to find good usage examples from both written and oral sources.
[09:22:00] See the readme and https://github.com/dpriskorn/riksdagen_sentences/issues/14 (re @cvictorovich: Will it be useful for Wikidata lexemes?)
[10:04:04] Not sure about examples, but big yes to the other points (re @dpriskorn: I very much hope so. [10:04:05] E.g. it can be used to see which forms appear in the wild and improve the lexemes. [10:04:06] Or it can be used to f...)
[10:06:23] Interesting, what are you unsure about? (re @Nicolas: Not sure about examples, but big yes to the other points)
[10:42:01] I think examples from published documents are more reliable (like from Wikisource, which Luthor does)
[10:42:02] In lexicography, it's the distinction between examples (made up) and attestations (referenced) (re @dpriskorn: Interesting, what are you unsure about?)
[10:49:26] I agree. That's exactly why I'm doing this 😀
[10:49:27] All the documents are published on riksdagen.se and can easily be referenced using P8433 (re @Nicolas: I think examples from published documents are more reliable (like from Wikisource, which Luthor does) [10:49:29] In lexicography, i...)
[10:50:48] If they had a well-working search API I could just wrap it, but they really don't, so I'm downloading all the documents and analyzing them myself instead 🤠
[10:51:45] Good, but Santhosh's extraction is from Wikipedia, which is closer to made-up examples than to attestations (re @dpriskorn: I agree. That's exactly why I'm doing this 😀 [10:51:45] All the documents are published on riksdagen.se and can easily be referenced using ...)
[10:52:01] This often seems to be the case with corpora: they are not searchable via an API, just published as silos.
[10:53:00] Yes, that's why I recommend excluding it from the usage example API endpoint 😀 (re @Nicolas: Good, but Santhosh's extraction is from Wikipedia, which is closer to made-up examples than to attestations)
[10:58:45] @dpriskorn Good. If you write any NLP scripts, please share them
[10:59:16] I'll be off grid for the holidays, but I'll look at it next year
[11:00:05] Nicolas, see https://github.com/dpriskorn/riksdagen_sentences/issues/18, where I noted the requirements we talked about here :)
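The corpus use suggested at 07:39:17 (comparing the list of words to the list of Lexemes to find the missing ones) can be sketched as a frequency-ranked set difference. This is a minimal Python sketch under stated assumptions, not code from riksdagen_sentences or from Santhosh's dataset: the regex tokenizer is a crude stand-in for a proper language-aware one, `known_forms` is assumed to be a set of lowercased form representations already exported from Wikidata lexemes (fetching them is not shown), and `word_counts` and `missing_forms` are hypothetical names.

```python
import re
from collections import Counter

# Crude stand-in tokenizer (assumption): runs of Unicode letters, no digits or underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_counts(sentences):
    """Count lowercased word forms across an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(m.group(0).lower() for m in WORD_RE.finditer(sentence))
    return counts


def missing_forms(sentences, known_forms, min_count=1):
    """Return (form, count) pairs for corpus forms absent from known_forms.

    known_forms is assumed to be a set of lowercased form representations
    taken from existing lexemes. Results are sorted by descending frequency,
    then alphabetically so the output is stable.
    """
    counts = word_counts(sentences)
    missing = [
        (form, n)
        for form, n in counts.items()
        if n >= min_count and form not in known_forms
    ]
    missing.sort(key=lambda item: (-item[1], item[0]))
    return missing


if __name__ == "__main__":
    # Illustrative input only, not taken from the riksdagen corpus.
    sample = ["Riksdagen fattade beslutet.", "Beslutet gäller riksdagen."]
    print(missing_forms(sample, {"riksdagen", "beslutet"}))
    # prints [('fattade', 1), ('gäller', 1)]
```

Ranking by frequency puts the most common uncovered forms first, which is where adding a Lexeme or Form would likely help most; raising `min_count` filters out one-off typos.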