[00:45:45] but links should still make it across
[16:41:32] are there any resources documenting the xml formatting used in wiktionary if I want to parse it from a data dump? specifically I'm looking at JP wiktionary, the xml file is 400MB so it's not too feasible to just open it in VSCode as it takes up my ram. If there's some program specifically for viewing large xml files that'd be useful too.
[16:41:45] Not sure how that message ended up striked out oops
[16:41:49] Hi marvnc.
[16:41:59] Are you familiar with command-line tools?
[16:42:00] hi
[16:42:03] somewhat
[16:42:20] You can use "head" to just view the top X lines of a file.
[16:42:27] ohh
[16:42:34] For example, "head -10 somefile.xml".
[16:42:34] vim also deals with huge files pretty well sometimes
[16:42:45] Will show the first ten lines of a file.
[16:42:57] assuming it has line breaks ^
[16:43:02] Hah.
[16:43:04] i see, i guess i can do that to split out the file
[16:43:31] Yeah, something like "head -100 somefile.xml > newfile.xml" would work roughly.
[16:43:40] You'll just need to cleanup the end of the new file a bit.
[16:43:47] There are existing XML parsing tools.
[16:44:12] https://pythonhosted.org/mwxml/ for example.
[16:44:25] I personally don't enjoy XML and I've written some terrible parsers myself to deal with it.
[16:51:02] thanks, I guess parsing it from scratch wouldn't be very feasible for now and I'll try some existing tools
[16:51:24] or maybe i'll just scrape it slowly... haha
[16:51:33] There's also an API.
[16:51:41] But Wiktionary in general has horrible formatting.
[16:51:51] And there's little interest from Wikimedia Foundation Inc. to fix it.
[16:52:00] They're very busy making a new terrible skin that nobody asked for.
[16:52:09] And wasting money on other horrible projects.
[16:52:31] :(
[16:52:51] Depending on what you need, Wordnik(?) might be better.
[16:52:57] https://www.wordnik.com/
[16:53:04] I think it has a usable API.
[16:53:15] There may also be tools to specifically parse Wiktionary these days.
[16:53:20] Instead of using the XML dumps directly.
[16:53:28] Really depends what you're trying to do.
[16:56:54] wiktionary seems to be the most complete source of information that i need unfortunately
[20:56:41] Hi, anyone maybe familiar with PageForms + Cargo? I'm searching for a way to have a query form with two inputs, the first one will allow to select "main category" (from cargo or template) and the second one will allow to search for "sub category" but it should be based on the user selection in the first button (and don't show everything), anyone
[20:56:42] knows a way to achieve that ?:) Thanks!
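
Note on the earlier question about the 400MB dump: a file that size does not have to be opened in an editor or loaded into memory at all. The sketch below streams it one page at a time using only Python's standard library. It assumes the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision>` with a `<text>`); `iter_pages`, `local_name` and the `SAMPLE` data are made up for illustration, and this is not how the mwxml library linked above works internally.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, one page at a time.

    `source` is a file path or a binary file object.
    Assumes one revision per page; with several, the last one wins.
    """
    title, text = None, None
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Free this page's subtree. The emptied <page> element itself
            # stays attached to the root, but it is small.
            elem.clear()


# Tiny hypothetical stand-in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Wiktionary</sitename></siteinfo>
  <page>
    <title>cat</title>
    <revision><id>1</id><text>==Noun==</text></revision>
  </page>
  <page>
    <title>dog</title>
    <revision><id>2</id><text /></revision>
  </page>
</mediawiki>""".encode("utf-8")

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, wikitext in pages:
    print(page_title, len(wikitext))
```

For a real dump, pass the file path instead of the `io.BytesIO` object and loop over `iter_pages(path)` directly rather than building a list, so only one page is held at a time. The wikitext inside each `<text>` element is still raw wiki markup and needs its own parsing.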