[02:24:29] Hi, where can I find the hardware requirements to host a Wikimedia instance? I will only be running one large query, and it can take however long it needs to, as long as it doesn't crash. After that I plan on decommissioning the instance. I tried starting up the instance on a 32 or 64 core Ryzen (Epyc?) processor, but I get a timeout exception after
[02:24:29] loading the data dump into Blazegraph; CPU on all cores has been at about 100% for about two weeks after the timeout exception. Should I wait longer for it to finish? I have multiple TBs of hard drive space on an SSD.
[02:26:44] Would an AMD Ryzen 9 7950X3D with 128GB of RAM and a 2TB NVMe SSD be sufficient to load the data and run a single query against the instance, or do you recommend higher hardware specs?
[03:37:14] Specs are entirely dependent on the content, traffic levels, and what you're doing with it.
[03:38:13] MediaWiki itself does not use much in the way of resources, even on massive sites. Blazegraph is an entirely different story.
[04:09:33] Are there recommended hardware requirements, including sample scenarios, that I could take a look at? I would be running the query privately, no external traffic, and then deleting the entire thing. If it takes a couple weeks that's ok, as long as it doesn't time out/crash when loading data.
[04:18:43] No, such recommendations would be completely useless, because the answer is "it depends on your specific wiki and workload".
[04:20:08] Like, even if you were to import a full dump of Wikipedia, while you'd need disk space to handle all of that, you wouldn't need nearly as many webservers or a cluster of database servers, because you wouldn't be getting nearly as much traffic.
[04:21:31] Meanwhile, if you have a heavily-trafficked wiki but editing is restricted to a small allowlist of users and you only have a couple thousand pages and very few uploaded images, you wouldn't need a lot of disk space, but you'd need networking and CPU/memory resources to run a lot of simultaneous PHP instances (depending on what caching layers you have in place).
[04:21:38] The wiki I want to use is https://dumps.wikimedia.org/wikidatawiki/entities/, then I want to import it into Blazegraph (without it timing out), then run this single query:
[04:21:39] SELECT DISTINCT ?item ?itemLabel ?website WHERE {
[04:21:39]   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]", "[EN]". }
[04:21:40]   ?item wdt:P31/wdt:P279* wd:Q11033; # P31 is "instance of" and P279 is "subclass of"
[04:21:40]         wdt:P856 ?website. # P856 is the property for "official website"
[04:21:41]   FILTER NOT EXISTS {?item wdt:P31/wdt:P279* wd:Q2085381} # Filter out instances of "Wikimedia disambiguation page"
[04:21:41]   LIMIT 100
[04:21:42] }
[04:21:42] And then discard the entire thing.
[04:23:03] I tried to load it into Blazegraph; Blazegraph throws a timeout error after about three days, and my CPU is at 80% on all cores (Blazegraph is producing what seems to be millions of log entries a second; I can't read them as they are going by so fast).
[04:23:17] Well, right now you said it's pegging your CPU at 100% but seemingly not consuming all of your memory, so either a) the import is single-threaded, so the only way to speed up the import is via a faster CPU, or b) you need more cores or to specify that the import uses more cores (I really don't know which is the case, haven't looked into the import process).
[04:24:37] Ah gotcha, I'll try a faster CPU and see if that helps. Memory is ok (I have about 128GB), and the disk has lots of space left (200GB).
[04:24:39] Either way, it seems that CPU is your current bottleneck, based on the limited benchmarking info you've provided.
[04:53:46] Started it up again; looks like it's only using a couple of cores at 100% at a time, and I've piped the output from Blazegraph to /dev/null. I'll take a peek and see if there's a way to get it to use more cores.
[17:40:18] Is there an up-to-date wikitext spec? Or do I need to infer from Help:Editing, existing implementations (i.e. Parsoid), and/or the parser tests?
[17:41:01] Is there some wikitext spec (as in, a non-up-to-date one) that you're looking at?
[17:46:53] andre: I see pages discussing spec attempts and roadmaps, so that's why I specified up-to-date.
[17:47:15] afaik there has never been one. Updated or not updated.
[17:49:24] The spec is functionally "whatever MediaWiki core and Parsoid do". https://www.mediawiki.org/wiki/Specs/wikitext/1.0.0
[17:51:48] Then the information in https://www.mediawiki.org/wiki/Markup_spec is unauthoritative at best?
[17:54:24] Yeah. That page looks to have been marked as "historical" (which functionally means "outdated garbage") in mid-2016. Looking at the content history, I think it was last edited by someone authoritative about parsing in 2006.
[17:55:54] Alright, let's see what I can do then :/
[17:56:01] https://www.mediawiki.org/wiki/Parsoid/Parser_Unification is the most active suite of changes to parsing currently.
[17:57:17] Once the parser unification is done, the spec for wikitext will be "whatever the unified parser does" instead of whatever the 2 separate parsers do. It will not, however, become a regular language with a formal grammar.
[17:58:35] thuna`: perhaps there is an XY problem in your question that could be resolved. For what purpose are you seeking a specification for wikitext parsing?
[17:58:55] thuna`: depending on what you want to do, it might be easier to extract the information you want from the parsed HTML, which has an actual spec: https://www.mediawiki.org/wiki/Specs/HTML/2.8.0
[17:59:38] I was looking to write an Emacs major mode for wikitext.
[18:00:51] https://www.emacswiki.org/emacs/wikitext-mode.el
[18:03:44] That is quite outdated and lacks a significant amount of the functionality that I am looking for.
[18:04:01] In that case, maybe the code for the built-in wikitext syntax highlighting would be helpful for you?
[18:04:01] Probably more helpful than the real parser.
[18:04:52] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CodeMirror/+/master/src/codemirror.mode.mediawiki.js
[18:04:54] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CodeMirror/+/master/src/codemirror.mode.mediawiki.config.js
[18:05:47] I mean, that's still a good chunk of code, but probably like 5% the size of the real thing, and probably more likely to be useful for you.
[18:06:03] MatmaRex: Yes, that is probably the way to go, thanks for the links
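For reference on the first part of the log (loading the Wikidata dump into Blazegraph and running the single query): once the import finishes, the query can be sent to the local endpoint over plain HTTP. The sketch below is a minimal example under several assumptions that are not in the log itself: the endpoint URL is the default used by the wikidata-query-rdf service layout and may differ on other Blazegraph setups; the `SERVICE wikibase:label` block only works when the WDQS/Blazegraph extensions are installed; `[AUTO_LANGUAGE]` and `[EN]` are placeholders substituted by the query.wikidata.org UI, so a concrete language code is used instead; and the `LIMIT` clause is moved outside the `WHERE` braces, where SPARQL requires it.

```python
import requests

# Assumption: default local endpoint of a wikidata-query-rdf / Blazegraph setup.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

# The query from the log, with explicit prefixes, LIMIT after the closing brace,
# and a concrete language ("en") instead of the UI-only [AUTO_LANGUAGE]/[EN] placeholders.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT DISTINCT ?item ?itemLabel ?website WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?item wdt:P31/wdt:P279* wd:Q11033;   # P31 = "instance of", P279 = "subclass of"
        wdt:P856 ?website.             # P856 = "official website"
  FILTER NOT EXISTS { ?item wdt:P31/wdt:P279* wd:Q2085381 }  # filter per the original query's comment
}
LIMIT 100
"""

resp = requests.post(
    ENDPOINT,
    data={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=None,  # no client-side timeout; Blazegraph's own query timeout is configured server-side
)
resp.raise_for_status()

# Standard SPARQL JSON results: one binding dict per row.
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row.get("itemLabel", {}).get("value", ""), row["website"]["value"])
```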
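On the second topic, the suggestion at 17:58:55 was to work from the parsed HTML (which follows the MediaWiki HTML spec) rather than from wikitext. A minimal sketch of that approach, assuming the MediaWiki core REST API (rest.php) is enabled on the target wiki, as it is on Wikimedia sites; the wiki and page title here are just examples:

```python
import requests
from urllib.parse import quote

WIKI = "https://www.mediawiki.org"   # example wiki
TITLE = "Parsoid"                    # example page title

# Fetch the latest HTML rendering of the page from the core REST API.
resp = requests.get(
    f"{WIKI}/w/rest.php/v1/page/{quote(TITLE, safe='')}/html",
    headers={"User-Agent": "wikitext-exploration/0.1 (example)"},
    timeout=30,
)
resp.raise_for_status()

# The body is Parsoid HTML, structured per the spec linked in the log
# (https://www.mediawiki.org/wiki/Specs/HTML/2.8.0), which is usually easier
# to extract information from than raw wikitext.
html = resp.text
print(html[:300])
```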