[03:15:34] [[Tech]]; MediaWiki message delivery; /* Community Wishlist Survey 2022 is coming. Help us! */ new section; https://meta.wikimedia.org/w/index.php?diff=22509107&oldid=22483873&rcid=21025383
[16:45:48] FYI. I see some errors on phabricator
[16:46:09] Things like: PhabricatorClusterStrandedException: Unable to establish a connection to any database host (while trying "phabricator_paste"). All masters and replicas are completely unreachable. AphrontConnectionLostQueryException: #2006: MySQL server has gone away This error may occur if your configured MySQL "wait_timeout" or "max_allowed_packet"
[16:46:09] values are too small. This may also indicate that something used the MySQL "KILL " command to kill the connection running the query.
[16:46:23] from nl
[16:49:18] see T298369
[16:49:19] T298369: Phabricator can not connect to MySQL - https://phabricator.wikimedia.org/T298369
[19:06:50] Hello, can I ask a few questions about the parsoid API?
[19:08:29] yes
[19:09:35] So first of all, my initial goal was to convert the wikinews dump, which is given in wikitext, into html
[19:10:05] I hoped to achieve this using parsoid, since I would be sending too many requests if I used the wikinews rest api
[19:10:42] I read that current versions of mediawiki have parsoid bundled with them
[19:11:10] so I tried accessing parsoid using the API: https://www.mediawiki.org/wiki/Parsoid/API
[19:11:54] but this did not work. Is this because the parsoid bundled with mediawiki is the PHP implementation, and so mediawiki does not host the parsoid API?
[19:12:38] You need to use the rest api
[19:13:02] "On Wikimedia wikis, Parsoid's API is not accessible on the public Internet. On these wikis, you can access Parsoid's content via RESTBase's REST API (e.g.: https://en.wikipedia.org/api/rest_v1/ )."
[19:13:05] From the top of the page
[19:13:24] do you mean the mediawiki rest api or the wikinews rest api? Confusing the two is where I wasted a lot of time
[19:13:58] Something like https://en.wikinews.org/api/rest_v1/page/html/Main_Page
[19:14:04] Would give you html for a page
[19:14:21] I installed mediawiki thinking I could use the transforms endpoint to convert the documents, but I learnt that
[19:15:09] the mediawiki API and the wikimedia API are not the same: https://www.mediawiki.org/wiki/API:REST_API#Extension_endpoints
[19:15:31] and that mediawiki did not support transform endpoints, only things with basic functionality like page
[19:16:17] Are you trying to end up with an html dump of all of wikinews?
[19:16:22] What's your use case?
[19:16:50] Yes, that is my intention, which I thought would be quite rude if I used the wikinews rest api for all articles
[19:17:17] so I considered setting up my own mediawiki server and importing the templates from the wikinews server
[19:18:20] apergos: do you have any idea?
[19:18:33] dumps.wikimedia.org doesn't do html dumps
[19:18:46] Guest25: that would be huge
[19:19:33] I know, but I wanted to work with the big data. I think wikipedia offers html dumps, so this should be considerably smaller
[19:20:43] so am I correct in assuming that the parsoid API is only available in the js version and not the php version bundled with mediawiki? https://www.mediawiki.org/wiki/Parsoid/API
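(A minimal sketch of the per-page fetch mentioned at 19:13:58, assuming the public rest_v1 endpoint and the third-party requests library. The fetch_page_html helper, the User-Agent string, and the timeout value are illustrative choices, not anything prescribed in the conversation.)

```python
import requests
from urllib.parse import quote

API_BASE = "https://en.wikinews.org/api/rest_v1"  # rest_v1 endpoint mentioned in the log (19:13:58)
# Wikimedia asks API clients to send a descriptive User-Agent; this string is a placeholder.
HEADERS = {"User-Agent": "wikinews-html-example/0.1 (contact: you@example.org)"}


def fetch_page_html(title: str) -> str:
    """Return Parsoid-rendered HTML for a single page title."""
    url = f"{API_BASE}/page/html/{quote(title, safe='')}"
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    # Peek at the first few hundred characters of the Main_Page HTML.
    print(fetch_page_html("Main_Page")[:300])
```

(Fetching every article one request at a time is exactly the request-rate concern raised above, which is why the conversation keeps steering toward bulk dumps instead.)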
[19:22:10] The rest.php endpoint from mediawiki uses parsoid php
[19:23:53] https://en.wikinews.org/w/rest.php/en.wikinews.org/v3/page/html/Main_Page?redirect=false&stash=true
[19:24:03] Guest25: ^
[19:24:15] Both of those urls give parsoid php
[19:24:22] I have no idea about request rates
[19:24:41] I didn't quite understand what you mean
[19:25:37] Oh yes, I think under the hood the "page" request uses parsoid
[19:25:45] even though it is not a parsoid API
[19:25:48] Guest25: that url I've given should be the parsoid API
[19:26:05] just replace Main_Page with the page you want
[19:26:59] It would take a while to get all of wikinews
[19:26:59] yes, but this would require me to load the entire dump into my mediawiki, correct?
[19:27:38] Why not use the wikimedia api
[19:27:49] Talk to the parsing team about request rates
[19:28:46] Is there a way to contact the parsing team? I am only doing this as a personal project, so I did not want to unnecessarily stress the servers
[19:29:45] Probably file a phab task
[19:30:16] Guest25: https://phabricator.wikimedia.org/maniphest/task/edit/form/1/
[19:30:32] Please be aware phab is a bit intermittent today, so try again later if it fails
[19:32:11] So, I should ask the same question on Phabricator, right? I am new to the mediawiki community, so I don't have an understanding of what is considered the norm. Sorry.
[19:32:29] Should be fine as long as it loads
[19:32:48] Thank you
[19:32:59] for helping, RhinosF1
[19:33:31] Np
[19:33:40] Feel free to add me to the task
[19:35:03] Thanks
[19:38:07] there are html dumps for some of the wikinews sites
[19:38:19] https://dumps.wikimedia.org/other/enterprise_html/runs/20211220/
[19:38:25] Guest25:
[19:38:49] note that staff are on vacation right now so pings are unlikely to be seen in a timely fashion, RhinosF1
[19:39:19] you're only getting me now because I happened to look in.
[19:40:08] and I'm checking out again, given it's late evening here. hope everyone is having good holidays (as good as they can be)
[19:40:45] Thanks. I've noticed that some dumps are not linked from any page; what is the correct way of navigating through the dump files?
[19:41:15] apergos: thank you, yeah that's why I thought a task might be better, because then people will see it on workboards when you're back. Please have a good holiday.
[19:42:45] https://dumps.wikimedia.org/other/ this should link to any such dumps, or to an index.html page for any given set of dumps.
[19:42:51] enjoy!
[19:42:58] * RhinosF1 is going off for a bit now, never looked at the enterprise dumps
[19:43:13] Guest25: each file is per namespace
[19:44:00] https://dumps.wikimedia.org/other/enterprise_html/runs/20211220/enwikinews-NS0-20211220-ENTERPRISE-HTML.json.tar.gz is the latest main namespace dump
[19:44:28] I see, I thought you had to pay for enterprise dumps
[19:44:38] https://dumps.wikimedia.org/other/enterprise_html/ is the spec
[19:44:44] Normal people don't
[19:45:24] would you have to pay if it becomes a commercial project?
[19:45:32] No idea
[19:45:37] ok thank you
[19:46:31] https://meta.wikimedia.org/wiki/Wikimedia_Enterprise#Access
[19:46:39] Seems only if you want live data
[19:46:45] Not the 2-weekly dumps
[19:47:20] Ok great, my problem is solved then, thank you very much
[19:49:35] Glad we could help
[19:50:21] * RhinosF1 should read up on enterprise api
[20:12:02] It already is a commercial project
[20:23:05] I meant if my project became a commercial project
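(A minimal sketch of reading the Enterprise HTML dump linked at 19:44:00 without unpacking it to disk, using only the standard library. The dump is a tar.gz of NDJSON members; the field names used below, "name" and article_body["html"], are assumptions about the record schema rather than anything stated in the log, so check a record against the spec page linked above before relying on them.)

```python
import json
import tarfile

# File name from the log (19:44:00); the path is wherever you downloaded it.
DUMP_PATH = "enwikinews-NS0-20211220-ENTERPRISE-HTML.json.tar.gz"


def iter_articles(path):
    """Yield one parsed JSON object per article, streaming straight from the tarball."""
    with tarfile.open(path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            for line in fh:  # each member is NDJSON: one JSON object per line
                yield json.loads(line)


if __name__ == "__main__":
    for i, article in enumerate(iter_articles(DUMP_PATH)):
        # Assumed schema: page title under "name", Parsoid HTML under article_body["html"].
        title = article.get("name")
        html = (article.get("article_body") or {}).get("html", "")
        print(title, len(html))
        if i >= 4:  # just peek at the first few records
            break
```

(Working from the dump like this avoids hitting the live rest_v1 endpoint once per article, which was the politeness concern that started the discussion.)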