[01:47:46] Gah, I am getting quite suspicious that DreamHost is infiltrated by pfof ponzi scheme counterfeit naked short selling infinite money glitch financial terrorist funded staff, lol. For example, https://mediawiki.org/wiki/Manual:Security states "Another very important thing you should do to secure your MediaWiki installation: Make sure the user running php does not have write access to any web-accessible file or directory that php is enabled to run on." and the latest services offered by DreamHost shared hosting accounts are run in such a way that the account holder user account used to access SFTP/SSH is the same user that PHP runs as to handle the website files.
[01:48:14] In addition to checking myself to confirm, in DreamHost live chat I asked: "Is it correct that with DreamHost shared hosting accounts, the PHP user is the same user that owns all of the files created on the account and therefore has write access to everything that the site admin has access to?"
[01:48:35] They (Tyler) said: "let me look into that for you ... This guide has a section for securing mediawiki on a dreamhost server https://help.dreamhost.com/hc/en-us/articles/217292577"
[01:48:45] Since that did not answer the question, I asked again
[01:48:50] They said: "Yes I believe that the mediawiki documentation is referring to the FTP user as the php user"
[01:49:01] I asked: "Is there a way to configure every single file to be owned by a user different than the user that PHP is running as so that PHP does not have write access to every file?"
[01:49:12] They said: "that is not possible due to the file structure of the servers: https://help.dreamhost.com/hc/en-us/articles/215562847"
[01:52:20] (asking here) Is running PHP as the user that owns all the files secure? Or are there any known security vulnerabilities or concerns if PHP is run as the user that owns the files?
[01:55:04] preventing the webserver from being able to write executable files is webserver defense in depth 101
[01:56:19] having your MediaWiki files be writable isn't inherently insecure, the way running an old MediaWiki version or setting your database password to "password" would be
[01:57:34] but if some other vulnerability allowed the creation or modification of arbitrary MediaWiki files, it would give the attacker full control over MediaWiki
[01:58:21] Tyler answered the same question as well, stating "It is my understanding that it is, as long as the localsettings.php has had its permissions updated to 640 or 600"
[02:02:51] that is also important, but a different problem
[02:03:10] that stops someone else with access to the shared host from reading your database password out of LocalSettings.php
[02:04:01] I'm just now realizing this concern, that PHP runs as the same user that I log in as via SSH/SFTP to administer the website. Am I wrong to be concerned? I'm considering that I can no longer trust DreamHost if this is how they run the service. It wasn't always this way when I signed up and created the account years ago. Am I being overly concerned?
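A quick way to confirm what support described, sketched under assumptions: the domain and paths (example.org) are placeholders for your own layout, and shell_exec() is not disabled in the host's PHP configuration. The throwaway script should be deleted immediately after the check.

    # Who owns the MediaWiki files? (GNU stat; path is hypothetical)
    stat -c '%U' ~/example.org/LocalSettings.php

    # Which user does PHP actually execute as? Drop a throwaway script into
    # the web root, request it once over HTTP, then remove it right away.
    echo '<?php echo trim(shell_exec("whoami")); ?>' > ~/example.org/phpuser.php
    curl -s https://example.org/phpuser.php; echo
    rm ~/example.org/phpuser.php

    # If the two names match, PHP can write every file it serves. That cannot
    # be changed on this kind of plan, but LocalSettings.php should still not
    # be readable by other accounts on the shared host.
    chmod 600 ~/example.org/LocalSettings.php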
[02:10:52] Also, this makes it easier to hide malware on the site and harder to verify that none of the files have been tampered with in any way, since I'd have to constantly download all the files to check that they still match my local copies
[02:13:50] lol, in the last 20 years I've hopped from hosting provider to hosting provider, my favorite changing whenever I ran into some inconvenience, and it's been a long time since I encountered something inconvenient, but I think I might have to consider finding a new favorite again
[03:16:00] oh I see, https://mediawiki.org/wiki/Manual:Security says "where username is a user other than the webserver or mysql user (commonly you would use your own username provided mysql and php are not running as your username)."
[03:19:04] oh wow, there's a video of me on that page. I look so awkward
[03:28:14] you're fine and it's a good presentation, you should be proud!
[03:34:39] bawolff, proud? hell yeah! I just moved my cache directories outside of the web directory... but in my case that probably doesn't matter, cuz on my DreamHost shared hosting account, PHP is running as the same user as me, and I can't change that. DreamHost wasn't always like this, but I guess they changed their service operations to make their services less secure for whatever reason
[03:35:12] but I'm also considering switching to a different hosting provider when I can afford it
[03:35:36] I just don't know who to trust though, so I'll have to do a bunch of research and investigation and due diligence yet again
[04:14:04] thanks :)
[04:16:01] ryzenda: For VPS - DigitalOcean is a popular one that's fairly reasonably priced
[13:05:39] Hope this is the right channel to ask. So I've been trying to come up with a way to determine whether a string is a reference or not, by checking whether the first character of the string is a "<" (I realize that can result in false positives, but that doesn't matter for my use case). I was hoping that {{#rpos:<ref>Foo</ref>|<}} would output 0, but instead it outputs -1
[13:05:58] anyone have a workaround for this? or another easy way to determine whether a string is a reference (or starts with a <)?
[13:47:31] ecks: looks like <ref> is being parsed before #rpos:, and it only finds a strip marker. See https://www.mediawiki.org/wiki/Strip_marker
[13:48:41] Although hacky, it may work if you try to find --ref- inside that string
[17:26:50] A couple days ago I started to manually create articles on my MediaWiki site that are otherwise core articles required for the wiki site to function normally. I manually created these 4 articles (probably incorrectly too): 1) Template:Reflist 2) Template:Reflist/styles.css 3) Template:Cite_web 4) Module:Citation/CS1
[17:27:30] After I created those 4 articles I realized that I am probably doing it wrong and that I should figure out how to properly populate those articles some other way, but I still haven't figured out how to do that and I'm still reading the documentation to find out.
[17:27:41] !exporttemplates
[17:27:41] To copy templates from Wikipedia, use Special:Export and check the "Include templates" option to get all the sub-templates, then upload the file with Special:Import on your wiki. You'll also likely have to install the ParserFunctions extension, Scribunto extension and install/enable HTML tidy. You also might need some CSS from Wikipedia's Common.css. You'll also need a lot of...
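For those four pages, the export step the bot describes can also be scripted. A rough sketch, assuming the parameter names listed on Manual:Parameters to Special:Export (pages, templates, curonly, action=submit) behave the same when requested with curl; the output filename is arbitrary, and the result is worth spot-checking since deeply nested dependencies may still be missing.

    # Request an XML export of the templates/modules plus whatever they
    # transclude (current revisions only).
    curl -s 'https://en.wikipedia.org/w/index.php?title=Special:Export&action=submit' \
      --data-urlencode $'pages=Template:Reflist\nTemplate:Cite_web\nModule:Citation/CS1' \
      --data-urlencode 'templates=1' \
      --data-urlencode 'curonly=1' \
      -o wikipedia-templates.xml

    # Quick sanity check: how many pages ended up in the export?
    grep -c '<title>' wikipedia-templates.xml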
[17:27:45] !import
[17:27:45] To import a few pages, use Special:Import - you can also import pages directly from another wiki (see !importsources). For mass imports, use importDump.php - see https://www.mediawiki.org/wiki/Manual:ImportDump.php for details. NOTE: when using content from another wiki, follow the LICENSE TERMS, especially, attribute source and authors!
[17:27:45] !export
[17:27:46] To export pages from a wiki, navigate to Special:Export on the wiki, type in the names of the pages to export, and hit "export". See https://en.wikipedia.org/wiki/Special:Export for an example of this form. See also: !import
[18:30:09] I tried searching for the complete message shown by the !exporttemplates command, but I can't find what the rest of the words are. Is that message documented/published anywhere?
[18:30:33] !botbrain
[18:30:33] Hello, I'm wm-bot. The database for this channel is published at https://wm-bot.wmflabs.org/dump/%23mediawiki.htm More about WM-Bot: https://meta.wikimedia.org/wiki/wm-bot
[18:30:57] It's very helpfully truncated there too
[18:31:20] lol nice
[18:34:46] can that bot trigger itself?
[18:34:48] !3x !a
[18:34:48] !a !a !a
[18:34:57] apparently not
[18:35:14] :(
[18:35:43] Damn, "Add pages from category" with "Template" and "Module" doesn't show any results on https://en.wikipedia.org/wiki/Special:Export but I guess that's because those are namespaces, not categories
[18:36:49] so then if I want to export and import https://en.wikipedia.org/wiki/Template:Reflist (for starters), this would be in category.....
[18:38:34] Aha, "Pages using reflist with unknown parameters" works, but this way of configuring/setting up a base MediaWiki installation seems quite tedious if I have to find all of these data sources and import them individually
[18:39:04] oh, and that category name doesn't seem right either
[18:43:20] https://mediawiki.org/wiki/Help:Templates "Unlike extensions and media files, there is no central repository for templates." RIP! lol
[18:45:21] I'm surprised it hasn't been done yet, but if I want to create an article on Wikipedia or a MediaWiki site that documents the complete process to export/import all the important templates for a bare/base installation, where would be the best place and article name to start documenting this information in case anyone else may benefit from it too?
[18:47:59] For example, the base installation comes with these extensions: CategoryTree Cite CiteThisPage CodeEditor ConfirmEdit Gadgets ImageMap InputBox Interwiki LocalisationUpdate MultimediaViewer Nuke OATHAuth PageImages ParserFunctions PdfHandler Poem Renameuser ReplaceText Scribunto SecureLinkFixer SpamBlacklist SyntaxHighlight_GeSHi TemplateData TextExtracts VisualEditor WikiEditor
[18:48:46] and if I want to make sure to populate the installation with all the relevant/useful templates, modules, stylesheets and other pages that those extensions use, it would help to have documentation that fully explains all of that so it's quick and simple
[18:49:50] and then in my case, I added a bunch of other extensions that are listed on https://mediawiki.org/wiki/Category:Extensions_used_on_Wikimedia and I'd want the same information for all of those extensions as well
[18:56:14] Aha! https://mediawiki.org/wiki/Help:Templates#Copying_from_one_wiki_to_another "and download an .xml file with the complete history of all necessary templates"
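Once that .xml file is downloaded, the import side pointed at by !import looks roughly like this, run from the target wiki's installation directory (the dump path is a placeholder; the follow-up scripts are the ones the importDump.php manual suggests running afterwards):

    # Import every page in the dump into the local wiki.
    php maintenance/importDump.php --conf LocalSettings.php /path/to/wikipedia-templates.xml

    # importDump.php does not refresh recent changes or site statistics,
    # so the manual recommends rebuilding them once the import finishes.
    php maintenance/rebuildrecentchanges.php
    php maintenance/initSiteStats.php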
[18:56:36] Now I just need to figure out how to compile a 100% complete list of what is necessary without missing a single necessary template
[19:05:47] I found https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Templates_transcluded_on_the_most_pages which might be useful
[19:33:47] question: why would $wgNamespaceProtection[NS_MAIN] = array( 'sysop' ); not work, as in I'm admin and can't edit?
[19:40:58] the value of $wgNamespaceProtection needs to be a permission, not a group
[19:41:10] e.g. 'editinterface'
[19:42:27] ah. so if I want to restrict editing to administrators, I use "editinterface" instead of sysop there?
[19:42:46] correct
[19:42:58] Since by default only the sysop group has editinterface
[19:42:59] thank you very much, will try
[19:44:20] (Correction, both the sysop and interface-admin groups have editinterface)
[19:45:08] works like a charm! thanks again guys
[19:45:14] o7
[19:45:18] yw
[22:07:39] oh damn, my import using importDump.php started out at 2-3 pages per second, but now it's at almost under 1 page per second
[22:08:06] I wonder why it has slowed down so much after only 8,200 page imports
[22:08:30] are you importing only one revision per page or full histories?
[22:09:10] just the latest revision
[22:09:38] and I exported only 7,109 articles, so that's confusing too
[22:09:51] all of them are modules and templates
[22:11:45] oh also, I'm glad I started this process in a dev environment before doing it in production, cuz I just noticed at the top of the xml file from the Wikipedia export that the first section is for Wikipedia. I should remove that before importing in production, right?
[22:12:16] Nah
[22:12:34] That's just metadata from the original source
[22:13:28] Ideally it should be used to warn you if you try to import pages from namespaces not present on your installation, but it currently doesn't AFAIK
[22:13:47] Aha! grep "<title>" Wikipedia-20220307192410.xml | wc -l shows 16,946 lines
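To see exactly which titles the export pulled in beyond the ones requested, the dump can be compared against the original page list. A sketch, assuming requested_titles.txt is the list that was pasted into Special:Export (one title per line):

    # Extract every <title> from the dump, one per line.
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' Wikipedia-20220307192410.xml \
      | sort > dump_titles.txt

    sort requested_titles.txt > requested_sorted.txt

    # Titles present in the dump but not in the request:
    # /doc subpages, /styles.css pages, transcluded templates and modules, etc.
    comm -13 requested_sorted.txt dump_titles.txt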
[22:14:41] <ryzenda> actually, that makes sense for articles that have multiple data files each
[22:19:50] <ryzenda> oh, interesting: https://en.wikipedia.org/wiki/Module:Tlg is listed on https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Templates_transcluded_on_the_most_pages as being used on 6,300 Wikipedia articles, but it is deleted
[22:21:23] <Reedy> https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Module:Tlg&limit=500
[22:21:35] <Reedy> Used on https://en.wikipedia.org/wiki/Template:Template_link_with_bold/sandbox
[22:22:23] <Reedy> https://en.wikipedia.org/w/index.php?title=Template%3ATemplate_link_with_bold%2Fsandbox&type=revision&diff=1075827614&oldid=989701883
[22:22:25] <Reedy> should fix it
[22:25:04] <ryzenda> Also the last 5 articles in https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Module:Tlg&limit=500&offset=3%7C65670596&dir=next
[22:25:09] <ryzenda> the rest of the articles are all user talk pages
[22:26:51] <Reedy> It'll be nested templates and stuff
[22:38:28] <ryzenda> Oh damn, the import failed, and I was running it in screen so I can't see all the lines
[22:39:43] <ryzenda> nevermind, copy mode
[22:40:07] <ryzenda> PHP Warning: proc_open(): unable to create pipe Too many open files in /home/moasspedia_dev/dev.moasspedia.org/mediawiki-1.37.1/extensions/Scribunto/includes/engines/LuaStandalone/LuaStandaloneInterpreter.php on line 155
[22:40:33] <ryzenda> so maybe if I split it up into smaller xml files then...
[22:40:59] <Reedy> That feels like a bit of a bug though
[22:42:20] <ryzenda> alright, I'll copy the output and paste a link here so you can see too, one moment
[22:45:40] <ryzenda> Reedy, http://ix.io/3RDL
[22:46:53] <ryzenda> Also, I was curious why the revisions per second kept slowing down, lol
[22:46:59] <ryzenda> maybe that is related to the bug too?
[22:47:52] <Reedy> It could be related to it, yeah
[22:47:59] <Reedy> and/or just reaching more complex pages
[22:52:06] <ryzenda> oh, and comparing the 7,109 lines of modules/templates that I input for the Wikipedia export to the <title>s listed in the xml dump, after the last one, "Wink", the xml file has 9,839 additional articles that were pulled in
[22:52:19] legoktm: ^ Any chance shellbox is keeping more file handles open than it should be
[22:52:55] many of them are /doc subpages of ones that I specified
[22:54:25] like I said, since I ran this in a dev environment, it's cool if there are any problems with the site, but I'm gonna check to see what kind of damage there is
[22:58:13] Huh
[22:58:22] It could be leaking I guess?
[22:58:54] I think it's more likely Scribunto leaking it
[22:59:15] Except it doesn't hard fail when it can't open the pipe
[22:59:23] Unlike Shellbox
[23:00:23] Scribunto being leaky would explain some behavior I've seen when performance tanked over time during an initial indexing of the wiki for Elasticsearch
[23:00:34] (since that involves reparsing everything aiui)
[23:01:22] but there's so many in-memory caching layers there and I didn't think to profile it, so I can't be certain of anything
[23:02:30] I want to try to identify the very last article that was successfully added to the database before the import job failed with http://ix.io/3RDL - I am looking at the database with phpMyAdmin right now. Where can I look to try to figure out the last successful article imported?
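Since new pages get sequentially increasing IDs, the most recently created rows in the page table show how far the import got. A sketch, assuming shell access to MySQL (database name and user are placeholders, and the table name assumes an empty $wgDBprefix); the same SELECT can be pasted into phpMyAdmin's SQL tab:

    # List the ten most recently created pages; the top row is the last page
    # the failed importDump.php run managed to write.
    mysql -u wikiuser -p wikidb -e "
      SELECT page_id, page_namespace, page_title, page_touched
      FROM page
      ORDER BY page_id DESC
      LIMIT 10;
    "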
[23:03:04] and if the php script fails, does that break all the imports at the end? or are those safe, without any post-processing?
[23:03:46] worst case scenario, I have a database backup for the dev environment that I can restore
[23:04:48] ryzenda: look at `page` and `revision` tables
[23:05:21] ryzenda: you can probably just run the import script again, and it will probably skip the revisions that were already imported
[23:05:33] (no promises though, please test elsewhere first if this is a production site)
[23:05:43] (that's how i remember it working)
[23:05:53] ah, right, I'll do that
[23:06:21] re please test elsewhere, this is a dev environment I'm working on, lol, so it's not an issue
[23:07:56] Hell yeah! It's skipping them! Look at this speed! 100 (152.55 pages/sec 152.55 revs/sec)
[23:09:08] oh, and this time the speed INCREASED as it got to 8200, lol
[23:09:25] literally, it went up every 100 -> 8200 (402.10 pages/sec 402.10 revs/sec)
[23:15:16] probably including the skipped revs in those stats
[23:15:17] but also, the database isn't growing anymore while running the importDump.php script again. The last line of output I see so far is: 8300 (63.47 pages/sec 63.47 revs/sec) and it has been more than 100 seconds (for an average of 1 page per second)
[23:16:03] try flipping your transaction isolation level to read uncommitted and check again
[23:16:44] I have no idea what that is or how to do that.
[23:16:51] oh nvm, thought you were checking in the mysql cli
[23:16:54] disregard me :)
[23:16:58] I'm using phpMyAdmin
[23:17:52] Aha, it is showing a few more now, bbbbbbbuuuuuuuttttttt
[23:18:13] page_id was 8335 just a couple minutes ago
[23:18:28] highest page id as of 20 seconds ago was 8348
[23:18:31] ok yeah in phpmyadmin you can probably do it too, just slap SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; before your select statement(s)
[23:18:53] ah, 8372 now, it's going faster finally, cuz it was stuck at 8335 for like 1-2 minutes
[23:19:26] maybe 8335 was a really long one or something
[23:19:36] no, you're running into transaction isolation
[23:19:41] hence my instructions to relax that
[23:20:28] iirc the default isolation level in mysql is repeatable read, meaning that for a single mysql connection you will always select the same data each time even if some other process changes the table
[23:20:49] so you'll only see the updated data when phpmyadmin closes out the old connection and creates a fresh one
[23:21:19] I suppose. I'm refreshing every few seconds and it is showing updates in real-time now, and it was stalled for a couple minutes before at 8335
[23:21:32] read committed and read uncommitted are looser levels that allow subsequent selects against the same table to pick up the updated data
[23:23:53] 8400 (8.66 pages/sec 8.66 revs/sec)
[23:24:27] As a suggestion, I want to see the pages per second for each batch of 100, not including all the previous hundreds/thousands of pages
[23:24:40] cuz that last 100 pages of import was NOT 8 pages per second
[23:24:45] more like 0.5 pages per second
[23:31:21] heck, I can do some maths: by 8200 it showed 402.1 pages/sec (~20.4 seconds), then 8300 showed 63.47 pages/sec (~131 seconds), then 8400 showed 8.66 pages/sec (~970 seconds), then 8500 showed 7.05 pages/sec (1205 seconds)
[23:33:15] so, 20 seconds to skip the already-imported pages, 111 seconds for the next batch of 100, 14 minutes to import the next 100, then 4 minutes for the next 100
[23:34:15] I think the MySQL server for my site is incredibly slow!!!
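Those per-batch times can be pulled straight out of the progress lines instead of doing the division by hand: each cumulative "N (X pages/sec ...)" report implies roughly N/X seconds elapsed, and the difference between consecutive reports is the time spent on that batch of 100. A sketch, assuming the importDump.php output was captured to a file such as import.log:

    # Convert cumulative pages/sec reports into per-batch wall-clock times.
    grep 'pages/sec' import.log | awk '{
      n = $1                            # cumulative pages so far
      rate = $2; gsub(/\(/, "", rate)   # "(402.10" -> 402.10
      t = n / rate                      # approx. total elapsed seconds
      printf "%6d pages  batch ~%.0f s  (elapsed ~%.0f s)\n", n, t - prev, t
      prev = t
    }'

On the numbers in the log this gives about 20 s for the skipped pages up to 8200, then roughly 111 s, 14 minutes and 4 minutes for the next three batches, matching the hand calculation above.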