[02:07:59] Nemo_bis: I guess I just wanted to see the way it reads from stream.wm.o for recent changes and how it gets external links etc. until it hands it off to HQ. pure curiosity, nothing else.
[02:11:47] Nemo_bis: I hadn't considered that it was not by choice. Storing forever vs only temporarily, I figured it was by choice to prioritise and limit crawling depth, e.g. do we really want to crawl every file on github at every branch/tag/hash, or limit to what is relevant/curated by people through demonstrating relevance and importance that way. similar for every tweet and fart vs slightly higher value/effort publications like a blog post.
[02:14:42] (I'm aware that there are also preserved copies of github repos in git form being done, I mean specifically in webpage form.)
[02:29:03] Krinkle: there are prioritisation choices but mostly for what kind of content gets archived. For instance the IA won't try to archive the high resolution video of all YouTube videos they come across. There used to be a very low limit on how big e.g. ZIP files could be before they'd be ignored by the wayback machine. Etc.
[02:31:22] There's still so much seemingly low hanging fruit to be archived, for instance simple web pages from many online newspapers aren't crawled just because it's so hard to have proper sitemaps. Your own example was from a major national publication and still it escaped crawls, definitely not by choice.
[02:31:57] Which is why ArchiveTeam sometimes can add value by coding site-specific crawlers to make sure an archival copy is (reasonably) exhaustive before it's too late.
[02:50:34] ack
[02:51:23] I did find that the firehose for org.posts contains 99% spam and seemingly duplicated content farm stuff. Had it open for a few hours and gave up on trying to find a post I wanted to be seen linking to as example.
[02:52:18] anyway, thanks for the insights Nemo, I appreciate it. I'll take that into consideration for my planned next post I'm writing that continues on this topic.
[05:39:07] Thanks for your writeups Krinkle !
[06:12:01] https://timotijhof.net/posts/2022/internet-archive-crawling/ is super cool
[22:41:24] Is there any way to mark a use of !important in CSS as needed? From my testing I need it to make the hiding work.
[22:41:31] i.e. the display:none work
[22:41:39] (relevant patch is https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/807217)
[23:00:40] !important is only needed to override inline styles and other uses of !important
[23:00:40] Sorry, you are not authorized to perform this
[23:01:15] https://www.mediawiki.org/wiki/Manual:Coding_conventions/CSS#!important
[23:01:54] try using the specificity hacks
[23:09:57] Thanks for the link. I may be able to override the OOUI styles that apply the display block
[23:27:41] Thanks. Those helped me get around the issue.
[23:45:42] This sounds suspiciously like exactly what I was doing yesterday (precedence issues when overriding an OOUI style and wanting to use !important, but having to increase specificity instead)
[23:46:01] Am I Dreamy_Jazz in the past!?
[23:46:37] Maybe... I don't think someone has run a CU on me just yet :)
[23:47:32] I'm probably hiding the button to even do that right now^Hthen^Hnext!
[23:47:50] Hehe
[23:55:41] there is some magic comment to disable stylelint, if you really want to
[23:56:23] i think that increasing specificity is usually less likely to cause issues in the future
[23:57:56] here's an example: https://gerrit.wikimedia.org/g/mediawiki/extensions/VisualEditor/+/3b9e0bf0dcac35d9c69a9307bc0cddc9febd29f0/modules/ve-mw/ce/styles/annotations/ve.ce.MWExternalLinkAnnotation.css#15
[23:58:51] or like this: https://gerrit.wikimedia.org/g/mediawiki/extensions/VisualEditor/+/3b9e0bf0dcac35d9c69a9307bc0cddc9febd29f0/modules/ve-mw/init/styles/ve.init.mw.DesktopTarget-vector.less#43
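
Editor's sketch of the point made in the !important exchange above: a library rule with higher specificity beats a simpler override, raising the override's specificity (for example by repeating a class) fixes that without !important, and !important is only required against inline styles or other !important rules. This is a simplified Python model, not anything from the CheckUser patch or OOUI. The class names (.lib-widget, .lib-button, .ext-hidden) are hypothetical, and the model assumes author-origin rules only, with no cascade layers, selector lists, or functional pseudo-classes such as :not().

```python
import re
from typing import NamedTuple


class Specificity(NamedTuple):
    ids: int      # #id selectors
    classes: int  # .class, [attr] and :pseudo-class selectors
    types: int    # element names and ::pseudo-elements


def specificity(selector: str) -> Specificity:
    """Compute the specificity of one simple or compound/complex selector."""
    if "," in selector or "(" in selector:
        raise ValueError("selector lists and functional pseudo-classes are not modelled")
    # Count and strip each selector kind in turn; what remains are type names.
    rest, attrs = re.subn(r"\[[^\]]*\]", " ", selector)
    rest, pseudo_elements = re.subn(r"::[\w-]+", " ", rest)
    rest, pseudo_classes = re.subn(r":[\w-]+", " ", rest)
    rest, ids = re.subn(r"#[\w-]+", " ", rest)
    rest, classes = re.subn(r"\.[\w-]+", " ", rest)
    types = len(re.findall(r"[A-Za-z][\w-]*", rest))
    return Specificity(ids, classes + attrs + pseudo_classes, types + pseudo_elements)


class Declaration(NamedTuple):
    selector: str            # "" stands for an inline style="" attribute
    value: str               # the value of the `display` property
    important: bool = False  # True if declared with !important


def winner(declarations: list[Declaration]) -> Declaration:
    """Pick the declaration that wins: importance, inline, specificity, order."""
    def key(item: tuple[int, Declaration]):
        order, decl = item
        inline = decl.selector == ""
        spec = Specificity(0, 0, 0) if inline else specificity(decl.selector)
        return (decl.important, inline, spec, order)

    return max(enumerate(declarations), key=key)[1]


library_rule = Declaration(".lib-widget .lib-button", "inline-block")

# A single class loses to the two-class library rule, even though it comes later.
print(winner([library_rule, Declaration(".ext-hidden", "none")]).value)

# Repeating the class raises specificity, so !important is not needed.
print(winner([library_rule, Declaration(".ext-hidden.ext-hidden.ext-hidden", "none")]).value)

# Against an inline style, only !important wins.
inline = Declaration("", "block")
print(winner([inline, Declaration(".ext-hidden", "none", important=True)]).value)
```

The sort key mirrors the order in which the checks apply in this simplified model: importance first, then inline versus selector-based, then specificity, then source order as the tie-breaker.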