[10:58:54] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/SULWatcher because of a connection loss
[10:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[10:59:12] !log bsadowski1@tools-bastion-13 tools.stewardbots Restarted StewardBot/StewardBot because of a connection loss
[10:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[12:42:54] !log terraform shutdown tf-pm-1
[13:44:27] stashbot didn’t log that terraform message AFAICT
[13:44:27] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[13:45:48] okay, looks like it had a connection hiccup around 12:37–12:45 UTC but recovered on its own… taavi: want to re-log that message?
[13:47:22] !log terraform shutdown tf-pm-1
[13:47:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Terraform/SAL
[13:47:29] thanks lucaswerkmeister!
[13:47:33] np :)
[16:24:48] I don't know where I should complain, but I'm getting a slow SSL handshake when accessing Wikipedia. Like 5 seconds on every request.
[16:27:40] Not completely, but https://wikitech.wikimedia.org/wiki/Reporting_a_connectivity_issue is kinda related
[21:34:05] hi, i'm having a bit of trouble with https://patchdemo.wmflabs.org/ , someone is flooding it with weird requests and saturating the CPU. i'm not sure if it's scraping gone wrong, or an attack, or what. can you advise what to do about it?
[21:39:27] MatmaRex: there isn't much that can be done from the shared infrastructure side. If there is some signature you can see related to user-agent or similar, then you could try dropping requests at your backend service. musikanimal has some experience with this in xtools.
[21:40:45] it looks like almost all of the traffic is to one unremarkable wiki. i'll delete it and see what happens
[21:40:53] I noticed PatchDemo was slow! most of the "expensive" things though are behind login, right? that's the best way to stop the botz
[21:41:02] You can't see the client IPs without a special exception in the front proxy. That is something that is possible if you don't find other ways to track and block the unwanted traffic.
[21:41:17] i.e. you could hide the "Previously generated wikis" unless you're logged in, or cache that query
[21:41:28] user-agents look like a mix of normal browsers, various versions. i guess it is an attack. meh
[21:42:15] the front page should be pretty cheap and it has some caching. it's hitting mediawiki's special:userlogin
[21:42:22] anyway, let's see if it stops if i delete that wiki
[21:43:24] putting the big table of attack targets behind the "Sign in with OAuth" step as musikanimal suggests might be helpful magic. I suppose that will break some workflows somewhere as a result.
[21:46:31] hmm, i doubt it would do much, the targets are also linked all over phabricator
[21:48:15] We have had a bunch of issues with things scraping Phabricator recently. It is possible that this is a knock-on effect from that. It can be tricky to figure out where an overeager scraper is getting its URL feed from.
[21:48:33] it looks like the traffic now moved to another wiki. maybe it is actually just recursively scraping the whole site or something
[21:49:14] and hitting some bug in mediawiki that makes it follow infinite redirects
[21:49:59] because it's fetching urls like this:
[21:50:07] wikis/937aac9017/w/index.php?mobileaction=toggle_view_desktop&returnto=Special%3AUrlShortener&returntoquery=mobileaction%3Dtoggle_view_desktop%26url%3Dhttps%253A%252F%252Fpatchdemo.wmflabs.org%252Fwikis%252F937aac9017%252Fw%252Findex.php%253Ftitle%253DSpecial%253AUserLogin%2526returnto%253DTalk%25253AMarvin_the_Paranoid_Android%2526returntoquery%253Daction%25253Dedit%252526mfnoscript%25253D1%2525
[21:50:14] and that's a short one :)
[21:50:53] let me see if i can block anything with UrlShortener in the path
[21:54:59] Adding something at https://patchdemo.wmflabs.org/robots.txt might not be a bad idea to keep out nice webcrawlers. In Toolforge we autogenerate a "Disallow: /" robots.txt if the tool doesn't serve up its own.
[21:55:51] that would probably be a nice feature to add to the wmcloud.org shared proxy too.
[21:58:15] after all these years, I'm still not convinced the bots are checking https://xtools.wmcloud.org/robots.txt (which disallows all bots). I still get legitimate, good bots who are known to honor robots.txt… it's quite a mystery
[21:59:13] and I have XFF headers enabled, so I'm able to verify that "GoogleBot" is the real GoogleBot
[21:59:39] yet they scrape away all day every day, as do Bing and all the other good bots
[21:59:46] (I blocked all of them though via UA)
[22:00:49] and I had made robots.txt crawlable for a long time, so they weren't blocked from viewing that
[22:05:29] (i still haven't figured out how to make apache block those requests. all the RewriteRule stuff i tried has no effect.)
[22:06:11] I don't know if this helps (see step #8) https://wikitech.wikimedia.org/wiki/Tool:XTools#Building_a_new_instance
[22:12:12] musikanimal: i tried that too (instead of .htaccess), no effect. maybe i just have some stupid typo
[22:12:18] RewriteEngine On
[22:12:18] RewriteCond "%{QUERY_STRING}" "UrlShortener"
[22:12:18] RewriteRule "." "-" [F,L]
[22:12:22] why doesn't this work?
[22:16:27] hmm, it's because i have AllowOverride all. but i need that for my things to work
[22:16:36] My Apache2 skillz have been pretty unused for years, but that looks like it should work to me. It looks mostly like the example at https://httpd.apache.org/docs/current/rewrite/intro.html#rewritecond
[22:18:42] it's as if the other RewriteRules used by the application were overwriting all of the RewriteRules i define in /etc/apache2/blahblah. i don't understand why that happens
[22:19:41] `AllowOverride` is just about enabling .htaccess files, right?
[22:20:24] "By default, mod_rewrite overrides rules when merging sections belonging to the same context. The RewriteOptions directive can change this behavior, for example using the Inherit setting."
[22:20:31] this sounds suspicious, but what the hell does it really mean
[22:21:42] there we go. i needed "RewriteOptions Inherit" in the /etc/apache2/blahblah config file
[22:22:52] and my cpu use is down to like 3%, at the small price of disallowing access to UrlShortener
[22:23:35] the weird thing is still scraping, but i don't care now. thanks for the help
[22:28:45] i wrote this up at https://gitlab.wikimedia.org/repos/ci-tools/patchdemo/-/issues/603 for future reference. good night :)
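For reference, the robots.txt suggested at 21:54:59 is the blanket "Disallow: /" file that Toolforge autogenerates when a tool doesn't serve its own; served at https://patchdemo.wmflabs.org/robots.txt it would simply be:

    User-agent: *
    Disallow: /

As noted at 21:58:15, this only helps against well-behaved crawlers; traffic that ignores robots.txt is unaffected.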
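On the "drop requests by user-agent signature" approach mentioned at 21:39:27 and 21:59:46: a minimal mod_rewrite sketch (the bot names are illustrative examples, not what xtools actually blocks) would be:

    RewriteEngine On
    # Return 403 for requests whose User-Agent matches the listed crawlers
    # (example names only; adjust to whatever signature shows up in the logs)
    RewriteCond "%{HTTP_USER_AGENT}" "Googlebot|bingbot" [NC]
    RewriteRule "." "-" [F,L]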
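Pieced together, the fix MatmaRex lands on at 22:21:42 is the rules quoted at 22:12:18 plus RewriteOptions Inherit, in the server-level config file referred to above only as /etc/apache2/blahblah; roughly:

    # server-level Apache config (the file called /etc/apache2/blahblah above)
    RewriteEngine On
    # Per the mod_rewrite docs quoted at 22:20:24, the application's own
    # RewriteRules (enabled via AllowOverride All) override these when the
    # sections are merged; Inherit lets these rules take effect as well.
    RewriteOptions Inherit
    # Reject (403) any request with UrlShortener in the query string
    RewriteCond "%{QUERY_STRING}" "UrlShortener"
    RewriteRule "." "-" [F,L]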