[14:00:15] Could someone have a look at https://phabricator.wikimedia.org/T399464? Looks like our DB has been down since yesterday's incident
[14:00:27] Thanks in advance
[14:01:02] benoit74: looking
[14:05:08] not responding to ssh, I'm trying a hard reboot
[14:05:34] yes, now I can ssh; let's see if the db is healthy
[14:06:27] "mysqld: ready for connections"
[14:06:43] from the logs, it crashed on 2025-07-12, so one day after the outage
[14:07:00] no clear error messages
[14:13:06] it did also crash during the outage
[14:13:19] I added some more details to the task. should be working now!
[14:13:21] I confirm it is back online, thank you very much; let's hope it will not crash again
[14:14:02] Looking at some data, at least there is some good data readable in the DB :D
[14:14:11] great
[17:02:54] !log lucaswerkmeister@tools-bastion-13 tools.codex-playground deployed 64484d3e43 (Codex 2.2.0)
[17:02:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.codex-playground/SAL
[17:56:09] !log lucaswerkmeister@tools-bastion-13 tools.speedpatrolling deployed 120a24209b (add health-check-path)
[17:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.speedpatrolling/SAL
[19:02:31] the webserver behind openstack-browser.toolforge.org gets an internal server error, but only for one very specific URL.. oddly
[19:02:58] https://openstack-browser.toolforge.org/puppetclass/role::simplelamp - fine.. a puppet class that was deleted / is not used by anything
[19:03:21] https://openstack-browser.toolforge.org/puppetclass/role::simplelamp2 - Internal Server Error - a puppet class used by something
[19:03:36] https://openstack-browser.toolforge.org/puppetclass/role::simplelamp3 - fine.. a puppet class that never existed
[19:04:33] it's also fine for other puppet classes used by something.. and if I add random numbers to them.. weird, eh? :)
[19:05:21] well, I know I should create a ticket anyway.. no need to live debug
[19:16:25] I guess there's just generally more work to do for puppet classes used by something?
[19:20:25] so far I have only found it broken for this one class. https://phabricator.wikimedia.org/T399492
[19:23:48] this is another one that is used by multiple instances, and no issue there. oh well. https://openstack-browser.toolforge.org/puppetclass/role::gitlab_runner
[21:46:34] hello! So I have installed Anubis on my 3rd-party wiki and it works great. That's what I would like to use for XTools: https://phabricator.wikimedia.org/T384711
[21:47:06] I think in order for it to work, it needs access to the SSL certificate. How would I get access to that from a VPS project?
[21:48:08] I assume the web proxy form in Horizon sets up a cert somewhere?
[21:49:15] I thought the TLS proxying happened before traffic reaches your instance
[21:49:23] (but I don't have a lot of experience with Cloud VPS, more with Toolforge)
[21:49:24] I know at least initially Anubis was designed to work between TLS termination and your app
[21:49:40] but I haven't looked at it in a bit
[21:50:14] is installing Anubis on Wikimedia Cloud Services okay if the FSF considers it malware?
[21:51:29] okay, `lucaswerkmeister@tools-bastion-13:~$ curl -H 'Host: xtools.wmcloud.org' http://172.16.2.178:80` suggests the instance doesn't do TLS termination indeed
[21:51:42] (IP and port from https://openstack-browser.toolforge.org/project/xtools)
[21:52:13] you can speak plain-text HTTP to the VM and get a plain-text response back (an "access blocked" response, but I assume that's expected)
[21:52:26] [from within the WMCS network, that is]
[21:53:39] yeah that's expected, XTools doesn't like to be curl'd, lol
[21:53:42] so if it's true that Anubis needs the TLS certificate, then you might be out of luck (but also, I would find that surprising; what AC said sounds plausible to me)
[21:55:39] I can do everything mentioned here https://anubis.techaro.lol/docs/admin/environments/apache except I don't know where the SSL cert files are
[21:57:22] I think what you instead want is a block on port *80* that *doesn't* terminate TLS and forwards to Anubis (variant of block 2), and then a block on port 3001 (or wherever) that actually serves your websites (block 3; see the sketch below)
[21:57:38] and the TLS termination is done by WMCS already
[21:58:29] that config already proxies to Anubis (on port 9000) over HTTP, so I doubt the TLS part is necessary
[21:58:35] it's just how they expect most websites to be set up
[22:02:33] I see
[22:02:52] there is this thing called a "web proxy" that you can click in Horizon, and it creates the TLS termination in front of the instance
[22:02:56] you are right that the certs are there
[22:03:23] I don't think giving a tool access to those will be an option
[22:03:51] so, more like what was already said above
[22:06:28] alrighty! I will try to do something like Lucas suggested. Bryan had mentioned Anubis as a potential solution for me. I don't know if he read through all of it, but I took his word as meaning it's at least possible to do on VPS
[22:06:42] thanks all! I'll report back with findings
[22:07:07] `nc -l 80` and have your web proxy connect to it, and then see if there are any particular headers that would be interesting to copy over to the next request
[22:08:00] or `tcpdump` 🤣
[22:22:46] musikanimal: I don't think I ever said I had tested Anubis. I said I was interested in it as a possible hack for beta cluster.
[22:23:44] If it needs to terminate TLS that is going to be a problem in Cloud VPS. Giving out the wildcard cert for wmcloud.org will be problematic.
[22:24:21] definitely doesn't need wildcard, can make a new single-domain cert (re @wmtelegram_bot: If it needs to terminate TLS that is going to be a problem in Cloud VPS. Giving out the wildcard cert for wmcloud.org wi...)
[22:25:00] @jeremy_b: that's not what I said either. The TLS cert we have is a wildcard for *.wmcloud.org
[22:25:07] but... hopefully it doesn't actually need a cert at all. see Lucas
[22:25:34] ok? what's to stop someone from running certbot on their own? (re @wmtelegram_bot: @jeremy_b: that's not what I said either. The TLS cert we have is a wildcard for *.wmcloud.org)
[22:26:16] not having a public IP
[22:27:07] right. so that's a separate issue. but you need that to terminate TLS, not just to issue the cert.
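For concreteness, a minimal Apache sketch of the port-80 variant suggested at 21:57 above. It assumes Anubis listens on 127.0.0.1:9000 and the real site answers on port 3001 (both numbers taken from the conversation); the server name and paths are illustrative. No TLS appears anywhere here, because the Horizon-managed front proxy terminates it before traffic reaches the instance:

```apache
# Port 80: plain HTTP arrives from the WMCS front proxy; hand everything to Anubis.
# (Requires mod_proxy and mod_proxy_http to be enabled.)
<VirtualHost *:80>
    ServerName xtools.wmcloud.org
    ProxyPreserveHost On
    ProxyPass        "/" "http://127.0.0.1:9000/"
    ProxyPassReverse "/" "http://127.0.0.1:9000/"
</VirtualHost>

# Port 3001: the real site; Anubis forwards the "good" traffic here.
Listen 3001
<VirtualHost *:3001>
    ServerName xtools.wmcloud.org
    DocumentRoot "/var/www/xtools/public"
    # ... the usual application config (rewrites, PHP handler, etc.) ...
</VirtualHost>
```

Only port 80 needs to be reachable from the front proxy; 9000 and 3001 can stay on localhost.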
[22:27:46] I checked the documentation. Anubis doesn't do TLS termination.
[22:28:00] "Instead of your traffic going right from TLS termination into the backend, it takes a detour through Anubis. Anubis filters out the "bad" traffic and then passes the "good" traffic to another socket that Nginx has open. This final socket is what you will use to do HTTP routing."
[22:29:03] I'm glad you're all here because I am a networking dummmyyyyy! :-P
[22:29:25] if it means anything for my situation, I will be using Apache
[22:29:55] also sorry Bryan, didn't mean to misquote you. I was just being optimistic that this could work, some way, somehow, hehe
[22:30:09] musikanimal: The "# HTTPS listener that forwards to Anubis" part would be the Horizon-managed front proxy.
[22:30:12] would Anubis across all of VPS be an option? with no per-tool config?
[22:30:24] the examples already are Apache (re @wmtelegram_bot: also sorry Bryan, didn't mean to misquote you. I was just being optimistic that this could work, some way some how...)
[22:30:49] you set up Anubis in your project, have it forward to your app, and then point the front proxy at Anubis (see the sketch below)
[22:30:50] AntiComposite was mentioning Nginx, so I just wanted to be clear what I'm using
[22:31:09] right, in Horizon?
[22:31:32] this might be really easy!!!
[22:31:48] I'll try it out later tonight on xtools-dev
[22:32:04] it should be fairly simple, yeah
[22:32:11] correct. The Horizon-managed front proxy does all the nginx stuff that is listed in https://anubis.techaro.lol/docs/admin/environments/nginx/ already
[22:32:24] 👍
[22:35:49] musikanimal: it seems like it would be simpler to put all of xtools behind OAuth, but maybe I'm missing something.
[22:37:56] we effectively have that already. When the scrapers go wild, Apache itself can't handle it, so the application-layer logic is never seen
[22:38:17] I've been whack-a-moling IP ranges for years. I'm hoping Anubis will finally put that to rest!
[22:39:20] I'm not sure Anubis will help in that case – Apache still needs to handle the connections to forward them to Anubis…
[22:39:22] I don't follow "we effectively have that already". what is Apache doing if it's not waiting on the application?
[22:39:28] but maybe it'll slow down the crawlers, idk
[22:39:46] or reject them faster, earlier (re @wmtelegram_bot: but maybe it'll slow down the crawlers, idk)
[22:39:50] I didn't have to log in to see https://xtools.wmcloud.org/ec/meta.wikimedia.org/BryanDavis, so I think I'm suggesting something different from what is currently implemented
[22:40:31] yeah, it looks at a few things, like your request rate, and how complicated of a query it will be
[22:40:38] I would require OAuth for _all_ page views
[22:40:50] the actual traffic that gets through is mostly human, so I'm content with the application logic
[22:41:18] what happened at https://phabricator.wikimedia.org/T384711 was different. It was effectively a DDoS
[22:41:28] but they were all AI scrapers
[22:41:54] if you are running a bunch of PHP to decide if you should run more PHP, you're doing more work than I would expect
[22:41:59] it's possible Anubis will get overloaded too, but it does stop traffic from getting to Apache and the more expensive application layer
[22:42:30] no, it doesn't stop it from getting to Apache
[22:42:40] yes, it does, that's the whole point
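To make the 22:30 setup concrete, Anubis is configured through environment variables, so the middle hop might be started like the sketch below. The variable names BIND and TARGET are from the Anubis docs but should be verified against the installed version, and the ports simply reuse the ones from this conversation; treat this as an illustration, not a tested config:

```sh
# Anubis listens where the Apache port-80 block proxies to (9000, per above)
# and forwards the "good" traffic to the real site on port 3001.
BIND=":9000" TARGET="http://127.0.0.1:3001" ./anubis
```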
[22:43:06] well there's two problems, right: (1) the biggest problem -- SQL queries gone wild. That's fine if they are human, but we don't need bots requesting the Edit Counter for a user with a million edits and in each interface language
[22:43:19] problem #2 is new to the AI bots
[22:43:20] oh, not in the examples, but maybe in your case you're using Apache for less
[22:43:37] {internet} -> {wmcloud front proxy} -> {anubis} -> {apache} -> {php} -> {xtools}
[22:43:46] yeah, so hopefully it will work!
[22:44:01] anyway, only the check for problem #1 is implemented in PHP, and it works fine
[22:44:44] problem #2 (DDoS-style scraping) meant the request never even got to where we'd do an OAuth check
[22:45:19] requiring an OAuth-authenticated session to do _anything_ would be a clean way to block asshole bots that should be relatively transparent to real users, as long as you keep sessions for a reasonably long time
[22:45:59] but maybe in your complex app stack checking a session is expensive too ¯\_(ツ)_/¯
[22:46:13] eh, that's my other problem! https://phabricator.wikimedia.org/T224382
[22:46:37] I can't keep sessions alive for very long, in any Symfony app. Which most likely means developer error
[22:47:33] again, I'z the networking dummy (in this case with respect to XTools talking to the db to fetch sessions)
[22:48:14] off the top of my head I can't think of anything that mediawiki/oauthclient would have to do with session management
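On the session-lifetime tangent (T224382): if the culprit is simply Symfony's defaults, the usual knobs live under framework.session. A minimal sketch, assuming a stock Symfony app with the default session handler; the 30-day value is purely illustrative:

```yaml
# config/packages/framework.yaml
framework:
    session:
        # How long the session cookie survives in the browser, in seconds.
        cookie_lifetime: 2592000   # 30 days (illustrative)
        # How long server-side session data is kept before garbage collection;
        # should be at least as long as cookie_lifetime.
        gc_maxlifetime: 2592000
```

If sessions still expire early with settings like these, the next place to look is the handler's storage backend (e.g. the database XTools fetches sessions from), which is consistent with the point above that mediawiki/oauthclient itself doesn't manage sessions.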