[00:16:18] <wm-bb>	 <Yonatan> I have added a comment on phabricator T355138 regarding the glams dashboard DB
[03:49:01] <Guest33>	 Guest33HELloWORLd!!
[03:56:17] <tgr>	 how hard would it be to set up a beta cluster wiki which uses a different top-level domain?
[04:01:02] <tgr>	 I think we might need something like that for the upcoming SUL3 projects. All beta wikis are on the same registrable domain, and that makes a significant difference in some browser behaviors.
[04:05:51] <tgr>	 (filed T355281 about it)
[04:05:52] <stashbot>	 T355281: Set up some beta cluster wikis with different registrable domain - https://phabricator.wikimedia.org/T355281
[08:05:27] <Rook>	 !log paws upgrade rstudio-server T355288
[08:05:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[08:05:31] <stashbot>	 T355288: Update rstudio - https://phabricator.wikimedia.org/T355288
[10:26:55] <bawolff>	 So on my tool page, i want to serve about 200 (static) images.
[10:27:06] <bawolff>	 When i load the page, the toolforge webserver gives 429
[10:27:22] <bawolff>	 Is there an easy way to disable that for my page
[10:28:00] <bawolff>	 Like its just a bunch of small images. The total size of all of them combined is 860KB, and they are all statically served
[10:30:37] <wm-bb>	 <lucaswerkmeister> are you using tools-static.wmflabs.org?
[10:35:07] <bawolff>	 no, I haven't heard of that
[10:36:16] <bawolff>	 I looked up the docs and found https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Serving_static_files . I'll give that a try
[10:39:15] <bawolff>	 Still gives 429's, but it seems to hit the limit much later
[10:40:19] <bawolff>	 Also doesn't use index.htm as the index, but that's ok, that doesn't really matter
[10:41:14] <bawolff>	 Also seems like it gives more aggresive caching headers by default, so things work on the second load
[10:43:15] <bawolff>	 While i don't see any fancier caching headers, but regardless, the images seem to be getting cached where they weren't before, which is a good thing
[11:13:24] <wm-bb>	 <lucaswerkmeister> (re index.htm, I imagined you could still serve that via a webservice if needed, just change the embedded resources to tools-static links)
[11:13:36] <wm-bb>	 <lucaswerkmeister> not sure what else can be done about the 429 errors though, sorry :S
[11:13:46] <wm-bb>	 <lucaswerkmeister> (would loading=lazy on the <img>s help?)
[11:16:23] <bawolff>	 I ended up just changing the links to have /index.htm
[11:16:30] <bawolff>	 maybe
[11:19:35] <bawolff>	 loading=lazy didn't really seem to help anything, maybe because all images are viewable from the initial load
[11:29:19] <wm-bb>	 <lucaswerkmeister> yeah, if it’s anything like an image grid then the images at least all need a width= and height= so the browser knows which of them are off-screen before loading them
[11:30:55] <wm-bb>	 <lucaswerkmeister> (I did that in https://github.com/lucaswerkmeister/tool-pagepile-visual-filter/commit/2b9bce7d845106bc83b2c05caed2629789fc5d06 fwiw ^^)
[11:33:21] <bawolff>	 Hmm, even setting min-height and min-width didn't seem to work, not sure why
[11:33:58] <bawolff>	 idk, its probably good enough at this point
[12:20:12] <taavi>	 !log tools.chobot comment newly added crontab entries and add a hopefully-unmissable warning to the crontab about the grid engine deprecation, T319626
[12:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.chobot/SAL
[12:20:17] <stashbot>	 T319626: Migrate chobot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T319626
[18:43:12] <tacsipacsi>	 Hi! The `sbot` Toolforge tool largely relies on making queries to the Wiki Replica databases (specifically `commonswiki_p`). It has been failing since Monday (Jan 15) with the error message `Access denied for user 'USER'@'MACHINE' (using password: YES)` (user name and IP redacted as I don’t know how private data they are). I tried to run `sql commonswiki` as the tool account, and got the same result. I tried to run `sq
[18:43:12] <tacsipacsi>	 l commonswiki` from my personal account, and works perfectly fine. Does anyone know what’s wrong with the credentials of sbot?
[18:46:57] <andrewbogott>	 tacsipacsi: That doesn't sound familiar but I'll see what I can see
[18:47:47] <tacsipacsi>	 andrewbogott: Thanks in advance!
[18:52:32] <andrewbogott>	 tacsipacsi: this is tangential but are you already on top of T320030 ?
[18:52:33] <stashbot>	 T320030: Migrate sbot from Toolforge GridEngine to Toolforge Kubernetes - https://phabricator.wikimedia.org/T320030
[18:58:08] <andrewbogott>	 tacsipacsi: I don't know what's up with the db creds (and the grid->k8s thing should be unrelated). I'd like to move your replica.my.cnf file out of the way and let new creds generate, is that OK? It might mean changing passwords if you have them coded in places outside of replica.my.cnf
[19:06:13] <tacsipacsi>	 *I* didn’t code the creds anywhere outside of replica.my.cnf. However, I’m not the author and not the only maintainer of the tool. I’m grepping through its files to see if the username is used anywhere else.
[19:06:23] <andrewbogott>	 ok
[19:07:06] <tacsipacsi>	 For the k8s migration: actually, all but one of the jobs that ran on December 18 on the grid engine already run in k8s, the one missing because I didn’t feel comfortable porting a non-source-available task. (Not all of the porting was done by me, but I ported the largest number of jobs.) A bunch of tasks have been stopped since early September, which should probably also be ported.
[19:10:21] <andrewbogott>	 Can you leave a note on that task about the migration status? It'll help us avoid having to chase you down later
[19:10:37] <andrewbogott>	 And the grid-based tool that was running there is currently stopped, which is maybe fine
[19:12:23] <tacsipacsi>	 I’ll do that. Also, I’m far from that point, but there were more than 50 jobs before September, which is the job count limit allowed in k8s. Is there any way to increase that, or will I need to merge jobs to stay under the limit?
[19:16:06] <andrewbogott>	 you can make a quota request to have the job limit increased
[19:16:32] <andrewbogott>	 (I'm still looking at the db issue, but it's looking interesting)
[19:17:52] <andrewbogott>	 tacsipacsi: would you mind making a bug for this issue? I need to create a task about what might be a general database creds issue, will attach it to your bug so you can get updates
[19:32:47] <tacsipacsi>	 So you want to debug the issue instead of just blindly regenerating the creds? If yes, how long will it probably take? The bot has been down since Monday, I’d like to get it up preferably before the weekend, but at the latest early next week.
[19:32:50] <tacsipacsi>	 I don’t know what to write in the bug description or which tags to use, so maybe it’s easier if you create it and subscribe me.
[19:33:24] <tacsipacsi>	 grep terminated in the meantime, without finding anything interesting.
[19:34:19] <andrewbogott>	 I'm pretty sure if I move the existing creds out of the way they won't regenerate. I'll try it though!
[19:35:04] <andrewbogott>	 I moved them, let's see what happens :)
[19:50:51] <tacsipacsi>	 Now `sql commonswiki` says: There is no configuration file for mysql to use, you will probably be unable to access the database
[19:50:51] <tacsipacsi>	 Enter password: 
[19:50:51] <tacsipacsi>	 ERROR 1045 (28000): Access denied for user 'tools.sbot'@'10.64.151.2' (using password: NO)
[19:51:03] <tacsipacsi>	 So indeed nothing has been regenerated.
[19:58:17] <taavi>	 andrewbogott: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Regenerate_replica.my.cnf
[19:59:14] <andrewbogott>	 thanks taavi
[20:00:09] <andrewbogott>	 I think something is more seriously broken but not sure yet
[20:01:42] <andrewbogott>	 shouldn't I be able to 'mysql --host=m5-master.eqiad.wmnet --user=labsdbaccounts labsdbaccounts --password=xxx' from a cloudcontrol?
[20:02:39] <taavi>	 --port=3306
[20:03:32] <andrewbogott>	 what the heck, isn't that the default?
[20:04:23] <taavi>	 the galera puppetization overrides that somewhere in the config to the node-local galera port
[20:04:45] <andrewbogott>	 ooooh that makes sense
[20:05:01] <taavi>	 (fixes welcome)
[20:05:11] <taavi>	 you're not the first person to get confused by that
[20:05:37] <andrewbogott>	 so if it's not connectivity then why "Operation DROP USER failed for 's51848'@'%'"
[20:05:41] * andrewbogott digs deeper
[20:06:05] <taavi>	 where do you see that?
[20:06:31] <andrewbogott>	 That's from /usr/local/sbin/maintain-dbusers delete tools.sbot --account-type=tool
[20:06:58] <taavi>	 it sounds like the metadata database and the actual databases might be out of sync
[20:07:16] <taavi>	 would be very helpful if it told which db that is on, but oh well
[20:08:59] <andrewbogott>	 it fails on the first host, instance-tools-db-1.tools.wmflabs.org
[20:10:52] <taavi>	 !log admin mysql:labsdbaccounts@m5-master.eqiad.wmnet [labsdbaccounts]> update account_host set status = 'absent' where id = 137613;
[20:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[20:12:06] <taavi>	 that did not seem to help, i'll try removing all of the entries for this tool and then we can see where it fails when re-creating
[20:12:17] <andrewbogott>	 ok
[20:12:30] <taavi>	 ERROR [root.inner:160] Request to delete replica.my.cnf file for for account_type tool and account_id tools.sbot failed. Reason: 400 Client Error: BAD REQUEST for url: https://nfs-tools.wmcloud.org/v1/delete-replica-cnf
[20:13:15] <andrewbogott>	 I already moved the file out of the way, could that just be from the file not being there?
[20:13:48] <taavi>	 hopefully not, that seems like an annoying failure scenario
[20:14:17] <taavi>	 i moved it back, no luck
[20:16:57] <taavi>	 Jan 18 20:16:51 tools-nfs-2 uwsgi-toolsdb-replica-cnf-web[865]: PermissionError: [Errno 13] Permission denied: '/srv/tools/project/sbot/replica.my.cnf'
[20:16:59] <taavi>	 interesting
[20:17:51] <taavi>	 ah, i think it's failing because the tool home directory is not world-readable
[20:20:13] <andrewbogott>	 yeah, permissions are uniquely weird for that tool, I will fix if you haven't
[20:21:06] <tacsipacsi>	 I changed it from 700 + setgid to 750 + setgid on Sunday, could that be connected to the error? I have no idea how these permissions came to be.
[20:22:01] <andrewbogott>	 I don't know that that would've made things worse
[20:23:18] <andrewbogott>	 the new creds are working so you should be all set for now
[20:23:23] <tacsipacsi>	 Okay, then it’s hopefully not my fault. By the way, the setgid bit with no group or world access was especially interesting…
[20:23:25] <andrewbogott>	 I don't have a good theory for what broken in the first place
[20:24:20] <andrewbogott>	 thanks taavi 
[20:24:49] <taavi>	 andrewbogott: did you have a task other than T355356? I'll send a few patches to make it more robust
[20:24:50] <stashbot>	 T355356: No connectivity between cloudcontrol1005 and db replica hosts - https://phabricator.wikimedia.org/T355356
[20:25:07] <andrewbogott>	 I don't and T355356 turns out to be totally wrong, it was just the --port thing
[20:25:34] <taavi>	 ok, I will steal that
[20:27:33] <andrewbogott>	 ok!
[20:29:40] <tacsipacsi>	 Thanks, the `sql` command line works, https://commons.wikimedia.org/wiki/Commons:File_renaming/Stats has been updated by the bot, all seems to be well.