[01:41:02] harej: you mean autocreating your own account? or creating a new account?
[01:41:44] Account autocreation should happen automatically and invisibly when you make an API request. I'm not aware of problems with that.
[01:42:30] It doesn't reliably happen when you are authorizing an app (that's T208443) but that's a different scenario
[01:42:31] T208443: User cannot log in with OAuth on a wiki before visiting that wiki directly - https://phabricator.wikimedia.org/T208443
[01:43:48] account creation should in theory work, I'm not aware of anyone ever trying
[01:47:33] I wonder what to do about OAuth requests like https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/f007e69d0eb15752350ba46e3700e6e9 or https://meta.wikimedia.org/wiki/Special:OAuthListConsumers/view/dd9bc97268ce824f65fd98a70d59ed85
[01:48:04] is it worth asking them to explain what they need "edit protected pages" for?
[11:38:49] FYI, some tools inaccessible (e.g. https://refill.toolforge.org, started at 2023-10-11 12:26:20 UTC+1)
[11:39:07] bunch of my tools also just went down
[11:39:37] oop it's back
[11:40:12] we just had a ceph backend storage hiccup, we are investigating
[11:40:16] yep, ceph got unhappy for a moment it seems
[11:48:46] not sure if this is the right channel, but I was wondering for the wiki replicas, how the initial copy from the production DBs to the sanitisation DBs works?
[11:48:46] There's a doc here: https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Wiki_Replicas#Step_1:_sanitization but it just says this:
[11:48:46] The production database is copied to sanitarium boxes by means of MariaDB replication mechanisms (TODO: is this true? give more info if possible).
[11:49:08] !log tools reboot tools-sgeexec-10-19
[11:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:50:20] mainly, I'm curious how it's so fast (I saw a page on wikitech where it says a fresh copy can be done in <1 hr... which seems very fast for a DB that large?). And also how y'all dealt with concerns like a bad subscriber causing too much WAL to be accumulated on the production primary instance, taking up all the free disk space
[11:58:29] !log tools.whois-referral webservice restart
[11:58:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.whois-referral/SAL
[12:01:48] !log tools reboot tools-sgecron-2 T348634
[12:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:01:51] T348634: ceph slowdown 2023-10-11 - https://phabricator.wikimedia.org/T348634
[12:04:04] !log tools reboot k8s workers 72, 75, 82 T348634
[12:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:11:54] !log tools reboot k8s workers 48, 60, 65, 68, 70, 76 T348634
[12:11:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:11:58] T348634: ceph slowdown 2023-10-11 - https://phabricator.wikimedia.org/T348634
[12:15:22] https://iabot.wmcloud.org/ keeps timing out for me. Is this a known issue?
[12:21:29] roy649: works for me. there was a minor issue just a few moments ago, but that should be over now and I've been hearing people complain about iabot not working quite a lot recently
[12:23:46] I had three failures in the past 30 minutes or so. I'll just give it another shot and see what happens
[12:23:57] proc: I'm not very familiar with the wikireplica setup, but I think the issue with WAL using too much space can be solved with expire_logs_days https://mariadb.com/kb/en/replication-and-binary-log-system-variables/#expire_logs_days
[12:25:07] Now I'm getting a repeatable 500 Internal Server Error
[12:25:32] roy649: works for me
[12:25:50] https://usercontent.irccloud-cdn.com/file/5uxQpF1s/image.png
[12:45:35] Just got another 504.
[12:45:42] Could somebody try it on [[Fleetwood Park Racetrack]]
[12:45:51] maybe it's something specific to that page?
[14:16:18] !log rebooting tools-sgeweblight-10-16 due to stuck NFS (T348634)
[14:16:19] dcaro: Unknown project "rebooting"
[14:16:19] T348634: ceph slowdown 2023-10-11 - https://phabricator.wikimedia.org/T348634
[14:16:30] !log tools rebooting tools-sgeweblight-10-16 due to stuck NFS (T348634)
[14:16:31] xd
[14:16:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[15:44:58] !log tools.test-lighttpd-trusty disable tool
[15:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.test-lighttpd-trusty/SAL
[15:47:09] !log tools.test-webservice-generic stop webservice and disable tool, was running php 5.6, previously used by toolschecker but not anymore
[15:47:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.test-webservice-generic/SAL
[15:51:05] How would I go about figuring out why a trove instance can't 1) be created or 2) be restarted. I think it has to do with the applied configuration group, but I don't know what is wrong since there aren't any logs in horizon.
[16:05:41] JJMC89: does it happen only with a specific trove instance? can you create/restart others?
[16:05:44] JJMC89: can you open a task with the details? it might be that there's no public logs available
[16:06:31] it has happened with multiple instances - I'll open a task
[16:10:28] thanks! that'd make it easier to follow up with
[16:20:17] dcaro, dhinus: T348668
[16:20:18] T348668: Trove instances not being created or restarted with configuration group applied - https://phabricator.wikimedia.org/T348668
[17:09:56] So if I have a script that authenticates via OAuth (presumably to something like meta.wikimedia) and then I have the script, I dunno, edit the bot's talk page on every wiki, it will transparently create the new account if needed? (re @wmtelegram_bot: Account autocreation should happen automatically and invisibly when you make an API request. I'm not aware of problems wit...)
[17:12:36] (InternetArchiveBot sets up soft redirects to Meta on all the talk pages it has.)
[18:00:02] Some of my k8s jobs are reporting 'Temporary failure in name resolution' for meta.wikimedia.org on and off
[18:17:59] JJMC89: has this only started happening recently? (and if so, from when-ish?)
[18:19:19] TheresNoTime: The first timestamp from the emails is 2023-10-11T17:18:56Z
[18:20:46] There were some recent issues, though I'm unsure if that affected DNS. Are you still seeing intermittent name resolution failures?
[18:22:00] 18:19:57 is the last one I have but that job hasn't run again yet since then.
[18:25:12] I get to use my favourite phrase: "Might have just been a transient issue" :-P
[18:29:48] maybe - been seeing it on and off for about an hour
[18:37:27] I'm running a cookbook to reboot the k8s workers, but I don't think that should be causing it
[19:52:01] seems to be ok now; taavi, the reboots were likely the cause of the other error I saw: The container could not be located when the pod was terminated
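Editor's note on the intermittent 'Temporary failure in name resolution' errors reported at 18:00:02: a job can often ride out a transient DNS failure by retrying with backoff. The following is a minimal sketch, not something anyone in the channel proposed. The helper names `retry_on_dns_failure` and `resolve_meta` are hypothetical, and it assumes the job resolves names through Python's `socket` module, where that message corresponds to `socket.gaierror` with the `EAI_AGAIN` code. HTTP libraries usually wrap this error in their own exception types, so the `except` clause would need adjusting there.

```python
import socket
import time

# EAI_AGAIN is the resolver code behind "Temporary failure in name resolution".
# getattr keeps the sketch importable on platforms that lack the constant.
EAI_AGAIN = getattr(socket, "EAI_AGAIN", None)


def retry_on_dns_failure(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on temporary DNS failures.

    Hypothetical helper: only gaierror with EAI_AGAIN is retried; any other
    error, or a failure on the last attempt, is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except socket.gaierror as exc:
            if exc.errno != EAI_AGAIN or attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


def resolve_meta():
    """Example callable: resolve meta.wikimedia.org (performs a real DNS lookup)."""
    return socket.getaddrinfo("meta.wikimedia.org", 443)
```

A job would wrap its lookup or request as `retry_on_dns_failure(resolve_meta)`. Retrying only hides short outages; failures that persist for an hour, as described above, still need a report in the channel or a task.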