[00:00:25] bvibber: do you have a security group hole open that allows the front proxy to reach that backend service?
[00:01:03] Ah I bet that's it gimme a minute
[00:01:31] yeah, looks like you've only got the default rules in the default group
[00:03:37] I wonder if we could tweak something to make that case respond fast instead of whatever it is doing now.
[00:03:45] hrm
[00:03:51] i don't see any way to add security groups?
[00:04:44] "edit security groups" dialog offers nothing to add, just a "default" with a - button next to it
[00:04:46] bvibber: you create them in the "network" section and then apply them on the instances
[00:05:01] aha
[00:05:13] i swear i've done this before but it was some time ago and it may have changed haha
[00:05:58] I have to stare at things for a bit after I've been away from Horizon for a while myself. The UX there is "passable"
[00:06:29] hehe
[00:07:10] "I'm in" :D
[00:07:17] now i get the *expected* 403 error ;) https://media-streaming.wmcloud.org/index.html
[00:07:25] and i can deal with that on my own time
[00:07:27] thanks bd808 !
[00:10:30] yw bvibber
[00:17:28] https://media-streaming.wmcloud.org/tmh-tests/TimedMediaHandler_test_small_90p30.webm \o/ success
[00:17:33] and with that, good evening to you all :D
[03:21:05] bd808: it was all very obvious in retrospect (RE: codesearch firewall between two localhost services)
[03:21:16] ferm port 3002 allow $CACHES only
[03:21:33] t.opranks found it :)
[03:22:27] I would not have guessed that such a rule existed. Maybe legoktm or mutante remember why https://gerrit.wikimedia.org/r/c/operations/puppet/+/607647 was done.
[03:22:36] But I'm not removing that either way, just curious for documentation/future reference.
[03:24:00] I believe https://gerrit.wikimedia.org/r/c/operations/puppet/+/1017179/ is safe to deploy anytime. I don't need to be around for it.
[03:24:26] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016480/ can then be applied sometime later to make use of it.
[03:24:40] I've tested the same rule change locally and it works indeed.
[11:45:31] Question - how would one run scheduled jobs that rely on the `toolforge` command? That command doesn't seem to be available in the k8s containers
[11:45:51] (ping me when you reply pls as I don't monitor this channel constantly - thank you! :) )
[11:45:54] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[11:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[11:47:08] firefly_wp: I’m not sure that’s currently supported, but you might be interested in T356377
[11:47:10] which toolforge command(s) do you want to run in the jobs?
[11:47:34] the job stop/start commands - in a 'watchdog' script to check if a continuous job is 'stuck' and restart it if needed
[11:47:54] it seems that often even after "exiting" the job doesn't actually close down, so it's not restarted
[11:50:55] I feel like it should be possible to make kubernetes automatically restart the command, using a deployment (probably) with a probe https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
[11:51:08] though that would mean using kubernetes directly instead of toolforge jobs, so it’s less stable :/
[11:51:18] definitely curious if anyone else here has other thoughts :)
[11:51:48] there is now a way to do health checks for continuous jobs too, unfortunately it's new enough that the first line on https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog is the only documentation we have so far
[11:52:03] oooooh
[11:52:26] oh, and there’s `toolforge jobs dump` now? ❤️
[11:56:31] firefly_wp: would the new `--health-check-script` work for your use case?
[11:56:48] Not seen that yet, but quite possibly!
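(Editor's note on the watchdog discussion above: a minimal sketch of what a health-check script for a "stuck" continuous job might look like. The log only names the `--health-check-script` option; how Toolforge invokes the script is not described here, so the sketch assumes the common convention that exit status 0 means healthy and anything else means unhealthy. The heartbeat file, its path, the `HEARTBEAT_PATH` variable, and the 300-second threshold are all hypothetical: the job itself would have to touch that file on every loop iteration.)

```python
"""Hypothetical health check: healthy while a heartbeat file is fresh."""
import os
import sys
import time

# Assumed names, not taken from the log: the continuous job is expected
# to update this file's modification time on every loop iteration.
HEARTBEAT_PATH = os.environ.get("HEARTBEAT_PATH", "/tmp/heartbeat")
MAX_AGE_SECONDS = 300.0


def is_healthy(path, max_age, now=None):
    """Return True if `path` exists and was modified within `max_age` seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable heartbeat counts as unhealthy.
        return False
    return (now - mtime) <= max_age


def main(argv):
    """Return 0 (healthy) or 1 (unhealthy) for use as a process exit status."""
    path = argv[1] if len(argv) > 1 else HEARTBEAT_PATH
    return 0 if is_healthy(path, MAX_AGE_SECONDS) else 1
```

(To use it as a standalone script, end the file with `sys.exit(main(sys.argv))`. The point of the design is that a job which is still running but no longer making progress stops refreshing the file, so the check fails without needing the `toolforge` command inside the container.)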
[13:15:52] !log taavi@tools-bastion-12 tools.wikibugs toolforge jobs restart irc
[13:15:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[13:17:16] Anything I can put in jobs.yaml ? (re @wmtelegram_bot: there is now a way to do health checks for continuous jobs too, unfortunately it's new enough that the first line on htt...)
[13:19:06] @sohom_datta: take the CLI flag, s/^--//, and you have a jobs yaml key
[13:19:27] Ah nice :)
[14:01:12] have we heard any reports that the Toolforge replicas are running slow? I think that might be the crux of the XTools downtime we've been seeing lately
[14:01:27] define 'running slow'
[14:02:32] !log deltaquad@tools-sgebastion-10 tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC
[14:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[14:02:37] like, the main query for https://xtools.wmcloud.org/articleinfo/en.wikipedia.org/Hanksy normally takes under a second to finish, and when I ran it on my local and profiled the stack, I saw the query took over 2 minutes
[14:02:53] I would expect if things are that slow, others would have complained about it
[14:03:14] I haven't heard anything.. do you have the query SQL itself somewhere?
[14:04:10] was there a recent change to the views? maybe some change is making this query especially slow. I'll paste the SQL here:
[14:04:47] https://www.irccloud.com/pastebin/bDAhkyJe/
[14:06:11] I see some recent changes to the block views that TimStarling has apparently done
[14:06:50] yeah. for sone reason your query is now doing two unindexed queries on the block table
[14:06:52] some*
[14:07:33] yep! I just ran an EXPLAIN and that's apparently it
[14:07:42] mind filing a task?
[14:07:47] will do
[14:12:17] looks like the index maintenance script for the index in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1016066 was never run
[14:13:56] ah!
[14:13:56] T361945
[14:13:57] T361945: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945
[14:14:09] so maybe we just need to run the script, then
[14:15:00] maybe. it's not exactly the type of script I like running on friday afternoons though
[14:15:01] let's see
[14:16:43] gosh the temptation to say "what's the worst that could happen"..
[14:17:04] replication breaking I guess?
[14:17:34] (/j)
[14:17:37] !log admin run maintain-replica-indexes on clouddb1017 T361945
[14:17:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:18:19] as long as I don't break anything within the next bit over 41 minutes then I'm happy
[14:19:48] does xtools use the analytics or web replica cluster?
[14:20:05] web
[14:20:27] oh oops, I started from the analytics ones :/
[14:21:01] well those need to be done too I suppose
[14:21:12] we've got some replag now but I assume this is expected https://replag.toolforge.org/
[14:21:51] that's showing everything as up-to-date for me?
[14:22:26] yeah it looks fine now... seems intermittent
[14:26:47] I feel bad for whoever's queries I'm constantly having to kill to get the lock needed to add that index
[14:27:01] !log admin run maintain-replica-indexes on remaining analytics replicas T361945
[14:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:27:06] T361945: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945
[14:37:01] !log admin run maintain-replica-indexes on all web replicas T361945
[14:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[14:37:05] T361945: Toolforge view for blocks is very slow - https://phabricator.wikimedia.org/T361945
[14:37:10] musikanimal: still seeing slowness anywhere?
[14:37:52] looks great now! thank you!!
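(Editor's note on the EXPLAIN step in the block-view thread above: a small sketch of how a query plan changes once a missing index is added. It deliberately swaps in SQLite's `EXPLAIN QUERY PLAN` via Python's standard `sqlite3` module so it runs anywhere; the replicas themselves are not SQLite, and their EXPLAIN output looks different. The table `block_demo`, its columns, and the index name are hypothetical stand-ins, not the real schema of the block views.)

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings of SQLite's query plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan detail


def uses_index(plan):
    """True if any step of the plan reports an index lookup."""
    return any("INDEX" in step for step in plan)


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table; not the real replica schema.
conn.execute(
    "CREATE TABLE block_demo (id INTEGER PRIMARY KEY, target INTEGER, reason TEXT)"
)
conn.executemany(
    "INSERT INTO block_demo (target, reason) VALUES (?, ?)",
    [(i % 100, "example") for i in range(1000)],
)

sql = "SELECT reason FROM block_demo WHERE target = ?"

# Without an index on `target`, the only option is a full table scan.
plan_before = query_plan(conn, sql, (42,))

# After adding the index, the same query can do an index lookup instead.
conn.execute("CREATE INDEX block_demo_target_idx ON block_demo (target)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

(The query text never changes between the two plans; only the presence of the index does, which mirrors why running the index maintenance script was enough to resolve the slowness reported in T361945.)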
[14:38:03] great
[15:06:43] !log anticomposite@tools-sgebastion-10 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[15:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:50:29] Hello everybody! I can not download any books from Wikisource. I get this message: "Wikimedia Cloud Services Error. This web service cannot be reached. Please contact a maintainer of this project. Maintainers can find troubleshooting instructions from our documentation on Wikitech". What should I do?
[23:06:26] @Alexandros_Antonios_Sekares: maybe look at https://wikisource.org/wiki/Wikisource:WS_Export and see if that gives you any ideas about how to contact the maintainers or report a bug?
[23:08:22] * bd808 is trying to figure out where that service runs these days
[23:17:37] !log wikisource Rebooted wsexport-prod02.wikisource.eqiad1.wikimedia.cloud after reports on irc of https://ws-export.wmcloud.org/ returning unreachable proxy errors
[23:17:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisource/SAL
[23:20:11] @Alexandros_Antonios_Sekares: I rebooted the server that service runs on. Let's hope this fixes things. I have also poked the WMF team that maintains the service, but I assume they are all off for the weekend.
[23:21:30] Thank you very much! Downloading books is available now via Wikisource. God bless you!
[23:22:09] heh. that last part is unlikely, but thanks for the sentiment ;)
[23:52:56] !log bd808@tools-sgebastion-10 tools.wikibugs-testing Build new container based on MR!28 and restarted web, irc, gerrit, and phorge tasks (T361518)
[23:53:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs-testing/SAL