[07:18:33] !log arthurtaylor@tools-bastion-13 tools.phpunit-results-cache deployed 3fb13097895 (build notification support) [07:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.phpunit-results-cache/SAL [07:33:11] !log tools add AAAA record on *.toolforge.org T211575 [07:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [07:33:16] T211575: Enable IPv6 on toolforge.org - https://phabricator.wikimedia.org/T211575 [13:53:05] !log dcaro@tools-bastion-13 tools.wm-lol testing [13:53:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wm-lol/SAL [14:00:57] !log dcaro@tools-k8s-worker-111 wm-lol test [14:00:59] wmbot~dcaro@tools-k8s-worker-111: Unknown project "wm-lol" [14:00:59] wmbot~dcaro@tools-k8s-worker-111: Did you mean to say "tools.wm-lol" instead? [14:42:48] !log tools.cluebotng-staging cleanup 200G+ of old log files per T395006 [14:42:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cluebotng-staging/SAL [14:42:51] T395006: cluebotng-staging tool uses ~560G of disk space - https://phabricator.wikimedia.org/T395006 [14:58:19] Is there a way i can take snapshot of toolsdb? like `mysqldump`? [14:58:46] from toolforge? [15:00:22] mysqldump used to be installed but isn't anymore šŸ˜” T378882 [15:00:36] so, am I out of luck? 
[15:01:05] well, you can go to that task and look at the workaround mentioned there [15:01:32] it amounts to putting the dump in NFS, because we all love NFS [15:01:36] We just switched to toolforge from our droplet [15:02:07] there's a pre-built image that has mysql on it [15:02:11] * dcaro looking [15:03:47] I was already dumping snapshot, compressing, then uploading it on `backup-bot` for the last month [15:20:52] `toolforge jobs run --command "umask o-r; ( mariadb-dump --defaults-file=~/replica.my.cnf --host=tools-readonly.db.svc.wikimedia.cloud credentialUser__DBName > ~/DBname-$(date -I).sql )" --image mariadb backup` [15:21:36] Seems like it would do the job, but it comes with a big warning, *Note that we don't recommend storing backups permanently on NFS (/data/project, **/home**, or /data/scratch on Toolforge) or on any other Cloud VPS hosted drive* [15:24:43] Suddenly, the tools became slower [15:26:49] We are keeping transition period of 30 days for monitoring this tool to determine whether toolforge is the right decision or not [15:48:51] nokibsarkar: yep, you should not rely on toolforge nfs for backups, currently we don't have a backup solution, so you would have to transfer them somewhere else in the long run [15:48:57] can you elaborate on "Suddenly, the tools became slower" ? [15:49:07] (which tool, what action, etc.) [15:49:14] might be NFS misbehaving [15:49:57] toolforge nfs is still better than nothing, i.e. if for some reason there is any data corruption in toolsdb, or unintentional human error that deletes some data, you would still have the nfs backups [15:51:06] yep, but if the backups are too big they might get truncated by us to free space [15:52:23] you're going to have to have some pretty big and fast-growing backups for that to happen without warning [15:52:37] yep [15:52:56] i am planning to keep only one week of daily compressed snapshot. is it ok? [15:53:17] do you care if it's actually there when you need it? 
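A retention policy like the one discussed above (daily compressed dumps, keep about a week) can be enforced with a single `find` invocation. A sketch, assuming the dumps land in a directory with a `.sql.gz` suffix (the directory and file names here are hypothetical):

```shell
# Keep the last N days of *.sql.gz dumps; anything older is deleted.
prune_backups() {
  # $1 = backup directory, $2 = days to keep
  find "$1" -name '*.sql.gz' -type f -mtime "+$2" -delete
}

# Throwaway demo: one stale dump, one fresh dump.
dir=$(mktemp -d)
touch -d '10 days ago' "$dir/db-old.sql.gz"
touch "$dir/db-new.sql.gz"
prune_backups "$dir" 7
ls "$dir"   # only db-new.sql.gz should remain
```

Running this right after each new dump (e.g. as part of the same scheduled job) keeps the NFS footprint bounded.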
(re @nokibsarkar: i am planning to keep only one week of daily compressed snapshot. is it ok?) [15:53:19] if it's about 100MB per each day, it's ok [15:53:34] (I see the last one is 84MB) [15:54:32] with that size, even 2 or 4 weeks would not be an issue [15:54:59] after compression, it became 6mb [15:55:06] then you can keep 1 year :D [15:55:09] if size became an issue you could use something like bup, rdiff-backup. [15:55:10] but if the NFS disk failed then you're screwed. [15:56:02] I think the comparison was vs. previous hosting not a previous state of wmcloud. (re @wmtelegram_bot: can you elaborate on "Suddenly, the tools became slower" ?) [15:56:23] yep [15:56:34] `-rw-r--r-- 1 nokibsarkar tools.backup-bot 6.9M May 22 12:00 campwiz-backup-2025-05-22_12-00-01.sql.tar.gz` [15:56:55] are those numbers real? actual sql takes 84 mb [15:57:10] that's peanuts yep [15:57:13] do you have an example operation we can try to reproduce on toolforge AND on your previous hosting, to compare the speed? [15:57:17] and here the compressed form takes less than 1/10 [15:57:18] sql is text, text compresses well? [15:57:33] tar.gz is pretty good at compressing text, you could use bz2 to compress even more :P [15:59:13] I think there's some comment in https://phabricator.wikimedia.org/T394730#10848302 [15:59:37] > My current performance issue remains loading thumbnail from wikimedia commons server. [15:59:57] dcaro: thanks, I didn't see that task [16:00:22] nokibsarkar: can you elaborate a bit more on how that loading is done? is it the backend pulling directly from commons and then serving the user? or the user's js pulling directly from commons? [16:00:28] is it a background process? [16:00:43] are they stored somewhere? or served on-demand? [16:01:54] user pulling from commons, backend is also providing the thumbnail url [16:01:59] I'm also logged in now, is there a way I can go to one of those pages?
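The ~84 MB → ~7 MB ratio above is typical: an SQL dump is mostly repeated `INSERT` boilerplate, which gzip handles very well. A quick illustration with fabricated data (the table and values are made up, not campwiz's):

```shell
tmp=$(mktemp -d)
# Fake "dump": thousands of near-identical INSERT lines, like a real SQL dump.
for i in $(seq 1 5000); do
  echo "INSERT INTO images VALUES ($i, 'File:Example.jpg', 'Uploader', '2025-05-22');"
done > "$tmp/dump.sql"
gzip -9 -k "$tmp/dump.sql"   # -k keeps the original so we can compare sizes
orig=$(stat -c %s "$tmp/dump.sql")
comp=$(stat -c %s "$tmp/dump.sql.gz")
echo "original: $orig bytes, gzipped: $comp bytes"
```

As noted in the channel, bzip2 or xz usually squeeze text a bit further than gzip, at the cost of more CPU time.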
[16:02:20] if it's the user's browser connecting to commons directly, it's out of toolforge :/ [16:02:37] it has some kind of access management. Let me create a test campaign for u (re @wmtelegram_bot: I'm also logged in now, is there a way I can go to one of those pages?) [16:02:49] ack [16:02:51] thanks [16:03:07] btw. it's really snappy :) [16:03:13] (so far) [16:03:49] Unfortunately I added the redirection rules from previous host to toolforge. if we want to compare, I have to re configure the nginx [16:04:29] we can check first a bit more in detail what's the issue [16:04:36] (what's what is slow) [16:04:57] can u load this: https://campwiz.toolforge.org/campaign/c2c7piesolaf4 [16:05:44] it is awfully slow now, I cannot even go to my admin panel [16:05:44] done, full refresh is <2s, should I go to some of those? [16:05:57] still fast for me :/ [16:06:01] i think my internet is slow then [16:06:21] your wiki username? [16:06:41] dcaro? [16:06:55] DCaro (WMF) [16:08:02] can u go this >> https://campwiz.toolforge.org/campaign/c2c7piesolaf4 Then hit `Evaluation Area` [16:08:23] 1.94s [16:08:41] then my ISP is the villain, i guess [16:08:54] that image is directly from commons yep, so it does not even pass through toolforge [16:09:23] 6MB [16:09:31] that feels like a lot for a thumbnail [16:10:00] I think it might be downloading the full image twice, then the thumbnail [16:10:04] https://usercontent.irccloud-cdn.com/file/D25JN74v/image.png [16:10:48] wow. Now i see the high bandwidth usage complaint by one of the user [16:11:06] luckily the project lead gave them bandwidth stipend [16:11:40] xd [16:12:09] I found metered networks quite annoying [16:12:20] why did it not show up on my terminal? [16:12:37] what terminal? 
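On the 6 MB "thumbnail" point: Commons can serve a pre-scaled thumbnail instead of the original file via `Special:FilePath`'s `width` parameter, so a frontend never has to pull the full-size upload. A sketch of the URL shape (the file name below is a placeholder, not one of campwiz's images):

```shell
file='Example.jpg'   # placeholder; URL-encode real file names
width=640            # requested thumbnail width in pixels
echo "https://commons.wikimedia.org/wiki/Special:FilePath/${file}?width=${width}"
```

Requesting that URL redirects to a server-side-scaled thumbnail, which is typically a few tens of KB instead of megabytes.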
[16:12:47] that's from firefox 'network' tab of the dev tools [16:12:53] my network tab, previously [16:12:54] (the screenshot) [16:13:05] yep, I am talking about that (re @wmtelegram_bot: that's from firefox 'network' tab of the dev tools) [16:13:21] you had some kind of filter set in network tab? [16:13:25] try again? [16:13:34] on the developer console, I was looking for high bandwidth usage, but with no luck [16:13:54] maybe you had it cached [16:14:22] maybe. But thanks for giving me a lead on an unsolvable case [16:14:22] if I re-enable the browser cache, it does not download the images again [16:14:30] np :), happy to help [16:15:34] my load takes 6.89s [16:16:27] what is the slowest thing showing up in the network tab? [16:17:34] https://campwiz.toolforge.org/?_rsc=flnrj [16:17:55] hmm... for me that takes 1.30s [16:18:13] there's a 1.7MB svg too [16:18:19] (not much) [16:18:43] 1.7 MB is huge [16:18:50] it's the background image I think [16:19:12] https://usercontent.irccloud-cdn.com/file/asSwbLUV/image.png [16:19:29] it is the animated loader [16:19:51] the thing that changes color while navigating and clicking bunch of stuffs [16:20:54] it is so much used that it should be cached already [16:21:08] it's cached yes [16:21:14] for me that url opens this [16:21:17] https://usercontent.irccloud-cdn.com/file/Ubcyo7jr/image.png [16:24:19] oooo [16:28:22] how can upload photo here? [16:29:51] the api that loads images on the front page is the public api https://campwiz-backend.toolforge.org/api/v2/campaign/ [16:30:01] I use irccloud.com client, it does it itself [16:30:04] it takes 1.81s here [16:30:33] that url takes 0.5s for me [16:31:03] is it some kind of proximity thing?
[16:31:14] my droplet was located in india [16:31:37] might be, our hosting is in eqiad datacenter only, no CDN [16:32:06] (that's Ashburn in the US) [16:32:27] we also did not have api, but i think being in my neighboring country was the reason behind performance [16:32:42] I turned off my cache [16:32:50] changed group settings, try again (re @nokibsarkar: how can upload photo here?) [16:33:15] cc bd808 (re @jeremy_b: changed group settings, try again) [16:34:03] https://tools-static.wmflabs.org/bridgebot/73eaf9c3/file_70673.jpg [16:34:10] using the VPN through india jumps the loading time of that page to 3s [16:34:14] this is without cache [16:34:21] are your users all in a particular region? [16:34:33] mostly indian (re @jeremy_b: are your users all in a particular region?) [16:35:03] but a few from Africa as well [16:35:24] 42s is a lot yep [16:35:51] I'm 400km from Ashburn now [16:37:47] from ghana the speed is quite better than india too :/ [16:38:36] picking up any twi while you're there? [16:38:37] ivory coast is fast too [16:38:48] oh this is all VPN? [16:38:53] you tricked me! [16:38:56] did u try disabling cache? [16:39:38] jeremy_b: yep sorry :), all jumping with the VPN [16:39:51] I have the cache disabled yep [16:40:34] bangladesh also jumps to over 2s for the api call :/, I think that might be proximity yes [16:41:17] (hopefully a little bit better as my traffic has to go all the way there before going to the datacenter and back, so i'm doing twice the distance) [16:41:56] But, is it possible that toolforge backend with the nginx we are using is adding some delay of 1s? [16:42:13] I don't think so, or I would be seeing that too [16:42:34] (maybe I just have been lucky so far though) [16:42:57] Toolforge is all served from the Ashburn DC with no edge caching anywhere. So speed of light and network speed to Ashburn is going to be baseline latency. [16:43:13] Hi. Is there a phab task for supporting Toolforge Build Service images for ARM64 platform?
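bd808's "speed of light to Ashburn" point can be sanity-checked with rough numbers. The distance and fiber speed below are ballpark assumptions, not measurements:

```shell
# Light in fiber travels at roughly 200,000 km/s (~2/3 c); the great-circle
# distance South Asia <-> Ashburn is on the order of 13,000 km.
awk 'BEGIN {
  km = 13000; v = 200000
  printf "theoretical RTT floor: %.0f ms\n", 2 * km / v * 1000
}'
```

So every single round trip from that region to a US-only deployment costs at least ~130 ms even on a perfect route, and a page that needs several sequential round trips multiplies that; real routes with detours and queueing explain the observed 400+ ms pings.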
[16:43:29] DaxServer: we don't have any ARM servers at the moment [16:43:44] DaxServer: I don't think so, but yep :), you would not have where to run it in toolforge [16:44:20] because, even with caching (the campaign list is loading 2s slow) [16:45:23] DaxServer: The Toolforge build service is currently imagined for running code on Toolforge and not as a generic container creation and management system. That makes ARM image support relatively difficult to prioritize. [16:46:18] nokibarkar: that might be just network latency, that's the bump I see more or less when connecting through the VPN (difference between connecting directly from switzerland vs india) [16:47:16] nokibarkar: can you ping api.svc.toolforge.org? [16:47:23] I get ~400ms [16:48:30] and even if it was supported then how do we test that it's working properly without a place to run the images? (re @wmtelegram_bot: DaxServer: The Toolforge build service is currently imagined for running code on Toolforge and not as a generic containe...) [16:48:48] (pyvenv) tools.backup-bot@tools-bastion-13:~/backups/CampWiz-NXT$ ping api.svc.toolforge.org -n 5 [16:48:49] PING 5 (0.0.0.5) 56(124) bytes of data. [16:48:51] ^C [16:48:52] --- 5 ping statistics --- [16:48:54] 11 packets transmitted, 0 received, 100% packet loss, time 10239ms [16:48:55] bd808 dcaro I have M1 Macbook. And Portable Antiquities Scheme website has their API behind Cloudflare which is rejecting requests from Toolforge and sending a human challenge (which is of course not possible as I'm not emulating a human there). This led to the scenario that a Commons bot I run had to be run from my computer. I could directly run the command from the project from terminal, but it also means I have to be wary of any changes that I make to the codebase, which is currently in active development. If there was an arm64 heroku builder, I could have built the image on my system and run the image, rather than running directly from active project.
Ref: [16:48:56] https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Portable_Antiquities_Scheme [16:49:10] from within toolforge [16:49:24] from home not toolforge (re @nokibsarkar: (pyvenv) tools.backup-bot@tools-bastion-13:~/backups/CampWiz-NXT$ ping api.svc.toolforge.org -n 5 [16:49:24] PING 5 (0.0.0.5) 56(124) byte...) [16:49:58] nokibsarkar: can you paste one of the intermediate lines? where it says the single-request time [16:49:59] 64 bytes from api.svc.toolforge.org (185.15.56.11): icmp_seq=3 ttl=53 time=415 ms [16:50:21] or the last line with the stats [16:50:25] rtt min/avg/max/mdev = 414.092/414.714/415.432/0.551 ms [16:50:49] ping api.svc.toolforge.org -c 5 [16:50:49] PING api.svc.toolforge.org (185.15.56.11) 56(84) bytes of data. [16:50:51] 64 bytes from instance-tools-proxy-9.tools.wmcloud.org (185.15.56.11): icmp_seq=1 ttl=45 time=496 ms [16:50:52] 64 bytes from api.svc.toolforge.org (185.15.56.11): icmp_seq=2 ttl=45 time=575 ms [16:50:54] 64 bytes from instance-tools-proxy-9.tools.wmcloud.org (185.15.56.11): icmp_seq=3 ttl=45 time=457 ms [16:50:55] 64 bytes from toolforge.org (185.15.56.11): icmp_seq=4 ttl=45 time=455 ms [16:50:57] 64 bytes from api.svc.toolforge.org (185.15.56.11): icmp_seq=5 ttl=45 time=458 ms [16:50:58] --- api.svc.toolforge.org ping statistics --- [16:51:00] 5 packets transmitted, 5 received, 0% packet loss, time 4189ms [16:51:01] rtt min/avg/max/mdev = 454.697/488.143/575.166/46.070 ms [16:51:03] from home [16:51:13] that's quite good actually [16:52:21] this is from toolforge: PING api.svc.toolforge.org (172.16.18.101) 56(84) bytes of data. 
[16:52:22] 64 bytes from tools-proxy-9.tools.eqiad1.wikimedia.cloud (172.16.18.101): icmp_seq=1 ttl=63 time=1.49 ms [16:52:24] 64 bytes from tools-proxy-9.tools.eqiad1.wikimedia.cloud (172.16.18.101): icmp_seq=2 ttl=63 time=1.21 ms [16:52:25] 64 bytes from tools-proxy-9.tools.eqiad1.wikimedia.cloud (172.16.18.101): icmp_seq=3 ttl=63 time=0.511 ms [16:52:27] 64 bytes from tools-proxy-9.tools.eqiad1.wikimedia.cloud (172.16.18.101): icmp_seq=4 ttl=63 time=0.629 ms [16:52:28] 64 bytes from tools-proxy-9.tools.eqiad1.wikimedia.cloud (172.16.18.101): icmp_seq=5 ttl=63 time=0.367 ms [16:52:30] --- api.svc.toolforge.org ping statistics --- [16:52:31] 5 packets transmitted, 5 received, 0% packet loss, time 4018ms [16:52:33] rtt min/avg/max/mdev = 0.367/0.840/1.488/0.431 ms [16:52:51] that should be localhost [16:52:52] yep, that's from the same local network [16:52:59] from toolforge isn't interesting. it's from the same building [16:53:00] (it's not localhost though) [16:53:15] ok [16:53:34] nokibsarkr [16:53:37] oops [16:53:51] nokibsarkar: can you try `curl -k -o /dev/null -w '\n* Response time: %{time_total}s\n' https://campwiz-backend.toolforge.org/api/v2/campaign/` from your machine? [16:54:23] % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed [16:54:24] 100 4117 0 4117 0 0 2822 0 --:--:-- 0:00:01 --:--:-- 2823 [16:54:25] * Response time: 1.458476s [16:54:53] that for me takes ~0.4s [16:54:59] * Response time: 0.315861s [16:55:01] The API call takes 1.4s from my network [16:55:13] DaxServer: Can you just use a Dockerfile to build your local container? Would you be interested in help doing that if it sounds like it would work but you need some help figuring out the specifics? [16:56:00] ill join in, campwiz backend, i get 0.40s - 0.43s ish [16:56:11] I have an M3 macbook rather than an M1, but modern Docker Desktop supposedly can run AMD images on either most of the time. 
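The `rtt min/avg/max/mdev` summary line that ping prints can be recomputed from the per-packet times; mdev is essentially the population standard deviation of the RTT samples. Using the five whole-ms values from the home ping above:

```shell
# Recompute ping's summary from its per-packet RTTs (in ms).
echo '496 575 457 455 458' | awk '{
  n = NF; min = $1; max = $1
  for (i = 1; i <= n; i++) {
    sum += $i; sumsq += $i * $i
    if ($i < min) min = $i
    if ($i > max) max = $i
  }
  avg = sum / n
  mdev = sqrt(sumsq / n - avg * avg)   # population standard deviation
  printf "rtt min/avg/max/mdev = %.0f/%.1f/%.0f/%.1f ms\n", min, avg, max, mdev
}'
```

This lands within rounding of the `454.697/488.143/575.166/46.070` line ping itself printed; the large mdev relative to a clean link is another hint of a noisy path.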
[16:56:50] nokibsarkar: it's interesting though that ping is fast, but curl is not [16:57:12] I turned on previous droplet [16:57:35] still redirecting [16:57:45] you could just run your own cdn in a droplet? varnish? [16:58:02] fronting toolforge [16:58:25] that should help [16:58:44] without cache : https://tools-static.wmflabs.org/bridgebot/2985ead0/file_70674.jpg [16:58:52] from previous droplet [16:59:11] can u try this: https://campwiz.wikilovesfolklore.org/campaign/c2c7piesolaf4 [17:00:19] ~3s [17:00:35] (no vpn, from switzerland) [17:00:43] you could even serve different DNS entries depending on geoip and send some people straight to toolforge. if toolforge allowed custom domain. (re @jeremy_b: you could just run your own cdn in a droplet? varnish?) [17:02:08] previous droplet feels like instant (figuratively) [17:02:22] hmm.. the droplet takes ~7s from california [17:02:49] so I think it's most probably jus the latency between US-India [17:02:54] *just [17:03:17] can u load the home page without cache? https://campwiz.wikilovesfolklore.org/ [17:03:18] is it HTTP 2? [17:03:40] bd808 Thanks. I'm using podman. I'll try to spin up an amd arch machine and will try to rebuild the image and see how it works! [17:03:41] can you maybe shift some things to lazy loading? [17:04:31] bd808: DaxServer: you could also try building locally with "pack", see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images#Testing_locally_(optional) [17:04:41] nokibsarkar: from california it took me ~8s [17:04:51] you would need to replace tools-harbor.wmcloud.org/toolforge/heroku-builder:22 with the upstream heroku builder, which also exists for arm [17:04:54] most of the things are lazy loaded, some are server rendered (re @jeremy_b: can you maybe shift some things to lazy loading?) 
[17:05:26] dhinus: unfortunately, that will not build the exact same thing than the build service (we inject some fixes/stuff during build that pack does not) [17:05:43] if everything is client rendered then the initial loading time would've busted [17:06:02] dcaro: yes, depends if the tools requires those extra things or not... you could also run the tools-harbor image with AMD emulation with something like https://lima-vm.io/docs/config/multi-arch/ [17:06:33] DaxServer: https://podman-desktop.io/docs/podman/rosetta would in theory be the config you need to make AMD images work with Podman on an M1 mac. [17:07:26] bd808: yep, that should also work, I'm not sure if that works together with "pack", or if you would need to build a custom image [17:07:35] So, the public API for listing campaigns is running ~500ms : https://tools-static.wmflabs.org/bridgebot/ef673aee/file_70675.jpg [17:07:46] https://campwiz.wikilovesfolklore.org/api/v2/campaign/ [17:09:07] nokibsarkar: that takes 1.6s from california [17:09:27] without cache? [17:09:33] ~0.52 from switzerland [17:09:36] yep [17:10:10] so, what should be the conclusion? Who is the villain? my shitty code? or proximity thing? [17:10:23] bd808: dcaro: what about installing lima-kilo and running the build from there? overkill? :P [17:11:05] I'd say proximity adds the extra ~2s you see from india, then there's the other (not so critical) issue of downloading the original/big images [17:15:19] bd808 Thanks for the link. Looks like I have it enabled - probably correct working would be: it was enabled by default without my interventions [17:15:36] dhinus: maybe a bit yep xd [17:16:17] dhinus lima-kilo seems interesting read on the wikitech. Is someone already using it? [17:17:00] DaxServer: it's used daily by the toolforge developers, for testing toolforge itself [17:17:19] but actually once the image for your tool is built for AMD, and stored in Harbor... you shouldn't need it [17:17:28] yep. 
i am kinda skeptical about the decision now. but first i have to fix the big image thingy (re @wmtelegram_bot: bd808: dcaro: what about installing lima-kilo and running the build from there? overkill? :P) [17:19:02] @nokibsarkar my suggestion of lima-kilo was for DaxServer, not related to campwiz. there are two intertwined conversations, sorry for the confusion. [17:19:30] yeh, i understood, no worries [17:19:44] ghana<->india seems pretty fast too, I think that the main delay is US<->india, unfortunately toolforge is only in the US (for now) [17:20:25] i think one reason might be too much traffic goes through US? [17:20:26] probably if it was hosted in europe/africa, you'd get a better "mean" response time from both US and India [17:22:12] is public wikiusername considered PII? [17:22:44] Okay.. good news is that I figured out what was missing.. "platform: linux/amd64" in compose.yaml and that did the trick. Next to figure out is how to inject podman secrets (pywikibot creds). Thanks y'all :) [17:25:25] DaxServer: nice one! [17:26:39] nokibsarkar: by itself I think it's not [17:27:46] yeah "by itself" I'd say no, but the sensitive information might be that a certain user is using your tool. I'm not a lawyer though :) [17:27:47] After turning on redirection, i was stuck in a loading war with the browser (in toolforge domain), suddenly i noticed cache is disabled and it is taking about 4 minutes and still did not finish [17:28:25] that sounds like a different issue, nothing should take 4min [17:28:38] * dhinus has to go offline [17:29:00] one user asked me, if i store her username and she is from Europe, does it comply with GDPR. (re @wmtelegram_bot: yeah "by itself" I'd say no, but the sensitive information might be that a certain user is using your tool. I'm not a l...) [17:29:41] (different issue as in some kind of network issue, request lost waiting for retry, or something weird like that), what does your network tab say it's waiting for?
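For reference, the compose.yaml fix DaxServer mentions looks roughly like this (the service and image names below are made up):

```yaml
services:
  commons-bot:
    image: example.org/commons-bot:latest
    platform: linux/amd64   # force the amd64 image; runs under emulation on arm64 hosts
```

With `platform: linux/amd64` set, compose pulls and runs the amd64 variant even on an Apple Silicon host, relying on Rosetta/QEMU emulation underneath.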
[17:29:44] i know it's not a serious thing, but just in case, I asked [17:29:59] no, it loaded finally after 5 minutes (re @wmtelegram_bot: (different issue as in some kind of network issue, request lost waiting for retry, or something weird like that), what d...) [17:30:36] the same page that takes about 2s from my droplet (without cache) [17:30:55] if you retry still takes 5min? [17:31:06] yep [17:31:12] would be interesting to see exactly which resource it's stuck on [17:31:24] can you share the page? (so I can try) [17:31:40] without cache (like first time user) [17:32:01] just https://campwiz.toolforge.org/ while logged out then? [17:33:32] I'm getting ~2s, no cache, logged out [17:33:58] sometimes up to 3s, but mostly 2.something [17:34:30] I am stuck at telegram [17:34:43] oh, weird :/ [17:34:54] nokib: so can you share the timing of the network panel? like [17:34:59] https://usercontent.irccloud-cdn.com/file/dSvlHVgC/image.png [17:35:11] that should show which request is the slow one or if it's all of them getting slow [17:35:16] Slow mode is on, u cannot message multiple times, says on telegram [17:35:45] @nokibsarkar: https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use#7.2_If_this_is_a_Toolforge_Project -- there is a carve out for on-wiki usernames collected because OAuth is in use, but yes usernames are considered PII in the general Wikimedia Privacy Policies. [17:36:42] yep, I was wondering about oauth too [17:36:46] thanks bd808 [17:37:23] The general Wikimedia Privacy Policy is very user-centric privacy conscious in ways that can be confusing to folks who are used to how the "normal" internet treats user data. [17:37:41] https://tools-static.wmflabs.org/bridgebot/667dffb7/file_70676.jpg [17:39:15] the same endpoint that used to take ~3s is somehow taking 1.5 minutes? [17:39:46] like, does the packet ask google map for navigation?
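When a single endpoint jumps from ~3 s to minutes like this, curl's per-phase timers help localize where the time goes (DNS vs TCP connect vs TLS vs server wait vs transfer). A small helper along the lines of the `-w` one-liner used earlier in the channel:

```shell
# Print a phase-by-phase timing breakdown for one request.
time_url() {
  curl -s -o /dev/null -w 'dns:         %{time_namelookup}s
connect:     %{time_connect}s
tls:         %{time_appconnect}s
first byte:  %{time_starttransfer}s
total:       %{time_total}s
' "$1"
}

# e.g.: time_url 'https://campwiz-backend.toolforge.org/api/v2/campaign/'
```

A big gap between connect and first byte points at the server or the path to it; a fast first byte but slow total points at transfer bandwidth or packet loss.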
[17:41:36] xd [17:42:07] nokibsarkar: that screenshot does not show the timings though [17:43:30] https://tools-static.wmflabs.org/bridgebot/932aa184/file_70678.jpg [17:44:04] each thing takes more than a second [17:44:40] something is going on there, it took 2s to download an svg file... [17:45:19] from my vpn it still takes under 6s from india [17:45:26] so might be your internet provider [17:45:27] takes 12.52 seconds for 1kb download [17:45:52] is this a joke? [17:46:22] can you try for example https://prometheus.svc.toolforge.org/tools/graph [17:46:22] that is also a static file probably cached by nginx [17:46:50] (that is not hosted in toolforge, but in the same datacenter, even if the domain seems the same) [17:48:18] still loading [17:49:09] https://tools-static.wmflabs.org/bridgebot/9a5a0091/file_70679.jpg [17:49:40] figures look veeeeeery nice. [17:50:23] 1.09 minutes. At this rate I think I would be the highest scorer even if I do not try [17:51:04] xd [18:00:33] nokibsarkar: does horizon.wikimedia.org also load that slow? (or the idp login page it redirects to) [18:02:53] still loading [18:03:21] 2 minutes, still loading [18:03:50] yep, that's in the US too :/, not in the same hardware, but same DC [18:04:10] there's some issue with your ISP and (at least) wikimedia network [18:04:52] (imo, or your local internet connection of some sort) [18:05:08] https://tools-static.wmflabs.org/bridgebot/989288ba/file_70682.jpg [18:08:10] bangla wikipedia loads very fast: : https://tools-static.wmflabs.org/bridgebot/fd423160/file_70683.jpg [18:08:28] and other things, like commons [18:08:42] without cache [18:09:58] must be something with `*.toolforge.org` [18:10:27] horizon.wikimedia.org is not under toolforge [18:10:30] https://wikitech-static.wikimedia.org/wiki/Reporting_a_connectivity_issue [18:10:56] thanks taavi!
that has a lot of debugging stuff yep [18:13:23] https://tools-static.wmflabs.org/bridgebot/5f2ce2d9/file_70684.jpg [18:15:55] !log tools restart tools-static nginx due to nfs hiccup [18:15:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [18:17:46] This is from france using vpn: : https://tools-static.wmflabs.org/bridgebot/27dcab02/file_70685.jpg [18:19:10] nokibsarkar: I'm right close to france :) (10km), and I get <3s [18:24:45] u guys really should consider a data center not within US. When I wanted to move to toolforge 2 years ago, everyone from my team was very skeptical. I anyway tried to move there and the performance was disastrous even with more RAM things. So, I retreated (in simple words ran away and begged WLF for a server). Then, through this 2 years of experience everyone in this region (India, Bangladesh, Pakistan, Sri Lanka, Nepal) said they prefer not to use any toolforge hosted tool if other options are available. [18:26:14] But on the other side of the world, people seemed satisfied (at least not annoyed by toolforge performance). [18:27:06] yep, we are currently only in one datacenter, we have been thinking on expanding to a different one, so that's definitely in our mind yes [18:27:49] can you open a task with some details about this? (the latency/network issues and such), that will help us in the future reference it when planning expansions and such [18:29:29] For wiki tools, our first choice should've been toolforge. But, we resort to self-host or other options at first because of poor performance. People do not understand that this is a proximity issue. The users are mostly non-technical, they only understand toolforge tools are very bad to work with. But we have to use it anyway, because, meh, what can we do? They are the bosses. This is a literal quote from one of my friends who first introduced me to WLF.
[18:30:31] that's really good feedback, and we should surface the proximity issue yes [18:31:41] Sorry for getting so frustrated. Since the first trial of the tool on toolforge, I was blaming my shitty code or poor choice of technology. [18:32:39] as a reminder: the page I linked above has the exact instructions on how and where to report this so that things can be improved [18:33:17] +1 for that yep, I'm trying to get some data from vpn but I don't see that issue [18:35:43] from the eqsin datacenter (singapore) things are fast too :/ [18:36:14] so yep, nokibsarkar, would be really useful if you write that task with the tests suggested there, that might allow us to pin-point the ISP/areas that are affected [18:39:44] gtg. might be around later, nokibsarkar thanks a lot for your patience and your efforts! [20:19:33] which is why IMHO it would make perfect sense to install `mysqldump` directly on the bastions so we can stream the dumps right over SSH… (re @wmtelegram_bot: nokibsarkar: yep, you should not rely on toolforge nfs for backups, currently we don't have a backup solution, so you wo...) [20:19:54] (sorry to keep banging on this drum for a bit. probably I’ll soon enough forget about it again and just ignore that my toolsdb backups have been broken for a year now :|) [20:20:46] @lucaswerkmeister: have you tried streaming through `webservice shell -- mysqldump ...` sort of command chaining? [20:21:16] can webservice shell be used non-interactively? o_O [20:21:22] (and is there a type that includes mysqldump?) [20:21:36] I tried to hack together a direct kubectl command and eventually gave up [20:22:50] (if `toolforge jobs run` had something like `mwscript-k8s`’ `--follow` that would probably work pretty well) [20:24:08] yes and yes, at least in theory.
We made a container just for mysql stuff like this and webservice at least used to be able to pass the inner command via the invocation [20:26:41] @lucaswerkmeister: if `webservice shell -- ...` doesn't work, then https://wikitech.wikimedia.org/wiki/User:BryanDavis/Kubernetes#Launch_an_interactive_shell_in_the_cluster should [20:28:13] https://docker-registry.toolforge.org/#!/taglist/toolforge-mariadb-sssd-base is the container to use [20:28:24] (with apologies to the IRC side) [20:28:26] ``` [20:28:27] tools.quickcategories@tools-bastion-13:~$ webservice foo shell [20:28:28] This --mount option is only supported for 'buildservice' type [20:28:30] Review the arguments passed and try again``` [20:28:50] could probably use an improvement in the case where the ā€œargumentsā€ come from the file instead of argv ;) [20:29:49] bd808: I know there’s a container, but I’m not sure there’s a webservice type corresponding to it? `webservice --mount=all foo shell` shows a list and none of them look like mariadb [20:29:58] maybe I’ll just use your kubectl command then [20:31:11] ah, that may be the case.
It should be fixable, but we may have kept punting in the hope that `toolforge shell` would become a thing [21:17:12] `ssh toolforge become quickcategories kubectl run interactive --image=docker-registry.tools.wmflabs.org/toolforge-mariadb-sssd-base:latest --restart=Never --command=true --labels=toolforge=tool --rm=true --attach=true -- mysqldump --defaults-file=~/replica.my.cnf --host=tools.db.svc.wikimedia.cloud --single-transaction 's53976__quickcategories'` [21:17:19] gives me *something* but not a proper SQL dump [21:17:22] it seems to start in the middle [21:17:32] might need to tweak some flags like TTY/no-TTY some more [21:18:12] (it also ends with ā€œpod "interactive" deletedā€ which I don’t really want in the stdout / .sql file ^^) [21:19:11] but it looks like the tool also has unrelated errors I should look into first [21:19:16] KeyError: 'getpwuid(): uid not found: 53976' [21:19:20] OSError: No username set in the environment [21:19:36] (`kubectl logs background-runner-6d4b45f48-52thd`) [21:19:41] (`quickcategories` tool) [21:19:54] SSSD acting up again, maybe? [21:20:16] is that the component responsible for turning UID numbers into user names? (I think that’s what the Python code is trying to do) [21:20:57] sssd mounting into the container would then provide the LDAP backed user lookups, yes [21:21:49] 53976 is tools.quickcategories [21:22:25] technically "uid" is never a number. 
"uidNumber" is, while "uid" is a string [21:22:33] on the LDAP side [21:23:11] "SSSD acting up" does sound like a good guess [21:24:17] `id 53976` just worked in the shell I launched, but that would be exec node specific if sssd is being flaky [21:25:18] @lucaswerkmeister: I'm not sure about your command using `--attach=true` vs `--stdin=true --tty=true` in my template [21:27:27] I thought attach made more sense because I don’t need a real TTY, but I’ll try that too [21:27:43] and yeah I tried to kubectl exec in one of the broken pods but it didn’t work (presumably they’re long dead) [21:27:53] I’m trying to deploy a new version now, maybe that’ll help [21:27:57] yeah, I'm not sure it is wrong, it is just not a thing I've used consciously [21:28:21] !log lucaswerkmeister@tools-bastion-13 tools.quickcategories deployed 1d0fb31941 (upgrade dependencies) [21:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL [21:28:54] After `become bd808-test` this actually seemed to work `kubectl run interactive --image=docker-registry.tools.wmflabs.org/toolforge-mariadb-sssd-base:latest --restart=Never --command=true --labels=toolforge=tool --rm=true --stdin=true --tty=true -- mysqldump --defaults-file=~/replica.my.cnf --host=tools.db.svc.wikimedia.cloud --single-transaction 's53976__quickcategories'` [21:29:25] hm, new version still seems to be having the same OSError issues [21:29:27] It does end with a spurious `pod "interactive" deleted` line from the session ending [21:31:00] looks like `kubectl exec -it` doesn’t work with a pod that’s in CrashLoopBackOff [21:31:16] no, it wouldn't [21:31:34] btw this image was built with `--use-latest-versions` – I wonder if that’s related šŸ¤” [21:31:44] crashloopbackoff means the pod is down and the scheduler is not going to try restarting for a while [21:32:00] I think it’s possible that it was actually broken ever since I used that, and I just didn’t notice / pay attention [21:32:25] 
the running `quickcategories` pod (powering the still-working webservice) is 45d old [21:32:39] /me describes the pod [21:33:08] yup, `kubectl exec -it quickcategories-9dc6df5d9-qrmct -- python3 --version` (the working pod) prints Python 3.10.12 [21:33:35] so I think my reported success at T381923 was premature, dangit :S [21:36:31] I think .python-version worked for me somewhere recently... [21:36:48] * bd808 goes to look at gitlab history [21:37:02] task reopened [21:37:25] yeah, I got it to work in https://gitlab.wikimedia.org/toolforge-repos/containers-bnc/-/commit/a937384ad626eb5902910906e37749348caa3757 [21:37:34] interesting [21:39:19] bd808: maybe your code doesn’t call getpwuid()? [21:39:33] according to the stack trace, the failing call in my code is buried in pymysql [21:40:24] so... yeah. buildservice containers don't have SSSD mounted ever I don't think. And if they do they don't have the /etc config files to make it all work. [21:41:22] I'm not sure that $USER ends up set either [21:41:28] (well, ā€œburiedā€ is an overstatement. `pymysql/connections.py` just tries to get the user name at initialization time for some reason 🤷) [21:41:49] but previously it worked as a build service… [21:41:53] probably so it can use ~/.my.cnf by default [21:41:56] /me looks at connections.py in the old container [21:42:54] aha, in Python 3.10 `getpass.getuser()` raises a `KeyError: 'getpwuid(): uid not found: 53976'` [21:42:58] which pymysql catches [21:43:10] in Python 3.13 that becomes an `OSError: No username set in the environment` [21:43:16] which pymysql doesn’t catch [21:43:27] jesus [21:44:27] ā€œChanged in version 3.13: Previously, various exceptions beyond just OSError were raised.ā€ https://docs.python.org/3/library/getpass.html#getpass.getuser [21:44:29] -.- [21:45:36] ok, so I need https://github.com/PyMySQL/PyMySQL/commit/a1ac8239c8 [21:45:42] which… isn’t in a published release yet?
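[editor's note] The getpass behavior change discussed above can be summarized in a small sketch. This is a hypothetical helper (the name `default_db_user` is ours, not pymysql's), showing the version-agnostic handling that pymysql's fix commit adds: Python ≤ 3.12 let the underlying `KeyError` from the getpwuid() lookup escape, while Python 3.13 normalizes the failure to `OSError`, so a robust caller catches both.

```python
import getpass


def default_db_user(fallback="anonymous"):
    """Best-effort local user name lookup, tolerant of missing passwd entries.

    In a container whose numeric UID has no passwd entry (e.g. SSSD not
    mounted), getpass.getuser() fails: Python <= 3.12 raised KeyError
    from getpwuid(), Python 3.13 raises OSError instead. Catching both
    keeps the code working on either interpreter. (Hypothetical helper
    for illustration; pymysql's own fix lives in connections.py.)
    """
    try:
        return getpass.getuser()
    except (KeyError, OSError):
        return fallback
```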
[21:46:10] yeah last release was a year ago [21:52:35] !log lucaswerkmeister@tools-bastion-13 tools.quickcategories deployed 74cd3dee83 (install PyMySQL from Git for Python 3.13 compatibility; CC T381923) [21:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.quickcategories/SAL [21:56:40] fixed (see https://phabricator.wikimedia.org/T381923#10850835 for details) [22:00:26] /me tries `--stdin=true --tty=true` for mysqldump now [22:03:39] so far I’m always getting various forms of truncated SQL [22:04:07] and also sometimes ā€œIf you don't see a command prompt, try pressing enter.ā€ at the beginning of the file [22:48:51] (left a comment with that information on T378882 so it’s not lost)
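[editor's note] The stray lines mentioned above ("If you don't see a command prompt, try pressing enter." at the top, `pod "interactive" deleted` at the bottom) are kubectl attach/TTY chatter mixed into the captured stream. Avoiding `--tty` (or redirecting inside the container) prevents it at the source; as a fallback, a small post-processing sketch (hypothetical helper, assumed noise strings taken from the log above) can filter a captured dump:

```python
def strip_kubectl_chatter(dump: str) -> str:
    """Remove kubectl attach/TTY noise from a captured mysqldump stream.

    When a dump is captured via `kubectl run --rm --attach`, kubectl may
    prepend its "If you don't see a command prompt ..." hint and append a
    'pod "interactive" deleted' line to stdout. This hypothetical helper
    drops any line starting with those known noise strings and keeps the
    rest of the SQL intact.
    """
    noise_prefixes = (
        "If you don't see a command prompt",
        'pod "interactive" deleted',
    )
    kept = [
        line
        for line in dump.splitlines()
        if not line.startswith(noise_prefixes)
    ]
    return "\n".join(kept) + "\n"
```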