[08:28:28] GergesShamon: I'd say so, though I'm not sure, you might want to ask in the #wikimedia-gitlab irc channel
[09:35:11] need help, no space on device for https://hub-paws.wmcloud.org/user/Herzi%20Bot%20Pinki/lab
[09:35:12] (I remember there are some hidden backup files I do not see and cannot delete)
[09:41:39] Christian: looking, this affects all of paws
[09:41:48] I freed some space
[09:42:10] log paws deleted the collected files-to-remove due to 100% capacity on nfs
[09:42:13] !log paws deleted the collected files-to-remove due to 100% capacity on nfs
[09:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[09:49:54] @dcaro thanks, seems to work now
[10:09:36] !status toolsdb replication lag
[11:45:55] !log metricsinfra added a new global alert when nfs space is >90%
[11:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/SAL
[11:46:38] global? what condition does that use?
[11:49:11] `1 - (node_filesystem_avail_bytes{mountpoint=~"/srv/.*"} / node_filesystem_size_bytes{mountpoint=~"/srv/.*", instance=~".*-nfs-.*"}) > 0.9`
[11:51:05] it might not be 100% reliable (not sure if you can create nfs instances with a different name, or mount the drive in a different path), but it works for the default cases; that would allow us to notice paws, for example
[14:23:19] !log terraform publish v0.3.1 of the provider with the fix for T398117
[14:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Terraform/SAL
[14:23:23] T398117: Creation of Hiera Puppet Prefix via OpenTofu fails - https://phabricator.wikimedia.org/T398117
[15:37:27] Dumps aren't coming any longer
[15:37:28] https://dumps.wikimedia.org/other/pageviews/2025/2025-06/
[16:16:59] @Yetkin: I know there is active work on the Dumps generation pipelines, but I'm not seeing any specific messages about pageview dumps in my email history.
You will probably have better luck with a Phabricator task and/or an email to the xmldatadumps-l@lists.wikimedia.org mailing list.
[16:43:37] bd808: Thanks, I have sent an email
[16:46:14] ^ There are already two tasks
[16:46:35] with Priority = Unbreak now
[16:47:47] where?
[16:48:43] I assume T398150 and T398187 based on a UBN! search
[16:48:43] T398150: Pageviews / Mostread Data not available since June 28, 2025 - https://phabricator.wikimedia.org/T398150
[16:48:43] T398187: eventgate-analytics has stopped producing events since 2025-06-25 - https://phabricator.wikimedia.org/T398187
[16:50:13] neither of these directly talks about dumps, but they both are about pageviews data having issues
[16:51:04] well.. you might have to squint to connect eventgate-analytics with pageviews really
[16:54:14] What does this mean? (re @wmtelegram_bot: well.. you might have to squint to connect eventgate-analytics with pageviews really)
[16:56:41] I mean that https://phabricator.wikimedia.org/T398187 is related to pageviews in that the Data Engineering folks are involved there and are also the owners of pageviews. Eventgate itself carries data from MediaWiki internals, which I do not believe are directly connected to counting pages served at the CDN edge.
[17:18:56] Should the new `toolforge components config generate` include the webservice?
[17:20:37] Like, I created a `fake-nginx` custom-built image and am using it as a webservice. But today I got the news of the push-to-deploy beta and am very excited to try it out. But upon running `toolforge components config generate`, it does not show my webservice (although that was custom built).
[17:21:30] custom built = custom built image
[17:21:35] per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_your_tool#Current_supported_and_non-supported_features web services are not yet supported by this system
[17:22:16] So, even if the webservice is a custom-built image?
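As an aside on the metricsinfra change from earlier in the day: the NFS-capacity expression quoted at 11:49 could be wired into a standard Prometheus alerting rule. A minimal sketch, where the group name, alert name, `for` duration, and labels are illustrative assumptions, not metricsinfra's actual configuration:

```yaml
# Hypothetical alerting-rule file; only the expr is taken from the log above.
groups:
  - name: nfs-capacity
    rules:
      - alert: NfsAlmostFull
        # Fires when an NFS server's /srv filesystem is more than 90% used.
        expr: 1 - (node_filesystem_avail_bytes{mountpoint=~"/srv/.*"} / node_filesystem_size_bytes{mountpoint=~"/srv/.*", instance=~".*-nfs-.*"}) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} is over 90% full"
```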
[17:22:51] web services are not yet supported by this system
[17:23:05] Also, does it auto-trigger a restart, or do I have to manually `toolforge jobs restart`?
[17:32:34] Question: How hard would it be for y'all to support building from an nfs directory or a directory of a GitHub repo?
[17:34:47] Should I consider putting in a phab ticket? (for context, I have three services in three directories for a golang project and I want to be able to build and deploy it through build services; I am currently just building things locally and then scping the binaries over)
[17:35:01] i think subpaths is a reasonable feature request
[17:38:01] if https://gitlab.wikimedia.org/repos/cloud/toolforge/builds-builder/-/blob/main/deployment/chart/templates/pipeline.yaml.gotmpl?ref_type=heads#L45 does what it says then it should not be too much effort
[17:39:19] but having an nfs -> build service integration does not sound like something we would want to support
[18:21:32] bd808? do you remember a few days ago: "sockets disabled, connection limit reached"? I do not really understand the problem. The limit is now 500. When I do a "show processlist" on toolsdb and on the web cluster I see just 3, seldom 4, open connections to the database. Where do these ~500 connections come from?
[18:22:40] See /data/project/persondata/error.log
[19:11:33] I thought HTTP connections, not DB? (re @wmtelegram_bot: bd808? do you remember a few days ago: "sockets disabled, connection limit reached" I do not really understand the probl...)
[19:12:57] Yes, but since almost all of my tools use either tools-db or deWP-db, I expect that they should show up in parallel
[19:29:33] nokibsarkar for the restart, it will only happen automatically if the code changed or the config changed, otherwise it will not restart.
You can force it to always rerun the job with --force-run (details in the docs)
[20:30:11] !log bd808@tools-bastion-12 tools.whois-dev Hard stop/start cycle of webservice because of all requests timing out
[20:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.whois-dev/SAL
[20:30:56] ugh. I think that whois-dev has been discovered by crawlers.
[21:02:46] Hey there, does anyone know why a toolforge url may return a 404 not found error even when the service is running?
[21:03:31] Here is the output of the console in case that's helpful:
[21:03:32] tools.consultation-stats@tools-bastion-12:~$ toolforge webservice buildservice start --mount=none
[21:03:32] Your job is already running
[21:03:33] tools.consultation-stats@tools-bastion-12:~$ toolforge build show
[21:03:33] Build ID: consultation-stats-buildpacks-pipelinerun-qh9zf
[21:03:34] Start Time: 2025-06-30T20:57:58Z
[21:03:34] End Time: 2025-06-30T20:58:41Z
[21:03:35] Status: ok
[21:03:35] Message: Tasks Completed: 1 (Failed: 0, Cancelled 0), Skipped: 0
[21:03:36] Parameters:
[21:03:36]     Source URL: https://gitlab.wikimedia.org/toolforge-repos/consultation-stats.git
[21:03:37]     Ref: N/A
[21:03:37]     Envvars: N/A
[21:03:38] Use latest versions: False
[21:03:38] Destination Image: tools-harbor.wmcloud.org/tool-consultation-stats/tool-consultation-stats:latest
[21:03:39] tools.consultation-stats@tools-bastion-12:~$
[21:07:29] 404 means "not found"; when the service is not running, you get 502 (or 503)
[21:10:36] wurgl that makes sense, but I've come across some 502s before; it's the first time I see a 404 when going to https://consultation-stats.toolforge.org/ .
On the admin (toolsadmin.wikimedia.org) the tool exists and is active (not disabled), so not sure why I am getting a 404
[21:12:38] africanhope: your webservice is failing to start
[21:12:48] in particular the system is somehow very confused about which type of service to run
[21:12:59] so for example:
[21:12:59] > tools.consultation-stats@tools-bastion-12:~$ webservice status
[21:12:59] > Your webservice of type php8.2 is running on backend kubernetes
[21:13:13] but what it's trying to currently run is node 10
[21:13:21] and the command you listed tries to use buildservice
[21:13:34] i suggest you completely stop your webservice with `webservice stop` and then start it again
[21:14:24] Thanks taavi, that's some helpful context. I'll try that. Should I also edit the service.manifest to err on the safe side?
[21:15:09] africanhope: no. service.manifest is the system state that the `webservice` tool documents. editing it will not change anything
[21:15:25] as the comment in that file says, it should not be edited manually
[21:15:44] if you have a service.template that would be a file that you could edit to change the defaults given to `webservice`
[21:16:08] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Webservice_templates
[21:17:01] bd808 that fully makes sense. That service.template would be overriding the default service.manifest if I choose to create it
[21:18:09] for now stopping the service and starting it fixed the issue. Not sure why webservice restart didn't solve it when I did it the first time.
[21:18:20] The issue is resolved. Thanks a lot for your time bd808 taavi wurgl
[21:19:23] `webservice restart` generally takes a shortcut and just kills the running Pod and lets it get recreated. This can be problematic when something unexpected has happened to the ReplicaSet that manages that Pod.
[21:21:10] oh I see, that explains why
[22:45:05] Hello, everyone. Toolforge uses docker, which doesn't touch the original files.
Every time the code gets updated, we have to log in again to update it. Is it possible for me to run a program periodically to update the code?
[22:46:25] Anyway, it's no longer possible to `crontab -e`.
[22:47:18] kanashimi: can you talk a bit more about what you are hoping to accomplish? I'm worried I may not understand the intent behind the question.
[22:48:01] I do have some tools that still have automated git pulls. They are kind of legacy at this point, but still possible generally.
[22:49:46] The "push-to-deploy" project is maybe the longer term solution. There was a big announcement about that earlier today -- https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/5D7NK7Z7KMWQPWQC23453YB7FV555Q5R/
[22:50:59] I have five bot accounts with programs hosted on GitHub. Previously I used cron to update the programs. But now I have to log into all my accounts one by one to update them, which is a bit tedious, so I was wondering if there is a simpler way to eliminate this manual step.
[22:52:04] If these bots are still running from the "shared" containers and just need a git clone on NFS updated, that is still possible.
[22:53:30] https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/blob/main/etc/jobs.yaml is a job spec that runs a script that does various file system updates for a tool that runs in the "php8.2" container
[22:55:04] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_jobs#Creating_scheduled_jobs_(cron_jobs) is the general replacement for the prior `crontab -e` system for Grid Engine.
[22:58:12] An early beta solution that was announced today is https://wikitech.wikimedia.org/wiki/Help:Toolforge/Deploy_your_tool. Today there are a lot of restrictions that might keep you from using it yet, but this will be the generally recommended solution for "how do I deploy an updated tool" in the future.
[23:08:18] My updates are more complicated, so I use sh.
Does Toolforge/Deploy your tool support sh scripts for updates?
[23:09:17] Not at this time, no.
[23:09:34] https://gitlab.wikimedia.org/repos/cloud/toolforge/../../toolforge-repos/mwdemo/-/blob/main/etc/jobs.yaml#L10
[23:09:36] > command: ./bin/update.sh
[23:09:42] sounds like yes to me
[23:10:37] or maybe this is the page title, not 2 different things separated by a slash (re @wmtelegram_bot: My updates are more complicated, so I use sh. Does Toolforge/Deploy your tool support sh scripts for updates?)
[23:10:51] two different things. The beta deployment system only works for continuous jobs with build service containers today. But a timer job can do other things.
[23:18:31] Does anybody remember if there is an existing pywikibot script to delete empty User pages? The use case would be cleaning up blank User namespace pages on Wikitech so that Extension:GlobalUserPage pages from metawiki would show up.
[23:19:31] are there many? seems like an odd thing to have (re @wmtelegram_bot: Does anybody remember if there is an existing pywikibot script to delete empty User pages? Use case would cleaning up bl...)
[23:20:13] leftover from openstackmanager?
[23:20:33] @jeremy_b: I don't know honestly how many there might be. I just ran across one and handled it manually, which made me think about a cleanup.
[23:22:28] The one I found organically looked like a vandal made an edit to a redlink and the "fix" was blanking rather than deleting the page.
[23:22:30] I have actually set up an update program. https://github.com/kanasimi/wikibot/blob/master/wikitech/toolforge-jobs-anchor-corrector.yml#L7
[23:22:50] I use image: node18 when I run the bot program, but when I update it I use image: bookworm .
[23:22:57] Please let me clarify one thing. Every time I use image: node18, will I create a new container, or will I use the container I used before? Does each bot have its own container?
[23:24:41] kanashimi: `node18` reuses an existing image, but as a new container, if that makes sense.
The container should be mounting your tool's $HOME from NFS, which is persistent.
[23:26:17] I guess now wikitech should be on a replica since the move
[23:26:27] PHP based tools get the neat side effect that PHP reads everything fresh from disk for each request, so updating them can be less complicated.
[23:27:14] bd808 I'm thinking that if we'll reuse the container, then maybe I can just change the image: tf-bullseye-std to image: node18 in the update code. I wonder if I'm wrong?
[23:32:00] kanashimi: the node18 container has a newer base layer than the tf-bullseye-std container, but yes, that would likely work. The main difference between the various pre-built images is the software they have installed. node18 starts from our bookworm base and then adds nodejs and a few more bits.
[23:33:06] You can poke around and see what is in each shared image in the https://gerrit.wikimedia.org/r/plugins/gitiles/operations/docker-images/toollabs-images/+/refs/heads/master repo.
[23:39:47] bd808
[23:39:48] 114 pages
[23:39:49] https://quarry.wmcloud.org/query/95085
[23:41:08] bd808 No, it doesn't work. Every time I run toolforge-jobs load or toolforge-jobs run it uses the existing code. So the question comes back to how do I update my toolforge code. I wonder if we have any other automated means than logging in and running the program manually?
[23:41:41] oops sorry that should have been all one line facepalm
[23:41:49] trying to sort out removing subpages
[23:42:21] 64 hits without subpages
[23:42:44] kanashimi: your scheduled job would need to do something to update the tool's $HOME content, like `git pull` or similar.
[23:42:53] I think that's as far as I can go without a laptop here
[23:43:44] bd808 Yes, exactly.
[23:43:56] kanashimi: The job from https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/blob/main/etc/jobs.yaml runs a shell script from the tool's $HOME that does a git pull.
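The scheduled-job pattern described above (a timer job whose command refreshes the tool's checkout on NFS) would look roughly like this in a tool's jobs.yaml; the job name, schedule, and script path here are illustrative assumptions, not copied from mwdemo:

```yaml
# Hypothetical jobs.yaml entry for a daily code update.
- name: update-code
  command: ./bin/update.sh   # a script in the tool's $HOME that runs `git pull` etc.
  image: bookworm            # a shared image with git available
  schedule: "17 3 * * *"     # cron syntax: once a day at 03:17
```

Loading this with `toolforge jobs load jobs.yaml` would then keep the NFS checkout fresh without anyone logging in.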
[23:44:32] specifically it runs https://gitlab.wikimedia.org/toolforge-repos/mwdemo/-/blob/main/bin/update.sh
[23:45:44] would need some manual review? e.g. Aude was manually self-blanked. (re @wmtelegram_bot: Does anybody remember if there is an existing pywikibot script to delete empty User pages? Use case would cleaning up bl...)
[23:49:14] @jeremy_b: Yeah, I suppose. That one looks like Aude self-blanked after having left their user page as a random line copied from IRC for 6 years. :)
[23:49:44] * bd808 deletes
[23:51:50] I'll maybe poke at the list from https://quarry.wmcloud.org/query/95087 when I'm bored
[23:55:40] bd808 I found the reason. It seems to be because a necessary utility program was not found.
[23:55:40] . /wikibot/init.sh: 44: /usr/bin/wget: not found
[23:55:41] . /wikibot/init.sh: 44: /usr/bin/unzip: not found
[23:55:41] Are there any containers that contain these tools?
[23:58:51] Nothing seems to have `wget`, but everything has `curl`. Apparently we have `unzip` in bookworm-based images. That should include the "node18" image.
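Since the images ship `curl` but not `wget`, the failing calls in init.sh can usually be translated mechanically. A minimal sketch of the mapping; the helper name and the URL/filenames are hypothetical (the real init.sh is not shown in the log), and the function only prints the command it would run:

```shell
# Hypothetical wget -> curl translation helper; prints instead of downloading.
fetch() {
    url="$1"
    out="$2"
    if [ -n "$out" ]; then
        # wget -O "$out" "$url"  ->  save under a chosen name
        echo curl -fL -o "$out" "$url"
    else
        # wget "$url"  ->  -O keeps the remote filename
        echo curl -fLO "$url"
    fi
}

fetch https://example.org/archive.zip archive.zip
# prints: curl -fL -o archive.zip https://example.org/archive.zip
```

The `-f` flag makes curl fail on HTTP errors (like wget does by default) and `-L` follows redirects.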