[03:31:26] Does someone here also run icinga-wm? Its SASL configuration is wrong and it seems to be repeatedly failing to authenticate.
[03:32:45] well, something running on *.wikimediacloud.org, not necessarily the actual bot
[05:16:24] godog: See glguy's message about icinga-wm ^.
[06:40:18] JJMC89: thank you
[06:40:55] glguy: that was likely one of the testing environments, did it stop now?
[07:39:33] Hello all: We're developing a tool here at WMDE that is deployed to Toolforge, and so I have a couple of questions I couldn't necessarily find answers to in the documentation:
[07:40:52] 1) Is it possible to use PHP 7.3 as the tool? Currently we use the default lighttpd webservice (PHP 7.3), but the tool account still has PHP 7.2 installed. Is there a way for me to run PHP scripts under 7.3 without shelling into the webservice?
[07:41:42] 2) What would be the storage limit for each tool account? Our tool involves file uploads, and so we would like to prepare for that accordingly.
[07:43:55] 3) Does anyone have any experience in `rsync`ing files from their home folder into the tool account? We want to switch away from `rm`ing the whole repo and recopying it on each deploy, but are running into permission issues.
[07:47:37] itamarWMDE: for 1), we're in the process of migrating the bastion hosts and the job grid to a newer operating system, which will upgrade the PHP version from 7.2 to 7.3 along the way. That might still take some time; for now, if you need PHP 7.3 you can try using login-buster.toolforge.org / dev-buster.toolforge.org instead of login. / dev. respectively. However, the bastion and Kubernetes webservice containers are fairly different environments, so if you experience weird issues you might want to try just running things in a container with `webservice shell`.
[07:49:40] for 2), there are no hard limits at the moment, but we'll get in touch with you to find better solutions if your disk usage starts causing issues. The shared NFS is unfortunately not very suitable for storing lots of larger files, but if you do use it, consider the scratch volume (https://wikitech.wikimedia.org/wiki/Help:Shared_storage#/data/scratch) for any temporary storage.
[07:54:11] not at all sure about rsync, I imagine it does not work that well with the concept of a tool account, but you could consider pulling the code from a git repo or similar as the tool account instead of pushing it with rsync / etc.
[07:54:26] majavah: Thank you for the thorough answers! I didn't know about login-buster, I'll have a look, and if not we'll just keep `webservice shell`ing it for the time being :D. Also thanks for pointing me towards the scratch volume, this sounds like something we could make use of.
[07:56:13] yeah, we haven't widely advertised the buster hosts as the job grid isn't ready for Debian Buster yet, but if you work mainly with Kubernetes they should work just fine
[07:57:30] As for rsync, we currently employ a CI workflow where we automatically scp into (unfortunately) my home directory and then copy the files over to the tool account. We found it to be more in line with our way of working than using cron to pull directly into the tool account. I'll let you know if we find a workaround to these permission differences,
[07:57:31] though.
[07:57:43] As for rsync, we currently employ a CI workflow where we automatically scp into (unfortunately) my home directory and then copy the files over to the tool account. We found it to be more in line with our way of working than using cron to pull directly into the tool account. I'll let you know if we find a workaround to these permission differences,
[07:57:44] though.
[07:58:09] (Apologies, keyboard mishap :\)
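(As a reference for the pull-based deploy suggested above, here is a minimal sketch. It assumes a hypothetical tool named "mytool" whose code lives in a git checkout owned by the tool account; the tool name, checkout path, and branch are placeholders, not the actual WMDE setup.)

```bash
# A rough sketch, not the exact workflow discussed above: deploy by pulling
# as the tool account instead of pushing with rsync/scp from a personal home.
ssh login.toolforge.org     # or login-buster.toolforge.org
become mytool               # switch from your developer account to the tool account
cd ~/mytool-src             # git checkout owned by the tool account (placeholder path)
git pull --ff-only          # fetch the new code as the tool user
webservice restart          # reload the webservice if the tool runs one
```

Since every file is then created by the tool account itself, the permission mismatches that come from copying out of a personal home directory do not arise.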
[08:01:25] !log tools.paws Manually ran puppet, and it started systemd-timesyncd but did not fail (T287068)
[08:01:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.paws/SAL
[08:01:29] T287068: Puppet agent failure detected on instance paws-k8s-control-3 in project paws - https://phabricator.wikimedia.org/T287068
[10:09:31] !log toolsbeta livehacking puppetmaster with https://gerrit.wikimedia.org/r/c/operations/puppet/+/705848
[10:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:47:14] !log toolsbeta enabling TTLAfterFinished feature gate on kubeadm live configmap (T286108)
[10:47:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[10:47:17] T286108: toolforge-jobs: Clean up old individual job objects - https://phabricator.wikimedia.org/T286108
[10:51:13] !log toolsbeta enabling TTLAfterFinished feature gate on static pod manifests on /etc/kubernetes/manifests/kube-{apiserver,controller-manager}.yaml in all 3 control nodes (T286108)
[10:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:01:45] !log tools enabling TTLAfterFinished feature gate on static pod manifests on /etc/kubernetes/manifests/kube-{apiserver,controller-manager}.yaml in all 3 control nodes (T286108)
[11:01:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:01:50] T286108: toolforge-jobs: Clean up old individual job objects - https://phabricator.wikimedia.org/T286108
[11:04:57] !log tools enabling TTLAfterFinished feature gate on kubeadm live configmap (T286108)
[11:05:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
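(For context on the TTLAfterFinished log entries above: on a kubeadm-managed cluster a feature gate is typically enabled in two places, the kubeadm configuration and the static pod manifests. The sketch below shows the generic upstream mechanics, not the exact commands the admins ran here.)

```bash
# Illustrative only; flag and file paths are the standard kubeadm/Kubernetes
# ones, matching the manifests mentioned in the log entries above.

# 1) Record the gate in kubeadm's live ClusterConfiguration so later upgrades keep it:
kubectl -n kube-system edit configmap kubeadm-config
#    under apiServer.extraArgs (and controllerManager.extraArgs) add:
#      feature-gates: "TTLAfterFinished=true"

# 2) On each control node, add the flag to the static pod manifests; kubelet
#    restarts the pods automatically when the files change:
sudoedit /etc/kubernetes/manifests/kube-apiserver.yaml
sudoedit /etc/kubernetes/manifests/kube-controller-manager.yaml
#    in the container command list, add:
#      - --feature-gates=TTLAfterFinished=true

# 3) Verify the flag is present on the restarted control plane pods:
kubectl -n kube-system describe pods -l component=kube-apiserver | grep -- --feature-gates
```

With the gate enabled, the control plane honours a Job's ttlSecondsAfterFinished field and garbage-collects finished jobs, which is presumably what the toolforge-jobs cleanup work in T286108 relies on.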
[11:58:38] !log toolsbeta deploying jobs-framework-api 07346d715d17585db9c16dd152cc91ef0bea33c3 (T286108)
[11:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[11:58:42] T286108: toolforge-jobs: Clean up old individual job objects - https://phabricator.wikimedia.org/T286108
[11:59:12] !log tools deploying jobs-framework-api 07346d715d17585db9c16dd152cc91ef0bea33c3 (T286108)
[11:59:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:35:18] !log tools updating systemd on toolforge stretch bastions T287036
[14:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[14:35:23] T287036: Figure out a patched backport of systemd 241 for stretch - https://phabricator.wikimedia.org/T287036
[15:00:33] !log paws starting kubernetes upgrades T280302
[15:00:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[15:00:36] T280302: Upgrade PAWS Kubernetes to the latest 1.18 release - https://phabricator.wikimedia.org/T280302
[15:52:12] godog: something on nat.cloudgw.eqiad1.wikimediacloud.org is still failing to identify to icinga-wm as recently as 4 minutes ago
[15:52:47] !log paws add my key to passwords::root::extra_keys
[15:52:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[16:59:15] !log paws deploying calico v3.18.4 T280342
[16:59:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[16:59:19] T280342: Upgrade Calico to 3.18 - https://phabricator.wikimedia.org/T280342
[17:10:27] !log tools deploying calico v3.18.4 T280342
[17:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:10:31] T280342: Upgrade Calico to 3.18 - https://phabricator.wikimedia.org/T280342
[17:39:15] bd808: I think we may be dealing with some potential file system corruption on the Cloud VPS.
[17:40:00] There is a folder in cyberbot-exec-iabot-01 called MemoryFiles with files that can neither be accessed nor deleted; both fail with the error "Structure needs cleaning"
[17:40:27] 👋
[17:42:15] andrewbogott: ^
[17:46:28] Cyberpower678: where is the folder?
[17:46:37] and also what is the fqdn of the affected VM?
[17:46:45] andrewbogott: /home/IABot/MemoryFiles
[17:47:24] cyberbot-exec-iabot-01.cyberbot.eqiad1.wmflabs
[17:47:52] that is definitely not the fqdn but I'll figure it out
[17:48:22] Sorry. :p
[17:48:34] Maybe it would help if I knew that too. :p
[17:50:24] Is 'MemoryFiles' something that your software created or is it a dir that just appeared?
[17:51:15] It's created by IABot
[17:51:35] It is maintained by IABot, as in it deletes the junk files automatically.
[17:51:51] ok
[17:57:11] andrewbogott: ping me if you find something interesting there.
[17:57:17] sure
[17:57:27] And if you can get that folder deleted, I would greatly appreciate it.
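(General background rather than a record of what was actually done: "Structure needs cleaning" is the kernel's EUCLEAN error and points at on-disk filesystem inconsistency, so the usual remedy is an offline filesystem check rather than another `rm`. Device and path details below are placeholders.)

```bash
# Generic sketch for diagnosing a "Structure needs cleaning" (EUCLEAN) error;
# the device name is a placeholder and the check must run on an unmounted
# filesystem (e.g. from a rescue boot or with the disk attached elsewhere).
dmesg | grep -iE 'ext4|xfs|error'   # see what the kernel reports about the damage
df -hT /home/IABot                  # find the backing device and filesystem type
sudo umount /home                   # unmount first, if /home is its own filesystem
sudo fsck.ext4 -f /dev/vda2         # placeholder device; for XFS use xfs_repair instead
```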
[19:49:17] Cyberpower678: fyi, I am running a massive job looking for files that are broken like that on other VMs. It'll take a few hours before I learn whether or not it's just you :)
[19:52:59] !log paws deployed new rbac for maintain-kubeusers changes T285011
[19:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[19:53:03] T285011: Add process to stop and disable k8s jobs for disabled tools - https://phabricator.wikimedia.org/T285011
[19:53:48] !log paws deployed new maintain-kubeusers T285011
[19:53:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[19:55:14] !log tools deployed new rbac for maintain-kubeusers changes T285011
[19:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:01:54] !log tools deployed new maintain-kubeusers to toolforge T285011
[20:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:01:59] T285011: Add process to stop and disable k8s jobs for disabled tools - https://phabricator.wikimedia.org/T285011
[20:03:13] One thing that's been discussed before is bringing your own Docker containers to the Toolforge Kubernetes cluster. Have we gotten any closer to that?
[20:05:20] @harej: kind of, but not exactly in a BYO way. There has been work on buildpack-based container creation that would allow a Heroku-like way to describe the custom software you want installed on top of a Toolforge-specific base.
[20:06:09] The k8s cluster in Toolforge currently has requirements for NFS mounting and LDAP NSS data that are not compatible with "bring your own".
[22:30:15] It's arguable that those particular requirements are unnecessary with custom images... and that stuff is likely to be optional or even unavailable directly in buildpack-based deployments
[22:30:34] In fact, all PoC work so far has quite specifically disallowed them
[22:30:51] buildpacks require a consistent UID across the build cycle
[22:31:00] that is, not your developer account
[22:32:10] A bigger blocker to straight BYO is the cranky WMCS admins who don't want chaos in the image repository. That's why we want buildpacks: to control some of the layers that get in the repo.