[01:20:29] could someone with a working email please send a message to whatever mailing list it is that dupdet is crashing, and that anyone with knowledge of php is welcome to query Gryllida on-wiki to request access to the tool for debugging. Source of the tool: https://github.com/jamesryanalexander/Duplication-Detector. May be an issue with the php or webserver version. Thanks
[12:30:40] !log tools.wmde-graphql-demo disabled tool per T305687
[12:30:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wmde-graphql-demo/SAL
[12:30:45] T305687: Archive/delete tool wmde-graphql-demo - https://phabricator.wikimedia.org/T305687
[12:31:54] !log tools.fc-importer disabled tool per T305404
[12:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.fc-importer/SAL
[12:31:56] T305404: Archive/delete tool fc-importer - https://phabricator.wikimedia.org/T305404
[12:32:38] !log tools.welcomer disabled tool per T305388
[12:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.welcomer/SAL
[12:32:40] T305388: Delete tool welcomer - https://phabricator.wikimedia.org/T305388
[12:33:40] !log tools.wlm-de-redirect disabled tool per T305377
[12:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wlm-de-redirect/SAL
[12:33:42] T305377: Archive/delete tool wlm-de-redirect - https://phabricator.wikimedia.org/T305377
[12:34:28] !log tools.wikidiff2-dev-test disabled tool per T305376
[12:34:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikidiff2-dev-test/SAL
[12:34:30] T305376: Archive/delete tool wikidiff2-dev-test - https://phabricator.wikimedia.org/T305376
[12:35:59] !log tools.catgraph disabled tool per T305374
[12:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.catgraph/SAL
[12:36:02] T305374: Archive/delete tools catgraph, catgraph-jsonp & cgstat - https://phabricator.wikimedia.org/T305374
[12:36:22] !log tools.catgraph-jsonp disabled tool per T305374
[12:36:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.catgraph-jsonp/SAL
[12:36:50] !log tools.cgstat disabled tool per T305374
[12:36:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cgstat/SAL
[12:38:05] !log tools.james disabled tool per T305289
[12:38:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.james/SAL
[12:38:07] T305289: Archive/delete tool “james” - https://phabricator.wikimedia.org/T305289
[12:39:00] !log tools.hoo-propertysuggester-test disabled tool per T303597
[12:39:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.hoo-propertysuggester-test/SAL
[12:39:02] T303597: Delete tool hoo-propertysuggester-test - https://phabricator.wikimedia.org/T303597
[14:15:02] !log tools.shex-simple toolforge-jobs run update --command '~/update.sh' --image tf-php74 --schedule '0 * * * *' # T305944; php74 image chosen because tf-bullseye-std doesn’t have git
[14:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.shex-simple/SAL
[14:15:30] !log tools.shex-simple crontab -r # T305944
[14:15:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.shex-simple/SAL
[15:28:18] !log toolhub Updated demo server to 5c2ef1
[15:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolhub/SAL
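
(Note: a minimal sketch of the shex-simple migration logged at 14:15 above, moving an hourly cron job to the Toolforge jobs framework. The crontab entry is a hypothetical reconstruction of what `crontab -r` removed, and the `list`/`show` verification subcommands are assumed to be part of the standard toolforge-jobs CLI.)

    # Hypothetical crontab entry the scheduled job replaces (removed by `crontab -r`):
    #   0 * * * * ~/update.sh
    # Equivalent hourly job under the Toolforge jobs framework, as logged at 14:15:02:
    toolforge-jobs run update --command '~/update.sh' --image tf-php74 --schedule '0 * * * *'
    # Verify that the job was registered and inspect its configuration (assumed subcommands):
    toolforge-jobs list
    toolforge-jobs show update
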
[16:23:02] !log tools.shex-simple updated public_html worktrees and update.sh to change master to main (public_html/master is now a symlink to main)
[16:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.shex-simple/SAL
[16:34:23] !log gitlab-runners pausing runner-1013, then will remove it and create new bullseye runner to replace it
[16:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitlab-runners/SAL
[17:02:51] !log gitlab-runners pausing runner-1014, then will remove it and create new bullseye runner runner-1025 to replace it
[17:02:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Gitlab-runners/SAL
[17:35:57] hmmm.. some issue with cinder volumes that I'm not sure how to debug
[17:36:40] I have been creating new instances in gitlab-runners and 4 out of 5 just worked and automatically got a /var/lib/docker mount in /etc/fstab
[17:36:53] also I can see the volumes in Horizon
[17:37:23] there are 10 volumes and 10 instances, though they are not mapped to instances but to the project as such
[17:37:42] now with my latest new instance I get "no volumes are available to mount"
[17:38:04] I deleted an old instance, then created a new instance to replace it, as before
[17:38:36] maybe the only difference was I did it a bit faster.. so maybe there is some race about releasing the volume
[17:40:19] maybe because I did the 'switch puppetmasters' process faster.. but there is that mechanism that prevents me from running puppet manually before bootstrap is done.. so..
[17:43:31] rebooting the instance fixed it! :)
[17:43:55] didn't have to do that before, but.. just randomly tried it. good
[17:44:32] have you tried turning it off and on again? :-P
[17:44:38] lol, precisely
[17:45:42] or of course it's possible it was just "the 5 minutes are over" and the reboot was pure coincidence.. but I'm ok with not knowing
[17:47:06] but no.. I created it 20 min ago, and yesterday I also did it within 20 min, anyways :)
[18:07:04] 'no volumes are available to mount' is a message from Horizon?
[18:07:36] mutante: ^
[18:08:21] andrewbogott: no, it's a message you see in puppet output when it tries to use "cinderutils"
[18:08:24] file: /etc/puppet/modules/cinderutils/manifests/ensure.pp
[18:08:45] Error: Could not retrieve catalog from remote server: Error 500 on SERVER ..
[18:08:52] To proceed, create and attach a Cinder volume ..
[18:09:29] that then says it's because "No mount at /var/lib/docker and no volumes are available to mount."
[18:09:38] after reboot or in the other cases.. it "just works"
[18:09:47] ok.
[18:10:03] I don't know, that sounds like it wasn't seeing the volume in lsblk, so that's what I would check (if we had a misbehaving VM now, which we don't)
[18:10:07] an instance in that state does not have "docker" in /etc/fstab
[18:10:11] while the working ones do
[18:10:36] I have a few more to do over the coming days
[18:10:47] so let's see if it happens again
[18:10:54] one out of 10 does not count yet
[18:12:13] I feel like it is about timing between deleting an old instance and creating a new one, and if you don't rush it and give it time then it's not an issue.
[18:12:25] but will try to confirm that
[18:13:13] ACK @ lsblk, thanks
[18:17:06] mutante: that puppet manifest is using wmcs-prepare-cinder-volume as far as I know, so all the interesting stuff is probably happening in there.
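
(Note: a minimal sketch of the checks discussed above for an instance stuck in the "no volumes are available to mount" state. It assumes shell access to the affected VM; the example device name and the run-puppet-agent wrapper are assumptions about the usual Cloud VPS setup, not taken from the log.)

    # Does the VM see the attached Cinder volume as an extra block device (e.g. sdb)?
    lsblk
    # Did cinderutils already write the mount into fstab, and is it actually mounted?
    grep docker /etc/fstab
    findmnt /var/lib/docker
    # If the volume only shows up after a delay, a fresh puppet run (or a reboot,
    # as above) lets cinderutils/wmcs-prepare-cinder-volume create the mount:
    sudo run-puppet-agent
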
[18:17:46] andrewbogott: *nod*, thank you
[18:23:07] Hi all, I started the reimaging of clouddb hosts for T299480, and clouddb1013 completed successfully; however there's an error about `/usr/bin/wmf-pt-kill` not found, and sure enough, it is missing on the newly reimaged clouddb1013, while I see `/usr/bin/wmf-pt-kill` on clouddb1017 which I haven't reimaged yet. I guess this script is missing from puppet? Anybody have context? No rush, the whole host is downtimed currently
[18:23:08] T299480: Upgrade clouddb* hosts to Bullseye - https://phabricator.wikimedia.org/T299480
[18:34:34] razzi: is it still gone after running puppet a second or third time?
[18:34:43] or maybe just on the first run
[18:35:31] it is fairly common that a puppet role works after the 2nd run and then it's fine
[18:35:47] except when using a cookbook and the role is already applied.. then that would fail
[18:36:52] razzi: It looks to me like that should get installed by apt. Try apt-get update and then re-run puppet?
[18:36:52] oh wait, ignore that, I found something different, there is code that absents the wmf-pt-kill stuff
[18:37:17] oh, uhoh
[18:38:13] "if $instances" then it removes it
[18:38:58] it does a lookup('profile::wmcs::db::wikireplicas::mariadb_multiinstance::instances')
[18:39:08] to then decide based on that
[18:39:19] whether it should stop and mask wmf-pt-kill or not
[18:40:27] hosts/clouddb1013.yaml:profile::wmcs::db::wikireplicas::mariadb_multiinstance::instances:
[18:40:38] clouddb1013 has its own hosts file
[18:40:42] that sets this value
[18:42:04] but the other hosts do as well..
[18:43:21] razzi: sorry for distracting with that.. do what andrewbogott said, he is right. the reason is:
[18:43:27] [apt1001:~] $ sudo -E reprepro ls wmf-pt-kill
[18:43:27] wmf-pt-kill | 2.2.20-1+wmf5 | stretch-wikimedia | amd64, i386, source
[18:43:27] wmf-pt-kill | 3.1.0-1+wmf6 | buster-wikimedia | amd64, i386
[18:43:36] if that is for bullseye.. it's just not in APT
[18:44:58] oh, the package isn't there at all? Maybe we can just copy it over.
[18:45:04] if it isn't python2 :/
[18:45:11] * taavi guesses Perl
[18:45:21] "but maybe you can copy it over" was about to type that.. but depends
[18:47:28] https://gerrit.wikimedia.org/r/q/project:operations/debs/wmf-pt-kill
[18:48:06] lib/Percona/Toolkit.pm
[18:48:59] maybe ask the owner of https://gerrit.wikimedia.org/r/c/operations/debs/wmf-pt-kill/+/585495 about it
[18:56:48] sorry, I'm in too many conversations at once. Trying to circle back to this one...
[18:59:04] razzi: T305974
[18:59:05] T305974: Provide wmf-pt-kill on Debian Bullseye - https://phabricator.wikimedia.org/T305974
[18:59:37] OK if we block on that until Manuel expresses an opinion? If you're super blocked I can try a package copy right now, but no idea if it'll work (and I'd have to do some research even to know how to test)
[19:00:37] also razzi your downtime on 1013 just expired
[19:57:10] razzi, mutante, I copied that package from the buster repo to the bullseye one and everything seems fine *shrug*
[20:00:21] andrewbogott: cool, confirmed it's on apt1001. ahh.. and just read the ticket, even a DBA already chimed in, how nice and quick
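
(Note: a minimal sketch of the package check and copy described above. The reprepro `ls` invocation is taken from the log; the `copy <dest-distribution> <src-distribution> <package>` form and the bullseye-wikimedia distribution name are assumptions inferred from the stretch-wikimedia/buster-wikimedia names shown, and the run-puppet-agent wrapper is likewise assumed.)

    # On the apt server, see which distributions already carry the package:
    sudo -E reprepro ls wmf-pt-kill
    # Copy the existing buster build into the bullseye distribution:
    sudo -E reprepro copy bullseye-wikimedia buster-wikimedia wmf-pt-kill
    # Then, on the reimaged clouddb host, refresh apt metadata and re-run puppet
    # so the package gets installed:
    sudo apt-get update
    sudo run-puppet-agent
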
[20:20:46] Can I bribe a merge of https://gerrit.wikimedia.org/r/c/labs/tools/wikibugs2/+/779112
[21:19:48] !log tools.grid-deprecation Added komla as co-maintainer
[21:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.grid-deprecation/SAL
[21:24:01] !log tools Add komla as projectadmin (T305986)
[21:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:24:03] T305986: Grant Komla Sapaty tools admin rights - https://phabricator.wikimedia.org/T305986
[21:27:07] !log tools Added komla to 'roots' sudoers policy (T305986)
[21:27:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:31:16] !log tools.admin Added komla as co-maintainer (T305986)
[21:31:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL
[21:31:19] T305986: Grant Komla Sapaty tools admin rights - https://phabricator.wikimedia.org/T305986
[21:32:44] !log tools Added komla to Gerrit group 'toollabs-trusted' (T305986)
[21:32:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:44:45] Hi there! I was looking to do some AI stuff on Toolforge/Cloud VPS. Are the servers CUDA-enabled (a.k.a. have GPUs)? I'm 99% leaning towards no but just wanted to make sure before I ask for a grant for APIs :) Thanks!
[22:47:47] Nope
[22:52:21] darn, there go my plans to run a bitcoin mining op
[22:54:01] hasn't stopped people from trying
[22:58:34] I think we only have 8 GPUs anywhere, and certainly none in Cloud VPS at this point. It turns out that finding a GPU with open source firmware is hard.