[14:45:50] urandom: deployment-sessionstore06.deployment-prep.eqiad1.wikimedia.cloud is now confused about cassandra versions... do you have time (and interest?) to sort that out? I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/1121102 is the breaking change but I don't know what the right version is.
[14:46:56] andrewbogott: a) oh how I loathe deployment-prep, and b) sure, happy to help! :)
[14:47:08] thank you, and I'm sorry
[14:47:20] naw, I'm joking
[14:47:53] looks like deployment-restbase05 has the same error -- both are bullseye hosts trying to install 4.1.7 when they already have 4.1.8 installed.
[14:56:00] andrewbogott: it's a ....err race between updating the apt repo and applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1123471 (a race with 12 or so hours between steps)
[14:56:28] oh, so the hosts just need 'apt update'?
[14:56:29] so as soon as puppet syncs up it should be ok (will monitor)
[14:56:35] great, thank you!
[14:57:00] well, the component/cassandra41 repo now has 4.1.8, but the corresponding puppet code doesn't yet pin that to 4.1.8
[14:57:04] (it's still 4.1.7)
[14:57:14] ah, ok.
[16:46:25] i have an instance, taxonbot3 of the dwl project: the crontab reportedly has stopped working and i can't reach it via ssh. can somebody please look into it? is it safe to reboot it to make it work again? is that even necessary?
[16:50:39] gifti: I can ssh to it
[16:52:19] I don't see any crontab though, do you know where I should look?
[16:52:47] rebooting is generally safe, but some services might not restart automatically
[16:54:06] ok I think I found the crontab, under user "taxonbot"
[16:54:24] yes
[16:54:51] it looks like it's running, according to /var/log/cron.log
[16:55:03] is it a network issue?
[16:55:38] might also be a permissions thing
[16:55:52] I see some files are being written in /home/taxonbot
[16:55:56] what is the issue you are seeing?
[16:56:11] the bot doesn't make edits
[16:56:19] and i can't connect via ssh
[16:56:41] taxonbot (the human user) can't either
[16:56:42] ok. so the bot not making edits could be a permissions issue yes, but it's just a guess
[16:56:55] let's try solving the ssh issue, what error are you getting when trying ssh?
[16:57:07] Connection closed by UNKNOWN port 65535
[16:57:16] i can connect to other instances fine
[17:00:04] can you please run the ssh command adding "-vv" and copy/paste the output to https://etherpad.wikimedia.org/p/yAqOuwkDKuhwhwjvd44j ?
[17:00:28] or to https://phabricator.wikimedia.org/paste/
[17:00:41] anywhere I can look at it
[17:01:29] it's on the pad
[17:02:05] I think etherpad is having some issues with big pastes :/ I cannot see anything
[17:02:35] ok, let me try the phabricator paste
[17:04:05] https://phabricator.wikimedia.org/P73889
[17:04:10] thanks!
[17:10:26] oops, i realized just now that i somehow left out the 'a' from taxonbota3 above
[17:13:33] ahhh ok that makes sense!
[17:13:39] let me look at the one with the "a"
[17:14:37] fatal: Access denied for user gifti by PAM account configuration [preauth]
[17:16:38] I also see some errors related to the "taxonbot" user
[17:18:17] can you try now?
[17:18:44] ssh works
[17:18:58] nice! I restarted the sssd service that manages the authentication
[17:19:18] thank you a lot, dhinus!
[17:19:36] !log dwl "systemctl restart sssd.service" to fix authentication issues
[17:19:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Dwl/SAL
[17:19:55] I hope that also fixes the other problem, otherwise let me know :)
[17:24:54] will do
[18:36:14] Hi, I get this error
[18:36:16] `ERROR - Failed to start application: [Errno 13] error while attempting to bind on address ('0.0.0.0', 80): permission denied`
[18:36:26] In toolforge
[18:38:48] from aiohttp import web
[18:38:49] site = web.TCPSite(self.web_runner, '0.0.0.0', 80)
[18:44:28] What are the most common ways of monitoring toolforge deployed web applications? Memory, CPU, number of requests, or else. Where can I take a look at already available metrics?
[18:49:42] I would argue that the one thing you care about is that it returns a 200 on the actual web page and all other things don't have much relevancy. as in "either the user gets content or they don't"
[19:07:38] @GergesShamon: you should be using port 8000 rather than port 80. Binding to port 80 requires root privileges.
[19:11:12] @arcstur: the dashboard at https://grafana.wmcloud.org/d/TJuKfnt4z/kubernetes-namespace?orgId=1 will give you some basic health metrics based on Kubernetes namespace. There is also https://toolviews.toolforge.org/api/ for getting info on 2xx status HTTP requests per tool.
[19:12:07] Will web run like this in toolforge? (re @wmtelegram_bot: @GergesShamon: you should be using port 8000 rather than port 80. Binding to port 80 requires root privileges.)
[19:12:38] All of the `webservice` tooling expects that your app presents on port 8000
[19:27:39] (the port to bind on is also available in the `$PORT` environment variable but I guess the 8000 is expected to be relatively stable ^^)
[19:39:15] I ran a project on port 8000 in toolforge, how do I run web? (re @lucaswerkmeister: (the port to bind on is also available in the $PORT environment variable but I guess the 8000 is expected to be relatively stabl...)
[20:40:07] @GergesShamon: Start by reading https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web and then do feel free to ask specific questions that are not covered in the documentation.
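
Editor's note on the port discussion above: a minimal sketch of how the aiohttp snippet from 18:38 might be adapted so it binds to the port the `webservice` tooling expects instead of port 80. It assumes only what was said in the channel: the port is available in the `$PORT` environment variable and 8000 is the usual value. The helper name `resolve_port`, the `hello` handler and the standalone `serve` coroutine are hypothetical and are not taken from the original bot's code.

```python
import asyncio
import os


def resolve_port(default: int = 8000) -> int:
    """Return the port to bind on: $PORT if set and non-empty, else the default."""
    raw = os.environ.get("PORT", "")
    return int(raw) if raw.strip() else default


async def serve() -> None:
    # aiohttp is imported here so resolve_port() stays usable without it installed.
    from aiohttp import web

    async def hello(request: web.Request) -> web.Response:
        # Hypothetical placeholder handler; a real tool would serve its own routes.
        return web.Response(text="ok")

    app = web.Application()
    app.router.add_get("/", hello)

    runner = web.AppRunner(app)
    await runner.setup()
    # Bind to an unprivileged port (8000 by default) rather than port 80.
    site = web.TCPSite(runner, "0.0.0.0", resolve_port())
    await site.start()
    await asyncio.Event().wait()  # keep serving until the process is stopped


if __name__ == "__main__":
    asyncio.run(serve())
```

Reading the port from the environment with a fallback to 8000 keeps the app working if the tooling ever hands out a different port, while still avoiding the root-only port 80 that caused the `[Errno 13]` error.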