[05:13:58] Is
[05:13:59] ```
[05:14:00] You were added to the group tools.wsstats after you started this login session.
[05:14:02] You need to log out and in again to be able to "become wsstats".```
[05:14:03]
[05:14:05] expected even after relogging in?
[05:14:34] Oh, ah it worked :)
[10:34:14] !log taavi@tools-sgebastion-11 tools.wikibugs toolforge jobs restart irc
[10:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[11:01:15] !log tools disable grid access for remaining tools still running on the grid T314664
[11:01:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:01:20] T314664: [infra] Decommission the Grid Engine infrastructure - https://phabricator.wikimedia.org/T314664
[11:02:45] !log tools stop grid related VMs T314664
[11:02:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:05:00] !log lucaswerkmeister@tools-sgebastion-10 tools.bridgebot Double IRC messages to other bridges
[11:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[13:17:53] If I want to query the database for https://wikisource.org from a replicadb, what should its database name be?
[13:18:58] https://db-names.toolforge.org/ says `sourceswiki`
[13:21:17] Ah, hmm, that's interesting, phetools has been running for a while using `oldwikisource` as an alias, I assume it got changed sometime recently?
[13:26:51] There are a few databases outside of oldwikisource that the tool seems to have difficulty connecting to (For example: ```(1044, "Access denied for user 's55771'@'%' to database 'jvwikisource_p'")```)
[13:27:27] However, `sql jvwikisource` appears to work
[13:34:15] sohom_datta, where does the code get the user/password from? the envvars+replica files have a different one
[13:34:26] https://www.irccloud.com/pastebin/9t0lvBn6/
[13:36:19] It just uses `TOOL_DATA_USER` for now (which I explicitly set) (re @wmtelegram_bot: sohom_datta, where does the code get the user/password from? the envvars+replica files have a different one)
[13:37:23] sohom_datta: I recommend using the TOOL_REPLICA_USER, that's where the user will be set by the system (even if it changes), it's set by default on all k8s containers (jobs + webservice)
[15:12:16] I have two questions if someone could answer them please......
[15:12:55] Instead of jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest --command "pwb -family:PROJECT -lang:LANGUAGE SCRIPT_NAME arg1 -always" JOB_NAME Couldn't I create a job name to call it up when needed?
[15:13:34] and when created what would be the test command so it doesn't edit but instead would produce a list so I can see what it wants to do?
[15:31:04] 🎊
[15:39:04] PotsdamLamb: for the first one, your command does create a job with a name, so I'm not quite sure what you're asking here.
[15:39:18] for the second, you can use a script like https://www.mediawiki.org/wiki/Manual:Pywikibot/listpages.py to list what pages the selectors you give pywikibot would return
[15:40:49] bd808: is it intentional that wikibugs now sometimes makes a task title and phabricator link gray?
[15:47:07] ^ (re @wmtelegram_bot: taavi: yes, I implemented T140881 yesterday.)
[15:47:12] taavi: yes. I implemented T140881 this week along with a number of other small display changes from the long neglected backlog.
[15:47:12] T140881: Print events in closed tasks in grey - https://phabricator.wikimedia.org/T140881
[15:47:31] ah, sorry ^^
[15:47:38] ah, I missed that. thanks.
[15:48:16] no worries. I moved the cheese. You and greg-g are the two to ask aloud what happened so far. :)
[15:48:58] * greg-g highfives taavi
[15:53:00] hi
[15:54:25] Am I being seen?
[15:55:46] Yes, I believe you can define tasks in yaml (re @wmtelegram_bot: Am I being seen?)
[15:56:06] Oh hi! Thanks wm.bb
[15:56:08] PotsdamLamb: yes, and I think you left irc just before taavi answered your questions. Check out the archives
[15:56:37] https://wm-bot.wmcloud.org/logs/%23wikimedia-cloud/
[16:01:38] Quick question, my understanding is that phetools (https://phetools.toolforge.org) got killed yesterday in the grid apocalypse. How is it still managing to host its front page?
[16:06:08] wm-bb Thanks. I need to run the command without making the edits though. I am looking at references.py and I do not want to run against the wiki and get banned
[16:07:26] @taavi I understand it creates a job, but does it execute it when I submit or just hold it?
[16:12:31] creating a one-off job will immediately submit it for execution.
[16:13:29] (Also more importantly, as a tool maintainer which file do I modify to add a notice to it) (re @sohom_datta: Quick question, my understanding is that phetools (https://phetools.toolforge.org) got killed yesterday in the grid apocalypse. ...)
[16:13:35] interesting that both wikibugs deployments restarted their irc connections at basically the same time...
[16:16:57] the SGE hosts were shut down so that is definitely not running there. my best guess is that it's an error in the error handler. I'll have a look later.
[16:17:39] @taavi that is what I don't want to do unless there is a command to run as a test?
[16:36:15] PotsdamLamb: if you do not want to immediately submit a job, do not use the `toolforge jobs run` command.
[16:36:45] @sohom_datta: fixed. the home directory of that tool is not publicly accessible which was breaking the code checking whether a tool was disabled or not.
[16:48:31] * dcaro back
[17:40:08] !log jjmc89@tools-sgebastion-11 tools.eranbot restart plagiabot jobs T360135
[17:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.eranbot/SAL
[17:58:21] TFW you fix a failing test and then cannot figure out how that test ever passed before the fix...
[17:59:54] I just broke the Cloud VPS web proxy for a few seconds. working on a fix
[18:09:48] @taavi I was wondering why we all left lol
[18:33:36] !log anticomposite@tools-sgebastion-10 tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # RC reader not reading RC
[18:33:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[18:34:05] !log anticomposite@tools-sgebastion-10 tools.stewardbots SULWatcher/manage.sh restart # SULWatchers disconnected
[18:34:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:57:09] !log bd808@tools-sgebastion-10 tools.sal Hard stop/start cycle. Pod was running, but somehow the proxy layer was no longer seeing the webservice as active.
[19:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[20:26:49] @taavi you still around?
[20:31:59] PotsdamLamb: you are much more likely to get an answer if you ask your question in this public channel, with a bit less than 200 people on it, than in the private messages of an individual person.
[20:35:47] I run toolforge jobs -simulate --image tool-pywikibot/pywikibot-scripts-stable:latest --command "pwb -family:simple -lang:en references.py -log" test-references-missing and it tells me usage: toolforge jobs [-h] {images,run,show,logs,list,delete,flush,load,restart,quota} toolforge jobs: error: argument operation: invalid choice: 'tool-pywikibot/pywikibot-scripts-stable:latest' (choose from 'images', 'run', 'show', 'logs', 'list',
[20:35:47] 'delete', 'flush', 'load', 'restart', 'quota') I just want to run a test and see the results. I do not want it to write to the wiki
[20:36:53] `-simulate` needs to be passed to the pywikibot command inside the `--command` flag of `toolforge jobs run`, not to the jobs command itself
[20:38:21] so leave it at run and change --command to -simulate
[20:43:50] ok so now the error is i have an invalid argument
[20:44:14] do i just change it to -always?
[20:46:38] oh well
[21:31:27] !log deployment-prep shutting down deployment-puppetdb03, deployment-puppetdb04, deployment-puppetmaster04. These have been replaced with new puppet infra and can be deleted in a couple of weeks if all is well.
[21:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[21:32:47] andrewbogott: did you start anything like 50 minutes ago on deployment-prep
[21:33:03] Beta jobs are failing with host verification issues
[21:33:22] I switched everything over to a new puppetserver
[21:33:31] I wouldn't expect that to cause failures, what are you seeing?
[21:34:16] andrewbogott: https://integration.wikimedia.org/ci/job/beta-scap-sync-world/146771/console
[21:34:38] andrewbogott: thcipriani points out https://phabricator.wikimedia.org/T144647#2648622
[21:34:39] Missing public key for deployment-snapshot03.deployment-prep in deployment-deploy03.deployment-prep:/etc/ssh/ssh_known_hosts (managed by puppet)
[21:35:20] That ticket is about keyholder. I don't think it's relevant here.
[21:36:41] Probably known host keys are managed by puppet and stored in puppetdb. I certainly wouldn't expect them to change, though, unless the old puppetdb had something cached that isn't current anymore?
[21:37:18] Does anybody know how to query puppetdb to verify?
[21:37:42] you can look in the db itself. I don't know otherwise...
[21:37:48] I'll turn the old puppetdb hosts back on
[21:38:01] Is puppet running correctly on the hosts whose keys are missing?
[21:39:03] I do not know what VMs we're talking about. Maybe it's mentioned in that error log but I don't see it.
[21:39:15] If deployment-puppetmaster04 is shut down, what host serves as the puppetmaster?
[21:39:20] * andrewbogott is meanwhile wondering why ssh doesn't work on deployment-ms-be08.deployment-prep.eqiad1.wikimedia.cloud
[21:40:03] deployment-puppetserver-1
[21:40:11] crikey :-)
[21:40:28] It's new, puppet-7 capable.
[21:41:04] Does that mean that the `puppetmaster` hiera value needs to be changed to point to it in https://horizon.wikimedia.org/project/instances/e1cc0a74-1f8e-4299-aaf2-6e86c32c14f6/ ?
[21:41:05] let me guess: puppet on it has been broken long enough that it doesn't have the current bastions in its firewall config
[21:42:00] ugh, the puppetmaster is set in the project-wide puppet, should be removed anywhere it's set for a particular host
[21:42:07] dancy: I'll fix that one
[21:43:44] good god it's set individually for 29 of the hosts there.
[21:43:53] Guess I'll get to work
[21:44:09] (this is all likely unrelated to the host key thing, but who knows)
[21:45:28] someone really wanted to do this the hard way, thus ensuring that I must also do it the hard way :)
[21:45:31] deployment-ms-be08 says 'The last Puppet run was at Thu Aug 24 05:38:35 UTC 2023 (293286 minutes ago).'
[21:45:53] hey, you were able to ssh into ms-be? I couldn't even do that with my root key
[21:46:00] no, logged in to the console
[21:46:03] heh, ok
[21:48:04] manually hacked deployment-ms-fe08 so that at least it lets you log in.
[21:50:28] Puppet agent has run on deployment-snapshot03.deployment-prep and deployment-deploy03.deployment-prep but deployment-snapshot03.deployment-prep did not end up in /etc/ssh/ssh_known_hosts
[21:51:08] taavi: thanks, I'll check that once I fix all these per-host puppetmaster settings
[21:51:39] I need to step away to pick up a child. I'll check back in about 30 minutes.
[22:01:05] dancy: andrewbogott: all of the VMs listed on https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3Ddeployment-prep&q=alertname%3DPuppetAgentNoResources are now at least reachable via SSH. they're all failing to run puppet with different catalog compilation errors
[22:01:14] hrm, I think deployment-snapshot03 is now back in known hosts
[22:01:36] went away at 20:42, back at 21:59
[22:02:01] * taavi is trying his best to avoid the usual 'beta is broken as a concept' rant
[22:05:52] I'm removing role::swift::storage from the ms-be hosts just to get them a semi-modern run. Then I guess I'll put it back and break them again
[22:06:39] beta will never be resourced because there will always be something that's about to replace it but never actually will :)
[22:07:08] thcipriani: I would guess andrewbogott fixing the puppetmaster setting will fix the host keys that are breaking scap, as puppet must have been running successfully recently as the host keys were there a bit ago
[22:07:40] it might. All the loose ends should be flipping over to the new server after another run or two
[22:08:54] seems strange that host keys have been changing every puppet run on the deploy host at least as far back as we have puppet logs, but ... maybe that's expected for some reason?
[22:09:10] or maybe something's wheel warring with it
[22:10:45] thcipriani: so there are these four hosts that have had broken puppet for 200+ days... can I just delete them? Surely they aren't doing anyone any good.
[22:10:57] * andrewbogott proposes the same solution for every problem
[22:11:16] webperf21/22 and ms-be07/08
[22:11:49] are these the ms-be hosts? ... yeah. Presumably they're powering something with image uploads
[22:12:11] we could shut them off and see what breaks
[22:13:05] I note that these would be emailing their distress except that puppet emails seem to be disabled in that project
[22:13:44] I'm guessing something will break, but at least that would confirm or deny that they do anyone any good
[22:14:32] ok, I'm not curious enough to get into it but https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3Ddeployment-prep&q=alertname%3DPuppetAgentNoResources is a useful thing to look at for anyone who cares
[22:14:53] (and https://prometheus-alerts.wmcloud.org/?q=%40state%3Dactive&q=project%3Ddeployment-prep more generally)
[22:16:19] meanwhile... I think I have all the (working) hosts on the same puppet page now. Are you still getting disagreements about host keys?
[22:16:31] Running a test now
[22:19:41] Lookin' good
[22:19:46] Thanks for the fixin!
[22:19:48] ok, great
[22:19:54] So this is just a lesson in DRY I guess
[22:19:59] Sorry for the noise!