[00:12:44] !log wikisp Removing role::simplelamp2 from ceres-01
[00:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikisp/SAL
[00:24:38] (I won't be able to restart bridgebot until tomorrow I'm afraid)
[03:10:31] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # Ping timeout
[03:10:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[05:07:29] !log tools.stewardbots ./SULWatcher/manage.sh restart # SULWatchers disconnected
[05:08:44] !log tools.stewardbots ./stewardbots/StewardBot/manage.sh restart # Ping timeout
[08:37:29] Hi, I have a script that I estimate can require up to ~100 GB of memory. How can I run it here?
[08:37:45] I tried the grid, but it seems to make my job queue endlessly if I ask for more than ~17.5 GB
[08:44:18] !log tools.bridgebot Quintuple IRC messages to other bridges
[08:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[08:45:27] that's a lot of memory, Leaderboard. The only way to run that script on Cloud VPS would be to use a dedicated virtual machine, and even with that it would be problematic. I suggest opening a phab task and discussing with others what you want to achieve. Perhaps the are coding alternatives to such amount of memory needs
[08:45:55] perhaps there are*
[09:05:04] arturo: my code is https://github.com/Leader-board/Wikimedia-statistics/blob/main/Global%20user%20table%20generator/Main.java
[09:06:39] It's not complex; the high memory requirement is due to the volume of data it needs to process.
[09:06:58] I do intend to file a phab task if needed.
[09:36:56] thanks
[11:35:03] topranks: you around today?
[11:37:17] I've read today's a bank holiday in Ireland :)
[11:37:54] dhinus: indeed yes :)
[11:38:27] arturo: I’m close to my machine if it’s something urgent or quick, otherwise catch up tomorrow
[11:38:57] topranks: ok nevermind then!
Tomorrow is a bank holiday in Spain, so talk to you on Wed
[11:40:24] Doh! I’m off for the rest of the week then :(
[11:40:29] I will do some work tomorrow based on our discussion on Friday and send it to you anyway and we can review when we’re both back.
[11:40:44] ok :-) enjoy your time off
[11:41:06] my update was on the keepalived mystery: I see no hints in the packet captures
[11:41:22] I plan to update T320975 anyway
[11:41:22] T320975: Toolforge hosted IRC bots occasionally disconnecting - https://phabricator.wikimedia.org/T320975
[11:52:37] !log paws jupyterhub to 3.0 T318271 dc95efa69e1e9887daaa6320dcbd4acc94f624de
[11:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Paws/SAL
[11:52:40] T318271: Upgrade jupyterhub - https://phabricator.wikimedia.org/T318271
[13:09:23] !log admin restart keepalived on all 4 cloudgw servers to run them with `-D` in /etc/default/keepalived to further debug T320975
[13:09:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[13:09:27] T320975: Toolforge hosted IRC bots occasionally disconnecting - https://phabricator.wikimedia.org/T320975
[13:12:49] I think I had a dream last night where I got something in the mail (post office) from someone that said it was hard to understand what I was saying on telegram in a direct 1:1 DM not involving a bridge because messages were tripling.
[13:16:32] but still why does it all need to be in memory? (re @wmtelegram_bot: It's not complex; the high memory requirement is due to the volume of data it needs to process.)
[13:26:38] hi, can someone restart stashbot please? https://wikitech.wikimedia.org/wiki/Tool:Stashbot#Maintenance are docs :)
[13:28:03] thanks to whoever did it
[13:33:00] short-lived, though it may have been. :D
[13:53:02] Can someone review this?
T321763
[13:53:03] T321763: [ERROR] Can’t open and lock privilege tables: Table ‘mysql.servers’ doesn’t exist in engine - https://phabricator.wikimedia.org/T321763
[13:53:36] I have one VM with the simplelamp2 role and a lot of info in the database
[13:56:14] when you say rebuild on the ticket what exactly do you mean? (re @wmtelegram_bot: I've one VM with simplelamp2 role and a lot of info in the database)
[13:56:23] what have you done so far?
[13:57:08] Deus: if you have the database data on a cinder volume, check that the extra volume was mounted correctly
[14:00:29] jeremy_b In the second VM I tried to enter mariadb via skip-tables and found that information-schema was the only database available. Then I tried rebuilding the VM to evaluate; after all, it was just a test environment. There I realized that the problem was in the role.
[14:01:13] This situation has occurred since the October 20 restart.
[14:01:49] maybe there's something useful in puppet logs on the old VM?
[14:03:17] Sadly, I don't have a cinder volume
[14:19:35] Deus: if this is a brand new setup you probably still need to do 'mysql_install_db' or similar to get the initial db set up.
[14:19:49] (sorry, I can't guess at specifics but that might give you an idea of what to google)
[14:20:12] If it's an old setup then... tell me the fqdn and I can take a look
[14:20:50] "Debug: /Stage[main]/Mariadb::Config/File[/srv/sqldata/my.cnf]: Nothing to manage: no ensure and the resource doesn't exist"
[14:23:23] andrewbogott, mars.wikisp.eqiad1.wikimedia.cloud
[14:37:03] Deus: I have to go in a minute and I'm not finding anything obvious. This link is possibly related, although it doesn't really explain why this would've happened on two servers at once: https://mellowhost.com/blog/error-cant-open-and-lock-privilege-tables-table-mysql-servers-doesnt-exist-in-engine-resolution.html
[14:37:26] I also wonder if for some reason mariadb was upgraded by accident?
But that's just a guess, no evidence for that.
[14:45:45] andrewbogott, The last time I logged into the VM was October 14 and it was working fine. I logged in on the 20th to the three platforms that exist and they were all throwing the same error.
[14:45:46] I assumed it was something related to the reboot you mentioned.
[14:46:26] yes, could be due to the reboot although ideally that doesn't cause db corruption :)
[14:46:32] I need to go but someone else may be able to help.
[14:48:37] I will try to recover it somehow; if that is not possible I will rebuild it too.
[15:14:19] Deus: I can't help but think it's just looking in the wrong place for those tables and not finding them. A quick look at my.cnf didn't really support that theory though
[15:17:53] I reinstalled mariadb, now I can access it. There is still the matter of putting the file I copied in its place. I guess by "missing authorizations" it means that I have to recreate the database users, doesn't it?
[15:18:20] Sounds like it but I'm not positive.
[15:18:40] the existence of /srv/sqldata/mysql_upgrade_info on the mars instance with a timestamp of 2022-10-31 14:54 (~20 minutes ago) seems suspicious...
[15:19:23] did apt try to upgrade for some reason that tripped off the permissions problems?
[15:20:06] bd808: they just reinstalled as part of troubleshooting.
[15:22:37] Deus: ^^
[15:23:52] argh. the flapping of irc bots continues
[15:24:29] bd808: hopefully that's the last one
[15:24:37] I just merged the puppet change that I believe will solve the problem
[15:24:51] bd808: yup, I reinstalled mariadb. I saw no other options beyond that. Now there is the issue of passing the databases.
[15:25:12] arturo: :) awesome.
[15:25:25] Deus: *nod*
[15:27:41] !log tools.bridgebot Double IRC messages to other bridges
[15:30:59] !log tools.stewardbots Restart StewardBot
[15:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[15:46:13] !log tools.stewardbots ./SULWatcher/manage.sh restart # SULWatchers disconnected
[15:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[15:56:33] !log deployment-prep shutting down deployment-echostore01, deployment-ms-be0[56], deployment-mdb01, deployment-prometheus02, deployment-wikifeeds01 as per https://phabricator.wikimedia.org/T306068
[15:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[22:35:14] Self-service tool deletion is finally here! See https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/6NMELOY26VDPSN66VJ7DZBWOVM7ZS6GM/ for more details.
[22:46:01] !log tools.bridgebot Double IRC messages to other bridges
[22:46:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bridgebot/SAL
[22:47:15] ^ fyi arturo re “hopefully that’s the last one” earlier
[22:47:21] but maybe bridgebot’s issues were different anyways, idk
[22:47:43] it seemed to sometimes *not* have the duplicate-messages issue when other bots were having problems
[22:48:39] bd808: nice \o/