[00:48:37] !log tools.lexeme-forms deployed 1fc2f98450 (l10n updates)
[00:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL
[02:03:17] ooooops
[09:13:22] !log admin restarting mariadb @ cloudcontrol1004 (T302146)
[09:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:13:28] T302146: Galera on cloudcontrol1004 going out of sync - https://phabricator.wikimedia.org/T302146
[09:24:29] !log admin restarting mariadb @ cloudcontrol1003 (T302146)
[09:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[09:24:33] T302146: Galera on cloudcontrol1004 going out of sync - https://phabricator.wikimedia.org/T302146
[10:29:10] hey folks. is anyone able to take care of recreating views on the wikireplicas? https://phabricator.wikimedia.org/T302233 is the task. a schema change i'm running is causing the views to break.
[11:21:02] kormat: you may contact folks on the Data Engineering team
[11:24:33] wait, has responsibility for wikireplicas moved teams?
[11:27:22] if so, i clearly missed an announcement. can you point me to it, arturo?
[11:28:31] kormat: I don't think there was a formal announcement
[11:28:45] and is a "recent" change anyway
[11:30:25] kormat: it is 100% true that we should send you and your team some information on this. I'll make a note here for my manager
[11:32:06] also pretty sure this has been discussed in some monday SRE meeting in the past
[11:32:15] discussed, or mentioned at least
[12:49:43] oh, interesting. good to know, thanks
[12:49:51] do you know who on data-eng i can contact about this?
[13:42:43] kormat: you can add the Data-Engineering project tag iirc, and they will pick it up
[13:42:59] kormat: try with Razzi in particular
[13:43:03] (sorry for the delayed reply)
[13:43:10] ^^U
[13:43:27] ah, balloons already added it :), so just wait then unless it's urgent
[13:45:19] i have no idea how urgent or not it is, but as it's (now :)) their service, i guess they can figure that out :)
[17:23:16] I'm seeing errors trying to list and create instances in deployment-prep in horizon, e.g.:
[17:23:16] Error: Unable to retrieve instances.
[17:23:17] Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible. (HTTP 500) (Request-ID: req-e08108eb-d4ab-4842-9016-28958c5d162c)
[17:25:49] andrewbogott: ^ possibly related to the angry galera cluster?
[17:26:30] dpifke: that is certainly worth a bug report in phabricator.
[17:26:47] OK, will report there.
[17:33:23] The trace in logstash for that request-id is not really more descriptive. the orm blew up trying to do something from the labweb1001 host.
[17:37:25] Filed T302323.
[17:37:26] T302323: Errors listing or launching instances in WMCS horizon - https://phabricator.wikimedia.org/T302323
[17:37:29] ARturo and I are in an interview, will catch up with that in 20
[17:38:13] oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError (pymysql.err.OperationalError) (1040, 'Too many connections')
[17:39:26] taavi@cloudcontrol1005 ~ $ sudo mysql
[17:39:26] ERROR 1040 (08004): Too many connections
[17:42:36] not sure if this is normal, but I see tons of neutron things doing something like `UPDATE agents SET heartbeat_timestamp='2022-02-22 17:41:46.726950' WHERE agents.id`
[17:43:48] same thing for nova, but neutron is way more common with that I think
[17:46:16] most of those are from cloudcontrol1003, so I'm going to restart some of its -api services
[18:02:12] ok, I'm here now, catching up
[18:02:21] restarting API services is what I would do first :)
[18:03:01] we moved to -cloud-admin
[22:10:34] !log admin raising project 'maps' quota by two tb -- T300160
[22:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[22:10:39] T300160: Request increased quota for maps Cloud VPS project - https://phabricator.wikimedia.org/T300160
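
Note on the 17:38-17:46 exchange: the log does not show how "most of those are from cloudcontrol1003" was determined. One plausible way, sketched below as an assumption rather than what was actually run, is to group `SHOW PROCESSLIST` rows by client host. The function name `connections_per_host` and the sample rows are hypothetical; the only thing relied on is that MariaDB reports the `Host` column as `hostname:port` for TCP clients.

```python
from collections import Counter


def connections_per_host(processlist_rows):
    """Count client connections per host from SHOW PROCESSLIST-style rows.

    Each row is a dict with a "Host" key in MariaDB's "hostname:port" form.
    Rows with an empty or missing Host are counted under "(local)".
    """
    counts = Counter()
    for row in processlist_rows:
        # Drop the trailing ":port" so all connections from one host group together.
        host = (row.get("Host") or "").rsplit(":", 1)[0] or "(local)"
        counts[host] += 1
    # Busiest hosts first.
    return counts.most_common()


# Hypothetical sample rows for illustration only, not data from the incident.
sample = [
    {"Host": "cloudcontrol1003:41000", "User": "neutron"},
    {"Host": "cloudcontrol1003:41002", "User": "neutron"},
    {"Host": "cloudcontrol1003:41004", "User": "nova"},
    {"Host": "cloudcontrol1004:52000", "User": "neutron"},
    {"Host": "localhost", "User": "root"},
]

for host, count in connections_per_host(sample):
    print(host, count)
```

In practice the rows would come from a query run over an already-open connection, since new connections are refused once error 1040 appears.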