[07:21:32] (MysqlReplicationLag) firing: MySQL instance pc2014:9104 has too large replication lag (6m 46s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:31:32] (MysqlReplicationLag) resolved: MySQL instance pc2014:9104 has too large replication lag (15m 46s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:34:24] pc2014 is maintenance?
[07:34:32] (MysqlReplicationLag) firing: MySQL instance pc2014:9104 has too large replication lag (14m 33s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:34:39] Yes I was upgrading mysql
[07:35:28] ah, ok
[07:39:32] (MysqlReplicationLag) resolved: MySQL instance pc2014:9104 has too large replication lag (6m 19s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:45:01] I'm going to continue my maint work, first old master of s3
[07:51:32] (MysqlReplicationLag) firing: MySQL instance pc2014:9104 has too large replication lag (6m 54s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 -
https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[07:54:12] at least we know the alerts work :D
[09:15:36] Something is going on with pc2014 I think
[09:15:53] It's been stuck in a delete for 1h
[09:17:07] And it looks like it was lagging before I upgraded it too
[09:21:43] "init for update" :-(
[09:21:47] stuck at https://phabricator.wikimedia.org/P34324
[09:26:48] Looking at the binlog it isn't different from any other delete on that table
[09:37:59] how did you fix it
[09:38:26] it is not fixed
[09:38:30] I am still checking
[09:43:29] Might open a bug
[09:43:38] I don't think I can get more to the root of what is causing it
[09:46:06] mmm what I am going to do is downgrade the package, I just realised it is running 10.6.9 which wasn't recommended by mariadb
[09:46:12] So going to go to 10.6.8
[09:47:34] is it because there is/there will be a newer one or something else?
[09:49:21] jynus: https://jira.mariadb.org/browse/MDEV-27983?focusedCommentId=233888&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-233888
[09:49:45] I see
[09:50:28] so the delete ran manually flies
[09:51:32] (MysqlReplicationLag) resolved: MySQL instance pc2014:9104 has too large replication lag (2h 1m 54s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[09:52:32] (MysqlReplicationLag) firing: MySQL instance pc2014:9104 has too large replication lag (2h 5m 22s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:04:57] hmm, it resolves and fires again, I guess because of "no response" that happens sometimes
[10:05:09] it is
catching up for now
[10:05:13] but still lagged
[10:24:52] pc2014 caught up
[10:24:59] no idea if removing 10.6.9 was the fix or what
[10:27:32] (MysqlReplicationLag) resolved: MySQL instance pc2014:9104 has too large replication lag (5m 6s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=pc2014&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:42:43] ms-codfw has rings updated to full weight on new nodes and 0 weight on old nodes \o/ (it'll finish rebalancing to reflect this change over the next few hours)
[12:03:19] lukasz just shared a slides template to be used for the State of the Union presentation at the summit
[12:03:43] I'll make a copy for this team to be worked on, i imagine we'll divide it into the 3 areas: databases, distributed storage (swift/cassandra...) and backups?
[12:05:30] that makes sense yeah
[12:05:49] although it's already divided into two "chapters": what has been achieved, and what's ahead
[12:05:58] but maybe at least separate slides for these areas then make sense :)
[12:12:11] bah, shared it with a "copy of" title
[12:13:09] remind us how long presentation (in time or slides) should be?
[12:38:19] Lukasz says, 15-20 minutes
[12:41:26] so I guess 5 min per area in total is not a bad aim to start with :)
[12:41:48] so essentially 2 slides per area (one for achievements and one for what's ahead)?
[12:41:55] or more
[12:42:09] it really depends on the slides
[12:42:20] sometimes you spend multiple minutes on a slide, sometimes it's 10s
[12:42:29] I can see dbs taking more time tbh
[12:42:42] yeah maybe there's more to say on dbs than on some other areas
[12:42:44] that's fine :)
[12:42:51] let's start and see
[14:37:21] Question: How do you connect to a server's management interface? Specifically, how do you authenticate?
[14:42:25] urandom: do you have access to pwstore repo?
[14:42:54] marostegui: I was afraid this was going to be the answer :(
[14:43:01] No, and it's not for a lack of trying
[14:43:09] :(
[14:43:23] did you talk to moritzm about it to see when you can get it granted?
[14:44:15] I was working with mutante, who -now that you mention it- was going to consult moritz...and then a long weekend or something happened, and I never followed up
[14:45:15] if you need something simple like a power cycle or something I can do that for you
[14:46:19] Ok, that would be great: restbase1021 is unresponsive
[14:47:12] ok let me see
[14:48:57] I can access the prompt
[14:49:04] Let me check if I can login and see what's going on
[14:51:56] I have rebooted it, I wasn't getting to the prompt after trying to log in as root :(
[14:52:10] awesome; thank you
[14:52:19] I am attached to the serial to see if there are any errors during the boot up
[14:54:25] urandom: server is back
[14:55:47] yup, I'm in. Thanks for the assist; I'll resume my efforts to get my key added :/
[14:55:53] good luck!
:)
[21:04:32] (MysqlReplicationLag) firing: MySQL instance db1139:13311 has too large replication lag (58m 42s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1139&var-port=13311 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:09:32] (MysqlReplicationLag) resolved: MySQL instance db1139:13311 has too large replication lag (18m 25s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1139&var-port=13311 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:14:32] (MysqlReplicationLag) firing: (2) MySQL instance db1139:13311 has too large replication lag (9m 58s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:19:32] (MysqlReplicationLag) resolved: (2) MySQL instance db1139:13311 has too large replication lag (18m 25s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:29:32] (MysqlReplicationLag) firing: (2) MySQL instance db1150:13314 has too large replication lag (1h 39m 43s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:34:32] (MysqlReplicationLag) resolved: (2) MySQL instance db1150:13314 has too large replication lag (22m 14s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:44:32] (MysqlReplicationLag) firing: (2) MySQL instance db1150:13314 has too large replication lag (22m 14s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica -
https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:49:32] (MysqlReplicationLag) firing: (3) MySQL instance db1150:13314 has too large replication lag (9m 44s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[21:54:32] (MysqlReplicationLag) resolved: (3) MySQL instance db1171:13318 has too large replication lag (13m 29s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag