[10:26:32] Emperor: o/ - Jesse reimaged thanos-be2005 yesterday; I hope to finish 1005 today, after that you should be free to proceed
[10:30:31] super cool, thanks :)
[11:05:19] 1005 done! I'll wait for dcops to confirm that we are good
[11:05:24] after that the hosts will be ready
[11:05:41] Emperor: if you want to check 1005/2005, whether they have all the disks etc., please go ahead
[11:05:53] let's just wait for the green light from dcops before adding them to prod
[11:06:01] hopefully in the eu afternoon
[11:13:40] 2005 looks good
[11:17:57] sdv1 on 1005 was a bit sad, I'm trying a wipe-and-recreate
[11:18:39] seems better now.
[11:20:44] elukey: I'm happy to proceed once dcops say they're OK
[11:26:04] super
[12:02:28] Could I get reviews on 3 changes, please? They are:
[12:02:28] https://gitlab.wikimedia.org/repos/data_persistence/swift-ring/-/merge_requests/8 to teach the ring manager about codfw rack D3's network
[12:02:28] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1093884 to add the new nodes to swift::backends
[12:02:28] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1093885 to actually add them to the rings
[12:03:03] The first of these I can/will merge now once approved; the second and third are awaiting DCops confirming they're happy with the new nodes, but they could usefully get +1s now so I can +2/merge once we're good to go
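For context on what those three changes ultimately drive: adding backends to the rings amounts, at the Swift level, to registering each new disk as a weighted device and rebalancing. Below is a minimal sketch using the stock swift.common.ring.RingBuilder API rather than the WMF swift-ring manager in the linked repository; the IPs, port, region/zone numbers, weights, and file names are illustrative assumptions, not the values from the merge requests above.

```python
# Minimal sketch: add new backend devices to a Swift object ring and rebalance.
# Uses the upstream swift-ring-builder Python API; all concrete values below
# (IPs, port, zone/region, weights, file names) are illustrative assumptions.
from swift.common.ring import RingBuilder

# Load an existing builder file (a new ring would instead be created with
# RingBuilder(part_power, replicas, min_part_hours)).
builder = RingBuilder.load('object.builder')

# One entry per disk on each new backend host; weight is typically
# proportional to the disk's capacity.
new_devices = [
    {'region': 1, 'zone': 3, 'ip': '10.192.0.10', 'port': 6000,
     'device': 'sdb1', 'weight': 4000.0},
    {'region': 1, 'zone': 3, 'ip': '10.192.0.11', 'port': 6000,
     'device': 'sdb1', 'weight': 4000.0},
]
for dev in new_devices:
    builder.add_dev(dev)

# Reassign partitions across the enlarged device set, then write out the
# builder state and the ring file consumed by the proxies and backends.
builder.rebalance()
builder.save('object.builder')
builder.get_ring().save('object.ring.gz')
```

In production the rings are generated and distributed by the swift-ring repository and puppet changes linked above, so this only illustrates the underlying operation those changes perform.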
[17:19:08] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 111.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:25:48] FIRING: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (3m 20s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[17:25:48] FIRING: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (3m 20s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[17:35:48] RESOLVED: MysqlReplicationLag: MySQL instance db1206:9104@s1 has too large replication lag (1m 23s). Its replication source is db1163.eqiad.wmnet. - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[17:35:48] RESOLVED: [2x] MysqlReplicationLagPtHeartbeat: MySQL instance db1206:9104 has too large replication lag (1m 23s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=db1206&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLagPtHeartbeat
[17:39:08] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[18:32:51] jhathaway: DYT the two new thanos-be nodes are OK for use, please? e.lukey earlier thought we should get dcops signoff, but I see the codfw tickets are now closed
[18:57:20] Emperor: yes they should be
[18:57:32] given those tickets are closed
[19:15:57] cool. I'll try and get some code review tomorrow, then I can get them in service on Monday.
[19:16:10] thanks again :)
[19:19:27] of course
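On the earlier "sdv1 on 1005 was a bit sad, I'm trying a wipe-and-recreate" step: the log does not record the exact commands used, but resetting a misbehaving Swift data disk generically looks roughly like the sketch below. The device name, filesystem label, mount point, and the swift user/group are all assumptions for illustration.

```python
# Generic, hypothetical sketch of a "wipe and recreate" for a Swift data disk.
# Device, label, mount point, and the swift user/group are assumptions.
import os
import shutil
import subprocess

dev = '/dev/sdv1'
label = 'sdv1'
mountpoint = '/srv/swift-storage/sdv1'   # assumed mount layout

# Unmount if still mounted (ignore failure if it already isn't).
subprocess.run(['umount', dev], check=False)

# Wipe old filesystem signatures and create a fresh XFS filesystem.
subprocess.run(['wipefs', '-a', dev], check=True)
subprocess.run(['mkfs.xfs', '-f', '-L', label, dev], check=True)

# Remount and hand the empty filesystem back to the swift user.
os.makedirs(mountpoint, exist_ok=True)
subprocess.run(['mount', '-o', 'noatime', dev, mountpoint], check=True)
shutil.chown(mountpoint, user='swift', group='swift')
```

Once remounted, Swift's replication from the other backends repopulates the empty device over time, which matches the "seems better now" follow-up.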