[08:15:58] all ok to start using cumin1001?
[08:26:18] $ uptime
[08:26:18] 08:26:14 up 23 min, 5 users, load average: 0.12, 0.07, 0.02
[08:26:19] looks so!
[08:26:21] moritzm: ^
[08:32:47] yeah, all fine to resume!
[08:33:01] \o/ thanks
[08:33:51] cool
[08:48:33] I am going to switch masters in es1, es2 and es3 in codfw, it is a noop as they are standalone
[10:08:09] All RO es in codfw rebooted
[10:17:02] jynus: just fyi, I have restarted es1, es2 and es3 hosts in codfw, there were no backups running
[10:17:21] yeah, those only run every 5 years 0:-)
[10:17:39] I am going to go for es4 and es5 now
[10:17:44] But I won't do all the hosts today
[10:17:57] is there anything running at the moment? (I don't see anything, but just double checking)
[10:18:01] consider doing the backup ones first
[10:18:19] so es2025 and es2022
[10:18:22] they run on tuesday+
[10:18:56] I will do those two now then
[10:18:58] Thanks
[10:19:19] es2022, es2025, es1025, es1022
[10:19:27] cool
[10:19:27] thanks
[10:19:52] sadly, I have to say tuesday+ because last time they took 27 hours to finish
[10:20:02] but they are finished now, right?
[10:20:16] yep: http://localhost:8000/dbbackups/jobs/?search=es
[10:20:50] cool, that matches what I saw on dbbackups :)
[10:22:54] thank you a lot for communicating!
[10:30:32] (MysqlReplicationLag) firing: MySQL instance es1022:9104 has too large replication lag (6m 31s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=es1022&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:30:42] ^ me
[10:35:32] (MysqlReplicationLag) firing: (2) MySQL instance es1022:9104 has too large replication lag (7m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:40:32] (MysqlReplicationLag) resolved: (2) MySQL instance es1022:9104 has too large replication lag (7m 37s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[11:07:03] bacula backups are finally in a good, expected state (I don't expect more alerts - for the eqiad backups)
[11:09:38] db1124 has been complaining (alerts are disabled) about its read-only status for a week. Is the check bad or is it the config?
[11:09:57] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=db1124&service=MariaDB+read+only+test-s4
[11:20:30] ah, that's when I installed the new package
[11:20:32] I will get that fixed
[11:20:35] it is the testing cluster
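
For context on the MysqlReplicationLag alerts above: the check watches how far a replica has fallen behind its master (the alert instance "es1022:9104" points at the mysqld exporter port the monitoring scrapes), and the linked troubleshooting page describes depooling a lagging replica. A minimal sketch of checking lag by hand on the affected host, assuming a local MariaDB client with credentials available (e.g. via ~/.my.cnf); this is illustrative, not the alerting pipeline itself:

    # On the replica (e.g. es1022): show how far behind the master it is.
    # Seconds_Behind_Master is NULL when replication is stopped.
    mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'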
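
Similarly, the "MariaDB read only test" check complaining on db1124 compares the server's actual read_only flag against what its role expects (replicas read-only, only the active master writable); a mismatch with the configured expectation is what the check flags. A hand-check sketch under the same assumptions as above:

    # A replica should normally report read_only = 1;
    # only the active master runs with read_only = 0.
    mysql -e "SELECT @@global.read_only;"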
[12:38:42] I am going to switch masters in es1, es2 and es3 in eqiad, it is a noop as they are standalone
[13:15:10] marostegui: I'm back from another joyful episode with my dentist, I'm planning to run schema changes on s1, s4, s6 and s7. Are you doing anything on them?
[13:15:33] Amir1: Nope, but let's not depool more than one host per section, see -sre-private
[13:15:38] We have two of them depooled at the moment
[13:17:29] I would like to get those two repooled if they are not running anything
[13:18:32] oh interesting, I'll hold off on anything on s1 or s4 then
[13:18:43] possibly s7
[13:18:57] yes
[13:19:08] so we have that host in s7 depooled, db1136, are you doing anything there?
[13:28:08] not me
[13:28:19] I haven't run anything on cumin yet
[13:29:26] It's the old s7 master
[13:29:26] sigh
[13:29:36] I'll repool it
[13:30:50] repooling
[13:31:26] marostegui: the old master of s1 is still not pooled, I can repool it now
[13:31:36] ugh, no, s3 master
[13:32:02] and a host from s7
[13:32:19] db1136, I'm repooling that back
[13:32:44] only the old s3 master is left
[13:35:38] repooling the old s3 master back as well, just to be safe
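
The depooling and repooling discussed above is driven by dbctl, the database configuration tool used on the Wikimedia cluster management hosts. A hedged sketch of the basic sequence, following the dbctl documentation on Wikitech (host name taken from the log; flags and commit messages are illustrative, and in practice repooling is usually done gradually rather than straight to full weight):

    # Take a replica out of rotation, e.g. before maintenance:
    dbctl instance db1136 depool
    dbctl config commit -m "Depool db1136 for maintenance"

    # Put it back afterwards; -p sets the pooled percentage,
    # so a cautious repool can step up in stages:
    dbctl instance db1136 pool -p 100
    dbctl config commit -m "Repool db1136"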