[09:14:20] thank you both
[09:25:06] Thanks marostegui
[09:26:43] Amir1: are you on db1170's idrac?
[09:26:59] not anymore
[09:27:16] should I do more than getsel to get logs?
[09:27:56] I cannot log in :(
[09:29:23] I think I logged in again
[09:29:26] Amir1: let's pause the reboots for whichever section db1170 is in
[09:29:59] yup, I don't think I'm gonna reboot anymore on Friday in case any more hosts do not come back (that was the plan)
[09:30:12] finally worked for me
[09:30:34] I still don't know how to exit ipmi console
[09:33:08] Amir1: I might have been able to get db1170 back
[09:33:09] one sec
[09:33:30] oh fancy stuff
[09:33:41] what kind of wizardry is that
[09:34:40] [09:34:32] marostegui@db1170:~$
[09:34:45] wow
[09:34:51] Starting mariadb
[09:35:06] I did ipmitool -I lanplus -H "$HOST.mgmt.$DC.wmnet" -U root -E chassis power cycle
[09:35:16] I did a hard reset
[09:36:20] Can you write the exact command so I can document it? I wasn't aware of it :D
[09:36:23] How is db1170 7h behind??
[09:36:40] serveraction hardreset
[09:36:48] marostegui: the script hit the issue when I was asleep and didn't continue afterwards
[09:36:54] ah
[09:36:58] I thought it just happened
[09:36:59] Cool
[09:36:59] which is good. I'm happy about that.
[09:37:16] Can you take care of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/915712 and repooling once it has caught up?
[09:37:22] yes
[09:37:25] thanks
[11:19:43] marostegui: given a ticket is filed, ok for me to disable the ipmiseld alert from firing for, e.g. 5 days?
[11:21:09] yeah thanks jynus
[11:21:52] I saw it yesterday, but only once and it recovered, so I didn't give it much importance
[11:22:12] but now I want to avoid spamming the other channel so I don't miss other alerts
[11:26:00] I think it's done: https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed&q=team%3Ddata-persistence&q=%40receiver%3Ddata-persistence-irc-feed
[16:57:26] PROBLEM - MariaDB sustained replica lag on s8 on db2181 is CRITICAL: 67.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2181&var-port=9104
[16:57:26] PROBLEM - MariaDB sustained replica lag on s8 on db2167 is CRITICAL: 86.75 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=13318
[16:57:47] See -sre and https://phabricator.wikimedia.org/T336072 regarding s8
[16:57:50] PROBLEM - MariaDB sustained replica lag on s8 on db2164 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2164&var-port=9104
[16:57:52] PROBLEM - MariaDB sustained replica lag on s8 on db2161 is CRITICAL: 4.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2161&var-port=9104
[16:57:52] PROBLEM - MariaDB sustained replica lag on s8 on db2162 is CRITICAL: 4.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2162&var-port=9104
[16:58:44] RECOVERY - MariaDB sustained replica lag on s8 on db2181 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2181&var-port=9104
[16:59:08] RECOVERY - MariaDB sustained replica lag on s8 on db2164 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2164&var-port=9104
[16:59:10] RECOVERY - MariaDB sustained replica lag on s8 on db2161 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2161&var-port=9104
[16:59:10] RECOVERY - MariaDB sustained replica lag on s8 on db2162 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2162&var-port=9104
[16:59:56] RECOVERY - MariaDB sustained replica lag on s8 on db2167 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2167&var-port=13318