[00:01:46] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[00:02:46] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 3.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[00:09:46] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 26.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[00:24:48] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[04:56:56] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 25.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[04:58:05] I am going to start s8 switchover
[04:58:56] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[05:51:56] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 103.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[05:57:56] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[07:43:28] These replags are weird
[07:45:32] those are dumps
[07:57:28] I asked arnaud to have those into account on the ticket
[08:56:59] I will start repooling es1025
[09:41:38] jynus: I was looking at zarcillo to refresh my memory which data is stored and I noticed something that might be a data error (or a wrong encoding on my client side). What do you see in the ipv6 column for: "select * from servers where fqdn = 'db1216.eqiad.wmnet';" ?
[09:42:18] I see it for few others, all the ones that have ipvs not NULL, fwiw
[09:46:09] select inet6_ntoa(ipv6) from servers where fqdn = 'db1216.eqiad.wmnet';
[09:47:12] https://dev.mysql.com/doc/refman/8.4/en/miscellaneous-functions.html#function_inet6-ntoa
[09:47:25] ahh ok they're encoded, thx
[09:48:02] there is no ip address type on mysql, unlike postgres
[09:48:49] yep yep, the v4 I saw it as integer and so didn't bother me
[09:49:20] "Because numeric-format IPv6 addresses require more bytes than the largest integer type, the representation returned by this function has the VARBINARY data type: VARBINARY(16)"
[10:25:02] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 47 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[10:29:02] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[10:49:06] volans: wanna still do a quick meetup this week?
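On the ipv6 column exchange at 09:41-09:49: a minimal Python sketch of why the raw column looks like garbage in a client and what inet6_ntoa() does with it. It assumes the column holds the packed 16-byte form of the address, as the quoted VARBINARY(16) note suggests. The helper names and the 2001:db8:: address (documentation range) are hypothetical and not taken from zarcillo; the helpers only approximate the SQL functions for plain IPv6 addresses.

```python
import ipaddress


def inet6_aton(text: str) -> bytes:
    """Pack a textual IPv6 address into its raw 16-byte form.

    Hypothetical stand-in for the SQL INET6_ATON() function.
    """
    return ipaddress.IPv6Address(text).packed


def inet6_ntoa(raw: bytes) -> str:
    """Turn 16 raw address bytes back into readable text.

    Hypothetical stand-in for the SQL INET6_NTOA() function.
    """
    return str(ipaddress.IPv6Address(raw))


# Example address from the documentation range, not a real host.
raw = inet6_aton("2001:db8::10")

# The raw value is binary, so a client printing it as text shows
# what looks like an encoding problem.
print(raw)

# Decoding restores the familiar notation.
print(inet6_ntoa(raw))
```

An IPv4 address fits in a 32-bit integer, which is why the v4 column reads as a plain number, while an IPv6 address needs 128 bits and is therefore kept as binary.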
[10:51:09] jynus: I'm happy to know what are your ideas/requests with regards to automation and DBs
[10:52:34] oh, I don't think I have nothing to provide in that front that the dbas cannot, it was more of a refreshed of database backups and maybe a few things?
[10:52:46] *refresher
[11:07:41] I am going to switch s6 codfw master
[11:18:07] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 23.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[11:18:46] ^ was that reported in the ticket finally arnaudb?
[11:19:08] For context, yesterday's conversation: [09:57:28] I asked arnaud to have those into account on the ticket
[11:20:07] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[11:29:01] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 89.2 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[11:30:00] ^ a full scan for 74M nice...
[11:30:02] I am creating a task for it
[11:33:44] I wonder why we have 2 candidate masters in s6...
[11:34:01] Ah I know why, I will fix it
[11:35:00] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 1 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[12:10:35] sorry I was afk eating
[12:10:46] how was the baguette?
[12:10:53] x)
[12:12:43] checking the ticket about db1206 → https://phabricator.wikimedia.org/T368098 it seems that it's been reported indeed, should I add something more?
[12:13:25] I'd probably add that it just caused lag again in production
[12:15:58] done!
[12:16:20] thank you
[12:38:11] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 26.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[12:41:11] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:28:13] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:36:14] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:55:16] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 53.6 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[13:56:35] arnaudb: hey we're gonna postpone today's switch maintenance until next Wed, July 10th
[13:56:42] hope that's not knocking you around too much
[14:00:31] topranks: ack, seen, rescheduled :) everything ok on my end
[14:00:52] arnaudb: thanks for your flexibility!
[14:01:16] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 3.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[14:01:21] as long as I don't need to perform limbo we should be fine
[14:07:43] I can’t promise anything but we’ll do our best :P
[14:13:01] * arnaudb fears
[14:48:58] * volans with the clinic duty hat... what should be the priority for T368898 ?
[14:51:37] i'd say medium
[14:52:24] thx
[14:58:06] volans: i think it's just a task to track the last incident but there's hardly anything we can do apart from trying to find what the cause was, which I think it's almost impossible other than just some theories. Ideally we should make an incident report I guess but I don't see much willingness to do so anyways, so probably we will end up closing that task
[14:58:30] ack, thanks
[16:17:21] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 102 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:17:30] Heya data persistence opsen. Question, who in your team is best to review technical quotations for the upcoming thanos-be ordering of a single host in both codfw and eqiad?
[16:17:58] I have the quotes ready and will escalate to sobanski for mgmt review since kofori is out but likely need an IC review as well
[16:18:12] and since it's a stand in manager approval i figured may as well cc the IC onto the reviews to save time
[16:27:21] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:39:21] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 20.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[16:41:21] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:21:21] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 11 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[17:22:21] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[18:40:25] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 79 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[18:46:25] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[19:59:25] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 26 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[20:02:25] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[20:43:27] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 11.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[20:46:27] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[21:47:29] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 15.4 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[21:49:29] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[22:13:29] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 11 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[22:16:29] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[22:27:29] PROBLEM - MariaDB sustained replica lag on s4 on db1248 is CRITICAL: 10 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[22:29:29] RECOVERY - MariaDB sustained replica lag on s4 on db1248 is OK: (C)10 ge (W)5 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1248&var-port=9104
[23:30:29] PROBLEM - MariaDB sustained replica lag on s1 on db1206 is CRITICAL: 47.8 ge 10 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104
[23:34:29] RECOVERY - MariaDB sustained replica lag on s1 on db1206 is OK: (C)10 ge (W)5 ge 2.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1206&var-port=9104