[03:50:50] PROBLEM - MariaDB sustained replica lag on s2 on db1182 is CRITICAL: 4.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[03:54:50] PROBLEM - MariaDB sustained replica lag on s2 on db2175 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[03:57:50] RECOVERY - MariaDB sustained replica lag on s2 on db1182 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[04:00:50] RECOVERY - MariaDB sustained replica lag on s2 on db2175 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2175&var-port=9104
[11:00:54] marostegui: arnaudb the refactor of LoadMonitor has landed, gonna be deployed next week, it'll change how dbs are picked up: https://phabricator.wikimedia.org/T314020
[11:01:13] :o
[11:01:32] that means we'll be able to repool nodes faster?
[11:03:59] Amir1: I am out next week, so good luck XD
[11:06:12] arnaudb: nah, it's supposed to take care of a replica getting slow, but the side effects might include that our weight might need adjusting
[11:06:17] marostegui: :(((((((
[12:01:01] PROBLEM - MariaDB sustained replica lag on s2 on db1246 is CRITICAL: 3.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1246&var-port=9104
[12:01:59] RECOVERY - MariaDB sustained replica lag on s2 on db1246 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1246&var-port=9104
[13:29:10] urandom: o/ - I see some warnings related to Cassandra AQS TLS certs expiring (in 50+ days, nothing urgent). Could it be a good occasion to move them to PKI? (https://phabricator.wikimedia.org/T352647)
[13:36:01] elukey: it could, yeah
[13:36:19] though we probably don't need to tie it to that event
[13:37:12] I've been waiting for your return actually; I wanted you around before tackling that, but it's a KR in-waiting
[13:37:57] if you're fully back, and have your head above water, I can add it to next quarter :)
[13:40:21] definitely yes!
[13:45:32] the only thing that I am still not 100% sure about is how to set the right TLS bundle for clients connecting to AQS Cassandra
[13:45:40] since I assume that AQS 2.0 runs on k8s right?
[13:48:45] because in theory we'd need to create a bundle with: Root PKI cert and Root ca-manager (IIRC the name for the custom root in cassandra) certs
[13:49:08] and then deploy it in k8s, force the clients to use it
[13:49:22] and after that, we will be able to freely proceed
[13:49:47] in puppet we can create the bundle easily, in k8s it may need to be a one-off blurb to deploy
[13:50:03] (to replace it after the migration with the standard pki bundle that we use everywhere)
[13:50:11] elukey: cumin
[13:50:39] marostegui: don't spicerack your afternoon please
[13:51:30] should I maybe go ahead and read a cookbook?
[14:07:17] elukey: yes, aqs 2.0 is on k8s (and I remember now why I wanted to wait until you got back 😄)
[14:07:30] elukey: also, welcome back btw
[14:08:24] thanks! <3
[16:58:09] PROBLEM - MariaDB sustained replica lag on s2 on db1222 is CRITICAL: 4.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104
[16:58:09] PROBLEM - MariaDB sustained replica lag on s2 on db1156 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[16:58:09] PROBLEM - MariaDB sustained replica lag on s2 on db1155 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[16:58:11] PROBLEM - MariaDB sustained replica lag on s2 on db1182 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[16:58:23] PROBLEM - MariaDB sustained replica lag on s2 on db2107 is CRITICAL: 4.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2107&var-port=9104
[16:59:11] RECOVERY - MariaDB sustained replica lag on s2 on db1156 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1156&var-port=9104
[16:59:11] RECOVERY - MariaDB sustained replica lag on s2 on db1155 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1155&var-port=13312
[16:59:13] RECOVERY - MariaDB sustained replica lag on s2 on db1182 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1182&var-port=9104
[16:59:23] RECOVERY - MariaDB sustained replica lag on s2 on db2107 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2107&var-port=9104
[17:00:11] RECOVERY - MariaDB sustained replica lag on s2 on db1222 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1222&var-port=9104