[04:41:06] 10Data-Persistence-Backup: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10Marostegui)
[04:42:11] 10Data-Persistence-Backup: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10Marostegui) p:05Triage→03Unbreak! Setting this to unbreak now as I believe this host isn't a test host but a real, in-use one.
[05:12:35] 10DBA, 10Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2113.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/2021060...
[05:38:22] 10DBA, 10Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2113.codfw.wmnet'] ` and were **ALL** successful.
[05:38:47] 10DBA, 10Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (10Marostegui)
[05:42:43] 10DBA, 10Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (10Marostegui) Candidate master in codfw done (db2113) - checking its tables before proceeding with the master. @jcrespo this can probably be pushed: https://gerrit.wikimedia.org/r/693142 anytim...
[05:46:04] 10Blocked-on-schema-change, 10DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (10Marostegui)
[05:46:40] 10Blocked-on-schema-change, 10DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (10Marostegui) s6 is fully done - waiting a few days before proceeding with the next section to make sure no optimizer changes are present (I couldn't find anything related on...
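The dbprov2003 full-disk alert above is the kind of condition a simple pre-flight guard can catch before a backup or transfer starts. A minimal sketch, assuming GNU coreutils `df`; the function name and the 85% default threshold (borrowed from the dbstore1004 ticket title) are illustrative, not the actual alerting used on these hosts:

```shell
# Warn when a mount point's usage crosses a threshold percentage.
# Sketch only: check_space and the 85% default are illustrative.
check_space() {
  local mnt=$1 max_pct=${2:-85}
  local used
  # df --output=pcent prints e.g. " 42%"; strip everything but the digits
  used=$(df --output=pcent "$mnt" | tail -n1 | tr -dc '0-9')
  if [ "$used" -ge "$max_pct" ]; then
    echo "WARN: $mnt is ${used}% used (threshold ${max_pct}%)"
    return 1
  fi
  echo "OK: $mnt is ${used}% used"
}
```

Run as e.g. `check_space /srv 85` before launching a large transfer; the non-zero exit status lets it short-circuit a `&&` chain.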
[05:59:20] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[05:59:57] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) Replication positions: {P16309}
[06:00:34] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[06:01:34] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) ` root@dbstore1007:/srv# sudo lvextend -L+1100G /dev/mapper/tank-data && sudo xfs_growfs /srv Size of logical volume tank/data changed from <7.5...
[06:01:49] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) Transfer between dbstore1004 and dbstore1007 started
[06:20:20] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) @razzi dbstore1007 still needs to get the proper FW rules (https://gerrit.wikimedia.org/r/c/operations/homer/public/+/697704), as it cannot reach...
[06:34:56] 10Blocked-on-schema-change, 10DBA: Schema change for adding oi_timestamp on oldimage table - https://phabricator.wikimedia.org/T284221 (10Marostegui)
[08:31:38] 10Data-Persistence-Backup: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10jcrespo) p:05Unbreak!→03High Fixed. This was caused by a temporary workaround for T283995. Happily, this is why we also have backup redundancy on eqiad - backups continued there unaffected. I will see wh...
[08:45:38] 10Data-Persistence-Backup, 10database-backups: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10jcrespo)
[08:46:47] 10Data-Persistence-Backup, 10database-backups: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10jcrespo) I am waiting for a successful run of the failed backups before closing.
[09:14:41] 10DBA, 10MediaWiki-Parser, 10Performance-Team, 10Parsoid (Tracking), 10Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (10Kormat) Run finished at 2021-06-05T14:30. Running optimize over all pc* tables now.
[09:37:07] hello, db1125 seems to have a normal puppet role (mariadb::core) but in Netbox its state is planned and not staged/active
[09:37:34] hmm. did i miss a step
[09:37:40] yep, i did.
[09:37:41] 10DBA, 10Data-Services: Prepare and check storage layer for banwikisource - https://phabricator.wikimedia.org/T284390 (10LSobanski) p:05Triage→03Medium Thanks, let us know when the database is created, so we can sanitize it.
[09:38:04] :)
[09:38:33] oh. actually i missed more than one. i forgot about getting the host relabelled, too.
[09:38:47] set the netbox state to 'staged' for now
[09:40:13] ack, thx
[09:41:38] ah hah. the relabelling the first time was cancelled \o/ https://phabricator.wikimedia.org/T283300
[09:46:52] you are welcome kormat
[09:47:08] marostegui: :)
[09:47:19] 10DBA: Re-image (rename) dbstore1006 into db1125 - https://phabricator.wikimedia.org/T284128 (10Kormat) Re-labelling not necessary, as it wasn't re-labelled away from db1125 in the first place: T283300 Machine state set to 'active' in netbox.
[09:48:46] volans: thanks for catching that :)
[09:49:19] it was a netbox report that caught it :)
[09:49:23] yw
[09:49:58] 10DBA, 10Orchestrator: orchestrator: Upgrade to v3.2.4 (ish) - https://phabricator.wikimedia.org/T275784 (10LSobanski) >>! In T275784#7133984, @Marostegui wrote: > https://github.com/openark/orchestrator/releases/tag/v3.2.5 > > This version does include our patch. > I think we've never upgraded orchestrator s...
[09:55:14] 10DBA, 10Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (10Kormat)
[10:01:13] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[10:09:38] 10Data-Persistence-Backup, 10database-backups, 10Patch-For-Review, 10cloud-services-team (Kanban): Use mariabackup instead of xtrabackup for galera backups? (Or possibly for all maria backups?) - https://phabricator.wikimedia.org/T284157 (10jcrespo) >>! In T284157#7135507, @Andrew wrote: > So that leaves...
[10:10:47] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[10:21:37] 10DBA, 10Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts: ` ['db1157.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20210607102...
[10:30:01] https://twitter.com/ShlomiNoach/status/1401785894064041984
[10:46:05] 10DBA, 10Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1157.eqiad.wmnet'] ` and were **ALL** successful.
[11:56:14] 10DBA, 10Patch-For-Review: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131 (10Kormat) db1157 upgraded to buster. Running mysqlcheck now.
[12:27:56] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) This host needs ipv6 dns to be deleted from netbox
[12:33:02] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) >>! In T283125#7138250, @Marostegui wrote: > This host needs ipv6 dns to be deleted from netbox Done
[12:33:49] Hello, not sure if I should be asking it here or in -analytics, but replication between prod wikimaniawiki and dbstore1004 looks to be broken. See paste: https://www.irccloud.com/pastebin/S5u9dqrS/
[12:34:03] urbanecm: yes, dbstore1004 was stopped to clone dbstore1007
[12:34:25] urbanecm: it is now catching up, as the transfer just finished
[12:34:35] ah, ok :). So not an actual issue. Thanks marostegui
[12:34:41] :)
[13:20:39] at the end of my day I will write the status / things I will ask you to keep an eye on, as notes on a relevant ticket
[13:27:15] thanks jynus
[13:27:16] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) Host added to tendril and zarcillo. Set to active on Netbox
[13:28:40] jynus, kormat: did either of you depool db1157, or was it me? XD
[13:28:45] I cannot remember if I did it
[13:29:00] if it is dbctl, it won't be me, as I barely know how to use it
[13:29:08] XDDD
[13:29:21] I only read the manual when there is an outage
[13:29:24] 10:28 kormat: reimaging db1157 T283131
[13:29:25] marostegui: it's me. i reimaged it
[13:29:26] T283131: Upgrade s3 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283131
[13:29:31] yep, thanks kormat!
[13:29:42] np :)
[13:32:23] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) @razzi transfer has finished and I have configured replication. As soon as you push the new firewall rules, it will start catching up automatically.
[14:14:30] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[14:18:08] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[14:19:55] 10DBA, 10Patch-For-Review: Upgrade s5 to Debian Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T283235 (10Marostegui) db2113 tables checked, all clean.
[14:20:43] jynus: on s5 codfw, the candidate master is done; only the master is pending, whenever you tell me it is ok to proceed (it is fine to wait a week)
[14:47:18] you can proceed now if you want - there is no hard requirement
[14:48:23] I am running a backup now, so I cannot do it just now, but will do when finished
[14:48:58] no rush, I am not planning on reimaging the master now, but maybe tomorrow if you give me the green light
[14:49:07] ok then
[14:49:23] can you rebase https://gerrit.wikimedia.org/r/c/operations/puppet/+/693142 ?
[14:49:32] I can push it tomorrow if you like (as you are off)
[14:51:01] sure
[14:51:34] note I may or may not also push https://gerrit.wikimedia.org/r/c/operations/puppet/+/698506
[14:51:42] but at least leave it ready
[14:51:46] Sure
[14:51:58] I can push it if needed during the week
[14:52:18] I will see as everything falls into place
[14:53:14] sure
[15:13:52] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[15:21:08] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[15:21:19] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10razzi) @Marostegui new firewall rules are pushed, thanks for the update on your end.
[15:23:41] razzi: does puppet need to run somewhere?
[15:24:17] marostegui: not that I know of, I ran homer on cumin to apply the firewall change
[15:24:53] razzi: dbstore1007 isn't able to reach db1122.eqiad.wmnet 3306
[15:27:07] ack marostegui, not sure why that would be though
[15:29:40] razzi: maybe elukey remembers, from the other hosts
[15:31:29] can I be of any help?
[15:32:49] volans: have you ever?
[15:33:52] I'm always successful in being disruptive ;)
[15:34:01] :D
[15:34:48] volans: basically, after https://gerrit.wikimedia.org/r/697704 is there anything else needed to make the fw changes effective?
[15:35:54] razzi: on which devices did you run homer?
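The dbstore1007 → db1122:3306 reachability check being discussed here was done interactively (a `telnet` paste appears on the task below); the same probe can be scripted. A sketch using bash's built-in `/dev/tcp` redirection, assuming bash and coreutils `timeout` are available; the function name is illustrative:

```shell
# Probe TCP reachability of host:port, exit 0 on success.
# Equivalent to an interactive `telnet db1122.eqiad.wmnet 3306` check,
# but usable from scripts and cron.
can_reach() {
  local host=$1 port=$2 timeout_s=${3:-3}
  # /dev/tcp is a bash feature, not a real device; the connect either
  # succeeds within timeout_s seconds or the command exits non-zero
  timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Usage: `can_reach db1122.eqiad.wmnet 3306 && echo reachable`, e.g. to confirm a firewall change took effect before reconfiguring replication.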
[15:38:41] 10DBA, 10Data-Services: Prepare and check storage layer for dagwiki - https://phabricator.wikimedia.org/T284456 (10LSobanski) p:05Triage→03Medium Thanks, let us know when the database is created, so we can sanitize it.
[15:39:16] 10Blocked-on-schema-change, 10DBA: Rename name_title index on page to page_name_title - https://phabricator.wikimedia.org/T284375 (10LSobanski) p:05Triage→03Medium
[15:41:53] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) As mentioned on IRC, there might be something else needed as it cannot reach its master yet: ` root@dbstore1007:~# telnet db1122.eqiad.wmnet 3306...
[15:50:47] volans: Homer run completed successfully on 2 devices: ['cr1-eqiad.wikimedia.org', 'cr2-eqiad.wikimedia.org']
[15:54:31] razzi: ack, and you want to connect in which direction?
[15:55:10] because that's the opposite direction to what manuel stated above; also, port 3306 is not in that list AFAICT
[15:57:10] Also, is dbstore1007 in the same vlan as dbstore1004?
[15:57:56] analytics1-d-eqiad
[15:57:59] https://netbox.wikimedia.org/dcim/interfaces/18116/
[15:58:36] while dbstore1004 is in private1-b-eqiad
[15:58:36] https://netbox.wikimedia.org/dcim/interfaces/16412/
[15:58:45] so I guess that's an issue too
[15:58:46] so no :)
[15:58:54] according to netbox
[15:59:00] Let me paste that on the task too
[15:59:44] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) @razzi some more things to double-check: ` [17:57:10] <@marostegui> Also, is dbstore1007 in the same vlan as dbstore1004? [17:57:56] an...
[16:09:31] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10elukey) I am not sure if it is going to be a nightmare or not, but to avoid wiping the copy that Manuel did between 1004 and 1007 today with a reimage we coul...
[16:26:42] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10Papaul) I spoke with the HP engineer last week; he said that it is true that CPU1 is bad, but it might also be the pin on the main board, so he will be s...
[16:36:12] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10jcrespo) Hey, Papaul, there is no rush (although it is a bit annoying on HP's side), but do you think the initial ETA of today will be delayed? Please a...
[16:37:31] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Marostegui) We can reimage without wiping /srv if needed, too
[16:39:56] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10Papaul) @jcrespo I have no ETA for you for now, since HP already sent the case to the dispatch (third party UniSys), so someone should contact me. it is a...
[16:43:07] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10jcrespo) @Papaul, thanks, that is already useful info for handling the db status, and all I needed! Please contact @Marostegui or @kormat if there are n...
[16:43:52] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10Papaul) okay
[16:53:29] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10elukey) The reimage is surely good, but I think that we'd need to fix the ips manually anyway in netbox first. @Volans do you have suggestions about what's be...
[16:58:55] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[17:01:23] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[17:02:39] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[17:06:19] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[17:15:11] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[17:20:11] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[17:22:44] 10Data-Persistence-Backup, 10database-backups, 10Patch-For-Review, 10cloud-services-team (Kanban): Use mariabackup instead of xtrabackup for galera backups? (Or possibly for all maria backups?) - https://phabricator.wikimedia.org/T284157 (10jcrespo) Thanks for your time today, please continue with the wor...
[17:27:36] 10Data-Persistence-Backup, 10Data-Services, 10bacula, 10database-backups, 10cloud-services-team (Kanban): migrate clouddb backups (openstack) from the old mysqldump system to the new wmfbackups (mydumper) - https://phabricator.wikimedia.org/T284483 (10jcrespo)
[17:27:47] 10Data-Persistence-Backup, 10Data-Services, 10bacula, 10database-backups, 10cloud-services-team (Kanban): migrate clouddb backups (openstack) from the old mysqldump system to the new wmfbackups (mydumper) - https://phabricator.wikimedia.org/T284483 (10jcrespo) p:05Triage→03Medium
[17:29:05] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[17:29:34] 10Data-Persistence-Backup, 10Data-Services, 10bacula, 10database-backups, 10cloud-services-team (Kanban): migrate clouddb backups (openstack) from the old mysqldump system to the new wmfbackups (mydumper) - https://phabricator.wikimedia.org/T284483 (10jcrespo) We will likely start working on this next qu...
[17:34:43] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[17:36:22] 10DBA, 10Data-Persistence-Backup, 10ops-codfw: db2100 rebooted, mysqld alerted after to say it hadn't started - https://phabricator.wikimedia.org/T283995 (10jcrespo) @Papaul I've shut down this host and downtimed it until the 16th (when I am back) so it can be serviced at any time without requiring coordinati...
[17:36:58] 10DBA, 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: dbstore1004 85% disk space used. - https://phabricator.wikimedia.org/T283125 (10Volans) @elukey what do you need to change, just the vlan and hence the IP? Ping me tomorrow and we can do it together.
[17:40:05] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[17:52:50] 10Data-Persistence-Backup, 10database-backups, 10Patch-For-Review: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10jcrespo) So this is the current state of things: * dbprov2003 has [[ https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=12&orgId=1&var-server=dbprov2003...
[17:53:17] 10Data-Persistence-Backup, 10database-backups, 10Patch-For-Review: dbprov2003 full disk space - https://phabricator.wikimedia.org/T284415 (10jcrespo) CC @Kormat ^
[18:04:47] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[18:15:29] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[18:42:17] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[18:46:07] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 65.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[21:09:29] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[21:27:15] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 3.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[21:32:35] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[21:55:51] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 5.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[22:03:05] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[22:12:05] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[22:55:23] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[23:00:45] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[23:11:31] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[23:13:19] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
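The recurring pc2007/pc2010 alerts in this log compare sustained replica lag against a warning threshold of 1 and a critical threshold of 2 (the `(W)1`/`(C)2` in the recovery text, with `ge` meaning greater-or-equal). A minimal sketch of that comparison; the function name and the awk-based float compare are illustrative, not the actual Icinga check:

```shell
# Classify replication lag against warn/crit thresholds, mirroring the
# "(C)2 ge (W)1 ge <lag>" semantics of the alerts above. awk handles the
# fractional seconds (e.g. 0.6) that shell integer comparison cannot.
lag_state() {
  local lag=$1 warn=${2:-1} crit=${3:-2}
  awk -v l="$lag" -v w="$warn" -v c="$crit" 'BEGIN {
    if (l >= c)      print "CRITICAL"
    else if (l >= w) print "WARNING"
    else             print "OK"
  }'
}
```

So `lag_state 2` is already CRITICAL (the comparison is inclusive), which is why several of the alerts above fire at exactly "2 ge 2".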