[05:20:03] I am going to switchover es4 and es5 codfw master
[06:01:32] (MysqlReplicationLag) firing: MySQL instance es2024:9104 has too large replication lag (8m 38s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=es2024&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[06:01:42] ^ me
[06:06:32] (MysqlReplicationLag) resolved: MySQL instance es2024:9104 has too large replication lag (8m 45s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=es2024&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[08:37:30] codfw snapshot wrong_size 9 hours ago 1.5 TB -12.2 % The previous backup had a size of 1.7 TB, a change larger than 5.0%.
[08:37:37] s4 -> templatelinks
[08:37:41] \o/
[09:53:23] jynus: double checking, your side of things at https://phabricator.wikimedia.org/T316186 are done (knowing it is in your TO-DO is also good enough)
[09:54:32] no idea, I didn't touch any db this time because I was told Amir1 was doing it
[09:54:51] I think it should be done on backup sources?
[09:55:00] not backups themselves
[09:55:40] I did handle backup hosts at T316185, I updated our doc with that weeks ago
[09:55:50] I did not touch any db* or mariadb host
[09:57:25] Ah, I see
[09:58:30] Amir1: yes, backup sources then but I guess it needs coordination to make sure we are not breaking in-progress backups
[09:59:06] marostegui: the script avoids rebooting if there is a backup ongoing
[09:59:15] Ah ok cool
[09:59:40] stuff like https://gerrit.wikimedia.org/r/c/operations/software/+/830659/1/dbtools/auto_schema/rolling_restart.py#48
[09:59:53] and line 37
[10:00:42] right I see
[10:18:02] I am checking past pages
[10:18:21] I see some alerts from the 21 August about some dbs, but cannot remember what those are
[10:19:00] "db1132 #page/MariaDB Replica Lag: s1 #page" 10.6/no impact, maybe?
[10:19:43] it is from "Sun Aug 21 12:10:51 UTC 2022"
[10:25:43] no idea what that was, that's a long time ago
[10:27:27] that's ok, if it is not on incidents, probably nothing important
[10:33:20] Amir1: I am going to start a pilot doc for SRE oncall handovers. You don't have to write on it, as it has not yet formalized or automated, but I think it would help. However, I would be glad if you helped me.
[10:33:36] *been
[10:34:23] jynus: we haven't had any page so far, I'd be happy to help if there is a page or something like that
[10:34:31] yes, that's the point
[11:13:21] handling "DISK WARNING - free space: /srv/bacula 5897240 MB (5% inode=99%):" on backup2003
[11:27:02] jynus: I looked at docs but I couldn't find docs. So asking here. How can I check history of backups per table (in s4 in my case) as db queries or some export format (csv, etc.) I have seen it on the backup mon ui but it's ui, I need to run some analysis on them
[11:28:26] so custom report would need for now to run on the db, that should be the dbbackups db of m1, the tables are documented to some extent on MariaDB/Backups wikitech page
[11:29:03] https://wikitech.wikimedia.org/wiki/MariaDB/Backups#Monitoring_and_metadata_gathering
[11:29:09] Thanks.
[11:29:44] maybe for common tasks we can add a "Download in csv format" functionality
[11:30:28] for tables we won't be able to show them on public prometheus, but the other option would be to gather from that when an exporter is available
[11:30:36] (not yet)
[11:34:39] note there is only per-file stats, not per table, let me find because I may have some sql done in the past to aggregate that
[11:38:20] I got the data I wanted, now I'm doing analysis. Thanks!
[11:39:51] cool
[11:45:35] I did "lvextend -L+15T /dev/mapper/hwraid-backups; resize2fs /dev/mapper/hwraid-backups" to fix the issue: https://grafana.wikimedia.org/goto/mY5J0uM4z?orgId=1 after checking the size matched expected backup size
[12:04:47] FWIW (and please excuse the nitpicking) lvextend nowadays has a --resizefs flag too, to skip the second command
[12:05:12] nice, godog, I didn't know!
[12:05:50] yeah quite handy! can't remember when it got added
[12:05:51] probably it is a leftover from when extending xfs filesystems
[12:06:25] or does that also handle xfs_growfs ?
[12:06:39] (I will look at docs, don't worry)
[12:08:31] yeah it does the right thing depending on the filesystem
[12:08:37] it claims to use fsadm, which should be smart enough for each filesystem
[12:08:52] super useful top, godog
[12:09:19] *tip
[13:14:54] IME --resizefs _mostly_ works...
[13:15:28] mostly? 👀
[13:19:12] I've had some pain with it in the past, but it was long enough ago that I think I'm now just at the "use with care" stage with it...
[13:54:19] google docs is substituting ¯\_(ツ)_/¯ -> ¯\(ツ)/¯ :-(
[16:11:22] that's very...opinionated
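
Note on the 12:04:47 tip: the two commands run at 11:45:35 (lvextend followed by resize2fs) can probably be collapsed into one call using lvextend's --resizefs flag (short form -r). What follows is a minimal sketch, not what was actually run. The device path and the +15T increment come from the log; the LV, GROW and DRY_RUN variables and the grow_lv function are hypothetical names added for illustration. It only prints the command unless DRY_RUN=0 is set, and per the 13:14:54 remark the flag is best used with care, so checking the filesystem size afterwards seems prudent.

```shell
#!/bin/sh
# Sketch only: single-command form of the resize discussed in the log.
# LV, GROW, DRY_RUN and grow_lv are illustrative names, not from the log.
LV="${LV:-/dev/mapper/hwraid-backups}"   # device path quoted at 11:45:35
GROW="${GROW:-+15T}"                     # size increment quoted at 11:45:35
DRY_RUN="${DRY_RUN:-1}"                  # default: print, do not execute

grow_lv() {
    # --resizefs asks lvextend to also grow the filesystem on the volume,
    # replacing the separate resize2fs step.
    if [ "$DRY_RUN" = "1" ]; then
        echo "lvextend --resizefs -L $GROW $LV"
    else
        lvextend --resizefs -L "$GROW" "$LV"
    fi
}

grow_lv
```

To actually apply the change one would run the script as root with DRY_RUN=0, after confirming (as was done at 11:45:35) that the new size matches the expected backup size.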