[01:08:58] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:14:08] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[02:44:43] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Legoktm) Hi, the GitLab backup cron is failing: ` Cron /usr/bin/gitlab-backup create CRON=1 STRATEGY=copy GZIP_RSYNCABLE=yes SKIP=bu...
[03:24:51] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) The installation has not been completed yet, the server has been brought down intentionally, backup failures are expected at the...
[03:52:46] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[03:56:14] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[04:20:14] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Legoktm) >>! In T274463#7118272, @Sergey.Trofimovsky.SF wrote: > The installation has not been completed yet, the server has been brought down intentio...
[04:24:07] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) I believe this is going to be finished today, you should not get more of these, sorry about that.
[05:42:12] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[05:50:56] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Marostegui)
[05:50:56] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[05:51:06] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Marostegui) p:Triage→Medium
[05:51:11] kormat: could you take? ^
[06:03:05] Data-Persistence-Backup, Wikimedia-Mailing-lists: The Great Clean Up of Mailman2 - https://phabricator.wikimedia.org/T282303 (jcrespo) It should have run already, can you check? ` Termination: Restore OK `
[06:40:25] DBA, SRE, ops-codfw: Degraded RAID on db2107 - https://phabricator.wikimedia.org/T282072 (Marostegui) Open→Resolved RAID back to optimal ` root@db2107:~# megacli -LDInfo -Lall -aALL Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name : RAID Level...
[06:46:17] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[06:47:47] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) s5 eqiad [x] dbstore1003 [] db1161 [] db1154 [x] db1150 [x] db1145 [x] db1144 [x] db1130 [x] db1113 [x] db1110 [] db1100 [x] db1096 [] clouddb1021 [] clouddb1020 []...
[06:48:11] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) s5 eqiad [x] dbstore1003 [] db1161 [] db1154 [x] db1150 [x] db1145 [x] db1144 [x] db1130 [x] db1113 [x] db1110 [] db1100 [x] db1096 [] clouddb1021 [] clouddb1020 []...
[06:48:14] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) s5 eqiad [x] dbstore1003 [] db1161 [] db1154 [x] db1150 [x] db1145 [x] db1144 [x] db1130 [x] db1113 [x] db1110 [] db1100 [x] db1096 [] clouddb1021 [] clouddb1020 []...
[06:50:47] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[06:53:16] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[06:53:21] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[06:53:24] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[07:15:13] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[07:15:19] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[07:15:23] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[07:35:31] Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) I am moving s2 from db2098 to db2097, so I can upgrade db2098 to buster, and setup there s7 and s8.
[07:35:40] Data-Persistence-Backup, Goal: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) a:jcrespo
[07:58:17] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[08:00:05] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[08:06:48] DBA, SRE, ops-codfw, serviceops: codfw: Relocate servers in 10G racks - https://phabricator.wikimedia.org/T281135 (Marostegui) Everything is done from either dbs and backup hosts side of things. Removing DBA tag
[08:20:57] marostegui: for you? i guess..... :*
[08:21:06] :***
[08:21:07] thanks
[08:21:32] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Kormat) a:Kormat
[08:47:05] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[08:47:08] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[08:47:10] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[08:56:23] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui) s7 eqiad [x] dbstore1003 [x] db1181 [x] db1174 [x] db1170 [x] db1158 [x] db1155 [x] db1136 [x] db1127 [x] db1116 [x] db1101 [x] db1098 [x] clouddb1021 [x] clouddb101...
[08:56:25] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui) s7 eqiad [x] dbstore1003 [x] db1181 [x] db1174 [x] db1170 [x] db1158 [x] db1155 [x] db1136 [x] db1127 [x] db1116 [x] db1101 [x] db1098 [x] clouddb1021 [x] clouddb101...
[08:56:27] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui) s7 eqiad [x] dbstore1003 [x] db1181 [x] db1174 [x] db1170 [x] db1158 [x] db1155 [x] db1136 [x] db1127 [x] db1116 [x] db1101 [x] db1098 [x] clouddb1021 [x] clouddb101...
[09:07:20] Blocked-on-schema-change, DBA: Schema change for dropping default of user_touched - https://phabricator.wikimedia.org/T282373 (Marostegui)
[09:07:39] Blocked-on-schema-change, DBA: Schema change for dropping default of page_touched - https://phabricator.wikimedia.org/T282372 (Marostegui)
[09:08:07] Blocked-on-schema-change, DBA: Schema change for dropping default of ar_timestamp - https://phabricator.wikimedia.org/T282371 (Marostegui)
[09:30:33] Blocked-on-schema-change, DBA: Schema change for renaming page_timestamp index on revision table to rev_page_timestamp - https://phabricator.wikimedia.org/T283499 (Kormat) a:Kormat→None
[09:30:47] Blocked-on-schema-change, DBA: Schema change for making cuc_id in cu_changes unsigned - https://phabricator.wikimedia.org/T283093 (Kormat) a:Kormat→None
[09:42:15] DBA, MediaWiki-Parser, Performance-Team, Parsoid (Tracking), Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Optimize of pc1007 (and replicas) finished. Disk space usage went from 3.91TB to 2.3TB. {F...
[09:47:49] marostegui: mariadb upgraded on pc1007 as requested
[10:15:19] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[10:17:07] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[10:19:22] kormat: thanks!
[10:33:44] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Kormat) db2082 is db2094:s8's master: ` root@db2082.codfw.wmnet[(none)]> stop slave; Query OK, 0 rows affected (0.036 sec) root@db2082.codfw.wmnet[(none)]> show master status; +-------------------+-----------...
[10:35:34] I saw some backup sources under maintenance- wanted to check if there was any blocker for me to proceed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/696027 ?
[10:35:56] (e.g. if someone was doing reboots or replication changes, etc.)?
[10:36:14] jynus: not me, but maybe it's a schema change marostegui is running
[10:36:21] ah, could be
[10:36:34] will check when to proceed
[10:36:42] jynus: you are good to go!
[10:36:42] as otherwise the schema may get undone
[10:36:59] marostegui, which section were you altering?
[10:37:12] I worry that by recovering a backup I may undo your change
[10:37:24] jynus: Many of them, but there are no schema changes running now on any backup source
[10:37:43] ok, will tell you about the new sections after I set them up
[10:37:51] so you can check they are as expected
[10:37:59] if you set up a new section it will get the new schema change
[10:38:01] as the backups are from yesterday
[10:38:10] Ah, then it needs checking yep
[10:38:14] let me know once you've got it
[10:38:18] I cced you on the change
[10:38:20] is it eqiad?
[10:38:25] but I will ping you when done
[10:38:26] codfw
[10:38:33] s2 I see?
[10:38:37] yes
[10:38:43] ok, I will check once done
[10:38:43] but new s7 & s8
[10:38:50] although I haven't touched those yet
[10:38:58] *but also
[10:39:08] no worries, just let me know once you are done and I can check
[10:39:12] thanks
[10:40:23] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[10:41:02] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Kormat) Running: `sudo transfer.py --type file --no-compress --no-encrypt --no-checksum db2082.codfw.wmnet:/srv/sqldata db2094.codfw.wmnet:/srv/sqldata.s8`
[10:47:33] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:05:18] DBA, Beta-Cluster-Infrastructure, Wikimedia-Rdbms, Epic, Sustainability (Incident Followup): Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255 (hashar) For CI that got done by setting TRADITIONAL in DevelopmentSettings.php https://gerrit.wikimedia.org/r/c/mediawik...
[11:29:36] Data-Persistence-Backup, Wikimedia-Mailing-lists: The Great Clean Up of Mailman2 - https://phabricator.wikimedia.org/T282303 (Ladsgroup) ` root@lists1001:/var/tmp/bacula-restores/var/lib/mailman/archives/private/cloud-announce.mbox# cmp cloud-announce.mbox /var/lib/mailman/archives/private/cloud.mbox/clo...
[11:33:47] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:38:51] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[11:41:30] Data-Persistence-Backup: Backup alert email notification - https://phabricator.wikimedia.org/T283017 (LSobanski) This turned into a much bigger scope than my original intention but it's a good discussion to have. I am now thinking that my ask was not well defined and needs to be adjusted, especially given my...
[11:41:58] Data-Persistence-Backup: Backup alert proactive notification - https://phabricator.wikimedia.org/T283017 (LSobanski)
[13:11:09] PROBLEM - MariaDB sustained replica lag on pc2007 is CRITICAL: 364.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[13:12:02] Data-Persistence-Backup, Wikimedia-Mailing-lists: The Great Clean Up of Mailman2 - https://phabricator.wikimedia.org/T282303 (jcrespo) The backups ran yesterday, could it have changed since then? Is there a human readable way to see what changed?
[13:17:01] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[13:17:58] Data-Persistence-Backup: Backup alert proactive notification - https://phabricator.wikimedia.org/T283017 (jcrespo) As a "complicated" topic- do you mind talking in our 1:1 about the challenges of it- not just doing as requested (I don't mind spamming you with notifications if you really want them, :-D), but...
[13:23:41] DBA, MediaWiki-Parser, Performance-Team, Parsoid (Tracking), Patch-For-Review: purgeParserCache.php should not take over 24 hours for its daily run - https://phabricator.wikimedia.org/T282761 (Kormat) Current status: - pc1 is repooled and back in service. - pc1010 is now in pc2, and replicati...
[13:25:09] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[13:35:09] PROBLEM - MariaDB sustained replica lag on pc2010 is CRITICAL: 486 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[13:38:55] ACKNOWLEDGEMENT - MariaDB sustained replica lag on pc2007 is CRITICAL: 850.4 ge 2 Marostegui checking https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[13:38:55] ACKNOWLEDGEMENT - MariaDB sustained replica lag on pc2010 is CRITICAL: 169.8 ge 2 Marostegui checking https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104
[13:40:06] Data-Persistence-Backup: Internal APT repository backup - https://phabricator.wikimedia.org/T276220 (LSobanski) Open→Resolved I think we're all ok with the current state, resolving.
[13:49:11] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[13:50:37] marostegui: o/ :)
[13:50:45] quick mysql q
[13:50:46] https://airflow.apache.org/docs/apache-airflow/2.1.0/howto/set-up-database.html#setting-up-a-mysql-database
[13:51:08] looking at the mysql instance we have on an-coord1001
[13:51:18] it uses utf8mb4 by default
[13:51:26] but, we don't have explicit_defaults_for_timestamp=1
[13:51:31] would it be ok to set that?
[13:51:39] ottomata: will get back to you later, I am busy with some other stuff, sorry
[13:51:47] ok no hurry at all
[13:52:03] marostegui: i'll ask on a ticket and ping you and you can answer at your leisure, thanks!
[13:52:35] sure
[13:58:19] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[14:07:25] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 3 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[14:11:05] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[14:30:50] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Kormat) Status: - Data copy from db2082 completed. - mysql_upgrade ran - redact_sanitarium.sh currently running.
[14:43:55] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[14:49:23] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[15:27:43] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[15:36:55] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[15:47:25] Data-Persistence-Backup, Goal, Patch-For-Review: Upgrade pending stretch backup hosts to buster - https://phabricator.wikimedia.org/T280979 (jcrespo) @marostegui db2097:s2 is the new backup source https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2097&var-port=13312 It has been rec...
[15:53:21] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[16:04:19] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 4.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[16:11:09] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[16:17:47] DBA: db2094:3318 (sanitarium on codfw) needs recloning - https://phabricator.wikimedia.org/T283793 (Kormat) redact_sanitarium.sh completed, and a quick check showed it had been successful. db2082 and db2094:s8 are now both up and catching up on replication.
[16:21:25] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[16:33:21] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[16:35:03] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[17:11:01] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[17:14:25] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[17:52:17] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[17:54:01] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[18:31:45] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 3.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[18:40:25] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[20:45:27] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (brennen) Temporarily disabled backup cron on `gitlab1001`, just in case.
[21:47:33] RECOVERY - MariaDB sustained replica lag on pc2007 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2007&var-port=9104
[21:52:15] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Sergey.Trofimovsky.SF) Looking at gitlab1001, I only see a root volume: `df -x tmpfs -x udev Filesystem 1K-blocks Used Available Use% Mounted o...
[22:17:35] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) We can create a second virtual hard disk and mount it into the existing file system of the VM. It requires a restart of the VM. How much space...
[22:19:14] Data-Persistence-Backup, GitLab (Initialization), Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (Dzahn) >>! In T274463#7121027, @brennen wrote: > Temporarily disabled backup cron on `gitlab1001`, just in case. Alternatively you can add a MAILTO to...
[23:05:25] RECOVERY - MariaDB sustained replica lag on pc2010 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2010&var-port=9104