[05:49:20] Amir1: Are you done with s4? According to the database maintenance map yes, but since everything was stopped yesterday... just double checking
[06:33:34] jynus: ok to reboot es2020?
[08:39:35] marostegui: I'm not running anything today
[08:39:50] cool, I will start s4 on monday then
[10:27:01] no es backups are ongoing
[10:27:31] cool
[10:39:17] was the db maintenance that was stopped yesterday restarted?
[10:39:35] I am trying to track everything that was affected at T322360
[10:39:35] T322360: conf* hosts ran out of disk space due to log spam - https://phabricator.wikimedia.org/T322360
[10:39:36] Mine yes
[10:39:46] Not sure about Amir1's
[10:40:21] I think the externallinks one I have is done
[10:40:49] FWIW, it didn't affect the incident
[10:41:14] not asking if they finished, it was more of a reminder to restart everything that was stopped/killed, as part of a checklist to get back to normal/conditions to close the ticket
[10:41:32] (MysqlReplicationLag) firing: MySQL instance es2020:9104 has too large replication lag (11m 53s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=es2020&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[10:42:23] sadly, because db maintenance now happens all the time, people ask to stop it because it always coincides with the start of an outage
[10:42:29] I'm going to finish the ones I had yesterday but won't start a new thing
[10:42:33] even if unrelated
[10:43:07] Amir1: not sure I fully follow, my question is if the status is back to normal
[10:43:15] or your work is still impacted by it
[10:43:25] (other than in time)
[10:43:57] for the "pending things" checklist at T322360
[10:46:32] (MysqlReplicationLag) resolved: MySQL instance es2020:9104 has too large replication lag (11m 53s) - https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Depooling_a_replica - https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&var-job=All&var-server=es2020&var-port=9104 - https://alerts.wikimedia.org/?q=alertname%3DMysqlReplicationLag
[11:27:45] godog: I am checking dispatch backup jobs - I still cannot see jobs scheduled
[11:29:12] I found the issue: Config error: Could not find config Resource "dispatch-postgres" referenced on line 7 : FileSet = "dispatch-postgres"
[11:29:41] do you have the backup patch handy? there may be a typo or some missing bits
[11:34:01] jynus: ah, it is quite possible I forgot to add the backup set, thank you for the hint
[11:34:07] I have to run to lunch now, will take a look later
[11:34:11] ah, I was reaching that conclusion
[11:34:19] will add it myself and ask for your review
[11:34:27] no rush
[11:34:50] patch is Ia07b5bb57 though
[11:34:53] ok gotta go, ttyl
[11:34:59] thank you, later!
[11:59:53] right, let's see if the CI's regression tests for pristine-tar are happy with my work...
[12:05:43] https://salsa.debian.org/debian/pristine-tar/-/merge_requests/9/pipelines \o/
[12:43:21] Hopefully I can upload 1.50 next week
[13:09:13] godog: dispatch backups are now scheduled: Incremental Backup 10 05-Nov-22 04:05 dispatch-be1001.eqiad.wmnet-Monthly-1st-Tue-productionEqiad-dispatch-postgres productionEqiad0908
[13:12:52] I did a full backup now, feel free to test recovery & dumping/reloading on your own
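A side note on the 10:41 MysqlReplicationLag page, which resolved on its own at 10:46: the figure the alert fires on can also be checked by hand on the replica. A minimal sketch using the stock MariaDB client, with nothing WMF-specific assumed; the exporter on port 9104 that the alert scrapes presumably exposes the same counter:

    # on es2020 itself: is replication running, and how far behind is it?
    sudo mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'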
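On the 11:29 config error: the scheduled job references a FileSet resource that was never defined, so the director rejects the configuration and nothing gets scheduled. Roughly what the missing Bareos resource would need to look like; the Name comes straight from the error message, while the path and options are assumptions for illustration (the real definition belongs in the puppet patch, Ia07b5bb57):

    FileSet {
      Name = "dispatch-postgres"   # must match the FileSet = "..." reference the error points at
      Include {
        Options {
          Signature = MD5
          Compression = GZIP
        }
        File = /srv/postgres-backups/dispatch   # assumed location of the pg_dump output
      }
    }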
[13:17:36] jynus: did we have a task for simplifying and removing stuff from the zarcillo database?
[13:31:36] yes
[13:31:51] https://phabricator.wikimedia.org/T318404
[13:32:28] We had a bunch of open zarcillo tasks, we should probably review them anyway, they might be old/done/won't fix at this point
[13:34:08] Awesome
[13:34:15] I'd really appreciate it
[14:05:26] jynus: nice, thank you again for your help!
[14:05:39] I'll definitely need to test restores too
[14:09:08] restore costs extra ;-)
[14:14:28] hehehe dammit
[14:24:38] marostegui: have fun :P https://phabricator.wikimedia.org/T257821#8370137
[14:25:38] keep in mind that we don't know clouddb* hosts
[14:25:44] same with db1108
[14:26:03] and the other labs* as well
[14:26:13] yeah, I can probably just skip anything like that
[14:26:35] what about db2217?
[14:27:18] aah, I think there was a typo, we have db2217 missing in one and db2117 missing in the other?
[14:30:31] could be, yeah
[14:38:05] Amir1: I am still waiting for your review at: https://gerrit.wikimedia.org/r/c/operations/puppet/+/817181
[14:38:55] hmm, I think I saw a +2 and thought it was done
[14:39:18] I will check it soon
[14:39:26] I explicitly named you at https://gerrit.wikimedia.org/r/c/operations/puppet/+/817181/7#message-56a44e60d7e3793a3965ca3a3f59c887562f090c
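Picking up the restore testing mentioned at 13:12 and 14:05: one possible smoke test for the dispatch backup, assuming it is a plain pg_dump SQL file that has already been pulled out of Bareos to a local path; the path and database name here are hypothetical:

    # load the restored dump into a throwaway database and sanity-check it
    createdb dispatch_restore_test
    psql -v ON_ERROR_STOP=1 dispatch_restore_test < /var/tmp/bareos-restores/dispatch.sql
    psql -c '\dt' dispatch_restore_test   # did the tables come back?
    dropdb dispatch_restore_test          # clean up after the drill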