[06:38:21] netops, Infrastructure-Foundations, SRE, fundraising-tech-ops: Upgrade fasw to Junos 21 - https://phabricator.wikimedia.org/T316542 (ayounsi) @Dwisehaupt that will be too soon for us (SRE summit + routers upgrades planned this month). Is the following maintenance week known?
[10:13:24] XioNoX: by any chance do you know what happened to the netbox DB starting on august 8th?
[10:13:28] https://grafana.wikimedia.org/d/000000377/host-overview?var-server=netboxdb2002&var-datasource=codfw%20prometheus%2Fops&orgId=1&viewPanel=28&from=now-30d&to=now
[10:15:19] that's the replica and the seesaw behaviour is probably related to the replication being downloaded and then deleted, the primary has a more smooth behaviour
[10:15:22] https://grafana.wikimedia.org/d/000000377/host-overview?var-server=netboxdb1002&var-datasource=eqiad%20prometheus%2Fops&orgId=1&from=now-30d&to=now&var-cluster=misc&viewPanel=28
[10:20:32] volans: https://phabricator.wikimedia.org/T310615
[10:20:58] more exactly https://phabricator.wikimedia.org/T262677
[10:22:23] XioNoX: it doesn't have GC?
[10:23:51] XioNoX: to be clear, I noticed because of icinga in -ops alerting for 2002's disk
[10:24:15] volans: yeah that increase is obviously not normal
[10:24:25] it's from a long time ago, trying to remember
[10:25:21] I see hourly since 2022-08-22 on netboxdb2002 and no daily or weekly, but maybe it's just because it's a recent change
[10:25:39] volans: there is a "systemd::timer::job { 'rotate-postgres-dump':"
[10:25:45] so there should be some cleanup
[10:26:38] also I was wondering if you had a chat with j.aime about the possible consequences in the bacula side due to the new behaviour
[10:26:57] it does just:
[10:26:59] /srv/postgres-backup -type f -name '*.sql.gz' -mtime +7 -delete
[10:27:04] *find ...
[10:28:57] I have to go to lunch, I'll catch up when I'm back
[10:35:22] ack
[10:35:38] moritzm: can I reimage one of the sretest hosts? I have no preference
[10:35:41] on which one
[10:45:24] sure thing, both are good to go
[10:45:33] ok, I'll redo 1001 as buster
[10:45:41] (same OS), thanks
[12:15:42] volans: yo
[12:17:38] yo yo
[12:19:08] looking at the db stuff again
[12:20:41] the old csv dumps on the frontend had a GC script that had the logic to keep progressively less backups (hourly, daily, weekly, etc.)
[12:22:47] volans: https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/821180 realistically that script was quite a mess
[12:27:10] I'm not saying to restore it, just that we had the logic and how many of each :D
[12:27:26] if possible I'd like to keep a bit more of the hourly, like ~3 days
[12:30:04] but in general any solution that doesn't fill the disk or causes issues to bacula is ok for me :D
[12:30:45] volans: hourly we could even only keep 1 or 2 days, as it's for a restore right after we notice a mistake, past that we could rely on daily
[12:31:08] ack
[12:31:23] and past the week of daily fetch it from bacula
[12:32:07] I wonder if for bacula would be better to have fixed file names
[12:32:41] https://stackoverflow.com/a/26037557
[12:33:14] not sure what it's worth
[12:33:14] XioNoX: that will just keep the last 7 backups
[12:33:34] eh, ok
[12:43:42] I deleted everything older than 2 days on netboxdb2002 for the short term fix
[12:43:48] ack
[12:43:49] thx
[12:43:57] and will send a patch to make it permanent
[12:44:20] the disk space usage is still growing faster than I'd expect
[13:30:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/828005 patch merged and we can see the disk going back to a better usage https://grafana.wikimedia.org/d/000000377/host-overview?var-server=netboxdb2002&var-datasource=codfw%20prometheus%2Fops&orgId=1&viewPanel=28&from=1659896148000&to=now
[13:33:35] yay
[13:42:40] netbox, netops, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (jcrespo)
[13:44:52] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (ayounsi)
[13:58:52] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (Volans) > because postgress dumps look like this: > `/srv/postgres-backup...
[14:04:05] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (jcrespo) > We can make the dumps look like anything we want :) No issue...
[14:06:27] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (Volans) If we use a single file for bacula we can keep it uncompressed I...
[14:06:31] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (jcrespo) > I'm wondering if we could actually have a single file in bacu...
[14:07:58] netbox, Data-Persistence-Backup, Infrastructure-Foundations, bacula: Convert Netbox data (PostgresQL) longterm storage backups (bacula) into full backups rather than incrementals - https://phabricator.wikimedia.org/T316655 (jcrespo) > If we use a single file for bacula we can keep it uncompressed...
[14:29:56] netops, Infrastructure-Foundations, SRE, Patch-For-Review, cloud-services-team (Kanban): Remove 185.15.56.0/24 from network::external - https://phabricator.wikimedia.org/T265864 (ayounsi) >>! In T265864#6995696, @Legoktm wrote: > It would be nice if we could deploy this change for services in...
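
Note on the retention idea discussed between 12:20 and 12:31 (keep every hourly dump for a day or two, then one per day for about a week, then rely on bacula): a minimal Python sketch of that selection logic follows. It is an illustration only and is not the content of the merged puppet patch, which this log does not show. The function name `dumps_to_delete` and the thresholds `hourly_days=2` and `daily_days=7` are assumptions picked from the numbers mentioned in the chat.

```python
from datetime import datetime, timedelta


def dumps_to_delete(timestamps, now, hourly_days=2, daily_days=7):
    """Return the dump timestamps a tiered retention policy would remove.

    Hypothetical policy (thresholds are assumptions taken from the chat):
      - newer than `hourly_days`: keep every dump
      - between `hourly_days` and `daily_days`: keep the newest dump per day
      - older than `daily_days`: delete (restore from the backup system)
    """
    hourly_cutoff = now - timedelta(days=hourly_days)
    daily_cutoff = now - timedelta(days=daily_days)

    newest_per_day = {}
    delete = []
    for ts in sorted(timestamps, reverse=True):  # newest first
        if ts >= hourly_cutoff:
            continue  # hourly tier: keep everything
        if ts < daily_cutoff:
            delete.append(ts)  # past the daily tier
        elif ts.date() in newest_per_day:
            delete.append(ts)  # a newer dump from the same day is kept
        else:
            newest_per_day[ts.date()] = ts
    return sorted(delete)
```

In practice the timestamps would come from the dump file names or mtimes, and the returned list would drive the actual file removal. By contrast, the `find ... -mtime +7 -delete` one-liner quoted at 10:26:59 only applies a single age cutoff.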