[06:04:08] mmmm there are pending changes to merge on the dns repo?
[06:04:16] git-setup has modifications
[06:04:51] brett: It looks like this is it: https://gerrit.wikimedia.org/r/c/operations/dns/+/807229
[06:04:51] I believe there was an alert about that last night
[06:05:17] jynus: right... so that is still pending to merge and I am not sure I want to merge that without knowing the context
[06:05:26] jbond: can you take a look at that? (as you were one of the reviewers?)
[06:06:13] Seems safe: "This is just copy of a script that was in other repos at that time, see operations/software, operations/puppet, etc. Fine to delete it as far as I'm concerned."
[06:08:07] ok, will merge... if not we can revert I guess
[06:08:51] When did the alert fire last night?
[06:10:10] [21:04:37] 10SRE: pending diff in sre.dns.netbox cookbook
[06:10:40] is that UTC??
[06:10:47] I believe yes
[06:10:53] 🤦
[06:10:53] maybe it was unrelated
[06:11:53] (running the dns cookbook in dry-run mode)
[06:12:35] maybe it was this one I saw, unsure: [22:35:05] PROBLEM - Uncommitted DNS changes in Netbox on netbox1002 is CRITICAL
[06:12:52] but that recovered soon after
[06:15:07] METADATA: {"no_changes": true}
[06:30:46] fyi that CR should be perfectly safe to merge
[06:31:14] (and I see it now is)
[11:26:13] tiff is such a great file format
[11:26:48] I just created a 1-pixel image for testing with the default Gimp options and it is 193KB in size
[11:33:06] That's one large pixel
[11:34:06] https://test.wikipedia.org/wiki/File:Tiff_test.tiff
[11:34:41] I haven't checked, but I think we use more space for uncompressed tiffs than for videos
[11:56:46] Everybody knows gzip'd NetPPM is where it's at
[11:58:39] Use xz or zstd if you have specific compression needs ;)
[13:10:41] I'm a big fan of ZFS with transparent LZ4 compression, personally.
[13:19:16] Every time I have tried to use ZFS, it ate all my RAM
[13:34:41] klausman: i think if you don't use de-dup then ram usage is not so bad. also have an ssd pool for caching
[13:37:30] I think my use case (turn a pile of disks and ssds into storage and support `cp --reflink`) is served well enough by btrfs. Plus, no kernel patching needed.
[13:37:36] klausman: You can also set `zfs_arc_max=8589934592` when loading the module. (This example is 8GB.) It has to be in bytes.
[13:37:49] And no, in >10 years of using btrfs I have lost 0 bytes.
[13:38:25] * jbond current ARC is 121.92 for ~6TB file system
[13:38:35] (121M)
[13:38:37] ARC?
[13:38:42] cache
[13:39:26] Ah. I dunno if btrfs treats my SSDs special beyond stuff like "has TRIM", but then again, for my use, what I have is good enough™
[13:39:53] cp -a --reflink to make a quick, cheap copy of a tree before you work on it is super neat, tho.
[13:40:05] (yes, I am aware of snapshots)
[13:41:22] i wanted to use something like raid 5 (raidz1 in zfs) to save on disk cost and was put off btrfs by all the warnings around raid5
[13:45:21] I am fine with RAID1 for metadata and data as btrfs does it. I have abandoned my data hoarder ways, and now disks are cheap enough. My file server has had a 4T cap for a long time now.
[13:45:40] Said server is more for convenience/centralization than sheer capacity
[13:46:13] atm, it's 1x2T rust and 3x2T ssd, mostly since I replaced the original 4x2T rust with ssds as the rust broke
[13:46:59] I may have to look at a new case, tho, since this Antec case from ~7 years ago doesn't have any 2.5" bays.
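A minimal sketch of the two tips from the thread above (the ARC cap and the reflink copy). The paths are hypothetical, the 8589934592 value is just the 8 GB example from the chat, and the reflink copy assumes a filesystem that supports it (e.g. btrfs or recent XFS):

```sh
# Cap the ZFS ARC at 8 GiB (value in bytes), applied when the module loads:
echo 'options zfs zfs_arc_max=8589934592' | sudo tee /etc/modprobe.d/zfs.conf

# OpenZFS usually also exposes the same tunable at runtime:
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# Quick, cheap CoW copy of a tree before working on it (hypothetical paths):
cp -a --reflink=always /srv/project /srv/project.scratch
```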
[13:47:26] :)
[13:50:20] * jbond has 3*7TB + 4*4TB rust + 2x1TB ssd (main fs is 14TB not 6TB)
[13:53:25] I'm on 2*8TB external usb 3.5" drives, attached to a rpi 4 with 1*32G microsd
[13:53:34] some nice /r/DataHoarder vibes!
[14:04:20] :-) I was sort of meaning I like ZFS compression for /production/ workloads (as well as personal hoarding).
[14:24:55] I've been using BTRFS for my computers for over 2 years now and I'm in love with that file system. It's my first time using a CoW FS as well as adding more drives for RAID.
[14:27:21] For my NAS I only have an 8 TB HDD and 1 TB NVMe (for cache, OS, etc). I plan on adding a total of 32 TB of storage in HDDs.
[14:27:30] note that cp from coreutils will start to default to trying --reflink on supported filesystems
[14:27:39] (in some future version, that is)
[14:28:21] There's also duperemove, which will scan a FS (subtree) and dedupe the extents that can be deduplicated
[14:28:42] https://github.com/markfasheh/duperemove
[14:29:15] Of course doing that is a bit of an eggs/basket thing, but we're all grownups^W SREs, so we know the risks
[15:51:12] rzl: mutante: there is ongoing noise with power maintenance on codfw, nothing unexpected so far
[15:51:34] except for T311526
[15:51:34] T311526: es2033 crashed at Jun 28 ~15:34 - https://phabricator.wikimedia.org/T311526
[15:51:59] but DBAs are aware
[15:53:17] (shouldn't be an issue as that db does not currently serve production traffic)
[15:54:32] apparently mc2032 also crashed around the same time
[15:55:16] got it, thanks!
[15:55:40] seems to be an impactful maintenance similar to https://wikitech.wikimedia.org/wiki/Incidents/2022-06-21_asw-a2-codfw_accidental_power_cycle
[15:55:59] (sorry, trying to give you a concise update, but things are still ongoing)
[15:58:30] aren't they always :)
[16:00:00] with the difference here that... /me throws bomb smoke and disappears
[16:00:38] * jynus air clears and I am still here standing, because I am a terrible ninja/magician
[16:12:15] jynus: I looked at that rack D1 the other day and noticed the DB servers. the other ones in there all seemed not to need depooling
[16:12:16] D1: Initial commit - https://phabricator.wikimedia.org/D1
[19:13:50] helpful bot is helpful :D
[20:27:39] _joe_: dancy: I see `opcache.validate_timestamps = 0` is set on e.g. mw1315, but not yet in beta cluster. I figured with both canaries and app/api server roles having it, it'd be applied through at least one of those three roles, but apparently not?
[20:28:00] e.g. deployment-mediawiki12 (the only appserver)
[20:55:11] Krinkle: I think that's hiera settings, so in beta cluster it probably has to be set manually somewhere. See hieradata/role/common/mediawiki/appserver.yaml
[20:55:40] role-based hiera does not apply in cloud vps for #reasons
[21:01:53] btullis: FYI it seems that you have a screen with a reimage for stat1010 since yesterday on cumin1001
[21:08:11] volans: Many thanks, feel free to terminate it or ack any alert. I'm actively working on stat1010 to try to work out issues with the raid controller, but the screen session can go if needs be.
[21:11:09] btullis: was mostly an FYI in case it got forgotten while stuck in some state, or pending user input.
[21:11:20] if it's known I'll leave it to you :)
[21:12:30] mutante, arnoldokoth: the decom cookbook (and others related to VMs) have been fixed and you can now run them anytime. Just double-check the CLI options as they have changed slightly.
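A hedged usage sketch for the duperemove tool linked above (the target path and hashfile location are hypothetical; `-d` submits the dedupe requests rather than just reporting duplicates, `-r` recurses into the subtree):

```sh
# Scan a subtree, hash extents into a reusable database, and deduplicate
# identical extents in place (supported on btrfs and XFS):
duperemove -dr --hashfile=/var/tmp/dupes.db /srv/data
```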
[21:15:45] https://wikitech.wikimedia.org/wiki/Ganeti#Create_the_VM has been updated accordingly
[21:47:21] bd808: ah, right, that sounds familiar
[21:48:50] (the #reasons is mostly that we don't have the `role` function magic that is used in prod to find the hiera)
[22:08:40] volans: thank you, alright
[22:28:49] jbond: (or anyone else) ping for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/806489 ty!
[22:29:30] legoktm: can do
[22:31:48] thanks :)
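As a rough illustration of the opcache/hiera exchange above: one way to compare the effective setting on a host, plus where the production value would live, assuming it is controlled via role hiera in operations/puppet (the exact hiera key is not confirmed here, and CLI PHP may read different ini files than php-fpm):

```sh
# Check what PHP sees on a given host (CLI and php-fpm can load different
# ini files, so check the FPM pool too if precision matters):
php -i | grep -F 'opcache.validate_timestamps'

# In production this kind of setting comes from role-based hiera, e.g. under
# hieradata/role/common/mediawiki/ in operations/puppet. Role hiera is not
# applied in Cloud VPS / beta, so there it would have to be set in the
# project's own hiera (e.g. via Horizon) instead.
grep -rn 'validate_timestamps' hieradata/role/common/mediawiki/
```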