[08:59:01] today in "php pitfalls" land https://github.com/librenms/librenms/issues/13390
[09:00:18] "config.php is a minefield" 💣
[09:02:19] indeed, and we're going in without PPE
[09:04:19] :D
[09:10:47] Get out the Hazmat suits lol
[09:14:06] what's the SNMPv3 host out of interest?
[09:22:56] topranks: one of the test PDUs we got at T265435
[09:22:57] T265435: codfw: Testing Out Sample PDUs - https://phabricator.wikimedia.org/T265435
[09:26:05] godog: lol SNMP v1 or 3. Sticking to the widely-used variants I see :D
[09:26:07] I bandaided the specific v3 issue on our end with the review in the task, but I'm sure we'll run into sth like this again
[09:26:49] topranks: yeah, I fail to understand what's going on there, papaul has reached out to see if they can do v2c
[09:28:36] v2c is not much different from v1
[09:28:49] I agree it's surprising
[09:29:06] but not worth having a single v3 device infra wide
[09:30:17] this is a test device, however why not worth it ?
[09:33:05] godog: to keep everything as standardized as possible. Happy to test it of course. But if we were to start using SNMPv3 we should look at a larger project, and transition all the devices to v3
[09:34:26] yeah if we were to go ahead with this brand of PDUs I'd imagine we'll eventually converge on v3 at least for PDUs
[09:34:51] I disagree on the scope creep of embarking on a project to move everything to v3 though
[09:34:58] not that standardization isn't good, and definitely for a particular vendor, or class of devices
[09:35:23] but moving everything to v3 seems like a big lot of work for not much benefit
[09:35:54] I agree v3 is not worth it overall and might bring more issues
[09:36:12] godog: I'd suggest we stick with v1 for those PDUs
[09:37:17] XioNoX: I must sound like a broken record at this point, but "why?"
[09:38:18] my bad, I wasn't clear enough :) less complex to configure/maintain, similar to what we currently use everywhere, less risk of hitting v3 specific bugs, etc
[09:40:25] XioNoX: yeah that's fair enough! thanks for the explanation
[09:59:01] we have a nrpe check that magically started failing today for some of our servers
[09:59:05] https://www.irccloud.com/pastebin/sRDvOQm6/
[09:59:09] rings any bell?
[11:36:50] arturo: That sounds like a new pseudo-fs got mounted in that spot, and NRPE tries to find its size. The --exclude-type=tracefs makes me think that maybe its name/type changed?
[11:37:52] klausman:
[11:37:56] nagios@cloudstore1009:~$ mount | grep trace
[11:37:56] tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
[11:38:58] Hmmm. try moving the --exclude option to the beginning of the command line (before -w 10%)
[11:39:31] My best guess is that one of the globs (exp/.* or replicate-.*) expands in a way that messes with command line parsing
[11:39:54] klausman: your guess seems right!
[11:40:03] https://www.irccloud.com/pastebin/LCfk0VSY/
[11:40:42] I recommend using `echo` to see what the expanded paths are. If check_disk does its own expansion of * etc, you might want to quote those options
[11:41:26] It's likely one of them contains a space or similar, and thus breaks the command line.
[11:42:58] klausman: again, you are 100% right
[11:43:16] this arg `-i /exp/.*` expands to `-i /exp/. /exp/..`
[11:43:18] kormat: see? see???? I am finally right about something! woo!
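(Editor's note) The expansion arturo reports can be reproduced outside NRPE. What follows is a minimal sketch, not the actual check: it assumes a POSIX `sh`, uses a scratch directory in place of `/exp`, and the file names `.a`, `.snapshot` and `visible` are made up for illustration. It follows the `echo` suggestion above to show what a command would receive for an unquoted pattern, a quoted one, and a pattern that requires at least two characters after the dot.

```shell
#!/bin/sh
# Scratch directory standing in for /exp; the file names are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch .a .snapshot visible

# Unquoted: the shell expands the pattern before the command runs.
# dash and older bash include "." and ".." here; newer shells may skip
# them (for example via bash's globskipdots option).
unquoted=$(echo .*)

# Quoted: the command receives the two characters ".*" unchanged.
quoted=$(echo '.*')

# .??* needs at least two characters after the dot, so "." and ".."
# can never match, but neither can a short name such as ".a".
two_plus=$(echo .??*)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "two_plus: $two_plus"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

The `unquoted` result depends on the shell and its version, which is why printing it on the affected host is the quickest way to see what the check is really being given. Whether check_disk then interprets a quoted pattern itself is not settled in this log.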
[11:43:39] Oh right, the glob should likely be exp/.??*
[11:43:45] if I escape it, it works
[11:43:47] https://www.irccloud.com/pastebin/KttxVEeE/
[11:44:00] i.e., `-i /exp/.\*`
[11:44:01] Well, if you escape it, it only matches a literal star
[11:45:02] kormat: :D
[11:45:13] `-i /exp/.??*` works just fine as well
[11:45:21] https://www.irccloud.com/pastebin/wzFN19GT/
[11:45:57] But make sure it actually picks up things that you want matched by that
[11:46:21] klausman: thanks for the hint! I guess next mystery is: how this worked before / why it suddenly stopped working
[11:47:40] Best guess: shell env changed (e.g. shopt -s dotglob)
[11:49:04] yikes
[11:49:10] this could be a SRE interview question
[11:49:47] SRE=Shell Repair Engineer.
[11:51:24] TIL: dotglob If set, bash includes filenames beginning with a `.' in the results of pathname expansion.
[11:51:30] Bonus question: does the "repair" refer to repairing shells? or is it repairing the damage done by shells? Careful, trick question!
[11:52:27] none of the above?
[11:54:34] root@cloudstore1009:~# shopt | grep dotglob
[11:54:34] dotglob off
[11:54:39] ^^^ klausman
[11:55:27] majavah: correct. The true SRE knows when to ditch shell: at the earliest possible time.
[11:55:54] arturo: then I dunno. Maybe /exp didn't exist before and thus didn't expand to anything?
[11:56:31] klausman: true
[11:56:57] /exp/ indeed exists in the server
[11:56:59] not even sure why
[11:57:04] klausman: you should move to the cayman islands
[11:57:07] start a shell corporation
[11:57:21] <_joe_> 🤦
[11:57:26] Shell corporation? You mean like drilling for oil?
[11:58:18] <_joe_> klausman: usually my parameter for running away from the shell is "the script needs nested conditionals"
[11:58:40] <_joe_> even masked with functions, I mean
[11:59:08] There are many indicators. $( ... $( ) ...) 
is another
[11:59:36] <_joe_> oh the cdanis pattern
[11:59:38] <_joe_> yes
[12:00:10] Also if it has more than one instance of changing IFS
[12:00:10] <_joe_> (you're missing a base64 encoding/decoding there to make it proper though)
[12:00:22] <_joe_> s/more than//
[12:00:44] So two instances is fine? :-P
[12:01:02] <_joe_> s/more than/at least/
[12:01:05] <_joe_> :D
[12:01:12] (no points for arguing it must be 2n so it's changed back)
[12:01:49] signs you shouldn't be using shell: you're writing a partitioner for use during OS install 😭
[12:01:54] <_joe_> the moment you need to mess with IFS you're getting into a territory better served by a programming language
[12:02:13] +1, I truly dislike IFS games
[12:02:30] <_joe_> kormat: as our partman expert, you know something about it
[12:02:38] * kormat shivers
[12:03:00] <_joe_> sorry, I now need to micdrop for lunch.
[12:03:59] kormat: just write a better replacement!
[12:04:12] SMOP
[12:04:14] I hear Javascript is en vogue these days.
[12:04:34] klausman: why am i friends with you
[12:24:31] kormat: there might actually be a partman replacement on the horizon... https://nick-black.com/dankwiki/images/b/b9/Parting_ways_with_partman.pdf
[12:25:38] moritzm: 👀 !
[12:46:28] > There's no compelling reason to replace partman
[12:46:30] I object
[12:52:42] Other than that, I approve of the content of that PDF
[12:53:33] Though atm, it has no preseed support.
[12:53:55] kormat: given that last fact, time for you to get on their mailing list and make outlandish demands
[12:57:28] after the debconf presentation (from which the slides are), there was a followup discussion starting at https://lists.debian.org/debian-devel/2021/09/msg00344.html which sounded promising that this is moving forward
[12:58:27] apart from a few derailing comments around "if this doesn't support amiga disk labels it's not a suitable partman replacement" comments...
[12:59:13] but even for those the author would be willing to add support... 
https://lists.debian.org/debian-devel/2021/09/msg00382.html
[12:59:32] As someone who used to maintain a distro's Alpha port, I understand such concerns and I think keeping partman around for a while would be doable
[13:01:59] this wouldn't mean to ditch the old support immediately, those fringe ports already use different toolchains in many regards
[13:02:12] Ack
[13:02:21] but https://lists.debian.org/debian-devel/2021/09/msg00393.html summarises it well
[13:03:01] and the enthusiasm in the m68k porter community is quite remarkable
[13:03:32] 1-2 years ago they fundraised a new GCC backend (since the old format m68k was still using is going away)
[13:03:50] and recently support was contributed to LLVM and Rust as well
[13:04:08] I pretty much agree with that latter mail re: unsupported stuff
[13:30:24] _joe_: hey, I resemble that remark
[13:30:41] <_joe_> cdanis: :*
[13:31:19] I'm remembering that time that rzl baited me into making my already-disgusting shell one-liner we were using to troubleshoot some issue on commons into producing output formatted as Phab table markup
[19:53:15] icinga claims puppet-board is down since 6 hours with connection refused.. but my browser says puppet-board works fine
[19:53:50] hosts end in 2. I bet it's new hosts with a new role applied that adds monitoring.. common issue
[20:05:52] no, those have been around a while
[20:06:00] (dns2002 and others)
[20:06:06] something else changed in puppet most likely
[20:08:08] there was this several hours ago which touched related bits: https://gerrit.wikimedia.org/r/c/operations/puppet/+/732954
[20:08:24] (added drmrs, which may have caused some impact on the other sites' peers settings)
[20:09:10] the puppetboard1002/2002 are new hosts, it is running on 1001/2001. but the puppet fails on dns hosts are separate
[20:09:16] yeah
[20:09:35] I think the change I referenced was what affected dns[12]00[12], I'm just not exactly sure of the details yet
[20:13:44] yeah, I'm gonna roll it back. 
in theory it can be fixed, but I'm starting to get out in the weeds of it, and don't want to leave timesync or puppet on those hosts in a bad state for the weekend.
[21:05:47] mutante: those dns hosts are clean now with the revert, we'll follow up next week at the traffic level since those patches and the servers are all ours :)
[21:06:12] bblack: :) thanks! great
[22:38:06] and now .. to the Bolivian Cooking video :0