[09:38:36] any idea what this means:
[09:38:36] netflow4001:~$ sudo /usr/sbin/sfacctd -f /etc/pmacct/sfacctd.conf
[09:38:36] free(): double free detected in tcache 2
[09:38:36] Aborted
[09:43:21] <_joe_> usually a "double free" means someone is trying to release a memory address that's already been released. Which in C is one of the cases of "undefined behaviour"
[09:44:08] <_joe_> not sure if this is the actual response from libc to an attempt at using free() again on a memory address not allocated anymore
[09:44:24] great
[09:44:53] <_joe_> so yeah that looks like a bug in sfacctd
[09:46:10] noted, thanks!
[09:46:15] <_joe_> https://github.com/pmacct/pmacct/blob/master/src/sfacctd.c#L300
[09:48:00] <_joe_> I'd need gdb to be sure of what the bug is
[09:49:34] _joe_: do you have time to have a look?
[09:50:39] note that it's not urgent at all
[09:54:53] <_joe_> XioNoX: not now sorry; also is this a new thing you were trying?
[09:55:22] yeah it's https://gerrit.wikimedia.org/r/c/operations/puppet/+/742110 for https://phabricator.wikimedia.org/T263277
[10:07:56] pah, whatever happened to nasal demons? :)
[10:10:03] <_joe_> XioNoX: interestingly with a worn-down configuration in the config file, this doesn't happen
[10:10:53] _joe_: how much did you remove?
[10:11:33] could it be a config option that causes that?
[10:13:04] <_joe_> pre_tag_map: /etc/pmacct/pretag-sfacctd.map
[10:13:11] <_joe_> this causes the bug
[10:15:03] alright, that helps
[10:15:26] next question is, how can I figure out how? :)
[10:15:56] <_joe_> XioNoX: one way is to run the software via gdb and set a breakpoint I guess
[10:16:07] <_joe_> or to get a stack trace
[10:19:47] _joe_: I've never done that, but I'll have a look, thanks!
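Editor's note: the "run it via gdb and get a stack trace" suggestion above can be sketched as a small helper that builds a non-interactive gdb command line. This is only an illustration, not something anyone in the channel ran. The helper name gdb_backtrace_command is hypothetical; the gdb options used (-batch, -ex, --args) are standard gdb flags, and the binary and config paths are the ones quoted in the log. It assumes the crash happens in the process gdb starts (a daemon that forks may need extra handling), and the trace is far more readable when debug symbols are available.

```python
import shlex


def gdb_backtrace_command(binary, args=()):
    """Build a gdb invocation that runs `binary` and prints a backtrace when it stops.

    Hypothetical helper for illustration only; it just assembles the command.
    """
    return [
        "gdb",
        "-batch",      # run the -ex commands below, then exit
        "-ex", "run",  # start the program under gdb
        "-ex", "bt",   # print the stack once the program has stopped (e.g. on abort)
        "--args", binary, *args,
    ]


# Paths taken from the log; the original command was run with sudo,
# so the gdb command would need to be run as root as well.
cmd = gdb_backtrace_command("/usr/sbin/sfacctd", ["-f", "/etc/pmacct/sfacctd.conf"])
print(shlex.join(cmd))
```

The printed line can be pasted into a root shell on the affected host; if the program exits cleanly instead of aborting, gdb simply reports that there is no stack.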
[10:20:19] <_joe_> XioNoX: I'm very rusty tbh, but I think a small gdb usage tutorial would be useful for many SREs
[10:21:16] often "just run inside gdb and type bt when it crashes" is enough
[10:21:31] ...but you may need a debug build or similar, which can get tedious rapidly
[10:21:31] https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L2879
[10:22:17] Points to the pmacct authors
[10:23:26] also the VM is running buster, which has a pmacctd version from 2018
[10:26:20] <_joe_> XioNoX: I was looking at the pmacct source code, and sfacctd doesn't seem to have support for pre_tag_map?
[10:28:36] _joe_: from the doc it does: https://github.com/pmacct/pmacct/blob/1.7.2/CONFIG-KEYS#L1796
[10:29:29] <_joe_> XioNoX: saying this as I see support for pre_tag_map being explicit in other parts of the code
[10:30:00] I'll test the same config in a cloudVM on bullseye, see if the issue is there as well
[10:30:36] _joe_: it's a core feature of pmacct so I'd be surprised, maybe it's done differently there?
[10:31:16] see here also https://github.com/pmacct/pmacct/blob/1.7.2/examples/pretag.map.example#L16
[10:33:59] <_joe_> yeah just saw how it's loaded
[10:34:26] <_joe_> but even the precompiled pre_tag example causes a double free
[10:39:21] <_joe_> ok found where the issue is
[10:40:13] ah? it runs fine on a bullseye vm (with a more recent pmacct version) btw
[10:40:26] <_joe_> yeah the problem is in pretag.c
[10:42:43] _joe_: alright! thanks for looking into it, now I have a clear path forward
[10:44:26] <_joe_> https://github.com/pmacct/pmacct/blob/1.7.2/src/pretag.c#L620
[10:45:06] <_joe_> and yes, that file has been rewritten from scratch basically between 1.7.2 and 1.7.3
[10:48:07] <_joe_> this is the fix https://github.com/pmacct/pmacct/commit/0646568589b50c1d5a2600c3e2dff8d1a78cda53#diff-75777c601d70a82f5df5537bfb8e76d10ad586b81330554b5e9118509bf3c348 but I'm not sure I want to backport it
[10:52:46] hahha, yeah, no :)
[10:53:16] we will have to upgrade the netflow boxes to bullseye anyway
[10:54:42] moritzm: ^ fyi, is it fine to do it anytime?
[12:28:20] anyone have a sec for a quick review? just getting some new restbase hosts ready to pool https://gerrit.wikimedia.org/r/c/operations/puppet/+/746851
[12:32:05] hnowlan: 👍
[12:32:08] <3 kormat
[12:34:43] XioNoX: yeah, absolutely :-)
[12:39:55] XioNoX: when you create netflow2002, please use row C, this Ganeti group is already completed wrt the ongoing buster update of the ganeti virt nodes
[12:40:25] moritzm: ok!
[12:41:03] moritzm: anything else to know other than I need to add `option pxelinux.pathprefix "http://apt.wikimedia.org/tftpboot/bullseye-installer/";` ?
[12:45:22] no, nothing else :-)
[14:10:53] jbond: hi, I don't know if it's on your radar but PCC is 502ing atm https://puppet-compiler.wmflabs.org/pcc-worker1001/1131/
[14:10:58] and https://puppet-compiler.wmflabs.org
[14:13:52] Amir1: thanks, will look; dcaro has been upgrading them today so it could be a transient issue
[14:14:10] oh okay
[14:14:19] Thanks! let me know once done
[14:14:25] will do
[14:15:34] looking
[14:46:08] XioNoX: an easy option btw is to re-run the binary but with MALLOC_CHECK_=3 http://btorpey.github.io/blog/2019/07/14/memory-checking/
[14:46:16] running under valgrind is another good, easy option
[14:47:08] that should get you a backtrace of where in the source the error occurred, good first thing to be able to pass on to upstream
[14:47:18] good to know, thanks!
[14:50:43] <_joe_> cdanis: yeah the difference is valgrind is not installed on netflow4001 :P
[14:53:54] _joe_: I've installed it on the netflow servers before 😇
[14:54:27] <_joe_> yeah I found I was the second person bug-hunting memory management in pmacct
[14:54:29] <_joe_> :D
[14:54:39] !dns6002 rebooting for firmware updates via T286507
[14:54:39] T286507: (Need By: TBD) rack/setup/install drmrs non-cp-hosts - https://phabricator.wikimedia.org/T286507
[14:54:47] !log dns6002 rebooting for firmware updates via T286507
[14:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:37] _joe_: the good news is that upstream was very responsive :)
[14:56:58] Amir1: you should be able to find your report now
[14:57:26] <_joe_> cdanis: in my case, the issue is already solved upstream with the next minor version
[14:57:40] <_joe_> which completely rewrote the whole functionality
[14:58:01] <_joe_> like, 30% change in the single file between minors
[14:58:14] <_joe_> sorry, patchset
[14:59:28] eheh
[15:02:08] "minor"
[15:04:34] dns6002 done with bios and rebooted into os, idrac interface updating now
[15:15:28] !log dns6002 bios update done, returned to green in icinga, dns6001 coming down next for firmware update via T286507
[15:15:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:32] T286507: (Need By: TBD) rack/setup/install drmrs non-cp-hosts - https://phabricator.wikimedia.org/T286507
[15:18:57] jbond: Thanks!
[15:25:06] !log dns6001 returned to service (icinga checks going green) via T286507
[15:25:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:11] T286507: (Need By: TBD) rack/setup/install drmrs non-cp-hosts - https://phabricator.wikimedia.org/T286507
[15:34:58] who owns ganeti? root@ganeti2008 is sending sad email
[15:37:32] moritzm: ^
[15:43:30] Emperor: can be ignored, caused by ongoing ganeti cluster update
[15:50:41] Thanks
[17:05:33] jynus: sorry it took so long :) https://phabricator.wikimedia.org/T69818
[17:15:10] Krinkle, time from my question to your answer: 1,436h, from your question to mine: 9 minutes :-)
[17:16:31] jynus: I'll try to decrease TTA by 25% by next quarter :)
[19:09:10] jynus: if you're not subscribed, https://lists.wikimedia.org/hyperkitty/list/commons-l@lists.wikimedia.org/message/C5QGV5FM3BJMTPLJ3JBYOLB2HNKK6GXT might interest you
[19:10:30] thanks, I will answer, legoktm
[19:11:18] although I think they are thinking of dumps, not backups
[19:13:29] yes they are conflating dumps and backups (surprise!) but the answer about something downloadable is, nope, nothing since when they were discontinued years ago.
[19:23:57] the impression I got was that SJ wanted to backup Commons, and was asking for a dump for that goal
[19:24:55] but I think "I have the feeling the bulk of Commons media (~300 TB in all) is not mirrored anywhere right now." is no longer correct, or partially correct with the media backups work
[19:25:13] even if the "mirror" is our other DC
[19:52:12] <_joe_> apergos: well you can think of dumps as sort-of backups
[19:52:17] <_joe_> and vice versa
[19:52:19] <_joe_> right?
[19:52:26] * _joe_ runs
[19:58:30] where's that kickban button
[20:24:30] Who should I ask about grafana things? Specifically, 'does/should grafana be installable on bullseye'?
[20:29:38] andrewbogott: #wikimedia-observability ;)
[20:31:17] andrewbogott: also this is a new thing: grep -A1 role_contact hieradata/role/common/grafana.yaml
[20:40:01] oh, cool
[20:40:04] thanks volans
[21:42:20] /ac/ac