[00:01:03] < shantaram> record lows in productivity reported as stackoverflow down: "we can't work without it", "i've never felt so alone in my life" say programmers
[08:49:59] ~/.~.
[08:50:13] stuck tmux session, sorry about that
[10:07:56] hurm. i just had a puppet failure on alert1001: https://phabricator.wikimedia.org/P16338
[10:08:14] jbond: ^
[10:09:22] looking
[10:10:25] jbond: might be related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/698796
[10:11:43] kormat: yes, have a patch, 2 mins
[10:15:07] _joe_ jayme is it okay if I set up a shellbox in beta cluster?
[10:15:35] <_joe_> Amir1: absolutely, good luck with that!
[10:16:00] Thanks!
[10:16:01] <_joe_> Amir1: tbh I think shellbox in beta for now should just use the "embedded" version
[10:16:05] <_joe_> and not rpc
[10:16:22] but I want to test the RPC part :D
[10:16:24] <_joe_> just because the rpc part is going to be super painful to set up I fear
[10:16:37] kormat: that's fixed now, sorry about that :)
[10:16:41] I doubt it
[10:16:52] It was really straightforward on my localhost
[10:20:54] jbond: np :)
[10:23:34] <_joe_> Amir1: you just wrote down your problem statement :D
[10:26:16] haha
[10:26:27] I'm shipping my localhost (docker :D)
[11:25:33] I (with help from vola.ns) have deployed django-cas-ng to netbox-next, and all seems to be working as expected. however, can anyone who uses it daily give it a check over and make sure everything looks good (paravoid, XioNoX, topranks)
[11:26:22] I get past authn, but then getting Access Denied
[11:26:53] url is login/?next=%2F&ticket=-idp2001
[11:29:09] * jbond checking
[11:33:39] same
[11:35:43] Amir1: how are you planning to run it on beta? like the other microservices running in docker?
[11:36:02] majavah: yeah on docker
[11:36:28] XioNoX: paravoid: yes, looks like an issue with the deploy, think i must have tested with a cached session
[11:43:57] let me know when you'd like me to test again :)
[11:47:52] will do
[16:07:11] hmm, just got a new error on a reimage: https://phabricator.wikimedia.org/P16360 Seems like the reimage went fine so far though, it's doing a puppet run now
[16:07:50] oh, no /usr/bin/cookbook on cumin2001
[16:11:18] hnowlan: see moritz's email, cumin1001/2002 are fully fledged, 2001 is just for DBA-related stuff until the DC switchover
[16:11:35] we should probably remove the reimage script too maybe, not sure cc moritzm
[16:12:34] ah, oops.
[16:12:53] sorry about that
[16:13:58] no big deal, looks like it's actually reimaged fine
[16:14:44] but has not done all the other steps
[16:15:12] cumin still works there so I guess that's ok, let me think
[16:15:15] maybe it's just the downtime
[16:15:30] but it should not be used for that
[16:23:01] yeah, let's remove the reimage script on 2001, doing that now
[16:23:12] thx
[16:23:47] done
[16:30:13] it looks like the reimage completed successfully at least - should I reimage again from cumin2002 to be sure?
[16:30:31] reimage cumin2002 to be sure
[16:30:45] (do not listen to me)
[16:31:31] https://i.redd.it/4rbuslk1pfl11.jpg
[16:31:42] lol
[16:44:23] one of those days where a missing trailing comma leads to half a day of debugging. thanks volans <3 https://gerrit.wikimedia.org/r/c/operations/puppet/+/699045/1/modules/netbox/templates/cas_configuration.py.erb
[16:44:40] XioNoX: paravoid: topranks: should be good to test netbox-next again :)
[16:46:32] jbond: wfm!
[16:46:33] jbond: that works for me
[16:46:55] cool, will deploy to prod tomorrow
[16:46:58] thanks
[16:50:27] Hi SREs! Can someone merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/689192 for me please?
[16:51:54] dancy: oh no, has that been sitting for a week?
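(Editor's aside, not part of the log.) The 16:44:23 message about a missing trailing comma in `cas_configuration.py.erb` is an instance of a classic Python gotcha: adjacent string literals with no comma between them are silently concatenated, so a settings tuple ends up one element short with no error raised. The log doesn't show the template's contents, so the setting name below is hypothetical; this is a minimal sketch of how that class of bug typically manifests.

```python
# Illustration only: the setting name is made up, not taken from the real
# cas_configuration.py.erb. The point is the missing comma after "last_name":
# Python concatenates adjacent string literals, so the tuple silently has
# two elements instead of three, and nothing fails until much later.
CAS_RENAME_ATTRIBUTES = (
    "first_name",
    "last_name"   # <-- missing comma: merges with the next literal
    "email",
)

assert len(CAS_RENAME_ATTRIBUTES) == 2
assert CAS_RENAME_ATTRIBUTES[1] == "last_nameemail"
```

Because the result is still a perfectly valid tuple, the failure only shows up downstream (here, as an Access Denied after a successful CAS login), which is exactly why it can eat half a day.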
looking
[16:52:33] It's been lurking for a bit, but this is the first time I've made a request for merge.
[16:55:08] Thanks rzl!
[16:55:20] no worries!
[17:02:39] rzl: thanks, I see it has already been taken care of
[17:57:01] kormat: wow, it's... still running.
[17:57:11] pc253
[17:57:12] nearly there
[17:58:01] Krinkle: yep, i've been watching it. so close!
[17:59:20] kormat: is there a short explanation you have at hand for why these deletes take so long? It surprises me given little to nothing on the server, and indexed, relatively simple delete queries. Is it doing like null overwriting and defragmenting at the same time or something?
[17:59:32] and given no throttling/sleeping
[17:59:49] I
[18:00:03] I'm guessing the size of the blobs must factor into it somehow.
[18:00:48] Krinkle: maybe it has something to do with the fact that blobs are not stored in-line in the tables on disk
[18:00:59] they're stored elsewhere, and there's a pointer in the row
[18:01:10] but i'm just speculating
[18:01:21] (and not at all related to the mysql training course i'm currently on ;)
[18:02:01] right, I vaguely recall something like that: indexes generally are not a full copy of the table, but use a pointer. but that would be helping speed it up, I think?
[18:02:07] anyway, np :)
[18:16:02] yeah I can't imagine that plays a role, would have to know the specific delete that's happening, who knows
[19:45:53] to whoever removed /srv/tftpboot/buster-raid0-installer/pxelinux.cfg: it is also on the releases* servers, but puppet can't remove non-empty directories
[22:23:39] You do not have a valid Kerberos ticket in the credential cache, remember to kinit.
[22:23:42] [install1003:~] $
[22:24:05] ^ ok.. but why do install servers have kerberos? when I grep -r kerberos::client it doesn't look like they should either
[22:24:12] no access4u mutante ? :)
[22:24:41] I don't want that access, I wonder why install servers would have kerberos
[22:24:59] and also why it overwrites motd instead of appending
[22:28:24] oh, scratch that last part, motd is ok
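(Editor's aside, not part of the log.) The 18:00:48 speculation about slow deletes — that large BLOBs live off-page, with the row holding only a pointer — can be sketched as a toy model. This is an illustration of the speculated indirection cost only, not a description of how InnoDB actually lays out externally stored columns:

```python
# Toy model of the speculation in the log: BLOBs live in a separate store
# and the row keeps only a pointer, so every row delete implies a second
# step to free the out-of-line blob. Pure illustration, not InnoDB internals.

rows = {}        # primary key -> (metadata, blob pointer)
blob_store = {}  # blob pointer -> blob bytes

def insert(pk, meta, blob):
    ptr = f"blob:{pk}"
    blob_store[ptr] = blob   # blob stored out of line
    rows[pk] = (meta, ptr)   # the row itself stays small

def delete(pk):
    _meta, ptr = rows.pop(pk)  # cheap: drop the small row
    del blob_store[ptr]        # extra work: chase the pointer and free the blob

insert(1, "parsercache entry", b"x" * 10_000)
delete(1)
assert rows == {} and blob_store == {}
```

Under this model a bulk delete pays per-row for both the index/row removal and the off-page free, which would match the intuition that blob size "must factor into it somehow" — though, as the log itself concludes, you'd need the specific delete to know for sure.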