[04:20:35] Agreed...maybe there's some ugly hack that woulda been possible like lowering the depool threshold all the way to zero
[09:16:56] ejoseph: ping me when you're around
[10:18:27] meal break
[10:54:14] lunch
[11:05:35] lunch2
[11:46:47] zpapierski: I am around now
[12:34:52] lunch break
[12:51:35] zpapierski : maven clean verify works
[13:06:04] zpapierski: all tests seem to pass (was that expected)
[13:06:11] Am i missing something
[13:14:07] launch
[13:23:23] ejoseph: David and myself are in the Open Hangout if you want some company.
[13:24:12] I'm missing context on your question above, but if you just checked out the project and did not add any modifications, it is expected that `./mvnw clean verify` should work.
[13:45:56] zpapierski: I've scheduled a meeting to discuss some interview process but forgot to ping you about, feel free to join anytime in the openhangout I'm there
[13:46:43] Let me get some coffee and I'm there
[13:48:06] ejoseph: nope, it's ok - now we need to pin point any deprecations in the logs and act upon them
[15:05:37] \o
[15:38:42] I'm watching the Neural Search meetup from Haystack. He ran a live demo at http://fandom-demo.deepset.ai/ — it's an index the Harry Potter Fandom wiki. The results are definitely not better than keyword search on random queries—both relevant and irrelevant to Harry Potter. I'm bailing on the meetup, and I'll be at the retro at the top of the hour.
[15:43:04] dcausse: the fix you did to the rc updater, is it in a release yet?
[16:02:03] gehel: not yet
[16:03:40] ejoseph: retrospective: https://meet.google.com/ssh-zegc-cyw
[16:31:55] ejoseph: I'm back on open hangout if you're still around
[16:32:06] Sure
[16:39:44] dinner
[16:43:03] ryankemper: we should figure out getting the wcqs machines back online. Given the options from morten dropping the 5.x kernel in and trying again seems a reasonable way forward
[16:44:22] ebernhardson: seems reasonable
[16:44:46] we can try the different kernel version first and if that doesn't work have dcops get on latest firmware
[16:45:25] ryankemper: sounds reasonable to me. the machines probably all have to be console rebooted, they wont finish a reboot with the disks stuck
[16:45:40] yeah, I'll hard reboot em now
[16:48:49] (done)
[16:49:15] looking how we choose kernels in puppet, not really sure :) Maybe we just install the package
[16:52:57] looks like include ::profile::base::linux510 in the role, along with setting enable in the hiera
[16:53:25] ebernhardson: his comment indicated we can just apt install the kernel, altho I have to imagine there's another command to actually make it boot into the new kernel
[16:53:53] ryankemper: yea thats the part i was wondering about, multiple kernels can be installed something must choose the winnner
[16:54:11] also not sure if we want to muck around with grub to make it load the new kernel every time or just do a one-off reboot
[16:54:22] ebernhardson: yeah agreed, it looks like we're using grub as the bootloader afaict
[16:55:34] ryankemper: from the ops package, since it's pinning a very generic `linux-image-amd64` package i'm imagining we must be depending on debian defaults to handle choosing
[16:56:15] okay I'll see what happens if I apt install on one host and then reboot and check what kernel is running
[16:56:22] sure
[17:01:50] ebernhardson: Yeah it did boot into the new kernel
[17:01:53] https://www.irccloud.com/pastebin/kf2IRMbT/
[17:02:31] awesome, i guess just do it to all of em, and i'll restart the import?
[17:03:43] Sure
[17:03:44] interestingly I haven't been able to get into `wcqs2001` or `wcqs1001` since I rebooted...I'll try another power cycle for those two
[17:04:59] huh
[17:06:07] BTW if this isn't a firmware-type issue (which I feel like it probably is, but we'll see), `wdqs` is currently one minor version behind `wcqs` (`Linux 4.19.0-16-amd64` vs `Linux 4.19.0-17-amd64`), so if this is purely a kernel issue we need to make sure wdqs doesn't get onto that version
[17:08:10] firmware seems very plausible still, certainly. I worry even if the machines don't fall over on 5.10 it's not clear that the issue is solved. I've run that import once before and the machines didn't fall over
[17:09:23] Agreed, a lot of uncertainty here
[17:10:03] I was able to get into `wcqs1001`, `wcqs2001` is still giving me trouble. But the new kernel is installed on everything except `wcqs2001` now. If you can't get to the other hosts for the next couple mins that's cause I just rebooted them
[17:10:23] Gonna powercycle `wcqs2001` for a third time and see if that does it :)
[17:11:06] hmm, can you see the serial console output for wcqs2001? If it's failing to come up plausible it says something
[17:11:23] (i imagine they aren't real serial consoles, dunno what we call em now :P)
[17:14:13] virtual serial port :)
[17:14:34] checking
[17:15:21] ahh, yea that makes sense
[17:17:48] actually I guess the vsp is the whole management console whereas you can connect to the serial console after logging into the mgmt console
[17:20:04] Anyway I'm watching the serial console during a fresh reboot, it just got to the grub stage and is init'ing the ramdisk currently
[17:21:33] Woot okay third time was the charm
[17:21:41] Maybe because I was watching the console it knew it had to be on its best behavior
[17:23:31] Okay installed new kernel on `wcqs2001` as well, so once that comes back up all of wcqs* will be on `Linux 5.10.0-0.bpo.9-amd64`
[17:24:16] :)
[17:26:13] ebernhardson: okay fleet should be ready for ya
[17:30:09] ryankemper: could you copy /srv/query_service/latest-mediainfo.ttl.gz from any of the other machines to wcqs2001? I was just checking and since i staggered the start that was the first machine and it downloaded the previous weeks data
[17:34:37] ebernhardson: doing that now
[17:36:43] ryankemper: tx
[17:49:08] fwiw if useful thtere is also sre.hosts.reboot-single
[17:49:12] (cookbook
[17:53:45] volans: ah thanks, yeah having it auto downtime is convenient
[17:55:41] there is also sre.wdqs.reboot but I'm sure you're aware of that one :D
[18:01:26] dinner
[18:22:32] ebernhardson: okay file's in place on `wdqs2001`
[18:23:20] great. All the imports are started up again, not much to do for now but see what happens
[19:17:37] ryankemper, ebernhardson: yeah, let's install the kernel package manually for testing and if it fixes the issue, then we can apply the profile::base::linux510
[19:52:43] thanks for the pointers, that sounds reasonable. (was at lunch :)
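
The "reboot and check what kernel is running" step from 16:56 can be sketched as a small check over kernel release strings. This is a rough illustration only, not tooling anyone in the channel used: it assumes the strings come from `uname -r` on each host, the helper names `parse_release` and `hosts_not_on_series` are hypothetical, and the host-to-kernel mapping below is illustrative sample data modelled on the 17:10 state (new kernel everywhere except `wcqs2001`).

```python
import re
from typing import Dict, List, NamedTuple


class KernelRelease(NamedTuple):
    major: int
    minor: int
    patch: int
    abi: str      # Debian ABI part, e.g. "17" or "0.bpo.9"
    flavour: str  # e.g. "amd64"


# Accepts `uname -r` style strings, with an optional leading "Linux ".
_RELEASE_RE = re.compile(r"^(?:Linux\s+)?(\d+)\.(\d+)\.(\d+)-(.+)-([a-z0-9]+)$")


def parse_release(release: str) -> KernelRelease:
    """Split a Debian kernel release string into its components."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi, flavour = match.groups()
    return KernelRelease(int(major), int(minor), int(patch), abi, flavour)


def hosts_not_on_series(releases: Dict[str, str], major: int, minor: int) -> List[str]:
    """Return the hosts whose running kernel is not in the major.minor series."""
    return sorted(
        host
        for host, release in releases.items()
        if parse_release(release)[:2] != (major, minor)
    )


if __name__ == "__main__":
    # Illustrative data only, not real output from the hosts.
    reported = {
        "wcqs1001": "5.10.0-0.bpo.9-amd64",
        "wcqs2001": "4.19.0-17-amd64",
    }
    print(hosts_not_on_series(reported, 5, 10))
```

Comparing only the major and minor numbers keeps the check valid across ABI bumps within the same series, which is the distinction the 17:06 message draws between `4.19.0-16` and `4.19.0-17`.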