[08:33:25] netops, DC-Ops, Infrastructure-Foundations, SRE, SRE Observability: SCS CPU monitoring issue - https://phabricator.wikimedia.org/T285229 (ayounsi) This regularly alerts and is not actionable as it's a monitoring glitch. The CPU usage on the device is for example: `Cpu(s): 0.3%us, 0.0%sy, 0...
[09:08:39] netops, DC-Ops, Infrastructure-Foundations, SRE, SRE Observability: SCS CPU monitoring issue - https://phabricator.wikimedia.org/T285229 (fgiunchedi) Agreed the librenms patch is the way to go, I won't have the bandwidth any time soon but happy to assist
[10:09:52] godog: FYI I've seen some activity in the prospector bug, so I hope for a new release that fixes it within ~24h, if not I'll add the upper limit to setup.py, sorry for the trouble
[10:10:05] it was supposed to be a backward compatible release...
[10:16:46] volans: heheh sure no worries, it happens
[10:17:14] that was 1.7.0
[10:17:24] we are at 1.7.4 (since like last Thu.)
[10:17:46] lol, the release minigun is firing
[10:18:24] aside from the CI failures the patch is no longer WIP
[10:18:51] at least as a first iteration so the downtime cookbook can be adapted to DTRT
[10:19:04] ack, will look at it shortly
[10:27:13] volans: if you have a moment at some stage could you maybe advise me about this error I'm getting with sre.hosts.provision ?
[10:27:14] https://phabricator.wikimedia.org/P21609
[10:28:43] topranks: checking
[10:29:49] interesting, never seen this one
[10:30:03] it looks like a "A job operation is already running. Retry the operation after the existing job is completed."
[10:30:22] but I'm wondering which one and why
[10:30:32] yeah exactly, I've retried a few times, left some time between even, but keep getting the same
[10:30:56] ack, let me see what job is that referring to
[10:31:10] ok
[10:32:11] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (akosiaris)
[10:32:35] The iDRAC GUI doesn't show any jobs in progress when I log in
[10:32:45] I did
[10:32:46] racadm>>jobqueue view
[10:32:52] same all are listed as completed
[10:33:21] is this today or yesterday?
[10:33:22] yeah and they correspond to when I was retrying the cookbook each time
[10:33:23] your paste
[10:33:48] paste was taken yesterday, I tried again twice today (two most recent on that jobs list)
[10:34:38] got it
[10:35:02] topranks: I'd try the IT Crowd approach first
[10:35:21] haha... I am also a great believer :)
[10:35:31] go for: racreset
[10:35:38] and then hit retry on the cookbook
[10:35:44] but wait ~3 minutes at least
[10:35:46] before retrying
[10:35:49] it takes a bit
[10:36:03] ok yeah... sry 'racreset' how do I do that?
[10:36:26] from racadm or you can from the GUI
[10:36:31] restart the idrac
[10:37:20] yeah, I was gonna just do GUI cos I have it open but the location of it is eluding me
[10:37:43] otherwise
[10:37:44] ssh an-worker1147.mgmt.eqiad.wmnet
[10:37:52] and then racreset
[10:38:11] I found it in GUI, there is a 'reboot' and a 'reset to factory default', any advice on which?
[10:39:08] I'll do a reboot first and see how I get on
[10:46:39] topranks: yeah no need to factory reset at all
[10:47:32] and to the previous topic I just got news that 1.7.5 will be out later
[10:47:35] (prospector)
[10:51:31] volans: Worked perfectly that time :)
[10:51:40] eheheh
[10:58:16] netbox, Infrastructure-Foundations: Agree how to document intra-DC patch panels in Netbox - https://phabricator.wikimedia.org/T293221 (cmooney) > Exporting/displaying the data in a way that facilitates visual inspection (suggested by Rob) I find the 'trace' in Netbox excellent, but happy to discuss if w...
[11:02:10] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (akosiaris) >>! In T302423#7733067, @jbond wrote: > @jhathaway thanks for writing this up just a few quick comments. > > In general i think that the foundation has always been [[ https://wikit...
[11:11:03] volans: interestingly I'm getting the same thing on another server now.
[11:11:34] wut?
[11:11:40] I wonder what the trigger is. I guess one thing here is these have been powered up a few weeks before doing this, whereas usually much less time would have elapsed.
[11:11:50] nah
[11:11:52] that's normal
[11:11:53] I'll just reset the drac but yeah it is odd
[11:12:03] question is, what actions did you take?
[11:12:08] fix netbox or fix idrac?
[11:12:10] for the IP
[11:12:26] I changed the IP on the iDRAC, via the Web GUI
[11:12:32] seemed to work well
[11:12:37] is it possible to be pending a restart?
[11:12:42] But indeed, I did that for both of these.
[11:12:51] so the pending "job" is in the UI as pending changes
[11:13:01] hmm maybe. doesn't show that obviously, but perhaps yeah it's needed after IP change
[11:13:30] at the bottom usually there is something like save and reboot or similar, I don't know exactly for this change if it's required
[11:13:35] but might be related
[11:15:14] yeah I just logged on, can't see anything like that.
[11:15:34] The IP change itself was straightforward, change it, click apply, after a few seconds I could connect to it on the new IP
[11:15:54] anyway not worth wasting time on, this is not something we do normally
[11:16:02] ack, sorry, no idea
[11:16:10] if that happens on a new or untouched server I'll dig into it
[13:33:47] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Suboptimal anycast routing from leaf switches - https://phabricator.wikimedia.org/T302315 (cmooney) Change has now been rolled out. All seems ok, aggregate route is still being created at POPs where it was previously, and announced exter...
[13:37:51] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Suboptimal anycast routing from leaf switches - https://phabricator.wikimedia.org/T302315 (ayounsi) @cmooney thanks! @ssingh let me know when we're good to advertise DoH from drmrs @bblack let me know when we're good to advertise nsa.wiki...
[15:57:33] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jhathaway) > On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I've never tried it, I am wondering what technical hurdles we'd meet. Any ideas? Us...
[16:22:38] Puppet, Infrastructure-Foundations: Where to Put Community Modules? - https://phabricator.wikimedia.org/T302423 (jhathaway) Based on the discussion so far my inclination is that we stick with our current method of vendoring Community modules in `./modules`. Though not a perfect solution, it seems to have...
[21:56:17] netops, Infrastructure-Foundations: SingTel transport circuit ELINEGWR00001716 down - https://phabricator.wikimedia.org/T302841 (CDanis)
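
Editor's note on the iDRAC exchange (10:27 to 10:51): the recovery that worked was to check the job queue, restart the iDRAC with racreset (a reboot, not a factory reset), wait about three minutes, then retry the cookbook. The sketch below only lays those steps out as a printable plan and does not execute anything. The function names are hypothetical, and passing `racadm <subcommand>` as a remote SSH command is an assumption; in the log the commands were typed at the `racadm>>` prompt after logging in.

```python
import shlex

# "wait ~3 minutes at least" per the log; tune as needed.
RESET_WAIT_SECONDS = 180


def idrac_recovery_plan(mgmt_host, wait_seconds=RESET_WAIT_SECONDS):
    """Return the ordered recovery steps as (description, argv or None).

    Hypothetical helper: it only describes the steps, it runs nothing.
    """
    def ssh(subcommand):
        # Assumption: the management interface accepts racadm
        # subcommands as a remote SSH command.
        return ["ssh", mgmt_host, "racadm", *subcommand.split()]

    return [
        ("list the job queue", ssh("jobqueue view")),
        ("restart the iDRAC (not a factory reset)", ssh("racreset")),
        (f"wait {wait_seconds}s for the iDRAC to come back", None),
        ("retry the provision cookbook", None),
    ]


def render(plan):
    """Format the plan as numbered lines for a runbook or paste."""
    lines = []
    for number, (description, argv) in enumerate(plan, 1):
        command = shlex.join(argv) if argv else "(manual step)"
        lines.append(f"{number}. {description}: {command}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(idrac_recovery_plan("an-worker1147.mgmt.eqiad.wmnet")))
```

The wait is kept as an explicit step because retrying before the iDRAC is back was called out in the log as something to avoid.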