[07:31:17] good morning, the puppet compiler might have some funky issue this morning
[07:31:39] change https://gerrit.wikimedia.org/r/c/operations/puppet/+/790350/ has `Hosts: P:zuul::merger`
[07:32:11] but the compiler screams about `WARNING: no nodes found for class: Profile::Zuul::Merger` when there are definitely two nodes with that profile
[07:32:15] the build log is https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler-test/1354/console
[07:39:44] it works if I replace the header with the server names (`Hosts: contint2001.wikimedia.org,contint1001.wikimedia.org`)
[07:39:57] so there is a workaround at least
[09:02:23] looks like jbond fixed ^ https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/35978/console :)
[09:02:48] hashar: yes i fixed it manually, going to try a more permanent fix shortly
[09:07:25] great, thank you!
[09:26:21] jbond, XioNoX: would it be ok to merge and migrate VMs in netbox to have group support? I would disable puppet on netbox and stop the timers to make sure config changes and extras changes are all in before restarting them
[09:26:41] XioNoX: do you want to have a final look? I added all the links of the tests on netbox-next in the CR
[09:26:48] volans: fine with me
[09:28:28] volans: I'll have a look right now
[09:28:33] ack, thx
[09:31:57] volans: yep, all good for me!
[09:32:07] great, thanks!
[09:35:06] 10netbox, 10Infrastructure-Foundations, 10Patch-For-Review: Import row information into Netbox for Ganeti instances - https://phabricator.wikimedia.org/T262446 (10ops-monitoring-bot) Icinga downtime and Alertmanager silence (ID=aee08ee1-85f9-44de-8a66-77195873b06e) set by volans@cumin1001 for 4:00:00 on 1 ho...
[09:56:43] XioNoX, jbond: so... I need to add wmflib to the venv, on which branch should I add it to the netbox-deploy repo?
[10:00:27] volans: 3-2 I think
[10:00:30] volans: the 3.2 branch
[10:00:42] 3-2-2
[10:02:12] ok
[10:26:23] the diff of deps is huge...
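(Editor's aside: the two `Hosts:` commit-message header forms discussed in the 07:31–07:39 exchange, as they would appear in a puppet change; the profile selector was the one the compiler choked on that morning, and the explicit host list was the workaround.)

```
# failing form (profile selector):
Hosts: P:zuul::merger

# working workaround (explicit host list):
Hosts: contint2001.wikimedia.org,contint1001.wikimedia.org
```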
[10:26:30] are you sure the branch 3-2-2 was updated?
[10:26:48] things like certifi==2021.10.8 => certifi==2022.6.15
[10:26:54] that seems to hint it was not updated recently
[10:31:44] btw we should reset the deploy repo as it is becoming huge
[10:31:48] for no reason
[10:36:42] jbond, XioNoX: also, on deploy1002 the current setup is not clear to me: deploy is at origin/3-2-2, src/ is HEAD detached at v2.10.4-wmf6
[10:36:56] can I move it back to the 3.2 version?
[10:37:09] checking
[10:38:10] I think we should git checkout v3.2.2-wmf
[10:38:50] my git review just failed with ! [remote rejected] HEAD -> refs/for/master%topic=3-2-2 (implicit merges detected)
[10:39:04] I just checked out the branch, added a commit and sent it for review :/
[10:39:06] checking why
[10:40:46] ah yes, it's trying to send it to master and not the branch, my bad
[10:42:06] volans: so the packages got updated on May 10th
[10:43:00] ah, that explains it
[10:43:02] and certifi released a more recent version on May 18th :) previous was 2021.10.8
[10:43:19] so we deployed month-old deps... got it
[10:43:30] :)
[10:43:50] volans: should we update the deps every month?
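(Editor's aside: the "diff of deps" above is a comparison of two pip-freeze style pin sets, e.g. `certifi==2021.10.8 => certifi==2022.6.15`. A minimal sketch of a helper to surface such changes; this is a hypothetical illustration, not part of the netbox-deploy repo:)

```python
def parse_pins(text):
    """Map package name -> pinned version from pip-freeze style lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blanks, comments and anything not pinned with ==
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins


def diff_pins(old_text, new_text):
    """Return (changed, added, removed) between two frozen requirement sets."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    changed = {n: (old[n], new[n]) for n in old.keys() & new.keys() if old[n] != new[n]}
    added = {n: new[n] for n in new.keys() - old.keys()}
    removed = {n: old[n] for n in old.keys() - new.keys()}
    return changed, added, removed
```

For example, `diff_pins("certifi==2021.10.8", "certifi==2022.6.15")` reports certifi as changed from `2021.10.8` to `2022.6.15`.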
[10:44:58] I didn't mean that, just that we deployed last week without updating them right before; it was a good time for an update ;)
[10:46:59] volans: one could argue that's 1 month of testing on -next, and updating them right before the upgrade could introduce bugs :)
[10:47:08] sure
[10:48:03] certifi did scare me because it seemed from last year ;)
[10:48:04] overall I get your point, I'm wondering if we have some alerting in case there are security issues in the deps we use in prod
[10:48:20] we do get emails from github
[10:49:25] fyi we now override the CA bundle shipped with certifi
[10:50:17] I know I know, it was just the version that got me worried for the venv :)
[10:50:24] ack
[10:51:11] https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/807507/
[10:53:05] volans: +1, out of curiosity, what will it be used for?
[10:53:43] it's in the ganeti refactor, to get the http session with timeout and retry logic from wmflib
[10:53:52] I forgot I added it... :/
[10:54:38] ok! no big deal
[10:55:41] ok to move src/ back to v3.2.2-wmf ?
[10:55:45] on deploy1002
[10:55:46] and deploy
[10:59:50] XioNoX: do you know why on deploy1002 it gives me
[10:59:51] modified: src (untracked content)
[11:00:03] no idea
[11:00:14] but yeah it should be on 3.2.2-wmf
[11:00:36] yeah it is now but the deploy repo doesn't like it
[11:00:41] checking submodules
[11:02:52] 10SRE-tools, 10Icinga, 10Infrastructure-Foundations, 10SRE, 10observability: Icinga paged for a host that should have been downtimed - https://phabricator.wikimedia.org/T309447 (10MoritzMuehlenhoff) @Volans: Can this task be closed with https://gerrit.wikimedia.org/r/803317 merged?
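(Editor's aside: the "http session with timeout and retry logic" mentioned at 10:53 refers to a helper in wmflib; the exact wmflib API is not shown in the log, so here is a rough equivalent built directly on requests/urllib3. All names and defaults are illustrative assumptions, not wmflib's actual signature:)

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def http_session(retries=3, backoff=1.0, timeout=10.0):
    """Build a requests Session that retries transient failures and applies
    a default timeout to every request (illustrative sketch)."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    # requests has no session-wide timeout setting, so wrap request()
    # to inject a default timeout when the caller does not pass one.
    original_request = session.request

    def request_with_timeout(method, url, **kwargs):
        kwargs.setdefault("timeout", timeout)
        return original_request(method, url, **kwargs)

    session.request = request_with_timeout
    return session
```

A caller would then use it like any requests session, e.g. `http_session().get("https://netbox.example.org/api/")`, getting retries and a timeout without repeating that configuration at every call site.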
[11:03:04] I had to re-init the submodule, not sure why but it's back to normal
[11:03:47] 10SRE-tools, 10Icinga, 10Infrastructure-Foundations, 10SRE, 10observability: Icinga paged for a host that should have been downtimed - https://phabricator.wikimedia.org/T309447 (10Volans) I was planning to close it when the new spicerack is released with the patch... it's not yet deployed to prod. But...
[11:06:25] ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: 'top_level.txt'
[11:06:29] did you see this error?
[11:07:09] why is it trying to deploy to netbox-dev too?
[11:08:28] using --limit
[11:08:53] volans: no i didn't see that error. as to ignoring limit, no idea
[11:09:10] not ignoring, I'm now using --limit
[11:09:22] because the default is to deploy everywhere,
[11:09:28] not sure if -dev should be part of normal deployments
[11:09:47] i think that netbox-next is part of the netbox deploy dsh group so limit is required
[11:09:47] XioNoX: it might have broken your test of the metric plugin, sorry if that happened
[11:09:55] yes i think that's always been the case
[11:10:03] docs should be updated then
[11:10:23] we explored having separate scap targets for next and prod
[11:10:24] what's --limit ?
[11:10:32] ah "-l"
[11:10:35] nevermind
[11:10:49] however it's not super simple (need to update all the directories in puppet)
[11:11:01] volans: no pb for the metric plugin, I don't need it now
[11:11:29] ok
[11:25:43] ok migration should be completed
[11:25:44] https://netbox.wikimedia.org/virtualization/clusters/
[11:25:52] I will delete the old clusters, now empty, later, just in case
[12:36:35] volans: nice job!
[12:37:36] jbond: I guess we could add the row info into the hiera export at this point :)
[12:39:18] volans: that loops back to https://phabricator.wikimedia.org/T262446#7998749 I think
[12:39:43] yep
[12:39:48] if we want to go that way
[12:39:53] one way or the other we have the info
[12:39:54] of course
[13:00:48] volans: ack sounds good
[14:23:12] 10netbox, 10Infrastructure-Foundations: Netbox: replace getstats.GetDeviceStats with ntc-netbox-plugin-metrics-ext - https://phabricator.wikimedia.org/T311052 (10ayounsi)
[14:23:14] 10netbox, 10Infrastructure-Foundations, 10Patch-For-Review: Upgrade Netbox to 3.2 - https://phabricator.wikimedia.org/T296452 (10ayounsi)
[15:50:34] 10Puppet, 10netbox, 10Infrastructure-Foundations, 10PostgreSQL: Puppet change at each run on postgres replicas - https://phabricator.wikimedia.org/T311156 (10ayounsi) p:05Triage→03Medium
[17:08:13] 10SRE-tools, 10Spicerack: spicerack.redfish: Add handle for when job returns - "Job for this device is already present" - https://phabricator.wikimedia.org/T311162 (10jbond) p:05Triage→03Medium
[17:13:36] 10Puppet, 10netbox, 10Infrastructure-Foundations, 10Patch-For-Review, 10PostgreSQL: Puppet change at each run on postgres replicas - https://phabricator.wikimedia.org/T311156 (10jbond) I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/807553 should fix this issue > How to know if it's safe to...
[22:23:58] 10netbox, 10Infrastructure-Foundations, 10SRE: Grant cn=nda some sort of read only access to Netbox - https://phabricator.wikimedia.org/T302870 (10Dzahn) Before we talk about technical implementation and putting this on ice. I am wondering..has anyone even had specific concerns or data fields in mind that sh...