[07:08:43] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 2 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (Majavah)
[07:22:00] topranks, XioNoX: could either of you rearm the keyholder on cumin1001 using "sudo keyholder arm"? it needs the homer-key-passphrase from pwstore
[07:28:33] moritzm: done!
[07:28:50] fyi, I'm working from trains today
[07:31:22] thanks :-)
[07:39:38] Thanks XioNoX, don’t worry too much about any other day-to-day stuff, I’ll try to keep on top of it.
[07:39:43] And safe travels!
[07:46:07] should we add moritz to the netops group in pwstore? It seems the simplest solution, as he's the one usually rebooting those servers and also re-encrypting the whole pwstore for new hires
[07:59:40] +1
[08:10:19] as long as no one expects me to actually do things on the routers :-9
[08:16:20] +1 too, makes sense
[08:28:55] netbox, Infrastructure-Foundations, IPv6, User-jbond: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK) - https://phabricator.wikimedia.org/T253173 (Volans)
[08:28:59] SRE-tools, Discovery, Discovery-Search, Infrastructure-Foundations, IPv6: Some elastic hosts do not have IPv6 DNS records - https://phabricator.wikimedia.org/T271143 (Volans) Resolved→Open Re-opening because if there is no technical blocker for having the AAAA records on those hosts a...
[08:36:19] Puppet, Infrastructure-Foundations, SRE, User-jbond: Remove legacy functions - https://phabricator.wikimedia.org/T308639 (jbond)
[08:39:34] topranks, XioNoX: is it expected that mr1-eqsin ge-0/0/4.401 has 2 distinct IPv6 addresses? See https://netbox.wikimedia.org/search/?q=ge-0-0-4-401.mr1-eqsin
[08:41:33] Think it’s a typo in the DNS record
[08:41:50] Should be 402 not 401, will double check and fix up
[08:42:26] yeah that's what seems to be the issue, but I would not touch it :)
[08:43:05] context: I'm testing a patch to the zone_validator in the dns repo to validate the whole dataset, both manual and $INCLUDEd from netbox
[08:43:12] and some things are coming up
[08:47:38] ah nice! It's updated now, am I ok to run the cookbook?
[08:50:07] sure
[08:57:14] Puppet, Infrastructure-Foundations, SRE, User-jbond: Remove legacy functions - https://phabricator.wikimedia.org/T308639 (jbond)
[09:26:06] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 3 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (dcaro)
[09:26:14] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 3 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (dcaro) Open→In progress
[09:26:22] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 3 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (dcaro) a: dcaro
[09:26:32] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 5 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (dcaro)
[09:26:38] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 5 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (dcaro)
[09:44:03] I've added you and ar.zhel to a couple of DNS patches I've sent for the same reason ;)
[10:04:37] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: Remove legacy functions - https://phabricator.wikimedia.org/T308639 (Marostegui)
[12:17:25] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 5 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (jbond) This issue is also affecting production reimages, see P27926
[13:19:44] Are we adding the SPDX license header to EVERYTHING, or just .pp files and scripts?
[13:28:32] slyngs: everything, or at least everything that's code. See the following for what the rake task does: https://github.com/wikimedia/puppet/blob/production/rake_modules/tasks/spdx.rb#L40-L52
[13:29:39] it's possible the CI check is more picky; if it still triggers after using `bundle exec rake spdx:convert:new_files`, that's a bug
[13:31:40] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: Remove legacy functions - https://phabricator.wikimedia.org/T308639 (jbond)
[13:32:09] jbond: Perfect, there are a few files I'm concerned about, but I can just use the script to see what's expected/required
[13:32:49] jbond: what about code files that are imported into puppet from 3rd-party places?
[13:33:01] they might have their own license that differs from ours
[13:33:38] what should we do with those?
[13:35:47] volans: right now only new files trigger CI. Currently there is an undocumented convention that if we add 3rd-party Python files to the repo we name them .original.py and they are skipped in CI (although not by this new CI check)
[13:36:29] however I think it probably makes more sense to create a dummy repo under vendor_modules (which is skipped by CI) and put 3rd-party scripts there
[13:36:32] ack, I guess it might not be only python
[13:36:39] yeah
[13:36:47] e.g. vendor_modules/third_party_scripts/files/foo.py
[13:37:22] and then we could specify that the puppet repo license is X except for that path
[13:37:29] where each file has its own license
[13:37:38] slyngs: sounds good, also this is a work in progress so I expect to need to make some tweaks in the coming days/weeks
[13:37:55] We also need to check who contributed the file; if a community member wrote it, they might not be okay with me just slapping an Apache 2.0 license on their code
[13:39:14] It's files like this one I wonder about: https://github.com/wikimedia/puppet/blob/production/modules/aptrepo/files/updates
[13:39:16] I think the benefit of the SPDX tags is that we can license different parts of the repo with different licenses; for instance we could also include a 3rd-party file with an SPDX header containing whatever the original license was
[13:39:29] and at a later date automate an SBOM
[13:40:16] however IANAL and I'm no license expert, so I would suggest commenting on the task if there are concerns etc
[13:40:33] Yes, so if something is already clearly licensed MIT/GPL whatever, then just put that in the SPDX comment
[13:41:24] slyngs: I think going forward we should require that new contributors license their work as Apache or something Apache-compatible; if they are not happy with that then we can't accept the contribution, but that's just my opinion
[13:41:33] in relation to the practical way forward see https://phabricator.wikimedia.org/T308013
[13:41:55] basically we will only trigger CI errors for new files added to the repo, to avoid any issue with previous contributions
[13:42:24] for stuff already in the repo the advice is to only add the header if all contributors are Wikimedia or in the list on that phab task
[13:42:36] anything else, for now at least, we should just leave as it is
[13:42:56] once we have caught the low-hanging fruit we can circle back and see how big a problem everything else is
[13:43:10] also check out T67270 which has some historical context
[13:43:11] T67270: Default license for operations/puppet - https://phabricator.wikimedia.org/T67270
[13:44:44] slyngs: in relation to modules/aptrepo/files/updates, as far as I can see all contributors are either Wikimedia or on the list in T308013; was there some other concern?
[13:44:45] T308013: Assign SPDX headers to puppet.git - https://phabricator.wikimedia.org/T308013
[13:45:37] jbond: Oh, yeah that wasn't the concern. I was thinking the format might not like comments
[13:46:56] slyngs: ahh yes, that's different :) and I think I need to patch the rake job to handle files with no extension
[13:47:30] Right now it will just end in an exception :-)
[13:47:30] I think for now I'll probably skip files with no extension unless they are something very static like Rakefile
[13:53:51] slyngs: band-aid for now https://gerrit.wikimedia.org/r/c/operations/puppet/+/793055
[13:56:32] Perfect, then we can just point out missing licenses in the code review of those few files that fall through
[13:57:49] topranks: FYI https://gerrit.wikimedia.org/r/c/operations/dns/+/793020
[13:58:17] slyngs: yes exactly
[14:00:57] volans: thanks. yeah I note I added the entries in Netbox for those, so I guess we should add the zones.
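The zone_validator patch mentioned at 08:43:05 surfaced, among other things, an interface name carrying two distinct IPv6 addresses. A minimal Ruby sketch of that kind of check follows. It assumes a simplified one-record-per-line "name IN AAAA address" input rather than real zone-file syntax; the function `multi_aaaa_names` and the example names and addresses are hypothetical, and this is not the dns repo's actual validator.

```ruby
require 'ipaddr'

# Hypothetical sketch: report DNS names that carry more than one distinct
# AAAA record. Input is a simplified "name IN AAAA address" line format,
# not full zone-file syntax.
def multi_aaaa_names(lines)
  seen = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    fields = line.split
    # Ignore anything that is not a 4-field AAAA record in this toy format.
    next unless fields.length == 4 && fields[1] == 'IN' && fields[2] == 'AAAA'

    name = fields[0]
    # IPAddr#to_s compresses IPv6, so equivalent spellings compare equal.
    addr = IPAddr.new(fields[3]).to_s
    seen[name] << addr unless seen[name].include?(addr)
  end
  seen.select { |_name, addrs| addrs.length > 1 }
end

# Example data uses the 2001:db8::/32 documentation prefix (made up).
records = [
  'if-a.example IN AAAA 2001:db8::1',
  'if-a.example IN AAAA 2001:db8::2',
  'if-b.example IN AAAA 2001:db8::3',
  'if-b.example IN AAAA 2001:0db8:0:0::3', # same address, different spelling
]
multi_aaaa_names(records).each { |name, addrs| puts "#{name}: #{addrs.join(', ')}" }
# prints "if-a.example: 2001:db8::1, 2001:db8::2"
```

Normalising through `IPAddr` means the check only flags genuinely different addresses, which matches the 401/402 typo case rather than mere formatting differences.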
[14:01:23] They aren't routable / connected to our network so there is possibly an argument they should not have DNS entries
[14:01:31] ah
[14:01:32] Or those entries should be under a different parent zone
[14:01:39] you tell me
[14:01:53] they are currently reported as warnings by the zone validator so I thought it was just a missed INCLUDE
[14:01:54] I think for now what you've added is best, it doesn't really cause any issue and fits our pattern
[14:02:08] but you tell me
[14:02:12] yep I think I didn't add them due to this question and then forgot - so apologies.
[14:02:18] PTR should be there if possible
[14:02:22] no prob
[14:02:36] The zone validator is proving itself useful already :)
[14:09:14] netops, Infrastructure-Foundations, Prod-Kubernetes, SRE: Agree strategy for Kubernetes BGP peering to top-of-rack switches - https://phabricator.wikimedia.org/T306649 (elukey)
[14:38:08] jbond: I think the SPDX check is broken
[14:38:44] slyngs: you got an example?
[14:38:53] Yes, https://integration.wikimedia.org/ci/job/operations-puppet-tests-buster-docker/44663/console
[14:38:59] ack, looking
[14:39:49] I'm not great at Ruby, so I might be wrong
[14:40:08] slyngs: yes, it's a bug: ends_with vs end_with
[14:40:55] Aaah, just to back you up, I feel like ends_with sounds better :-)
[14:42:22] thanks, I completely agree, and Python uses endswith so this is a common mistake of mine :/
[14:43:00] slyngs: can you try rebasing your change on production and let me know if it still fails
[14:51:32] It works again
[14:53:50] great thanks
[15:12:36] Puppet, Infrastructure-Foundations, Patch-For-Review, User-jbond: Remove legacy functions - https://phabricator.wikimedia.org/T308639 (jbond)
[15:28:43] Hi folks; when we talked the other day about problems with the installer & puppet & disks, you said it'd be handy to have some outputs of more and less successful installer runs. T308644 is one where the installer works fine, but it takes a large number of reboots thereafter to get devices in the right order. T308677 is one where the installer trashes a filesystem...
[15:28:43] T308644: unstable device mapping of SSDs causing swift/puppet problems - example reimage - https://phabricator.wikimedia.org/T308644
[15:28:44] T308677: unstable device mapping of SSDs causing installer problems - example reimage with destruction of swift filesystem - https://phabricator.wikimedia.org/T308677
[15:29:15] Emperor: thanks a lot, those are surely useful data
[15:29:24] to have a look at
[15:29:40] I'm sure mor.itz or jo.hn will have ideas! :-)
[15:29:42] * volans hides
[15:29:45] If you want more examples, or different bits of data, do shout
[15:30:22] I also need to put something together about the hardware RAID lying about whether media is rotational or not.
[17:13:55] Puppet, Cloud-VPS, Infrastructure-Foundations, Observability-Alerting, and 5 others: Puppet fails on new cloud-vps VMs (with new base images) due to wanting /usr/local/lib/nagios/plugins - https://phabricator.wikimedia.org/T308601 (Andrew) In progress→Resolved
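The SPDX policy discussed between 13:28 and 13:53 (headers on code files, third-party scripts parked under vendor_modules, extensionless files skipped unless very static like Rakefile), together with the `ends_with` vs `end_with` bug found at 14:40, can be illustrated with a small Ruby sketch. This is NOT the logic of rake_modules/tasks/spdx.rb: the extension list, the allow-list, the "first 5 lines" rule and all method names are hypothetical assumptions. It only shows that Ruby's method is `String#end_with?`; a call to `ends_with` raises `NoMethodError`.

```ruby
# Hypothetical sketch of an SPDX header check for newly added files.
# Extension list, allow-list and the 5-line window are assumptions,
# not the actual puppet.git rake task.
CODE_EXTENSIONS = %w[.pp .rb .py .sh .erb .epp].freeze # assumed list
EXTENSIONLESS_ALLOWED = %w[Rakefile].freeze            # assumed allow-list
HEADER_WINDOW = 5                                      # assumed: header must be near the top

def needs_spdx_check?(path)
  # Third-party code under vendor_modules/ keeps its own license.
  return false if path.start_with?('vendor_modules/')

  base = File.basename(path)
  # Files without an extension are skipped unless explicitly allow-listed.
  return EXTENSIONLESS_ALLOWED.include?(base) if File.extname(base).empty?

  # Ruby's method is String#end_with? ("ends_with" would raise NoMethodError).
  CODE_EXTENSIONS.any? { |ext| base.end_with?(ext) }
end

def spdx_header?(content)
  content.lines.first(HEADER_WINDOW).any? { |l| l.include?('SPDX-License-Identifier:') }
end

# files: hash of path => file content; returns paths missing a header.
def missing_headers(files)
  files.select { |path, content| needs_spdx_check?(path) && !spdx_header?(content) }.keys
end

# Made-up example files (paths partly taken from the discussion above).
files = {
  'modules/foo/manifests/init.pp' => "# SPDX-License-Identifier: Apache-2.0\nclass foo {}\n",
  'modules/foo/files/bar.py' => "print('hi')\n",
  'modules/aptrepo/files/updates' => "placeholder\n",
  'vendor_modules/third_party_scripts/files/foo.py' => "print('vendored')\n",
}
p missing_headers(files) # => ["modules/foo/files/bar.py"]
```

Skipping extensionless files outright, rather than guessing their comment syntax, mirrors the "band-aid" approach from 13:47 and avoids the exception mentioned there; such files are left to code review instead.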