[08:46:29] <_joe_> arnaudb, kamila_: you should now be able to send commands to sirenbot
[08:46:36] <_joe_> !incidents
[08:46:36] No incidents occurred in the past 24 hours for team SRE
[08:47:04] <_joe_> https://wikitech.wikimedia.org/wiki/Vopsbot#Bot_commands for reference
[08:49:00] thanks _joe_ !
[08:49:24] <_joe_> your mappings for irc:victorops names were missing
[08:52:49] where is the mapping maintained?
[08:53:08] private puppet
[08:53:13] thanks!
[08:53:33] you should have got an email ~20m ago about that change :D
[08:54:45] got it! seeing where I was missing as well, thanks :)
[08:56:27] anytime :)
[08:57:14] <_joe_> Emperor: I'm reading https://wikitech.wikimedia.org/wiki/Debian_packaging_with_dgit_and_CI and it's not really 100% clear what I should do for code I maintain and want to build debian packages for, instead of a remote debian package
[08:58:57] <_joe_> I mean I think I understood that workflow, but I guess it would deserve its own section
[09:05:15] <_joe_> I'm not sure if I'm forced to use dgit for my code, which frankly I think doesn't make that much sense given we don't have an upstream to track
[09:15:40] _joe_: there is a separate doc for starting from unpackaged code - https://wikitech.wikimedia.org/wiki/Debian_packaging/Tutorial
[09:16:42] (perhaps I should flag that more explicitly; though it is more tutorial-like)
[09:17:49] <_joe_> Emperor: yeah in general I think the info is buried a bit in other tutorials/docs and I think 90% of the time people will be interested in that specific piece of info
[09:18:18] _joe_: YM something like "I already know about Debian packaging, what's different about the approach being used here?"
[09:19:30] <_joe_> yes
[09:19:41] <_joe_> "how do I make gitlab build packages for me"
[09:20:12] <_joe_> "how do I then fetch them to apt1001 and add them to reprepro"
[09:20:30] <_joe_> that's what I think most people will need to learn/remember practically
[09:20:47] installing splunk mobile app: is there any way to trigger a retry on the SMS validation? the app is stuck on "verify number" and I did not receive anything (I guess I'm luck I did not mistype my phone number :D)
[09:21:06] <_joe_> arnaudb: no idea :/
[09:21:07] lucky*
[09:21:23] thanks _joe_
[09:21:28] agree! I also have another use case but it is much more uncommon (maintaining a package that is also in debian, but we might end up having a newer version locally)
[09:21:30] I'll be patient for a while and wipe the app's data to start again in a while then
[09:22:49] _joe_ the Very Short Answer: use a non-quilt packaging format; have a branch name matching ^.*-wikimedia.*$ you want to build from that has the correct suite (e.g. bookworm-wikimedia) in debian/changelog ; set builddebs.yml@repos/sre/wmf-debci as CI/CD file; https://wikitech.wikimedia.org/wiki/Debian_packaging#Upload_to_Wikimedia_Repo
[09:23:20] <_joe_> Emperor: yeah I figured it out
[09:23:54] <_joe_> Emperor: I would love to be able to trigger builds with tags and not just branches, I might send you a patch to builddebs.yml if I find the time :)
[09:24:31] <_joe_> my feedback was that it's not /that/ obvious to figure all the above out as you have to locate the info deep-ish into the pages :)
[09:26:05] Mmm. I'm no technical writer; maybe I need a TL;DR on the starting-from-unpackaged-software page like https://wikitech.wikimedia.org/wiki/Debian_packaging_with_dgit_and_CI#Executive_Summary_/_TL;DR (on the starting-from-a-Debian-package page)?
[09:27:23] _joe_: alternatively, is that executive summary section the sort of thing you were looking for (had you been starting from an existing Debian package) in terms of level of detail?
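Editor's sketch for the Very Short Answer at 09:22:49: a minimal pre-push check, assuming only what was said in the channel (the branch regex ^.*-wikimedia.*$ and a suite such as bookworm-wikimedia on the first line of debian/changelog). The function names and the sample package name are hypothetical; this is not part of builddebs.yml and does not claim to mirror what the CI actually checks.

```python
import re

# Branch pattern quoted in the channel at 09:22:49.
BRANCH_RE = re.compile(r"^.*-wikimedia.*$")

# Standard first line of debian/changelog:
#   package (version) distribution(s); urgency=...
CHANGELOG_RE = re.compile(
    r"^(?P<package>\S+) \((?P<version>[^)]+)\) (?P<suites>[^;]+);"
)


def branch_is_buildable(branch):
    """True if the branch name matches the pattern from the chat."""
    return BRANCH_RE.match(branch) is not None


def changelog_suite(first_line):
    """Return the suite field of a changelog first line, or None if unparsable."""
    match = CHANGELOG_RE.match(first_line)
    return match.group("suites").strip() if match else None


def ready_to_build(branch, changelog_first_line, expected_suite):
    """Hypothetical helper: both conditions from the Very Short Answer hold."""
    return (
        branch_is_buildable(branch)
        and changelog_suite(changelog_first_line) == expected_suite
    )


if __name__ == "__main__":
    line = "mypkg (1.0-1) bookworm-wikimedia; urgency=medium"  # made-up entry
    print(ready_to_build("bookworm-wikimedia", line, "bookworm-wikimedia"))
```

The expected suite is passed in by the caller because the chat only says the changelog must carry "the correct suite"; whether it has to equal the branch name was not stated.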
[09:27:39] <_joe_> Emperor: yes, mostly
[09:28:05] <_joe_> I mean your explanation before was enough
[09:28:23] <_joe_> your Very Short Answer :)
[09:28:48] OK, I'll add something like that to the top of the Packaging/Tutorial page and signpost it better from Debian_packaging
[09:29:46] <_joe_> thanks <3
[09:43:35] maybe it's my fault for running on tea rather than coffee this morning but...
[09:43:37] modules/profile/manifests/lvs/realserver/ipip.pp:25 wmf-style: Parameter 'facts['interface_primary']' of class 'profile::lvs::realserver::ipip' has no call to lookup
[09:43:50] offending line: Array[String, 1] $interfaces = lookup('profile::lvs::realserver::ipip::interfaces', {'default_value' => [$facts['interface_primary']]}),
[09:47:08] full context on https://gerrit.wikimedia.org/r/c/operations/puppet/+/975342/3/modules/profile/manifests/lvs/realserver/ipip.pp
[09:47:16] but it looks like the wmf linter is messing with me
[09:48:23] it might be the first time we use a fact as a default value and it's possible the parser for wmflib doesn't realize that
[09:48:35] have you tried to grep for other occurrences?
[09:48:54] modules/profile/manifests/mail/smarthost.pp: $exim_primary_hostname = lookup('profile::mail::smarthost::exim_primary_hostname', {'default_value' => $facts['fqdn']}),
[09:48:57] (yep)
[09:49:24] maybe it's the first time that's used to provide an Array rather than a String
[09:50:11] https://www.irccloud.com/pastebin/eWS5VH6Y/
[09:50:17] most likely
[10:01:09] <_joe_> vgutierrez: yes the problem is just parsing I guess
[10:08:34] vgutierrez: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet-lint/wmf_styleguide-check/+/refs/heads/master/lib/puppet-lint/plugins/check_wmf_styleguide.rb#265
[10:10:12] irb(main):004:0> [''].empty?
[10:10:12] => false
[10:10:15] weird
[10:10:33] why?
[10:10:35] it's ruby
[10:10:37] :D
[10:10:41] How does one do a Puppet certificate reset with v7? The instructions on Wikitech for v5 don't seem to quite work.
[10:11:25] I tried puppet ssl clean and then puppet agent --waitforcert 1 --test but that makes a self-signed cert and errors out
[10:11:43] klausman: AFAIK the sre.puppet.renew-cert has been updated to work on p7 too
[10:11:52] ah, goody, will try that
[10:12:06] not sure in which state you are right now
[10:12:15] can't get much worse :)
[10:12:34] yes but the cookbook might not work if you're in a weird state ;)
[10:12:52] yeah, it doesn't
[10:13:40] also love that cumin abbreviates the failing command to the point of me not knowing what it's trying to do :-/
[10:14:02] the logs have it in full
[10:15:28] /var/log/cumin/cumin.log? The host I am running against is not mentioned in there at all
[10:15:40] no, the cookbook logs
[10:16:08] https://doc.wikimedia.org/spicerack/master/introduction.html#log-files
[10:16:20] linked from https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Logs
[10:17:39] Ah. The problem is that it tries to clean the cert on the puppetserver, but I have already done that.
[10:27:08] Yeah, even trying to replicate what the cookbook does doesn't seem to work
[10:27:30] I never see the CSR on the puppetserver (nor puppetmaster, for that matter)
[10:27:58] The weird thing is that the certs worked for a while, but renewal failed and now they have expired.
[10:28:17] jbond: I may need your help once more
[10:30:38] IIRC John had to clean the CSR also on puppetserver1001
[10:30:48] that is where the puppet master v7 runs IIUC
[10:32:12] (clean on puppet master + clean on ml-serve1008 + run puppet on 1008 to generate the CSR + sign the CSR on puppetserver)
[10:32:25] klausman: --^
[10:33:00] yes, I did the clean bit on the puppetserver
[10:33:05] /etc/puppet/csr_attributes.yaml
[10:33:10] Oops, wrong paste, sec
[10:33:25] https://phabricator.wikimedia.org/P53621 But this happens when I try to run the agent on 1008
[10:33:54] Note that /etc/puppet/csr_attributes.yaml is empty, but I dunno if it should be
[10:34:13] s/empty/nonexistent/
[10:34:27] the "palladium" reference seems to indicate that the host is trying to use puppet 5
[10:34:35] klausman: hi looking now. i doubt it's anything to do with /etc/puppet/csr_attributes.yaml
[10:34:41] thanks
[10:34:56] klausman: did you clean the node itself too?
[10:35:04] yes
[10:35:17] but looking at /etc/puppet/puppet.conf ... it looks very v5 ish
[10:35:25] e.g. it mentions puppetmaster1001.eqiad.wmnet
[10:36:00] How/why it reverted, I don't know. It's not one of the machines that got messed with on Friday, so it's extra-puzzling
[10:36:10] klausman: in that case just copy paste the puppet.conf from a machine already converted
[10:36:17] ok, will do
[10:37:43] Ok, agent now works and is waiting for cert, signed, and the agent continues after a bit. Phew!
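Editor's sketch of the manual sequence summarised at 10:32:12 (clean on the puppetserver, clean on the node, run the agent to generate the CSR, sign on the puppetserver). It only builds an ordered plan and runs nothing. `puppet ssl clean` and `puppet agent --test --waitforcert 1` are the commands quoted in the channel; the two `puppetserver ca` subcommands are an assumption about how the clean and sign steps would be done by hand on Puppet 7, and the certname used in the demo is a placeholder. The puppet.conf check is a rough heuristic based on the 10:35:25 remark, not an official test. The channel's own recommendation was to prefer the sre.puppet.renew-cert cookbook where it works.

```python
def cert_reset_steps(certname, puppetserver="puppetserver1001"):
    """Return the ordered (host, command) plan for a manual cert reset.

    Assumption: `puppetserver ca clean/sign` are used on the server side;
    the chat only says "clean" and "sign" without naming the commands.
    """
    return [
        (puppetserver, f"puppetserver ca clean --certname {certname}"),
        (certname, "puppet ssl clean"),
        # The agent submits a CSR and keeps polling until the cert is signed,
        # so the signing step is run from another session while this waits.
        (certname, "puppet agent --test --waitforcert 1"),
        (puppetserver, f"puppetserver ca sign --certname {certname}"),
    ]


def looks_like_puppet5_conf(puppet_conf_text):
    """Heuristic from the chat: a v5-era config still mentions puppetmaster1001."""
    return "puppetmaster1001" in puppet_conf_text


if __name__ == "__main__":
    for host, command in cert_reset_steps("ml-serve1008.example"):
        print(f"{host}: {command}")
```

Checking puppet.conf first would have saved time in this case: the resets kept failing because the node was still pointed at the v5 master.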
[10:38:00] Still puzzling how the config file got reverted
[10:38:34] klausman: you will need to delete the cert on puppetserver1001, rm -rf /var/lib/puppet/ssl
[10:38:44] then rerun puppet to regenerate `puppet agent -tw1`
[10:38:50] then sign on puppetserver
[10:39:20] klausman: oh never mind it all looks good now
[10:39:40] jbond: let's update the docs to use the cookbook instead of manual steps, that should be safer ;)
[10:40:09] volans: i think this is from a machine that failed half way through for unknown reasons
[10:40:19] failed what step?
[10:40:29] not sure
[10:40:41] No, actually this was the first one I did on Friday, and it went through the migration cookbook normally
[10:41:27] then it looks like at some point it got re-run with force_puppet7 false
[10:41:42] or someone manually updated the puppet.conf file
[10:42:49] Not saying that isn't what happened, but I have nfc how that could have happened.
[10:43:21] no me neither but it looks like it completed the migration and then somehow ended up with the old puppet.conf
[10:44:23] Cosmic rays. That must be it. Anyway, I think I got the three(!) hosts fixed now, waiting for the alerts to clear
[10:45:04] :D
[10:46:07] thanks once more :)
[10:46:16] np
[11:37:27] What does one do when pcc fails with ENOSPC for pcc-worker1001?
[11:38:21] https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet-diffs#Out_of_disk_space
[11:38:44] although we do have a timer that should clean old stuff
[11:38:45] merci!
[11:38:57] and so it shouldn't happen often unless many build for all hosts
[11:39:19] Weeeel, I had a PCC run for 13 hosts that I needed to re-run. And some of them have a _lot_ of facts
[11:40:14] 13 is nothing, you can build for all (1 host per node statement in site.pp)
[11:41:05] I'll just wipe my own run, that should be enough
[11:41:32] ...and then it wasn't :D
[11:41:42] Ok, I'll just go have lunch, the change is not urgent
[11:55:12] klausman: this should be fixed now
[11:55:26] ty!
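Editor's sketch for the ENOSPC question at 11:37:27: the same error is returned whether a filesystem has run out of data blocks or out of inodes, so it helps to look at both. This uses only `os.statvfs` from the Python standard library; the path to inspect is up to the caller, and nothing here is specific to pcc-worker1001.

```python
import os


def fs_usage(path="/"):
    """Report block and inode usage for the filesystem holding `path`.

    Values are percentages, or None when the filesystem does not report
    a total (some filesystems report zero inodes).
    """
    st = os.statvfs(path)

    def pct_used(total, available):
        if total == 0:
            return None
        return 100.0 * (total - available) / total

    return {
        "blocks_used_pct": pct_used(st.f_blocks, st.f_bavail),
        "inodes_used_pct": pct_used(st.f_files, st.f_favail),
    }


if __name__ == "__main__":
    print(fs_usage("/"))
```

A filesystem with plenty of free blocks but inode usage near 100% will still refuse to create new files.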
[11:55:40] volans: ftr this is more often than not caused by an exhaustion of inodes, not space, these days
[11:55:59] see https://phabricator.wikimedia.org/T336350 on the child tasks
[11:56:06] ah good to know, should we reformat with more inodes?
[11:56:36] volans: possibly but i also think the two child tasks should take care of things
[11:56:49] * jbond will create the timer now
[11:57:01] ok
[11:57:08] sly.ngs: has already done the pcc patch, just need to do a release
[11:58:12] oh and klausman 13 is quite small, i'm currently doing one with ~800
[11:58:39] I most commonly run against 1-2, maybe 5 hosts :)
[11:59:08] yes i think that's the common pattern and shouldn't cause much issues
[11:59:28] tbh space hasn't been an issue for some time as we now store everything compressed
[12:02:06] running out of inodes hasn't happened to me since I helped run a full feed newsserver (aka, well over a decade ago)
[12:02:16] Also, I own i-no.de :-P
[12:02:24] lol
[14:17:54] small CR for enabling Puppet 7 on a new host if anyone has time to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/975824
[14:18:23] inflatador: are you using the cookbook?
[14:19:09] jbond Y, I was prompted by the reimage cookbook to add the hieradata
[14:19:18] ack cool +1
[14:19:54] excellent, thanks
[14:32:13] hello, may someone merge a change to update the tox config on operations/software? It uses a deprecated setting which emits some warning https://gerrit.wikimedia.org/r/c/operations/software/+/955880 :)
[14:40:05] hashar: done :)
[14:42:48] elukey: thanks!! :)
[14:44:55] does anyone know if setting the CPU freq governor is as easy as setting in puppet ( such as https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/manifests/ceph/osds.pp#L14 ) , or will I have to manually set in BIOS as well?
[14:51:17] inflatador: The cpu freq governor is a run-time only setting, so it doesn't need anything to be changed in BIOS.
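Editor's sketch for the governor answer at 14:51:17: on Linux the current governor is exposed per CPU through the standard cpufreq sysfs files, which is why it can be inspected (and set) at run time. The function below only reads those files; the `cpu_root` parameter is there so it can be pointed at a test directory, and the function name is hypothetical. How the linked Puppet profile applies the setting is not shown here.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor.

    Returns an empty dict if cpufreq is not exposed on this machine.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

On virtual machines the cpufreq directory is often absent, in which case the result is simply empty.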
[14:52:41] btullis ACK, thanks
[14:53:19] yw
[15:23:02] inflatador: one addendum I'd have: add a comment near the hiera changes mentioning the migration ticket number. So when v5 is gone, cleaning up is easier.
[15:23:19] (re: puppet v7)
[17:00:15] Hi Folks, friendly reminder that the dent ritual is about to begin, today’s topic: Incident Tooling proposal feedback,
[17:02:35] nothing to report clinic wise
[17:11:41] and nothing to report for on call
[17:18:58] I have been banging my head against docker-pkg for over an hour now. Why does `docker-pkg -c config.yaml build images/ --select '*kserve*'` select the right images but then fail because it can't "see" golang1.21?
[17:19:17] (╯°□°)╯︵ ┻━┻
[17:31:50] Doing some I/O tests for WDQS, is this still the recommended method? https://wikitech.wikimedia.org/wiki/Kafka/Kafka-main-raid-performance-testing-2019