[14:48:06] If I'm trying to write my first cookbook and want to try running it, is there an easier way of doing so than cloning the entire repo into ~ on a cumin node and then making a cookbook.yaml setting that as cookbooks_base_dirs? I was sort-of aiming for "Use installed cookbooks et al except for this one particular file"...
[14:50:41] * Emperor trying to resist "a shell script would be easier..."
[14:58:07] Emperor: volans will have the final answer, but I'm not aware of such an easier path. What I usually do is https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Creating_your_local_environment and run it from my laptop instead of from the cumin hosts, but not all spicerack features can run from outside prod
[14:59:00] I can't meaningfully test outside prod, I don't think.
[14:59:43] Emperor: as we don't have a staging environment where all the parts of the infra used by spicerack are replicated, yes, that's correct. But it's also ok to get it reviewed, merge it and then fine-tune it
[15:00:16] when running unmerged patches we should just run the dry-run mode
[15:00:20] not the real one
[15:00:36] volans: I'm still at the "how do I even...?" stage, so trying to produce a complete thing without trying it is not going to happen, alas
[15:01:07] 1) feel free to ask for directions; if you have a task I can advise on what to use and how
[15:02:18] 2) you can also try the various bits and pieces in isolation from a Python REPL; you can use my wrapper if you want (I'll probably puppetize it at some point as it's nowadays used by too many people) [no, I'm not happy that it's used, but hey, we don't have much better right now]
[15:02:31] sudo /home/volans/spicerack_shell.py
[15:02:55] you get pre-instantiated spicerack_dry_run and spicerack, which are 2 instances of the spicerack.Spicerack class
[15:03:04] and logging is set to DEBUG
[15:04:44] volans: 2) YM I can run your shell and then paste in my classes [the suggested extensions of CookbookBase and CookbookRunnerBase] and it'll try and run them?
[15:05:09] no, that's to test the single bits of spicerack in isolation; as you said, you don't even know yet what you need
[15:05:23] you get everything from https://doc.wikimedia.org/spicerack/master/api/index.html
[15:05:35] OIC
[15:05:38] the accessors to all the modules and functionalities
[15:05:57] that's a spicerack shell, no cookbooks involved
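(For reference: a minimal sketch of what a wrapper along the lines of spicerack_shell.py could look like, based only on what is described above: two pre-instantiated spicerack.Spicerack instances, one dry-run and one not, with logging set to DEBUG. The constructor arguments and the use of code.interact are assumptions for illustration, not the actual script.)

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a Spicerack REPL wrapper; not the real spicerack_shell.py."""
import code
import logging

from spicerack import Spicerack

# The chat mentions logging is set to DEBUG.
logging.basicConfig(level=logging.DEBUG)

# Two pre-instantiated entry points, as described in the chat.
# Passing dry_run explicitly is an assumption about the constructor's keyword arguments.
spicerack_dry_run = Spicerack(dry_run=True)
spicerack = Spicerack(dry_run=False)

banner = (
    "Spicerack shell: 'spicerack_dry_run' (safe) and 'spicerack' (real) are available.\n"
    "API docs: https://doc.wikimedia.org/spicerack/master/api/index.html"
)
code.interact(
    banner=banner,
    local={"spicerack_dry_run": spicerack_dry_run, "spicerack": spicerack},
)
```

From such a REPL one would poke at the individual accessors documented in the API reference (e.g. the remote() accessor) against the dry-run instance before touching the real one.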
[15:12:31] Just a heads-up - I'm going to increase logging in changeprop. We have no visibility into how often it sees failures, so I am not entirely sure how much log volume this change will create. It most likely won't be much, but I'll be ready with a quick rollback if need be https://gerrit.wikimedia.org/r/c/mediawiki/services/change-propagation/+/889123
[15:12:44] Emperor: but please feel free to send a WIP early draft to gerrit and I'd be happy to chime in
[15:12:52] hnowlan: <3
[15:13:05] hnowlan: log-it-all!
[15:15:20] given how ~*mysterious*~ changeprop is, there could be 1000s of failures a second but boy let's hope not
[15:15:41] ✨
[15:16:03] changepropemons, gotta log 'em all
[15:18:03] volans: as to "what", a very sketchy outline is https://phabricator.wikimedia.org/T327253#8614625 - run swift-get-nodes to locate the sqlite DBs, copy them to a cumin node, perform some sqlite operations on them to work out which ghosts to remove, then run swift stat [check for 404] and swift delete [check for 404] on a suitable node (currently the swiftrepl nodes have the relevant credentials available).
[15:20:13] Emperor: do we need to copy the sqlite DBs or can we query them in place?
[15:21:46] we need them in one place, because we have to JOIN the tables together
[15:22:28] ah, the joins are between files
[15:22:31] volans: cf https://phabricator.wikimedia.org/T327253#8597070, the code fragment starting ATTACH DATABASE
[15:22:34] volans: yes
[15:23:30] <-- starting nice and simple /o\
[15:23:48] the "proper" way to implement this would be to add a swift module to spicerack to abstract most of the complexity. But it's totally ok to start with a more pragmatic approach, trying to lay out the parts needed and then abstract them out later
[15:24:49] Mmm, I think I don't want to shave the "swift module for spicerack" yak today :)
[15:25:35] [and, honestly, I really hope we don't often want to go futzing around inside the bowels of container databases]
[15:26:01] that's the other part... is that a one-off, a once-a-year or a more frequent thing?
[15:26:45] coding that properly adds a lot of complexity compared to the boring and manual way of handling all the errors, corner cases, etc...
[15:27:00] interesting question. We have 65 sad containers we know about currently (hence my wanting to script it, but obv. I could do that with a shell script and elide the spicerack framework); I suspect when we switch to codfw being primary we'll find some more
[15:27:39] 65 sad containers> and ~28k sad objects
[15:30:38] Our best understanding of the underlying cause is a pretty rare event (essentially a swift node offline for >7 days), so I'm at least hoping it's not going to be a regular thing.
[15:31:57] any node?
[15:33:45] outside-the-box question: do we back up said databases? (to understand if by any chance they are already all in one place at some point)
[15:36:05] volans: any node> any node with a container DB on it (which is most of the storage nodes)
[15:36:14] volans: backup> no
[15:36:27] got it
[17:14:27] herron: cwhite: we will be reimaging a dnsrec host; some alerts are expected. if something breaks, that's on us (me :)
[17:14:31] just as an FYI
[17:15:34] Thanks for the heads-up :)
[17:15:58] sukhe: o/ ignorant question - how do we depool a dnsrec nowadays?
[17:16:14] (I recall some issues in the past, so I am curious :)
[17:16:54] elukey: not ignorant at all, please :) as such, nothing is required as the hosts are anycasted
[17:17:36] decomm and authdns are a separate case though
[17:17:38] sukhe: which alerts are not downtimed automatically?
[17:17:47] volans: BGP alerts
[17:17:53] perfect, so just a tiny amount of in-flight traffic dropped and that's it
[17:17:57] thanks!
[17:18:23] ah yeah, those are a mess to downtime without losing visibility :(
[17:19:19] yeah, basically an "issue" with all anycasted hosts, at least (dnsrec, wikidough, durum, centrallogging)
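(For reference: going back to the ATTACH DATABASE fragment mentioned around 15:22, here is a minimal sketch of a cross-database query with Python's sqlite3 module. The file names, table and column names, and the join condition are placeholders chosen only to illustrate the mechanism; the real ghost-detection query is the one in T327253#8597070.)

```python
import sqlite3

# Placeholder file names: two container DB copies pulled to the cumin host
# with swift-get-nodes + copy, as per the outline at 15:18.
conn = sqlite3.connect("container_replica_a.db")
conn.execute("ATTACH DATABASE 'container_replica_b.db' AS b")

# Illustrative join only: list object rows present in copy A but missing from
# copy B. Swift container DBs keep their listings in an 'object' table; the
# exact columns and the real query live in the Phabricator task.
ghost_candidates = conn.execute(
    """
    SELECT a_obj.name
    FROM object AS a_obj
    LEFT JOIN b.object AS b_obj ON b_obj.name = a_obj.name
    WHERE b_obj.name IS NULL
      AND a_obj.deleted = 0
    """
).fetchall()

for (name,) in ghost_candidates:
    # Candidates to verify with 'swift stat' (expecting 404) before 'swift delete'.
    print(name)

conn.close()
```

Each name produced by a query like this would then be double-checked with swift stat (expecting a 404) before running swift delete from a node with the relevant credentials, per the outline at 15:18.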