[14:48:06] If I'm trying to write my first cookbook and want to try running it, is there an easier way of doing so than cloning the entire repo into ~ on a cumin node and then making a cookbook.yaml setting that as cookbooks_base_dirs? I was sort-of aiming for "Use installed cookbooks et al except for this one particular file"...
[14:50:41] * Emperor trying to resist "a shell script would be easier..."
[14:58:07] Emperor: volans will have the final answer, but I'm not aware of such an easier path. What I usually do is https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Creating_your_local_environment and run it from my laptop instead of from the cumin hosts, but not all spicerack features can run from outside prod
[14:59:00] I can't meaningfully test outside prod, I don't think.
[14:59:43] Emperor: as we don't have a staging environment where all the parts of the infra used by spicerack are replicated, yes, that's correct. But it's also ok to get it reviewed, merge it and then fine-tune it
[15:00:16] when running unmerged patches we should just run the dry-run mode
[15:00:20] not the real one
[15:00:36] volans: I'm still at the "how do I even...?" stage, so trying to produce a complete thing without trying it is not going to happen, alas
[15:01:07] 1) feel free to ask for directions; if you have a task I can advise on what to use and how
[15:02:18] 2) you can also try the various bits and pieces in isolation from a Python REPL; you can use my wrapper if you want (I'll probably puppetize it at some point as it's nowadays used by too many people) [no, I'm not happy that it's used, but hey, we don't have much better right now]
[15:02:31] sudo /home/volans/spicerack_shell.py
[15:02:55] you get pre-instantiated spicerack_dry_run and spicerack, which are 2 instances of the spicerack.Spicerack class
[15:03:04] and logging is set to DEBUG
[15:04:44] volans: 2) YM I can run your shell and then paste in my classes [the suggested extensions of CookbookBase and CookbookRunnerBase] and it'll try and run them?
[15:05:09] no, that's to test the single bits of spicerack in isolation; as you said, you don't even know yet what you need
[15:05:23] you get everything from https://doc.wikimedia.org/spicerack/master/api/index.html
[15:05:35] OIC
[15:05:38] the accessors to all the modules and functionalities
[15:05:57] that's a spicerack shell, no cookbooks involved
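(For reference: a minimal sketch of what a wrapper along the lines of spicerack_shell.py could look like, based only on what is described above: two pre-instantiated spicerack.Spicerack instances, one dry-run and one not, with logging set to DEBUG. The constructor arguments and the use of code.interact are assumptions for illustration, not the actual script.)

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a Spicerack REPL wrapper; not the real spicerack_shell.py."""
import code
import logging

from spicerack import Spicerack

# The chat mentions logging is set to DEBUG.
logging.basicConfig(level=logging.DEBUG)

# Two pre-instantiated entry points, as described in the chat.
# Passing dry_run explicitly is an assumption about the constructor's keyword arguments.
spicerack_dry_run = Spicerack(dry_run=True)
spicerack = Spicerack(dry_run=False)

banner = (
    "Spicerack shell: 'spicerack_dry_run' (safe) and 'spicerack' (real) are available.\n"
    "API docs: https://doc.wikimedia.org/spicerack/master/api/index.html"
)
code.interact(
    banner=banner,
    local={"spicerack_dry_run": spicerack_dry_run, "spicerack": spicerack},
)
```

From such a REPL one would poke at the individual accessors documented in the API reference (e.g. the remote() accessor) against the dry-run instance before touching the real one.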
[15:12:31] Just a heads-up - I'm going to increase logging in changeprop. We have no visibility into how often it sees failures, so I am not entirely sure how much log volume this change will create. It most likely won't be much, but I'll be ready with a quick rollback if need be https://gerrit.wikimedia.org/r/c/mediawiki/services/change-propagation/+/889123
[15:12:44] Emperor: but please feel free to send a WIP early draft to gerrit and I'd be happy to chime in
[15:12:52] hnowlan: <3
[15:13:05] hnowlan: log-it-all!
[15:15:20] given how ~*mysterious*~ changeprop is, there could be 1000s of failures a second but boy let's hope not
[15:15:41] ✨
[15:16:03] changepropemons, gotta log 'em all
[15:18:03] volans: as to "what", a very sketchy outline is https://phabricator.wikimedia.org/T327253#8614625 - run swift-get-nodes to locate the sqlite DBs, copy them to a cumin node, perform some sqlite operations on them to work out which ghosts to remove, then run swift stat [check for 404] and swift delete [check for 404] on a suitable node (currently the swiftrepl nodes have the relevant credentials available).
[15:20:13] Emperor: do we need to copy the sqlite DBs or can we query them in place?
[15:21:46] we need them in one place, because we have to JOIN the tables together
[15:22:28] ah, the joins are between files
[15:22:31] volans: cf https://phabricator.wikimedia.org/T327253#8597070, the code fragment starting ATTACH DATABASE
[15:22:34] volans: yes
[15:23:30] <-- starting nice and simple /o\
[15:23:48] the "proper" way to implement this would be to add a swift module to spicerack to abstract most of the complexity. But it's totally ok to start with a more pragmatic approach, trying to lay out the parts needed and then abstract them out later
[15:24:49] Mmm, I think I don't want to shave the "swift module for spicerack" yak today :)
[15:25:35] [and, honestly, I really hope we don't often want to go futzing around inside the bowels of container databases]
[15:26:01] that's the other part... is that a one-off, a once-a-year or a more frequent thing?
[15:26:45] coding that properly adds a lot of complexity compared to the boring and manual way of handling all the errors, corner cases, etc...
[15:27:00] interesting question. We have 65 sad containers we know about currently (hence my wanting to script it, but obv. I could do that with a shell script and elide the spicerack framework); I suspect when we switch to codfw being primary we'll find some more
[15:27:39] 65 sad containers> and ~28k sad objects
[15:30:38] Our best understanding of the underlying cause is a pretty rare event (essentially a swift node offline for >7 days), so I'm at least hoping it's not going to be a regular thing.
[15:31:57] any node?
[15:33:45] outside-the-box question: do we back up said databases? (to understand if by any chance they are already all in one place at some point)
[15:36:05] volans: any node> any node with a container DB on it (which is most of the storage nodes)
[15:36:14] volans: backup> no
[15:36:27] got it
[17:14:27] herron: cwhite: we will be reimaging a dnsrec host; some alerts are expected. if something breaks, that's on us (me :)
[17:14:31] just as an FYI
[17:15:34] Thanks for the heads-up :)
[17:15:58] sukhe: o/ ignorant question - how do we depool a dnsrec nowadays?
[17:16:14] (I recall some issues in the past, so I am curious :)
[17:16:54] elukey: not ignorant at all, please :) as such, nothing is required as the hosts are anycasted
[17:17:36] decomm and authdns are a separate case though
[17:17:38] sukhe: which alerts are not downtimed automatically?
[17:17:47] volans: BGP alerts
[17:17:53] perfect, so just a tiny amount of in-flight traffic dropped and that's it
[17:17:57] thanks!
[17:18:23] ah yeah, those are a mess to downtime without losing visibility :(
[17:19:19] yeah, basically an "issue" with all anycasted hosts, at least (dnsrec, wikidough, durum, centrallogging)
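(For reference: going back to the ATTACH DATABASE fragment mentioned around 15:22, here is a minimal sketch of a cross-database query with Python's sqlite3 module. The file names, table and column names, and the join condition are placeholders chosen only to illustrate the mechanism; the real ghost-detection query is the one in T327253#8597070.)

```python
import sqlite3

# Placeholder file names: two container DB copies pulled to the cumin host
# with swift-get-nodes + copy, as per the outline at 15:18.
conn = sqlite3.connect("container_replica_a.db")
conn.execute("ATTACH DATABASE 'container_replica_b.db' AS b")

# Illustrative join only: list object rows present in copy A but missing from
# copy B. Swift container DBs keep their listings in an 'object' table; the
# exact columns and the real query live in the Phabricator task.
ghost_candidates = conn.execute(
    """
    SELECT a_obj.name
    FROM object AS a_obj
    LEFT JOIN b.object AS b_obj ON b_obj.name = a_obj.name
    WHERE b_obj.name IS NULL
      AND a_obj.deleted = 0
    """
).fetchall()

for (name,) in ghost_candidates:
    # Candidates to verify with 'swift stat' (expecting 404) before 'swift delete'.
    print(name)

conn.close()
```

Each name produced by a query like this would then be double-checked with swift stat (expecting a 404) before running swift delete from a node with the relevant credentials, per the outline at 15:18.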