[10:16:59] godog: I upgraded routinator in codfw, but one of the eqiad graphs looks weird since then: https://grafana-rw.wikimedia.org/d/UwUa77GZk/rpki?orgId=1&from=now-1h&to=now (see the top right one)
[10:19:18] XioNoX: indeed, looks like both recover after a little bit?
[10:19:35] I'm assuming while data is syncing and/or not already synced?
[10:20:43] for the codfw instance yeah, but it shouldn't impact the eqiad one
[10:24:33] XioNoX: agreed, the prometheus targets seem correct on the prometheus side
[10:24:43] thanks!
[10:24:59] I'll let it sit for a bit, routinator looks healthy
[10:25:32] sure np
[10:27:05] XioNoX: looks like graph 'weirdness' isn't a new problem, if you zoom out say 24h
[10:27:49] indeed!
[10:34:06] Setting "null value" to connected in the visualisation makes it look normal fwiw.
[10:57:36] topranks: good point, yeah probably the graph could be bars or sth similar
[11:05:50] good point indeed, null values as 0 works too, but that will probably just hide the real issue
[11:06:16] I'll try to catch it from /metrics when it happens to see what values are exposed
[11:12:16] <_joe_> jbond: say I need to download the .pem of our puppet CA, do you know of a place where we expose it?
[11:13:00] _joe_: it's in the puppet repo, not sure if we serve it anywhere else
[11:13:02] <_joe_> this is for building a docker image, in CI. I need to add the puppet ca to the cert bundle at build time, and I think it's preferable not to commit it to the git repo
[11:13:35] i can add it to pki.discovery.wmnet/bundles
[11:13:47] <_joe_> yeah I was thinking of that
[11:13:47] I'm curious why the puppet one, which service is this?
[11:13:57] <_joe_> any service right now?
[11:14:06] <_joe_> they're all still using the puppet CA
[11:14:12] not *all* :)
[11:14:17] but almost
[11:14:51] _joe_: does it need to be externally available as well? if so config-master may be a better option
[11:15:02] the ones with cergen are also signed by puppet CA?
[11:15:11] <_joe_> volans: yes
[11:15:16] :/
[11:15:30] <_joe_> volans: why do you think I've been asking for a PKI for 4 years?
[11:15:35] eheheh
[11:15:39] getting there..
[11:15:59] <_joe_> jbond: so, I'm a bit conflicted in that respect, but let's say for now internal-only is ok
[11:16:20] jobo: ack, I'll add it to the pki bundles for now and we can change later if needed
[11:16:25] <_joe_> to clarify, it's not aesthetically pleasing to add this bundle to an image that could be used externally as well
[11:16:44] <_joe_> but we're not exposing any private data
[11:16:58] <_joe_> so 🤷
[11:17:04] ahh yes, doesn't make sense for this to be in an image which is used externally, but thought you may want to support building images from your laptop
[11:17:46] <_joe_> jbond: no my point is, I want to add the pki bundle only to the final image that we only use in production, and is under /restricted/ on the registry
[11:17:53] <_joe_> but that gets built by CI
[11:18:01] ahh ok got it
[11:18:06] <_joe_> I *think* it's built in production, but I'm not 100% sure
[11:18:18] <_joe_> anyways, if you add it to the bundle, that's great
[11:18:25] yes will do
[11:19:08] <_joe_> I mean, ideally we'd just create a debian package with all those bundles
[11:19:22] <_joe_> and the debian package could be installed everywhere and we could check versions too
[11:19:37] <_joe_> 🤔
[11:19:41] _joe_: seems reasonable, i can create a task for that
[11:19:54] <_joe_> jbond: yeah I can work on the package, btw
[11:20:27] <_joe_> anyways, it's lunch time for me :)
[11:21:15] ack, should be done when you're back :)
[11:33:34] _joe_: curl http://pki.discovery.wmnet/bundles/Puppet_Internal_CA.pem.pem should work now
[11:34:34] <_joe_> jbond: thanks, that should be enough for now
[11:34:50] cool
[14:07:47] <_joe_> jbond: so, where can I find the urls of all those bundles?
I'm preparing a debian package
[14:11:04] _joe_: i think you should just need http://pki.discovery.wmnet/bundles/Wikimedia_Internal_Root_CA.pem and http://pki.discovery.wmnet/bundles/Puppet_Internal_CA.pem.pem. the intermediates should be sent by the server, so they shouldn't need to be installed in ca-certificates. i think vgutierrez mentioned installing the intermediates had caused issues before
[14:11:25] <_joe_> ack
[14:11:32] yup
[14:12:03] https://phabricator.wikimedia.org/T271063
[14:12:11] that's a fine example
[14:12:12] thx
[15:18:10] kormat: pc1009 will be the next one to get hammered, yes?
[15:18:13] is it ready for purging?
[15:18:35] Krinkle: yep re: next one. let me put in a downtime, then you can purge
[15:18:43] ack :)
[15:19:46] Krinkle: alright, do your worst
[15:23:55] kormat: running, tee'd as before.
[15:24:04] 👍
[18:39:31] bstorm: I read your wiki diff and it looks great to me. thank you! :)
[18:39:48] Cool :)
[19:10:58] hello, got redirected here from #wikimedia-tech, will try my luck on this channel as well: probably a long shot, but I'm looking for an engineer from wikimedia who was at PromCon 2019 in Munich (at the Google HQ), we briefly chatted about Prometheus and Elasticsearch, but I completely forgot the name... so if you've been at that conference - pm me please, I'll introduce myself
[19:12:36] godog: ^ ?
[19:15:40] nixfloyd: hi! Nowadays SRE has been split into multiple subteams. One of them is called "Observability" and since they are using Prometheus and the ELK stack, I would think it is likely https://www.mediawiki.org/wiki/Wikimedia_Site_Reliability_Engineering#Observability
[19:16:25] so I can continue the channel forwarding game and recommend #wikimedia-observability
[19:17:20] I pinged godog since he seems most likely to have been in Munich
[19:17:58] might be off for the evening at this point
[19:19:26] nixfloyd: try the talk page of https://www.mediawiki.org/wiki/User:Filippo_Giunchedi maybe.
I think it's likely him
[19:23:27] yeah he was there.
[19:23:45] but he's on eu time is all
[19:29:50] so I'm running the wdqs `data-transfer` cookbook to transfer a file from `wdqs1009`->`wdqs1010`. Normally, the cookbook (with puppet disabled) will do a `systemctl start wdqs-updater.service` at the end of the cookbook run
[19:29:52] In this case I *don't* want the updater to start because I need this file to remain unmutated, and there's no convenient flag in our cookbook to prevent that
[19:30:05] My current approach is to mutate the systemd unit file and change the `ExecStart` to a no-op, which - given puppet will be disabled so this unit file shouldn't get overwritten - should prevent the actual updater process from starting. Does that sound right?
[19:30:16] I can't think of any issues with that approach but just looking for a quick sanity check :)
[19:31:31] I don't know about all the rest, but don't forget to do "systemctl daemon-reload" after manually editing a systemd unit file
[19:31:33] or it won't pick up the change
[19:31:33] wanna add the flag to the cookbook? if it's a use case that might happen, it shouldn't take more than a few minutes
[19:32:06] volans: I considered it, but this is a very one-off case so it's hard to envision it cropping up again
[19:32:42] the tl;dr is our test host `wdqs1009` is a snowflake running the latest streaming updater, so its journal file is different from the others, and I'm transferring to our other test host `wdqs1010` while I re-image `wdqs1009`
[19:32:49] ok, then I might have another suggestion, very much against all the rules (mine included ;) )
[19:33:04] let's hear it :P
[19:33:22] you're the only one running the wdqs data-transfer cookbooks AFAICT
[19:33:55] ah, so just mutate the cookbook itself on `cumin1001`?
[19:34:08] something like that, but I was then thinking... how long will it run?
[19:34:27] ~1 hour or so
[19:34:32] because if it's days it would be a problem, because it will stop puppet updating the repo because of local changes
[19:34:35] ahhh ok then
[19:34:38] go ahead
[19:34:46] if you want I can double check the diff
[19:35:14] ryankemper: in /srv/deployment/spicerack/cookbooks/sre/wdqs/
[19:36:23] just make sure to do a git checkout data-transfer.py afterwards
[19:37:23] volans: and just to be clear, puppet won't overwrite it on each run? or is it that it will, but as long as I run the cookbook right away then it'll be in RAM and so won't matter after that point
[19:37:26] changing line 29 should be enough from a first look, you just have to stop it yourself because it will not stop it
[19:37:36] both :)
[19:37:49] puppet will try to do a git pull that will fail with local modifications
[19:38:01] and also if you're already running it, it will not be affected
[19:40:20] volans: check the diff now, I edited a different line so that we won't impact the stopping of services, only the start
[19:40:41] but it won't start any of them, is that ok for you?
[19:41:20] diff looks good for that
[19:41:24] Yeah that's fine, neither blazegraph nor updater need to be running
[19:41:27] ack
[19:41:29] +1
[19:41:38] I technically want them running on 1009 but there's no time criticality so I figure I'll just do that part manually
[19:42:28] ack
[19:42:38] wait a sec
[19:42:51] will the wait_for_updater() wait/fail?
[19:43:18] I guess you run it without the lvs option, so there will be no 'pool' at the end, correct?
[19:43:28] ryankemper: ^^
[19:43:44] volans: no pool at end, correct
[19:43:54] I would expect wait_for_updater to hang forever
[19:44:01] I didn't look at the logic for it, that's just going off the name
[19:45:17] @retry(tries=1000, delay=timedelta(minutes=10), backoff_mode='constant')
[19:45:45] so it will try for quite a bit, but being the last command you can also just ctrl+c once stuck there
[19:45:49] Close enough to forever :P
[19:46:10] Yeah exactly, that's the plan
[19:47:05] I already kicked off the cookbook and restored the state of the git repo btw, so everything should be as normal as far as the repo's concerned
[19:47:40] great, thanks
[19:53:42] mutante, apergos: thanks!
[19:57:02] you're quite welcome
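For context on the `@retry(tries=1000, delay=timedelta(minutes=10), backoff_mode='constant')` line quoted above: a minimal sketch of what constant-backoff retry semantics look like, and why those parameters amount to "close enough to forever". This is an illustrative stand-in, not the actual spicerack implementation; the `sleep` parameter and all names here are assumptions for demonstration only.

```python
# Illustrative sketch of a constant-backoff retry decorator, loosely modelled
# on the spicerack @retry call quoted in the log above. NOT the real
# spicerack code; the `sleep` hook exists only to make the sketch testable.
import functools
import time
from datetime import timedelta


def retry(tries=3, delay=timedelta(seconds=3), backoff_mode='constant', sleep=time.sleep):
    """Retry the wrapped callable up to `tries` times, sleeping `delay` in between."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: propagate the last failure
                    # 'constant' backoff: wait the same delay before every retry
                    sleep(delay.total_seconds())
        return wrapper
    return decorator


# With tries=1000 and a constant 10-minute delay, the worst case is roughly
# 999 sleeps of 10 minutes each before giving up -- about a week of waiting,
# hence ctrl+c once wait_for_updater() is visibly stuck is the pragmatic exit.
worst_case = timedelta(minutes=10) * 999
print(worst_case.days)  # → 6 (full days, plus change)
```

The decorator raises only after the final attempt fails, which matches the observed behaviour that the cookbook's last step would sit there retrying rather than erroring out promptly.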