[13:04:38] <_joe_> hi, say I need to change some variable name in like 10 different graphs in the same grafana dashboard - what's the most convenient way to do it?
[13:05:48] _joe_: you _could_ download it as json and make the change locally 😬
[13:06:07] (but really we should be generating dashboards using a templating language; editing them manually doesn't scale)
[13:06:10] <_joe_> kormat: that was my current line of thought
[13:06:20] <_joe_> kormat: I need to modify the template :P
[13:06:47] <_joe_> basically it used the same variable for "service" and "namespace" in k8s, and that is changing
[13:09:05] yeah I think bar a dashboard-generating thing like the grafana grizzly work that's ongoing for SLO, the next best thing for batch changes is download -> change -> upload
[13:14:32] <_joe_> ack thanks
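For reference, a minimal sketch of the download -> change -> upload flow described above, assuming GNU sed and Grafana's dashboard HTTP API; the host, dashboard UID, and variable names here are all illustrative:

```
# Export the dashboard JSON (token auth and the UID are hypothetical).
curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
  https://grafana.example.org/api/dashboards/uid/abc123 > dashboard.json

# Rename the template variable wherever panels reference it, e.g.
# $service -> $namespace. Note: the ${service} form and the variable's
# own "name" field may need the same treatment.
sed -i 's/\$service\b/$namespace/g' dashboard.json

# Then re-import the edited JSON through the Grafana UI (or POST it back).
```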
[13:28:25] moritzm: maybe you know, I merged a puppet patch that broke some tests, but the jenkins run did not see it (the tests were for a module that included that one), is there a way to tell jenkins to run all the rspec tests? I would prefer running the whole suite (it's not that long anyhow)
[13:32:37] no idea, sorry :-)
[13:34:01] no prob, I might ping jbond when he's back (if I remember xd)
[18:19:01] kormat: The percentage is expected to get slower towards the end for pc1/2 since they've been restarted a few times, so the earlier tables are cleaner than the later ones
[18:19:13] but yeah, I didn't realize it looked at the full list again for the percentage
[18:19:15] I'll fix that
[18:34:47] dcaro: this is a bit of a rabbit hole i would advise avoiding but happy to go through the issues when im back (monday)
[19:10:12] Hi there, I did a dns change for an internal system and I'd like to check that it is resolving correctly: https://gerrit.wikimedia.org/r/c/operations/dns/+/702731
[19:10:12] However when I run `dig analytics-hive.wikimedia.org` I don't see an-coord1001 as I'd expect - there's no answer? But everything is working, can somebody help me troubleshoot dig? Generally I'm a dns noob...
[19:20:43] That name doesn't exist razzi, that is as much as I can really confirm.
[19:21:08] hmm
[19:21:15] i just got an answer
[19:21:16] NXDOMAIN returned from ns0/ns1/ns2
[19:21:26] from stat1004
[19:21:26] dig analytics-hive.eqiad.wmnet
[19:21:32] ;; ANSWER SECTION:
[19:21:32] analytics-hive.eqiad.wmnet. 300 IN CNAME an-coord1001.eqiad.wmnet.
[19:21:32] an-coord1001.eqiad.wmnet. 2204 IN A 10.64.21.104
[19:22:25] https://www.irccloud.com/pastebin/l742IjS1/
[19:22:48] Ah sorry Andrew, razzi I think you got the top-level domain wrong.
[19:23:03] OHHH yup sorry i didn't even notice
[19:23:06] yup razzi it is eqiad.wmnet
[19:23:07] it's not .wikimedia.org, but .eqiad.wmnet
[19:23:09] not wikimedia.org
[19:23:10] right
[19:24:31] Ohhh great
[19:25:28] Thanks for the help topranks !!
[19:26:23] I’m curious how the eqiad.wmnet thing works, if anybody can point me to documentation on that I’d appreciate. It’s like our own tld! So cool!!
[19:27:04] Just from observing all the private addresses seem to go to that, public to wikimedia.org. But I need to search wikitech and dig into our authdns more.
[19:27:30] wmnet is internal only dns. you have to be inside the network for it to resolve
[19:27:32] And presumably based on the particular subnet too, codfw.wmnet is also a thing.
[19:28:53] https://wikitech.wikimedia.org/wiki/DNS might be a place to start pulling threads
[19:29:32] bd808: thanks :)
[19:30:39] when wondering about ancient SRE secrets, wikitech is a reasonable place to start looking. You may find 5 {{Outdated}} pages before you find the one you need, but there is a ton of documentation there going back many, many years
[19:37:40] yeah in general private IPs have hostnames in wmnet, and public ones in wikimedia.org
[19:38:03] relatedly: we don't have any kind of NAT routing for the private IPs either, they're only reachable from the inside, and can't reach the outside world directly, either.
[19:38:32] (there is an outbound http proxy for special case private->outside access, but that's at L7)
[19:39:05] what can make this a little confusing is that "ssh foo.eqiad.wmnet" works from the outside with our standard ssh config for the bastions.
[19:39:32] as long as you hit the Wikimedia nameservers, *.wmnet will resolve, e.g. `dig deploy1002.eqiad.wmnet @ns1.wikimedia.org`
[19:39:33] it works because the wmnet hostname doesn't get looked up in DNS on your machine, it gets looked up and connected to on the bastion host on the way through.
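To make the bastion point above concrete, a sketch assuming a standard ProxyJump-style setup; the bastion hostname here is illustrative:

```
# Your machine only resolves and connects to the bastion; the .wmnet
# destination name is resolved (and connected to) by the bastion's sshd.
ssh -J bast1003.wikimedia.org deploy1002.eqiad.wmnet
```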
[19:39:43] whoever is merging 'Btullis: Grant icinga permissions to btullis' please merge my patch as well :)
[19:40:21] possibly btullis ?
[19:40:41] yep, there are some web proxies on the install hosts. Also our "private" v6 space is carved from aggregates that are announced to the internet. But we largely drop it all ingress on the edge.
[19:40:54] andrewbogott: That was me. :-) I'll have a look now.
[19:42:30] legoktm: interesting yeah. So they are auth but wmnet doesn't exist in the root so resolvers won't ever get there.
[19:43:34] * legoktm nods
[19:44:25] but typically when I want to find an IP for a host or vice-versa I use https://codesearch.wmcloud.org/operations/?q=deploy1002&i=nope&files=&excludeFiles=&repos= (TTL of ~90min)
[19:45:44] nice tip.
[19:46:18] andrewbogott: Sorry, not sure where to find your patch. I searched on gerrit, but there's nothing obvious to me. https://gerrit.wikimedia.org/r/q/project:operations/puppet+status:open+owner:abogott%2540wikimedia.org
[19:46:56] Hello btullis! I don't believe we've met so I don't know what your role is here -- nice to meet you!
[19:47:13] When puppet patches are submitted in gerrit, there is a second 'merge' step that needs to be done by hand by someone with root access.
[19:48:15] patches need to be merged in sequence, so as long as one patch is pending merge then nothing else can be easily merged.
[19:48:31] So right now https://gerrit.wikimedia.org/r/c/operations/puppet/+/702739 is gumming up the works. Do you have a login on puppetmaster1001 so you can merge it?
[19:48:32] Ah thanks. I'm a new SRE in the Data Engineering team, but it's only day 4 for me here, so I've only just got shell access to the puppetmasters.
[19:49:52] Welcome!
[19:49:54] btullis: so if you log on to puppetmaster1001 and run 'puppet merge' you will be prompted with the pending patches
[19:50:05] Yes, got it. Doing so now.
[19:50:09] thanks!
[19:50:46] Done.
[19:50:52] thanks :)
[22:25:50] Anyone around who is familiar with `systemd::timer::job`? Having a bit of trouble understanding how to set `$interval`
[22:26:09] I want a timer that fires immediately when created and then every 30 minutes after that, indefinitely
[22:26:58] `OnActiveSec` set to `30min` should do that, but I think I also need to set `OnBootSec` to 1 second as well...but the syntax of what `systemd::timer::job` wants is a little confusing
[22:27:16] Here's the whole job as I have it right now (the `interval` is the part we're concerned with):
[22:27:19] https://www.irccloud.com/pastebin/I9800idX/
[22:29:03] Looking at timers for other services they usually look something like `interval => {'start' => 'OnCalendar', 'interval' => 'Mon *-*-* 3:15:0'},`
[22:29:55] I don't want a cron-style calendar type interval though, I just want a monotonic timer. But from the syntax of `interval` it's not clear to me how to set both `OnActiveSec` and `OnBootSec`; all the examples i'm finding in our repo are just setting 1 thing
[22:31:39] Ah I might have just rubber ducked myself, looking at `modules/systemd/manifests/timer/job.pp:112` it looks like it wants either a `Systemd::Timer::Schedule` OR an `Array[Systemd::Timer::Schedule, 1]` so I think I can just give it an array with both
[22:34:36] ryankemper: interesting, we may want to enable this for purge_parsercache as well. those jobs currently start once a day but typically take (almost) a day to run, so if the server is restarted we'd generally want to start those right away as they were very likely aborted by the reboot
[22:34:46] kormat: ^
[22:35:32] Krinkle: I believe `interval => [{'start' => 'OnActiveSec', 'interval' => '30min'}, {'start' => 'OnBootSec', 'interval' => '1sec'}],` is the syntax I'm looking for, I'll let you guys know if it behaves as expected
[22:35:55] ack, thx
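For reference, a hedged way to check the resulting unit on a host once puppet has applied it; the timer unit name below is hypothetical, and whether both directives are emitted depends on how `systemd::timer::job` renders the schedule array:

```
# Inspect the generated timer; with the array form above, the [Timer]
# section should contain both monotonic triggers, roughly:
#   OnActiveSec=30min
#   OnBootSec=1sec
sudo systemctl cat readahead-workaround.timer

# Confirm it is scheduled, and when it last fired / next fires.
sudo systemctl list-timers readahead-workaround.timer
```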
[23:40:20] Anyone around to take a look over https://gerrit.wikimedia.org/r/c/operations/puppet/+/702754? Runs a binary every 30 mins on search team's cirrus `elastic*` hosts
[23:41:41] This is a temporary mitigation to address some IO issues we've been running into on our elasticsearch clusters related to the `readahead`; the binary disables the readahead for open files. Because we need to run the binary once for each elasticsearch process, passing in the pid, there's a simple wrapper script; that wrapper script is what the timer is actually running
[23:50:00] * legoktm looks
[23:50:42] ryankemper: I don't think committing binary files to puppet is a good idea
[23:52:45] https://phabricator.wikimedia.org/P5883 would be a trivial deb to package I think
[23:53:16] yeah, just needs a quick makefile
[23:53:26] ryankemper: how urgent is it to get this running?
[23:53:33] so turn it into a .deb and replace the file resource with an apt resource basically?
[23:54:16] yeah, just upload it to apt.wm.o and have puppet install that package
[23:54:49] legoktm: it's pretty urgent; current readahead behavior leads to unacceptably high IO on the cluster, so we want to have a mitigation in place given we're all on break next week
[23:55:00] if you need help with packaging I can help with that
[23:55:21] ("pretty urgent" means "I can afford to spend an hour wrangling it into a .deb package" to be clear)
[23:56:32] legoktm: ack, I'm not super familiar with .debs but it sounds like this should be a pretty simple one. I'll take a first swing at it and ping you to take a look?
[23:56:38] sure
[23:56:43] much appreciated
[23:56:47] have you used dh_make before? that should give you most of the scaffolding
[23:56:56] nope
[23:58:00] based off https://manpages.debian.org/jessie/dh-make/dh_make.8.en.html looks like once I make the makefile it'll help me turn that into the actual package?
[23:58:31] yeah
[23:58:35] create a --native package
[23:59:56] alternatively, https://salsa.debian.org/mediawiki-team/poolcounter/-/tree/master/ has the debian files for a pretty straightforward C/make package, you could just copy those and adjust+delete extra stuff
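A rough sketch of the dh_make flow suggested above, assuming a simple C source plus Makefile; the package name and version are illustrative:

```
# Lay out the source with a Makefile providing the usual build/install targets.
mkdir noreadahead-1.0 && cd noreadahead-1.0
# ... add the .c file and the Makefile here ...

# Generate the debian/ scaffolding for a native, single-binary package.
dh_make --native --single --packagename noreadahead_1.0 --yes

# Fill in debian/control and debian/changelog, then build the .deb.
dpkg-buildpackage -us -uc
```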