[06:58:01] o/ Happy Monday
[06:58:14] marostegui: let me know when you're around. I have some news :D
[08:57:57] kormat: I've a (probably horrible) patch on top of the sandbox pontoon mariadb branch; shall I push it to gerrit (i.e. origin) for you to review before faffing around pushing it to the sandbox puppet master?
[09:02:11] Emperor: let's do that, yeah
[09:04:02] kormat: pushed, also available for your ROTFL pleasure at https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/sandbox/kormat/pontoon-mariadb104-test
[09:05:53] (not deployed yet, obv, but the underlying systemd overrides were deployed on slave1)
[09:06:16] Emperor: go ahead and deploy
[09:06:37] now let's see if I can get that to work :)
[09:07:05] :)
[09:15:12] I've done the two things in https://wikitech.wikimedia.org/wiki/Puppet/Pontoon#Join_an_existing_stack and then pushed to pontoon-mariadb104-test
[09:15:38] which is ssh://puppetmaster.mariadb104-test.eqiad1.wikimedia.cloud/~/puppet.git
[09:16:02] ...but I don't know how to make puppetmaster actually use that, since presumably it isn't set up to want to look in ~matthew ?
[09:16:18] ~mvernon even
[09:16:26] [also, typo break, back in a few mins]
[09:17:30] Emperor: there's a git `post-receive` hook in ~mvernon/puppet.git that _should_ update /var/lib/git/operations/puppet
[09:18:16] but something looks very wrong
[09:18:28] the latter repo is currently on a commit from 1 month ago
[09:20:08] remote: Resolving deltas: 100% (64619/64619), completed with 2446 local objects
[09:20:08] remote: From /home/mvernon/puppet
[09:20:09] remote: * branch HEAD -> FETCH_HEAD
[09:20:09] remote: Previous HEAD position was 2cafaa989b mariadb104-test: stack-specific changes.
[09:20:12] remote: HEAD is now at 864fadd631 mariadb: Move db1107 from s1 to m2
[09:20:15] To ssh://puppetmaster.mariadb104-test.eqiad1.wikimedia.cloud/~/puppet.git
[09:20:18]
[09:20:21] is what (I think) that hook output
[09:20:43] what was the push command you used?
[09:21:06] git push pontoon-mariadb104-test sandbox/kormat/pontoon-mariadb104-test
[09:21:16] ok, there's the problem
[09:21:35] you're missing a trailing `:production`
[09:21:50] the changes need to end up in the production branch on the remote end
[09:22:50] I tried that
[09:22:56] remote: HEAD is now at 61602f17bc prometheus: couple mysqld exporter service to mariadb service
[09:22:56]
[09:23:06] ...does that look better?
[09:23:09] it does!
[09:24:02] so now ssh to puppetmaster and puppet-merge then wait for it to deploy over the next 30m or so?
[09:24:12] no puppet-merge for pontoon
[09:24:12] IM puppetmaster in the test env
[09:24:21] Oh, so should just deploy now?
[09:24:23] the hook bypasses that
[09:24:48] ssh to the cumin vm, and run `sudo cumin '*' 'run-puppet-agent'`
[09:27:17] Sigh, FAIL on some hosts. I'm going to actually take this typing break before thinking about that.
[09:28:01] hah, 👍
[09:31:58] Amir1: I'm on holidays today, back tomorrow
[09:32:09] marostegui: you're missing all the fun :D
[09:37:06] Emperor, what's your phab name? I want to cc you on a ticket but cannot find you by name or nick
[09:37:59] I think I found you, phab was making you appear and disappear on autocompletion
[09:40:03] jynus: MatthewVernon
[09:40:49] kormat: so I might well have done something stupid, but the error from puppet doesn't look like anything I touched
[09:41:10] Info: Applying configuration version '(61602f17bc) Matthew Vernon - prometheus: couple mysqld exporter service to mariadb service'
[09:41:10] Notice: The LDAP client stack for this host is: sssd/sudo
[09:41:10] Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: sssd/sudo'
[09:41:10] Error: /Stage[main]/Profile::Idp::Client::Httpd/Profile::Idp::Client::Httpd::Site[orchestrator.wikimedia.org]/Acme_chief::Cert[orchestrator]/File[/etc/acmecerts/orchestrator]: Failed to generate additional resources using 'eval_generate': Failed to open TCP connection to acmechief1001.eqiad.wmnet:8140 (Network is unreachable - connect(2) for "acmechief1001.eqiad.wmnet" port 8140)
[09:41:13] Error: /Stage[main]/Profile::Idp::Client::Httpd/Profile::Idp::Client::Httpd::Site[orchestrator.wikimedia.org]/Acme_chief::Cert[orchestrator]/File[/etc/acmecerts/orchestrator]: Could not evaluate: Could not retrieve file metadata for puppet://acmechief1001.eqiad.wmnet/acmedata/orchestrator: Failed to open TCP connection to acmechief1001.eqiad.wmnet:8140 (Network is unreachable - connect(2) for "acmechief1001.eqiad.wmne
[09:41:16] Notice: /Stage[main]/Httpd/Service[apache2]: Dependency File[/etc/acmecerts/orchestrator] has failures: true
[09:41:20] Warning: /Stage[main]/Httpd/Service[apache2]: Skipping because of failed dependencies
[09:41:23] 50% failure rate
[09:41:34] Emperor: (it's preferable to use phabricator's paste function for this sort of thing)
[09:41:45] Emperor: sorry, i forgot to tell you - that _always_ fails, and should be ignored
[09:42:15] I see
[09:42:42] it's attempting to request an ssl cert for the orchestrator instance in pontoon, but firewalls prevent it from reaching the granting server
[09:43:07] Ah.
[09:45:15] when I log into slave1, though, it says last Puppet run was 40 minutes ago
[09:45:55] oh. ugh. you need to use `cumin -x`, sorry
[09:45:56] slave2 says 23 minutes ago, and hasn't notably applied my changes
[09:47:02] ah, -x means "carry on even if something failed"?
[09:47:06] yeah
[09:47:12] cumin bailed because one node 'failed'
[09:47:15] WCPGW? :-D
[09:49:44] OK, now I get an actual error because I can't puppet :)
[09:49:50] \o/
[09:51:22] https://phabricator.wikimedia.org/P17060 has error message and the puppet I Did Wrong
[09:52:44] Hm, I think I perhaps need to set path explicitly?
[09:53:12] but that error reads to me like I spelled the notify wrong somehow
[09:54:46] i haven't used exec before, but it looks like you need to provide the `command` attribute
[09:56:03] https://puppet.com/docs/puppet/7/types/exec.html says '(Namevar: If omitted, this attribute's value defaults to the resource's title.)' apropos command
[09:56:23] (but it does also say to provide path, so I shall do that and see if that fixes it)
[10:01:18] Nope
[10:12:09] ...answer was quoting :-/
[10:12:15] https://phabricator.wikimedia.org/P17060
[10:37:32] Emperor: https://phabricator.wikimedia.org/P17061
[10:38:59] I bet you can't mask a unit that doesn't currently exist :-/
[10:40:06] Emperor: in the worst case, you symlink it to /dev/null in /etc/systemd/system
[10:40:25] but that would be a Lot cleaner if puppet wasn't pseudo-declarative
[10:42:32] sobanski: btw, did you get a chance to read https://phabricator.wikimedia.org/P17050?
[10:45:35] kormat: you could presumably also stick in an override that replaced the unit with an exit 0 ...
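A minimal sketch of the /dev/null symlink idea from the 10:40:06 message, assuming a POSIX shell. The unit name `mariadb.service` and the scratch `UNIT_DIR` are illustrative assumptions, not taken from the patch under discussion; on a real host the directory would be /etc/systemd/system and the command would need root. This is the same kind of link that `systemctl mask` creates.

```shell
#!/bin/sh
# Sketch only: mask a unit by hand with a symlink to /dev/null.
# UNIT_DIR defaults to a scratch directory so this is safe to run as-is;
# on a real host it would be /etc/systemd/system (assumption: run as root there).
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
UNIT="mariadb.service"   # hypothetical unit name, for illustration only

# -s: symbolic link, -f: replace an existing entry,
# -n: do not follow an existing symlink at the destination
ln -sfn /dev/null "$UNIT_DIR/$UNIT"

# Show where the link points.
readlink "$UNIT_DIR/$UNIT"

# On a real host, systemd would need to re-read its unit files afterwards:
#   systemctl daemon-reload
```

Run as written, this only touches a temporary directory, so it can be tried without affecting any service.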
[10:45:52] Emperor: you could, but that seems strictly worse
[10:46:03] the /dev/null symlink means that systemd will not consider the .service to exist
[10:46:11] which is what we'd want for multi-instance hosts
[10:46:22] it also simplifies tab-completion
[10:46:28] I'd agree with that
[10:46:58] woo, run-puppet-agent works now
[10:48:26] Emperor: so, i'm dubious about the 'exec' approach, but once you have things working, we can then get one of the puppet experts to chime in on whether there's a better/worse way to do it
[10:48:49] (e.g. i think it'll cause the exporter to get restarted on _every_ puppet run. and twice on puppet runs when the config changes)
[10:50:14] kormat: I thought puppet's notification system was meant to deal with that properly. BICBW. i.e. setting refreshonly means it'll only run if something notifies it, and that should only happen if one of the config files is changed or the package updates...
[10:52:01] kormat: I read it, yes
[10:52:24] sobanski: ok cool. i'm hoping for a Pulitzer
[10:53:10] Emperor: ahh. apologies. i should have looked at the Exec docs
[10:53:18] (puppet is unintuitive as hell)
[10:53:29] it is quite confusing
[10:53:46] but, e.g. slave2 the PME service has been running for 9 months now
[10:54:30] ok, that's reassuring :)
[10:54:41] (i just wish there was a systemd-native way to do try-restart)
[10:54:50] also, prooobably should reboot all those VMs at some point
[10:55:00] alright, bbiab
[10:55:29] Do we already have a phabricator task for the prometheus/mariadb things, or should I make one? ISTR a note on the linked-restart got added to something the other week
[10:58:20] oh, yes, https://phabricator.wikimedia.org/T252761, which is probably properly a separate thing
[11:00:29] (are we still using the DBA tag for our things rather than the data-persistence group?)
[11:02:05] data-persistence is like a supertag of DBA, SRE-backups and SRE-media-storage
[11:03:02] I think we only use that as a triage area for other people who don't know where it goes?
[11:03:40] that == data-persistence, but sobanski is the tag wizard :-D
[11:05:10] Emperor: what jynus said. For now add #DBA which captures all things database.
[11:43:06] 'k
[12:31:28] I've created https://phabricator.wikimedia.org/T289488
[12:32:51] kormat: I think I should now take my branch, tidy it into a single commit and push it for review (tagging T289488)? Would you like to look at it further in the test setup first?
[12:32:51] Emperor: re: aim 3, that's not something we ever do
[12:32:51] T289488: Systemd enhancements for mariadb and prometheus-mysql-exporter - https://phabricator.wikimedia.org/T289488
[12:32:56] (or, at least not in my 1.5 years of being here)
[12:33:31] kormat: YM you think we don't want to do that?
[12:33:48] well. maybe m.anuel can think of a usecase when he's back,
[12:34:15] i haven't figured out whether it's a bad idea to have the _option_ or not :)
[12:36:29] If we've rebooted a multi-instance server, it felt like being able to start all the mariadbs with one command rather than having to do each in turn would be a plus. But maybe that's too unusual a use case and normally you want to do each by hand?
[12:36:31] Emperor: oh, there is one use-case. when shutting down a machine for reboot, it would be useful to be able to do `systemctl stop mariadb.target`
[12:36:47] after reboot, i'd definitely be starting them one-by-one, by hand
[12:38:10] Emperor: i do wonder what happens if you do `start mariadb.target` and then stop one of the instances
[12:38:25] does systemd consider the target to be reached, or not
[12:38:34] or inconsistent
[12:39:23] and what happens then if you do start or stop on the target
[12:40:18] The usual approach is roughly a Wants= type dependency
[12:40:36] (e.g. in Ceph-world, you'd do systemctl restart ceph-osd.target to restart all the OSDs on a host)
[12:40:55] operations which apply to all instances seem clear
[12:41:04] the semantics for operations which do not, however, are very unclear to me
[12:42:20] to continue my ceph example, you can systemctl start/stop/restart ceph-osd@foo to stop/start/restart a single OSD without affecting the others
[12:46:41] so if you stop an instance, what does status say about the target?
[12:46:57] i guess still 'active'
[12:47:19] Yes, targets are useful for start/stop and so on, but I don't think you get useful systemctl status out of them
[12:48:02] ack
[12:49:44] Emperor: ok, that sounds useful enough to do, even if we won't use it very frequently
[12:50:55] kormat: anyhow, shall I make a CR out of my current work for review?
[12:51:07] Emperor: yes, please :)
[12:53:43] * Emperor goes to drive git
[13:02:51] kormat: could you suggest good people to ask to review https://gerrit.wikimedia.org/r/c/operations/puppet/+/714358 please? I mean, I'd guess you and/or marostegui for the DB side but maybe we want a puppet-master too?
[13:03:27] jbond would Love to review it
[13:03:39] (i can tell these things)
[13:05:38] lol, in a meeting now but will take a look later
[13:07:31] Emperor: CI hates your editor config
[13:14:51] le sigh
[13:16:45] sobanski: if I can find someone to merge https://gerrit.wikimedia.org/r/c/mediawiki/core/+/713268 (and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/713649) I will backport it and it will help drastically with ParserCache (at least I hope so). I pinged Daniel; if not, I'll try to ping people in WMDE
[13:18:10] \o/
[13:18:34] kormat: ^
[13:19:01] Amir1: hurry up already. <3
[13:20:02] kormat: when does the cleaner script start?
[13:20:35] Amir1: the pc purge script? 01:00 UTC nightly
[13:21:34] thanks. Let me go around and see who I can ping
[13:23:29] if all else fails, just ping m.anuel repeatedly
[13:24:19] he would love that
[13:24:33] argh, it would be really helpful if deprecated bits of puppet said they were deprecated somewhere
[13:25:10] For bonus points, what you're meant to use instead
[13:25:40] ...rather than getting to the point of jenkins saying "lol, no, you can't use this" before discovering this :(
[13:26:36] ...anyone know what I should be using instead of base::service_unit ?
[13:26:37] Emperor: https://bash.toolforge.org/quip/AVfTAUmefIH_7EDsriqu :D
[13:27:14] Amir1: lol/oww
[13:28:05] Emperor: systemd::service, probably
[13:28:24] generally https://bash.toolforge.org/search?p=0&q=puppet is fun :D
[13:29:14] Amir1: definitely more fun than _using_ puppet, that's for sure
[13:33:36] * Emperor opens https://bash.toolforge.org/quip/AVfTAUmefIH_7EDsriqu
[13:33:39] no, not that
[13:33:46] * Emperor opens https://gerrit.wikimedia.org/r/c/operations/puppet/+/714362
[13:35:05] I'm getting a bit grumpy about tripping over deprecated bits of puppet which aren't documented as being deprecated
[13:35:31] Emperor: oh, has anyone mentioned PCC to you?
[13:35:45] it can be useful
[13:35:56] https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler
[13:36:20] (i've no idea if it catches deprecation errors, probably not. but very useful for seeing the actual effect of a change without deploying it to prod)
[13:38:24] Hm, how safe is that?
[13:38:33] very
[13:38:37] it runs in a sandbox
[13:38:43] _i_ haven't broken it yet
[13:38:51] so you should be fine. ;)
[13:40:05] It's not immediately clear how/if that would help here, but I've made a note
[13:42:02] Emperor: the usual case is seeing how a change to a generated conf looks, or what classes have been added/removed/had params changed
[14:14:41] Emperor: if you install puppet-lint-wmf_styleguide-check and configure your editor to use puppet-lint for syntax checking it should highlight the same things that ci does
[14:15:44] if you use vim then 'rodjek/vim-puppet' should configure the necessary checks to run on write
[14:16:12] but you will probably also need something like https://github.com/b4ldr/profile/blob/master/.vimrc#L109-L110
[14:16:52] as we use a 4 space indentation here (the normal for puppet is like ruby i.e. 2)
[14:17:25] <-- not a vim person
[14:17:37] Emperor: we all have room for improvement. ;)
[14:17:46] <><
[14:18:43] wait, unicode means I can actually brandish a 🐟 at kormat :)
[14:18:57] the future is amazing
[14:19:06] there's some info for emacs https://wikitech.wikimedia.org/wiki/Puppet_coding#Emacs_guidelines (and i'm sure a google will find some equivalent vs plugin)
[14:20:28] Hmph, puppet-el absent since jessie
[14:21:45] * jbond only knows of moritz using emacs and he's on vacation
[14:22:03] likewise https://github.com/puppetlabs/puppet-syntax-emacs is abandonware
[14:23:09] Emperor: does emacs support language servers?
[14:23:25] * Emperor finds https://packages.debian.org/search?keywords=elpa-puppet-mode
[14:54:51] kormat: yeah, there's lsp-mode, but I've never used it
[15:19:51] * Emperor fixes https://wikitech.wikimedia.org/wiki/Puppet_coding#Emacs_guidelines