[05:38:01] fixed the package issue on es1022
[06:20:08] I am going to do a master switchover on es1, should be a noop
[11:06:41] marostegui: i have made a bug-fix release for wmfmariadbpy (0.8.1). i'll deploy it after lunch
[11:10:07] good
[11:10:18] you can test on db1124 if you want
[11:16:27] sorry, I had a doctor appointment early in the morning and I will have another later in the day :-(
[11:17:34] jynus: :-/ Hope you're OK
[11:39:07] Good morning. Sorry to trouble you, but we've just discovered a small but unintended side-effect of this change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/716306
[11:39:33] I wondered if you might be able to advise before I spend much more time on it. Perhaps it's something you've seen before.
[11:41:22] Briefly, the `prometheus-mysqld-exporter@matomo.service` service on matomo1002 no longer starts. So stats are missing from here since a reboot last week. https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&refresh=1m&from=now-10d&to=now&var-job=All&var-server=matomo1002&var-port=13306
[11:44:31] btullis: I don't remember the details, but I think puppet/packages changed a bit regarding prometheus startup to be more automatic
[11:45:13] maybe it just requires a package upgrade
[11:46:12] Yes, agreed. This just might be an edge case. We have a mariadb instance on this host, which uses the `mariadb.service` and not an instance. But we instantiate a mysqld_prometheus_exporter with a name, overriding the socket. https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/piwik/database.pp#L47
[11:47:19] I think that since that change, we can no longer start the `prometheus-mysqld-exporter@matomo.service` because it is dependent on a similarly named mariadb instance, which we don't have.
[11:47:21] I don't necessarily fully understand what you mean, I just heard something was changed (for the better) on package/puppet
[11:47:30] https://www.irccloud.com/pastebin/4XME0368/
[11:47:52] so maybe it is missing an upgrade on either for the analytics dbs
[11:48:41] what does it say when you manually start it?
[11:48:59] ah, I also remember some past bug/weirdness/regression on the prometheus exporter
[11:49:12] OK, thanks jynus. I will check for available package upgrades. I just wondered if this rang any bells for anyone.
[11:49:26] I am talking on kind of second account
[11:49:41] some dba will be able to pinpoint the exact work done
[11:49:48] https://www.irccloud.com/pastebin/HaplOqXb/
[11:49:56] I think also mathew was involved on the changes
[11:50:12] so everyone except me will be able to give you more details, sorry :-)
[11:50:39] No worries :-) And thanks for your help so far.
[11:50:56] btullis: maybe a stupid suggestion, but unmask and start to see what happens?
[11:51:07] oh
[11:51:09] wait
[11:51:12] that should be masked
[11:51:16] the service you want
[11:51:23] ignore me
[11:51:29] that should be the one active
[11:51:32] @matomo
[11:51:43] at least if it works like the core dbs
[11:52:08] (that's what I would do: systemctl unmask servicename)
[11:52:17] start, and then see if you get any error
[11:52:31] why that would be masked, I don't know
[11:52:46] btullis: this is a multi-instance mysql host, right?
[11:53:19] Yep, will do. I'm pretty sure that it is this particular part of the change that has caused the issue for us:
[11:53:19] https://github.com/wikimedia/puppet/commit/3aefbf2a8ce060ffb0020c1f526be6bdfe99f8c7#diff-40c123ddf30255d60322d990ada1cbf2da3f7deda710a2e3a77eb18c5347c395R5-R8
[11:53:20] ah, maybe the puppet is using the single-instance version?
[11:53:31] that's your guess, right, Emperor?
[11:53:53] The theory is that for every mariadb@foo.service you also have a prometheus-mysqld-exporter@foo.service that has an After= and Requisite= dependency on the relevant mariadb@foo service
[11:54:09] I'm looking to see if there is some easy way that you already know of for us to change *our* setup on this host, to match something that should be working better with your new way of doing it. Not looking for you to accommodate us. :-)
[11:54:17] On single-instance hosts, it's just mariadb.service and prometheus-mysqld-exporter.service
[11:54:39] ^hear what Emperor says, he will be the right person to ask
[11:55:32] btullis: looking on matomo1002, you look to just have mariadb.service, so a single-instance setup, is that correct?
[11:55:34] I think that previously (i.e. before my time) this host might have been multi-instance and then it was reverted to a single instance.
[11:55:43] btullis: ah, that might do it, then :)
[11:55:48] btullis: ah! that could create garbage
[11:56:15] the usual answer for that is "we don't support that, you should reimage"
[11:56:23] but of course it just needs some manual fixing
[11:56:32] btullis: in any case, you want to end up with prometheus-mysqld-exporter.service
[11:57:01] unmask the ones that should run, mask the others, and remove the old config to prevent accidents
[11:57:45] I think there could be leftovers on /lib/systemd/system, /etc/defaults and /etc/mysqld/mysqld.conf
[11:57:52] OK. Maybe it's just as easy as `mv /var/lib/prometheus/.my.matomo.cnf /var/lib/prometheus/.my.conf` and unmask, then start the prometheus-mysqld-exporter service.
[11:58:16] as those old files will not be automatically deleted by puppet
[11:58:42] yes, I wasn't suggesting you to reimage :-)
[12:00:02] OK, will check it out and make manual changes as required. Thanks for the help. I just wondered whether this was a case you had seen before and had a simple answer to resolve.
[12:00:39] btullis: not that exactly, but I do indeed move different sections within the multiinstance role
[12:00:55] so it is mostly cleaning the paths I sent you
[12:01:10] Great. Many thanks.
[12:01:31] (it was not apparently related to the prometheus patch you said, although it could have triggered it)
[12:03:52] I suspect the work to make the exporter more tightly coupled to the relevant mariadb service has the side-effect of making it a bit more picky than it once was
[12:04:07] yeah
[12:50:12] Thanks both. Hopefully this will do the trick: https://gerrit.wikimedia.org/r/c/operations/puppet/+/755971 (plus subsequent manual cleanup)
[12:54:30] I see, there was possibly some mixture about the multiinstance and the standalone way of defining services
[12:54:48] the puppet compiler will likely catch any outstanding issue
[14:20:10] marostegui: on second thoughts, deploying that wmfmariadbpy release on a friday afternoon seems like Bad Ideas Bears.
[14:29:45] WCPGW
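
Note: the unit-naming rule described at 11:53:53 and 11:54:17 (one exporter unit per mariadb unit, templated on multi-instance hosts and plain on single-instance hosts) can be sketched roughly as below. This is only an illustration of the rule as stated in the log, not the actual puppet or packaging logic; the function names `expected_exporter_units` and `orphaned_exporters` are hypothetical, and the unit lists are passed in by hand rather than read from systemd.

```python
from typing import List

MARIADB_PREFIX = "mariadb@"
SERVICE_SUFFIX = ".service"


def expected_exporter_units(mariadb_units: List[str]) -> List[str]:
    """Return the exporter unit expected next to each mariadb unit.

    Hypothetical helper: encodes only the naming rule stated in the log.
    """
    expected = []
    for unit in mariadb_units:
        if unit == "mariadb.service":
            # Single-instance host: plain exporter unit.
            expected.append("prometheus-mysqld-exporter.service")
        elif unit.startswith(MARIADB_PREFIX) and unit.endswith(SERVICE_SUFFIX):
            # Multi-instance host: exporter instance shares the instance name.
            instance = unit[len(MARIADB_PREFIX):-len(SERVICE_SUFFIX)]
            expected.append(f"prometheus-mysqld-exporter@{instance}.service")
    return expected


def orphaned_exporters(mariadb_units: List[str], exporter_units: List[str]) -> List[str]:
    """Return exporter units with no matching mariadb unit to depend on."""
    expected = set(expected_exporter_units(mariadb_units))
    return [unit for unit in exporter_units if unit not in expected]


if __name__ == "__main__":
    # The situation described for matomo1002: only mariadb.service exists,
    # but a templated exporter instance named "matomo" is configured.
    print(orphaned_exporters(
        ["mariadb.service"],
        ["prometheus-mysqld-exporter@matomo.service"],
    ))
```

Under that rule, an exporter instance such as `prometheus-mysqld-exporter@matomo.service` on a host that only runs `mariadb.service` has nothing to satisfy its dependency, which is consistent with the advice at 11:56:32 to end up with the plain `prometheus-mysqld-exporter.service`.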