[00:32:45] hello! did something involving SSH keys change recently on Cloud VPS? suddenly today `wsexport-prod01.wikisource.eqiad1.wikimedia.cloud` is rejecting my public key, and someone else told me they had the same problem with a different instance recently. I tried both that instance and others and they're all working fine, it's only wsexport-prod01 that isn't letting me in
[00:42:12] musikanimal: do you have privileges to control that instance in Horizon?
[00:42:21] yes
[00:42:49] it may sound odd but .. try just rebooting it and see if that goes away
[00:44:45] okay, can't hurt to try!
[00:47:32] that worked! thanks :)
[00:48:16] musikanimal: :) I am glad. Don't ask me why..it was just a vague memory and intuition.
[00:49:23] the auth.log failures before the reboot were "Invalid user musikanimal" which sounds like LDAP connectivity problems?
[00:49:25] yeah I was thinking of trying it, guessing maybe something went wrong with the Puppet run or something
[00:51:19] weird! I sure thought I was a valid user hehe
[00:52:36] it can be indistinguishable sometimes if it's just that the instance does not exist anymore or the key really gets rejected
[00:53:18] had a similar one recently where the user was wondering why their key gets rejected and it really was just that the hostname had changed from "11" to "12" or something
[00:53:42] anyways, that was slightly different from this one. cya
[00:55:09] everything user-facing was working fine (the tool loaded in my browser). This did follow a brief period of downtime though, which might be related.
[00:57:56] I don't understand Puppet very well but I know it's the thingy that syncs the public keys (right?), so there's a chance it just happened to do the sync during that period of downtime and failed
[03:01:55] idk the latest but that used to be ldap I thought. anyway if it was puppet that fixed it you should be able to see in puppet logs what it changed.
(re @wmtelegram_bot: I don't understand Puppet very well but I know it's the thingy that sync the public keys (right?), so there's a ch...)
[04:54:23] @jeremy_b is right, ssh keys are fetched over LDAP
[04:54:53] so it probably was what b.d808 said about LDAP connectivity problems
[08:04:19] !log tools.wikibugs remove 'taxonomy' cron job on the stretch grid, seems to be intended to update [[mw:Phabricator/Projects]] but broken since early 2021 without anyone complaining
[08:04:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[08:49:25] !log admin deploying cloudmetrics grafana to grafana 8, T282863
[08:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[08:49:28] T282863: Upgrade Grafana to 8.x - https://phabricator.wikimedia.org/T282863
[08:52:25] taavi: morning! I wanted to ask about this, is there any docs/task with the current and future cloudmetrics setup?
[08:56:55] morning dcaro! we have https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Monitoring, if that's what you are asking for? cloudmetrics* are running a few separate services (grafana, graphite, prometheus)
[08:57:23] T302493 is what I've been using to hook up the prometheus instances there to alertmanager
[08:57:24] T302493: hook up prometheus @ cloudmetrics* to an alertmanager - https://phabricator.wikimedia.org/T302493
[08:57:53] sorry no, I meant metricsinfra xd, though there's some there too
[08:58:58] for that I've written https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra/Documentation
[08:59:04] 👍
[09:08:30] hi folks, is there an equivalent dashboard for https://grafana.wikimedia.org/dashboard/db/labs-monitoring?orgId=1 ?
I'm looking at the interface saturation alerts in the context of T302958
[09:08:31] T302958: Update grafana links to new format - https://phabricator.wikimedia.org/T302958
[09:13:56] godog: hard to tell without knowing what that dashboard contained
[09:15:10] fair enough taavi
[09:27:40] godog: I think it became https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId=1, but I'm not sure
[09:28:06] dcaro: thank you, yeah given the context of the alerts I'm not sure either
[09:28:16] see https://gerrit.wikimedia.org/r/c/operations/puppet/+/767720
[09:29:38] which file specifically?
[09:31:33] I see, the monitoring/interfaces.pp, yep, that's labstore, so yes, labstore1004*
[09:33:03] dcaro: speaking of monitoring, can you re-review https://gerrit.wikimedia.org/r/c/operations/puppet/+/765567 please?
[09:33:18] godog: quick question, do you know if there's a way to link a dashboard that does not depend on its name?
[09:34:45] dcaro: I'm going to need more context, but in general yeah you can use the uid only not the slug
[09:34:57] 000000059 in the case above
[09:38:19] oh, I see, is that something newish, or it worked before too?
[09:39:11] I think that before there was no slug in the url, I see some urls in wikitech like "https://grafana-labs.wikimedia.org/dashboard/db/arturo-toolforge-dashboard?orgId=1", that don't work anymore
[09:39:46] that's great then, problem solved :)
[09:41:17] I see that that's exactly why you are changing the urls xd
[09:59:27] yeah that's right, the slug in url used to work until grafana 7
[10:00:47] dcaro: could you comment/vote on https://gerrit.wikimedia.org/r/c/operations/puppet/+/767720 ?
[11:07:04] done :)
[13:28:24] godog: is it a known issue that the prometheus UI isn't loading its javascript when accessing it via an ssh tunnel?
[13:33:12] and there was just a netsplit, so if you replied during that I didn't see your response :/
[13:34:58] I did not see a response yet :/
[14:12:22] just noticed, the prometheus=cloud alerts, have team=sre, should we change that to team=wmcs?
[14:12:27] (like all the icinga alerts)
[18:17:33] !log tools.wikibugs Updated channels.yaml to: 673df01c67a10190f6c2d36e753937684120a182 channels: route Cloud Services Proposals to cloud-feed
[18:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:32:04] !log tools.heritage Deploy fcedb0f, 10b63ae, a00f000, 6d2fc8a
[20:32:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL
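
Note on the 09:33-09:35 exchange about linking a dashboard without depending on its name: a rough sketch of what "use the uid only, not the slug" could look like when rewriting links. The helper name `uid_only_url` is made up for illustration and is not part of any Grafana or Wikimedia tooling; it only assumes the `/d/<uid>/<slug>` path shape seen in the cloud-vps-project-board URL above, and it deliberately refuses the old `/dashboard/db/<slug>` style because those URLs carry no uid to keep.

```python
from urllib.parse import urlsplit, urlunsplit


def uid_only_url(dashboard_url: str) -> str:
    """Drop the slug from a /d/<uid>/<slug> dashboard URL, keeping the query string.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(dashboard_url)
    segments = parts.path.strip("/").split("/")
    # Expect a path of the form d/<uid>[/<slug>]; old slug-only URLs have no uid.
    if len(segments) < 2 or segments[0] != "d":
        raise ValueError(f"not a /d/<uid> dashboard URL: {dashboard_url}")
    return urlunsplit(parts._replace(path=f"/d/{segments[1]}"))


print(uid_only_url(
    "https://grafana-labs.wikimedia.org/d/000000059/cloud-vps-project-board?orgId=1"
))
```

A link built this way should keep working if the dashboard is renamed, since only the uid is left in the path.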