[08:00:29] looks like we had some ceph blips during the weekend? T349425
[08:00:30] T349425: CephSlowOps Ceph cluster in has slow ops, which might be blocking some writes - https://phabricator.wikimedia.org/T349425
[08:41:04] morning
[08:41:14] looks like it yes
[09:04:37] blancadesal: it seems we forgot to enable puppet on toolsbeta harbor? https://alerts.wikimedia.org/?q=team%3Dwmcs&q=project%3Dtoolsbeta
[09:05:17] oh, wait, are we waiting for the upgrade on tools?
[09:05:41] dcaro: hmm, I thought I reenabled it?
[09:06:22] should we wait for the upgrade on tools?
[09:07:09] is the patch still cherry-picked on the puppetmaster?
[09:07:26] no
[09:07:46] when do you plan on upgrading tools?
[09:08:26] I'd have to send out an email beforehand so wednesday?
[09:10:13] works for me, then I think it's ok to keep puppet disabled for a couple of days, you can silence the alert adding a note saying that harbor upgrade is going to happen wednesday
[09:10:27] you'll need to silence from this UI though https://prometheus-alerts.wmcloud.org/?q=team%3Dwmcs
[09:10:41] we don't have yet prod -> metricsinfra silence integration
[09:14:37] done, I'll send the email now. Is 12 o'clock on wednesday a good time for you in case I need a hand?
[09:53:29] heads up that Filippo and I are changing how our prometheus alerts are routed to alertmanager with https://gerrit.wikimedia.org/r/c/operations/puppet/+/967863/. in theory that should be a no-op, but lmk if you see any issues
[12:58:19] dcaro: all ok to make these network changes in eqiad?
[12:59:54] topranks: yes, give me a minute to send an email with a reminder
[13:01:00] topranks: done, ready, let me know if you want me to do anything else than random pings around :)
[13:02:22] * taavi is around just in case
[13:02:46] one of the network tests failed already!
[13:03:18] it's using arturo-test-tool, maybe because of that?
[13:03:23] heh, I didn't start anything :)
[13:03:36] dcaro: if you could +1 this patch?
[13:03:36] https://gerrit.wikimedia.org/r/c/operations/puppet/+/965708
[13:04:04] dcaro: yeah, I disabled that tool when removing a.rturo's access
[13:04:21] ack, I'll change it to something else, will send a patch
[13:06:57] if it needs a tool, a dedicated tool account is the best solution I suspect
[13:07:15] yep
[13:09:39] I've merged cloudgw change, running puppet on them now
[13:10:46] 👍
[13:11:12] that seems ok - I'll go ahead with openstack changes?
[13:11:20] taavi: did you just create https://toolsadmin.wikimedia.org/tools/id/network-tests ?
[13:11:55] dcaro: no, the history button reveals it was created by arturo in early 2021
[13:12:02] that's from february xd, arturo created it, I think it's the replacement for it yes
[13:13:38] Am I ok to proceed with the disruptive changes?
[13:14:38] I'd say yes, but up to dcaro I think
[13:15:01] topranks: yes :)
[13:15:07] ok thanks!
[13:15:10] doing it now
[13:16:05] ok done
[13:16:13] my VPS instance is back pinging
[13:16:36] oh wow, that was fast :)
[13:16:38] was an outage of about 45 seconds I think
[13:16:47] running tests
[13:17:42] everything looks good \o/
[13:17:50] woot :)
[13:18:00] yeah seems fine from all I can see
[13:20:42] awesome :), I'll wait until :30 and I'll send the email, but it was really sooth
[13:21:13] *smooth
[13:21:33] yeah - nice to be able to rehearse in codfw :)
[14:54:21] I tried to send an email to cloud-announce earlier today but it's stuck in "awaits moderator approval" Who could approve?
[14:55:43] not me apparently
[14:57:00] me neither
[14:57:13] let me see
[14:57:59] I don't see any held messages
[14:58:09] can i become a list admin?
[14:58:28] I do see a (approved) message from blancadesal in that list though
[14:58:31] I guess xd
[14:58:41] "Planned Toolforge build service maintenance outage", received 1 hour ago
[14:59:07] looking
[14:59:23] I got that too
[14:59:55] so it's been sent then?
[15:00:26] I got that email also yes
[15:00:38] your message seems to have made it through https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/DZ4KQOA2FODYR2BHO55FTHFGANZ4X4AR/
[15:00:41] taavi: I think I added you to the wrong list (moderator, vs owner)
[15:01:24] great, thanks for investigating!
[15:01:25] I do see a list management interface now
[15:09:57] I'm creating a new CloudVPS project with a (patched) cookbook and I'm getting "Conflict occurred attempting to store project - it is not permitted to have two projects with either the same name or same id in the same domain: name is catalyst, project id catalyst."
[15:10:45] dhinus: https://openstack-browser.toolforge.org/project/catalyst shows that project as already existing
[15:11:02] huh, that explains it!
[15:12:22] and it mentions the same Phab I was looking at, so I guess it was just created but the task was not updated
[15:12:27] T349378
[15:12:32] T349378: Request creation of catalyst VPS project - https://phabricator.wikimedia.org/T349378
[15:12:45] dhinus: I think blancadesal might have done it earlier
[15:12:51] yep, but no corresponding SAL entry
[15:13:37] dcaro: I tried, but got a 504 time out, and at the same moment kola said he was on it so I dropped it
[15:13:45] *komla
[15:14:00] komla asked me to do it because he's missing the required permissions :D
[15:14:10] maybe the time out message was misleading and it actually worked
[15:14:18] yeah
[15:14:30] I think it is yes, last time even it failed (the cookbook), the project was created
[15:14:46] it might be missing something though (probably the task mention + sal log)
[15:14:50] dcaro: btw I think I fixed the cookbook, I updated your existing patch https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/966134
[15:14:52] *dhnus, so is it created now?
[15:14:56] there's still the users to add
[15:14:59] yes the project is there
[15:15:45] the first thing I was told after I got cloud vps admin was that you should never rely on horizon for creating projects, always use the openstack cli (which today means the cookbook) instead :-P
[15:16:11] *taavi, noted!
[15:16:39] I used wmcs-openstack, that's what timed out for me
[15:16:43] dhinus: awesome :), did not have time yet to get back to it, hmm, I think though that the quota set actions at the end of the cookbook might get applied to the wrong project no?
[15:16:54] blancadesal: oh, that's more interesting
[15:17:03] dcaro: good point, please leave a comment in gerrit and I will follow up
[15:18:45] dhinus: will you add the project members or should I?
[15:20:31] blancadesal: please do it if you have the list at hand
[15:21:08] shall we add komla to wmcs-roots so that he can run the create_project cookbook? (once it's fixed)
[15:22:22] the openstack api latency alert has been firing for a while, I'll run the restart openstack cookbook
[15:24:23] dhinus: ok
[17:03:40] * dcaro off
[17:04:27] taavi: yep, I have pending to spend a bit more time debugging why it happens, it's a recurring issue, the latency increases with time until it crosses the threshold