[08:08:47] Morning [08:09:46] greetings [08:13:50] morning! [08:31:40] I got a patch adding the k8s 1.30 packages for review https://gerrit.wikimedia.org/r/c/operations/puppet/+/1180502 [08:37:24] dcaro: looks good, but the CI is failing [08:37:33] 👀 [08:38:13] does not like the bug line xd, with the extra `\` (gitlab needs it to show a newline there...) [08:39:35] gitlab is silly and assumes things are markdown :( [08:39:47] agree [08:40:07] I think there was a long debate about it somewhere in Phab :D [08:54:28] I've just checked cloudcephosd1013 where a drive was replaced in T401319 [08:54:28] T401319: hw troubleshooting: disk sdj failure for cloudcephosd1013.eqiad.wmnet - https://phabricator.wikimedia.org/T401319 [08:54:39] dcops said they installed a new drive, but it's not showing in lsblk [08:55:21] a.ndrew said he was going to decom that host soon anyway [08:55:45] I've seen that in the past too, I guess you tried rebooting? [08:55:51] I haven't [08:56:38] do we have a cookbook for drain+reboot? [08:56:50] and is it worth it? 
[08:57:25] maybe we can just ignore it, but I'm slightly annoyed by having a host with a non-standard config [08:57:58] you can just reboot it, it should come up fast enough (as in, no need to drain) [08:58:38] I mean, we don't really need it, and it does not really affect much else, it's just one drive, so we don't benefit that much [08:58:55] I'll reboot it just to see if it appears in lsblk [08:59:20] wmcs.ceph.reboot_node [08:59:23] should be enough [08:59:45] great [09:00:02] iirc there was some race condition with removing silences before the alert clears sometimes [09:00:11] (and associated task somewhere xd) [09:06:32] rebooted, the drive is still not visible :/ [09:07:01] :/, well, next step would be tinkering with BIOS or re-socketing the drive [09:13:21] I'll try pinging DCops in the task [09:19:10] an easy one: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1180510 [09:20:44] also my bad, didn't test well enough [09:25:15] LGTM, triggered a pcc run just in case [09:25:23] (and as it has Hosts header :) ) [09:25:58] pcc looks good too 👍 [09:26:03] thank you ! [09:33:36] hmm... I wonder if I should also add the k8s packages to the trixie-wikimedia repo [09:35:16] hmm probably, since at least I'd like to upgrade the bastions relatively soon [09:36:55] okok, I'll try to check if they are available there, there's a way shorter list of third party packages there too [09:37:13] hmm, that should include helm3 also [09:37:26] iirc the kubernetes upstream repos are not distro version specific [09:38:09] uhhh is anyone looking at the (global?) puppet failures? 
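An aside on the replaced drive above that never showed up in lsblk: before resorting to BIOS tinkering or re-socketing, one common trick is asking the kernel to rescan all SCSI host buses via sysfs, which can surface a hot-swapped disk without a reboot. A minimal sketch (needs root against the real `/sys`; the `sysfs_root` parameter is only there so the function can be exercised against a fake tree, and none of this is an existing WMCS cookbook):

```python
import glob
import os

def rescan_scsi_hosts(sysfs_root="/sys/class/scsi_host"):
    """Write the wildcard triple "- - -" (channel, target, lun) to every
    SCSI host controller's scan file, asking the kernel to re-enumerate
    attached devices. Returns the list of scan files written to."""
    written = []
    for path in sorted(glob.glob(os.path.join(sysfs_root, "host*", "scan"))):
        with open(path, "w") as f:
            f.write("- - -\n")  # wildcard: rescan all channels/targets/luns
        written.append(path)
    return written
```

If the disk still does not appear after a rescan and a reboot (as happened here), the problem is likely physical (seating, backplane, controller), which is why pinging DCops was the right next step.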
[09:38:19] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'prometheus::instances_defaults' (file: /srv/puppet_code/environments/production/modules/nrpe/manifests/monitor_service.pp, line: 129) on node [09:38:19] cloudinfra-cloudvps-puppetserver-1.cloudinfra.eqiad1.wikimedia.cloud [09:38:38] oh wow, looking [09:38:56] sounds like a missing cloud.yaml entry [09:39:29] yeah, probably from https://gerrit.wikimedia.org/r/c/operations/puppet/+/1174729 [09:40:12] * taavi asks in -sre [11:09:49] Working on racking and cabling the seed server in eqiad that will use 25G cables. We need to open up a block of 4 ports. With the least amount of moves, we would only need to relocate one cable: cloudcontrol1009-dev (WMF11302), which is currently in port 12. If we move it to port 11, that would free up the block. I also noticed that the device status is listed as Planned in NetBox. [11:39:26] jclark-ctr: they are not currently in use [11:40:07] so feel free to move it anytime, we don't have a plan yet for those I think, though andrewbogot.t might have a better idea [11:40:30] last update I know of is T380805 [11:40:31] T380805: Repurpose 5 config B servers - https://phabricator.wikimedia.org/T380805 [12:05:11] ok after more testing and understanding of socket activation I got https://gerrit.wikimedia.org/r/c/operations/puppet/+/1180534 [12:06:01] affects only codfw and relatively straightforward, if some kind soul feels like +1'ing I'd be grateful, if not that's fine too [12:06:31] I will wait till later in US time for Andrew to confirm pending on him. @topranks will you be available tomorrow on it if the cable is able to be moved? [12:07:05] godog: the `chain` vs `chained` that was used before is intentional? [12:08:17] dcaro: yes, "chain" is from the root pki to the cert (i.e. 
only one intermediate) and chained is cert + chain [12:08:29] okok [12:08:56] it all makes sense thinking about it, though chained vs chain confuses me ngl [12:13:54] +1d [12:14:11] thank you ! [12:26:34] neat, looks like migration is working as expected on codfw1dev with the pki certs, cc andrewbogott [12:28:03] kinda-related question, is there a utility / wrapper to go from fqdn to nova server id? e.g. to stick the id in a variable and not have to think about it when scripting [12:36:28] not that I know of, but maybe, there's a lot of scripts, and some might have to do the same thing xd [13:02:06] heheh true, ok thank you [13:24:16] I merged a GitLab MR and it entered an infinite spinner... https://gitlab.wikimedia.org/repos/cloud/wikireplicas-utils/-/merge_requests/3 [13:24:21] did anybody see this before? [13:25:38] not exactly, though saw some weird behavior too, today it took me a while (restarted the browser and everything) to see some of the changes in an MR [13:26:23] that one feels stuck though [13:27:11] I like the fact the message changes on every refresh :D [13:28:50] I tried asking in -releng [13:30:16] oh my, it's more [13:30:22] I've been going crazy testing an MR [13:30:28] but it turns out it's not building the image xd https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/203/pipelines?diff_id=126864 [13:32:06] hmm... yep, so it might be related to the diff not updating in that MR I was testing too [13:32:22] it does not show the latest pushes in the mr page either, though the commits are there [13:33:59] godog: great that the new certs are working! [13:34:33] I don't think we have a utility like you're asking for, the closest thing is mwopenstacklib which is used all over the place in python code [13:35:05] ok! 
thank you [13:35:30] andrewbogott: yeah great news indeed, I'll send patches your way to roll the change out to eqiad1, I'll merge early next week [13:35:44] btw did you already regenerate all the machine IDs? I'm curious to see if that makes tools nfs settle down. [13:36:23] I did yes, as part of T401880 [13:36:24] T401880: Ensure unique machine-id across Cloud VPS VMs - https://phabricator.wikimedia.org/T401880 [13:36:49] great! We will see if that cuts down on alerts and lockups [13:37:47] yeah anecdotally I'd say it has helped, I haven't seen a hard lockup on alerts.w.o [13:55:04] my MR merge did time out eventually, I clicked merge again, and it's stuck again :/ [14:00:24] I worked around my test with https://gitlab.wikimedia.org/repos/cloud/toolforge/jobs-api/-/merge_requests/204 , but can't merge it xd [14:00:31] let me try to ping someone [14:00:39] is that using wmf-managed runners as part of the merge process? might be worth poking devex folks on slack if so... [14:02:09] poked there [14:03:40] dcaro: want to come to triage meeting? [16:14:23] * dhinus off [16:47:42] > with the extra `\` (gitlab needs it to show a newline there...) -- dcaro: two trailing spaces work in markdown
too. I patched the commit message validator to allow that a while ago. T351253 [16:47:43] T351253: Add support for commit message trailers that GitLab markdown will render on individual lines - https://phabricator.wikimedia.org/T351253 [16:48:30] "a while ago" -- a year ago apparently? where does time go ¯\_(ツ)_/¯ [17:47:41] TIL! [17:47:43] * dcaro off [17:47:44] cya! [19:39:27] Potentially stupid question. I'm building a new tool that would do somewhat expensive stuff on the db, not too expensive but if a scraper finds it, they will waste a lot of cpu cycles. I wonder if there is an out-of-the-box thing I could deploy on toolforge tools? I thought of forcing OAuth (what global search does) but besides the privacy stuff, I think it'd be too much [20:47:16] I don't think we have a ready-made solution for scrape avoidance, although we are sure going to need one in the robot-filled future. [20:48:08] Could you just throw together some kind of static API key requirement in your API, and then issue keys to individual users? Or is this something that you want any human anywhere to use without advance arrangement? [20:50:20] Amir1: OAuth is the reasonable auth solution [20:52:43] OAuth is actually the only auth solution for Toolforge that has the blessing of WMF Legal. [20:55:02] * andrewbogott thinks Amir1 should listen to Bryan and not to Andrew [20:55:25] I sorta want to avoid auth altogether cuz I don't want to collect anything on who is visiting the pages, I know I can just discard it after successful handshake but I want to avoid that. There are some fun reasons behind this. If I have no other option, then I think OAuth would be the way to go [20:55:41] I wanted something like anubis but without the anime part :D [20:56:19] Sorta wanted to check before going the oauth route [20:57:22] like PoW (proof of work), check if js is enabled, etc. [20:58:30] you can write your own mess for that, but diminishing returns and sunk costs, etc. [21:00:18] yeah... 
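The "static API key" idea floated above could be as small as a constant-time token check in front of the expensive endpoints. A minimal sketch, assuming keys are issued to individual users out of band; the header name, the key values, and the function are all made up for illustration, not an existing Toolforge facility:

```python
import hmac

# Hypothetical keys handed out to individual users out of band.
ISSUED_KEYS = {"k3y-for-alice", "k3y-for-bob"}

def is_authorized(request_headers):
    """Return True if the (hypothetical) X-Tool-Key header matches one
    of the issued keys. hmac.compare_digest does a constant-time
    comparison, so a probing client can't learn key prefixes via timing."""
    supplied = request_headers.get("X-Tool-Key", "")
    return any(hmac.compare_digest(supplied, key) for key in ISSUED_KEYS)
```

This keeps no per-visitor state at all (addressing the "don't want to collect anything on who is visiting" concern), at the cost of requiring advance arrangement to get a key.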
[21:00:24] storing or even logging the OAuth creds is in the tools' control and certainly is not necessary for the protocol to work [21:07:47] Amir1: you could also use some sort of "captcha" as part of your tool's workflow where you ask the user to give you some data like the current picture of the day on enwiki or something. [21:08:40] most things beyond "follow link or submit form" will thwart a random crawler [21:08:43] yeah, that'd be easiest and most effective (ROI-wise) [21:11:15] I actually made https://gmt.toolforge.org/ a long time ago as part of a "dumb" captcha on something else (I forget what) where the user was asked for the current UTC/GMT day of year. I put a link to that tool next to the form field. :)
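The gmt.toolforge.org trick described above boils down to comparing the visitor's answer against the current UTC day of year. A minimal sketch of that check (function name and the one-day grace window are assumptions, not how the original tool worked):

```python
from datetime import datetime, timezone

def day_of_year_captcha_ok(answer, now=None):
    """'Dumb' captcha: the form asks for the current UTC day of year
    (1-366); accept it if it matches, with a one-day grace window for
    visitors whose clock or form submission straddles midnight."""
    now = now or datetime.now(timezone.utc)
    try:
        guess = int(answer)
    except (TypeError, ValueError):
        return False
    doy = int(now.strftime("%j"))  # %j = zero-padded day of year
    return guess in (doy, doy - 1)
```

As noted in the discussion, anything beyond "follow link or submit form" thwarts a generic crawler, so even a trivially answerable question like this has a good effort-to-benefit ratio.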