[06:19:30] marostegui: the master switch issues with x2 circular, would that affect pc as well?
[06:19:59] i will reply later I need to get breakfast
[07:08:54] sukhe: please ping me when there's a public incident report
[10:32:51] <_joe_> RhinosF1: I'm not sure there will be one; and if there will be, you can check https://wikitech.wikimedia.org/wiki/Incident_status
[10:33:37] <_joe_> I mean actually "not sure" as it can be a repetition of a previous incident and/or something that we don't want to make public, necessarily, for security reasons
[10:41:15] _joe_: I understand
[12:52:48] https://www.theregister.com/2022/07/25/ancient_linux_install_upgraded/ <-- chiark upgrade made El Reg...
[13:16:45] Awesome piece, although I'm curious about the hardware. x86_64 didn't exist in 1993,
[13:16:51] there must have been a HW upgrade?
[13:17:50] ah, details here https://diziet.dreamwidth.org/11840.html
[13:19:53] yeah, chiark is on it's Nth lot of hardware
[13:19:55] its
[13:20:38] That's dedication to your install
[13:21:25] I think my home desktop started out somewhere around the Potato era
[14:36:19] volans: independant spicerack question...any ideas why py37-prospector always fail with 'pylint: no-self-use / Method could be a function (col 4)' when i run it locally, but CI is fine?
[14:37:13] ebernhardson: you might have an older prospector version due to a tox/pip bug on refreshing deps.
The quickest test is to rm -rf .tox/ and retry
[14:37:31] <_joe_> ebernhardson: because prospector is a tool designed to drive you mad
[14:37:43] <_joe_> don't listen to the justifications by volans
[14:37:45] <_joe_> :P
[14:38:09] lol, both sound plausible :)
[14:38:51] <_joe_> ebernhardson: there's the difference between something that's technically true and something that's actually true, for you :P
[14:39:06] <_joe_> the latest bug is just a manifestation of the evil nature of the tool
[14:41:23] https://github.com/tox-dev/tox/issues/149 FYI, should be fixed in tox 4
[14:41:25] i suppose i've mostly been ignoring it, but the tox tests for spicerack have never seemed to work 100%. They also do this rediculous thing where `tox -e py37` completes, gives a green message suggesting everything is ok. But it runs nothing :P
[14:41:42] ?
[14:42:24] volans: sec, i'll re-run and paste the output. i deleted .tox so it's taking its time now :)
[14:42:41] ahh if you don't specify a specific env to test but one that doesn't make tox fail?
[14:43:25] if you only have one python installed in the system you can just run tox, without args to run the whole suite. If you have multiple it will do it for all and it's longer/painful
[14:44:33] volans: well, what i wanted was for the `py37` env to run `py37-*`, but i suspect thats not what happens. What i would then expect is an error, but instead is says 'yea thats great, ok': https://phabricator.wikimedia.org/P32060
[14:44:41] yeah
[14:45:05] volans: py37-prospector still fails with a new .tox :( https://phabricator.wikimedia.org/P32059
[14:45:05] I can see if can be workarounded, but I'm not sure it's possible without hardcoding all the envs
[14:45:09] and so make it not scale
[14:45:40] not the biggest deal, but CI argues with me when i do spicerack because i don't always see prospector errors admists the other non-errors
[14:45:59] s/admists/amidst/
[14:46:16] you have pylint==2.13.9, CI uses pylint==2.14.5...
mmmmh
[14:46:46] with the same version of prospector... that's interesting
[14:47:11] maybe i should use a different image, thats running from the tox-pyspark:0.7.0-s2 docker image (i just run most tox stuff from it)
[14:47:37] tox should install from pip deps, unless told differently
[14:47:47] from PyPI I means
[14:47:58] s/means/meant/
[14:51:52] maybe i'll try and update this docker image to something else, it's still a debian 9 image
[14:53:03] <_joe_> ebernhardson: I usually download the tox-docker image we run in production and use that locally
[14:53:16] <_joe_> volans: maybe we should add a makefile that allows to run tox tests there
[14:54:26] _joe_: not that trivial, there are drawbacks, https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/665134
[14:54:56] see all the comments there
[14:55:14] silly question> is there an easy way to answer "which physical drive is /dev/sdr" and "which device node corresponds to enclosure device ID X"?
[14:55:27] on a megaraid system
[15:00:05] ebernhardson: for the py37-* issue, the easiest way I found so far is to use the TOX_SKIP_ENV env variable to tell tox to skip some envs... ugly but does the trick :)
[15:00:06] <_joe_> volans: you mean "docker on mac doesn't work"?
[15:00:08] <_joe_> yteah
[15:00:08] TOX_SKIP_ENV='py3(8|9|10).*' tox
[15:00:34] _joe_: not only, the fact that recreates all the time the venv from scratch IIRC
[15:01:09] ebernhardson: but you should need it only if you have multiple python versions installed
[15:01:36] <_joe_> volans: intersting, not my experience
[15:01:58] <_joe_> but it's also true that I do "docker start -a spicerack/tests" on my computer
[15:02:04] <_joe_> for a stopped container
[15:02:06] i'll try it, i'm indeed already using the CI tox images, i suppose i happen to use the tox-pyspark CI image though rather than the top level tox image
[15:02:30] ack
[15:02:31] plausibly tox-pyspark hasn't been updated to match the tox image in awhile
[15:03:05] also CI doesn't have 3.9 that is what we run in prod...
[15:03:19] so I run them locally, to test 3.9 and also 3.10...
[15:04:46] so I wouldn't like to have everyone testing only 3.7/3.8 and nobody 3.9 :D
[15:04:52] :P
[15:07:18] yesterday I added two users to the "wmf" LDAP group, to grant them access to alerts.wikimedia.org -- today I received an email saying " present in privileged LDAP group (wmf),but not present in data.yaml". what's the best practice here?
[15:08:23] users are raymond-ndibe and sstefanova, both from the WMCS team
[15:09:44] dhinus: yes, that should be done as part of clinic duty https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty via a task request
[15:10:02] they both have tasks open to create their users in data.yaml
[15:10:14] and could have added the request to be added to the wmf LDAP group there
[15:10:21] I would have done them at the same time, preventing the alert
[15:11:19] thanks, sorry about that!
[15:11:31] (as I'm on clinic duty this week, to clarify)
[15:11:45] I'm waiting and answer on task from raymond-ndibe
[15:11:59] I also found a wiki page saying "When adding a user to the wmf LDAP group, please also add them to the Phabricator group wmf-nda" -- should I mention it in the tasks?
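On the TOX_SKIP_ENV suggestion at 15:00:08: the value is a regular expression over tox env names. A minimal Python sketch of which envs the quoted pattern would leave to run, assuming tox applies it as a match from the start of the env name (an assumption, not checked against tox's source). The env names below are hypothetical, not the actual spicerack env list.

```python
import re

# Pattern quoted in the log at 15:00:08.
SKIP_PATTERN = r"py3(8|9|10).*"

# Hypothetical env names for illustration; not spicerack's real envlist.
ENVS = ["py37-flake8", "py37-prospector", "py38-unit", "py39-unit", "py310-unit"]


def remaining_envs(env_names, skip_pattern):
    """Return the env names that do NOT match the skip pattern.

    Assumption: the pattern is matched from the start of the name
    (re.match), which is how this sketch models TOX_SKIP_ENV.
    """
    skip = re.compile(skip_pattern)
    return [name for name in env_names if not skip.match(name)]


kept = remaining_envs(ENVS, SKIP_PATTERN)
print(kept)
```

Under that assumption only the py37-* envs survive the filter, which is the effect ebernhardson wanted from `tox -e py37`.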
[15:12:41] this is the authoritative docs AFAIK on the wmf LDAP group: https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#WMF_group
[15:12:53] Emperor: smart can give you the serial number
[15:13:35] Emperor: Not sure about telling you the physical drive bay
[15:14:44] Emperor: is that for ms-be2067 ?
[15:17:36] volans: actually, no, but ms-be1066 has a sad disk that megaraid doesn't think is sad, which makes working out what drive it physically is that needs replacing a pain
[15:18:14] be2067 has the megaraid output, and I can tie that to /dev/sdc1 since kern.log is full of unhappyness about that particular drive :)
[15:19:37] i.e. on ms-be1066 I know I want /dev/sdr replaced, but not how that corresponds to a PD I can ask dcops to swap
[15:24:01] Doesn't the dell utility allow you to blink a particular drive bay ?
[15:25:43] on supermicro kit you can use ledctl, but that's not installed on WMF kit and I dunno if it works on these systems anyway
[15:26:02] claime: and I'd still need to know _which_ drive bay corresponds to /dev/sdr
[15:26:48] I don't think even /dev/disk/by-path helps
[15:27:00] I have no access to check, but does smartctl report at least serial number ?
[15:27:13] Depending on raid cards, it can also report caddy num
[15:27:35] with smartctl you have to tell it which megaraid device you want to know about, so it still doesn't help
[15:28:06] i.e. smartctl -i /dev/sdr just emits an error
[15:28:55] Oh sdr is the megaraid device, not the /dev/sdx corresponding to the physical device
[15:28:57] ok
[15:28:59] mb
[15:29:22] I think that lshw should give you a pointer to the virtual disk in megacli
[15:29:33] but don't recall the exact details
[15:30:15] volans: do you think it might be worth adding a line "Requested LDAP group membership" to https://phabricator.wikimedia.org/maniphest/task/edit/form/8/ ?
[15:30:19] volans: oh, that might be useful
[15:30:30] MegaCli -PdLocate ?
[15:31:23] Oh you need the physical device for that ngh
[15:31:24] dhinus: for that I'd suggest to check it with jbond ;)
[15:32:27] MegaCli -PDList -aAll or MegaCli -LdPdInfo -aAll doesn't give you what you need ?
[15:33:33] dhinus: no, that form is for a different access request process
[15:33:57] taavi: yep, just found https://phabricator.wikimedia.org/project/view/1564/ which looks more appropriate
[15:35:05] claime: unless I'm reading it wrong, no, neither of those tells you what device node corresponds to which drive
[15:35:28] Gotta love megaraid
[15:35:31] but lshw -C disk tells me physical id and bus info for a drive
[15:35:37] Oh so
[15:36:12] Nevermind, that still doesn't give you enclosure and slot
[15:36:30] Awww always lovely to see megacli syntax
[15:39:06] but I'm not sure whether Virtual Drive number matches "physical id" or the SCSI LUN
[15:40:55] Emperor: https://serverfault.com/questions/381177/megacli-get-the-dev-sd-device-name-for-a-logical-drive
[15:41:09] You can get the serial number from MegaCli
[15:41:18] And then match that with serial number from lshw
[15:41:52] (maybe probably)
[15:43:36] There's an answer down with storcli, which I don't know if you have it on WMF kit
[15:46:37] I think lshw (which has both bus info and logical name) and megacli -ldinfo Target Id are enough to tie things together
[15:47:25] dhinus: just to confirm that https://phabricator.wikimedia.org/project/view/1564/ is the right wayt to go for ldap requests
[15:48:12] Emperor: then you should be able to get enclosure and number, and blink the led with -PdLocate
[15:48:24] thanks jbond ! I've asked both users to file a ticket following that link. Sorry for having added them the wrong way!
[15:49:11] reported serial numbers don't overlap, mind
[15:49:29] dhinus: thanks and no problem :)
[15:49:39] Emperor: yeah, some cards won't propagate the SN properly
[15:49:59] * Emperor still thinks hardware RAID just adds pain :)
[15:50:07] * claime tends to agree
[15:50:22] jbond: I'll actually add a task for myself as well, because I was added to "wmf" by andrewbogott without going through the Phab task ;)
[15:50:50] :) thanks
[15:54:21] mutante: could you please update .users for pwstore? I've added my PGP key to keys/
[15:57:28] dhinus: I will try. (and I'm saying it that way because usually there is some problem, but yea :)
[15:58:02] mutante: thanks, it's not urgent at all :)
[15:58:13] ok!:)
[19:20:12] jynus: cdanis: I've added the following to my Excellence playbook: `Consult the List of Incidents spreadsheet and honour any decision to keep an incident entirely undisclosed, and when creating placeholders for "partially undisclosed" incidents use only information from public tasks or wikimediastatus.net.` - sound good? (it's after the part where I say to basically create Wikitech pages for any SRE/Incident docs that don't yet have one).
[19:20:53] This reflects what I'm currently doing, but recording it for someone else who might do this at some point.
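On the /dev/sdr question: the 15:46:37 message suggests that lshw (bus info plus logical name) and the MegaCli -ldinfo Target Id are enough to tie a device node to a virtual drive. A minimal Python sketch of the lshw half, under two assumptions that the log does not confirm: that bus info strings look like scsi@<host>:<channel>.<target>.<lun>, and that the SCSI target number equals the MegaCli Target Id. The sample values are made up and are not taken from ms-be1066.

```python
import re

# Assumed layout of an lshw bus info string: scsi@<host>:<channel>.<target>.<lun>
BUSINFO_RE = re.compile(r"^scsi@(\d+):(\d+)\.(\d+)\.(\d+)$")


def scsi_target(businfo):
    """Return the SCSI target number from a bus info string, or None if it
    does not have the assumed scsi@h:c.t.l layout."""
    m = BUSINFO_RE.match(businfo)
    return int(m.group(3)) if m else None


def node_to_target(disks):
    """Map logical name (e.g. /dev/sdr) to SCSI target number.

    `disks` is an iterable of (logical_name, businfo) pairs, as one might
    copy by hand from `lshw -C disk` output.
    """
    return {name: scsi_target(bus) for name, bus in disks}


# Made-up sample values for illustration only.
sample = [("/dev/sda", "scsi@0:2.0.0"), ("/dev/sdr", "scsi@0:2.17.0")]
mapping = node_to_target(sample)
print(mapping["/dev/sdr"])
```

If the assumptions hold, the resulting number is the one to look for as Target Id in the MegaCli logical drive listing, after which enclosure and slot can be read from the matching physical drive entry.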