[00:39:38] Hi, I had my dev account blocked. How can it be unblocked? https://wikitech.wikimedia.org/wiki/User:P_Darwin
[00:44:29] bd808, ^
[00:48:50] @Paulo_Darwin: I would be happy to unblock you. The main thing I need is an email address to associate with your account, and then I can unblock you and you can use Special:PasswordReset to get control of the account again.
[00:49:55] Thanks! You can use paulosperneta@gmail.com, though it seems to be associated with another account I created long ago but never used (PESP) (re @wmtelegram_bot: @Paulo_Darwin: I would be happy to unblock you. The main thing I need is an email address to associate with your account...)
[00:50:33] That account (PESP) could be deleted, BTW, if that's useful.
[00:54:33] @Paulo_Darwin: you should be able to use https://wikitech.wikimedia.org/wiki/Special:PasswordReset now to recover User:P_Darwin. I also blocked User:PESP just so you don't have that hanging out and confusing you or others.
[00:56:27] I'm now getting this message: "Your IP address is blocked from editing. To prevent abuse, it is not allowed to use password recovery from this IP address." https://gyazo.com/54c73e28978cdbd6e39e52b56e7823fa (re @wmtelegram_bot: @Paulo_Darwin: you should be able to use https://wikitech.wikimedia.org/wiki/Special:PasswordReset now to recover User:P...)
[00:56:44] grrr...
[00:59:02] @Paulo_Darwin: try again please
[01:04:08] I used P Darwin but nothing seems to happen 🤔 Should I try with the email? (re @wmtelegram_bot: @Paulo_Darwin: try again please)
[01:06:04] @Paulo_Darwin: using both user name and email is reasonable, but you should also check your spam folder for an email giving you the recovery link.
[01:08:11] It's not working... maybe because I used it today already (and it returned the old PESP account that I never used) (re @wmtelegram_bot: @Paulo_Darwin: using both user name and email is reasonable, but you should also check your spam folder for an email giv...)
[01:10:24] Hmmm... I'm not sure how the 24-hour cooldown timer works.
[01:14:21] @Paulo_Darwin: I tried sending you a reset token, and at least from what I see in the database it looks like it may have worked? There is data in user_newpassword at least, which is where the reset token is stored until it is consumed.
[01:14:45] Thanks, it worked! (re @wmtelegram_bot: @Paulo_Darwin: I tried sending you a reset token and at least from what I see in the database it looks like it may have ...)
[01:15:55] Awesome. Sorry you got blocked, but glad you noticed and got unblocked again. :)
[01:17:31] Thanks so much for answering so fast! I just checked and everything seems to be working again 🤩🤩 Thanks! (re @wmtelegram_bot: Awesome. Sorry you got blocked, but glad you noticed and got unblocked again. :))
[01:32:51] !log tools.cdnjs-beta Shutdown webservice
[01:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cdnjs-beta/SAL
[09:21:34] What's the current status of pushing to GitHub from Toolforge? I couldn't find any docs...
[09:21:49] I know we used to nuke people's private keys when they stored them on Toolforge...
[09:23:54] YuviPanda: I don't think there is any status... I don't think we've invested much engineering time in supporting that workflow in the last few years.
[09:31:14] If I have someone that can't log in to Horizon, what would you say are good debugging steps?
[09:31:37] They are using the right user and password, have 2FA for wikitech, are an admin of a project, etc., and that's all I've found to check so far.
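For the Horizon login question above, one concrete first check is the account's LDAP entry, which is what the ldap.toolforge.org link further down in this log shows: the Horizon login should be the wikitech name rather than the shell name. Below is a minimal sketch of that lookup, assuming the ldap3 Python library; the read-only host, base DN, and the cn/uid attribute mapping are assumed placeholders, not confirmed values from this conversation.

```python
# Minimal sketch: compare an account's shell name (uid) with its wikitech login
# name (assumed here to be the LDAP cn), since Horizon expects the latter.
# Assumptions: the ldap3 library is installed, and LDAP_HOST / PEOPLE_DN below
# are placeholder guesses.
from ldap3 import Server, Connection, ALL

LDAP_HOST = "ldap-ro.eqiad.wikimedia.org"    # assumed read-only replica
PEOPLE_DN = "ou=people,dc=wikimedia,dc=org"  # assumed base DN for accounts

def lookup(shell_name: str) -> None:
    server = Server(LDAP_HOST, get_info=ALL)
    conn = Connection(server, auto_bind=True)  # anonymous bind
    conn.search(PEOPLE_DN, f"(uid={shell_name})",
                attributes=["uid", "cn", "memberOf"])
    for entry in conn.entries:
        print("shell (uid):  ", entry.uid)
        print("wikitech (cn):", entry.cn)       # the name Horizon expects
        print("groups:       ", entry.memberOf)

if __name__ == "__main__":
    lookup("kpayne")
```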
[09:43:36] arturo: great! I'm working on something that is specifically targeted at this problem (pushing to GitHub securely from shared infrastructure), primarily for deployment in JupyterHubs. I'll make it work on Toolforge too.
[09:44:38] YuviPanda: that would be great :-)
[09:45:06] addshore: works for me...
[10:27:10] arturo: any other ideas for debugging? AFAIK the password is correct, 2FA is correct, username is correct, they all work on wikitech, etc.
[10:27:20] Might a password reset nudge something into working?
[10:27:35] Are there any logs that might say specifically what part of auth is failing?
[10:32:15] addshore: username?
[10:33:11] https://ldap.toolforge.org/user/kpayne (shell kpayne, wikitech Kara Payne)
[10:36:22] addshore: `login failed for user "kpayne" using domain`. The login should be `Kara Payne`
[10:37:08] * addshore gets them to try this again
[10:38:00] Apparently this also fails :/
[10:39:14] `Login failed for user "Kara Payne" using domain "Default",`
[10:39:32] addshore: this is 99% of the time a wrong password, a wrong 2FA code, or both
[10:39:50] ack, I might just poke this person to reset their password and 2FA then
[10:40:09] But the password is coming from a password manager and the 2FA works on wikitech, it seems!
[10:41:48] addshore: it might also be some missing bit in LDAP, some OpenStack permission, or the like. Don't rule that out, I've also seen that bit us in the past
[10:42:14] bite*
[10:42:39] The groups that I can see at least look right according to the docs.
[10:42:46] The procedure I suggest we follow is to double-check everything on your end before we dive into a deep debugging session.
[10:43:09] Yeah, I'll get them to reset 2FA and password to see if this helps in any way.
[11:07:58] They are in!
[11:09:18] 🎉
[11:09:21] What was it?
[11:48:55] arturo: tada! https://github.com/yuvipanda/github-app-user-auth
[11:51:54] It lets users grant push access to only very specific repos, and provides a short-lived (8h) token they can use.
[11:52:17] What do you think of this, bd808?
[12:51:42] 👍
[15:32:12] !log tools.cdnjs Manually running update job with `-mem 4096m` to see if that is enough space to generate the ridiculous index file (T299723)
[15:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.cdnjs/SAL
[15:32:15] T299723: Toolforge cdnjs mirror (website) is out of date - https://phabricator.wikimedia.org/T299723
[15:43:32] !log project-proxy update maps-proxy mappings from maps-wma to maps-wma2 per request on T299585
[15:43:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[15:43:36] T299585: Request increased quota for maps Cloud VPS project - https://phabricator.wikimedia.org/T299585
[15:45:56] YuviPanda[m]1: Interesting. Something about the default output location and file permissions makes my eye twitch, but I can see the general utility of a time-limited auth token for git actions.
[16:11:45] Are Cloud VPS volumes backed up?
[16:14:05] I don't think so: https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_Instances#Backups_of_Cloud_VPS_instances
[16:14:24] You should assume that they aren't, unless this is a Maxim 41 situation.
[16:14:58] dschwen: the backing storage is replicated, but no, not backed up in a recoverable sense.
[16:15:42] OK. So would it be OK to perform nightly rsyncs to the NFS scratch?
[16:15:43] AntiComposite: Maxim 41?
[16:15:59] https://www.schlockmercenary.com/2013-10-04
[16:16:28] "Eliza Dushku, The Greatest War Stories Never Told"?
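As an aside on the github-app-user-auth tool announced above: the point of a short-lived, repo-scoped token is that it can replace a long-lived private SSH key stored on shared infrastructure. A minimal sketch of what pushing with such a token over HTTPS might look like follows; the repo path, branch, and the exact username GitHub accepts alongside this token type are assumptions, not details taken from the tool itself.

```python
# Minimal sketch: push over HTTPS using a short-lived token instead of an SSH
# key kept on shared infrastructure. Assumptions: the token is accepted as an
# HTTPS basic-auth password (as GitHub tokens generally are) and the repo and
# branch names are hypothetical.
import subprocess

def push_with_token(repo_dir: str, owner_repo: str, token: str) -> None:
    # One-off remote URL carrying the token; it is not written to .git/config.
    url = f"https://x-access-token:{token}@github.com/{owner_repo}.git"
    # Note: the token is briefly visible in the process list while git runs.
    subprocess.run(["git", "-C", repo_dir, "push", url, "HEAD"], check=True)

# Example with hypothetical values:
# push_with_token("/data/project/mytool/src", "example-org/example-repo", token)
```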
[16:16:28] dschwen: that's not backed up either, but I suppose nobody will stop you until there is some horrible pile of too much junk in scratch.
[16:16:51] Ha ha, is ~80GB a horrible pile?
[16:18:00] We see worse :)
[16:18:16] How many copies of the 80GB do you intend to keep around? ^^
[16:18:43] (or would you just keep rsyncing to the same dir)
[16:18:44] The contract for the scratch volume is also "this can disappear at any time".
[16:18:54] Well, I'm talking about my rendered tiles here. And those grow. I'm right now limited by the size of the Cinder volume I'm using. I have previously cached my tiles on NFS... and the "worse" you've seen was probably... me.
[16:19:04] https://wikitech.wikimedia.org/wiki/Help:Shared_storage#Disadvantages_of_Shared_NFS_directories
[16:19:27] dschwen: scratch seems a really bad solution for a tile cache
[16:19:31] Wait... /data/projects/ can disappear at any time? 8-o
[16:19:44] That's not scratch.
[16:19:48] ooohhh
[16:19:48] dschwen: no, that's not scratch
[16:20:38] Right, my mistake, that's /mnt/nfs/secondary-maps/project ... OK, replace any of my mentions of scratch with that.
[16:21:07] What's your objective with trying to back up this data?
[16:21:20] But really, using NFS as a primary or secondary tile cache seems like a poor use of IOPS. It's a cache, right? Not unique content?
[16:22:09] Rendering tiles is costly, and I get a much more responsive UX if I hold on to rendered tiles as long as possible.
[16:22:33] Sure, but restoring from backup... would you ever actually do that?
[16:22:51] hmmm
[16:23:25] Would I ever actually lose my Cinder volume?
[16:23:49] I think that's the bigger unknown. Running a single rsync command... yeah, I can see myself doing that, easily :-)
[16:24:53] Those could be tiles that have been accumulated over weeks of WMA usage. :-/
[16:25:23] How long are you planning to cache tiles for, anyway?
[16:25:23] It is more likely to be corrupted by fs driver problems than "lost".
[16:25:26] There is a line to walk, because at the same time I don't want to serve data that's too stale.
[16:26:20] AntiComposite, the duration will now be dictated by available storage on the volume. Once I go above a threshold I start deleting the oldest tiles.
[16:26:47] bd808, yeah, same deal, unless fs corruption is slowly creeping in and not detected by me.
[16:26:53] NFS is literally the most expensive and poorly performing storage available in Cloud VPS. We do not have an archival or user-facing recoverable backup solution.
[16:27:59] A rendered map tile cache sounds important for perf but not critical for service operation to me, but I may be misunderstanding the expense of rendering and/or the hit rate of the cache.
[16:28:17] OK. Well, moving to Cinder for serving and caching tiles should be a big improvement then. I'll hold off on rsyncing to NFS for now.
[16:29:21] This could also be a once-a-week deal; it would probably just be a few minutes of IO. Let me think about it, and if I decide to try, I'll talk to you first so that if this causes problems I can stop quickly.
[16:31:42] dschwen: it is a bit stalled at the moment, but there has been work towards providing an S3-like blob store. When that is functional it will give Cloud VPS users access to a quota-limited (this is the big plus for the backend operations side) place to do things like cache map tiles and save tarballs of "backup" content.
[16:33:18] The biggest operational issue with NFS today in Cloud VPS is the lack of quotas. When something goes rogue adding data, we rely on OS-level disk utilization warnings to find out and then on begging to get things cleaned up.
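A minimal sketch of the size-based pruning dschwen describes above (delete the oldest tiles once the cache on the Cinder volume exceeds a threshold). The directory and byte budget are made-up placeholders, not values from the actual WMA setup.

```python
# Minimal sketch: keep a tile cache under a size budget by deleting the tiles
# with the oldest modification times first. TILE_DIR and MAX_BYTES are
# hypothetical placeholders.
import os

TILE_DIR = "/srv/tiles"            # hypothetical cache location on the volume
MAX_BYTES = 150 * 1024 ** 3        # hypothetical 150 GiB budget

def prune_oldest(tile_dir: str = TILE_DIR, max_bytes: int = MAX_BYTES) -> None:
    files = []
    total = 0
    for root, _dirs, names in os.walk(tile_dir):
        for name in names:
            path = os.path.join(root, name)
            st = os.stat(path)
            files.append((st.st_mtime, st.st_size, path))
            total += st.st_size
    files.sort()  # oldest mtime first, i.e. least recently rendered/refreshed
    for _mtime, size, path in files:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size

if __name__ == "__main__":
    prune_oldest()
```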
[17:11:12] Is Doug here?
[19:46:21] bd808: oooh, what do you think the permissions should be? I have them set to 0550... Makes sense re: the default output location; I built that with containers in mind, since I didn't want them to be stored on disk...
[19:56:03] YuviPanda[m]1: why would it need group read perms?
[19:56:31] I feel like I'm probably missing some context for the whole thing.
[20:23:48] bd808: people keep putting their private SSH keys on JupyterHub so they can push to GitHub, despite the fact that I keep telling them not to.
[20:24:25] bd808: hmm, that's right. Let me push it to 0o500.
[20:25:46] I meant 0o600.
[20:28:27] N00b to the Foundation here, wanted to stand up a few new elastic VMs in the deployment-prep project. If I just launch from Horizon, will there be any problems? Just reading through docs and want to make sure I'm not going to start any fires on a Friday.
[20:29:14] @bd808: fixed (it's 0o600 now) and made a release. I think I'll have to make it read a config file maybe, so the default path can be done better. Thank you so much for pointing this out.
[20:29:38] YuviPanda[m]1: glad you found my griping useful :)
[20:30:12] inflatador: seems safe enough. Worst thing I can think of is that your instance goes boom, but that shouldn't harm anything else.
[20:33:08] bd808, I would like to respectfully disagree with "the biggest operational issue with NFS today in Cloud VPS is the lack of quotas". From my perspective, the biggest issue is that NFS latency is so high as to make it unusable for anything other than static bulk storage. See https://phabricator.wikimedia.org/T256426
[20:33:55] Thanks bd808, will launch momentarily.
[20:34:09] roy649_: that's a client issue ;)
[20:34:34] I don't understand. In what way is it a client issue?
[20:35:46] operational == issues that cause me problems operating the service; client == unmet expectations of users
[20:37:05] The issue at T256426 is very much about your expectations.
[20:37:06] T256426: Extremely high latency over NFS between kubernetes node and bastion host - https://phabricator.wikimedia.org/T256426
[20:38:13] Dusting off my long-disused network admin hat, I feel your pain. But putting on my much newer volunteer developer hat, my own pain seems much more important to me at the moment :-)
[20:38:27] You should watch this video: https://photos.google.com/share/AF1QipN14SYnMWJGc4BWX0MC8ioIIH7ep7vj7VqDG7ND3nFvRDBXK-LA9nlUc5rwGUnixQ/photo/AF1QipOFHnlajM6UKGDMehMoEejfHiKipD5TKX6kXKqN?key=RUFuUlpVbUxQVXlhYUdyOVJhS18tbGJQU1JwaFFR
[20:38:35] It demonstrates just how slow NFS is.
[20:42:27] * YuviPanda[m]1 feels a sense of deja vu
[20:42:28] roy649_: I know this won't make you happier, but I really think the use case that you point out in that ticket (and some past ones, if I remember correctly) is invalid. There is no reason to assume that traffic from NFS client A to the server and then back to NFS client B would be fast.
[20:42:56] The two problems are somewhat connected, though.
[20:42:57] 14 seconds of latency???
[20:43:00] If you are on an NFS cluster where it is, that's awesome, but it's not a goal of NFS.
[20:43:34] "there's too much junk on NFS" -> NFS has to spend more IO time replicating that
[20:44:19] And 14-15 seconds is not "extremely high latency".
[20:46:49] I guess we'll just have to agree to disagree on that point.
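Circling back to the token-file permissions exchange above (0550 vs 0o600): a minimal sketch of writing a credential file so that only the owner can read or write it, with the mode applied at creation time rather than via a later chmod. The path is hypothetical, and this is not the actual github-app-user-auth implementation.

```python
# Minimal sketch: create a credential file with mode 0o600 at creation time so
# it never exists, even briefly, with group or world access. The path used in
# the example call is hypothetical.
import os

def write_token(path: str, token: str) -> None:
    # O_CREAT with an explicit 0o600 mode; the umask can only restrict further.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(token)

# Example with a hypothetical path:
# write_token(os.path.expanduser("~/.config/example-gh-token"), token)
```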
[21:07:29] Haha, fancy looking into the task linked above, finding https://phabricator.wikimedia.org/T127367, thinking 'oh yeah, that seems interesting', and then realizing I opened that issue.
[21:09:33] heh
[21:10:06] It's when you start filing dupe tasks of your own tasks...
[21:46:25] Speaking of T127367, what is the actual status of that? It's marked "Open, High", but realistically, no progress appears to have been made in several years.
[21:46:25] T127367: Provide modern, non-NFS error log solution for Toolforge webservices and bots - https://phabricator.wikimedia.org/T127367
[21:57:23] !log devtools - deleted instances "doc" and "doc1002" to make room for gitlab instance T299561 - T297411
[21:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[21:57:27] T299561: Request increased quota for devtools Cloud VPS project - https://phabricator.wikimedia.org/T299561
[21:57:27] T297411: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411
[22:11:53] !log devtools - created new instance gitlab-prod-1001T297411
[22:11:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[22:11:59] !log devtools - created new instance gitlab-prod-1001 T297411
[22:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Devtools/SAL
[22:12:01] T297411: Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411
[22:29:28] roy649_: I experimented a bit with Grafana Loki earlier this month and now think it can probably work for that use case once we have a Swift object storage service that can be used from Cloud VPS.
[22:29:33] No other real progress, I think.
[22:35:37] And it all comes back around.
[22:35:59] taavi: not sure if you got my DM, but are you available next wk to talk deployment-prep? Maybe 30m or so.
[22:37:27] Also, just noticed your comment on T299797, reading now.
[22:37:28] T299797: Deploy new elastic cluster nodes on deployment-prep - https://phabricator.wikimedia.org/T299797
[22:37:29] inflatador: yeah, I saw that but didn't manage to reply yet, sorry :/
[22:39:19] No worries, it's Friday (at least in my TZ) ;). Will follow your advice in the ticket, but if you have availability for a quick meet-up next wk, let me know. My hours are 1400-2300 UTC but I can adapt as necessary.
[22:39:39] Err, 1400-2100 UTC, that is.