[06:15:42] hello folks!
[07:57:32] Lift-Wing, Machine-Learning-Team (Active Tasks): Match model-server dockerfiles with blubber files - https://phabricator.wikimedia.org/T289127 (elukey) Partially related, but going to add a note in here as well: the SRE team suggested that we may want to factor out a base image for ores/revscoring/etc.....
[09:05:37] Lift-Wing, Machine-Learning-Team (Active Tasks): Add inference-services CI pipelines to the Zuul gate-and-submit - https://phabricator.wikimedia.org/T289562 (kevinbazira)
[09:09:31] Lift-Wing, Machine-Learning-Team (Active Tasks): Add inference-services CI pipelines to the Zuul gate-and-submit - https://phabricator.wikimedia.org/T289562 (kevinbazira) a:kevinbazira
[09:28:30] ok so thanos swift migrated to bullseye, some errors popped up with other clients, so I killed the pods to see if the storage initializer was ok
[09:28:33] and
[09:28:36] botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://s3.us-east-1.amazonaws.com/wmf-ml-models?prefix=goodfaith%2Fenwiki%2F202105140814%2F&encoding-type=url"
[09:29:00] so it seems that for some reason the kfserving endpoint override doesn't work anymore
[09:29:05] not sure why
[10:10:00] Lift-Wing, Machine-Learning-Team (Active Tasks): Match model-server dockerfiles with blubber files - https://phabricator.wikimedia.org/T289127 (elukey) Adding also another thought - it would be great that CI alerted us when Dockerfiles are not in sync with the blubber specs. If the community starts using...
[10:42:29] * elukey lunch!
[13:24:54] ok this makes zero sense - https://github.com/boto/boto3/pull/2746
[13:25:05] but apparently we use it in the storage initializer
[13:25:11] how did it work before?
[13:32:50] it must be kubeflow's python code that reads it
[13:34:46] You mean the KF code reads the env var and then populates other places?
[13:35:24] I think so, or similar
[13:35:29] in the container's env I see
[13:35:30] S3_ENDPOINT=thanos-swift.discovery.wmnet
[13:35:30] AWS_ENDPOINT_URL=https://thanos-swift.discovery.wmnet
[13:36:10] and https://github.com/kubeflow/kfserving/blob/e394be47a7c4cc93b023c9b2b1760e54d1257551/python/kfserving/kfserving/storage.py#L106
[13:36:41] But why would it stop doing that?
[13:37:10] I killed the pods previously, so it seems something related to a fresh start
[13:37:14] that I don't fully grasp
[13:37:46] Hrm. I'd expect a pod restart to be no different from a fresh start?
[13:38:01] But maybe some version changed behind the scenes
[13:39:25] in theory no, the images are the same
[13:39:41] maybe I have configured something manually that it is now gone
[13:40:17] Did you start the pods from the same host? Does it maybe have some env var set?
[13:42:25] I did some manual changes to the inferenceservice stuff IIRC, I am going to re-follow https://github.com/kubeflow/kfserving/blob/master/docs/samples/storage/s3/README.md
[14:50:16] elukey: on another note, we got new kernels because of course we do. I'd like to do a round of reboots tomorrow, wdyt?
[14:55:14] klausman: +1
[14:55:32] I'll let you know before I start, so we don't step on each other's toes.
[14:56:10] ack sure
[16:06:12] o/
[16:06:27] elukey: welcome back to KF-land :)
[16:23:54] accraze: helloooo
[16:24:09] thanks I missed staring at stack traces wondering about my career choices :D
[16:24:51] jokes aside, I have no idea why killing containers caused this
[16:25:16] it seems as if kfserving/boto/etc.. do not pick up our endpoint anymore, defaulting to aws
[16:25:28] the environment variables that should be populated are there (checked with kubectl)
[16:25:59] that's really weird, is it some sort of network change on the thanos swift side?
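To make the symptom concrete, here is a minimal sketch of an endpoint lookup that falls back to AWS when no override is picked up. It is not kfserving's or boto's actual code: the helper name `resolve_s3_endpoint`, the lookup order, and the `https://` prefix are assumptions for illustration. Only the two variable names and the URLs come from the log.

```python
import os

# Default AWS endpoint pattern, matching the host in the error seen in the log.
AWS_DEFAULT_ENDPOINT = "https://s3.{region}.amazonaws.com"


def resolve_s3_endpoint(env, region="us-east-1"):
    """Hypothetical helper: return the S3 endpoint a client would end up using.

    Assumed lookup order (not confirmed by the log):
    1. AWS_ENDPOINT_URL, a full URL
    2. S3_ENDPOINT, a bare hostname, assumed to be served over https
    3. the regional AWS default
    """
    explicit_url = env.get("AWS_ENDPOINT_URL")
    if explicit_url:
        return explicit_url
    host = env.get("S3_ENDPOINT")
    if host:
        return "https://" + host
    # Nothing picked up: requests go to AWS, as in the EndpointConnectionError.
    return AWS_DEFAULT_ENDPOINT.format(region=region)


if __name__ == "__main__":
    print(resolve_s3_endpoint(os.environ))
```

If the override is never read, the last branch is the only one that can run, which is consistent with requests going to `s3.us-east-1.amazonaws.com` even though both variables are set in the container. With boto3, an override like this is normally passed explicitly through the `endpoint_url` argument of `boto3.client` or `boto3.resource`.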
[16:26:15] and I was checking https://github.com/kubeflow/kfserving/blob/release-0.6/python/kfserving/kfserving/storage.py#L103-L105
[16:26:49] accraze: so everything started from https://phabricator.wikimedia.org/T289076, thanos + swift migrated to bullseye
[16:27:12] so since the region seemed to become "us-east-1" rather than "US" I wanted to check
[16:27:23] but now it is boto trying to connect to AWS
[16:27:30] instead of swift
[16:27:36] botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://s3.us-east-1.amazonaws.com/wmf-ml-models?prefix=goodfaith%2Fenwiki%2F202105140814%2F&encoding-type=url"
[16:33:32] and boto's version is the same, the docker image didn't change (for the storage initializer)
[17:25:30] I hate technology, I should have been a baker
[17:26:30] I'll restart tomorrow with a fresh mind
[17:29:36] chrisalbon: bakers hate cupcakes
[17:29:55] https://bash.toolforge.org/quip/AVMFV-wdAwBIDrTOgNCr
[17:30:18] lol
[17:30:27] lol amazing
[17:31:03] That's one of my favourites
[17:48:41] * elukey afk!
[17:52:21] Google Drive is like a bottomless pit. You just throw documents in it and then can't ever find them again
[17:53:09] I JUST MADE YOU WHERE ARE YOU DSFDAR ARRGGGHHH HSHFDHSFHHFDSHFSD
[18:16:04] @chrisalbon on your tweet question about IRC: Far from being an IRC regular for years it took me a few minutes to find my way to the webchat and get my nick registered as seemed necessary for this channel
[18:16:26] awesome thanks! I think maybe 2 people got in
[18:16:42] Definitely not as easy as I wish it was
[18:17:47] yeah, requiring an account (+r mode) definitely makes it much harder
[18:17:51] chrisalbon: my challenge was that I thought it was wikimedia-ai
[18:18:04] i.e., why I wasn't here before
[18:18:13] hi, well you are an old hand at this!
[18:19:52] We are slowly crushing the AI hype, one IRC channel name at a time
[18:20:15] haha
[18:23:14] I wonder if we could remove the nickserv requirement for this channel, allowing anonymous folks to join, just to make the UX more streamlined
[18:25:25] technically possible (if that's what you're asking)
[18:27:35] as a member of a research group that has a public irc i can share that it's been very nice to have the occasional gentoo hacker be able to drop in, we get newcomers set up with irccloud
[18:28:30] Hmmmmmm. That could be nice. Then people could at least use web.libera.chat and get in easily
[18:28:45] we don't have +r set and we occasionally get spam
[18:28:56] sometimes the spam has been very gross and annoying
[18:29:15] but registering with nickserv is a bad ux
[18:29:27] Yeah one of the big things I've learned working at WMF is that volunteers' activity is really spike-y (for obvious reasons) and removing barriers means they wouldn't need to relearn and resetup things each spike, they could just jump back in
[18:29:28] auto-send anonymous folks a message to help them register?
[18:29:40] people just don't expect to have to handle identity by talking to a bot
[18:31:28] in case of loads of spam: A captcha-like question to answer within a minute or get auto-kicked?
[18:32:00] the spam has been really bursty
[18:32:07] so we just set up +r mode during the attacks
[18:32:11] or just temporarily set +r during spam, no need to over-engineer
[18:32:14] yeah
[18:32:25] libera folks are pretty good at dealing with it
[18:32:47] we're #communitydata@oftc.net btw
[18:33:33] My ideal UX would be someone sees we have a public IRC, then they see a link they can click on that brings them into this chat, showing them that it is active and they can get in, after that if they need to do additional setup to get nickserv setup that is fine
[18:34:59] Right now, I can't even give people a single link to click to get here. Which you can see on our team page with the IRC box https://www.mediawiki.org/wiki/Machine_Learning
[18:35:13] "Discuss machine learning and watch the team work joining our public IRC chatroom #wikimedia-ml on irc.libera.chat"
[18:35:20] That UX is bad
[18:35:37] web.libera.chat/#wikimedia-ml
[18:36:06] TIL! Okay cool, that solves one issue!
[18:36:19] Also TIL you can have # in urls
[18:37:37] If we removed the identity requirement then combined with majavah's link, the UX could be great. It would be a one-click to join.
[18:38:38] Removing the +r channel mode does that, I don't have chanserv perms needed to do it myself
[18:39:21] Me either!
[18:39:31] Let me go ask around
[18:39:41] you seem to do
[18:40:17] Wait really? I thought I needed to be Op'd
[18:40:17] use /msg chanserv op #wikimedia-ml to "op up" which lets you do most things, then /mode #wikimedia-ml -r
[18:40:43] sheesh wow, alright
[18:41:01] Majavah you are a hero right now
[18:43:27] That seems way easier
[18:44:29] ohai (this is greg-g :P)
[18:44:35] GREG!
[18:44:43] Alright cool, I am liking this.
[18:44:56] Thanks Majavah and groceryheist
[18:45:29] this is chrisalbon
[18:45:41] No chat history, which was obvious
[18:45:46] it even has typing... notification?!
[18:45:53] but definitely smoother than before
[18:46:08] * not-greg goes back to regularly scheduled activities
[18:46:12] oh wow it does
[18:46:17] bye greg!
[18:49:52] nice!
[18:53:43] Hi everyone
[18:54:23] hi!
[18:54:35] This feels a lot nicer of a UX
[18:54:51] Hi
[18:55:07] I wish there was chat history, but alas, IRC is IRC
[18:58:24] chrisalbon: eventually
[18:58:34] There is playback on some networks for recent messages now
[18:58:41] oh really?
[18:58:46] That could be cool
[19:01:23] alright back to meetings
[19:07:05] chrisalbon: yeah
[19:07:08] Snoonet does it
[19:07:27] It's per channel and normally only like 5-10 messages
[19:09:38] channel logs (https://wm-bot.wmflabs.org/libera_logs/%23wikimedia-ml/20210824.txt) exist too, but that isn't as convenient as a client properly displaying something
[19:27:29] ml-team
[20:51:22] Hi Chris