[06:23:41] yeah, diagnose, don't treat :)
[06:23:54] oops, that was an accidental history recall :)
[06:57:44] Amir1: o/ if you have any doubt, https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage
[07:11:31] Machine-Learning-Team, Research (FY2022-23-Research-April-June): (stretch) Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (MGerlach) @achou this is great. I tried from the stat1008 and can confirm that this works. Would it be possible to make it available pu...
[07:21:12] Machine-Learning-Team, Research (FY2022-23-Research-April-June): (stretch) Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (elukey) >>! In T334182#9003995, @MGerlach wrote: > @achou this is great. I tried from the stat1008 and can confirm that this works. > W...
[08:03:08] o/ probably I messed it up with my consecutive changes and removed the trailing slash, but port is correct
[08:12:24] Machine-Learning-Team: Create access logs and logstash dashboard for ores-legacy - https://phabricator.wikimedia.org/T341547 (isarantopoulos)
[08:13:49] Machine-Learning-Team: Create access logs and logstash dashboard for ores-legacy - https://phabricator.wikimedia.org/T341547 (isarantopoulos)
[09:11:11] Hi elukey, I worked on pushing the recommendation-api CI pipelines in: https://gerrit.wikimedia.org/r/935880
[09:11:11] As per your comment, it looks like that is what was left before we merge the images: https://gerrit.wikimedia.org/r/c/research/recommendation-api/+/932810/comments/a68216b7_86dc77a5
[09:11:12] Should we finalize and go ahead to merge the images in: https://gerrit.wikimedia.org/r/932810
[09:11:29] isaranto: --^
[09:11:40] I think that we should be ready to go, thoughts?
[09:11:55] hi kevinbazira :)
[09:13:45] I haven't LGTM'd on that one because I wasn't sure if that open discussion was still ongoing. Other than that, I can +1
[09:13:56] (CR) Klausman: [C: +1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[09:27:30] (PS1) AikoChou: readability: raise 400 when failing to fetch revision from MW API [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/937052
[09:40:02] I'm reviewing again now!
[09:47:22] elukey: FYI https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/936796/1#message-31a095025473b140bbadb45fbaaf401a87bacd07
[09:47:32] I think we eventually need to have envoy proxy for it
[09:50:22] Amir1: liftwing is already in the envoy proxy list
[09:50:52] not sure if it is in the appservers
[09:51:25] oh nice, we need to find a way to use it, because the host header is set not to liftwing but to a different host so we can't override it
[09:52:41] lemme check
[09:53:53] (CR) Ilias Sarantopoulos: Set up production and test images for the recommendation-api migration (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[09:54:10] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/926420/14/includes/LiftWingService.php#147
[09:54:13] it is not enabled on appservers, trying to figure out where to set it
[09:54:13] Line 147
[09:54:38] I know where to set it: Bug Joe
[09:54:47] nono found it
[09:55:49] kevinbazira: o/ I left a comment. I think we need to add config.yaml and we would be ready to go!
[09:56:26] Amir1: is there a task that I should use?
[09:56:32] I tried running the test image but I got an error that it wasnt finding the tox.ini file. But perhaps I've done sth wrong so I'll wait for CI to show
[09:56:41] elukey: T319170 ?
[09:57:41] https://gerrit.wikimedia.org/r/c/operations/puppet/+/937056
[09:57:43] running pcc now
[10:01:27] yep looks good, after service ops reviews it we should be good
[10:01:34] is it super urgent Amir1 ?
[10:01:42] nope
[10:01:45] I mean, is anything broken etc..
[10:01:46] perfect
[10:02:24] for now, it should be able to connect to inference directly. It's not working on testwiki yet but that's probably broken for other reasons
[10:05:01] Amir1: I agree at the moment I'd blame isaranto :D
[10:05:38] haha, I have trouble debugging it, I probably should run the job in debug mode and see how it goes
[10:10:41] (PS26) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[10:10:51] Machine-Learning-Team, API Platform, Anti-Harassment, Content-Transform-Team, and 18 others: Migrate PipelineLib repos to GitLab - https://phabricator.wikimedia.org/T332953 (fnegri)
[10:13:46] (CR) Kevin Bazira: Set up production and test images for the recommendation-api migration (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[10:15:13] Running the job in debug mode gives this
[10:15:15] [debug] [ORES] Requesting: https://inference.discovery.wmnet:30443/v1/models/testwiki-damaging:predict
[10:15:16] [info] [http] [2023-07-11T10:14:29+00:00] POST https://inference.discovery.wmnet:30443/v1/models/testwiki-damaging:predict HTTP/1.1 - 404 NULL
[10:15:33] does this look correct (the URL, the response is obviously not right)
[10:15:38] it is definitely my fault :) !! Amir1 I'm interested in learning how to debug this.. is there a dashboard with logs at least? I cant find sth on logstash
[10:16:27] isaranto: it's actually quite simple if you have access to mwmaint, you login to mwmaint, run "mwscript eval.php --wiki=testwiki -d 3" and then make the job and run it
[10:16:36] $job = new ORES\Services\FetchScoreJob( Title::newFromText( 'Maria_Louisa_Bustill' ), [ 'revid' => 575471, 'precache' => true ] );
[10:16:36] $job->run();
[10:17:11] (you can try it in beta cluster though but it works there)
[10:17:52] unfortunately it doesn't log what headers it set or what's the body
[10:18:45] * elukey lunch!
[10:20:02] the url is correct and if I ran it from a statbox it returns a 200 response with the predictions. By the 404 response it could mean that the header is not set correctly. lemme check..
[10:20:02] ```
[10:20:03] curl "https://inference.discovery.wmnet:30443/v1/models/testwiki-goodfaith:predict" -X POST -d '{"rev_id": 575471}' -i -H "Host: testwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org"
[10:20:03] ```
[10:31:03] (CR) Ilias Sarantopoulos: "recheck" [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[10:31:26] isaranto: the headers
[10:31:28] https://www.irccloud.com/pastebin/3s8vgFNs/
[10:32:46] the Host looks correct too
[10:34:29] in mwdebug I'm getting this
[10:34:32] https://www.irccloud.com/pastebin/A4J8Cxia/
[10:35:03] progress?
[10:38:43] loool
[10:39:23] Amir1: I figured out I never deployed damaging...the request I pasted above was for goodfaith
[10:39:38] oopsie
[10:39:50] your debugging helped
[10:40:14] nah, I was mostly a glorified logstash
[10:40:29] still
[10:45:30] ok, I just deployed so it will take a while. super super thanks Amir1 <3
[10:45:49] we hope to soon be out of your way
[10:46:01] Thank you for deploying
[10:46:10] let me see if it stores the scores now
[10:46:53] the pods haven't started yet . I'll let u know when they do
[10:48:47] ah okay
[10:48:51] I'm too excited
[10:55:07] I was excited too, that's why I forgot to deploy that model server!
[10:55:07] hmm I dont see the testwiki pods coming up. I'm going for lunch and will check again afterwards
[11:00:25] the pods are up and running. I tested damaging and it works! going for lunch!
[11:23:22] [warning] [ORES] Service failed to respond properly: Failed to make LiftWing request to [https://inference.discovery.wmnet:30443/v1/models/testwiki-damaging:predict], There was a problem during the HTTP request: 404 Not Found 😭
[11:24:26] the url works from mwmaint though so some progress
[11:36:52] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[12:59:24] still no luck :(
[12:59:30] I'm jumping into meetings for now
[13:09:19] Amir1: the pods are up afaics
[13:11:15] Yeah. The curl now works but the job doesn't
[13:11:59] I'll debug more once I'm out of meetings and some other high prio fires I need to attend
[13:13:55] Amir1: ack, the envoy proxy on mw nodes is rolling out - https://phabricator.wikimedia.org/P49547
[13:14:08] cc: isaranto: --^
[13:21:13] Awesome. Thanks
[13:29:45] Good morning all
[13:30:15] morning!
[14:04:08] (CR) Klausman: "wget exit code (4) is indicating "network failure", but no further specifics." [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[14:06:08] (PS27) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[14:09:34] (CR) Kevin Bazira: Set up production and test images for the recommendation-api migration (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[14:09:42] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[14:23:03] Machine-Learning-Team: FeatureNotFound exception in revertrisk multi-lingual - https://phabricator.wikimedia.org/T340812 (isarantopoulos) a:isarantopoulos→None
[14:56:15] Machine-Learning-Team, Patch-For-Review: Create ORES migration endpoint (ORES/Liftwing translation) - https://phabricator.wikimedia.org/T330414 (isarantopoulos) In progress→Resolved
[14:56:17] Machine-Learning-Team, Epic: Migrate ORES clients to LiftWing - https://phabricator.wikimedia.org/T312518 (isarantopoulos)
[15:20:02] (PS1) Ilias Sarantopoulos: fix lift wing URL by adding slash suffix [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170)
[15:23:06] (CR) Ladsgroup: "In future, I would like to have this as null and skipping liftwing when it's null. This is extremely wikimedia-specific and I want to have" [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:23:53] (CR) Elukey: "should we use the localhost endpoint since we need another deploy?" [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:25:38] (CR) Ladsgroup: fix lift wing URL by adding slash suffix (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:26:02] (CR) Ilias Sarantopoulos: fix lift wing URL by adding slash suffix (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:27:14] (PS28) Kevin Bazira: Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890)
[15:29:46] (CR) Hashar: Set up production and test images for the recommendation-api migration (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[15:30:47] (CR) CI reject: [V: -1] Set up production and test images for the recommendation-api migration [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[15:40:02] (CR) Elukey: fix lift wing URL by adding slash suffix (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[15:40:47] (PS2) Ilias Sarantopoulos: fix lift wing URL by adding slash suffix [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170)
[15:56:55] elukey: you aware of things going on with ml-staging-ctrl2002?
[15:58:00] nvm, alert about failed systemd unit cleared
[15:59:14] klausman: didn't see it, feel free to investigate if you want :)
[15:59:33] I didn't pay attention _which unit_ it was before it was gone :-/
[15:59:46] also, whole bunch of "unknowns" about NREP services
[16:00:18] from our POV specifically checking for envoy on ORES, but those, too, are gone now.
[16:01:21] This is the downside of having the AM page open: you get these blips
[16:30:47] going afk folks! Have a nice rest of the day :)
[16:57:31] progress!
[17:01:01] good afternoon Luca
[17:01:21] I figured out the headers in the request were not set correctly!
[17:01:48] * isaranto is mumbling about ores extension
[17:06:02] (PS1) Ilias Sarantopoulos: fix: add request headers properly [extensions/ORES] - https://gerrit.wikimedia.org/r/937158 (https://phabricator.wikimedia.org/T319170)
[17:11:41] (CR) Ilias Sarantopoulos: "The headers were not set correctly. I don't remember where I saw it and I put it like that previously. 😭" [extensions/ORES] - https://gerrit.wikimedia.org/r/937158 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[17:12:11] going afk, ciao folks!
[17:16:17] (CR) Ilias Sarantopoulos: "tested on wmaint" [extensions/ORES] - https://gerrit.wikimedia.org/r/937158 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[17:41:40] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (Miriam) p:Triage→Medium a:Isaac
[18:51:16] (CR) Ladsgroup: [C: +2] "Fixes the issue but worth noting that we need articlequality model too: https://phabricator.wikimedia.org/P49554 (and draftquality too) se" [extensions/ORES] - https://gerrit.wikimedia.org/r/937158 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[18:53:53] (Merged) jenkins-bot: fix: add request headers properly [extensions/ORES] - https://gerrit.wikimedia.org/r/937158 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[21:46:17] Machine-Learning-Team, Continuous-Integration-Infrastructure, Release-Engineering-Team: Python torch fills disk of CI Jenkins instances - https://phabricator.wikimedia.org/T338317 (hashar) As an update, I have rebuild all the Jenkins agent instances last week. The disk space allocated to Docker went...