[05:52:23] Hola & Happy Friday o/
[07:27:59] Machine-Learning-Team, Patch-For-Review: Support building and running of logo-detection model-server via Makefile - https://phabricator.wikimedia.org/T363294#9747133 (kevinbazira)
[07:33:44] Machine-Learning-Team, Patch-For-Review: Support building and running of logo-detection model-server via Makefile - https://phabricator.wikimedia.org/T363294#9747136 (kevinbazira) Open→Resolved Support for building and running the logo-detection model-server using the `Makefile` was added and...
[07:49:07] ciao! buongiorno o/
[07:52:06] hey Aiko!
[08:31:54] (PS1) Kevin Bazira: logo-detection: restrict image processing to trusted domains [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449)
[08:42:24] (CR) Kevin Bazira: "This restriction has been tested locally and here are the results:" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[09:00:35] (PS4) Ilias Sarantopoulos: utils: slow function execution wrapper [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024425 (https://phabricator.wikimedia.org/T362663)
[09:00:41] (PS5) Ilias Sarantopoulos: utils: slow function execution wrapper [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024425 (https://phabricator.wikimedia.org/T362663)
[09:01:38] (PS6) Ilias Sarantopoulos: utils: slow function execution wrapper [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024425 (https://phabricator.wikimedia.org/T362663)
[09:01:51] Morning
[09:04:25] hey Tobias o/
[09:04:44] hope you feel better <3
[09:18:13] it's so-so. Showering and steam/moisture help to a degree.
[09:18:33] but then a breeze comes through and I can't see out of my eyes :)
[09:19:01] Unfortunately, the antihistamines I take need like 2-3 days before they work reliably
[09:38:06] Machine-Learning-Team: Sprint: Airflow training pipeline - https://phabricator.wikimedia.org/T363554 (achou) NEW
[11:18:20] * klausman lunch
[11:30:18] * isaranto lunch as well
[12:14:27] hello folks
[12:25:17] hey Luca!
[12:32:30] I am going to rollout the new istio changes to ml-staging, it may affect connections to mw-api. In case lemme know :)
[12:33:08] go ahead!
[12:44:43] Machine-Learning-Team, Research: Allow calling revertrisk language agnostic and revert risk multilingual APIs in a pre-save context - https://phabricator.wikimedia.org/T356102#9747778 (achou) Hi @kostajh, yes, this is something we can work on this quarter. I am wondering if there's an ongoing project or...
[12:47:46] isaranto: if you have a moment https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1023475
[12:47:55] this is after https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1021981
[12:48:40] basically the idea is that the chart injects WIKI_URL=http://en.wikipedia.org if it is not already set
[12:48:47] atm only for revscoring services
[12:49:10] thankss
[12:52:00] so far all works (the new istio configs I mean)
[12:52:12] going to test the WIKI_URL stuff for revscoring now
[12:53:26] yesss it works!
[12:54:05] partially, not for wikidata of course
[12:54:08] * elukey checks
[12:54:40] WIKI_URL: http://wikidata.wikipedia.org:80
[12:54:47] * elukey cries in a corner
[12:54:53] I thought I had fixed that use case
[12:56:41] I'm testing the patch locally atm
[12:57:07] what patch?
[12:58:16] the chart changes
[12:58:21] I mean I'm reviewing it
[12:58:54] ah, it is merged, lol
[12:58:57] nevermind :)
[12:59:02] I think I know what's happening
[12:59:16] Cannot connect to host wikidata.wikipedia.org:80 ssl:default [Name or service not known]
[12:59:35] for non-wikipedia.org use cases we override the host header
[13:00:00] but we never did it for wikidata, implicitly using wikidata.wikipedia.org
[13:00:14] because it was ending up at the same backends that accepted the name
[13:00:23] now that name is resolved before hitting the istio proxy
[13:00:25] so it complains
[13:01:10] I need to rethink the chart's template a little
[13:05:34] ook I should have it
[13:05:43] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1024661 and next
[13:09:14] yep the CI diff looks good :)
[13:13:38] I still get this as WIKI_URL "http://wikidata.wikipedia.org:80"
[13:13:59] ?
[13:14:12] the wiki url in the diff is changing, only for staging though
[13:15:36] I tested the new patch locally with `helmfile -e ml-staging-codfw template` and got the above for `kserve-predictor-wikidatawiki-damaging`
[13:16:40] oh nevermind
[13:17:16] just realized it was a test. So for me to see a difference we'd need to add this entry to the values.yaml
[13:19:08] yep yep
[13:19:34] clear! I added the host entry to ml-staging and got WIKI_URL "http://wikidata.org:80" and WIKI_HOST "wikidata.org"
[13:19:39] \o/
[13:20:22] in theory all the corner cases should be fixed, I'll run httpbb to confirm
[13:20:47] and the prod's upgrade will be with a DC depooled every time, so we'll be able to run httpbb as well to check isvcs
[13:20:56] if anything returns an error we'll get it easily
[13:26:35] ack
[13:27:39] I just remembered I added this task/bug report a couple of days ago when I noticed an httpbb test failing https://phabricator.wikimedia.org/T363334
[13:29:17] probably has to do with the revision id but need to follow up. Just keep it in mind if you get a failure for enwiki-articletopic
[13:29:59] * elukey nods
[13:32:03] weird, now the wikidata error is that it tries to use https directly
[13:33:17] (PS1) Elukey: outlink_topic_model: fix formatting of README.md to please CI [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024688
[13:34:25] (CR) Ilias Sarantopoulos: [C:+1] outlink_topic_model: fix formatting of README.md to please CI [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024688 (owner: Elukey)
[13:35:52] (CR) Elukey: [V:+2 C:+2] outlink_topic_model: fix formatting of README.md to please CI [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1024688 (owner: Elukey)
[13:40:39] Machine-Learning-Team, Patch-For-Review: Deploy RR-language-agnostic batch version to prod - https://phabricator.wikimedia.org/T358744#9747955 (achou) I got an error when testing the batch model after deploying the new image of kserve 0.12.1 for revert risk models ` aikochou@deploy1002:~$ curl "https://i...
[13:57:28] (PS3) AikoChou: revertrisk: support all wikis and upgrade KI to v0.7 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023809 (https://phabricator.wikimedia.org/T363203) (owner: Ilias Sarantopoulos)
[14:00:49] isaranto: ---^ I updated the patch since the knowledge integrity has merged the change of default thresholds and released a new ver
[14:01:08] tested it locally without any issue
[14:02:06] we'll get logs like INFO:root:Received request for revision 1234 (zgh).
[14:02:06] INFO:root:Unsupported lang: zgh.
[14:02:06] INFO:knowledge_integrity.featureset:Could not find quality thresholds for wiki: zgh, using default values instead.
[14:04:35] that's awesome!
[14:10:07] (CR) AikoChou: [C:+1] revertrisk: support all wikis and upgrade KI to v0.7 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023809 (https://phabricator.wikimedia.org/T363203) (owner: Ilias Sarantopoulos)
[14:14:24] Machine-Learning-Team, Patch-For-Review: Investigate how to implement batch inference for revertrisk-multilingual - https://phabricator.wikimedia.org/T355656#9748042 (achou) Open→Declined
[14:18:47] (CR) Ilias Sarantopoulos: [C:+1] revertrisk: support all wikis and upgrade KI to v0.7 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023809 (https://phabricator.wikimedia.org/T363203) (owner: Ilias Sarantopoulos)
[14:19:05] aiko: shall I merge the above then?
[14:21:12] yes thankss
[14:22:44] * elukey bbiab
[14:23:18] (CR) Ilias Sarantopoulos: [C:+2] revertrisk: support all wikis and upgrade KI to v0.7 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023809 (https://phabricator.wikimedia.org/T363203) (owner: Ilias Sarantopoulos)
[14:28:07] (Merged) jenkins-bot: revertrisk: support all wikis and upgrade KI to v0.7 [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023809 (https://phabricator.wikimedia.org/T363203) (owner: Ilias Sarantopoulos)
[14:41:31] I updated the rrla image and plan to deploy it to staging today. I'll leave production deployment for Monday https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1024699
[15:06:17] ack! I left a comment
[15:12:46] so for some reason, the wikidata calls are getting various 301 redirects to https://etc.. and that fails
[15:12:52] sort of what we noticed in revert risk
[15:14:42] I think it is because....
[15:14:46] it's Friday!
[15:15:56] aiko: thanks for spotting the batcher thing. I updated the patch
[15:17:54] mmm maybe there is a solution, aiohttp allows the usage of a proxy
[15:26:30] elukey: do you need any help? I'm trying to wrap up the stuff I'm working on at the moment but I can help on Monday morning!
[15:28:41] isaranto: nono I think I know what is the required change, but it is probably needed in python-mwapi :( Need to run some tests, but if I am right we should also solve the zh-yue -> yue kind of redirect
[15:30:08] ack!
[15:30:53] * isaranto afk - bbl
[16:04:07] no I think that the proxy is not the solution
[16:04:28] we should, in theory, avoid redirects and fix the Location: etc.. content
[16:10:23] ok solved the mystery, s/wikidata.org/www.wikidata.org/g
[16:11:18] but the redirect issue remains
[16:11:30] if anything changes on the mw api and they start returning 301s, it is a problem
[16:13:12] this in theory is a problem also for the current code
[16:23:38] Sending to inference-staging.svc.codfw.wmnet...
[16:23:38] PASS: 13 requests sent to inference-staging.svc.codfw.wmnet. All assertions passed.
[16:23:41] \o/
[16:23:46] Very nice!
[16:24:31] the redirect/301 issue is sneaky though, need to think a little more how to fix it
[16:28:27] Are those path-level redirects or is it trying to move the request from http to https?
[16:29:22] sometimes it is only the latter, but it could be both
[16:30:06] Do you think we can tell istio/Envoy to rewrite the redirect on the way back from https to http?
[16:30:44] it should be a matter of fixing the Location header before it arrives back to aiohttp, but in the VS configuration I don't see a way to apply a regex or similar
[16:30:50] https://istio.io/latest/docs/reference/config/networking/virtual-service/#Headers-HeaderOperations
[16:32:26] but we can surely do it in the code
[16:32:28] in theory:
[16:32:35] 1) tell aiohttp to not follow redirects
[16:32:57] 2) use a custom function that checks the response code, and if it is a 301 it fixes the Location header
[16:33:29] should be as simple as s|https://|http://|
[16:35:57] not great but I don't have a better solution
[16:39:51] Machine-Learning-Team, Patch-For-Review: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing - https://phabricator.wikimedia.org/T353622#9748592 (elukey) Test in staging has been done, and it was successful! All the revscoring services are now running wi...
[16:43:52] klausman: as FYI admin_ng's knative-serving diff for all ml prod clusters shows the diff for the new changes to be deployed, it will be like that until we do the prod rollout
[17:04:03] going afk for the weekend!
[17:04:11] Enjoy the rest of the day and the weekend folks
[17:07:22] ciao Luca have a nice weekend!
[17:22:08] I deployed the changes for revertrisk to ml-staging and it works great!
[17:24:08] Machine-Learning-Team, Patch-For-Review: Unsupported lang error for some wiki for revertrisk-language-agnostic calls - https://phabricator.wikimedia.org/T363203#9748826 (isarantopoulos) The wiki/language restrictions have been lifted. The new changes have been deployed to ml-staging for the moment a...
[17:31:43] (CR) Ilias Sarantopoulos: "Thank you for working on this Kevin!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1023542 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[17:52:00] (CR) Ilias Sarantopoulos: "Nice work here Jason! I have a small question, other than that LGTM!" [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281) (owner: Jsn.sherman)
[17:52:16] going afk for the weekend folks o/
[19:37:33] (PS6) Umherirrender: Migrate usage of Database::delete, insert, update and upsert to QueryBuilder [extensions/ORES] - https://gerrit.wikimedia.org/r/1007862 (https://phabricator.wikimedia.org/T358831) (owner: MPGuy2824)
[19:37:38] (CR) Umherirrender: Migrate usage of Database::delete, insert, update and upsert to QueryBuilder (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/1007862 (https://phabricator.wikimedia.org/T358831) (owner: MPGuy2824)
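
Editor's note on the redirect workaround proposed at 16:32 (do not follow redirects, then fix the Location header on a 301): a minimal sketch of step 2, assuming the fix really is just the scheme swap mentioned in the log. The function name `downgrade_redirect_location` is hypothetical, and covering 302/307/308 in addition to 301 is an assumption (the log only mentions 301). Step 1 would presumably be passing `allow_redirects=False` to the aiohttp request call; that part is left out so the sketch runs with the standard library only.

```python
from typing import Mapping, Optional

# Assumption: the log only talks about 301; the other codes are included
# here because they also carry a Location header.
REDIRECT_STATUSES = frozenset({301, 302, 307, 308})


def downgrade_redirect_location(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to retry over plain http, or None if there is nothing to follow.

    Equivalent to applying s|https://|http://| to the start of the Location
    header, as suggested in the chat. Hypothetical helper, not python-mwapi
    or Lift Wing code.
    """
    if status not in REDIRECT_STATUSES:
        return None
    # A plain dict lookup is case-sensitive; aiohttp's response headers are
    # case-insensitive, so "Location" works there regardless of casing.
    location = headers.get("Location")
    if location is None:
        return None
    if location.startswith("https://"):
        # Only rewrite the scheme prefix, never an "https://" later in the URL.
        return "http://" + location[len("https://"):]
    return location


if __name__ == "__main__":
    print(downgrade_redirect_location(301, {"Location": "https://www.wikidata.org/w/api.php"}))
```

The caller would then re-issue the request against the returned URL (with its own cap on the number of hops), so the request keeps going through the http path instead of jumping straight to https.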