[01:17:22] 10artificial-intelligence, 10Structured-Data-Backlog: Implement NSFW image classifier using Open NSFW - https://phabricator.wikimedia.org/T214201 (10Frostly) https://github.com/infinitered/nsfwjs might be interesting for implementation too (it can be run on Node)
[05:43:44] ragesoss: We'll take a look! afaik CORS wouldn't be enabled on LW
[05:44:26] I made an effort here based on Luca's previous patches for ores-legacy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/964625
[05:44:40] but no diff! so something is missing
[06:46:52] isaranto: o/
[06:46:54] almost!
[06:47:14] ingress is not under networkpolicy, but at the same level
[06:47:21] if you move it, it should work :)
[06:48:52] ah no wait, still sleeping :D
[06:49:26] we cannot use that one for inference, since the "ingress" module is only available for services like ores-legacy and rec-api-ng (that use the "serviceops" template)
[06:49:42] so we'll need to inject those values into our istio configs
[06:49:48] via isvc probably
[06:50:31] https://github.com/kserve/kserve/issues/721 doesn't look good
[06:51:32] people in https://github.com/kserve/kserve/issues/1902 suggest setting headers via uvicorn, sigh
[06:58:24] ok will check!
[06:58:31] afk, back online later!
[08:26:52] elukey: o/
[08:26:52] are you able to check rec-api-ng logs on LiftWing staging?
[08:26:52] on my end, running `kubectl logs -p recommendation-api-ng-main-675599698-t8nl8 -c recommendation-api-ng-main`
[08:26:52] returns `Error from server (BadRequest): previous terminated container "recommendation-api-ng-main" in pod "recommendation-api-ng-main-675599698-t8nl8" not found`
[08:26:53] yet the pod shows that it is running `kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
recommendation-api-ng-main-675599698-t8nl8   2/2     Running   0          18m`
[08:28:54] kevinbazira: `kubectl logs recommendation-api-ng-main-675599698-t8nl8 -n recommendation-api-ng recommendation-api-ng-main` works
[08:29:14] in your case you may not need the -n etc..
[08:30:34] I am testing the endpoint, I see
[08:30:35] Tue Oct 10 08:29:46 2023 - *** HARAKIRI ON WORKER 2 (pid: 139, try: 1) ***
[08:31:13] "Every request that will take longer than the seconds specified in the harakiri timeout will be dropped and the corresponding worker recycled."
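For anyone wanting to reproduce that locally: harakiri is a plain uwsgi timeout, so a deliberately slow handler triggers it. A minimal sketch, assuming a uwsgi setup roughly like the rec-api's (the app and flags below are illustrative, not the service's actual config):

```python
# app.py - tiny WSGI app whose only job is to outlive the harakiri timeout,
# so the worker serving it gets killed and recycled mid-request.
import time

def application(environ, start_response):
    time.sleep(30)  # deliberately longer than the harakiri timeout below
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done\n"]
```

Running it with something like `uwsgi --http :8000 --wsgi-file app.py --harakiri 5` and curling it should produce a HARAKIRI log line like the one above instead of a response.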
[08:31:16] ah okok
[08:31:59] mornin'
[08:32:15] morning :)
[08:32:20] elukey: I do find it puzzling that kubectl logs -n recommendation-api-ng -p recommendation-api-ng-main-675599698-t8nl8 -c recommendation-api-ng-main doesn't work
[08:32:36] And the error message is confusing, too:
[08:32:36] klausman: no idea
[08:32:40] Error from server (BadRequest): previous terminated container "recommendation-api-ng-main" in pod "recommendation-api-ng-main-675599698-t8nl8" not found
[08:33:08] kevinbazira: I checked the pods' cpu consumption etc.., nothing weird registered https://grafana.wikimedia.org/d/hyl18XgMk/kubernetes-container-details?orgId=1&var-datasource=codfw%20prometheus%2Fk8s-mlstaging&var-namespace=recommendation-api-ng&var-pod=recommendation-api-ng-main-675599698-t8nl8&var-container=All
[08:33:28] but I suspect that the 10 processes spawned to fetch data from wikidata are the culprit
[08:33:33] Oh, I know: I thought -p selected the pod, but it means "previous"
[08:33:39] The pod is just a plain arg
[08:34:02] So this works: kubectl logs -n recommendation-api-ng recommendation-api-ng-main-675599698-t8nl8 -c recommendation-api-ng-main
[08:34:17] -n for namespace, -c for the container, but no flag for the pod name.
[08:34:22] -c is not needed
[08:34:38] and Kevin doesn't use admin, so the -n is not needed either
[08:34:44] oh alright
[08:34:51] (it is already assumed)
[08:35:11] (03CR) 10AikoChou: [C: 03+1] events: drop support for /mediawiki/revision/create#1.x events [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/930665 (https://phabricator.wikimedia.org/T267648) (owner: 10DCausse)
[08:36:01] It's a bit odd that the help text does not mention that -c is optional
[08:36:21] thanks elukey and klausman. I am now able to see the logs :)
[08:42:51] (03CR) 10AikoChou: [C: 03+1] "LGTM! I like you moved everything to the same place, so they don't scatter around different folders." [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/963367 (https://phabricator.wikimedia.org/T347404) (owner: 10Ilias Sarantopoulos)
[08:43:10] 10Machine-Learning-Team: Investigate recommendation-api-ng internal endpoint failure - https://phabricator.wikimedia.org/T347475 (10elukey) >>! In T347475#9235450, @Isaac wrote: >> @Isaac do you reckon if we could use multi-threading instead of multiprocessing? Are those all HTTP-like calls (hence preemptable) o...
[08:43:47] kevinbazira: from https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/api/external_data/wikidata.py#L105 it seems that we use a process pool, but the import is multiprocess.dummy, which afaics from the docs uses a thread pool
[08:43:53] so it shouldn't be a problem in theory
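For context on that last point: despite the name, the dummy module mirrors the Pool API on top of threads. A minimal sketch with the stdlib's multiprocessing.dummy (the multiprocess package's dummy module behaves the same way; the fetch function is a stand-in):

```python
# multiprocessing.dummy exposes the Pool API but backs it with threads,
# so the "workers" share the interpreter and preempt on I/O instead of
# forking extra processes that uwsgi would have to manage.
from multiprocessing.dummy import Pool as ThreadPool
import threading

def fetch(item):
    # stand-in for an HTTP call to the wikidata/mediawiki APIs
    return (item, threading.current_thread().name)

with ThreadPool(10) as pool:
    results = pool.map(fetch, range(5))

# every result carries a thread name, not a PID: no fork() happened
print(results)
```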
[08:47:03] but I have a suspicion on what is happening
[08:47:06] checking
[08:47:36] elukey: sure, I am running tests locally to see whether the harakiri is affecting the workers, i.e. if a uwsgi worker is killed and respawned, does the request continue or does it get affected? Trying to get an answer to this.
[08:49:48] yes, it does.
[08:51:01] kevinbazira: I think I found the problem :)
[08:52:18] we totally forgot one thing, and we are battling against it
[08:52:30] what's that? :)
[08:53:06] kevinbazira: Let's try to brainbounce/debug it, it will surely be helpful for the next time
[08:53:20] so what I did was asking myself - what is the current issue?
[08:54:16] and afaics what is happening at the moment is that we try to hit the endpoint, and the response never comes; instead we end up in a timeout
[08:54:37] in our case it is the envoy proxy that tells us "look, I am giving up waiting, here's a 50x"
[08:55:23] second question - why don't we get any response? Is the container trying to get something that never returns as well?
[08:55:33] or is its connectivity ok?
[08:56:18] one of the things we have to remember is that a container cannot, by default, make calls to any endpoint without specific allowance
[08:56:41] the third step was checking the liftwing.ini config
[08:56:42] https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/data/recommendation_liftwing.ini#L5
[08:57:17] same thing for the line above (#4)
[08:58:02] kevinbazira: if you recall, we have a specific envoy proxy to use for calls to external endpoints; for example we worked to allow calls to swift via localhost:port
[08:59:50] great. so envoy should allow access to the apis that the container tries to access right? https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/data/recommendation_liftwing.ini#L1-L7
[09:00:00] exactly yes :)
[09:00:06] in our case, it should be the mw api
[09:00:29] so we have to add the 'discovery' config in values.yaml for the mw api (I can take care of it)
[09:00:29] ok, thank you for the clarification. let me prepare a patch for this.
[09:00:35] kevinbazira: wait wait :)
[09:00:39] there is another bit missing
[09:00:44] ok ok :)
[09:01:19] if we add as endpoint something like "http://localhost:port/api/etc.." it will not work, because the HTTP Host header will not be set correctly
[09:03:10] kevinbazira: for example, I think that https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/api/external_data/wikidata.py#L119
[09:03:49] calls https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/api/external_data/fetcher.py#L25
[09:04:18] mmmm I am now wondering though if envoy sets the Host header for us
[09:04:23] kevinbazira: --^
[09:04:32] great. please add me as a reviewer to the patch. I'd like to see the fix.
[09:05:02] no, I think it will probably set the .discovery name as Host header
[09:05:20] in our case, we'll need to explicitly set stuff like "wikidata.org" or "en.wikipedia.org"
[09:05:35] and force the post() method to use them
[09:14:49] kevinbazira: this is the first bit https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/964859
[09:15:03] but I think we'll also need to change the python code to set the host header where needed
[09:15:10] * elukey bbiab
[09:15:21] klausman: --^
[09:15:35] Looking
[09:17:11] Was the intent for me to +2 and deploy it?
[09:44:52] back
[09:45:24] klausman: nono, just to get your opinion; maybe you and kevin can work together on it?
[09:46:45] I am unsure how the string mw-api-int-async-ro ties to specific egress rules.
[09:47:33] Is it via the fixtures files?
[09:48:19] (I may also be misunderstanding the problem)
[09:48:31] fixtures are only used when CI renders the charts' diffs
[09:48:40] in this case, it is related to helmfiles
[09:49:01] it is all related to the relevant module, the mesh one
[09:49:14] there are some configs rendered on deploy2002 via puppet
[09:49:19] containing the various IPs etc..
[09:49:31] klausman: check envoy.yaml in puppet
[09:49:48] https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/hieradata/common/profile/services_proxy/envoy.yaml#304
[09:49:52] (under profile/services_proxy)
[09:49:54] yes
[09:50:23] Ah, I wasn't aware the name spanned repos
[09:50:24] so the mesh module adds two things
[09:50:38] 1) tls terminator for ingress traffic
[09:50:50] 2) tls proxy to $services
[09:51:04] for 2) we need to explicitly add what services we want to contact
[09:51:13] and it adds networkpolicies, configs, etc..
[09:51:50] the result of adding mw-api-etc.. is that a localhost:6500 endpoint will be available in the pod
[09:52:00] for stuff like /w/api.php etc..
[09:52:43] klausman: the main issue atm is that we configure stuff like https://github.com/wikimedia/research-recommendation-api/blob/master/recommendation/data/recommendation_liftwing.ini#L4
[09:52:48] (see also the line for wikidata)
[09:53:06] in there we should add http://localhost:6500/w/api.php
[09:53:20] but, we'd also need to set the Host header
[09:53:20] ah, right, so I understood that part correctly
[09:53:34] and I believe we'll need to modify the code to allow it
[09:53:47] yes, localhost will likely not work as a Host: header
[09:53:48] so merging the above patch and changing the .ini file is not enough
[09:54:11] klausman: the envoy proxy may set stuff like mw-api.discovery.wmnet etc.., but it is not correct either
[09:54:26] so, having said this, klausman and kevinbazira, ok to work together on it?
[09:54:34] Does it leave already-set headers alone?
[09:54:49] I believe so, but it needs to be verified
[09:55:31] yeah, sure, though I have a question.
[09:55:54] How are the names in the ini mapped to e.g. localhost:6500?
[09:56:30] the port is specified in envoy.yaml
[09:56:36] with the endpoint etc..
[09:56:47] there is a template that renders it all
[09:56:53] it should be in the mesh module
[09:57:15] before proceeding with https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/964859 we need to get the sign-off from serviceops
[09:57:34] because the endpoint is the mw-on-k8s stuff, and they are selecting what to onboard etc..
[09:57:54] (someone that is not me should follow up, to be clear :)
[09:59:42] I'm still a bit lost re: mesh
[09:59:59] please ask all the questions that you want :)
[10:00:12] but part of the answer is probably buried inside deployment-chart
[10:00:14] *charts
[10:00:46] modules/mesh contains a bunch of files, but I don't understand their significance here
[10:03:04] --verbose :)
[10:06:38] So the envoy.yaml stuff I get.
[10:07:27] The INI file... well, how does the tool know to talk to something on port 6500?
[10:08:29] For example, the ini file has `wikipedia = https://{source}.wikipedia.org/w/api.php`. How does that become localhost:6500?
[10:08:37] no, that needs to be changed
[10:08:43] this is what I was writing above
[10:08:57] < in there we should add http://localhost:6500/w/api.php
[10:09:19] the main problem is that only changing the ini is not enough
[10:09:29] we need to add the Host header in the python code
[10:09:54] so we'll call http://localhost:6500/w/api.php + Host: en.wikipedia.org or Host: wikidata.org
[10:09:57] for example
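A minimal sketch of that idea (illustrative only: the 6500 listener comes from the discussion above, and the helper name is made up, not the actual patch to fetcher.py):

```python
# Talk to the local envoy listener and pin the Host header, so envoy
# forwards to the mw-api backend and MediaWiki still routes the request
# to the right wiki.
import requests

MESH_ENDPOINT = "http://localhost:6500/w/api.php"  # assumed mesh listener

def mesh_post(host, data):
    # the Host header, not the URL, now decides which wiki answers
    response = requests.post(MESH_ENDPOINT, data=data, headers={"Host": host})
    response.raise_for_status()
    return response.json()

# e.g. mesh_post("www.wikidata.org", {"action": "query", "format": "json"})
```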
[10:11:22] o/ back here
[10:11:55] wow many messages. catching up!
[10:12:05] Ok, so `wikipedia = ...` changes to `http://localhost:6500/w/api.php` and the code needs to add the relevant Host header
[10:12:16] correct, this is my theory
[10:12:33] plus we need to add the support for the 6500 port on the pod
[10:12:42] And we can deploy that after change 964859, which needs serviceops sync
[10:12:54] that is the deployment-charts patch (that needs a follow-up with serviceops because we'd use the mw k8s api)
[10:12:58] exactly
[10:13:04] Ok, now I get it :)
[10:13:22] Is there someone in serviceops that is already familiar with the topic of moving rec-api to LW?
[10:13:48] sort of, but I think we can just tell them that it will be a low traffic volume
[10:13:56] (also, Kevin had a question about wikidata on the change, I dunno if that has been answered)
[10:15:44] yep, in the recommendation_liftwing.ini I can change `wikipedia = https://{source}.wikipedia.org/w/api.php` to `wikipedia = http://localhost:6500/w/api.php` but what does `wikidata = https://www.wikidata.org/w/api.php` change to?
[10:16:11] this bit is left as an exercise :D
[10:16:32] (but it is a very good question)
[10:17:04] kevinbazira and klausman, if you can pair up to answer the question and follow up etc..
[10:17:16] ack
[10:17:21] I am available for clarifications, but my goal is to start stepping aside
[10:17:24] as much as possible
[10:19:23] I am looking at the envoy.yaml and it doesn't seem to have a listener for wikidata: https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/hieradata/common/profile/services_proxy/envoy.yaml
[10:22:20] 10Lift-Wing, 10Machine-Learning-Team: kserve CORS error - https://phabricator.wikimedia.org/T348511 (10isarantopoulos)
[10:23:06] 10Lift-Wing, 10Machine-Learning-Team: kserve CORS error - https://phabricator.wikimedia.org/T348511 (10isarantopoulos) At the moment it seems that we can modify the fastAPI app but if I am not mistaken it is more difficult to do the fix directly on istio
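If the app-level route wins, FastAPI/Starlette already ships a CORS middleware, so a sketch could look like this (the origin list is a placeholder, not a proposal for Lift Wing):

```python
# Set the CORS headers in the kserve model server's FastAPI app itself,
# instead of fixing it at the istio layer.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.org"],  # placeholder origin
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```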
[10:28:00] kevinbazira: hint - you should focus on which servers render www.wikidata.org
[10:44:45] klausman: I had already created the two tasks for the decoms
[10:44:51] oops.
[10:44:57] I'll close mine, then
[10:44:58] let's merge them in
[10:46:18] 10Machine-Learning-Team, 10SRE, 10decommission-hardware, 10ops-codfw: decommission ores{2001..2009}.codfw.wmnet - https://phabricator.wikimedia.org/T348462 (10klausman)
[10:46:45] Done
[10:47:06] ack thanks
[10:47:08] * elukey lunch
[10:47:26] 10Machine-Learning-Team, 10SRE, 10decommission-hardware, 10ops-eqiad: decommission ores{1001..1009}.eqiad.wmnet - https://phabricator.wikimedia.org/T348144 (10klausman)
[10:51:11] I've come across this: https://github.com/wikimedia/operations-deployment-charts/blob/9da9b1874ea44363ef1a8a2979bed473f2129487/helmfile.d/admin_ng/values/ml-serve.yaml#L370-L382
[10:51:11] It looks like they were added to resolve timeouts similar to what we are experiencing with the rec-api: https://github.com/wikimedia/operations-deployment-charts/commit/14d5aa89602a4c1b2c907cb734fb610c49d6922d
[10:51:11] Would adding hosts to the mesh resolve our issue vs adding envoy listeners?
[11:00:34] Morning all
[11:01:07] hi Chris o/
[11:13:10] I need coffee so bad
[11:14:41] o/ hey
[11:29:48] Anything I can help with?
[11:34:25] hi Chris!
[11:35:00] * aiko lunch+coffee
[11:47:28] chrisalbon: I think Kevin and I got it covered
[11:48:48] Thanks Klausman and Kevinbazira for working on that.
[11:51:53] as for the serviceops side: they're fine with us sending them the traffic.
[12:47:44] kevinbazira: very good point! So for isvcs we use istio/envoy, but they are set up in a way that we don't need to specify the localhost:port endpoint, since they are (sort of) transparent proxies/sidecars
[12:48:18] kevinbazira: for ores-legacy and rec-api-ng we use the serviceops template, which uses istio/envoy in a different way (with an explicit proxy, namely something that your code needs to be aware of)
[12:48:47] very interesting: https://grafana-rw.wikimedia.org/d/n3LJdTGIk/kserve-inference-services?forceLogin&from=now-6h&orgId=1&to=now&var-cluster=codfw%20prometheus%2Fk8s-mlstaging&var-component=All&var-namespace=revscoring-editquality-goodfaith
[12:49:09] this is only available in staging at the moment, for goodfaith; I am going to roll it out everywhere :)
[12:49:25] there are also some python-gc metrics IIUC
[12:49:45] sweeet
[12:57:54] elukey: neat!
[12:58:49] elukey what am I looking at? Is this preprocessing the features or a cache?
[13:01:29] the latency for the preprocess and predict steps in the inference service(s)
[13:02:22] updated the dashboard with RSS, CPU time, GC, etc..
[13:02:30] (you can refresh it)
[13:02:57] chrisalbon: we are now collecting metrics from kserve related to the preprocess and predict python methods (so before cache etc..)
[13:03:10] so we will have an idea of where we spend the time
[13:03:21] ah got it thanks
[13:03:29] also good morning :)
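Conceptually the new numbers boil down to histogram timers around the two kserve methods. A rough sketch of the idea (illustrative, not kserve's actual instrumentation; both helpers below are stand-ins):

```python
# Time preprocess() and predict() separately so the dashboard can show
# where a request actually spends its time.
import time
from prometheus_client import Histogram

PREPROCESS_SECONDS = Histogram("preprocess_seconds", "preprocess() latency")
PREDICT_SECONDS = Histogram("predict_seconds", "predict() latency")

def extract_features(payload):
    time.sleep(0.05)  # stand-in for feature extraction (API calls etc.)
    return [float(len(str(payload)))]

def run_model(features):
    time.sleep(0.005)  # stand-in for the (fast) model scoring step
    return sum(features)

def score(payload):
    with PREPROCESS_SECONDS.time():  # Histogram.time() works as a context manager
        features = extract_features(payload)
    with PREDICT_SECONDS.time():
        return run_model(features)
```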
[13:09:07] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/964899
[13:09:14] ready to roll it out everywhere :)
[13:09:44] Ship It!
[13:10:18] danke
[13:10:51] nice!
[13:10:57] And good afternoon!
[13:16:42] 10Machine-Learning-Team, 10SRE, 10decommission-hardware, 10ops-codfw: decommission ores{2001..2009}.codfw.wmnet - https://phabricator.wikimedia.org/T348462 (10Papaul) a:03Jhancock.wm
[13:17:26] rolling out the new metrics in staging
[13:35:25] isaranto: very interesting data for goodfaith in prod - https://grafana-rw.wikimedia.org/d/n3LJdTGIk/kserve-inference-services
[13:35:49] we kinda knew it, but preprocess is really the bulk of the time
[13:36:11] so maybe we could think about offloading only that part to a process
[13:36:13] and not predict
[13:40:02] That fits my mental model. These are small xgboost models, they should be blazing fast, but they aren't.
[13:40:23] So if it isn't the model, it has to be the preprocessing
[13:40:32] * elukey nods
[13:41:06] the last time we tried to offload both predict and preprocess to a process, we saw a big price in terms of latency for some rev-ids
[13:41:27] I guess that serializing/deserializing features to and from processes running predict() is not worth it
[13:41:37] but it may be worth it only for features
[13:42:23] 10Machine-Learning-Team: Test the kserve batcher for Revert Risk multilingual isvc - https://phabricator.wikimedia.org/T348536 (10achou)
[13:49:54] damaging seems to be way worse than goodfaith
[13:52:21] ack
[13:53:13] will try using processes in preprocess :P
[13:53:19] process process process
[13:53:44] ahahahha
[13:54:03] https://imgflip.com/memegenerator/Inception :D
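One shape the "processes only in preprocess" idea could take, as a sketch under the assumptions above (only the small feature vector crosses the pickle boundary, while predict() stays in the main process; all names are illustrative):

```python
# Push only the CPU-heavy feature extraction into a worker process; the
# tiny feature list is cheap to serialize, unlike offloading predict() too.
import asyncio
from concurrent.futures import ProcessPoolExecutor

POOL = ProcessPoolExecutor(max_workers=2)

def extract_features(rev_id):
    # revscoring-style feature extraction would live here
    return [float(rev_id % 97), float(rev_id % 7)]

async def preprocess(rev_id):
    loop = asyncio.get_running_loop()
    # keeps the event loop free while the worker grinds through features
    return await loop.run_in_executor(POOL, extract_features, rev_id)

async def predict(features):
    return {"score": sum(features)}  # stand-in for the xgboost call
```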
[13:54:50] klausman: with the new k8s alarms I don't see a ton of latency alerts anymore for the k8s control plane
[13:54:55] fingers crossed
[13:55:03] ahhaha just said it, alarm fired
[13:55:12] Classic :)
[13:55:31] but it makes sense
[13:55:38] https://grafana.wikimedia.org/d/ddNd-sLnk/kubernetes-api-details?var-site=eqiad&var-cluster=k8s-mlserve&var-latency_percentile=0.95&var-verb=PATCH&orgId=1
[13:55:49] there is a sustained latency for isvcs
[13:56:03] maybe we could think about adding more resources to the control plane
[13:57:26] You mean the VMs? atm they have 2 cpus, and msc1001 shows a load of 3/4
[13:57:32] (as in 0.75)
[13:58:57] the vms yes
[13:59:56] the RAM is definitely used, and I am not sure if the goroutines usage is reflected by the load
[14:00:36] https://grafana.wikimedia.org/d/000000342/node-exporter-server-metrics?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-node=ml-serve-ctrl1001:9100&var-disk_device=All&var-net_dev=All looks relatively tame. But switching to 4 cores might be a good idea.
[14:01:47] memory usage is slightly more than half the available RAM, so I doubt that is an actual problem. The control plane should not really need large amounts of page cache.
[14:02:00] in theory :)
[14:18:30] 10Machine-Learning-Team, 10Patch-For-Review: Investigate recommendation-api-ng internal endpoint failure - https://phabricator.wikimedia.org/T347475 (10Isaac) > After a closer look I think that we are already using a thread pool: > https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.d...
[14:43:29] 10Machine-Learning-Team, 10SRE, 10decommission-hardware, 10ops-codfw: decommission ores{2001..2009}.codfw.wmnet - https://phabricator.wikimedia.org/T348462 (10Jhancock.wm) 05Open→03Resolved
[14:53:56] 10Lift-Wing, 10Machine-Learning-Team: kserve CORS error - https://phabricator.wikimedia.org/T348511 (10calbon) a:03isarantopoulos
[14:58:09] 10Machine-Learning-Team, 10MediaWiki-extensions-ORES: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298 (10klausman)
[15:05:51] 10Machine-Learning-Team, 10ORES: ORES extremely slow when to return when asking for multiple scores. - https://phabricator.wikimedia.org/T347612 (10calbon) a:03isarantopoulos
[15:06:53] 10Machine-Learning-Team, 10Patch-For-Review: Upgrade Revert Risk Multilingual docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347551 (10calbon) a:03elukey
[15:26:58] so I rolled out the metrics change in all isvcs
[15:27:14] except drafttopic eqiad, which for some reason ends up in a helmfile error
[15:27:23] Error: query: failed to query with labels: proto: Unknown: illegal tag 0 (wire type 0)
[15:27:26] codfw is all good
[15:27:49] aiko: https://grafana.wikimedia.org/d/n3LJdTGIk/kserve-inference-services?orgId=1&var-cluster=eqiad%20prometheus%2Fk8s-mlserve&var-component=All&var-namespace=revertrisk
[15:27:52] looks nice :)
[15:28:41] 10Machine-Learning-Team: Visualize KServe latency metrics in a dashboard - https://phabricator.wikimedia.org/T348456 (10elukey) Created the first dashboard: https://grafana.wikimedia.org/d/n3LJdTGIk/kserve-inference-services
[15:31:52] elukey: wow, nice dashboard and numbers :)
[15:32:24] elukey: <3
[15:32:35] thanks!
[15:34:45] 10Machine-Learning-Team: Upgrade outlink docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347549 (10achou) a:03achou
[15:36:19] (03CR) 10AikoChou: [C: 03+1] revert-risk: upgrade to KServe 0.11.1 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964559 (https://phabricator.wikimedia.org/T347550) (owner: 10Elukey)
[15:40:52] if there isn't any objection to this patch I plan to merge it tomorrow https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/963367
[15:46:42] (03CR) 10Ilias Sarantopoulos: revscoring: customize kserve logs (031 comment) [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964568 (https://phabricator.wikimedia.org/T333804) (owner: 10Ilias Sarantopoulos)
[15:57:44] isaranto: didn't have time to fully review; if Aiko already did it, go ahead!
[15:58:29] sry, I don't want to rush things, just that it enables local runs and allows debugging other things easily (like I did with mp and logging)
[15:58:55] so it doesn't have actual changes BUT moves things around (which may cause issues sometimes ofc)
[15:59:29] I think we can start adding better unit tests now also for the model servers :)
[16:00:01] isaranto: definitely, if you want I can review it tomorrow morning
[16:00:06] or you can proceed, as you wish
[16:00:58] tomorrow is fine, even thursday
[16:03:17] (03PS3) 10Ilias Sarantopoulos: revscoring: customize kserve logs [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964568 (https://phabricator.wikimedia.org/T333804)
[16:03:49] (03CR) 10Ilias Sarantopoulos: [C: 03+1] revert-risk: upgrade to KServe 0.11.1 [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964559 (https://phabricator.wikimedia.org/T347550) (owner: 10Elukey)
[16:07:25] (03PS7) 10Ilias Sarantopoulos: revscoring: allow local runs [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/963367 (https://phabricator.wikimedia.org/T347404)
[16:08:22] 10Machine-Learning-Team, 10Epic, 10Patch-For-Review: Add meaningful access logs to KServe's pods - https://phabricator.wikimedia.org/T333804 (10isarantopoulos) Since asgi-logger can only be used if we specify the `access_log_format`, we defined the environment variable `LOGGING_FORMAT` in the above patch to a...
:)" [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964559 (https://phabricator.wikimedia.org/T347550) (owner: 10Elukey) [16:24:31] * elukey afk! [16:24:38] have a good rest of the day folks :) [16:25:29] bye luca! [16:29:03] ciao and enjoy the evening [16:32:40] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team, 10User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (10Sgs) a:05kostajh→03Sgs I ran this script for adding the link-recommendation task type and populating the excluded sections entries: `lang=bash PHA... [16:38:51] 10Machine-Learning-Team, 10Add-Link, 10Growth-Team, 10Patch-For-Review, 10User-notice: Deploy "add a link" to 15th round of wikis - https://phabricator.wikimedia.org/T308141 (10Sgs) [17:31:37] (03PS1) 10Ilias Sarantopoulos: revscoring: bump scipy version [machinelearning/liftwing/inference-services] - 10https://gerrit.wikimedia.org/r/964954 [17:32:30] aiko: I managed to fix the issue we had with articlequality and apple silicon --^ [17:32:36] hope it indeed works [17:39:37] isaranto: wow nice, thank you <3 I'll let you know if it works on my end [18:16:46] (03PS1) 10Varnent: Update link to privacy policy. [services/ores] - 10https://gerrit.wikimedia.org/r/964960 (https://phabricator.wikimedia.org/T331680) [19:43:02] (03PS1) 10Ladsgroup: Migrate away from LB/LBF to ICP [extensions/ORES] - 10https://gerrit.wikimedia.org/r/964969 (https://phabricator.wikimedia.org/T330641) [19:44:59] (03CR) 10CI reject: [V: 04-1] Migrate away from LB/LBF to ICP [extensions/ORES] - 10https://gerrit.wikimedia.org/r/964969 (https://phabricator.wikimedia.org/T330641) (owner: 10Ladsgroup) [19:46:07] (03PS2) 10Ladsgroup: Migrate away from LB/LBF to ICP [extensions/ORES] - 10https://gerrit.wikimedia.org/r/964969 (https://phabricator.wikimedia.org/T330641) [21:41:03] (03CR) 10Jforrester: [C: 03+2] Migrate away from LB/LBF to ICP [extensions/ORES] - 10https://gerrit.wikimedia.org/r/964969 (https://phabricator.wikimedia.org/T330641) (owner: 10Ladsgroup) [21:54:00] (03Merged) 10jenkins-bot: Migrate away from LB/LBF to ICP [extensions/ORES] - 10https://gerrit.wikimedia.org/r/964969 (https://phabricator.wikimedia.org/T330641) (owner: 10Ladsgroup)