[06:58:56] <isaranto>	 o/ good morning!
[07:09:17] <kevinbazira>	 o/ kalimera
[07:09:29] <kevinbazira>	 thanks for the review, Ilias!
[07:09:55] <isaranto>	 o/ kevin
[07:10:01] <kevinbazira>	 I am going to deploy the model-servers that rely on the updated events module one-by-one
[07:10:12] <isaranto>	 np I have a patch for you as well https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1131323
[07:11:02] <isaranto>	 I'll submit another one for the api gateway afterwards but for that one we'll need Tobias to deploy
[07:22:16] <kevinbazira>	 right! I've +1'd the patch. 
[07:22:17] <kevinbazira>	 are there tests currently running on the edit-check endpoint? if so, will both the `edit-check-staging` patch and the APIGW one be deployed at the same time?
[07:29:23] <isaranto>	 I'll deploy the change for the service now and later today we can deploy the one I just opened for API GW https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1132534
[07:29:39] <isaranto>	 + I'm opening one now to fix the load tests to match the staging name
[07:33:07] <wikibugs>	 (PS1) Ilias Sarantopoulos: locust: fix model name for edit check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1132535 (https://phabricator.wikimedia.org/T388817)
[07:33:10] <isaranto>	 done!
[07:36:09] <wikibugs>	 (CR) Kevin Bazira: [C:+1] locust: fix model name for edit check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1132535 (https://phabricator.wikimedia.org/T388817) (owner: Ilias Sarantopoulos)
[07:38:27] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] locust: fix model name for edit check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1132535 (https://phabricator.wikimedia.org/T388817) (owner: Ilias Sarantopoulos)
[07:49:24] <kevinbazira>	 article-country deployed. outlink predictor next: https://gerrit.wikimedia.org/r/1132537
[08:13:30] <isaranto>	 I've +1'd it. shall we also update the transformer image to have an up-to-date deployment?
[08:44:01] <kevinbazira>	 sure sure ... I've updated the patch with the transformer image too
[08:52:08] <isaranto>	 thanks!
[08:53:18] <wikibugs>	 (CR) Ilias Sarantopoulos: [V:+2 C:+2] locust: fix model name for edit check [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1132535 (https://phabricator.wikimedia.org/T388817) (owner: Ilias Sarantopoulos)
[08:53:40] <wikibugs>	 (PS14) Gkyziridis: inference-services: edit-check GPU version for batch prediction. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100)
[10:19:59] * isaranto lunch!
[10:20:33] <wikibugs>	 Lift-Wing, Machine-Learning-Team, Patch-For-Review: LiftWing model-servers log improper JSON in stderr - https://phabricator.wikimedia.org/T389768#10693087 (kevinbazira)
[10:24:42] <klausman>	 ditto :)
[10:26:42] <kevinbazira>	 outlink deployed. will deploy RRLA once the event stream is in prod.
[11:39:33] <wikibugs>	 (PS15) Ilias Sarantopoulos: inference-services: edit-check GPU version for batch prediction. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[11:41:20] <wikibugs>	 (CR) Ilias Sarantopoulos: "Resolving the previous comments as all have been implemented" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[11:44:03] <wikibugs>	 (PS16) Ilias Sarantopoulos: inference-services: edit-check GPU version for batch prediction. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[11:44:28] <isaranto>	 aiko: the above patch is now ready for review. I have tested it as well locally
[12:10:12] <wikibugs>	 (PS17) Ilias Sarantopoulos: edit-check: implement for batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[12:10:47] <isaranto>	 klausman: let me know if you can deploy the api gw patch sometime today https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1132534
[12:10:49] <isaranto>	 thanks!
[12:12:46] <klausman>	 yeah, I was about to do that :)
[12:13:18] <aiko>	 isaranto: alright! I'll review it 
[12:13:37] <isaranto>	 great, thank you both!
[12:14:55] <isaranto>	 I'm following up on an alert we got on Saturday for reference-need and I am seeing this chart for a pod that worries me https://grafana.wikimedia.org/goto/4yxH_8THR?orgId=1
[12:15:35] <isaranto>	 memory usage is increasing which likely indicates that there is a memory leak. this seems consistent in all pods
[12:16:04] <klausman>	 It seems it did something similar before (go to "2 days") yesterday 9am-noon
[12:16:49] <isaranto>	 I increased memory limits/requests on Saturday as I saw the same thing happening
[12:18:08] <klausman>	 Think it might be a memory leak?
[12:18:59] <isaranto>	 this would be my guess. Something we missed when adding multiprocessing to the service
[12:26:49] <isaranto>	 my assumption is that the process pool isn't managed properly and a process that has died isn't shut down properly so it still occupies memory - which means that we load the model once more in the new process that is spawned
[12:27:03] <isaranto>	 taking a quick look and opening up a task
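The hypothesis above (a broken process pool that is replaced without being shut down, so the model gets loaded again while the old resources linger) can be illustrated with a minimal sketch. This is not the reference-need service's code: the `ResilientPool` class, its `run`/`close` methods, and the `restarts` counter are hypothetical names, and the "fork" start method is an assumption made only to keep the sketch self-contained on Linux.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


class ResilientPool:
    """Hypothetical wrapper: replaces a broken pool only after shutting it down."""

    def __init__(self, max_workers=2):
        self.max_workers = max_workers
        self.restarts = 0  # how many times the pool had to be rebuilt
        self._pool = self._new_pool()

    def _new_pool(self):
        # Assumption: "fork" start method (Linux); a real service may use another.
        ctx = multiprocessing.get_context("fork")
        return ProcessPoolExecutor(max_workers=self.max_workers, mp_context=ctx)

    def run(self, fn, *args):
        try:
            return self._pool.submit(fn, *args).result()
        except BrokenProcessPool:
            # A worker died abruptly (for example, it was OOM-killed).
            # Shut the broken executor down explicitly before creating a
            # replacement, so its processes and pipes are released instead
            # of lingering next to the new workers.
            self._pool.shutdown(wait=True)
            self._pool = self._new_pool()
            self.restarts += 1
            raise  # let the caller decide whether to retry the request

    def close(self):
        self._pool.shutdown(wait=True)
```

Submitting `os._exit` with a non-zero code is one way to simulate a worker dying: the call raises `BrokenProcessPool`, `restarts` goes up by one, and later calls run on the fresh pool. Re-raising instead of retrying silently is a design choice of this sketch, so that repeated worker deaths stay visible in logs.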
[12:37:27] <wikibugs>	 Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise: Increased latencies in reference-quality models (ref-need) - https://phabricator.wikimedia.org/T387019#10693471 (isarantopoulos) We are no longer getting 500s as before so the stability has improved BUT the overall latency of the service is stil...
[12:56:09] <wikibugs>	 Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise: Increased latencies in reference-quality models (ref-need) - https://phabricator.wikimedia.org/T387019#10693506 (isarantopoulos) There is an increasing memory consumption which ends up in pods getting killed because they get out of memory (OOMKi...
[13:01:57] <klausman>	 isaranto: APIGW change has been pushed everywhere
[13:02:05] <isaranto>	 awesome thank you!
[13:04:24] <wikibugs>	 Lift-Wing, Machine-Learning-Team, EditCheck: Investigate options for providing beta cluster / patchdemo access to liftwing staging - https://phabricator.wikimedia.org/T388269#10693521 (isarantopoulos) **request**: ` curl  https://api.wikimedia.org/service/lw/inference/v1/models/edit-check-staging:pre...
[13:04:40] <wikibugs>	 Lift-Wing, Machine-Learning-Team, EditCheck: Investigate options for providing beta cluster / patchdemo access to liftwing staging - https://phabricator.wikimedia.org/T388269#10693522 (isarantopoulos) Open→Resolved a:isarantopoulos
[14:28:59] <wikibugs>	 (CR) AikoChou: [C:+1] "LGTM! Only a few minor issues." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[14:41:37] <wikibugs>	 (PS18) Ilias Sarantopoulos: edit-check: implement for batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[14:42:18] <isaranto>	 aiko: thanks for the review, I updated it, lemme know if it is ok!
[14:42:21] <wikibugs>	 (CR) CI reject: [V:-1] edit-check: implement for batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[14:42:51] <wikibugs>	 (PS19) Ilias Sarantopoulos: edit-check: implement for batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[14:49:33] <wikibugs>	 (CR) AikoChou: [C:+1] edit-check: implement for batch prediction (3 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[14:57:17] <wikibugs>	 Lift-Wing, Machine-Learning-Team, Wikimedia Enterprise: Increased latencies in reference-quality models (ref-need) - https://phabricator.wikimedia.org/T387019#10694215 (isarantopoulos) I have verified the above by looking at a specific pod:  1. Found some BrokenProcessPool [[ https://logstash.wikimed...
[15:12:39] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] edit-check: implement for batch prediction (2 comments) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:13:12] <wikibugs>	 (CR) Ilias Sarantopoulos: edit-check: implement for batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:13:23] <wikibugs>	 (PS20) Ilias Sarantopoulos: edit-check: implement batch prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:13:37] <wikibugs>	 (PS21) Ilias Sarantopoulos: edit-check: implement batch requests/prediction [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:13:42] <wikibugs>	 (PS22) Ilias Sarantopoulos: edit-check: implement batch requests/predictions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:13:47] <wikibugs>	 (CR) Ilias Sarantopoulos: [C:+2] edit-check: implement batch requests/predictions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:14:38] <isaranto>	 thanks for the review Aiko! I fixed the commit msg and merged!
[15:19:34] <wikibugs>	 (CR) DCausse: "I think this should be ready to go" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821) (owner: DCausse)
[15:19:40] <wikibugs>	 (PS2) DCausse: search weighted_tags: drop BC for rc0 weighted_tag stream [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821)
[15:22:40] <wikibugs>	 (Merged) jenkins-bot: edit-check: implement batch requests/predictions [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1131045 (https://phabricator.wikimedia.org/T386100) (owner: Gkyziridis)
[15:36:13] <wikibugs>	 (CR) Kevin Bazira: "Thank you for working on this, David. LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821) (owner: DCausse)
[15:36:50] <wikibugs>	 (PS3) Kevin Bazira: search weighted_tags: drop BC for rc0 weighted_tag stream [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821) (owner: DCausse)
[15:44:03] <wikibugs>	 Lift-Wing, Machine-Learning-Team, EditCheck: Investigate options for providing beta cluster / patchdemo access to liftwing staging - https://phabricator.wikimedia.org/T388269#10694418 (isarantopoulos) Updated request after batch prediction implementation   ` curl  https://api.wikimedia.org/servic...
[15:50:48] <wikibugs>	 (CR) Kevin Bazira: [C:+2] search weighted_tags: drop BC for rc0 weighted_tag stream [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821) (owner: DCausse)
[16:00:50] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikipedia-Android-App-Backlog: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298#10694499 (Samwalton9-WMF)
[16:01:34] <wikibugs>	 (Merged) jenkins-bot: search weighted_tags: drop BC for rc0 weighted_tag stream [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1130530 (https://phabricator.wikimedia.org/T375821) (owner: DCausse)
[16:04:19] <isaranto>	 going afk folks, have a nice evening/rest of day!
[17:23:06] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikipedia-Android-App-Backlog: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298#10694988 (Kgraessle) Adding the thresholds we arrived at from the analysis that was complete...
[20:36:03] <wikibugs>	 Machine-Learning-Team, MediaWiki-extensions-ORES, Moderator-Tools-Team, Wikipedia-Android-App-Backlog: Add revertrisk-language-agnostic to RecentChanges filters - https://phabricator.wikimedia.org/T348298#10695756 (kostajh) >>! In T348298#10694988, @Kgraessle wrote: > Adding the thresholds we arr...