[00:07:54] (Merged) jenkins-bot: tests: Migrate assertSelect() to SelectQueryBuilder [extensions/ORES] - https://gerrit.wikimedia.org/r/1029283 (owner: Umherirrender)
[06:21:19] Good morning!
[07:52:48] Machine-Learning-Team, Patch-For-Review: Investigate the inconsistent load test results (locust) for revertrisk - https://phabricator.wikimedia.org/T361881#9782574 (isarantopoulos) a:isarantopoulos
[07:52:56] Machine-Learning-Team, Patch-For-Review: Investigate the inconsistent load test results (locust) for revertrisk - https://phabricator.wikimedia.org/T361881#9782575 (isarantopoulos) a:isarantopoulos→None
[08:52:24] (PS13) Jsn.sherman: Exclude first/only revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281)
[08:56:10] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9782634 (kevinbazira) Regarding image sizes, at the moment Wikimedia Commons cannot serve a file larger than 1MB from the UploadStash. I am getting the following er...
[08:59:21] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9782636 (mfossati) >>! In T363506#9780991, @kevinbazira wrote: > If one user sends a request with 50 image URLs and another sends a request with 50 serialized image...
[09:08:23] Machine-Learning-Team, MW-on-K8s, serviceops, SRE, Patch-For-Review: Migrate ml-services to mw-api-int - https://phabricator.wikimedia.org/T362316#9782652 (jijiki)
[09:14:42] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9782659 (mfossati) >>! In T363506#9757394, @isarantopoulos wrote: > We would need the upload wizard to send a resized image (224x224) instead of the whole file. I c...
[09:43:30] (PS1) Kevin Bazira: logo-detection: use cookie to access stash images [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1028937 (https://phabricator.wikimedia.org/T363449)
[10:35:46] * isaranto lunch
[11:32:16] (PS1) Thiemo Kreuz (WMDE): Make all @covers tags in tests absolute [extensions/ORES] - https://gerrit.wikimedia.org/r/1029502
[11:36:39] (PS1) Thiemo Kreuz (WMDE): Inject ConnectionProvider instead of DBLoadBalancerFactory [extensions/ORES] - https://gerrit.wikimedia.org/r/1029503 (https://phabricator.wikimedia.org/T330641)
[11:39:43] (CR) Ilias Sarantopoulos: [C:+2] Exclude first/only revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281) (owner: Jsn.sherman)
[11:40:51] (PS1) Thiemo Kreuz (WMDE): Add missing type declarations to DB-related class properties [extensions/ORES] - https://gerrit.wikimedia.org/r/1029505
[11:42:57] (Merged) jenkins-bot: Exclude first/only revision on page from scoring [extensions/ORES] - https://gerrit.wikimedia.org/r/1014572 (https://phabricator.wikimedia.org/T356281) (owner: Jsn.sherman)
[11:53:27] (CR) Ladsgroup: [C:+2] Inject ConnectionProvider instead of DBLoadBalancerFactory [extensions/ORES] - https://gerrit.wikimedia.org/r/1029503 (https://phabricator.wikimedia.org/T330641) (owner: Thiemo Kreuz (WMDE))
[11:54:02] (PS1) Thiemo Kreuz (WMDE): Replace custom test mocks with trivial value holders [extensions/ORES] - https://gerrit.wikimedia.org/r/1029510
[11:56:29] (PS1) Thiemo Kreuz (WMDE): Use IReadableDatabase interface where possible [extensions/ORES] - https://gerrit.wikimedia.org/r/1029511
[11:56:34] (Merged) jenkins-bot: Inject ConnectionProvider instead of DBLoadBalancerFactory [extensions/ORES] - https://gerrit.wikimedia.org/r/1029503 (https://phabricator.wikimedia.org/T330641) (owner: Thiemo Kreuz (WMDE))
[11:58:30] (CR) CI reject: [V:-1] Use IReadableDatabase interface where possible [extensions/ORES] - https://gerrit.wikimedia.org/r/1029511 (owner: Thiemo Kreuz (WMDE))
[12:06:13] (PS1) Thiemo Kreuz (WMDE): Replace expensive explode/implode with string manipulation [extensions/ORES] - https://gerrit.wikimedia.org/r/1029513
[12:45:24] hello folks!
[12:45:35] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9783171 (isarantopoulos) >>! In T363506#9782659, @mfossati wrote: >>>! In T363506#9757394, @isarantopoulos wrote: >> We would need the upload wizard to send a resiz...
[12:45:44] o/ elukey
[13:12:17] (PS2) Thiemo Kreuz (WMDE): Use correct IReadableDatabase interface in queryCallable callbacks [extensions/ORES] - https://gerrit.wikimedia.org/r/1029511
[13:14:48] (CR) CI reject: [V:-1] Use correct IReadableDatabase interface in queryCallable callbacks [extensions/ORES] - https://gerrit.wikimedia.org/r/1029511 (owner: Thiemo Kreuz (WMDE))
[13:38:03] (CR) Ilias Sarantopoulos: "Accessing an API using a cookie wouldn't be the proper way to access it. Cookies are more suitable for managing sessions and not stateless" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1028937 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[13:41:13] good morning
[13:42:47] so many meetings today
[13:42:58] at least they are all packed into one mega block
[13:43:41] :)
[13:45:20] hey elukey!
[13:47:42] Machine-Learning-Team, Patch-For-Review: Update revertrisk multilingual to kserve 0.12.1 - https://phabricator.wikimedia.org/T363129#9783326 (isarantopoulos) I double checked and revertrisk has already been deployed to staging/prod so this task is done https://gerrit.wikimedia.org/r/c/operations/dep...
[13:48:19] Machine-Learning-Team, Patch-For-Review: Update revertrisk to kserve 0.12.1 - https://phabricator.wikimedia.org/T363127#9783329 (isarantopoulos) I double checked and revertrisk has already been deployed to staging/prod so this task is done https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/...
[13:49:07] Machine-Learning-Team, Patch-For-Review: Update revertrisk multilingual to kserve 0.12.1 - https://phabricator.wikimedia.org/T363129#9783327 (isarantopoulos) Open→Resolved a:isarantopoulos
[13:50:21] good morning Chris!
[13:50:21] Machine-Learning-Team, Patch-For-Review: Update revertrisk wikidata to kserve 0.12.1 - https://phabricator.wikimedia.org/T363130#9783335 (isarantopoulos) I double checked and revertrisk-wikidata has already been deployed to staging/prod so this task is done https://gerrit.wikimedia.org/r/c/operations/de...
[13:50:32] Machine-Learning-Team, Patch-For-Review: Update revertrisk to kserve 0.12.1 - https://phabricator.wikimedia.org/T363127#9783330 (isarantopoulos) Open→Resolved a:isarantopoulos
[13:52:38] Machine-Learning-Team, Patch-For-Review: Update revertrisk wikidata to kserve 0.12.1 - https://phabricator.wikimedia.org/T363130#9783336 (isarantopoulos) Open→Resolved a:isarantopoulos
[13:54:43] Machine-Learning-Team, Patch-For-Review: Unsupported lang error for some wiki for revertrisk-language-agnostic calls - https://phabricator.wikimedia.org/T363203#9783342 (isarantopoulos) This has been deployed to production and can be used via the api gateway. `bash curl https://api.wikimedia.org/service...
[13:57:19] Machine-Learning-Team, Patch-For-Review: Unsupported lang error for some wiki for revertrisk-language-agnostic calls - https://phabricator.wikimedia.org/T363203#9783344 (isarantopoulos) Open→Resolved a:isarantopoulos
[14:16:10] Machine-Learning-Team, Patch-For-Review: Unsupported lang error for some wiki for revertrisk-language-agnostic calls - https://phabricator.wikimedia.org/T363203#9783430 (prabhat) Thank you, @isarantopoulos and team.
[14:25:13] very interesting pattern:
[14:25:13] https://grafana.wikimedia.org/d/-D2KNUEGk/kubernetes-pod-details?orgId=1&var-datasource=codfw%20prometheus%2Fk8s-mlserve&var-namespace=revscoring-editquality-reverted&var-pod=viwiki-reverted-predictor-default-00019-deployment-bcd64fbqdtbq&var-pod=viwiki-reverted-predictor-default-00019-deployment-bcd64fbw6svr&var-pod=viwiki-reverted-predictor-default-00019-deployment-bcd64fbwd25v&var-pod=viwiki-revert
[14:25:19] ed-predictor-default-00019-deployment-bcd64fbxqtnw&var-container=All&from=now-3h&to=now
[14:25:23] sorry link too long :)
[14:25:43] the viwiki reverted pods are now 4, and despite a lot of cpu usage we are not seeing a ton of errors
[14:25:57] so scaling up does help a little
[14:26:30] we still have some background timeout errors though
[14:27:52] and we need four pods for very little traffic
[14:28:05] I'd be curious to know if we could migrate those clients away
[14:29:35] nice they come through ORES legacy service - https://logstash.wikimedia.org/goto/0c6446acea3ec1a54a3e160affb7b4e3
[14:36:45] Machine-Learning-Team, Structured-Data-Backlog: [SPIKE] Resize an image file to 224x224 pixels within Upload Wizard - https://phabricator.wikimedia.org/T364551 (mfossati) NEW
[14:37:13] and these are the requests: https://logstash.wikimedia.org/goto/8083cb33ec10877b99a362c02b28dc04
[14:37:24] the client is using the multi-revids feature
[14:37:42] so every request carries multiple revisions, this is why it hits so heavily the pods
[14:40:19] please lemme know if you follow the debug steps above --^
[14:40:25] otherwise we can check together on them
[14:41:55] the ip address is only one, so single client
[14:45:57] o/
[14:47:02] yes I'm following! thanks for the thorough explanation
[14:48:00] <3
[14:53:05] if we look at the bright side: this means that autoscaling kicks in to serve 5rpsx20 = 100 rps. Cause I was troubled yesterday that we set autoscaling to 5rps
[14:54:18] and this also opens the discussion for batch requests: autoscaling config isn't straightforward when a model server allows batch requests
[14:55:25] > If we look at the bright side: this means that autoscaling kicks in to serve 5rpsx20 = 100 rps. Cause I was troubled yesterday that we set autoscaling to 5rps
[14:55:25] nevermind. --^ this is wrong. it is still 5rps, it just happens to be a lot of requests due to the ores legacy service.
[15:02:48] yep exactly :(
[15:27:49] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9783689 (mfossati) >>! In T363506#9781491, @mfossati wrote: >>>! In T363506#9781301, @isarantopoulos wrote: >> @mfossati We noticed that the user can define the wid...
[15:36:44] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9783712 (mfossati) >>! In T363506#9757394, @isarantopoulos wrote: > @mfossati I am in favor of passing the image object in some serialized form. > We would need th...
[16:04:58] (CR) Elukey: "I agree on the cookie, it seems something prone to error. From the security standpoint, it would also mean forwarding/impersonating a user" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1028937 (https://phabricator.wikimedia.org/T363449) (owner: Kevin Bazira)
[16:09:44] Going afk folks, have a nice rest of day!
[16:14:13] you too!
[16:40:38] (CR) DannyS712: [C:+2] Make all @covers tags in tests absolute [extensions/ORES] - https://gerrit.wikimedia.org/r/1029502 (owner: Thiemo Kreuz (WMDE))
[16:41:39] (PS2) DannyS712: Replace custom test mocks with trivial value holders [extensions/ORES] - https://gerrit.wikimedia.org/r/1029510 (owner: Thiemo Kreuz (WMDE))
[16:41:49] (CR) DannyS712: [C:+2] "fixed a typo in the commit message, hope thats okay" [extensions/ORES] - https://gerrit.wikimedia.org/r/1029510 (owner: Thiemo Kreuz (WMDE))
[16:42:33] (CR) DannyS712: [C:+2] Add missing type declarations to DB-related class properties [extensions/ORES] - https://gerrit.wikimedia.org/r/1029505 (owner: Thiemo Kreuz (WMDE))
[16:44:33] (CR) CI reject: [V:-1] Replace custom test mocks with trivial value holders [extensions/ORES] - https://gerrit.wikimedia.org/r/1029510 (owner: Thiemo Kreuz (WMDE))
[16:44:44] (CR) DannyS712: [C:+2] Replace expensive explode/implode with string manipulation (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/1029513 (owner: Thiemo Kreuz (WMDE))
[16:44:49] Machine-Learning-Team, Structured-Data-Backlog: Pass image objects to the logo detection service - https://phabricator.wikimedia.org/T363506#9783940 (kevinbazira) >>! In T363506#9783710, @mfossati wrote: > I've opened {T364551} to investigate the feasibility of this solution. Thank you for dedicating a t...
[17:03:07] going afk folks!
[17:03:15] Lift-Wing, Machine-Learning-Team, Patch-For-Review: GPU errors in hf image in ml-staging - https://phabricator.wikimedia.org/T362984#9784022 (elukey) On ml-staging2001 I checked the pod's details (via docker inspect) and found: ` "Devices": [ { "PathOn...
[17:07:04] (CR) CI reject: [V:-1] Make all @covers tags in tests absolute [extensions/ORES] - https://gerrit.wikimedia.org/r/1029502 (owner: Thiemo Kreuz (WMDE))
[17:07:13] (CR) CI reject: [V:-1] Add missing type declarations to DB-related class properties [extensions/ORES] - https://gerrit.wikimedia.org/r/1029505 (owner: Thiemo Kreuz (WMDE))
[17:07:14] (CR) CI reject: [V:-1] Replace expensive explode/implode with string manipulation [extensions/ORES] - https://gerrit.wikimedia.org/r/1029513 (owner: Thiemo Kreuz (WMDE))
[19:42:41] (CR) Bartosz Dziewoński: [C:+2] "Retrying after CI breakage was fixed (T364569)" [extensions/ORES] - https://gerrit.wikimedia.org/r/1029513 (owner: Thiemo Kreuz (WMDE))
[19:47:35] (Merged) jenkins-bot: Replace expensive explode/implode with string manipulation [extensions/ORES] - https://gerrit.wikimedia.org/r/1029513 (owner: Thiemo Kreuz (WMDE))