[05:55:57] artificial-intelligence, VisualEditor, VisualEditor-MediaWiki-References: generate citation error should auto-consult AI, or notify devs, so they can try to improve it - https://phabricator.wikimedia.org/T341420 (ThurnerRupert) @Dalba created citer: https://citer.toolforge.org/ which is cool. it gen...
[08:00:02] Machine-Learning-Team, DC-Ops: hw troubleshooting: iDrac stuck for ores2003.codfw.wmnet - https://phabricator.wikimedia.org/T341657 (klausman)
[08:00:08] elukey: fyi ^^^
[08:00:11] also morning :)
[08:04:16] klausman: nice! morning :)
[08:15:19] o/ heat is here!
[08:16:50] (CR) Hashar: Set up production and test images for the recommendation-api migration (2 comments) [research/recommendation-api] - https://gerrit.wikimedia.org/r/932810 (https://phabricator.wikimedia.org/T339890) (owner: Kevin Bazira)
[08:19:54] in here too :)
[08:38:31] if you don't think about it and don't discuss it, it gets better, like all problems in life 😛
[08:38:48] I have a patch if any volunteer wants to review https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/937438
[08:50:03] +1ed
[08:50:19] isaranto: the local envoy proxy on mw nodes should work fine
[08:50:50] see https://phabricator.wikimedia.org/P49547
[08:51:08] so the extension can use it now
[08:53:24] Machine-Learning-Team, DC-Ops: hw troubleshooting: iDrac stuck for ores2003.codfw.wmnet - https://phabricator.wikimedia.org/T341657 (klausman) Open→Resolved a:klausman The iDrac is reachable again, so it likely was a different issue.
[09:02:52] Ok thanks!
[09:18:26] elukey: so for the envoy proxy this means I can replace inference.discovery with localhost:6031 on anything that runs in production cluster?
[09:27:21] isaranto: correct yes
[09:32:01] Ack
[09:52:33] * klausman early lunch
[10:28:02] (PS1) Ladsgroup: fix: add request headers properly [extensions/ORES] (wmf/1.41.0-wmf.17) - https://gerrit.wikimedia.org/r/937122 (https://phabricator.wikimedia.org/T319170)
[10:31:32] (CR) Ladsgroup: "And I think we need a check somewhere to make sure the config being null wouldn't lead to explosions" [extensions/ORES] - https://gerrit.wikimedia.org/r/937142 (https://phabricator.wikimedia.org/T319170) (owner: Ilias Sarantopoulos)
[10:32:06] Amir1: I deployed draftquality and articlequality as well so lets have some hope once more :)
[10:32:39] Awesome. I'm about to backport the fix
[10:32:55] in an hour or so, will see
[10:36:55] Machine-Learning-Team, Research: Index out of range in revert risk multi-lingual - https://phabricator.wikimedia.org/T340811 (Iluvatar) More logs (161) if you need (30.06-03.07): ` [2023-06-29T18:21:09.472]: {error: 'IndexError : list index out of range'} | enwiki: 1162529317 [2023-06-29T19:08:53.805]:...
[10:41:22] (CR) Ladsgroup: [C: +2] fix: add request headers properly [extensions/ORES] (wmf/1.41.0-wmf.17) - https://gerrit.wikimedia.org/r/937122 (https://phabricator.wikimedia.org/T319170) (owner: Ladsgroup)
[10:42:07] thanks!
[10:43:19] (Merged) jenkins-bot: fix: add request headers properly [extensions/ORES] (wmf/1.41.0-wmf.17) - https://gerrit.wikimedia.org/r/937122 (https://phabricator.wikimedia.org/T319170) (owner: Ladsgroup)
[10:51:55] thanks Amir1 !
[10:53:48] so it's mostly fixed. One last thing
[10:54:17] https://phabricator.wikimedia.org/P49556
[10:54:44] (couldn't check if everything is okay so I went with wmf-nda visibility)
[10:55:45] that's only for ores_models table though, so it should be rather rare
[10:56:28] yup, re-running it works just fine
[11:01:11] I dont have access to the link you pasted
[11:02:14] finally it works 🎉 thanks for deploying!
[11:02:32] wooowww
[11:03:21] isaranto: try again, I added you to wmf-nda
[11:07:22] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) There was an issue and headers were not set correctly so requests to th...
[11:07:24] I can see now
[11:08:49] I don't know how big of a problem it is, but worth noting
[11:10:13] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (Ladsgroup) Note that the liftwing backend makes a req per model meaning it'll be make p...
[11:12:28] doesnt seem to be a problem, but I will check if I can fix it anyway
[11:14:19] * isaranto goes for lunch
[11:58:11] * elukey lunch
[12:18:55] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) I added the envoy proxy for Lift Wing. At the moment all wikis have at...
[12:22:07] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) Before deploying the extension we need to make sure that we have deploy...
[12:23:50] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (Ladsgroup) >>! In T319170#9008593, @isarantopoulos wrote: > I added the envoy proxy for...
[13:05:31] Machine-Learning-Team, DC-Ops: hw troubleshooting: iDrac stuck for ores2003.codfw.wmnet - https://phabricator.wikimedia.org/T341657 (klausman) Resolved→Open Correction: the iDrac is still down. Note to self: the AM UI hides acknowledged alerts.
[13:32:20] Machine-Learning-Team, MediaWiki-extensions-ORES, MW-1.41-notes (1.41.0-wmf.17; 2023-07-11), Patch-For-Review: Move backend of ORES MediaWiki extension to Lift Wing - https://phabricator.wikimedia.org/T319170 (isarantopoulos) The following models are available in ORES but not in Lift Wing: ` arti...
[14:10:57] Machine-Learning-Team: proposed: observability - https://phabricator.wikimedia.org/T341693 (calbon)
[14:12:50] Machine-Learning-Team: proposed: inference batching, swagger ui to the services - https://phabricator.wikimedia.org/T341694 (calbon)
[14:13:49] Machine-Learning-Team: proposed: host an llm - https://phabricator.wikimedia.org/T341695 (calbon)
[14:14:06] Machine-Learning-Team: proposed: completely done with ORES - https://phabricator.wikimedia.org/T341696 (calbon)
[14:14:47] Machine-Learning-Team: proposed: carry over Lift Wing MVP improvements - https://phabricator.wikimedia.org/T341697 (calbon)
[14:15:56] Machine-Learning-Team: proposed: Zero Traffic On Bare Metal ORES - https://phabricator.wikimedia.org/T341696 (calbon)
[14:16:12] Machine-Learning-Team: Proposed Goal: Zero Traffic On Bare Metal ORES - https://phabricator.wikimedia.org/T341696 (calbon)
[14:16:40] Machine-Learning-Team: Proposed: revert risk reliability - https://phabricator.wikimedia.org/T341698 (calbon)
[14:17:00] Machine-Learning-Team: Proposed Goal: Zero traffic on bare metal ORES servers - https://phabricator.wikimedia.org/T341696 (calbon)
[14:20:25] Machine-Learning-Team: Goal: Define GPUs we will buy - https://phabricator.wikimedia.org/T341699 (calbon)
[14:23:22] Machine-Learning-Team, DC-Ops: hw troubleshooting: iDrac stuck for ores2003.codfw.wmnet - https://phabricator.wikimedia.org/T341657 (klausman) a:klausman→None
[14:25:47] Machine-Learning-Team: Proposed Goal: One SLO for every important service - https://phabricator.wikimedia.org/T341693 (calbon)
[14:25:58] Machine-Learning-Team: Proposed Goal: Defined and measured SLO for every production service - https://phabricator.wikimedia.org/T341693 (calbon)
[14:29:05] Machine-Learning-Team: Proposed Goal: Defined and orders the GPUs for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:30:49] Machine-Learning-Team: Proposed Goal: Swagger UI implemented for every production inference service - https://phabricator.wikimedia.org/T341701 (calbon)
[14:31:39] Machine-Learning-Team: Proposed Goal: Defined and ordered the GPUs for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:33:16] Machine-Learning-Team: Proposed Goal: Inference batching is tested to our satisfaction - https://phabricator.wikimedia.org/T341702 (calbon)
[14:34:52] Machine-Learning-Team: Proposed Goal: Hosting a production version of an LLM - https://phabricator.wikimedia.org/T341695 (calbon)
[14:35:59] Machine-Learning-Team: Proposed Goal: Hosting a production ready version of an LLM - https://phabricator.wikimedia.org/T341695 (calbon)
[14:39:02] Machine-Learning-Team: Proposed Lift Wing announced at MVP to the public - https://phabricator.wikimedia.org/T341703 (calbon)
[14:39:12] Machine-Learning-Team: Proposed Goal: Lift Wing announced at MVP to the public - https://phabricator.wikimedia.org/T341703 (calbon)
[14:39:56] Machine-Learning-Team: Proposed Goal: Content Recommendation API migration completed - https://phabricator.wikimedia.org/T341704 (calbon)
[14:46:38] Machine-Learning-Team: Proposed Goal: Defined GPUs to purchase and order 2 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:46:55] Machine-Learning-Team: Proposed Goal: Order 2 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:49:50] Machine-Learning-Team: Proposed Goal: Order and install 2 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:53:29] Machine-Learning-Team: Proposed Goal: Order 2-4 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:53:43] Machine-Learning-Team: Stretch Goal: Swagger UI implemented for every production inference service - https://phabricator.wikimedia.org/T341701 (calbon)
[14:53:59] Machine-Learning-Team: Stretch Goal: Inference batching is tested to our satisfaction - https://phabricator.wikimedia.org/T341702 (calbon)
[14:54:06] Machine-Learning-Team: Stretch Goal: Hosting a production ready version of an LLM - https://phabricator.wikimedia.org/T341695 (calbon)
[14:54:17] Machine-Learning-Team: Goal: Order 2-4 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[14:54:24] Machine-Learning-Team: Goal: Content Recommendation API migration completed - https://phabricator.wikimedia.org/T341704 (calbon)
[14:54:30] Machine-Learning-Team: Goal: Defined and measured SLO for every production service - https://phabricator.wikimedia.org/T341693 (calbon)
[14:54:35] Machine-Learning-Team: Goal: Zero traffic on bare metal ORES servers - https://phabricator.wikimedia.org/T341696 (calbon)
[14:54:41] Machine-Learning-Team: Goal: Lift Wing announced at MVP to the public - https://phabricator.wikimedia.org/T341703 (calbon)
[14:55:39] Machine-Learning-Team: Goal: Support WME migration to Lift Wing - https://phabricator.wikimedia.org/T341698 (calbon)
[15:03:48] Machine-Learning-Team, Epic: Goal: Lift Wing announced at MVP to the public - https://phabricator.wikimedia.org/T341703 (calbon)
[15:04:25] Machine-Learning-Team: proposed: carry over Lift Wing MVP improvements - https://phabricator.wikimedia.org/T341697 (calbon) Open→Invalid
[15:04:37] Machine-Learning-Team: proposed: inference batching, swagger ui to the services - https://phabricator.wikimedia.org/T341694 (calbon) Open→Invalid
[15:06:12] Machine-Learning-Team, Epic: Goal: Zero traffic on bare metal ORES servers - https://phabricator.wikimedia.org/T341696 (calbon)
[15:06:19] Machine-Learning-Team, Epic: Goal: Defined and measured SLO for every production service - https://phabricator.wikimedia.org/T341693 (calbon)
[15:06:27] Machine-Learning-Team, Epic: Goal: Content Recommendation API migration completed - https://phabricator.wikimedia.org/T341704 (calbon)
[15:06:48] Machine-Learning-Team, Epic: Goal: Support WME migration to Lift Wing - https://phabricator.wikimedia.org/T341698 (calbon)
[15:06:56] Machine-Learning-Team, Epic: Goal: Order 2-4 GPU for Lift Wing - https://phabricator.wikimedia.org/T341699 (calbon)
[15:07:07] Machine-Learning-Team, Epic: Stretch Goal: Swagger UI implemented for every production inference service - https://phabricator.wikimedia.org/T341701 (calbon)
[15:07:24] Machine-Learning-Team, Epic: Stretch Goal: Inference batching is tested to our satisfaction - https://phabricator.wikimedia.org/T341702 (calbon)
[15:07:32] Machine-Learning-Team, Epic: Stretch Goal: Hosting a production ready version of an LLM - https://phabricator.wikimedia.org/T341695 (calbon)
[15:12:58] Machine-Learning-Team, Patch-For-Review: Host open source LLM (bloom, etc.) on Lift Wing - https://phabricator.wikimedia.org/T333861 (calbon) Open→Declined
[16:01:37] Machine-Learning-Team: [ores-legacy] add message that v1 support for ORES has been dropped - https://phabricator.wikimedia.org/T341486 (isarantopoulos)
[16:02:02] Machine-Learning-Team, Epic: Goal: Order 2-4 GPU for Lift Wing and Statbox - https://phabricator.wikimedia.org/T341699 (calbon)
[16:02:40] Machine-Learning-Team: [ores-legacy] add message that v1 support for ORES has been dropped - https://phabricator.wikimedia.org/T341486 (isarantopoulos) a:isarantopoulos
[16:08:58] * elukey afk!
[16:09:00] o/
[16:41:33] (PS1) Ilias Sarantopoulos: ores-legacy: increase logging level to DEBUG [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/937506
[16:42:17] I'm going afk as well. cu tomorrow folks!
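
Note on the 09:18 to 09:27 envoy proxy exchange: the swap discussed there (replace inference.discovery with localhost:6031 for callers in the production cluster) could look roughly like the Python sketch below. Only `inference.discovery` and `localhost:6031` come from the log. The helper name `via_local_envoy`, the example URL with its port and path, and the choice of plain HTTP towards localhost are assumptions made for illustration; this is not the ORES extension's actual code.

```python
from urllib.parse import urlsplit, urlunsplit

# From the log: the discovery host prefix and the node-local envoy listener.
DISCOVERY_HOST_PREFIX = "inference.discovery"
LOCAL_PROXY_NETLOC = "localhost:6031"


def via_local_envoy(url: str) -> str:
    """Rewrite a discovery URL so it goes through the local envoy proxy.

    Hypothetical helper: URLs whose host does not start with the
    discovery prefix are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.startswith(DISCOVERY_HOST_PREFIX):
        return url
    # Assumption: the local listener is reached over plain HTTP and the
    # proxy takes care of the upstream connection.
    return urlunsplit(
        ("http", LOCAL_PROXY_NETLOC, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Example URL is made up for the demo; host suffix, port and path are assumptions.
    print(via_local_envoy(
        "https://inference.discovery.wmnet:30443/v1/models/enwiki-goodfaith:predict"
    ))
```

Keeping the path and query untouched means only the connection target changes. Whatever request headers the backend expects (see the "add request headers properly" patch in the log) would still have to be set by the caller.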