[06:24:57] Machine-Learning-Team, ORES: Help migrate SDZeroBot to Lift Wing - https://phabricator.wikimedia.org/T342960 (isarantopoulos) Hi @SD0001, thanks for merging that PR! I see that all requests coming from SDZeroBot return a 200 response now. At the moment you can't request multiple revision ids in the same...
[06:35:41] Good morning! o/
[06:35:41] SDZeroBot and EyeBot have done the appropriate changes and get 200 responses now (both were getting 400s because of too many revision ids in the requests)
[07:26:16] nice work isaranto!
[07:29:53] (CR) Elukey: [C: +1] ores-legacy: fix 500 issues with wrong wikiId [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961408 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[07:33:23] well, I mostly pinged folks! either way it is nice to see all those errors gone :)
[07:36:28] (CR) Ilias Sarantopoulos: [C: +2] ores-legacy: fix 500 issues with wrong wikiId [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961408 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[07:40:04] (Merged) jenkins-bot: ores-legacy: fix 500 issues with wrong wikiId [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961408 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[07:55:33] (PS1) Ilias Sarantopoulos: ores-legacy: fix pass a string to get_check_models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480)
[07:56:38] oh well, one more patch and one more unit test.
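Since the ores-legacy endpoint currently rejects requests carrying many revision ids (the exact limit is not stated; the task comment above is cut off), a bot can avoid the 400s by sending one request per revision. A minimal sketch, assuming the `/v3/scores/<wiki>/<revid>/<model>` path layout used by the ores.wikimedia.org URLs in this log; `per_revision_urls` is a hypothetical helper and the sketch only builds URLs, it does no network I/O.

```python
from typing import List
from urllib.parse import quote

# ORES-compatible endpoint as it appears in this log.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def per_revision_urls(wiki: str, rev_ids: List[int], model: str) -> List[str]:
    """Build one scoring URL per revision id (hypothetical helper).

    Assumption: the endpoint accepts a single revid per request, so a
    batch like "rev1|rev2|..." is split into separate requests instead.
    """
    urls = []
    for rev_id in rev_ids:
        # /v3/scores/<wiki>/<revid>/<model>, exactly one revision per URL
        urls.append(f"{ORES_BASE}/{quote(wiki)}/{int(rev_id)}/{quote(model)}")
    return urls


if __name__ == "__main__":
    for url in per_revision_urls("enwiki", [12345, 12346], "damaging"):
        print(url)
```

Splitting client-side keeps each request small; the trade-off is more round trips, which a bot can throttle as it sees fit.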
[07:56:39] the patch I supplied solved this https://ores.wikimedia.org/v3/scores/commonswiki/1234/damaging BUT not this
[07:56:39] https://ores.wikimedia.org/v3/scores/enwiki/12345/reverted?features=true
[07:59:03] sry for the back and forth
[08:00:02] (CR) Elukey: ores-legacy: fix pass a string to get_check_models (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[08:03:33] (CR) Ilias Sarantopoulos: ores-legacy: fix pass a string to get_check_models (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[08:31:14] (PS2) Ilias Sarantopoulos: ores-legacy: fix pass a string to get_check_models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480)
[08:31:57] added a docstring --^
[08:38:11] Machine-Learning-Team: Upgrade outlink docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347549 (achou)
[08:38:46] (CR) Elukey: [C: +1] ores-legacy: fix pass a string to get_check_models (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[08:38:54] Machine-Learning-Team: Upgrade outlink docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347549 (achou)
[08:38:56] Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (achou)
[08:42:14] Machine-Learning-Team: Upgrade Revert Risk Language-agnostic docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347550 (achou)
[08:42:42] Machine-Learning-Team: Upgrade Revert Risk Language-agnostic docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347550 (achou)
[08:42:44] Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (achou)
[08:44:56] Machine-Learning-Team: Upgrade Revert Risk Multilingual docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347551 (achou)
[08:45:10] aiko,klausman - o/ what is the status of readability? Are we done or do we still need to add things etc.?
[08:45:25] Machine-Learning-Team: Update to KServe 0.11 - https://phabricator.wikimedia.org/T337213 (achou)
[08:45:27] Machine-Learning-Team: Upgrade Revert Risk Multilingual docker images to KServe 0.11 - https://phabricator.wikimedia.org/T347551 (achou)
[08:47:47] elukey: I think now it is only missing API docs
[08:48:05] and the SLO dashboard right?
[08:48:30] ahhh yes
[08:49:31] I'll work on the API docs today
[08:49:47] super thanks :)
[08:49:59] klausman: when you have a moment can you add the SLO dashboard?
[08:50:03] so we can wrap up the task etc..
[08:50:50] ack
[08:51:08] SLO I'll do in a moment. Readability only needs the API docs updated
[08:51:46] oh, aiko already answered :D
[08:53:13] :D
[08:54:04] Did we agree on specific numbers for readability?
[08:54:20] Or should we just go with the sorta default 95%/5000ms?
[08:54:37] let's go with the defaults
[08:58:17] (CR) Ilias Sarantopoulos: [C: +2] ores-legacy: fix pass a string to get_check_models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[08:59:12] (Merged) jenkins-bot: ores-legacy: fix pass a string to get_check_models [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/961690 (https://phabricator.wikimedia.org/T347480) (owner: Ilias Sarantopoulos)
[09:07:11] Machine-Learning-Team, ORES, Patch-For-Review: [ores-legacy] Model not available should return 404 instead of 500 response code - https://phabricator.wikimedia.org/T347480 (isarantopoulos) All the mentioned 5xx errors have been fixed!
[09:09:44] Machine-Learning-Team, ORES: User-scripts running on Wikipedia can no longer use ORES (CORS issue) - https://phabricator.wikimedia.org/T347344 (Ciell) I can confirm the scripts on Dutch Wikipedia are working again. As far as I can see, same for en-wp.
[09:13:13] elukey: grafana grizzly preview for the Dashboard is at https://grafana.wikimedia.org/dashboard/snapshot/hvy8dubzBWoEAHkx0GygYjn0LQe4eKTb?orgId=1
[09:21:42] Machine-Learning-Team, ORES: User-scripts running on Wikipedia can no longer use ORES (CORS issue) - https://phabricator.wikimedia.org/T347344 (isarantopoulos) The issue has indeed been fixed. We will resolve this task as soon as we apply the permanent fix described by @elukey [[ https://phabricator.wikimedi...
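The readability SLO settled on at 08:54 uses the "default" 95%/5000ms target. Assuming that reads as "at least 95% of requests answered within 5000 ms" (the log does not spell out the SLI), the check reduces to simple arithmetic. A minimal sketch over a list of latency samples; a real SLO is evaluated from dashboard metrics over a time window, and `latency_slo_met` is a hypothetical name.

```python
from typing import Sequence, Tuple


def latency_slo_met(
    latencies_ms: Sequence[float],
    threshold_ms: float = 5000.0,
    target: float = 0.95,
) -> Tuple[bool, float]:
    """Check an SLO of the form 'target fraction of requests within threshold_ms'.

    Returns (met, fraction_within_threshold).
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    # Count requests that finished at or under the latency threshold.
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    fraction = within / len(latencies_ms)
    return fraction >= target, fraction


if __name__ == "__main__":
    # 19 fast requests and 1 slow one: exactly 95% within 5000 ms.
    print(latency_slo_met([100.0] * 19 + [6000.0]))
```

With 20 samples, a single request over 5000 ms still meets the target; a second one drops the fraction to 0.9 and the SLO is missed.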
[09:41:38] isaranto: I finally have a chain of fixes for CORS :D - starting from https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/961379
[09:48:08] \o/
[09:58:46] Machine-Learning-Team, Patch-For-Review, Research (FY2023-24-Research-July-September): Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (achou) API documentation has been added to the API Portal: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_...
[10:11:16] * aiko lunch
[10:26:11] aiko: really nice doc in https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_readability_prediction
[10:26:14] thanks!
[10:26:21] * isaranto luuuunch
[10:37:49] * elukey lunch
[10:41:11] * klausman doc appt and quick lunch after
[12:54:13] merging the long chain of patches for CORS support
[12:54:58] the deploy should be a no-op
[12:56:12] Mornin
[12:56:22] elukey: I am having a hard time figuring out which is the right instance name for the ProbeDown silence
[12:56:37] chrisalbon: o/
[12:56:48] ores.wikimedia.org? ores.discovery.wmnet? ores:443?
[12:56:57] hey chris \o
[12:57:20] klausman: not super easy yes, I've never done it.. in theory right after the merge you'll need to roll restart pybals (coordinating with traffic first) and some alerts may fire
[12:57:28] but nothing will page, I already added the false flag
[12:58:08] Ok. I was going by https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service which says to add a silence first
[12:58:44] good point, never really done it before
[13:00:28] I'll poke #wm-o11y
[13:06:17] Machine-Learning-Team, ORES: Cannot set Api-User-Agent header when making requests to ORES from a user script - CORS - https://phabricator.wikimedia.org/T347214 (elukey) Open→Resolved Applied a permanent fix, thanks for reporting!
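The permanent CORS fix lives in the deployment-charts patches linked at 09:41, and its mechanism is not described in this log. What is standard is the browser side: when a user script sets a non-safelisted header such as Api-User-Agent, the browser first sends an OPTIONS preflight and only sends the real request if the response's Access-Control-Allow-Headers covers that header. A minimal sketch of that server-side decision, with a hypothetical allowlist and function name; it is not the Lift Wing configuration.

```python
from typing import Dict, Optional

# Hypothetical allowlist for illustration only; not the actual ORES/Lift Wing policy.
ALLOWED_HEADERS = frozenset({"api-user-agent", "content-type"})
ALLOWED_METHODS = ("GET", "POST", "OPTIONS")


def preflight_headers(access_control_request_headers: str) -> Optional[Dict[str, str]]:
    """Build CORS headers for an OPTIONS preflight response.

    Returns None if the browser asks for a header outside the allowlist; the
    server would then answer without allow headers and the browser blocks
    the actual request.
    """
    # Header names are case-insensitive, so normalise them to lowercase.
    requested = [
        h.strip().lower()
        for h in access_control_request_headers.split(",")
        if h.strip()
    ]
    if any(h not in ALLOWED_HEADERS for h in requested):
        return None
    headers = {
        # Assumption: no credentials are sent, so a wildcard origin is acceptable.
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
    }
    if requested:
        headers["Access-Control-Allow-Headers"] = ", ".join(requested)
    return headers


if __name__ == "__main__":
    print(preflight_headers("Api-User-Agent"))
```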
[13:06:54] isaranto: cors policy in place (permanently), all use cases seem to work, please when you have a moment double check :)
[13:07:02] Machine-Learning-Team, ORES, Patch-For-Review: User-scripts running on Wikipedia can no longer use ORES (CORS issue) - https://phabricator.wikimedia.org/T347344 (elukey) Open→Resolved Applied the permanent fix, all use cases work afaics! Thanks for reporting :)
[13:08:42] klausman: one qs - from https://wikitech.wikimedia.org/wiki/LVS#Remove_a_load_balanced_service it seems that we should remove the DNS discovery record first
[13:09:04] and the "Remove network probes" may mean the puppet/service.yaml specific bits
[13:09:12] ah, ack.
[13:09:18] worth asking the traffic team
[13:09:22] The DNS change is about to be sent for review
[13:09:30] we can expand the docs with precise info afterwards
[13:09:40] https://gerrit.wikimedia.org/r/c/operations/dns/+/961797 <- DNS change
[13:10:36] Note that the ores DNS entry has a geoip record. I was wondering if we need to add that for o-l as well
[13:12:22] klausman: there is one
[13:12:31] oh, then I just missed it
[13:13:37] klausman: see https://phabricator.wikimedia.org/T299700 as guidance
[13:13:52] I think that the dns discovery record only means the .discovery bit, not the .svc. ones
[13:13:55] at least, not yet
[13:14:22] and indeed setting the status to `state: lvs_setup` is a nice tricky
[13:14:24] *trick
[13:15:14] but we need to ask traffic
[13:15:18] so we can expand the docs
[13:15:38] https://gerrit.wikimedia.org/r/c/operations/puppet/+/961799 is the lvs_setup change
[13:16:57] check what Keith did in the task, at some point he switched to service_setup, I am wondering if it was the right one (instead of lvs_setup)
[13:17:15] ah snap I am an idiot, there are docs below
[13:17:17] uff
[13:17:19] sorry
[13:17:26] I thought we had only the bullet points
[13:17:44] grmbl. messed up my git state
[13:18:07] ah ok lvs_setup first, then service_setup
[13:19:54] https://gerrit.wikimedia.org/r/c/operations/dns/+/961802 fixed DNS change
[13:22:34] klausman: for all the steps you'd need to get the signoff of traffic + their ack to proceed etc..
[13:26:37] ack, I usually send it to you first to minimize the number of people that see completely broken changes :D
[13:28:35] elukey: everything CORS related seems to work! thanks a lot for this fix :)
[13:28:40] \o/
[13:29:37] klausman: ack yes, in this case go ahead without me, I'll be available if needed of course :)
[13:29:49] thx :)
[13:35:41] we have some more 5xx responses coming from requests trying to use a callback function through Javascript. e.g. /v3/scores/enwiki?callback=jsonp_
[13:36:17] I propose we return a 400 error message saying that these types of requests are not supported. wdyt?
[13:39:18] Or just return the standard json response
[13:57:58] nono +1 for a 400
[13:58:03] clear msg
[14:24:24] elukey: Running into this during the second-last step of the LVS turndown (running run-puppet-agent on the ORES backends): https://phabricator.wikimedia.org/P52724
[14:25:16] yeah but it is on ores nodes, we can follow up later, the important bit is LVS nodes
[14:26:30] Those are all done. Will continue with the doc steps (removing service stanza and conftool-data)
[14:30:08] super thanks
[14:30:14] there are also lvs configs in puppet-ores
[14:30:21] I think those are responsible for the above error
[14:30:29] (they should be in hiera)
[14:30:54] hieradata/role/common/ores.yaml:profile::lvs::realserver::pools:
[14:30:54] modules/role/manifests/ores.pp: include ::profile::lvs::realserver
[14:30:57] klausman: --^
[14:31:17] Will add them to the currently open patch, thanks!
[14:31:38] pools is already gone
[14:31:46] those should try to add loopback addresses for the LVS VIPs
[14:37:36] Ok, we're done.
[14:37:56] One (big) step closer to complete decom of ORES hw!
[14:38:06] nice work!
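The 13:35 discussion lands on answering JSONP-style requests (`?callback=jsonp_`) with a clear 400, and T347480 asks for a 404 instead of a 500 when a model is not available. A minimal sketch of that request triage, assuming a hypothetical model table and function name; it is not the actual ores-legacy code, which derives model availability from its own configuration.

```python
from http import HTTPStatus
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical availability table for illustration only.
AVAILABLE_MODELS: Dict[str, FrozenSet[str]] = {
    "enwiki": frozenset({"damaging", "goodfaith"}),
}


def validate_scores_request(
    wiki: str, models: Iterable[str], query: Dict[str, str]
) -> Tuple[HTTPStatus, str]:
    """Classify a /v3/scores request as a 4xx client error or OK before scoring."""
    # JSONP (?callback=...) is rejected up front with an explicit message.
    if "callback" in query:
        return (
            HTTPStatus.BAD_REQUEST,
            "JSONP callbacks are not supported, please request plain JSON.",
        )
    # An unknown wiki is the client's problem (404), not a server error (500).
    if wiki not in AVAILABLE_MODELS:
        return HTTPStatus.NOT_FOUND, f"No models are available for {wiki}."
    missing = sorted(m for m in models if m not in AVAILABLE_MODELS[wiki])
    if missing:
        return (
            HTTPStatus.NOT_FOUND,
            f"Model(s) not available for {wiki}: {', '.join(missing)}",
        )
    return HTTPStatus.OK, "OK"


if __name__ == "__main__":
    print(validate_scores_request("enwiki", [], {"callback": "jsonp_"}))
```

Checking `callback` first means a JSONP request always gets the clear 400 message agreed on in the chat, regardless of whether the wiki or model exists.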
[14:40:44] Yesssssss
[14:43:31] \o/
[14:50:11] niceeee work!!! \o/ 🎉
[15:03:13] Machine-Learning-Team, Patch-For-Review: Decommission ORES configurations and servers - https://phabricator.wikimedia.org/T347278 (klausman)
[15:08:47] taking a break and doing an eagle stomp
[15:10:20] Going afk folks, have a nice evening/rest of day
[15:13:11] Machine-Learning-Team: Decommission ORES configurations and servers - https://phabricator.wikimedia.org/T347278 (MoritzMuehlenhoff)
[15:13:29] Machine-Learning-Team: Decommission ORES configurations and servers - https://phabricator.wikimedia.org/T347278 (MoritzMuehlenhoff)
[15:38:31] good night you two
[15:50:42] going afk too! o/
[15:50:56] night Luca!
[16:32:40] Machine-Learning-Team, ORES: ORES extremely slow when to return when asking for multiple scores. - https://phabricator.wikimedia.org/T347612 (Halfak)