[07:23:26] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (Ladsgroup) @isarantopoulos I think the threshold for arwiki is wrong
[07:32:57] Machine-Learning-Team, MediaWiki-extensions-ORES: Fatal Exception (MWException) in arwiki when opening prefrences - https://phabricator.wikimedia.org/T345320 (Zabe)
[07:34:05] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: Fatal Exception (MWException) in arwiki when opening prefrences - https://phabricator.wikimedia.org/T345320 (Zabe)
[07:34:11] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (Ammarpad)
[07:34:13] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: Fatal Exception (MWException) in arwiki when opening prefrences - https://phabricator.wikimedia.org/T345320 (Ammarpad)
[07:34:41] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (Ammarpad) p:Triage→Unbreak!
[07:44:14] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (isarantopoulos) I am investigating and will report back in a bit
[07:45:31] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (hubaishan) This Bug is not only for user نعمان حمداوي, it is for most users, and onl...
[07:45:47] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of user نعمان حمداوي - https://phabricator.wikimedia.org/T345305 (Ladsgroup) soft is likelybad. And it's disabled in arwiki (and it's default I think)...
[07:53:47] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (hubaishan)
[08:05:52] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (isarantopoulos) You are right, it is the one that was returning null and we set it to false...
[08:08:03] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (Ladsgroup) yeah, let's go with the latter.
[08:23:59] hello folks!
[08:24:07] isaranto: do you need any help with the mw extension?
[08:27:22] isaranto: gonna deploy it now
[08:30:09] Machine-Learning-Team, Patch-For-Review, Research (FY2023-24-Research-July-September): Deploy multilingual readability model to LiftWing - https://phabricator.wikimedia.org/T334182 (MGerlach) >>! In T334182#9130664, @elukey wrote: > @MGerlach I added a step in https://wikitech.wikimedia.org/wiki/Mach...
[08:34:08] o/ elukey: no we are ok. Thanks though :)
[08:34:42] Amir1: thanks for the deployment. I need to start deploying myself as well. I'll also create a patch to enable Lift Wing
[08:35:02] it's quite easy these days
[08:40:48] isaranto: let's enable lift wing for one/two wikis maximum
[08:41:06] maybe fiwiki that broke last time? Then we can contact them and ask if they can test it etc..
[08:41:44] Amir1: I still see an error in https://ar.wikipedia.org/wiki/Special:Preferences. does it work for you?
[08:42:00] elukey: ack, I agree
[08:42:02] the deployment is still ongoing,
[08:42:07] today is quite slow
[08:42:11] aa ok nevermind :)
[08:51:06] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (hubaishan) it is OK at test server
[08:55:03] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (Ladsgroup) yeah, I'm deploying but k8s cluster is being rebooted and thus deployments are q...
[09:02:26] deployed the new drafttopic settings
[09:02:36] the weird thing is that the old revision is not torn down
[09:02:39] mmmm
[09:03:27] this happened to me also last week when I deployed articlequality model server and Tobias removed it manually
[09:03:40] I forgot to open a task so we can investigate
[09:05:04] and the other pod was up without issues right?
[09:05:07] not crash looping etc..
[09:05:23] yy
[09:05:49] the issue was that desired replicas in the old revision was not set to 0
[09:16:07] it seems as if the controller fails to reconcile
[09:25:41] mmm it must be some setting that we applied, it does the same on both eqiad and codfw
[09:25:52] I tried to delete the knative autoscaler pods, there were some weird leader election errors
[09:31:02] https://github.com/knative/serving/issues/2720 is interesting
[09:33:52] so `kubectl get routes -n revscoring-drafttopic -o yaml` shows the correct revision
[09:34:14] the old one should be eventually garbage collected
[09:43:51] (PS13) Ilias Sarantopoulos: read thresholds numeric values [extensions/ORES] - https://gerrit.wikimedia.org/r/948584 (https://phabricator.wikimedia.org/T343308)
[09:44:45] (CR) Ilias Sarantopoulos: read thresholds numeric values (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/948584 (https://phabricator.wikimedia.org/T343308) (owner: Ilias Sarantopoulos)
[09:52:23] Machine-Learning-Team, MediaWiki-extensions-ORES, Wikimedia-production-error: MWException: Default '"soft"' is invalid for preference oresDamagingPref of most users - https://phabricator.wikimedia.org/T345305 (Ladsgroup) Open→Resolved
[09:56:29] mmm it may be our config-gc settings
[09:58:57] going afk for (early) lunch
[10:14:55] Amir1: do you think we should deploy this -> https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/948584 along with LW activation?
[10:15:48] isaranto: we technically can deploy it but rather not, we have to backport it to two deployment branches etc. It'll be messy
[10:17:33] ok! I don't think there is going to be any issue to go without it, just the fact that it removes all queries for thresholds gave me more confidence (along with the removal of the threshold config class)
[10:17:56] we can deploy that before we proceed with the "big" wikis then
[10:18:02] * isaranto going for lunch
[10:48:12] usually the configuration should be treated separately from the code and maintain some level of b/c for a while
[11:04:08] Ack!
[11:53:12] isaranto: when I merge this, it'll take a week to reach production. I'm keen on merging it right now. Would that be okay? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/948584
[11:53:31] (we could shorten it by backporting next week)
[11:58:15] Amir1: yes it is ok! It is more of a cleanup + a fix for beta
[11:58:25] (CR) Ladsgroup: [C: +2] read thresholds numeric values [extensions/ORES] - https://gerrit.wikimedia.org/r/948584 (https://phabricator.wikimedia.org/T343308) (owner: Ilias Sarantopoulos)
[11:58:40] done
[12:07:25] (Merged) jenkins-bot: read thresholds numeric values [extensions/ORES] - https://gerrit.wikimedia.org/r/948584 (https://phabricator.wikimedia.org/T343308) (owner: Ilias Sarantopoulos)
[12:14:39] Machine-Learning-Team, Item Quality Evaluator, Wikidata, wmde-wikidata-tech, and 2 others: Update API calls from ORES to Lift Wing - https://phabricator.wikimedia.org/T343731 (noarave)
[12:24:14] o/ elukey
[12:25:06] o/
[12:25:17] I found the problem with the revisions
[12:25:27] We are in the middle of deploying LW usage for itwiki and fiwiki and I noticed an issue on mwdebug
[12:25:35] ah ok
[12:25:52] while manually running a Job I got `Service failed to respond properly: Failed to make LiftWing request to [http://localhost:6031/v1/models/itwiki-damaging:predict], There was a problem during the HTTP request: 503 Service Unavailable`
[12:26:08] I checked LW side and all seems ok. pod has no errors, it has been there for 3 days
[12:26:28] I reran the job and it was fine. So could it be something with the proxy (?)
[12:28:43] so from the istio dashboard, itwiki-damaging doesn't report any 503
[12:29:13] mw calls us via envoy proxy
[12:29:21] that may have returned a 503 in theory
[12:29:42] it is maybe a temporary hiccup, I'd say to proceed
[12:29:45] yes that's the only thing that makes sense. Is there a way for me to validate that
[12:29:50] ?
[12:30:08] (we are proceeding since the jobs now run fine)
[12:31:12] I am thinking, since it could be any of the mw nodes
[12:31:18] not sure how to check
[12:32:00] ah ok
[12:32:01] isaranto: https://grafana.wikimedia.org/d/VTCkm29Wz/envoy-telemetry?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-origin=appserver&var-origin_instance=All&var-destination=inference&from=now-30m&to=now
[12:32:36] Wow awesome thanks!
[12:32:56] there was a connect timeout
[12:33:05] around 12:11 UTC
[12:33:09] does it match?
[12:34:13] exactly! this was the first request which got a timeout
[12:34:38] all subsequent requests are coming in fine
[12:35:51] now I am wondering if the same thing happened to the first request for fiwiki or just the first request towards localhost:6031 got that timeout
[12:37:04] it may be something related to new TCP connections, I've seen envoy doing it
[12:41:01] ok. letting it go :)
[12:41:06] thank youu
[12:45:04] filed a fix for the stale revisions - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/954047
[12:45:17] I think it should be ok for the moment, we don't really need old revisions to stick around
[12:45:36] lemme know your thoughts
[12:50:19] reviewing now
[12:51:59] <3
[12:52:39] +1 from me. I'm sceptical if it will resolve the issue though. The problem is that the old revision remains active
[12:52:53] isaranto: it is fixed in staging, I already tested it :)
[12:53:12] ok then!
[12:53:41] the current settings clash with the minScale: 1, for some reason (there are corner cases from what I can read from upstream) that you cannot deliberately get rid of the old non-active revision
[12:53:52] we didn't set a gc time, so it kept them indefinitely
[12:54:34] (IIUC if you have docker images that take 10 mins to spin up, and you want a fast change in revision etc.., you want to have the non-active revision up etc.. to avoid paying the startup price)
[12:54:45] but our use case is way different
[12:54:58] curious though... when is a revision active?
[12:56:10] IIUC when it has a knative route that points to it
[12:56:19] (kubectl get route -n etc..)
[12:56:35] you can also add annotations to a specific revision to avoid the GC
[13:02:56] ok, thanks for clarifying! i checked route `itwiki-damaging-predictor-default` and it has a specified revision. iiuc so when this route is ready it is active
[13:05:33] exactly
[13:08:13] oof the deployment of knative-serving times out in codfw
[13:08:13] sigh
[13:08:46] I think it is the error with the webhook not coming up in time
[13:15:56] * isaranto l8 lunch!
[13:16:19] wrong message :)
[13:16:50] commuting from coworking. will be in time for our meeting
[13:24:27] tried the RC filters for itwiki, looks good afaics
[13:24:36] the definitely damaging works
[13:28:06] isaranto: let's send an email to wikitech-l about itwiki and fiwiki
[13:28:15] (when you have time, not urgent)
[13:36:41] Ack will do after the meeting
[14:45:26] Machine-Learning-Team, Add-Link, Growth-Team (Current Sprint): Automate unpublishing of add-a-link datasets - https://phabricator.wikimedia.org/T344799 (Urbanecm_WMF) Open→Resolved
[14:45:30] Machine-Learning-Team, Add-Link, Growth-Team: Remove models with poor evaluation metrics from the published datasets repo - https://phabricator.wikimedia.org/T344319 (Urbanecm_WMF)
[15:04:59] * elukey bbl
[15:19:35] Machine-Learning-Team, Add-Link, Growth-Team: Remove models with poor evaluation metrics from the published datasets repo - https://phabricator.wikimedia.org/T344319 (kevinbazira) Open→Resolved a:kevinbazira To prevent mishaps like T344319#9109329 in the future, we have automated the unpu...
[15:19:41] Machine-Learning-Team, Add-Link, CommRel-Specialists-Support, Growth-Team, Chinese-Sites: Support languages whose add-a-link models were not published - https://phabricator.wikimedia.org/T309263 (kevinbazira)
[17:14:19] I sent an email to wikitech. logging off folks o/
[21:05:09] Machine-Learning-Team: Increase Lift Wing rate limit for ImpactVisualizer OAuth2 client - https://phabricator.wikimedia.org/T345394 (Ragesoss)