[04:45:51] (PS1) Santhosh: Add CXSERVER_HEADER config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060192 (https://phabricator.wikimedia.org/T371465)
[04:47:40] Machine-Learning-Team, MW-1.43-notes (1.43.0-wmf.17; 2024-08-06), OKR-Work, Patch-For-Review: Deploy Modernized Recommendation API to LiftWing - https://phabricator.wikimedia.org/T371465#10047305 (santhosh) @kevinbazira I added `CXSERVER_HEADER` config value to match the env values in https://ger...
[04:52:22] (PS1) Santhosh: Remove unused method get_related_articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060193
[05:18:16] (CR) Kevin Bazira: [C:+1] Add CXSERVER_HEADER config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060192 (https://phabricator.wikimedia.org/T371465) (owner: Santhosh)
[05:48:48] (CR) KartikMistry: [C:+2] Add CXSERVER_HEADER config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060192 (https://phabricator.wikimedia.org/T371465) (owner: Santhosh)
[05:49:28] (Merged) jenkins-bot: Add CXSERVER_HEADER config [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060192 (https://phabricator.wikimedia.org/T371465) (owner: Santhosh)
[06:11:23] (PS1) Santhosh: Add support for using both topic and seed filters [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060333
[06:20:47] (CR) KartikMistry: [C:+2] Remove unused method get_related_articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060193 (owner: Santhosh)
[06:21:26] (Merged) jenkins-bot: Remove unused method get_related_articles [research/recommendation-api] - https://gerrit.wikimedia.org/r/1060193 (owner: Santhosh)
[06:36:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:41:44] FIRING: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[06:46:44] RESOLVED: [2x] LiftWingServiceErrorRate: LiftWing service has a high rate of non 2/3/400 error code responses - https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Alerts#LiftWingServiceErrorRate - https://alerts.wikimedia.org/?q=alertname%3DLiftWingServiceErrorRate
[08:12:13] So this was hewiki
[09:02:10] o/ if it is the "usual" issue, we may need to investigate the known root cause a bit more and possibly improve the parser's performance. aiko IIRC we didn't receive any update from upstream, right? Maybe worth following up again?
[09:18:50] do you have the bug for that handy? I'm trying to find it
[09:22:36] Ah. PyEnchant hasn't had a non-rc release since 2021 :-/
[09:23:21] And we already use that latest release
[09:26:16] Or did you mean mwpfh?
[10:12:50] the latter; we don't have an upstream bug, but Aiko contacted the main dev for some follow-ups
[10:13:40] I know that we are all for moving people to newer models, but the older ones should get some attention too :(
[10:15:54] yeah, agreed
[10:55:15] * klausman lunch
[11:28:40] (PS4) AikoChou: readability: updates according to the new TRank model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1059032 (https://phabricator.wikimedia.org/T369712)
[11:51:37] (CR) AikoChou: [C:+2] "Thanks for the review! :)" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1059032 (https://phabricator.wikimedia.org/T369712) (owner: AikoChou)
[11:52:19] (Merged) jenkins-bot: readability: updates according to the new TRank model [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1059032 (https://phabricator.wikimedia.org/T369712) (owner: AikoChou)
[13:28:51] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1060437
[13:29:05] ---^ update readability model
[13:40:01] +1'd
[13:45:05] good morning all
[15:13:54] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: hw troubleshooting: ml-serve2001.codfw.wmnet: continued uncorrectable ECC errors - https://phabricator.wikimedia.org/T371872#10048594 (Jhancock.wm) this one has been out of warranty for more than half a year. We do have a spare DIMM on hand to repl...
[15:23:23] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: hw troubleshooting: ml-serve2001.codfw.wmnet: continued uncorrectable ECC errors - https://phabricator.wikimedia.org/T371872#10048641 (Jhancock.wm) a:Jhancock.wm
[15:34:10] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: hw troubleshooting: ml-serve2001.codfw.wmnet: continued uncorrectable ECC errors - https://phabricator.wikimedia.org/T371872#10048669 (klausman) @Jhancock.wm machine is drained, feel free to proceed.
[15:36:07] (PS13) Jsn.sherman: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[15:44:57] (CR) CI reject: [V:-1] Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[15:54:02] Machine-Learning-Team, Patch-For-Review: Reorganize LiftWing isvcs repo structure to improve maintainability - https://phabricator.wikimedia.org/T369344#10048711 (kevinbazira) I've deployed langid in staging, but the pod is running into a [[ https://phabricator.wikimedia.org/P67245 | CrashLoopBackOff ]]...
[15:54:52] (PS14) Jsn.sherman: Adds article topic model to ORES [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[15:55:59] klausman: thank you for the reviews :)
[15:55:59] the langid deployment is running into an issue similar to the one we ran into with outlink: https://phabricator.wikimedia.org/T369344#10048711
[15:55:59] going to test the solution we used then on langid and push a patch
[15:56:54] ack! ping me if you require anything
[15:57:42] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: hw troubleshooting: ml-serve2001.codfw.wmnet: continued uncorrectable ECC errors - https://phabricator.wikimedia.org/T371872#10048716 (klausman) Open→Resolved Machine has had DIMM replaced and is back in service.
[16:24:56] (PS1) Kevin Bazira: langid: match python module usage with other isvcs [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1060471 (https://phabricator.wikimedia.org/T369344)
[16:33:52] (CR) Kevin Bazira: "For more context, this fix is similar to: https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/1055195" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1060471 (https://phabricator.wikimedia.org/T369344) (owner: Kevin Bazira)
[16:34:17] --^ pushed a patch with the fix. please review whenever you get a minute. thanks!
[16:42:36] (CR) Jsn.sherman: "Based on a read through the code looks reasonable, but I never got this working locally like I have with the existing models; I just naivi" [extensions/ORES] - https://gerrit.wikimedia.org/r/1035044 (https://phabricator.wikimedia.org/T218132) (owner: Rockingpenny4)
[16:58:44] Machine-Learning-Team, DC-Ops, ops-codfw, SRE: Q1:rack/setup/install ml-serve20[09-11] - https://phabricator.wikimedia.org/T371920#10049008 (Jhancock.wm)