[01:20:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[01:20:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[01:20:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[01:25:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[01:25:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[01:25:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[03:36:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[03:36:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[03:36:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[03:46:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[03:46:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[03:46:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[04:55:34] (PS1) Jelto: update blubber buildkit image to v1.8.0 [research/recommendation-api] - https://gerrit.wikimedia.org/r/1297797 (https://phabricator.wikimedia.org/T321316)
[06:52:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[06:52:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[06:52:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[06:57:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[06:57:49] Deployment cope-b-a4b-predictor-00001-deployment in experimental at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=experimental&var-deployment=cope-b-a4b-predictor-00001-deployment - ...
[06:57:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[07:49:44] Machine-Learning-Team (Q4 FY2025-26), Patch-For-Review: Add Wikidata RevertRisk predictions to mediawiki.page_revert_risk_prediction_change - https://phabricator.wikimedia.org/T420883#11987656 (isarantopoulos) @Ottomata Is there any straightforward/existing solution that would allow us to do backfilling...
[08:08:28] Machine-Learning-Team (Q4 FY2025-26), Infrastructure-Foundations: Move all Machine Learning Docker images under the /ml prefix in the Docker Registry - https://phabricator.wikimedia.org/T428022#11987709 (elukey) I had a chat with Aiko yesterday and she raised a very good point - is it easy to move blubbe...
[08:26:58] (CR) Nik Gkountas: [C:+2] update blubber buildkit image to v1.8.0 [research/recommendation-api] - https://gerrit.wikimedia.org/r/1297797 (https://phabricator.wikimedia.org/T321316) (owner: Jelto)
[08:28:44] (Merged) jenkins-bot: update blubber buildkit image to v1.8.0 [research/recommendation-api] - https://gerrit.wikimedia.org/r/1297797 (https://phabricator.wikimedia.org/T321316) (owner: Jelto)
[08:35:56] (CR) Nik Gkountas: "This fix is not related to article/section suggestion fetching, but rather to page collection cache initialization/update. So I don't thin" [research/recommendation-api] - https://gerrit.wikimedia.org/r/1296651 (owner: Nik Gkountas)
[08:57:25] (PS2) Nik Gkountas: Fix unbounded concurrency in collection article fetching [research/recommendation-api] - https://gerrit.wikimedia.org/r/1296651
[09:02:53] Machine-Learning-Team (Q4 FY2025-26), Ceph, Infrastructure-Foundations, SRE-swift-storage: Move the Docker Registry's /ml prefix to S3/apus - https://phabricator.wikimedia.org/T420978#11987833 (achou)
[09:12:12] (CR) Nik Gkountas: Fix unbounded concurrency in collection article fetching (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1296651 (owner: Nik Gkountas)
[09:12:18] (PS2) Nik Gkountas: Reuse gather_with_concurrency for other concurrent fetches [research/recommendation-api] - https://gerrit.wikimedia.org/r/1296652
[09:44:55] (PS3) Nik Gkountas: Reuse gather_with_concurrency for other concurrent fetches [research/recommendation-api] - https://gerrit.wikimedia.org/r/1296652
[10:03:41] Machine-Learning-Team (Q4 FY2025-26), ServiceOps new: Update prod kserve/knative - https://phabricator.wikimedia.org/T426823#11987982 (JMeybohm) AIUI the migration has been completed with Iab94dc8c4f064182ee55559aee93203d53cff66a Would you please make sure to remove the old `kserve` artifacts like chart...
[10:05:09] Machine-Learning-Team (Q4 FY2025-26), Goal, OKR-Work: Q1 FY2025-26 Goal: Make article topic data available at scale and within SLOs for Year in Review - https://phabricator.wikimedia.org/T392833#11987986 (BWojtowicz-WMF) **Status Update** It has been a great week as we managed to turn on the integra...
[10:25:41] (CR) Clément Goubert: feat(liftwing-openapi-server): Serve OpenAPI specs via Apache httpd (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[10:40:36] (PS12) Gkyziridis: feat(liftwing-openapi-server): Serve OpenAPI specs via Apache httpd [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902)
[11:16:20] Machine-Learning-Team (Q4 FY2025-26), ServiceOps new: Update prod kserve/knative - https://phabricator.wikimedia.org/T426823#11988168 (isarantopoulos)
[11:41:02] Machine-Learning-Team (Q4 FY2025-26), Research, Goal, Patch-For-Review: Q4 FY2025-26 Goal: Text-to-Speech - https://phabricator.wikimedia.org/T419288#11988220 (kevinbazira) **Weekly Update:** * Continued iterating on the feedback provided to fix issues reported in the TTS prototype ** Added word...
[12:08:31] Machine-Learning-Team (Q4 FY2025-26), ServiceOps new, ServiceOps-SharedInfra: [draft] Access control for LiftWing LLM services exposed to external clients through REST Gateway - https://phabricator.wikimedia.org/T426749#11988379 (isarantopoulos) #### Update: based on a discussion with Mediawiki Inter...
[12:08:46] Machine-Learning-Team (Q4 FY2025-26), ServiceOps new, ServiceOps-SharedInfra: Access control for LiftWing LLM services exposed to external clients through REST Gateway - https://phabricator.wikimedia.org/T426749#11988391 (isarantopoulos)
[12:54:35] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26), ServiceOps new, ServiceOps-SharedInfra, Patch-For-Review: Expose LiftWing API for serving the openapi-specs through the /docs yaml files. - https://phabricator.wikimedia.org/T427902#11988607 (gkyziridis) @Clement_Goubert thank for your comment...
[12:54:42] (CR) AikoChou: "We should also remove the leftover `src/models/liftwing_openapi_server/server.py`, right? It was added in Ibb0b359f as a CI workaround stu" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[13:09:42] (PS1) Bartosz Wójtowicz: outlink-topic-model: Optimize requests using `revision_id`. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1298246
[13:29:14] (PS13) Gkyziridis: feat(liftwing-openapi-server): Serve OpenAPI specs via Apache httpd [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902)
[13:32:54] (PS1) Gkyziridis: Fix dublication of "builders" in revscoring blubber. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1298261
[13:34:27] (CR) Gkyziridis: feat(liftwing-openapi-server): Serve OpenAPI specs via Apache httpd (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[13:45:17] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26), ServiceOps new, ServiceOps-SharedInfra, Patch-For-Review: Expose LiftWing API for serving the openapi-specs through the /docs yaml files. - https://phabricator.wikimedia.org/T427902#11989079 (Clement_Goubert) >>! In T427902#11988607, @gkyziridi...
[15:15:48] (CR) AikoChou: [C:+1] "LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1297072 (https://phabricator.wikimedia.org/T427902) (owner: Gkyziridis)
[15:16:07] (CR) AikoChou: [C:+1] Fix dublication of "builders" in revscoring blubber. [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1298261 (owner: Gkyziridis)
[15:45:08] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26): Deploy CoPE-B-A4B on LiftWing - https://phabricator.wikimedia.org/T427497#11989609 (kevinbazira) a:kevinbazira
[19:45:29] Lift-Wing, Machine-Learning-Team (Q4 FY2025-26), ServiceOps new, ServiceOps-SharedInfra, Patch-For-Review: Expose LiftWing API for serving the openapi-specs through the /docs yaml files. - https://phabricator.wikimedia.org/T427902#11990187 (BPirkle) > As far as I can tell, the RestSandbox can...
[22:50:25] Machine-Learning-Team, Editing-team, VisualEditor Suggestion Mode: Deploy LLM-generated MoS suggestions as experimental - https://phabricator.wikimedia.org/T428311 (ppelberg) NEW
[22:53:07] Machine-Learning-Team, Editing-team, VisualEditor Suggestion Mode: Deploy LLM-generated MoS suggestions as experimental - https://phabricator.wikimedia.org/T428311#11990490 (ppelberg)
[23:36:25] Machine-Learning-Team, EditCheck, Growth-Team, Revise-Tone-Structured-Task, and 3 others: Tone check: Improve handling of quoted content - https://phabricator.wikimedia.org/T426362#11990558 (ppelberg) @esanders to be doubly sure I'm understanding... Would it be accurate for me to think this fix:...