[05:25:43] (CR) Kevin Bazira: [C:+2] test: update outlink transformer test image to support latest ci tests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092759 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[05:30:23] (Merged) jenkins-bot: test: update outlink transformer test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1092759 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[05:36:01] (PS1) Kevin Bazira: test: update langid predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120)
[05:37:19] (CR) CI reject: [V:-1] test: update langid predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[05:49:38] (PS2) Kevin Bazira: test: update langid predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120)
[08:27:31] Hello! (in Spanish)
[09:21:54] Good morning! (in Mandarin)
[09:46:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[09:46:49] Deployment recommendation-api-ng-main in recommendation-api-ng at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=recommendation-api-ng&var-deployment=recommendation-api-ng-main - ...
[09:46:49] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[10:01:51] taking a look at the alerts above --^
[10:01:55] elukey: how's the Mandarin studying going?
[10:04:17] not great :D
[10:05:33] it seems soooo tough
[10:08:41] kart_: o/ the rec api pods are falling into a CrashLoopBackOff state with the following logs https://phabricator.wikimedia.org/P71167
[10:09:24] since there are 5 replicas the service is up and running but they are restarting one after another
[10:09:55] this is in eqiad
[10:12:55] I don't see any spike in traffic (also checked over the last 2 days) https://grafana.wikimedia.org/goto/Of3byE7Hg?orgId=1
[10:30:24] (CR) Ilias Sarantopoulos: [C:+1] "LGTM" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[10:34:24] Morning
[10:34:35] isaranto: thanks for looking into the recapi alerts
[10:42:16] morning Tobias!
[10:43:00] is there anything that I can help with regarding the kserve updates?
[10:43:36] klausman: shouldn't we aim for 0.14 directly?
[10:43:58] I mean if there isn't any breaking change ofc
[10:44:13] I am trying to figure out the magic combination of versions between kserve, knative-serving, knative-net-istio, istio and k8s
[10:46:27] ack!
[10:51:02] (CR) Kevin Bazira: [C:+2] test: update langid predictor test image to support latest ci tests (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[10:51:44] (Merged) jenkins-bot: test: update langid predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097635 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[10:53:39] kevinbazira: o/ Hey! don't even bother changing the nsfw, no need to, it isn't being used
[10:54:58] I'll be filing a patch for readability to test the new ci
[10:55:13] isaranto: o/ thanks for the reviews! the nsfw is a quick change, I have already made and tested it.
[10:56:07] ack! we don't need to test locally. we can just create all the patches and CI (jenkins) will let us know if things are ok
[10:57:05] (PS1) Kevin Bazira: test: update logo_detection predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097984 (https://phabricator.wikimedia.org/T360120)
[10:57:34] (PS1) Ilias Sarantopoulos: ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120)
[10:59:02] (CR) CI reject: [V:-1] ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[10:59:06] (PS1) Ilias Sarantopoulos: ci: update reference to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120)
[11:00:27] (PS1) Ilias Sarantopoulos: ci: update revertrisk-multilingual to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120)
[11:01:27] (PS1) Ilias Sarantopoulos: ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120)
[11:02:32] (CR) CI reject: [V:-1] ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[11:05:08] (PS2) Ilias Sarantopoulos: ci: update revertrisk-multilingual to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120)
[11:05:31] (PS2) Ilias Sarantopoulos: ci: update reference to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120)
[11:05:59] (PS3) Ilias Sarantopoulos: ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120)
[11:06:14] (CR) Ilias Sarantopoulos: [C:+1] test: update logo_detection predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097984 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[11:06:55] (CR) Kevin Bazira: [C:+2] test: update logo_detection predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097984 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[11:07:26] (PS2) Ilias Sarantopoulos: ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120)
[11:07:44] klausman: Did we find anything else? We haven't changed anything in the code since last week.
[11:08:18] Sorry, I haven't been looking into the rec-api stuff, Ilias has all the state
[11:08:32] oh OK :)
[11:11:43] (Merged) jenkins-bot: test: update logo_detection predictor test image to support latest ci tests [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097984 (https://phabricator.wikimedia.org/T360120) (owner: Kevin Bazira)
[11:13:53] I didn't see anything else. Since it is related to the workers I suspect it is something to do with resources on the application side. Since there was no spike in traffic, perhaps some requests required more memory to compute (?)
[11:14:28] Now that I rechecked I do see some memory spikes at the time of the alert (9:44 UTC) https://grafana.wikimedia.org/goto/WI7zXPnNR?orgId=1
[11:14:52] especially this one https://grafana.wikimedia.org/goto/6nrGXP7NR?orgId=1
[11:15:37] (PS2) Ilias Sarantopoulos: ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120)
[11:16:14] (CR) Ilias Sarantopoulos: "recheck" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[11:16:18] (PS4) Ilias Sarantopoulos: ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120)
[11:16:21] (PS3) Ilias Sarantopoulos: ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120)
[11:16:24] (PS3) Ilias Sarantopoulos: ci: update revertrisk-multilingual to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120)
[11:16:28] (PS3) Ilias Sarantopoulos: ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120)
[11:22:02] (CR) Kevin Bazira: ci: update readability to support latest ci entrypoint (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[11:47:32] * isaranto afk lunch!
[12:00:31] (CR) Kevin Bazira: [C:+1] "The python module is used when we run `ci-unit`. When running `ci-lint` it's redundant. The rest LGTM!" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:08:07] (CR) Ilias Sarantopoulos: "So do you think I should delete this? I follow what we did here https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-servic" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:09:57] (CR) Ilias Sarantopoulos: "Nevermind, I just understood that you were referring to the python directory." [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:21:12] (CR) Kevin Bazira: [C:+1] "okok... yes, in the end we'll delete the if/else clauses in ci_entrypoint.sh" [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:21:28] (CR) Kevin Bazira: [C:+1] ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:21:34] (CR) Kevin Bazira: [C:+1] ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:22:19] (CR) Kevin Bazira: [C:+1] ci: update readability to support latest ci entrypoint (1 comment) [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:27:02] (CR) Ilias Sarantopoulos: [C:+2] ci: update revertrisk-multilingual to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097987 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:27:16] (PS4) Ilias Sarantopoulos: ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120)
[13:41:08] (CR) Ilias Sarantopoulos: [C:+2] ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:41:51] (Merged) jenkins-bot: ci: update readability to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097985 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:42:20] (PS4) Ilias Sarantopoulos: ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120)
[13:42:24] (CR) Ilias Sarantopoulos: [C:+2] ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:43:54] Machine-Learning-Team, Patch-For-Review: Run unit tests for the inference-services repo in CI - https://phabricator.wikimedia.org/T360120#10357623 (isarantopoulos)
[13:46:56] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[13:51:35] (Merged) jenkins-bot: ci: update revertrisk-wikidata to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097988 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[13:55:30] (PS5) Ilias Sarantopoulos: ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120)
[13:56:18] Machine-Learning-Team: Test the feasibility of deployment of Aya-23 model in LiftWing - https://phabricator.wikimedia.org/T379052#10357700 (isarantopoulos) p:Triage→High
[13:56:25] Machine-Learning-Team, Patch-For-Review: [LLM] Use Flash attention 2 for GPU inference - https://phabricator.wikimedia.org/T371344#10357701 (isarantopoulos) p:Medium→High
[13:56:47] Machine-Learning-Team, ORES: Rename ORES extension - https://phabricator.wikimedia.org/T377563#10357704 (isarantopoulos) p:Triage→Low
[13:56:58] Lift-Wing, Machine-Learning-Team: [articletopic-outlink] fetch data from mwapi using revid instead of article title - https://phabricator.wikimedia.org/T371021#10357716 (isarantopoulos) p:Triage→Medium
[14:01:37] Lift-Wing, Machine-Learning-Team: Load test LLMs - https://phabricator.wikimedia.org/T377225#10357724 (isarantopoulos) Now that we have the ml-labs available, we can test the performance on ml-labs first. The work related to ml-labs highly overlaps with T377496 and we should use the same benchmarks for c...
[14:11:49] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[14:21:49] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[14:30:18] I think we should just restart the rec-api service
[14:45:54] yeah, at least to see if it helps
[15:17:27] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, Kubernetes: Update kserve to v0.13.0 on ML clusters - https://phabricator.wikimedia.org/T380722#10357967 (klausman)
[15:17:53] Machine-Learning-Team, Data-Platform-SRE, Prod-Kubernetes, serviceops, and 2 others: Update knative-serving+net-istio to v1.12.x on ML clusters - https://phabricator.wikimedia.org/T380723#10357971 (klausman)
[15:25:32] Machine-Learning-Team, sre-alert-triage: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100) - https://phabricator.wikimedia.org/T380024#10357986 (isarantopoulos) Open→Resolved
[15:33:11] (PS1) Nik Gkountas: fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036)
[15:33:33] Lift-Wing, Machine-Learning-Team: Log and export preprocess size in inference services as a prometheus metric - https://phabricator.wikimedia.org/T374034#10358001 (isarantopoulos) Open→Resolved
[15:33:39] Machine-Learning-Team: Update kserve to 0.13.1 - https://phabricator.wikimedia.org/T367048#10358005 (isarantopoulos) p:Medium→Low
[15:33:57] Machine-Learning-Team, Patch-For-Review: Run unit tests for the inference-services repo in CI - https://phabricator.wikimedia.org/T360120#10358008 (isarantopoulos) p:Medium→Low
[15:34:21] Machine-Learning-Team, Patch-For-Review: Run unit tests for the inference-services repo in CI - https://phabricator.wikimedia.org/T360120#10358011 (isarantopoulos) p:Low→Medium
[15:34:23] Machine-Learning-Team, Patch-For-Review: Run unit tests for the inference-services repo in CI - https://phabricator.wikimedia.org/T360120#10358012 (isarantopoulos) p:Medium→Low
[15:34:37] (CR) CI reject: [V:-1] fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036) (owner: Nik Gkountas)
[15:35:07] Lift-Wing, Machine-Learning-Team, Patch-For-Review: [LLM] Use vllm for ROCm in huggingface image - https://phabricator.wikimedia.org/T370149#10358013 (isarantopoulos) p:Medium→High
[15:48:57] (PS1) Sbisson: Return [] instead of None or "" when an array is expected [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098068 (https://phabricator.wikimedia.org/T380838)
[15:53:08] Machine-Learning-Team, ORES, Patch-For-Review: ORES doesn't work (at least for ru- and ukwiki) - https://phabricator.wikimedia.org/T362503#10358060 (isarantopoulos) Open→Resolved
[15:56:36] (PS2) Nik Gkountas: fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036)
[16:34:11] (CR) Sbisson: fix cache update in a single thread for production (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036) (owner: Nik Gkountas)
[16:48:45] (PS1) Sbisson: Minimal fix for zh-min-nan site name [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098090 (https://phabricator.wikimedia.org/T380838)
[16:53:52] (Abandoned) Ilias Sarantopoulos: ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[16:53:55] (Restored) Ilias Sarantopoulos: ci: update reference-quality to support latest ci entrypoint [machinelearning/liftwing/inference-services] - https://gerrit.wikimedia.org/r/1097986 (https://phabricator.wikimedia.org/T360120) (owner: Ilias Sarantopoulos)
[17:02:28] klausman: could you delete the recapi pods that are in crashloopbackoff in eqiad?
[17:02:45] will do
[17:03:06] thanks, they would be recreated. I don't know if it would fix things as they have already been restarting
[17:04:33] Done, but they are crashlooping again
[17:08:46] thank you for doing that. I am checking the resources and the traffic again
[17:14:34] (PS3) Nik Gkountas: fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036)
[17:14:38] (CR) Nik Gkountas: fix cache update in a single thread for production (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036) (owner: Nik Gkountas)
[17:37:20] isaranto: they're all crashlooping now
[17:39:56] shall I increase their resources to see if it will fix anything?
[17:40:23] Might be worth it, but I have little hope
[17:40:48] since it is happening at both sites and traffic hasn't peaked I do believe it is on the software side, workers seem to fail to initialize
[17:45:31] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1098103
[17:45:35] https://phabricator.wikimedia.org/P71186
[17:46:20] lol I missed that through the logs
[17:46:27] kart_:^^^ any ideas if the errors in that paste are actionable?
[17:49:27] I dunno much about the application internals, but that looks a lot like a query-of-death scenario to me.
[17:49:45] I think it is too late for Kartik now
[17:49:55] it seems related to the cache that has been added
[17:50:12] Yes. Almost midnight now, I have asked Nik and Stephane already.
[17:50:46] We have a patch from Santhosh to fix it, I should be able to deploy it tomorrow.
[17:51:09] Roger!
[17:51:31] isaranto: we can keep the resource patch around just in case, but I don't think it makes sense to push it now
[17:51:45] ack
[17:52:22] I abandoned it already, but we can revisit if we have any similar spikes
[17:53:18] going afk folks, have a nice evening/rest of day!
[17:54:49] \o
[18:21:49] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[18:42:57] (PS1) Sbisson: Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838)
[18:43:36] (CR) CI reject: [V:-1] Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838) (owner: Sbisson)
[18:52:42] (PS2) Sbisson: Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838)
[18:53:20] (CR) CI reject: [V:-1] Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838) (owner: Sbisson)
[19:01:29] (PS3) Sbisson: Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838)
[19:14:35] (PS4) Sbisson: Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838)
[19:16:49] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[19:34:09] (CR) Nik Gkountas: [C:+2] Return [] instead of None or "" when an array is expected [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098068 (https://phabricator.wikimedia.org/T380838) (owner: Sbisson)
[19:34:51] (Merged) jenkins-bot: Return [] instead of None or "" when an array is expected [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098068 (https://phabricator.wikimedia.org/T380838) (owner: Sbisson)
[19:37:23] (PS1) Nik Gkountas: Use sitematrix and interwiki map to properly find dbname for links [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098124 (https://phabricator.wikimedia.org/T380838)
[19:40:25] (PS5) Sbisson: Cache update: skip iw links already discovered through wikidata [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098115 (https://phabricator.wikimedia.org/T380838)
[19:56:40] (CR) Sbisson: [C:+2] fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036) (owner: Nik Gkountas)
[19:57:19] (Merged) jenkins-bot: fix cache update in a single thread for production [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098064 (https://phabricator.wikimedia.org/T379036) (owner: Nik Gkountas)
[20:00:46] (PS1) Sbisson: Filter out articles in other NS for IW links [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098130
[20:01:25] (CR) CI reject: [V:-1] Filter out articles in other NS for IW links [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098130 (owner: Sbisson)
[20:01:49] FIRING: [2x] KubernetesDeploymentUnavailableReplicas: Deployment recommendation-api-ng-main in recommendation-api-ng at codfw has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[20:03:11] (PS2) Sbisson: Filter out articles in other NS for IW links [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098130
[20:33:03] (PS1) Sbisson: Let the page collection cache return [] when empty [research/recommendation-api] - https://gerrit.wikimedia.org/r/1098139 (https://phabricator.wikimedia.org/T380838)