[04:59:05] (PS1) Santhosh: Improve logging and exception handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[05:00:26] (CR) CI reject: [V:-1] Improve logging and exception handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[05:04:08] (PS2) Santhosh: Improve logging and exception handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[05:04:46] (CR) CI reject: [V:-1] Improve logging and exception handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[05:47:29] (PS3) Santhosh: Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[05:47:59] (PS4) Santhosh: Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[05:48:36] (CR) CI reject: [V:-1] Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[05:54:00] (PS5) Santhosh: Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[07:55:46] Santhosh is fixing the deployment blocker..
[08:01:18] kevinbazira: I'll attempt a deploy in staging once deployment patch is merged. Let's see how it goes :)
[08:09:23] and it seems successful!
[08:11:49] \o/
[08:11:52] niceeee
[08:12:20] I saw the patch, makes sense, we had a similar issue in ML with wikidata as well, totally forgot about it sorry :(
[08:27:33] elukey: should I go ahead with production deployment or wait for a while?
[08:30:57] kart_: I'd defer to kevinbazira for this, but from my point of view if you tested the staging endpoint and everything looks good it seems ok to proceed
[08:31:31] not sure what are the clients of the rec-api-ng endpoint, we should probably check post-deployment if everything looks good on their side too
[08:33:21] kart_: nice! were you able to test the endpoint in staging? https://phabricator.wikimedia.org/P69386
[08:35:42] I've tested and they LGTM! please proceed to prod ...
[08:38:34] Yes. Tested endpoints from the paste kevinbazira
[08:45:54] kevinbazira: going ahead with production deployment..
[08:46:18] okok ...
[08:50:02] eqiad seems happy
[08:50:26] codfw as well
[08:51:56] Done. Thanks kevinbazira and elukey
[08:52:10] nice!
[08:56:19] I'll do another round of deployment for Santhosh's long term fix to avoid www issue and some other changes. Mostly later today.
[08:56:36] I've checked, pods are up and running in eqiad and codfw. Internal and external endpoints are returning the expected output. kart: congratulations on your first rec-api deployment 🎉
[08:57:32] Thanks! :)
[11:17:00] (CR) Nikerabbit: Avoid following redirects in external API calls, improve error handling (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[11:21:57] (PS6) Santhosh: Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592)
[11:22:49] (CR) Santhosh: Avoid following redirects in external API calls, improve error handling (1 comment) [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[12:28:52] Machine-Learning-Team, Infrastructure-Foundations, serviceops: Migrate the ownership of Docker images in production-images repo to mailing lists - https://phabricator.wikimedia.org/T373526#10316316 (BTullis) Removing the #data-platform-sre tag because I think that our element of this has been complet...
[18:31:04] (CR) Sbisson: [C:+2] Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[18:31:45] (Merged) jenkins-bot: Avoid following redirects in external API calls, improve error handling [research/recommendation-api] - https://gerrit.wikimedia.org/r/1090590 (https://phabricator.wikimedia.org/T379592) (owner: Santhosh)
[18:40:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[18:40:49] Deployment nlwiki-articlequality-predictor-default-00021-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[18:40:54] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=nlwiki-articlequality-predictor-default-00021-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[18:45:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[18:45:49] Deployment nlwiki-articlequality-predictor-default-00021-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[18:45:49] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=nlwiki-articlequality-predictor-default-00021-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[19:04:49] FIRING: KubernetesDeploymentUnavailableReplicas: ...
[19:04:49] Deployment glwiki-articlequality-predictor-default-00021-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[19:04:49] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=glwiki-articlequality-predictor-default-00021-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas
[19:09:49] RESOLVED: KubernetesDeploymentUnavailableReplicas: ...
[19:09:49] Deployment glwiki-articlequality-predictor-default-00021-deployment in revscoring-articlequality at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - ...
[19:09:49] https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s-mlserve&var-namespace=revscoring-articlequality&var-deployment=glwiki-articlequality-predictor-default-00021-deployment - https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas