[08:30:32] serviceops, CFSSL-PKI, Infrastructure-Foundations, Machine-Learning-Team, Patch-For-Review: Extend cfssl-issuer to return the Root CA certificate - https://phabricator.wikimedia.org/T299906 (JMeybohm) a:JMeybohm
[08:32:13] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: kube-apiserver need to reach webhooks running inside of the cluster - https://phabricator.wikimedia.org/T290967 (JMeybohm)
[09:10:46] serviceops, CFSSL-PKI, Infrastructure-Foundations, Machine-Learning-Team, Patch-For-Review: Extend cfssl-issuer to return the Root CA certificate - https://phabricator.wikimedia.org/T299906 (JMeybohm)
[09:39:24] serviceops, CFSSL-PKI, Infrastructure-Foundations, Machine-Learning-Team, Patch-For-Review: Extend cfssl-issuer to return the Root CA certificate - https://phabricator.wikimedia.org/T299906 (JMeybohm)
[11:47:12] the canaries now seem to have an additional service dc=eqiad,dc=appserver,service=canary which has weight 0, is that intentional?
[11:47:45] it prevents pool/depool from working with "You cannot pool a node where weight is equal to 0"
[11:51:06] <_joe_> moritzm: I wouldn't know!
[11:51:24] <_joe_> but adding a weight won't harm anyone
[11:53:49] <_joe_> moritzm: uhm so the canary role is there since july
[11:54:16] <_joe_> and I see, we only ever do a restart, where we don't care about pooling all services
[11:54:26] <_joe_> if a service is depooled, we leave it like that
[11:54:45] yeah, was added by Jelto in https://github.com/wikimedia/puppet/commit/d77457a0f34d36a34cb773969678b90a6f1e8d24 for https://phabricator.wikimedia.org/T279309
[11:58:00] <_joe_> yeah but was just moved from another position AIUI
[11:58:10] <_joe_> so two servers were ok, two still had weight zero
[11:58:12] <_joe_> ok anyways
[11:58:16] <_joe_> fixed now
[12:01:29] ack, thx!
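Editor's note on the 11:47 to 12:01 thread: the failure mode described there (one extra service with weight 0 blocking a pool operation for the host) can be illustrated with a minimal sketch. This is not the code of the actual pooling tool. `PoolError`, `pool_all`, the dict layout and the service name `example-service` are all hypothetical; only the error text and the `canary` service name come from the log. The sketch covers pooling only and does not try to explain why depool was affected as well.

```python
class PoolError(Exception):
    """Raised when a pool request is rejected (hypothetical error type)."""


def pool_all(services):
    """Return copies of all service entries with pooled set to "yes".

    Every entry is validated before anything is changed, so a single
    entry with weight 0 rejects the whole request and leaves the input
    untouched. This mirrors the symptom in the log, not the real tool.
    """
    for svc in services:
        if svc.get("weight", 0) == 0:
            # Error text quoted from the chat log.
            raise PoolError("You cannot pool a node where weight is equal to 0")
    return [dict(svc, pooled="yes") for svc in services]


if __name__ == "__main__":
    host_services = [
        {"service": "example-service", "weight": 10, "pooled": "no"},
        {"service": "canary", "weight": 0, "pooled": "no"},
    ]
    try:
        pool_all(host_services)
    except PoolError as err:
        print(f"rejected: {err}")

    # Giving the extra service a non-zero weight lets the request through.
    host_services[1]["weight"] = 1
    print(pool_all(host_services))
```

Validating everything up front is a design choice of this sketch: it keeps a host from ending up half pooled when one entry is misconfigured.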
[12:36:26] serviceops, CFSSL-PKI, Infrastructure-Foundations, Prod-Kubernetes, Kubernetes: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm) p:Triage→Low
[13:29:58] hi, is anyone around to help debug why I can't curl the staging instances for the linkrecommendation chart? I just did a deploy (to bump the image version) that also updated the chart to 0.2.0, I don't know if that is related.
[13:30:41] serviceops, Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (ops-monitoring-bot) Icinga downtime set by jelto@cumin1001 for 6:00:00 1 host(s) and their services with reason: move gitla...
[13:45:51] kostajh: I guess I'm responsible for the chart bump to 0.2.0. What curl command do you use to test the linkrecommendation service? For me the service is reachable at nodePort 4005 (for example curl https://staging.svc.eqiad.wmnet:4005/apidocs/). What endpoint/url do you use?
[13:47:26] serviceops, Security-Team, GitLab (CI & Job Runners), Patch-For-Review, and 2 others: Setup GitLab Runner in trusted environment - https://phabricator.wikimedia.org/T295481 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jelto@cumin1001 for hosts: `gitlab-runner1001.eqiad.wmnet`...
[14:34:50] jelto: I use `service-checker-swagger staging.svc.eqiad.wmnet https://staging.svc.eqiad.wmnet:4005 -t 2 -s /apispec_1.json`
[14:35:01] and also `curl "https://staging.svc.eqiad.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"`
[14:35:54] We don't deploy that often, I'm going by what my team documented a while back on https://wikitech.wikimedia.org/wiki/Add_Link#Deployment
[14:58:12] kostajh: indeed, the curl with Barack Obama is not working. I checked the logs of the tls proxy and it returns a 503 from the application container. If I query the application container directly, I get an empty reply. I would assume the issue is more on the application side. Could you try to go back to the old image version and check if the issue is still present?
[14:58:46] OK, it's certainly possible. Will revert. Sorry for the trouble!
[15:00:51] no problem. But from what I see in the logs I don't see a correlation between the chart update of 0.2.0 and this behavior. If you still have problems with the service let me know, then we can take a look again!
[15:04:25] jelto: all is fine after a revert. So yeah, the problem was with the image.
[15:05:13] I was confused that I was seeing what looked like a timeout rather than an error or error code when running the curl request
[15:08:18] kostajh: I can confirm, the curl is working again with the old image and the chart linkrecommendation-0.2.0
[16:01:07] serviceops, Release-Engineering-Team, Scap: Deploy Scap version 4.2.0 - https://phabricator.wikimedia.org/T300058 (dancy)
[18:15:48] serviceops, GitLab (Infrastructure): Migrate gitlab-test instance to puppet - https://phabricator.wikimedia.org/T297411 (Dzahn) @Jelto Thank you! I deployed your amended change. looked good now. compiled and deployed in production, confirmed noop. Then re-applied the gitlab role on our cloud instance....
[19:48:02] serviceops, Deployments, Release-Engineering-Team (Radar), Sustainability (Incident Followup), User-jijiki: Remove provisioning for 'mwscript', 'foreachwikiindblist' etc from deployment host - https://phabricator.wikimedia.org/T253822 (dancy) I'm in favor of removing /srv/mediawiki entirely f...
[21:24:26] serviceops, GitLab (CI & Job Runners): upgrade gitlab-runners to bullseye - https://phabricator.wikimedia.org/T297659 (brennen)