[07:01:44] serviceops, SRE: Provide node14 and node16 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (Joe) a:Joe
[07:34:14] <_joe_> jayme: I see at least one patch to production-images that was never pulled to build2001
[07:34:18] <_joe_> and it was for vcert manager
[07:34:55] <_joe_> which means that if I try to build my image, it will build and publish yours as well
[07:35:54] <_joe_> ah I guess you used deneb
[07:41:17] let's move over the remaining systemd timers (for docker-report etc.) some time this week? then we can just as well retire deneb entirely
[07:41:39] serviceops, SRE, Patch-For-Review: Provide node14 and node16 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (Joe) I've built and published the `nodejs14-slim` and the `nodejs16-slim` images, using the nodejs package from the components. One important t...
[07:41:46] serviceops, SRE: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (Joe)
[07:41:49] serviceops, SRE, Patch-For-Review: Provide node14 and node16 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (Joe) Open→Resolved
[07:41:51] there was a remaining use case for deneb to build the CAS debs, but that got resolved two weeks ago
[07:42:04] <_joe_> ah ok
[07:42:13] <_joe_> I was assuming deneb was gone until today :)
[08:17:16] _joe_: hmm..I might have built that on deneb I guess
[09:23:42] serviceops: move mw241[2-9].codfw.wmnet into production - https://phabricator.wikimedia.org/T307255 (Jelto) `mw241[2-9]` were pooled in an incident this morning (accidental depool and pool of the codfw datacenter). I ran a `scap pull` on all machines to make sure they are up to date. I've just read about the...
[09:29:44] serviceops, Wikimedia-Developer-Portal, Goal, Patch-For-Review, Service-deployment-requests: New Service Request: developer-portal - https://phabricator.wikimedia.org/T297140 (akosiaris) @bd808, just for greater visibility, as I said in https://gerrit.wikimedia.org/r/c/773994, you can proceed...
[09:51:11] akosiaris: In the context of a helm chart using ingress how should tls.public_port and service.nodePort be configured?
[09:56:28] hnowlan: ah, now I understand the question. Nothing changes there, they are configured as usual.
[09:57:55] the ingress part (where a single port is used by everything) is about not having to configure LVS (and thus the service::catalog entries) that change
[10:03:57] actually, nodePort does not need to be configured (and will be ignored) in case of Ingress
[10:05:37] tls.public_port you can define whatever you like, as that is not going to be a nodePort as well (read: it does no longer need to be unique per cluster) in case of Ingress
[10:17:24] ah, makes sense - thanks!
[10:49:28] <_joe_> I think expanding the docs to state that clearly could help :)
[10:52:43] serviceops: move mw241[2-9].codfw.wmnet into production - https://phabricator.wikimedia.org/T307255 (Joe) @Jelto what is the config-master issue exactly? I see the boxes here https://config-master.wikimedia.org/pybal/codfw/api-https and here https://config-master.wikimedia.org/pybal/codfw/appservers-https...
[11:00:15] seems like I was mistaken regarding ingress. need to catch up with that thing...
[11:06:14] serviceops: move mw241[2-9].codfw.wmnet into production - https://phabricator.wikimedia.org/T307255 (Jelto)
[11:07:51] serviceops: move mw241[2-9].codfw.wmnet into production - https://phabricator.wikimedia.org/T307255 (Jelto) Open→Resolved a:Jelto >>! In T307255#7913372, @Joe wrote: > Maybe you got confused by the stale files there that we should remove for the non-https LVSes? exactly this ^. Thanks for clarif...
[12:24:21] hi folks, I wanted to gently nudge this story to try to get it some attention: https://phabricator.wikimedia.org/T307351
[12:27:00] <_joe_> jnuche: that stuff is usually for the on clinic duty person to look at, I'll nudge them
[12:27:32] _joe_: thanks
[12:49:28] serviceops, GitLab (Infrastructure): bring new gitlab hardware servers into production - https://phabricator.wikimedia.org/T307142 (Jelto)
[13:21:17] serviceops, Data-Catalog, Data-Engineering, SRE, and 2 others: New Service Request: DataHub - https://phabricator.wikimedia.org/T303049 (Ottomata) > For the duration of the upgrade plus some safety time windows before/after the upgrade, traffic will always be served from codfw and thus have the w...
[13:28:19] _joe_: Did the nodejs14-slim and nodejs16-slim images get pushed? Locally I'm getting a 404 from the registry.
[13:49:43] <_joe_> James_F: yes, it should
[13:50:18] I can wait 'til next week to see if the caches expire or whatever.
[13:51:06] <_joe_> uhm I doubt that's the issue tbh
[15:49:31] any idea what a failure of "NoMethodError: undefined method `indent' for nil:NilClass" might mean in helm tests? Not sure if it's a problem with the tests or something I left out (guessing as much given that it's testing nil?) https://integration.wikimedia.org/ci/job/helm-lint/7303/consoleFull
[15:51:50] hnowlan: I'd guess it's a CI/test error
[15:52:40] <_joe_> hnowlan: sigh, that's another bug of CI yes
[15:52:56] <_joe_> hnowlan: what is the change?
[15:53:24] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/789876
[15:53:52] <_joe_> this is luckily something stupid
[15:55:07] <_joe_> hnowlan: can it wait for tomorrow?
[15:55:58] yeah no rush
[15:56:45] <_joe_> unless you want to dive into a big bowl of ruby yourself, that is :P
[16:01:28] ohh eh my irc client is going through a tunnel i think i'm losing si
[16:22:48] serviceops, SRE: Provide node14 and node16 images for running production node-based services - https://phabricator.wikimedia.org/T306996 (bd808) >>! In T306996#7912881, @Joe wrote: > I've built and published the `nodejs14-slim` and the `nodejs16-slim` images, using the nodejs package from the components....
[16:57:27] serviceops: move mw241[2-9].codfw.wmnet into production - https://phabricator.wikimedia.org/T307255 (Dzahn) @Jelto mw2412 is not pooled. expected? > Maybe you got confused by the stale files there that we should remove for the non-https LVSes? I _thought_ I had checked the https versions as well when I wro...
[17:04:31] serviceops, SRE, ops-eqiad: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn)
[17:06:22] ^ an old appserver died. do we even still ask for it to be fixed? I guess a quick fix by dcops is warranted but if hardware is broken..it's just out of the pool
[17:10:35] oh, nevermind. that is NOT that old. it's one of the new ones. definitely needs check by dcops..doing that
[17:10:54] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm) p:Triage→High
[17:11:59] serviceops, SRE, ops-eqiad: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn) @Cmjohnson or @Jclark-ctr This server just went down, server itself AND mgmt at the same time. So we can't add much here. But it's only been purchased in 2021. So that should...
[17:12:35] serviceops, Prod-Kubernetes, Kubernetes: Add kubernetes 1.17+ topology annotations - https://phabricator.wikimedia.org/T270191 (JMeybohm)
[17:12:38] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:12:41] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:12:48] serviceops, Prod-Kubernetes, Kubernetes: Support multiple kubernetes versions with puppet - https://phabricator.wikimedia.org/T278329 (JMeybohm)
[17:12:50] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:12:54] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:15:33] serviceops, SRE, ops-eqiad: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn) IPMI from remote also fails: Error: Unable to establish IPMI v2 / RMCP+ session
[17:15:48] serviceops, SRE, ops-eqiad: mw1415 (canary appserver) is down, incl. mgmt - https://phabricator.wikimedia.org/T307755 (Dzahn) p:Triage→Medium
[17:17:26] serviceops, Prod-Kubernetes, Kubernetes: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times - https://phabricator.wikimedia.org/T270271 (JMeybohm)
[17:17:28] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:17:30] serviceops, Prod-Kubernetes, Kubernetes: Support multiple kubernetes versions with puppet - https://phabricator.wikimedia.org/T278329 (JMeybohm)
[17:17:34] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:17:41] serviceops, Prod-Kubernetes, Kubernetes: Migrate from command line flags to config files for kubernetes components - https://phabricator.wikimedia.org/T300499 (JMeybohm)
[17:17:43] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:17:48] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:18:01] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:18:07] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to a security supported (LTS) version - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:19:20] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[17:20:32] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters to v1.16 - https://phabricator.wikimedia.org/T244335 (JMeybohm)
[17:20:37] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Upgrade kubernetes clusters v1.16 - https://phabricator.wikimedia.org/T244335 (JMeybohm) Open→Resolved
[17:21:58] serviceops, Prod-Kubernetes, Kubernetes: Update Kubernets clusters to v1.23 - https://phabricator.wikimedia.org/T307943 (JMeybohm)
[20:22:38] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests, Release-Engineering-Team (Radar): Need a service account on deploy servers for automated train pre-sync operations - https://phabricator.wikimedia.org/T303857 (thcipriani) >>! In T303857#7901981, @Joe wrote: > To give some cont...