[06:49:14] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (JMeybohm) Wow, that's quite some time. :) >>! In T284628#7329186, @Legoktm wrote: > I think as long as the most recent MW image is kept it should be...
[11:28:23] serviceops, Infrastructure-Foundations, SRE, SRE-Access-Requests, and 2 others: Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Dzahn) Open→Resolved a:Dzahn This should be resolved now.
[11:52:12] serviceops, Prod-Kubernetes, SRE, Kubernetes: Move mathoid to use TLS only - https://phabricator.wikimedia.org/T255875 (JMeybohm) Open→Resolved
[11:52:15] serviceops, Prod-Kubernetes, SRE, Kubernetes: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (JMeybohm)
[11:52:32] serviceops, Prod-Kubernetes, SRE, Kubernetes: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (JMeybohm) Open→Resolved
[12:30:16] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (Jelto) >>! In T251305#7319375, @elukey wrote: > Adding a comment in here since I am trying to figure out a similar thing (although I have way less context) for what we'll probably call `...
[14:37:34] serviceops, SRE, ops-codfw: mw2264 went down - https://phabricator.wikimedia.org/T290242 (Papaul) @Dzahn fist let us swap A1 with B1 and see if we still have the error on A1. Memory swap complete and IDRAC upgrade from 2.50 to 2.80. i will leave the task open for now until next week. thanks
[14:40:25] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) >>! In T284628#7330071, @JMeybohm wrote: > @dancy Looking at staging I see there is still a ~10h old `mediawiki-bruce` deployment and pod runni...
[16:08:20] I would like to try to deploy Toolhub to the staging cluster again today. Is there any "OMG NO! It is Friday!" objection to that?
[16:09:28] * bd808 will wait a short while before BOLD'ly continuing
[16:38:48] I think the "OMG NO! It is Friday!" usually comes after the failure happens, not before :)
[16:46:08] I think it worked. :) I can at least see deployments and replicasets and pods in my namespace now.
[16:57:42] serviceops, SRE, Toolhub, Patch-For-Review, Service-deployment-requests: New Service Request Toolhub - https://phabricator.wikimedia.org/T280881 (bd808)
[17:23:12] bd808: woot, awesome!
[17:24:40] it's so close! Getting into staging revealed that I had forgotten to add the memcache library needed in that environment. Easy enough to fix.
[17:29:48] legoktm: is there a way for a mortal like me to attach to a running container in a pod? `kubectl exec -it ...` doesn't work because the toolhub k8s user doesn't have pods/exec rights in the namespace.
[17:31:48] I don't think so :/
[17:32:03] that is going to make things hard :/
[17:32:26] I remember the linkrecommendation folks asked for that as well, I don't remember what the resolution on that was
[17:34:04] is it something simple I can run for you or?
[17:36:23] I could figure out the commands and pass them along, but "simple" is probably not the right word. Basically I need to create the initial database and things like that to bootstrap the deployment.
[17:37:51] And going forward it would be very useful to be able to use the djanog repl to investigate bugs
[17:38:06] *django
[17:38:51] hmm ok. I'll bring it up in our team meeting that I assume is moving to Tuesday
[17:39:27] I may be able to figure out how to run a container locally that is connected by ssh tunnels into the backend services, but it would be much nicer to just hop into a container in the staging cluster to do stuff like that.
[17:40:26] basically I guess I want a deployment specific mwmaint equivalent
[17:41:45] legoktm: should I write up a task so it less of a game of telephone for you bring this up with the team?
[17:42:33] that would be appreciated :)
[17:43:00] will do :)
[19:13:54] serviceops, SRE, Thumbor, User-jijiki: Upgrade Thumbor to Buster - https://phabricator.wikimedia.org/T216815 (AntiCompositeNumber)
[19:22:47] Does anyone know if logs from pods in the staging cluster are sent to logstash? Wondering if I can verify that part of the setup for Toolhub before moving into the main clusters.
[19:25:30] possible clarification: I am looking in logstash and only see events mentioning toolhub from the kubestage* hosts. I can see some log messages in the pod with `kubectl logs ...`, and I am wondering if not seeing the same logs in logstash means that the shipping is not working as hoped or if that cluster is not wired into the log shipping pipeline.
[20:18:34] serviceops, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (thcipriani)
[20:19:49] serviceops, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (thcipriani) >>! In T290209#7328337, @Dzahn wrote: > I don't know if this is the reason it was left out of the package but using npm to install software would conflict with L3...
[20:44:21] serviceops, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (Legoktm) There have been various related issues like {T286212} (maybe the same issue here) and {T284112}. My recommendation would be to stop using the stretch-based nodejs-de...
[20:48:08] serviceops, Regression, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (Legoktm) ` $ podman run --rm -it --entrypoint=bash docker-registry.wikimedia.org/nodejs-devel:0.1.1 Trying to pull docker-registry.wikimedia.org/nodejs-devel:...
[20:51:01] serviceops, Regression, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (Legoktm) a:Legoktm Yeah, so same thing as earlier: ` root@be63654e236f:/# apt-cache policy nodejs nodejs: Installed: 6.11.0~dfsg-1+wmf5 Candidate: 6.1...
[21:04:10] serviceops, Patch-For-Review, Regression, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (Legoktm) Open→Resolved ` $ podman run --rm -it --entrypoint=bash docker-registry.wikimedia.org/nodejs-devel:0.1.2 Trying to pull...
[21:05:27] I'm not sure about the logstash question
[21:28:32] legoktm: T290357 is my write up of my wish for a maintenance env for Toolhub. Let me know (or doo the needful) if you think it should have other tags, etc.
[21:31:39] thanks
[21:31:40] serviceops, Toolhub: Maintenance environment needed for running one-off commands - https://phabricator.wikimedia.org/T290357 (Legoktm)