[03:11:09] serviceops, MediaWiki-Uploading, SRE, Traffic: Unexpected upload speed to commons - https://phabricator.wikimedia.org/T288481 (Krinkle)
[07:30:11] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (JMeybohm) >>! In T284628#7144976, @JMeybohm wrote: > The default timeout is 2min here, unfortunately that is not configurable for pull only but for al...
[08:06:34] serviceops, Analytics, Analytics-Kanban, Prod-Kubernetes, and 2 others: Move eventgate services to use TLS only - https://phabricator.wikimedia.org/T255871 (JMeybohm) Open→Resolved I see you've deployed all eventgates, thanks! Resolving this
[08:06:40] serviceops, Prod-Kubernetes, SRE, Kubernetes: Add TLS termination to services running on kubernetes - https://phabricator.wikimedia.org/T235411 (JMeybohm)
[11:18:18] serviceops, Parsoid, Release-Engineering-Team, SRE-Access-Requests, Performance-Team (Radar): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Krinkle)
[11:23:19] serviceops, Parsoid, Release-Engineering-Team, SRE-Access-Requests, Performance-Team (Radar): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Urbanecm) I support this. After all, any deployer already has sufficient access to SSH in via the `mwdeploy` syste...
[11:24:09] serviceops, Parsoid, Release-Engineering-Team, SRE-Access-Requests, Performance-Team (Radar): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Krinkle)
[11:55:08] serviceops, SRE, Datacenter-Switchover: Use encrypted rsync for releases - https://phabricator.wikimedia.org/T289858 (Dzahn) ACK, alright!
[12:45:40] serviceops, Parsoid, SRE, SRE-Access-Requests, Sustainability (Incident Followup): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Krinkle)
[12:53:17] serviceops, Parsoid, SRE, SRE-Access-Requests, Sustainability (Incident Followup): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Krinkle)
[13:21:46] trying to deploy miscweb staging and I see "Status: Running" in "get pods" and with "kubectl get logs" I see my service is starting.. just that it always stays "READY 0/1" and after waiting 10 minutes, eventually it just times out and rolls back.. but why... if it already runs
[13:23:04] mutante: sounds like it's failing its configured health checks
[13:23:07] as jelto showed me the readyness probe fails because https vs http
[13:23:21] majavah: seems right, yes, thank you
[13:27:21] mutante: helpful dashboard in this case: https://logstash.wikimedia.org/goto/14ec6de21fbc547cd4a2b199f6de8313
[13:28:40] aha, thanks jayme
[14:48:27] effie: if you have an ideas about how I should resolve https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/714867/5..7/helmfile.d/services/toolhub/values.yaml#35 I would very much appreciate them. I'm not sure if exposing the listener or picking a different one is the better way out.
[14:49:11] If there is someone else who would be a better advisor on this I'm happy to have a referral too.
[14:51:02] bd808: I have had a chance to take a look yet I am afraid, maybe akosiaris or jayme could shed some light
[14:51:11] I haven't had *
[14:51:21] sigh, sorry, it is getting late here
[14:51:44] typing human words is always hard for my fingers
[14:52:10] that is why grunting is a higher form of communication !
[15:02:17] effie: I asked the search folks and they said I should use their chi cluster. So hopefully mischief managed.
[15:02:37] it is in the list of listeners mw is using
[15:02:41] that I know
[15:03:30] if it is used, I cant say for sure :p
[15:04:53] it is. there are 3 sub-clusters for cirrussearch which is a bit like the database sections. Apparently the "extra" indices like the ones for Phabricator are in the chi cluster.
[15:07:01] ah I see
[15:07:53] there is a vague future plan for a 4th cluster to hold all the not-cirrus things
[15:08:43] ok, do they have a suggestion as to which listener you should choose?
[15:09:06] search-chi-{codfw,eqiad}
[15:09:17] ok, have a go then
[15:09:22] and let's see what happens
[15:10:54] serviceops, Parsoid, SRE, SRE-Access-Requests, Sustainability (Incident Followup): Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Legoktm) +1 to granting permissions like normal appservers, this seems like an oversight once Parsoid moved to PHP and is now...
[15:21:46] serviceops, Infrastructure-Foundations, Parsoid, SRE, and 2 others: Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (jijiki)
[15:57:29] serviceops, MW-on-K8s, Kubernetes: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (dancy) >>! In T284628#7320141, @dancy wrote: >>>! In T284628#7319705, @Legoktm wrote: >> A timer is simpler, we could just pull `:latest` every minute...
[16:07:41] serviceops, Infrastructure-Foundations, Parsoid, SRE, and 2 others: Deployers unable to ssh to parse* hosts - https://phabricator.wikimedia.org/T290144 (Dzahn) What Lego said, access should mimick what we do with regular appservers.
[16:43:40] serviceops, MW-on-K8s, Kubernetes, Patch-For-Review: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (jijiki) To sum up a bit, as already mentioned, we could have a timer pulling the most recent image based on a tag pattern (and a...
[17:01:55] serviceops, MW-on-K8s, Kubernetes, Patch-For-Review: Kubernetes timeing out before pulling the mediawiki-multiversion image - https://phabricator.wikimedia.org/T284628 (Legoktm) I put up a simple patch using `:latest` for now (I was already in the process of writing it before you commented), I th...
[18:35:02] serviceops, DC-Ops, ops-codfw: Q1:(Need By: TBD) rack/setup/install thumbor200[56].codfw.wmnet - https://phabricator.wikimedia.org/T290190 (RobH)
[18:35:14] serviceops, DC-Ops, ops-codfw: Q1:(Need By: TBD) rack/setup/install thumbor200[56].codfw.wmnet - https://phabricator.wikimedia.org/T290190 (RobH)
[18:36:03] serviceops, DC-Ops, ops-codfw: Q1:(Need By: TBD) rack/setup/install thumbor200[56].codfw.wmnet - https://phabricator.wikimedia.org/T290190 (RobH) a:Papaul
[18:40:15] serviceops, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (RobH)
[18:40:28] serviceops, DC-Ops, ops-codfw: (Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (RobH)
[18:40:55] serviceops, DC-Ops, ops-codfw: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (RobH)
[18:41:39] serviceops, DC-Ops, ops-codfw: Q1:(Need By: TBD) rack/setup/install mw241[2-9].codfw.wmnet - https://phabricator.wikimedia.org/T290192 (RobH) a:Papaul
[19:56:09] serviceops, Data-Persistence-Backup, GitLab, Patch-For-Review, User-brennen: Backups for GitLab - https://phabricator.wikimedia.org/T274463 (brennen)
[20:37:01] serviceops, DC-Ops, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[19-22] - https://phabricator.wikimedia.org/T290202 (RobH)
[20:37:09] serviceops, DC-Ops, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[19-22] - https://phabricator.wikimedia.org/T290202 (RobH)
[20:37:34] serviceops, DC-Ops, ops-eqiad: Q1:(Need By: TBD) rack/setup/install kubernetes10[19-22] - https://phabricator.wikimedia.org/T290202 (RobH) a:Jclark-ctr
[21:36:14] serviceops, SRE, Wikimedia-Site-requests, Service-deployment-requests, and 2 others: Split search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Legoktm)
[23:06:57] serviceops, SRE, Wikimedia-Site-requests, Patch-For-Review, and 3 others: Split search.wikimedia.org out of ops/mediawiki-config into separate service - https://phabricator.wikimedia.org/T289224 (Legoktm)
[23:45:10] serviceops, Release-Engineering-Team (Radar): nodejs-devel image does not contain npm - https://phabricator.wikimedia.org/T290209 (thcipriani)
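
Note on the 13:21-13:28 exchange (pod shows "Running" but stays "READY 0/1"): the sketch below illustrates why those two columns can disagree. The pod phase only says the containers were started, while READY counts containers whose readiness probe currently passes. The pod dictionary, the probe values, and the helper names `ready_column` and `probe_scheme_mismatch` are hypothetical and are not taken from the miscweb chart; the log also does not say which side was using https, so the TLS flag here is an assumption.

```python
from typing import Any, Dict


def ready_column(pod: Dict[str, Any]) -> str:
    """Build a "ready/total" string like the READY column of `kubectl get pods`.

    total = number of containers declared in the pod spec
    ready = number of container statuses whose `ready` flag is true
    """
    total = len(pod.get("spec", {}).get("containers", []))
    statuses = pod.get("status", {}).get("containerStatuses", [])
    ready = sum(1 for status in statuses if status.get("ready"))
    return f"{ready}/{total}"


def probe_scheme_mismatch(probe: Dict[str, Any], service_uses_tls: bool) -> bool:
    """Return True when an httpGet probe's scheme does not match the service.

    Kubernetes defaults httpGet.scheme to HTTP when the field is omitted.
    """
    scheme = probe.get("httpGet", {}).get("scheme", "HTTP").upper()
    expected = "HTTPS" if service_uses_tls else "HTTP"
    return scheme != expected


# Hypothetical pod: the process started (phase Running), but the readiness
# probe has never passed, so the container is not marked ready.
example_pod = {
    "spec": {"containers": [{"name": "example-service"}]},
    "status": {
        "phase": "Running",
        "containerStatuses": [{"name": "example-service", "ready": False}],
    },
}

# Hypothetical probe with no explicit scheme, i.e. plain HTTP.
example_probe = {"httpGet": {"path": "/", "port": 8080}}

if __name__ == "__main__":
    print(example_pod["status"]["phase"], ready_column(example_pod))
    print("scheme mismatch:", probe_scheme_mismatch(example_probe, service_uses_tls=True))
```

A pod in this state never becomes ready, so a deploy that waits for readiness eventually times out and rolls back, which matches what was described at 13:21:46.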