[06:13:37] How can I see what's wrong with pending-upgrade in codfw for machinetranslation? https://grafana.wikimedia.org/d/UT4GtK3nz/helm-releases?var-site=codfw&var-cluster=k8s&var-namespace=machinetranslation&orgId=1 - retrying the deployment didn't work for me.
[07:21:35] Anyone? ^
[08:03:47] serviceops, MediaWiki-REST-API, Platform Team Initiatives (MW REST API in PHP): Move CORE REST API to be served from the MW API Cluster - https://phabricator.wikimedia.org/T246002 (Aklapper) Adding missing #mediawiki-rest-api code project tag as #platform-team-initiative-mw-rest-api team tag is archi...
[08:28:19] kart_: checking
[08:30:25] I would assume some deployment was prematurely aborted
[08:32:25] jayme: For codfw, yes. But nothing shows up in the other logs.
[08:32:46] I tried redoing the deployment, but it didn't start.
[08:33:02] yeah, I think it left helm in an unclear state
[08:33:35] Is there any documented way to fix it? I would love to update that somewhere on Wikitech too :)
[08:34:27] not really, I looked at "helm -n machinetranslation history production" which listed the last revision (23) as deployed and the current (24) as pending-upgrade
[08:34:42] I now did a manual rollback to 23: helm -n machinetranslation rollback production 23
[08:34:55] you may try to deploy again now
[08:35:02] Thanks!
[08:35:05] yw
[08:38:54] jayme: Thanks a lot. Seems fixed!
[08:43:07] cool
[08:44:29] bottom line is: The upgrade process is partly controlled by the client (helmfile/helm). If that gets terminated before the deploy is finished or rolled back, it's possible that it ends up in a limbo state
[08:46:39] Noted!
[08:48:28] What could be the reason that I can't run the history command?
[08:48:41] kartik@deploy2002:/srv/deployment-charts/helmfile.d/services/machinetranslation$ helm -n machinetranslation history production
[08:48:41] WARNING: Kubernetes configuration file is group-readable. This is insecure.
Location: /etc/kubernetes/machinetranslation-codfw.config
[08:48:41] Error: query: failed to query with labels: secrets is forbidden: User "machinetranslation" cannot list resource "secrets" in API group "" in the namespace "machinetranslation"
[09:02:42] ah, yeah. You need deploy user permissions to read secrets: export KUBECONFIG=/etc/kubernetes/XYZs-deploy-codfw.config
[10:01:35] serviceops, SRE, API Platform (RESTbase Deprecation Roadmap), Patch-For-Review: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (Mvolz)
[11:28:37] jayme: Thanks!
[12:41:25] serviceops, Growth-Team, Growth-Team-Filtering, MW-on-K8s, Notifications: Broken (empty) cross-wiki notification when using $wgLocalHTTPProxy (e.g. on Kubernetes) - https://phabricator.wikimedia.org/T223413 (Kizule) I wanted to create a new task, but I think that this one is actually the same...
[14:10:22] serviceops, SRE, API Platform (RESTbase Deprecation Roadmap), Patch-For-Review: Migrate node-based services in production to node14 - https://phabricator.wikimedia.org/T306995 (jijiki)
[14:14:03] serviceops, Data-Engineering, Event-Platform: Traffic for eventstreams-internal seems to be zero for the past months - https://phabricator.wikimedia.org/T348763 (Ottomata) Good question. I had expected Product teams to use this more often, but perhaps the ssh tunnel barrier is enough for them to nev...
[14:44:17] serviceops, MediaWiki-libs-Stats, Observability-Metrics, Patch-For-Review: Decide on default histogram buckets for MediaWiki timers - https://phabricator.wikimedia.org/T344751 (herron) Perfect, I've updated https://gerrit.wikimedia.org/r/c/operations/puppet/+/954114 to reflect this and I think wi...
[15:09:09] serviceops, GrowthExperiments-Homepage, GrowthExperiments-ImpactModule, SRE, and 2 others: RefreshUserImpactJob consumes too many file descriptors - https://phabricator.wikimedia.org/T344428 (KStoller-WMF)
[15:42:43] hi all, anyone interested in reviewing a patch to the compile_redirects function? https://gerrit.wikimedia.org/r/c/operations/puppet/+/965786
[15:47:42] effie: o/ when you have a moment in the next few days, lemme know your thoughts about https://gerrit.wikimedia.org/r/c/operations/puppet/+/965124
[15:47:58] (not urgent, just to wrap up the ores cleanup)
[16:37:14] serviceops, Abstract Wikipedia team, CX-cxserver, Citoid, and 5 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Jdforrester-WMF)
[16:38:06] serviceops, ChangeProp, EventStreams, Image-Suggestion-API, and 5 others: Migrate node-based services in production to node12 - https://phabricator.wikimedia.org/T290750 (Jdforrester-WMF) >>! In T290750#9252559, @elukey wrote: > Eventstreams has been ported to nodejs18, the last LTS. I am working...
[17:30:42] serviceops, Abstract Wikipedia team, CX-cxserver, Citoid, and 7 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Krinkle) > Platform: > mediawiki/services/example-node-api > mediawiki/services/image-suggestion-api > mediawiki/ser...
[17:49:20] serviceops, Abstract Wikipedia team, CX-cxserver, Citoid, and 7 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Jdforrester-WMF) >>! In T349118#9259044, @Krinkle wrote: >> Platform: >> mediawiki/services/example-node-api >> mediawik...
[17:49:37] serviceops, Abstract Wikipedia team, CX-cxserver, Citoid, and 7 others: Migrate node-based services in production to node18 - https://phabricator.wikimedia.org/T349118 (Jdforrester-WMF)
[18:35:39] serviceops, API Platform (RESTbase Deprecation Roadmap), Patch-For-Review: Migrate node-based services in production to node16 - https://phabricator.wikimedia.org/T308371 (Jdforrester-WMF)
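
Note on the pending-upgrade recovery discussed between 08:34 and 09:02: the two helm commands quoted there can be wrapped in a small shell helper. This is only a sketch, not a documented procedure. The function name `recover_release` is hypothetical, and it assumes KUBECONFIG already points at the deploy-user config for the cluster (the log gives only a placeholder path, /etc/kubernetes/XYZs-deploy-codfw.config).

```shell
# Sketch only: recover_release is a hypothetical helper, not part of helm or helmfile.
# Assumes KUBECONFIG was exported beforehand to the deploy-user config, since
# the service user cannot list the secrets that helm stores its history in.
recover_release() {
    # $1 = namespace, $2 = release name, $3 = last revision listed as "deployed"
    # Print the revision history first so the pending-upgrade revision is visible.
    helm -n "$1" history "$2" || return 1
    # Roll back to the last good revision; a new deploy can be attempted afterwards.
    helm -n "$1" rollback "$2" "$3"
}

# Usage with the values from the log:
# recover_release machinetranslation production 23
```

The history call runs first so that a permissions error like the one at 08:48:41 stops the helper before any rollback is attempted.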