[06:56:24] serviceops, SRE: Clean up old Docker images on deneb - https://phabricator.wikimedia.org/T287222 (razzi) My home directory's down to 2.5M now too :)
[08:48:22] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (JMeybohm) Overall, the steps you listed seem pretty comprehensive already. Ideally, all this would be handled by the still-to-be-written k8s maintenance cookbook (T277677). For downtime...
[09:16:09] serviceops, Maps, Patch-For-Review: tegola-vector-tiles doesnt execute new tile pregeneration jobs - https://phabricator.wikimedia.org/T295290 (JMeybohm) >>! In T295290#7490049, @Jgiannelos wrote: > Yeah I was wondering if there is some sort of dependency between containers in a pod. Thanks will add...
[10:11:20] serviceops, Prod-Kubernetes, Wikidata, Wikidata-Query-Service, and 2 others: Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes - https://phabricator.wikimedia.org/T293063 (JMeybohm) @dcausse IIRC we said that "something in the areas of hours" would be co...
[10:56:33] serviceops, Prod-Kubernetes, Wikidata, Wikidata-Query-Service, and 2 others: Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes - https://phabricator.wikimedia.org/T293063 (dcausse) >>! In T293063#7491903, @JMeybohm wrote: > @dcausse IIRC we said that "so...
[11:06:51] serviceops, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: 502 Server Hangup Error on esams for "Upload a new version of this file" on Special:Upload on Commons - https://phabricator.wikimedia.org/T247454 (Aklapper) Seeing T247454, T284974, T239382, T273032, T295343, this...
[11:57:07] serviceops, CFSSL-PKI, Infrastructure-Foundations, Prod-Kubernetes, and 2 others: Automate issuing of TLS certificates in kubernetes clusters - https://phabricator.wikimedia.org/T294560 (JMeybohm)
[13:25:45] serviceops, Maps, Patch-For-Review: tegola-vector-tiles doesnt execute new tile pregeneration jobs - https://phabricator.wikimedia.org/T295290 (Jgiannelos) Unfortunately deleting the pod didn't do the trick. From kubernetes events: ` 2m30s Warning FailedNeedsStart cronjob/tegola-vector-tile...
[13:44:34] serviceops, SRE, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (Jelto) Thanks for the feedback. Then instead of using `sre.switchdc.services` I would like to depool the services using conftool directly. I queried `confctl` and checked what services...
[15:37:48] Hi, is there anyone available to help us with https://phabricator.wikimedia.org/T295290? It looks like new tasks are still not spawning.
[15:38:50] nemo-yiannis: I'll have a look in a minute
[15:38:57] thanks!
[15:38:57] I am around too
[15:47:49] nemo-yiannis: I would assume that deploying https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/737665 would fix this situation as well, as the cronjob object itself is updated
[15:48:15] but to be sure, I can totally delete the cronjob prior to your deploy
[15:50:13] I am not sure how this works internally but I assume that since there was no job success since we killed the pod the cronjob won't spawn new instances
[15:50:24] Let's delete the cronjob just to be on the safe side
[15:50:34] ack
[15:50:41] +1'ed your change as well
[15:51:49] nemo-yiannis: deleted in all 3 clusters
[15:51:54] Thanks!
[15:53:10] What do you think about the docs I pasted? I hope I understood correctly how pods fail with restartPolicy
[15:56:11] tbh, it's not 100% clear to me, but one of the possible interpretations is yours :)
[15:56:52] yup I am not sure either, let's see :)
[15:57:03] the worst thing that might happen is that the/a container of your job pod is restarted once (backoffLimit=1) so no harm will be done
[15:57:34] restarted once and then still hang in a state where envoy is running and the job is failing, I mean
[16:19:12] new tasks are being spawned so we should be OK for now
[16:30:18] okay. Let us know if things go south again
[16:34:31] serviceops, Maps, Patch-For-Review: tegola-vector-tiles doesnt execute new tile pregeneration jobs - https://phabricator.wikimedia.org/T295290 (JMeybohm) Open→Resolved a:JMeybohm As said on IRC I've deleted the CronJob objects in all 3 clusters and @Jgiannelos re-deployed them with https:...
[17:13:08] serviceops, Prod-Kubernetes, Kubernetes: Provision TLS certificates for k8s services in istio-system namespace - https://phabricator.wikimedia.org/T295385 (JMeybohm)
[18:12:21] serviceops, Shellbox, User-brennen, Wikimedia-production-error: Shellbox\ShellboxError: Shellbox server returned status code 503 - https://phabricator.wikimedia.org/T292663 (Legoktm) a:Legoktm
[18:58:22] serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dzahn) https://wikipedia.org/.well-known/assetlinks.json
[20:07:41] serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dzahn) >>! In T294776#7489343, @Dbrant wrote: > @Dzahn Does this kind of change get merged/deployed in the same way as...
[21:04:06] serviceops, Wikipedia-Android-App-Backlog (Android Release FY2021-22): Create and host assetlinks.json file. (Android 12 deeplinking support) - https://phabricator.wikimedia.org/T294776 (Dzahn) deployed to prod servers (which refreshes apache on all of them) carefully via: - disabled puppet on mw* - re-...
[22:50:14] serviceops, Infrastructure-Foundations, SRE: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)
[22:50:46] serviceops, SRE, Znuny: rename OTRS role/module/cumin aliases - https://phabricator.wikimedia.org/T293942 (Dzahn)
[22:50:55] serviceops, Infrastructure-Foundations, SRE, Znuny: upgrade/replace VRTS (formerly ORTS) buster to bullseye - https://phabricator.wikimedia.org/T295416 (Dzahn)