[05:57:18] serviceops, Dumps-Generation, Patch-For-Review, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Joe)
[07:30:31] serviceops, Dumps-Generation, Performance-Team (Radar): Remove php 7.2 from production - https://phabricator.wikimedia.org/T318894 (Joe)
[08:58:52] serviceops, Release-Engineering-Team, docker-pkg: docker-pkg / docker downloads all versions of parent image upon building - https://phabricator.wikimedia.org/T310458 (Clement_Goubert) Actions needed: [X] Push 3.0.3 tag pointing to 66b22ed50 Release 3.0.3 [] Update submodule in docker-pkg-deploy
[10:03:39] serviceops, DBA, Phabricator, serviceops-collab, and 2 others: sort out mysql privileges for phab1004/phab2002 - https://phabricator.wikimedia.org/T315713 (Marostegui) @Dzahn I think this is now sorted. It was a big mess, so if you can do some good double checking I would appreciate it.
[10:27:02] serviceops, MediaWiki-Releasing, PHP 7.2 support, PHP 7.3 support, Patch-For-Review: Drop PHP 7.2 & 7.3 support from MediaWiki master branch, once Wikimedia production is on 7.4 - https://phabricator.wikimedia.org/T261872 (taavi) Policy-required wikitech-l message: https://lists.wikimedia.org...
[10:55:52] serviceops, Dumps-Generation, Patch-For-Review, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Lucas_Werkmeister_WMDE)
[10:56:33] serviceops, SRE: Undeploy patch to use old PHP serialization in PHP 7.4 - https://phabricator.wikimedia.org/T318918 (taavi)
[10:57:31] serviceops, SRE: Undeploy patch to use old PHP serialization in PHP 7.4 - https://phabricator.wikimedia.org/T318918 (Lucas_Werkmeister_WMDE)
[11:02:31] serviceops, Continuous-Integration-Infrastructure, SRE: Undeploy patch to use old PHP serialization in PHP 7.4 - https://phabricator.wikimedia.org/T318918 (Lucas_Werkmeister_WMDE) Adding #continuous-integration-infrastructure (or should it be #continuous-integration-config?) since the patched PHP is...
[11:48:27] serviceops, Dumps-Generation, Patch-For-Review, Performance-Team (Radar): Migrate WMF production from PHP 7.2 to PHP 7.4 - https://phabricator.wikimedia.org/T271736 (Ladsgroup) awarding token is not enough: <3 <3 <3 <3 <3 <3
[12:43:33] serviceops, MediaWiki-Releasing, PHP 7.2 support, PHP 7.3 support, Patch-For-Review: Drop PHP 7.2 & 7.3 support from MediaWiki master branch, once Wikimedia production is on 7.4 - https://phabricator.wikimedia.org/T261872 (Jdforrester-WMF) >>! In T261872#8271964, @taavi wrote: > Policy-requir...
[12:50:14] serviceops, Release Pipeline: Clean-up / delete old versions of service pipeline created docker images from the public docker registry? - https://phabricator.wikimedia.org/T307797 (Jdforrester-WMF) >>! In T307797#8272133, @akosiaris wrote: > I think this is a duplicate of T242604. Unless someone objects,...
[14:13:25] serviceops, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install kubernetes102[34] - https://phabricator.wikimedia.org/T313873 (Jclark-ctr) kubernetes1023 c6 u42 port 42 cableid 23000039 reseated cable on 1024 it has light now
[14:13:35] serviceops, DC-Ops, SRE, ops-eqiad: Q1:rack/setup/install kubernetes102[34] - https://phabricator.wikimedia.org/T313873 (Jclark-ctr) a:Jclark-ctr→Cmjohnson
[14:21:00] serviceops, MediaWiki-Releasing, PHP 7.2 support, PHP 7.3 support, Patch-For-Review: Drop PHP 7.2 & 7.3 support from MediaWiki master branch, once Wikimedia production is on 7.4 - https://phabricator.wikimedia.org/T261872 (Jdforrester-WMF) Stalled→Resolved a:Jdforrester-WMF
[14:22:08] serviceops, Patch-For-Review: Ensure wikimedia::memcached role bootstraps cleanly - https://phabricator.wikimedia.org/T318697 (Clement_Goubert) In progress→Resolved
[14:22:10] serviceops: Ensure that all appserver-related roles can be cleanly applied on bootstrap - https://phabricator.wikimedia.org/T318671 (Clement_Goubert)
[14:24:37] serviceops: Ensure that all appserver-related roles can be cleanly applied on bootstrap - https://phabricator.wikimedia.org/T318671 (Clement_Goubert) `memcached` role works fine on pontoon. Onto `configcluster`. After battling with certificates for the etcd cluster, joe dug up https://gerrit.wikimedia.org/r...
[14:26:08] serviceops: Ensure configcluster bootstraps cleanly - https://phabricator.wikimedia.org/T318699 (Clement_Goubert) Trying to replicate https://gerrit.wikimedia.org/r/c/operations/puppet/+/668701 on pontoon for etcd server bootstrap.
[14:39:08] hello folks, I posted a change to coredns' chart, if anybody has time https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/836811 - I am not happy about doing it, but for the moment Istio doesn't collaborate very well with ml-serve clusters :D
[14:39:35] feedback etc.. welcome, lemme know if it is ok or if you prefer something else
[15:21:36] <_joe_> elukey: given it's a noop on all other clusters, it seems acceptable to me
[15:22:08] ack thanks!
[15:23:45] <_joe_> although
[15:23:55] <_joe_> what do you want to add?
[15:24:36] atm I have only tested the following
[15:24:36] rewrite continue {
[15:24:37] ttl exact cluster-local-gateway.istio-system.svc.cluster.local. 30
[15:24:40] }
[15:24:44] and the pressure to coredns pods went down a lot
[15:24:55] https://grafana.wikimedia.org/d/-sq5te5Wk/kubernetes-dns?orgId=1&var-dc=codfw%20prometheus%2Fk8s-mlserve&from=1664444026829&to=1664449949744
[15:25:14] there is probably some obscure way to inject this value into all envoy configs, but I didn't find it yet
[15:25:24] and istio upstream is not clear what to patch/fix
[15:28:10] we have also applied a change to all pods to decrease resolv.conf's ndots from 5 to 2, that helped as well
[15:28:35] https://grafana.wikimedia.org/d/-sq5te5Wk/kubernetes-dns?orgId=1&var-dc=eqiad%20prometheus%2Fk8s-mlserve&from=1664358263069&to=1664386128556
[15:28:43] (yeah it is crazy I know)
[15:29:36] my hope is to find a good setting to avoid 5s ttls in istio/envoy, but so far didn't find one
[15:48:41] <_joe_> I am quite sure of where it is in envoy's config
[15:48:56] <_joe_> as in, I've seen it
[15:49:01] <_joe_> but also
[15:49:17] <_joe_> how can you have 100s of envoy sidecars with so little stuff running on the cluster
[15:49:26] * _joe_ perplexed
[15:49:51] <_joe_> OTOH I have no idea if istio allows you to tune that TTL
[16:01:41] sadly we have one pod for each ores model, and it is now around ~170 pods (including system ones)
[16:02:25] most of them trying to refresh every 5s various endpoints etc..
[16:02:58] istio doesn't force envoy, which respects dns ttls in our version (previously there was a specific setting for dns ttls)
[16:04:15] and all that dns traffic hammers the coredns pods
[16:07:48] there are plans in the future to deprecate ores models as well, replacing them with fewer ones
[16:07:57] buuut for the moment we have to maintain them :)
[16:08:40] quick workout, back in ~40
[16:22:00] serviceops, Parsoid: Parsoid migration: Cleanup - https://phabricator.wikimedia.org/T318946 (Clement_Goubert)
[16:22:47] serviceops, Parsoid: Parsoid migration: Cleanup - https://phabricator.wikimedia.org/T318946 (Clement_Goubert) p:Triage→Medium
[16:38:09] back
[17:33:46] serviceops, Observability-Metrics, Kubernetes: Don't scrape every containerPort for metrics - https://phabricator.wikimedia.org/T318707 (bking)
[17:35:47] ^^ for the above ticket, do we only care about the services defined in deployment-charts/helmfile.d/services repo path?
[17:44:12] serviceops: Put mw14[57-98] in production - https://phabricator.wikimedia.org/T313327 (Cmjohnson)
[17:44:27] serviceops, DC-Ops, SRE, ops-eqiad: Q4: (Need By: TBD) rack/setup/install mw14[57-98] - https://phabricator.wikimedia.org/T306121 (Cmjohnson) Open→Resolved updated their status.
[17:46:08] lunch, back in ~45
[18:38:08] back
[19:51:53] serviceops, Observability-Metrics, Kubernetes: Don't scrape every containerPort for metrics - https://phabricator.wikimedia.org/T318707 (bking) Speaking from a position of almost total ignorance: Do we only care about the pods spawned by [[ https://github.com/wikimedia/operations-deployment-charts/t...
[21:15:51] serviceops, Release Pipeline: Clean-up / delete old versions of service pipeline created docker images from the public docker registry? - https://phabricator.wikimedia.org/T307797 (akosiaris) >>! In T307797#8272360, @Jdforrester-WMF wrote: >>>! In T307797#8272133, @akosiaris wrote: >> I think this is a d...
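
Note on the 15:28:10 message (ndots lowered from 5 to 2): a rough sketch of why that can reduce load on coredns, assuming a glibc-style stub resolver. A name with fewer dots than `ndots` is first tried against every search domain, and only then as given. The search list and the `example-ns` namespace below are assumptions for illustration, not values taken from the ml-serve clusters, and `candidate_queries` is a hypothetical helper, not part of any resolver library.

```python
def candidate_queries(name, search_domains, ndots):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search-list expansion.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, search list as fallback.
        return [absolute] + expanded
    # Too few dots: walk the search list first, the name as given last.
    return expanded + [absolute]


# Assumed pod search list; the namespace is a placeholder.
SEARCH = ["example-ns.svc.cluster.local", "svc.cluster.local", "cluster.local"]
# Hostname from the rewrite rule in the log, without the trailing dot (4 dots).
NAME = "cluster-local-gateway.istio-system.svc.cluster.local"

for ndots in (5, 2):
    queries = candidate_queries(NAME, SEARCH, ndots)
    position = queries.index(NAME + ".") + 1
    print(f"ndots={ndots}: name as given is attempt {position} of {len(queries)}")
```

Under these assumptions the name as given is the last candidate with ndots 5 and the first with ndots 2. A real resolver stops at the first successful answer, so the earlier search-list candidates are the queries that get saved, typically once each for A and AAAA.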