[08:44:39] Can I echo u.random's question? data-persistence are getting notified every 4 hours about this, seemingly apropos kubernetes2039.codfw.wmnet and mw2296.codfw.wmnet
[10:00:36] Emperor: I can have a quick look
[10:08:21] serviceops, Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[10:09:41] nemo-yiannis: we're still at a slightly elevated level of 500 on restbase, would another round of purges be worth a go?
[10:10:12] already did like half an hour ago
[10:10:36] ah nice
[10:10:53] effie: thanks :)
[10:11:57] hnowlan: i am not really convinced though that we can completely eliminate errors since we don't know exactly what's left in storage
[10:13:30] meanwhile i found the missing header on PCS side and i am preparing a patch, but there is an error on MW side still to be investigated.
[10:13:58] Emperor: I would like something in return :p
[10:14:29] Emperor: are you lot still interested in moving the thanos-swift hosts to cfssl ?
[10:14:34] nemo-yiannis: ack, thanks
[10:19:16] effie: good question
[10:20:52] Emperor: is there a task we could talk ? https://phabricator.wikimedia.org/T343987 was o11y's , thus I reckon we need a new one?
[10:25:13] I think make a new one, reference that one, tag it sre-swift-storage? Honestly, my limited opinions are "as close to swift::proxy as possible" and "whatever is standard" :) thanos-fe* and swift-fe* have diverged, and I'd like to reduce that divergence where possible
[10:26:28] * Emperor trying to remember the details from when we migrated swift fes to envoy
[10:28:15] effie: right, answering my own question, AIUI swift ms-fe nodes are using profile::tlsproxy::envoy::ssl_provider: sslcert
[10:29:04] so I think if I were to start changing thanos-fe ssl setup, I'd be more inclined to move it to sslcert to match the swift frontends unless there was a compelling reason to do otherwise
[10:29:47] ok, when you create a new task, please add me
[10:29:58] ...and if there is a compelling reason then I guess it would be better to move both thanos-fe and swift-fe to it.
[10:30:26] effie: sorry, stupid question, is there a reason I should be thinking about changing this?
[10:40:45] serviceops, Infrastructure-Foundations, SRE, Patch-For-Review: httpbb needs to be setup on cumin1002 and removed from cumin1001 - https://phabricator.wikimedia.org/T356054 (Clement_Goubert) a:Scott_French
[11:03:26] Emperor: urandom, re: kask best guess is when kask was redeployed, the pods were not all redeployed on dedicated kask nodes (meaning nodes that have the dedicated=kask:NoExecute taint, and yes the syntax is counterintuitive) because of some condition conflict that I haven't found yet
[11:04:06] I can try and kill one of those misplaced pods to see if it goes back to the right node, and I guess we may need to add another dedicated kask node
[11:04:44] thanks
[11:08:41] claime: I was looking into it already, I just had a meeting in between
[11:08:47] ah sorry
[11:09:27] I didn't see your message, I really need to fix these term colors
[11:10:13] no problem
[11:15:04] err, sorry, yes I should have spotted that too. More caffeine required...
[11:18:55] serviceops, Infrastructure-Foundations, Puppet-Core, SRE, and 5 others: Migrate roles to puppet7 - https://phabricator.wikimedia.org/T349619 (MoritzMuehlenhoff)
[11:27:29] Emperor: cfssl is our default PKI and will simplify your cert handling compared to cergen. also cergen is unowned at this point and even making it compatible with buster was a pain: https://phabricator.wikimedia.org/T235405
[11:28:13] moritzm: how does sslcert fit in?
[11:29:03] it is what we are migrating away from
[11:29:23] pah, I only just moved ms-fe to it last quarter
[11:57:38] Emperor: I have to pop out for a doc appt, I will keep looking into it later
[12:00:11] TY :)
[12:16:51] serviceops, iPoid-Service, Patch-For-Review, Service-deployment-requests, Trust and Safety Product Sprint: New Service Request 'iPoid' - https://phabricator.wikimedia.org/T325147 (kostajh)
[12:17:09] serviceops, iPoid-Service (iPoid 1.0): Define service level indicators and service level objectives - https://phabricator.wikimedia.org/T348935 (kostajh)
[12:18:40] serviceops, iPoid-Service, Patch-For-Review, Service-deployment-requests, Trust and Safety Product Sprint: New Service Request 'iPoid' - https://phabricator.wikimedia.org/T325147 (kostajh) Open→Resolved a:kostajh Closing this; future work can tag #ipoid-service.
[14:34:47] effie: I've created T356412 re the thanos/ms SSL question
[14:39:45] <_joe_> uhm where is wikibugs
[14:42:02] it's responding to CTCP at least
[15:16:45] serviceops, Content-Transform-Team-WIP, Wikimedia-production-error: TypeError: Argument 4 passed to Wikimedia\Parsoid\Utils\Title::__construct() must be of the type string, null given, called in /srv/mediawiki/php-1.42.0-wmf.15/vendor/wikimedia/parsoid/src/Util... - https://phabricator.wikimedia.org/T356024
[15:30:48] serviceops, Maps, Patch-For-Review: Repool maps primaries in Kartotherian - https://phabricator.wikimedia.org/T355892 (hnowlan) In progress→Resolved
[15:48:42] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2448.codfw.wmnet with OS bullseye
[15:48:49] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2449.codfw.wmnet with OS bullseye
[15:48:52] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw2447.codfw.wmnet with OS bullseye
[16:26:40] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2448.codfw.wmnet with OS bullseye completed: - mw2448 (**PA...
[16:30:12] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2447.codfw.wmnet with OS bullseye completed: - mw2447 (**PA...
[16:33:15] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw2449.codfw.wmnet with OS bullseye completed: - mw2449 (**PA...
[18:08:44] serviceops, Content-Transform-Team, Maintenance-Worktype, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324 (jijiki)
[18:09:32] serviceops, Content-Transform-Team, Maintenance-Worktype, Wikimedia-Incident: Maps Unavailability due to thanos-swift cfssl rollout (14 Aug 2023) - https://phabricator.wikimedia.org/T344324 (jijiki) In progress→Stalled Stalled until T356412 is picked up by #data-persistence
[18:40:54] serviceops, MW-on-K8s, MediaWiki-Platform-Team (Radar), Patch-For-Review: mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690 (jijiki) Open→Stalled
[18:41:04] serviceops, SRE: Memcached, mcrouter in MediaWiki on Kubernetes - https://phabricator.wikimedia.org/T277711 (jijiki)