[07:24:35] serviceops, Thumbor, Patch-For-Review: Revisit thumbor's poolcounter integration - https://phabricator.wikimedia.org/T338297 (Joe) p:Triage→Low Low priority as we've found out the problem is fundamentally deeper than we'd like.
[07:28:17] serviceops, MW-on-K8s, TimedMediaHandler, Patch-For-Review, Video: Port videoscaling to kubernetes - https://phabricator.wikimedia.org/T355292 (Joe) >>! In T355292#9468360, @TheDJ wrote: > Related: T105951, T155114, T292322 At least as far as this task is concerned, T292322 isn't a problem -...
[07:35:04] serviceops, MW-on-K8s, TimedMediaHandler, Video: Move video transcoding to use Shellbox - https://phabricator.wikimedia.org/T356241 (Joe)
[07:46:17] serviceops, MW-on-K8s, TimedMediaHandler, Video: Convert TimedMediaHandler to use BoxedCommand/Shellbox - https://phabricator.wikimedia.org/T356242 (Joe)
[07:46:36] serviceops, MW-on-K8s, TimedMediaHandler, Video: Move video transcoding to use Shellbox - https://phabricator.wikimedia.org/T356241 (Joe) p:Triage→High
[07:47:09] serviceops, MW-on-K8s, TimedMediaHandler, Video: Move video transcoding to use Shellbox - https://phabricator.wikimedia.org/T356241 (Joe)
[07:47:41] serviceops, MW-on-K8s, TimedMediaHandler, Video: Convert TimedMediaHandler to use BoxedCommand/Shellbox - https://phabricator.wikimedia.org/T356242 (Joe) p:Triage→High
[08:50:46] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (Gehel) Discussion with @Joe : no objection to enabling compaction as long as w...
[09:39:02] serviceops, MW-on-K8s, Quality-and-Test-Engineering-Team, SRE: Move testwiki over to mw-on-k8s - https://phabricator.wikimedia.org/T355534 (Clement_Goubert) p:Triage→Medium
[09:39:33] serviceops, Infrastructure-Foundations, SRE, Patch-For-Review: httpbb needs to be setup on cumin1002 and removed from cumin1001 - https://phabricator.wikimedia.org/T356054 (Clement_Goubert) p:Triage→Medium
[10:26:16] 👋 I have a patch for mobileapps deployment charts in case anyone is available for a quick review: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/994215
[10:29:10] <_joe_> nemo-yiannis: let me take a look
[10:29:52] <_joe_> nemo-yiannis: I can +1 that but I don't really know what that change entails :)
[10:30:08] there was a typo from my end when defining the config entry
[10:30:13] tpl/req
[10:48:19] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1425.eqiad.wmnet with OS bullseye
[10:48:42] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1423.eqiad.wmnet with OS bullseye
[10:49:02] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin2002 for host mw1424.eqiad.wmnet with OS bullseye
[11:03:50] <_joe_> nemo-yiannis: you don't need our +1 for such changes btw
[11:04:05] ok
[11:04:15] <_joe_> you can just get it from a teammate, maybe we should craft an explicit policy about this
[11:04:46] <_joe_> it's clearly not a patch that changes how the chart works, it's application-specific, and you need to act self-serve as much as possible
[11:24:25] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1425.eqiad.wmnet with OS bullseye completed: - mw1425 (**PA...
[11:27:04] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1423.eqiad.wmnet with OS bullseye completed: - mw1423 (**PA...
[11:29:57] serviceops, MW-on-K8s, Patch-For-Review: Move servers from the appserver/api cluster to kubernetes - https://phabricator.wikimedia.org/T351074 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin2002 for host mw1424.eqiad.wmnet with OS bullseye completed: - mw1424 (**PA...
[11:44:37] serviceops, MW-on-K8s, MediaWiki-Platform-Team (Radar), Patch-For-Review: mcrouter daemonset on mw-on-k8s - https://phabricator.wikimedia.org/T346690 (jijiki) >>! In T346690#9382549, @Clement_Goubert wrote: > I think we can also set it in the php-fpm pool conf like > ` > env[MCROUTER_SERVER] = $M...
[12:23:31] serviceops, MW-on-K8s, SRE, Traffic, and 2 others: Move 40% of mediawiki external requests to mw on k8s - https://phabricator.wikimedia.org/T355532 (Clement_Goubert)
[12:55:51] serviceops, MediaWiki-Internationalization, MediaWiki-extensions-General, WMF-General-or-Unknown, and 4 others: Update footer links to direct to proper locations on Foundation Governance Wiki - https://phabricator.wikimedia.org/T331680 (Varnent) The last step is checking and fixing local overridi...
[13:44:53] serviceops, Content-Transform-Team, Parsoid, Patch-For-Review, Wikimedia-production-error: TypeError: Argument 4 passed to Wikimedia\Parsoid\Utils\Title::__construct() must be of the type string, null given, called in /srv/mediawiki/php-1.42.0-wmf.15/ve... - https://phabricator.wikimedia.org/T356024
[13:54:52] serviceops, Prod-Kubernetes, observability, Kubernetes, Patch-For-Review: Increase visibility of container/pod ressource exhaustion - https://phabricator.wikimedia.org/T266216 (akosiaris) Open→Resolved a:akosiaris The 2 patches linked worked just fine. There is taskmanager left to...
[13:57:03] serviceops, Prod-Kubernetes, observability, Kubernetes, Patch-For-Review: Increase visibility of container/pod ressource exhaustion - https://phabricator.wikimedia.org/T266216 (akosiaris)
[13:57:06] serviceops, ChangeProp, Kubernetes, Sustainability (Incident Followup): Investigate the iowait issues plaguing kubernetes nodes since 2020-05-29 - https://phabricator.wikimedia.org/T255975 (akosiaris)
[13:57:21] serviceops, ChangeProp, Prod-Kubernetes, SRE-Sprint-Week-Sustainability-March2023, and 2 others: Raise an alarm on container restarts/OOMs in kubernetes - https://phabricator.wikimedia.org/T256256 (akosiaris) Open→Resolved Patches reviewed and merged, I had some followup patches in T26621...
[14:43:23] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) I'm going to go ahead, and go with solution 1. As there's so strong...
[14:55:28] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) Looking at the [[ https://thanos.wikimedia.org/graph?g0.expr=sum(ka...
[14:58:50] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) ` brouberol@kafka-main1003:~$ kafka configs --entity-type topics --...
[15:03:39] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) The topic is so small, the effect of compaction went completely unr...
[15:29:08] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) a:brouberol
[15:29:19] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) ` brouberol@kafka-main1003:~$ kafka configs --entity-type topics --...
[15:33:42] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Reedy) >>! In T319432#9234734, @MoritzMuehlenhoff wrote: >>>! In T319432#9234692, @kostajh wrote: >> @Krinkle this task is marked as stalled, is it blocke...
[16:22:08] serviceops, CirrusSearch, Discovery-Search, Data-Platform-SRE (2024.01.22 - 2024.02.11): Requesting permission to enable kafka log compaction for page_rerender on kafka-main - https://phabricator.wikimedia.org/T354794 (brouberol) Open→Stalled This is blocked until the next codfw -> eqiad...
[16:32:19] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Jdforrester-WMF) Yup, AIUI that for simplicity it's been decided that this is blocked by {T290536}; should we mark that formally in Phab?
[16:37:09] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Reedy) Yeah, otherwise it's unclear what/why this is stalled
[16:39:46] serviceops, Dumps-Generation, MediaWiki-Platform-Team: Migrate WMF production from PHP 7.4 to PHP 8.1 - https://phabricator.wikimedia.org/T319432 (Jdforrester-WMF)
[16:39:54] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Jdforrester-WMF)
[16:51:56] serviceops, SRE: Migrate MW appservers to bullseye - https://phabricator.wikimedia.org/T356293 (Jdforrester-WMF)
[16:53:00] serviceops, SRE: Migrate MW appservers to bullseye - https://phabricator.wikimedia.org/T356293 (Jdforrester-WMF)
[16:53:11] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Serve production traffic via Kubernetes - https://phabricator.wikimedia.org/T290536 (Jdforrester-WMF)
[16:53:19] serviceops, SRE: Migrate MW appservers to bullseye - https://phabricator.wikimedia.org/T356293 (Jdforrester-WMF) Open→Stalled
[17:05:01] serviceops, SRE: confd setup left without configuration doesn't stop confd - https://phabricator.wikimedia.org/T356296 (Volans)
[20:00:22] mutante: eoghan has setup the rsync replication for Gerrit LFS data ! :)
[20:02:42] hashar: I heard/saw the patch, very good:)
[20:03:53] hashar: the fixes to the soy templates are deployed.. laters
[20:09:47] serviceops, SRE: Migrate MW appservers' base images to bullseye - https://phabricator.wikimedia.org/T356293 (Jdforrester-WMF)
[21:14:29] serviceops, SRE: Migrate MW appservers' base images to bullseye - https://phabricator.wikimedia.org/T356293 (MoritzMuehlenhoff) FWIF, the PHP packages for this are already available in the component/php74 (and used on some initial snapshot* hosts( . They are the same version as the component/php74 we use...
[21:57:51] How does https://alerts.wikimedia.org/?q=%40state%3Dactive&q=alertname%3DSessionStoreOnNonDedicatedHost happen, and how bad/urgent is this?
[21:58:03] "Sessionstore k8s pods are running on hosts that don't have the 'kask' taint"