[03:32:37] serviceops, MediaWiki-Documentation, Wikimedia-Apache-configuration, Documentation, Patch-Needs-Improvement: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950#9874068 (Pppery)
[03:39:36] serviceops, MediaWiki-Documentation, Wikimedia-Apache-configuration, Documentation, Patch-Needs-Improvement: Repair "svn.wikimedia.org/doc/" redirect for doc.wikimedia.org - https://phabricator.wikimedia.org/T109950#9874070 (Pppery) I think more went inactive generally than gave up on this sp...
[06:57:58] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9874255 (brouberol)
[07:38:53] serviceops, Data-Platform-SRE, Prod-Kubernetes, Kubernetes, Patch-For-Review: Migrate charts to Calico Network Policies - https://phabricator.wikimedia.org/T359423#9874351 (brouberol)
[07:39:57] FYI, kubetcd2005 will go down for a few minutes for a reboot of the underlying Ganeti node
[07:40:21] <_joe_> somehow my mind read "kube etcd will go down"
[07:40:30] <_joe_> and I had a panic reaction :D
[07:41:12] heh :-)
[08:33:48] likewise, kubetcd1004 will go down for a few minutes for a reboot of the underlying Ganeti node
[11:01:06] serviceops, Infrastructure-Foundations, Release-Engineering-Team, Patch-For-Review: Deprecate buster-backports - https://phabricator.wikimedia.org/T362518#9874823 (Clement_Goubert) `docker-registry.wikimedia.org/php7.4-cli-icu67` and `docker-registry.wikimedia.org/php7.4-fpm-icu67` deleted, all b...
[11:01:32] _joe_ (and rest of serviceops): Any objection to deploying this patch by a volunteer: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1040809
[11:01:45] <_joe_> Amir1: one sec please
[11:01:54] It makes our outage error page follow dark mode
[11:02:02] sure, let me know
[11:02:31] <_joe_> uhm
[11:02:42] <_joe_> do we just inject that into k8s?
[11:02:44] <_joe_> yeah we do
[11:03:03] <_joe_> Amir1: you will need a scap run afterwards, but that should hold a sec
[11:03:09] <_joe_> I need to verify something first
[11:03:25] <_joe_> so you can merge but to get it onto k8s, you'll have to wait in line a few minutes
[11:03:41] no worries on my side, there is no rush
[11:11:22] <_joe_> yeah it's gonna take longer
[11:11:24] <_joe_> sorry
[11:19:06] <_joe_> Amir1: you should be able to test your error page on mw-debug
[11:19:21] thank you!
[11:21:38] https://usercontent.irccloud-cdn.com/file/u6ikbW40/grafik.png
[11:21:40] Looks good!
[11:35:22] <_joe_> with my next scap run it will deployed everywhere
[11:42:53] likewise, kubetcd1006 will go down for a few minutes for a reboot of the underlying Ganeti node
[11:45:28] ack
[12:34:06] likewise, kubetcd2006 will go down for a few minutes for a reboot of the underlying Ganeti node
[12:35:10] hello folks! If you are ok I am going to rollout the new versions of eventrouter, k8s-controller-sidecars and kube-state-metrics (see https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1040153)
[12:47:31] elukey: roll ahead :)
[13:04:46] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875251 (Aklapper)
[13:04:47] serviceops, Wikimedia-Apache-configuration: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9875252 (Aklapper)
[13:14:14] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875279 (akosiaris) I am not sure what this task asks to be honest. Care to add a bit more information as to what the problem is?
[13:17:56] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875292 (Aklapper) https://2030.wikimedia.org currently redirects to https://meta.wikimedia.org/wiki/Wikimedia_2030 but should redirect to https://meta.wikimedia.org/wiki/Moveme...
[13:25:17] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9875314 (akosiaris) And I am still not sure. I assume that the double redirect is considered a problem? If yes, why? Alternatively, is there some intent to remove the redire...
[13:37:43] folks I have also upgraded recommendation-api to use prometheus
[13:37:59] it was something that I wanted to do for https://phabricator.wikimedia.org/T205870
[13:38:21] and now it also runs node-18
[13:38:56] the service works fine, I'll keep checking the metrics since they may need some tweaking
[13:39:12] (the new images also carry the bookworm upgrades, so I did a single deploy for all)
[13:48:14] <_joe_> it's using prom natively?
[13:48:19] <_joe_> nice
[13:49:06] yep!
[13:52:14] nice :)
[14:07:46] serviceops, Infrastructure-Foundations, Puppet-Core: Extend puppet ipresolve() to support SRV records - https://phabricator.wikimedia.org/T366465#9875482 (MoritzMuehlenhoff) Open→Declined Given an alternative solution was found for etcd, closing this one (after checking with Janis)
[14:32:52] I am going to rollout the new docker images for changeprop
[14:50:39] oh, thanks for that. /me updating an internal tracking sheet
[14:51:56] akosiaris: o/ just to understand - I am working on https://phabricator.wikimedia.org/T356252, is there another list that I should be aware of, or is it a separate thing?
[14:53:57] elukey: mgmt thing, unrelated.
[14:54:28] ooook I was worried that there was another list :D
[15:00:22] likewise, kubetcd2004 will go down for a few minutes for a reboot of the underlying Ganeti node
[15:00:38] serviceops, Infrastructure-Foundations, Release-Engineering-Team, Patch-For-Review: Deprecate buster-backports - https://phabricator.wikimedia.org/T362518#9875731 (Jdforrester-WMF) >>! In T362518#9874823, @Clement_Goubert wrote: > * `docker-registry.wikimedia.org/wikimedia/labs-libraryupgrader:we...
[15:00:40] aaand changeprop updated
[15:17:49] serviceops, Infrastructure-Foundations, Release-Engineering-Team, Patch-For-Review: Deprecate buster-backports - https://phabricator.wikimedia.org/T362518#9875798 (Clement_Goubert) Deleted `docker-registry.wikimedia.org/wikimedia/labs-libraryupgrader:web`, merged change to stop building `docker-r...
[15:29:27] While migrating k8s mediawiki to the new postfix servers I noticed that we currently configure only a single egress email server, e.g. wiki-mail-eqiad.wikimedia.org, which means we potentially lose all outbound emails when that server is down. Is this a known issue?
[15:31:07] we configured 1 per DC, right?
[15:31:38] I remember something about the software that does the mail delivery, let me refresh my memory
[15:32:34] right 1 per dc
[15:32:35] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9875854 (kamila) @Papaul could you please let me know when would be a good time for you to do this? We don't have any specific...
[15:33:09] at one point the software was msmtp
[15:34:23] at least from git log
[15:57:20] jhathaway: https://phabricator.wikimedia.org/T325131
[15:58:16] * jhathaway embarrassingly notices my own name on the ticket
[15:58:51] Pear::mail expects the host provided to have MX records, and will cycle through them if needed, if the smtpx interface is used; mediawiki uses currently the smtp interface, so that would not be viable.
[15:59:07] this probably explains why we don't have multiple.
[15:59:43] it's nice to see all that effort documented (including my own comments) and somewhat remembering
[16:00:00] it's 2022 btw, I have no recollections of those years, aside from covid and lockdowns
[16:00:26] yup, glad its on a ticket
[16:00:41] serviceops, MW-on-K8s, Observability-Logging, SRE: benthos mw-accesslog-metrics kafka lag and interpolation errors - https://phabricator.wikimedia.org/T367076#9876028 (fgiunchedi)
[16:06:28] serviceops, DC-Ops, ops-codfw, SRE-OnFire, Sustainability (Incident Followup): codfw:(3) wikikube-ctrl NIC upgrade to 10G - https://phabricator.wikimedia.org/T366205#9876057 (Papaul) @kamila ? There are some planning that we need to do around this. We will need to relocate those servers for...
[16:19:26] jayme: aleks was going to do the mw-page-content-change enrich flink k8s deployment today, but he needs deployment group access first. If you know how to expedite https://phabricator.wikimedia.org/T367073 access request we can get it done sooner.
[16:19:33] I'm happy to merge and apply, just need releng approval
[16:26:01] ottomata: you'd probably have to find a replacement for thcipriani for approval as they seem to be out of office
[16:59:12] yeah. i asked in releng, maybe gotta ask tajh
[17:04:14] might be a good time to ask for a backup group approver to be added to data.yaml
[17:04:41] good to have more than 1 in general
[17:06:20] btw, in this context.. for the approvals for analytics groups I have been asked that we start tagging those tickets with the team tag - instead of pinging individuals - to make the process more efficient
[17:06:34] i'll see about adding that to the docs somewhere
[17:36:08] jayme: or anyone, Aleks and I are trying to deploy mw-page-content-change-enrich in k8s staging, but there don't seem to be any resources deployed there?
[17:36:40] helmfile diff showed what we expected. We applied. And then checked kubectl for resources, pods, etc. but nothing is there?
[17:36:40] and now apply won't do anything?
[17:40:53] mutante: The new Director over Developer Experience is starting in July. They would be a logical backup for Tyler in approving things like the deployer group membership.
[17:41:12] (This is Kate's backfill)
[17:43:26] bd808: aha! thanks for that. makes sense
[17:59:09] ottomata: I don't know anything beyond what's here https://logstash.wikimedia.org/goto/1dedfd3138f05cd63b1f0a2073d190f1
[18:00:48] btw ottomata `kubectl get events` is also handy for recent history for that
[18:01:27] seems like this is the error that's actually blocking: "JobManager deployment is missing and HA data is not available to make stateful upgrades. It is possible that the job has finished or terminally failed, or the configmaps have been deleted. Manual restore required."
[18:48:43] hm. oookaayyyyyy.
[18:49:08] thank you that is helpful
[19:09:11] ottomata: have you looked at the logs for whatever controller is actually driving the flink objects on k8s?
[19:09:16] its logs will probably have more useful info
[20:05:36] flink k8s operater, no i haven't really had time to look into it! i'll make a task for our team
[20:24:53] serviceops, MW-on-K8s, Datacenter-Switchover: Control mw-on-k8s periodic maintenance jobs with an etcd value - https://phabricator.wikimedia.org/T367118 (RLazarus) NEW
[21:17:01] serviceops, Wikimedia-Apache-configuration: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9877520 (Dzahn) It would make sense to me to link to a specific version rather than a list of snapshots. But I disagree that it should have a relation to www.sep11memorie...
[22:33:27] serviceops, Wikimedia-Apache-configuration: Change redirect target of sep11.wikipedia.org - https://phabricator.wikimedia.org/T367014#9877807 (Pppery) From ~2007 to August 2015 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/225043) sep11.wikipedia.org was a redirect to sep11memories.org. That's wh...
[22:55:31] serviceops, Language-Technical Support, SRE, Wikimedia-Site-requests, Patch-For-Review: Change $wgMaxArticleSize limit from byte-based to character-based - https://phabricator.wikimedia.org/T275319#9877825 (stjn) While discussing performance issues on Discord, I looked at https://he.wikisourc...
[22:58:08] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9877831 (Pppery) It's not a big problem, (hence why I triaged this as low priority) and there are no plans to do anything with the redirect. But it would still be nice to keep t...
[23:14:26] serviceops, Wikimedia-Apache-configuration: 2030.wikimedia.org is a double redirect - https://phabricator.wikimedia.org/T367013#9877857 (Dzahn) We have changed this a couple times now from 2017... T158981 to 2020... -> T202498 -> to 2030 ... T264797 the Wikimedia_2030 on meta was the last one requeste...