[02:46:53] serviceops, MW-on-K8s, Patch-For-Review: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (RLazarus) Super helpful explanation, thank you! https://gerrit.wikimedia.org/r/981703 should do the above, and https://gerrit.wikimedia.org/r/981704 adds the bind...
[06:48:49] serviceops, MW-on-K8s, Patch-For-Review: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (Joe) >>! In T348284#9395386, @RLazarus wrote: > Yeah, good point. Fortunately it looks like a pretty straightforward Go patch to add a --namespace flag if need be...
[09:44:24] hello folks
[09:44:35] going to retry the nodejs18 upgrade for rec-api
[09:56:23] deployed, so far the metrics seem to be available (so the fix seems to work)
[09:56:26] will watch for a bit
[09:56:32] Cc: James_F: --^
[09:57:00] (more precisely https://grafana.wikimedia.org/d/Y5wk80oGk/recommendation-api)
[09:59:55] spoke too soon of course
[10:14:04] we use stuff like service_runner_request_duration_seconds_count in the dashboard, but afaics those metrics are not there anymore
[10:14:09] there are other ones
[10:14:26] stuff like
[10:14:27] # TYPE recommendation_api_router___domain_v1_description_translation_from___source_to___target_GET_200 gauge
[10:14:30] recommendation_api_router___domain_v1_description_translation_from___source_to___target_GET_200 1.702289597967e+12
[10:15:26] and we are doing https://github.com/wikimedia/service-runner/blob/master/doc/2.7-2.8_Migration_Guide.md
[10:17:01] from the code, in theory, we should get recommendation_api_router_request_duration_seconds
[10:17:23] okok it is not right, now I know what to test
[10:17:31] rolling back (again)
[10:18:38] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm)
[10:32:59] serviceops, MW-on-K8s, Discovery-Search (Current work), Patch-For-Review: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error - https://phabricator.wikimedia.org/T352906 (hnowlan) Resolved→Open
[10:33:09] serviceops, MW-on-K8s, SRE, Traffic, Release-Engineering-Team (Seen): Move MediaWiki jobs to mw-on-k8s - https://phabricator.wikimedia.org/T349796 (hnowlan)
[11:51:56] serviceops, SRE: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (Clement_Goubert)
[11:52:58] serviceops, SRE: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (Clement_Goubert) p:Triage→Medium
[12:30:02] serviceops, MinT, Language-Team (Language-2023-October-December), Patch-For-Review: Provide python3-build-bookworm docker image - https://phabricator.wikimedia.org/T352733 (KartikMistry) Thanks @hashar for updating the patch and for some insights into the history of image building. @Clement_Goube...
[12:52:01] claime: for the python3-build-bookworm image, you would have to +2 / build it. Kartik and I lack the access to do so :-]
[12:53:03] hashar: ah right
[12:53:07] forgot about that
[12:58:27] hashar: kart_: build in progress
[12:59:44] Thanks!
[13:00:14] I have a couple images that fail to build, but yours should be ok
[13:07:48] \o/
[13:17:34] https://docker-registry.wikimedia.org/python3-build-bookworm/tags/ yay!
[13:17:40] I'll test MinT with it.
[13:23:26] serviceops, Traffic: Handling inbound IPIP traffic on low traffic LVS k8s based realservers - https://phabricator.wikimedia.org/T352956 (Vgutierrez) @akosiaris as mentioned on the meeting we need the following questions answered: * Is it OK to clamp all egress traffic on a k8s node? * IPIP encapsulation...
[13:32:11] Looks like it is still broken: https://integration.wikimedia.org/ci/job/machinetranslation-pipeline-test/372/console claime hashar
[13:34:13] hmm
[13:34:24] will check, I'm about to go to lunch
[13:40:18] No problem.
[13:49:52] That looks like it breaks after
[13:50:10] I'm not too well versed on the pipeline-build so err, hashar, if you have any idea
[13:50:18] I really need to go to lunch x)
[13:59:16] Yeah that looks like the error comes from blubber's python builder
[14:02:00] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (Jdforrester-WMF)
[14:09:19] jetlag stuff and such
[14:09:46] kart_: essentially that is the same issue the python3-build-bookworm image had, pip requires being run with --break-system-packages
[14:10:03] which we previously encountered with Blubber support for python
[14:12:47] I will comment on the change
[14:27:56] kart_: I guess you are off by now, but I went to upgrade blubber and maybe that will fix it https://gerrit.wikimedia.org/r/c/mediawiki/services/machinetranslation/+/982083/1..2/.pipeline/blubber.yaml :)
[14:28:54] yeah, just got it :)
[14:29:04] (I'm at the desk till dawn ;))
[14:29:14] :-]
[14:29:44] https://integration.wikimedia.org/ci/job/machinetranslation-pipeline-test/373/console
[14:29:47] I think that fixed it
[14:30:13] Finished: SUCCESS
[14:30:34] Looks like I overrode your commit msg. Will fix.
[14:30:35] so something got built, but I can't tell whether it is entirely correct
[14:35:13] hi all, I plan to deploy changeprop shortly for https://phabricator.wikimedia.org/T351247
[14:37:06] ottomata: currently going on is the backport window, please sync with -operations :)
[14:37:12] k
[14:37:13] ty
[14:37:35] hashar: oh, looks like they just closed it :)
[14:37:43] great timing!
[14:38:04] as for changeprop, I know nothing about it
[14:44:53] no one does apparently :)
[14:47:44] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host sessionstore2006.codfw.wmnet with OS bullseye
[15:00:49] ottomata: generally as long as it doesn't overlap with a window we deploy it whenever
[15:15:49] k ty
[15:40:01] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host sessionstore2006.codfw.wmnet with OS bullseye completed: - sessionstore...
[15:41:14] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host sessionstore2005.codfw.wmnet with OS bullseye
[15:52:38] serviceops, Infrastructure-Foundations: Load IP ranges in reverse-proxy.php from Netbox/Puppet network module - https://phabricator.wikimedia.org/T324020 (CDanis) hi serviceops, any plans to work on this soon? I/F would be happy to help with an implementation but we kind of want serviceops to figure out...
[16:02:10] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1059.eqiad.wmnet with OS bullseye
[16:02:47] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1060.eqiad.wmnet with OS bullseye
[16:03:22] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1061.eqiad.wmnet with OS bullseye
[16:03:52] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by cgoubert@cumin1001 for host kubernetes1062.eqiad.wmnet with OS bullseye
[16:18:58] serviceops: Load IP ranges in reverse-proxy.php from Netbox/Puppet network module - https://phabricator.wikimedia.org/T324020 (Volans) Removing I/F as we're not directly involved. Feel free to re-add if/when you might need help on the Netbox side.
[16:20:41] serviceops, Infrastructure-Foundations, SRE-tools, Wikimedia-Mailing-lists: Support services VIPs with not marked as VIP in Netbox - https://phabricator.wikimedia.org/T295793 (Volans) a:cmooney Assigning to Cathal as per meeting discussion.
[16:21:16] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host sessionstore2005.codfw.wmnet with OS bullseye completed: - sessionstore...
[16:23:04] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1001 for host sessionstore2004.codfw.wmnet with OS bullseye
[16:43:51] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1059.eqiad.wmnet with OS bullseye completed: - kubernetes1059 (**WARN**) - Down...
[16:47:41] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1062.eqiad.wmnet with OS bullseye completed: - kubernetes1062 (**WARN**) - Down...
[16:49:43] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1061.eqiad.wmnet with OS bullseye completed: - kubernetes1061 (**WARN**) - Down...
[16:50:14] serviceops, SRE, Patch-For-Review: setup/install kubernetes10[59-62] - https://phabricator.wikimedia.org/T353135 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by cgoubert@cumin1001 for host kubernetes1060.eqiad.wmnet with OS bullseye completed: - kubernetes1060 (**WARN**) - Down...
[17:00:52] serviceops, DC-Ops, SRE, ops-codfw: Q2:rack/setup/install 3 sessionstore hosts (codfw) - https://phabricator.wikimedia.org/T349876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1001 for host sessionstore2004.codfw.wmnet with OS bullseye completed: - sessionstore...
[17:07:22] serviceops, RESTBase Sunsetting, Code-Health-Objective, Data Products (Data Products Sprint 05), Patch-For-Review: Route to new AQS Knowledge Gaps endpoint - https://phabricator.wikimedia.org/T342213 (WDoranWMF)
[17:39:59] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm) In progress→Resolved
[17:54:22] serviceops, MW-on-K8s, Patch-For-Review: Handle sidecar containers in one-off Kubernetes jobs - https://phabricator.wikimedia.org/T348284 (RLazarus) >>! In T348284#9395546, @Joe wrote: > I reckon this technique will be useful for all charts that need a `Job` object. Agreed. Per discussion up-thread...