[07:33:35] o/ seeing "connect to host parse1002.eqiad.wmnet port 22: Connection timed out" during scap backport
[07:40:36] <_joe_> dcausse: uh taking a look
[07:41:04] <_joe_> dcausse: for operational matters, ping the oncall people on #-sre, you're more guaranteed to receive a response
[07:41:18] _joe_: oh good point thanks
[07:41:28] <_joe_> dcausse: no need now, ofc
[07:41:33] <_joe_> the server is down, sigh
[08:44:47] serviceops: Rebalance kafka partitions in main-{eqiad,codfw} clusters - 2023 edition - https://phabricator.wikimedia.org/T341558 (elukey) I haven't done all the moves, since the current status seems ok to me. Relevant highlights: * The data stored on each broker is more balanced now, it may vary of course i...
[12:35:13] serviceops, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Deploy wikidiff2 1.14.1 - https://phabricator.wikimedia.org/T340087 (TheresNoTime) >>! In T340087#9008668, @akosiaris wrote: > [...] > @TheresNoTime let me know when we should proceed with the next step of the deployment....
[12:48:14] serviceops, wikidiff2, Better-Diffs-2023, Community-Tech (CommTech-Kanban): Deploy wikidiff2 1.14.1 - https://phabricator.wikimedia.org/T340087 (akosiaris) >>! In T340087#9027714, @TheresNoTime wrote: >>>! In T340087#9008668, @akosiaris wrote: >> [...] >> @TheresNoTime let me know when we should...
[13:13:30] serviceops: Alert review: KubeletOperationalLatency - https://phabricator.wikimedia.org/T342250 (LSobanski)
[13:19:25] serviceops: Alert triage - https://phabricator.wikimedia.org/T342250 (LSobanski)
[13:27:22] serviceops, Prod-Kubernetes, Kubernetes: Alert triage - https://phabricator.wikimedia.org/T342250 (JMeybohm)
[13:28:59] sobanski: thanks - is the task title supposed to be so meta?
[13:29:42] It's a test drive Filippo and I did, open to ideas how it could be improved :)
[13:30:29] I initially included the alert type (as you probably noticed) but then adjusted it to just say "alert review" as the other task we created covered two alert types
[13:30:39] So I adjusted yours as well for consistency
[13:31:03] that was my assumption and is why I'm asking. I can imagine the title being a bit too generic in "my" case
[13:31:21] I thought you might want to keep it that way for now to be able to easily find them
[13:31:30] I'll take a note of it, we're still figuring things out
[13:32:13] in that case a tag would probably be nice so we can get an overview...but you probably already thought of that as well
[13:32:29] Yup, just wrote exactly that down :)
[13:34:49] nice
[13:35:57] thank you for the feedback jayme -- appreciated
[13:37:22] I would insist that I left that alert firing for you to have something to look at - but unfortunately that is not very true :-p
[13:45:07] lolz
[13:45:49] serviceops, Data Engineering and Event Platform Team, MW-on-K8s, SRE: Migrate flink-cluster-taskmanager to connect to mw-on-k8s - https://phabricator.wikimedia.org/T342252 (Joe)
[13:47:18] serviceops, Data Engineering and Event Platform Team, MW-on-K8s, SRE: Migrate rdf-streaming-updater to connect to mw-on-k8s - https://phabricator.wikimedia.org/T342252 (Joe)
[13:48:34] serviceops, Data Engineering and Event Platform Team, MW-on-K8s, SRE: Migrate rdf-streaming-updater to connect to mw-on-k8s - https://phabricator.wikimedia.org/T342252 (Joe) @dcausse not sure if you're the right person to ask, if not apologies; but I wanted to know if we're making any write reque...
[13:57:18] serviceops, Data Engineering and Event Platform Team, MW-on-K8s, SRE: Migrate rdf-streaming-updater to connect to mw-on-k8s - https://phabricator.wikimedia.org/T342252 (dcausse) >>! In T342252#9028035, @Joe wrote: > @dcausse not sure if you're the right person to ask, if not apologies; but I want...
[13:59:02] serviceops, Data Engineering and Event Platform Team, MW-on-K8s, SRE: Migrate rdf-streaming-updater to connect to mw-on-k8s - https://phabricator.wikimedia.org/T342252 (Joe) >>! In T342252#9028119, @dcausse wrote: >>>! In T342252#9028035, @Joe wrote: >> @dcausse not sure if you're the right perso...
[15:51:06] serviceops, SRE, Abstract Wikipedia team (Phase λ – Launch), Patch-For-Review, Service-deployment-requests: New Service Request: function-orchestrator and function-evaluator (for Wikifunctions launch) - https://phabricator.wikimedia.org/T297314 (JMeybohm) https://gerrit.wikimedia.org/r/c/oper...
[16:00:36] serviceops, Prod-Kubernetes, Kubernetes, Patch-For-Review: Use cert-manager for service-proxy certificate creation - https://phabricator.wikimedia.org/T300033 (JMeybohm) helm module support and dependencies have been implemented and the ipoid chart is running with the new cert-manager certs in st...
[16:10:29] serviceops: Rebalance kafka partitions in main-{eqiad,codfw} clusters - 2023 edition - https://phabricator.wikimedia.org/T341558 (elukey) Created the main-eqiad plan with: ` ./topicmappr rebuild --force-rebuild --topics __consumer_offsets,__transaction_state,codfw.changeprop.error,codfw.cpjobqueue.partitio...
[16:10:50] hi folks! I added in https://gitlab.wikimedia.org/elukey/kafka_main_rebalance/-/tree/main/main-eqiad/topicmappr_json the plan to move main eqiad topics
[16:10:58] I'll start tomorrow if nobody objects
[16:24:34] elukey: thanks!
[20:28:47] serviceops: parse1002 down - https://phabricator.wikimedia.org/T342298 (RhinosF1)
[20:29:04] serviceops: parse1002 down - https://phabricator.wikimedia.org/T342298 (RhinosF1) p:Triage→High
[20:29:32] serviceops: parse1002 down - https://phabricator.wikimedia.org/T342298 (TheresNoTime)