[07:52:57] netops, Infrastructure-Foundations, Patch-For-Review: Juniper: use export-format state-data json compact - https://phabricator.wikimedia.org/T362523#9828645 (ayounsi) Opened JTAC case `2024-0524-163553`
[08:19:03] Traffic, Content-Transform-Team, MW-Interfaces-Team, RESTBase Sunsetting: Remove long term caching and active purging for Parsoid endpoints in RESTBase - https://phabricator.wikimedia.org/T365630#9828691 (daniel) >>! In T365630#9827033, @BBlack wrote: > I think I'm lost in some confusion here, as...
[10:48:00] Acme-chief, Traffic, Infrastructure-Foundations, Puppet-Infrastructure, SRE: Revert back to fleet-wide acmechief config once all ACME consumers are on Puppet 7 - https://phabricator.wikimedia.org/T365799 (MoritzMuehlenhoff) NEW
[11:13:32] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki) - https://phabricator.wikimedia.org/T362323#9829076 (akosiaris)
[11:14:22] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki) - https://phabricator.wikimedia.org/T362323#9829077 (akosiaris)
[11:24:03] moritzm: could you take a look at https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724 when you have the chance? no rush
[11:24:58] sure thing, I'll have a look later or Monday
[11:25:21] I'm pretty sure it will need more work, especially handling services IPv4-only or IPv6-only... low-traffic LVS has a lot of realservers that are IPv4-only IIRC
[11:25:27] what's the initial target audience for this, will this also be needed for nftables-using systems?
[11:25:40] right now we have only 20ish roles on nftables
[11:25:51] moritzm: those would trigger a fail() at the moment
[11:26:12] ok
[11:26:43] I could cover those with tcp-mss-clamper but no rush in doing that
[11:26:48] right now the only impacted cluster would be ncredir
[11:27:15] ack, sounds good
[11:35:40] hmm
[11:35:50] $ sudo cumin 'P{R:ferm::service and C:profile::lvs::realserver}'
[11:35:50] No hosts found that matches the query
[11:36:09] I'm missing something here cause at least ncredir instances should be listed
[11:39:05] fixed
[11:39:35] moritzm: 'P{C:ferm} and P{C:profile::lvs::realserver}' cumin query shows that 792 hosts could use it
[11:40:08] moritzm: apparently right now 0 hosts use nftables and are acting as realservers for LVS
[11:44:54] ack that's fine. we're still working out some other bits (like support for notrack) before this will hit the bigger clusters
[11:53:03] Traffic, MW-on-K8s, serviceops, Patch-For-Review: XWD: Allow choosing datacentre in k8s-mwdebug - https://phabricator.wikimedia.org/T365478#9829131 (jijiki) Open→In progress p:Triage→Low a:jijiki
[12:22:59] Traffic, Data-Platform-SRE (2024.05.27 - 2024.06.16), Patch-For-Review, Sustainability (Incident Followup): LVS hosts: Monitor/alert when pooled nodes are outside broadcast domain - https://phabricator.wikimedia.org/T363702#9829266 (Gehel)
[14:27:56] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: Problem re-imaging hosts on row-wide vlan on EVPN switches - https://phabricator.wikimedia.org/T365204#9829797 (cmooney) Open→Resolved Change has been pushed out in codfw where we have the issue. Closing this one for now...
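For context on the change discussed above (gerrit 1035724): the mention of tcp-mss-clamper suggests it concerns clamping TCP MSS on LVS realservers, presumably via a netfilter rule on ferm-managed hosts, while nftables-only roles would currently hit the fail(). The snippet below is only a generic sketch of MSS clamping as a technique, not the contents of that patch or of tcp-mss-clamper; table and chain names are invented for illustration.

    # iptables form: clamp the MSS of outgoing SYNs to the path MTU
    iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN \
        -j TCPMSS --clamp-mss-to-pmtu

    # rough nftables equivalent for hosts already migrated off ferm
    nft add table inet clamp
    nft add chain inet clamp post '{ type filter hook postrouting priority mangle ; }'
    nft add rule inet clamp post 'tcp flags syn tcp option maxseg size set rt mtu'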
[15:56:11] brett: for the remaining steps to decommission blubberoid, does it make more sense for me to line up a chain of operations/puppet patches to be reviewed as a whole or to patch/review/deploy for each step?
[15:57:25] dduvall: I'd say do what you think would be most clear to your head/workflow as you go through it. I'll be here to review either way
[15:57:59] sounds good
[16:00:13] so i have https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035589 which according to the docs puts us at the second to last step. however, i still see references to blubberoid elsewhere in ops/puppet and i'm wondering how that fits into the workflow
[16:02:04] https://www.irccloud.com/pastebin/z3cSsOqg/
[16:04:43] Good question! I think that the puppet stuff can be absented/removed after we're done with following the steps
[16:05:37] right on. yeah, the `kubernetes::deployment_server` parts for sure. i want to be able to actually undeploy it :D
[16:05:41] i'll make some patches
[16:27:18] brett: alright, i think i have the patch chain lined up correctly in puppet, each patch corresponding to a step in that process and then a final ops/puppet patch to be deployed following the undeployment. looks like the only other things (not traffic related) will be to clean up deployment-charts and some other random stuff in cookbooks
[16:28:14] i also see some blubberoid related config in https://gerrit.wikimedia.org/g/cloud/instance-puppet/+/5d315de6374f62679018a12ee06a6f0948d0058c/traffic/traffic-dnsbox.traffic.eqiad1.wikimedia.cloud.yaml but i'm guessing i need to talk to wmcs folks about that?
[16:34:45] I'd guess so, yeah
[16:41:10] * brett reviews the patches
[16:44:47] dduvall: Looks good! +1ed it all
[16:45:24] None of the k8s stuff in the last patch requires any sort of initial patch to e.g. "absent" anything?
[16:54:20] ooh, i'm not 100% sure about that
[16:54:42] i'll investigate a bit more
[16:55:08] btw, wmcs sent me back here on the question of what to do about https://gerrit.wikimedia.org/g/cloud/instance-puppet/+/5d315de6374f62679018a12ee06a6f0948d0058c/traffic/traffic-dnsbox.traffic.eqiad1.wikimedia.cloud.yaml
[16:56:24] that's a deprecated instance so we can just ignore it
[16:56:33] brett: I think we can safely remove this file
[16:56:55] that's something internal to the traffic team > removing the service might or might not break things on their VMs depending on if that hiera code is still consumed by a living VM
[16:56:59] sukhe: ack
[16:57:04] dduvall: +1 for removal fwiw
[16:57:10] \o/
[16:57:18] or skipping it, either way and Traffic can handle it
[16:57:23] kk
[17:29:23] brett: thanks again for the reviews. looks like the next puppet deployment window that works for me is tuesday at 1600Z. would it work to deploy then or is there a better window for you all?
[17:31:14] dduvall: If you'd prefer to do it on Tuesday we can. Happy to do it now if you want to get it over with, though
[17:31:41] oh! that works for me :)
[17:36:23] Which one?
[17:37:03] now
[17:37:37] now now as they say in spaceballs
[17:38:09] hahahaha
[17:38:42] :D
[17:40:00] but what happened to then?
[17:40:24] At least it'll be now soon
[17:40:28] passed it, just now
[17:40:34] SOON
[18:00:22] brett: i don't have +2 so i'm awkwardly waiting on you if you're wondering :)
[18:11:16] oh dammit
[18:11:36] no worries!
[18:12:01] I thought it was just for ds
[18:12:04] dns
[18:12:12] let’s wait until Tuesday then
[18:12:29] I’ll put it on the deployment calendar
[18:12:51] you sure? I was just about to start merging
[18:12:57] No worries if you'd rather wait
[18:13:18] yeah let’s wait just in case I have issues with the k8s part
[18:13:38] Okay, no problem. Feel free to add me to a calendar event if you wish
[18:13:41] Sorry for the delay ._.
[18:13:50] also my back is tweaked and I need to move around afk a bit heheh
[18:13:57] good luck :(
[18:14:01] Hey no problem will do!
[18:33:41] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes (excluding Votewiki) - https://phabricator.wikimedia.org/T362323#9830829 (Jdforrester-WMF) Note that the votewiki blocker is apparently also now fixed.
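A note on the "undeployment" step dduvall refers to above: removing a service such as blubberoid from the WMF Kubernetes clusters is normally done from the deployment server with helmfile, before the final ops/puppet and deployment-charts cleanup patches land. A rough sketch of that step, with the repository path, environment names, and ordering taken from general convention rather than from this conversation:

    # on the active deployment server, one cluster at a time
    cd /srv/deployment-charts/helmfile.d/services/blubberoid
    helmfile -e staging -i destroy   # repeat for eqiad and codfw
    # once destroyed everywhere, the helmfile.d entry and the chart
    # can be removed from the deployment-charts repository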