[10:50:09] cc from our gdoc: Ceph-based object storage and persistent volumes are available in aux-k8s-eqiad. codfw is wip. Although running privileged:true would be a dealbreaker in aux-k8s (not possible by Kubernetes admissionPolicy)
[17:18:20] Should we have a single tracking/parent ticket for everything related to the zuul(3) project? Then link all other tasks to that?
[17:18:56] Sounds like a good idea
[17:18:58] We already have VM request, then access requests next.. and there will probably be more. Or we can just use a specific phab tag.
[17:19:25] I edited the VM request ticket to ask for 6 instead of just 1 VM.
[17:19:38] alright!
[17:20:33] we are using https://phabricator.wikimedia.org/project/view/7592/
[17:20:43] having a tracking task would be good there.
[17:20:54] (if we want one)
[17:22:03] initial doc is here: https://docs.google.com/document/d/1Y-0nX_0n0AymZ3N-KDRIwuy6V-WXb_EhEP11O0vvAek/edit?tab=t.0
[17:22:36] ACK! :)
[17:24:08] adds "in progress" column
[17:34:01] goal for the doc: initial design for entire working zuulv3+ system: both ci workers and production component set up. sobanski jelto mutante work through tentative design of the admin/operations side of zuul, the vision for how we want to run these things in prod and what your needs/functional requirements are (e.g., active/passive failover). hashar dduvall and I will work on CI/product/end
[17:34:03] user needs (e.g., workers, workloads). And we'll all use comments to ask questions/seek advice/bug corvus for input. How's that sound? The output should be a doc that we can refer back to answer our questions along the way as we bump into new functional requirements/friction points.
[17:40:52] told Moritz about the updated VM request for a total of 6 machines. He did not veto it but advised we should do only 2GB for the executor and 4GB for the runner, with the additional comment that they can be adjusted later.
[17:41:34] ticket for the new admin group needs some comment on who exactly should be in it besides James
[17:44:42] this is the group of people able to log in to those VMs, correct?
[17:45:51] yes
[17:46:02] called it "zuul-devs" so far
[17:46:36] you could say "all of releng" or specific names
[18:03:09] offhand, I'd say same members as contint-roots, dduvall hashar or bd808 any opinions on ^ ?
[18:04:45] this feels like the same stuff as contint-roots to me.
[18:05:44] `members: &contint_roots_members [*releng_members, jforrester, bd808]`
[18:06:42] ok! yep
[18:06:46] eventually we would want to open things up to contint-admins, but we need to know what sudo tasks to allow for that
[18:07:41] it feels like it's going to be "sudo docker*" or so .. but yes
[18:07:50] everything inside container
[18:14:32] what did we say needs to be able to talk to cloud VPS?
[18:58:45] mutante: Zuul executor will need to be able to communicate via ssh to workers, including cloud VPS VMs.
[18:59:36] I added a "Components" section to the doc that has a "communicates with" point under each component. I added a comment asking for verification that that's the only thing that needs to communicate to workers.
[19:00:50] ACK, I will make a ticket for that too so we can tag netops etc
[19:00:56] The other option that was mentioned today is moving the executor outside the prod cluster, but then the executor needs to communicate with zookeeper, so either way: some communication to prod cluster needed. ssh matches how jenkins behaves currently.
[19:23:50] I am going with executors in production and allowing them to talk to cloud VPS.. for now.
[19:49:45] thcipriani: yep, executor is the only one that needs to talk to worker nodes; nodepool-launcher can, but it's not necessary when you're configuring static worker nodes.
[19:50:04] (it's more important for nodepool-launcher to be able to reach the nodes when using cloud drivers)
[20:31:52] ah, thanks for confirming, that's helpful. And, yeah, we've been using static worker nodes and likely won't change that as part of this migration.
[23:56:17] ssh access into Cloud VPS space from prod shouldn't require anything special at the network level. The Cloud VPS bastions have public IPs that prod traffic would use to hop into the Neutron network space where the instances live. https://wikitech.wikimedia.org/wiki/Cross-Realm_traffic_guidelines#Case_3:_generic_network_access_prod_--%3E_cloud