[09:02:23] morning
[09:06:37] o/
[09:23:04] taavi: I'm adding a new k8s control node
[09:23:07] soon
[09:25:39] ack
[09:53:55] I'm removing the old control node now
[09:58:32] ouch, I just pushed directly to the wmcs-cookbooks repo
[09:58:58] https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/wmcs-cookbooks/+/065ba61a1cfb714925471475eaea18501906b24c%5E%21/#F0
[10:01:05] hmm I wonder how that is even possible
[10:01:08] I guess it's this setting that allows it
[10:01:10] https://usercontent.irccloud-cdn.com/file/X99MT9eU/image.png
[10:01:33] from https://gerrit.wikimedia.org/r/admin/repos/cloud/wmcs-cookbooks,access
[10:02:40] yeah. I removed that, let's see if anything breaks
[10:03:02] let me see if I can push again
[10:03:38] correctly fails now https://www.irccloud.com/pastebin/LXXHNyui/
[10:45:23] taavi: this time I also had to do the static pod dancing by hand. Do you think it is worth automating?
[10:45:45] at least writing a separate cookbook might be useful
[10:46:00] ok, let me create a phab ticket
[10:51:55] T358476
[10:51:56] T358476: toolforge k8s: some static pods needs manual restart - https://phabricator.wikimedia.org/T358476
[12:15:57] taavi: if I recall correctly, the cookbooks/wmcs/toolforge/k8s/kubeadm_certs_renew.py cookbook is now obsolete because there is auto renewal?
[12:16:21] arturo: that cookbook is needed in case we take too long between kubernetes upgrades
[12:16:30] (so, hopefully never, but useful to keep around just in case)
[12:16:46] ack
[12:16:58] then I will refactor the static pod restart functions
[14:56:01] oooh you can now select your own color in etherpad
[15:02:09] hmm I think I could change my color in the past too? has something changed?
[15:06:20] I certainly had not noticed that before
[15:39:26] I just wrote a badly-edited brain dump of my thoughts about toolforge + s3/swift on T358496. If anyone (looks at bd808 and taavi) has already written something about this topic, please let me know and we can merge.
[15:39:27] T358496: Provide per-tool access to cloud-vps object storage - https://phabricator.wikimedia.org/T358496
[15:40:05] I'm also hoping someone will jump in and respond to my "Is it possible/practical to make per-container credentials?" with "yes, and here's how"
[15:40:26] now... breakfast
[15:54:44] andrewbogott: did you consider the option of having a second radosgw instance where authentication is not tied to openstack?
[16:00:37] I didn't, but I also don't think that would be very hard.
[16:01:01] and a dedicated ceph pool
[16:01:05] sounds interesting
[16:01:24] could they share the same ingress port?
[16:01:30] yeah, I think a separate radosgw instance would imply a different pool (as far as I know)
[16:01:54] arturo: we can certainly do host-based http routing with haproxy, that's not a problem
[16:02:08] what would be the new fqdn?
[16:02:09] wouldn't we want it to be a different endpoint anyway?
[16:02:19] Oh, I see what you mean
[16:02:49] yeah, we would want it either on some subpath of object.eqiad1.wikimediacloud.org or we could invent a new subdomain
[16:02:56] anyhow, that seems a relatively minor detail to me
[16:03:13] or that service domain you were thinking of for toolforge
[16:07:15] highlight color has been user-selectable on etherpad as long as I've been using it, but the UX for discovering that was maybe not great?
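(Editor's note: the "static pod dancing" at 10:45 and the cookbook idea filed as T358476 refer to restarting kubeadm static pods, which kubelet manages directly from manifest files on disk rather than through the API server. Below is a minimal sketch of what such a cookbook helper might do, assuming the standard kubeadm manifest path; the ssh-based run() transport, host names, and settle time are hypothetical illustrations, not the actual wmcs-cookbooks API.)

```python
"""Hypothetical sketch of automating a static pod restart (T358476).

Assumes the standard kubeadm layout where kubelet watches
/etc/kubernetes/manifests; all names here are illustrative.
"""
import shlex
import subprocess
import time

MANIFEST_DIR = "/etc/kubernetes/manifests"


def run(host: str, command: str) -> None:
    """Run a command on a control node over ssh (placeholder transport)."""
    subprocess.run(["ssh", host, command], check=True)


def restart_static_pod(host: str, name: str, settle_seconds: int = 20) -> None:
    """Restart a static pod by moving its manifest out and back in.

    kubelet notices the manifest disappearing and stops the pod;
    moving the file back makes it recreate the pod from scratch.
    """
    src = f"{MANIFEST_DIR}/{name}.yaml"
    tmp = f"/tmp/{name}.yaml"
    run(host, f"mv {shlex.quote(src)} {shlex.quote(tmp)}")
    time.sleep(settle_seconds)  # give kubelet time to tear the pod down
    run(host, f"mv {shlex.quote(tmp)} {shlex.quote(src)}")


# e.g. restart_static_pod("tools-k8s-control-7", "kube-apiserver")
```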
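(Editor's note: on taavi's 15:54 suggestion of a second radosgw instance whose authentication is not tied to OpenStack, radosgw can mint native S3 users with the radosgw-admin CLI, which would also be one answer to the 15:40 question about per-tool/per-container credentials. A hedged sketch follows; it assumes a host with Ceph admin access, and the tool-<name> uid scheme is made up for illustration.)

```python
"""Hypothetical sketch of minting per-tool S3 credentials on a radosgw
instance decoupled from OpenStack auth. The uid naming is an assumption."""
import json
import subprocess


def create_tool_s3_user(tool: str) -> dict:
    """Create a radosgw user for a tool and return its S3 key pair."""
    out = subprocess.run(
        [
            "radosgw-admin", "user", "create",
            f"--uid=tool-{tool}",
            f"--display-name=Toolforge tool {tool}",
        ],
        check=True, capture_output=True, text=True,
    ).stdout
    user = json.loads(out)
    # radosgw-admin generates one S3 key pair for a new user by default
    key = user["keys"][0]
    return {"access_key": key["access_key"], "secret_key": key["secret_key"]}
```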
[16:09:20] T306039
[16:09:23] T306039: Decision request - Toolforge external infrastructure domain usage - https://phabricator.wikimedia.org/T306039
[16:12:16] andrewbogott: I haven't really thought about the "how" of it, but per-tool access to storage buckets with the option of making a bucket read-only to the public seems ideal. There is probably a good argument to be made for >1 bucket per tool as well, to allow separating internal and user-facing usage.
[17:01:02] * arturo offline
[17:07:38] * andrewbogott adds 'toolforge-specific rados server' option to that ticket
[17:11:13] * dcaro off
[18:03:54] fyi all, I'm going to do the designate -> cloudcontrol move tomorrow around 17:00 UTC. I don't expect it to affect any running services, but it might cause some unexpected alerts during the transition.
[18:50:31] * bd808 lunch
[18:59:02] Rook: I see that paws-prometheus-1.paws.eqiad1.wikimedia.cloud has been removed, is there a different host that I should replace it with? (Seeing this on metricsinfra-alertmanager-1)
[18:59:27] It's inside the paws k8s cluster now. So no?
[19:06:42] hmmm
[19:06:54] does that mean we had alerting before and now we don't? Or was that vestigial anyway?
[19:08:39] oh actually that's under 'profile::wmcs::metricsinfra::alertmanager::project_proxy::trusted_hosts:' so probably if it's working then we don't need that anymore...
[19:08:43] * andrewbogott reads a bit more puppet code
[19:18:47] rook, at your leisure: T358519
[19:18:47] T358519: paws prometheus no longer 'trusted' in metricsinfra::alertmanager - https://phabricator.wikimedia.org/T358519
[22:08:08] * bd808 walk
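(Editor's note: a minimal sketch of the per-tool public-read bucket bd808 describes at 16:12, using boto3 against an S3-compatible radosgw endpoint. The endpoint URL, bucket naming scheme, and credential placeholders are assumptions, not an agreed design; radosgw does support AWS-style bucket policies of this shape.)

```python
"""Hypothetical sketch: a per-tool bucket that is world-readable but
writable only by the tool's own credentials. All names are assumptions."""
import json

import boto3


def make_public_read_bucket(tool: str, suffix: str = "public") -> str:
    """Create a tool bucket and attach a public-read bucket policy."""
    s3 = boto3.client(
        "s3",
        endpoint_url="https://object.eqiad1.wikimediacloud.org",  # assumption
        aws_access_key_id="...",      # per-tool credentials (placeholder)
        aws_secret_access_key="...",  # placeholder
    )
    bucket = f"tool-{tool}-{suffix}"  # hypothetical naming scheme
    s3.create_bucket(Bucket=bucket)
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
    return bucket
```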