[06:51:48] good morning :)
[08:52:32] Good morning! :)
[08:54:34] o/
[10:29:27] Morning!
[11:37:56] * elukey lunch!
[14:46:35] elukey: So I've been rummaging in puppet. As for the role files, we currently have hieradata/role/eqiad/ml_k8s/ and hieradata/role/codfw/ml_k8s/. Am I right in thinking that we'd then also have hieradata/role/codfw/ml_staging_k8s/ (or similar)? Or separate the two codfw setups differently?
[14:49:40] yes yes it makes sense, in the staging use case we could also use hieradata/role/common, since there is only one cluster
[14:56:41] Hmm. I wonder which of the choices is going to bite me harder in 1y :)
[15:00:04] if we ever have another staging cluster we'll likely have to split options into eqiad/codfw and keep the common ones, so hiera will need to be reviewed and changed. I don't think it will bite
[15:00:30] Yeah, going with the specific setup for now, I think that will be less confusing
[15:02:16] you can already use common + codfw and add options that are specific to codfw only in the related yaml
[15:05:19] (that is more future-proof and correct in my opinion)
[15:08:52] I don't think there's much. At least for the masters, we only have host lists and IPs/ranges. Both of which will be all different
[15:09:04] Same for workers.
[15:09:25] The only commonality is cr1/2 as BGP peers
[15:11:22] https://phabricator.wikimedia.org/P22433 What I have for just etcd
[15:11:31] the current prod setup has a lot of common settings in the "common" hiera, and just the BGP settings etc. in the per-dc specific ones
[15:11:46] If we follow the same for staging it will be more consistent with the rest
[15:11:48] this is my point
[15:12:07] So I accidentally did the right thing! \o/
[15:12:42] I am not following anymore, I'll check the code review
[15:12:50] :D
[15:14:31] Dangit, now I think I get what you mean
[15:15:27] Hrm. Not sure I know how to factor this out
[15:16:41] https://gerrit.wikimedia.org/r/c/operations/puppet/+/770522 is clearly missing changes to modules/role/manifests/etcd/v3/ml_etcd.pp
[15:17:02] But I have no clue how to make the `staging` part of the name factor into that
[15:19:27] what is the goal of the change?
[15:19:39] Just set up etcd on the three machines
[15:19:47] I see the ml-staging-etcd nodes get a role in site.pp, but their hiera config seems to be for the k8s master
[15:20:09] The etcds live in master.yaml as well
[15:20:45] e.g. ml-etcd2001.codfw.wmnet is configured in hieradata/role/codfw/ml_k8s/master.yaml
[15:21:36] sure but that is the hiera config that is picked up by the k8s master roles
[15:21:59] wait, where are those etcds then configured?
[15:22:00] I'd add it when we set up the masters, not now
[15:22:22] so they are getting role::etcd::v3::ml_etcd
[15:22:36] which is modules/role/manifests/etcd/v3/ml_etcd.pp
[15:22:43] so there will be hiera configs in hieradata/role/common/etcd/v3/ml_etcd.yaml
[15:23:08] Er, yes.
[15:23:24] that atm seems only to configure the prod clusters
[15:23:45] Yes, that was my point about being unsure how to make that also configure staging clusters.
[15:23:57] so we could add a staging role/hiera config (see the kubernetes dir where ml_etcd.yaml is defined)
[15:24:16] Would we add another profile var like "type" that can be "staging" or "", and use that in cluster_name?
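For context, the hieradata layout being discussed above looks roughly like this. The paths are the ones named in the conversation; how settings are split between the "common" and per-DC files is a sketch of the intent, not the exact committed state:

```
hieradata/role/common/ml_k8s/               # settings shared by the eqiad and codfw prod clusters
hieradata/role/eqiad/ml_k8s/master.yaml     # eqiad-specific: host lists, IPs/ranges, BGP peers
hieradata/role/codfw/ml_k8s/master.yaml     # codfw-specific: host lists, IPs/ranges, BGP peers
hieradata/role/codfw/ml_staging_k8s/        # proposed staging config (codfw-only for now)
hieradata/role/common/etcd/v3/ml_etcd.yaml  # hiera picked up by role::etcd::v3::ml_etcd (prod etcd only)
```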
[15:25:05] that role includes profile::etcd::v3 that wants a specific cluster name
[15:25:15] so not sure how much flexibility we have
[15:25:38] creating another role/hiera-config combination may be the quickest, like serviceops does
[15:25:45] The other option would be an add'l file in hieradata/role/common/etcd/v3/ ?
[15:26:59] yes, that is what I am saying, something like hieradata/role/common/etcd/v3/staging/ml_etcd.yaml, and a related role .pp file
[15:27:23] so the new role will be basically identical to the ml_etcd.pp one
[15:27:25] Yeah, that's probably easiest
[15:28:06] you can check role::etcd::v3::kubernetes::staging vs role::etcd::v3::kubernetes
[15:28:14] it is basically the same use case
[15:28:21] I will do some hacking
[15:28:34] (in the non-security sense :))
[15:33:19] Do I remember correctly that we likely want only one etcd host configured at first to avoid deadlocking? Did we do that by making extra git changes or by just disabling puppet on two nodes before the merge?
[15:35:49] I recall something similar, the info is in https://wikitech.wikimedia.org/wiki/Etcd#Bootstrapping_an_etcd_cluster
[15:36:35] ack
[15:36:44] `modules/role/manifests/etcd/v3/staging/ml_etcd.pp:3 ERROR role::etcd::v3::ml_staging_etcd not in autoload module layout (autoloader_layout)`
[15:36:58] ^^^ does this mean I have to either fix the name of the role or the path?
[15:37:25] (i.e. role::etcd::v3::staging::ml_etcd)
[15:41:46] yep, the :: namespace in puppet needs to follow the directory layout
[15:42:47] (if you want to run the CI locally there is ./utils/run_ci_locally.sh in the puppet repo, it only needs docker)
[15:43:00] ack
[15:43:42] I dunno if I should do this stuff more often so I get better working memory, or less often, so I am not puzzled as often.
[15:45:43] I'm back! Morning all
[15:46:34] Hey Chris
[15:46:52] elukey: and of course the DNS SRV records should be in before we proceed.
[15:46:56] Hey!
[15:47:15] Also heads up, I won't be in the team meeting today, I have to be in another meeting that I can't skip
[15:47:17] klausman: yep this is my understanding
[15:47:29] Who usually reviews those changes?
[15:47:48] chrisalbon: morning! Now we know that you prefer other people for meetings :D
[15:48:18] lol nnoooooo
[15:48:23] klausman: I can review them, and we can ask serviceops to check as well
[15:48:28] Literally I was told the entire meeting can't happen if I miss it
[15:48:28] sgtm
[15:49:11] Ah, so it's not that you like others more, you're just too important for us?
[15:51:18] (I kid, of course)
[16:02:08] elukey: https://gerrit.wikimedia.org/r/c/operations/dns/+/770529 when you have a moment <3
[16:06:46] LGTM
[16:06:52] thanks a lot!
[16:17:19] the knative revision auto-prune setting seems to work
[16:17:35] I have done a lot of changes to an isvc and only the last revision + 2 others are kept
[16:17:48] Nice. what is the "steady state" for IPs now?
[16:18:59] what do you mean? How many left??
[16:20:39] IIRC, you mentioned that some of the IP exhaustion was caused by old stuff still running
[16:20:57] So I wondered how much of a dent (if any) was made by the new auto-prune
[16:21:09] ah not much sadly, there were already few revisions :(
[16:21:21] so we didn't clean up
[16:32:06] Oh well, woulda been nice, but can't have everything. (where would you put it, anyway?)
[16:50:59] elukey: I think I now got everything for the etcd stuff (except the main change is not merged yet, and I still need to do the puppet-disable-dance to avoid the deadlock). I will leave that for after the meeting.
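A minimal sketch of what the new staging role discussed above could look like, modelled on the existing role::etcd::v3::kubernetes vs role::etcd::v3::kubernetes::staging split. Only the profile::etcd::v3 include and the file paths are taken from the conversation; anything else the real role contains is not shown here:

```puppet
# modules/role/manifests/etcd/v3/staging/ml_etcd.pp
#
# Sketch only: the class name must mirror the directory layout
# (role::etcd::v3::staging::ml_etcd), otherwise CI fails with the
# autoloader_layout error quoted earlier.
class role::etcd::v3::staging::ml_etcd {
    # Cluster name, member list, etc. would come from the matching hiera file,
    # hieradata/role/common/etcd/v3/staging/ml_etcd.yaml (path assumed from the chat).
    include profile::etcd::v3
}
```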
[16:51:28] klausman: ack, have you run pcc to check the diff for the puppet change?
[16:51:52] ah, good point
[16:57:58] Well, it failed. With no specific error message :D
[16:58:06] lol cool cool
[16:59:58] link?
[17:00:37] https://puppet-compiler.wmflabs.org/pcc-worker1001/34269/ml-staging-etcd2001.codfw.wmnet/change.ml-staging-etcd2001.codfw.wmnet.err
[17:00:45] Cluster ml_staging_etcd not defined in wikimedia_clusters
[17:00:54] whoopsiedoodle
[17:20:59] Machine-Learning-Team, artificial-intelligence, Bad-Words-Detection-System, revscoring, Hindi-Sites: Add language support for Hindi - https://phabricator.wikimedia.org/T173122 (Halfak) Open→Resolved a: Halfak Looks like this is done as part of {T252581}
[17:37:01] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Halfak) Done!
[17:38:22] Welp, now it's green but also lists no changes :)
[17:38:47] (https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/34273/)
[17:43:31] https://puppet-compiler.wmflabs.org/pcc-worker1002/34273/
[17:43:40] the link for --^ is in "Console"
[17:45:58] (looks good from a first check)
[17:46:17] Going afk for today folks! have a nice day/evening :)
[17:51:27] \o
[17:55:58] Machine-Learning-Team, artificial-intelligence, Wikilabels, articlequality-modeling: Build article quality model for Dutch Wikipedia - https://phabricator.wikimedia.org/T223782 (Ciell) Thanks!
[18:50:50] Lift-Wing, artificial-intelligence, editquality-modeling, Epic, Machine-Learning-Team (Active Tasks): Migrate editquality models - https://phabricator.wikimedia.org/T301409 (ACraze)
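The "Cluster ml_staging_etcd not defined in wikimedia_clusters" failure above points at the cluster registry in the common hiera. A hedged sketch of the missing entry follows; the field names and exact schema are assumptions based on the error message and how other clusters are registered, not verified against the repo:

```yaml
# hieradata/common.yaml (sketch only; the real wikimedia_clusters schema may
# include additional fields, e.g. a numeric id, which are omitted here)
wikimedia_clusters:
  ml_staging_etcd:
    description: "etcd for the ML staging Kubernetes cluster"
    sites:
      codfw: {}
```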