[10:00:23] Lunch+errand
[10:17:52] lunch
[13:14:17] o/
[13:19:21] I have a pairing session with Ben & Steve, should be able to work on the Flink/ZK stuff after that
[13:20:55] I got a response on the Flink mailing list BTW... flink operator uses the flinkdeployment name as cluster ID. So as long as that is unique, we should be OK
[13:22:21] nice!
[13:56:01] dcausse: thank you for your approval, I was a bit late to the party with yet another config change. If you have a moment (this is just to be consistent in how we configure output streams/topics): https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/24
[13:56:46] sure
[14:00:48] Hm, this PR is broken, I pushed a squashed commit that was intended for the http-config branch
[14:01:05] Let me try to untangle that first.
[14:01:35] sure, no worries
[14:10:15] dcausse: PR is ready now. This includes a plain text rendering of all available config options for both applications
[14:10:30] nice, thanks! looking
[14:33:30] ebernhardson: I started rebasing your search update pipeline PRs to work with the config updates
[14:57:34] \o
[14:59:02] o/
[14:59:52] we seem very close to being able to put a test instance of cirrus updater into prod, i think once we have input topic filtering, that is enough to start a limited test deployment?
[15:00:21] By limited, i'm thinking input filtering to restrict the ingestion to only testwiki, and leave the consumer turned off
[15:02:12] yes I think so too, we should do a quick double check that all streams & schemas are properly configured in the eventstream config in meta
[15:02:31] question is where do we push the updates?
[15:02:38] i guess we can turn the consumer on, we can point it at relforge
[15:03:14] it might not have a service definition though, i don't think it goes through lvs, but we can bolt some networkpolicy on to make it work
[15:03:16] do we use the staging env for this then? (might be useful in the future)
[15:03:51] hmm, seems reasonable. I'm still getting my head around what exactly staging is for :)
[15:04:10] we should make it something useful to us :)
[15:04:27] indeed
[15:04:54] ok i'll recheck the chart and see what i need to add to get staging talking to relforge, don't think it will be too bad
[15:08:44] oh there is also the open question about staging vs prod for the flink ha directory... doesn't have to be solved before the initial deployment but might want to push directory path configuration down into the chart
[15:09:24] clement seems to favor that approach as well, asking for my recent changes to the chart to aim for even more opinionated usage of object storage
[15:10:01] yes, seems like we should probably work on this sooner rather than later as it's confusing a lot of people :/
[15:10:49] for the zookeeper root path we agreed on something like "/$env/$namespace/$release"
[15:11:14] ok, that's easy enough from the chart side with templating, should we do the same for HA storage?
[15:11:22] seems it couldn't hurt, and they should all be separate
[15:12:02] I think the chart should set something by default that the app can override if it really wants to
[15:12:22] sure, can do that
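A minimal sketch of what that chart-side default could look like, assuming the helmfile environment name is handed to the chart as a plain value (per the note at 18:24 below, `.Environment.Name` itself is not visible from chart templates); the helper and value names here are hypothetical:

```yaml
# flink-app/templates/_helpers.tpl -- hypothetical sketch, not the actual chart.
# Builds the agreed "/$env/$namespace/$release" ZooKeeper root, with an
# optional app-side override as discussed above.
{{- define "flink.zookeeperRoot" -}}
{{- $env := required "app.environment_name must be set" .Values.app.environment_name -}}
{{- .Values.app.zookeeper_root | default (printf "/%s/%s/%s" $env .Release.Namespace .Release.Name) -}}
{{- end -}}
```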
[15:13:00] hm... sole diff is that we want a different swift per env IIRC
[15:13:33] lemme check again what Gabriele has done
[15:14:32] part of the difficulty i had was that we don't have a value for prod vs staging, prod has 2 values and staging has 2 values. With the way values files are layered it's hard to vary by both dimensions (but in templating that's trivial)
[15:15:59] oh, .Environment.Name is not available from the chart?
[15:16:10] it is in the chart, but values files are untemplated
[15:16:20] and it's generally not appropriate (afaict) to put values directly in helmfile.yaml
[15:18:56] * ebernhardson suspects he is not being very clear :P
[15:18:59] Do we have a task for the cirrus updater testing? I can make one if not
[15:19:19] inflatador: hmm, i don't know if there is a particular task to get the test deployment out there.
[15:19:46] np, I'll get one started
[15:20:29] maybe to untangle this we try to pre-set a bunch of these values from the chart by concatenating things like .Environment.Name, .Release.Name, namespace (hope this one is available too)
[15:21:07] yea, that seems sensible to me, if we auto-construct paths out of the varied parts it should all just work out.
[15:21:31] then we think about customization later, possibly allowing a couple of optional placeholders settable in the values.yaml that would be used by the chart
[15:23:45] OK, task up at https://phabricator.wikimedia.org/T347075 , feel free to edit as needed
[15:25:16] hmm, after seeing david's patch to put topic filtering into event stream utils, feel like i was being lazy :P Didn't occur to me to update that instead
[15:25:46] so they have: s3://mw-page-content-change-enrich.wikikube-codfw/flink/high-availability which is s3://$namespace.$env/flink/high-availability (env is special though: they make "eqiad" specific with "wikikube-eqiad")
[15:25:56] sigh... generalization is always hard
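Following that s3://$namespace.$env pattern, the HA storage default could be auto-constructed the same way; a minimal sketch, where the "wikikube-" style environment name is taken from the example above and all value names are hypothetical:

```yaml
# hypothetical companion helper: default flink HA storage dir, overridable
# from values.yaml just like the ZooKeeper root. $env would carry the
# dc-specific name, e.g. "wikikube-eqiad".
{{- define "flink.haStorageDir" -}}
{{- $env := required "app.environment_name must be set" .Values.app.environment_name -}}
{{- .Values.app.ha_storage_dir | default (printf "s3://%s.%s/flink/high-availability" .Release.Namespace $env) -}}
{{- end -}}
```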
[15:26:45] ebernhardson: no worries :) was not sure that the approach I took was OK, but discussing with Gabriele we might give it a try (at least discuss it in gerrit)
[15:31:01] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/959790 manual savepoint patch is ready for review
[15:34:53] dcausse: moved generation of rendered config-options to maven: https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/24
[15:35:34] ebernhardson: shall I rewrite the PR to use eventutilities topic filtering? BTW: the rebased versions are here https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests
[15:35:48] Off, back in 60’
[15:39:53] pfischer: for the updated patches, you should be able to push over the top of my branches rather than creating a new MR. I don't mind, we do that in gerrit where appropriate
[15:41:03] i can update the current patch to use eventutilities filtering, should be very easy, i suppose the harder part will be releasing the new version of event utilities (randomly guessing they use maven release?)
[15:42:32] yes, making a release should be straightforward via jenkins
[15:45:29] pfischer: re "moved generation of rendered config-options to maven", I think what I find surprising is that running the build might change a source file; what I'm used to is generally a test that fails if the source file is not what I expect, and a manual, explicit procedure to update this file
[15:50:28] having a maven step to update this file is very handy, but the fact that it's tied to the test phase is what I'm not sure about
[15:50:56] there are also no guarantees that the build will fail in CI if someone forgets to update it
[16:09:50] dcausse: alright, makes sense, I'll add a flag to check during test, just like spotless
[16:10:24] thanks!
[16:20:41] workout, back in ~40
[16:21:36] dcausse: random other thing, i put together the bits to make the flink-app chart emit a broker-jumbo-eqiad.properties file (one for each cluster in kafka.allowed_clusters), but the patch is super awkward because there isn't a central place to extend flink.app.config_files programmatically. Is there another way we can pass them generically? Can go the current way but it's awkward:
[16:21:37] https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/959066
[16:22:40] instead of extending config_files it has to change everything that references config_files to `or .Values.flink.app.config_files .Values.kafka.allowed_clusters`
[16:26:12] * dcausse tries to understand :)
[16:26:54] dcausse: we wanted to stop naming all the kafka hosts in the values files, and instead have the flink-app chart provide kafka broker information generically to the apps
[16:27:10] yes
[16:27:11] dcausse: https://phabricator.wikimedia.org/T346315
[16:27:50] here you provide a file named "/srv/app/conf/brokers-$cluster.properties"
[16:28:03] this is exactly what we want I guess?
[16:28:32] yea, it works, I just feel like it's awkward that instead of extending config_files, we had to update every place that uses config_files. It works but seems non-sensible :P
[16:28:39] ideally the content should be something that you can pass to create a kafka consumer/producer
[16:28:43] ah
[16:28:45] yes
[16:29:18] the chart will impose a bunch of stuff
[16:29:20] so i was wondering if there was a more generic way to go about things, maybe i'm unfamiliar with something that's possible to do instead :) But we can do it this way
[16:30:06] since you can't reference variables in values.yaml I'm not sure how to do it :/
[16:31:17] ok, will continue as-is then
[16:31:28] the use of /srv/app/conf/brokers-$cluster.properties is opt-in tho, you don't have to use it; if you want to use it you have to make your app compatible with it
[16:32:00] I'm definitely not an expert in all this so there might be better ways, but I simply don't know them :(
[16:32:05] yea it still requires application support, but i'm not sure how to make it apply without that?
[16:32:54] maybe something in event stream utils that takes a properties file to augment some builder?
[16:33:02] (not sure which one, have to look closer)
[16:33:31] yes, could make sense to have a contract between these two
[16:38:12] from your patch it seems you expect a new value set from puppet (kafka_cluster_hosts)?
[16:38:35] yea, i didn't see hostnames in the general-*.yaml files, so i'll have to add those. Pretty easy patch
[16:39:04] same for zk hostnames
[16:40:12] we do this hack in the python util we use to deploy jobs: https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/deploy/+/refs/heads/master/flink/flink-job.py#429
[16:40:36] perhaps better to repeat the hosts from puppet tho
[16:42:03] and it also makes me wonder about the port... and if we set the port, do we provide two configs, one plaintext and one with SSL?
[16:42:30] hmm, i was going to provide the port directly from puppet. Didn't realize it had different ports
[16:42:49] we could force SSL tho
[16:42:49] shouldn't flink at least always use the ssl one?
[16:43:02] i think that's what we want in general, seems sensible
[16:43:13] yes, agreed
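For illustration, the kind of per-cluster file the chart could emit, assuming puppet supplies a kafka_cluster_hosts map whose entries already carry the SSL listener port; a sketch only, with made-up value names:

```yaml
# hypothetical fragment of the flink-app config map template: one
# broker-$cluster.properties per entry in kafka.allowed_clusters, forcing
# SSL, so an app can feed the file straight into a Kafka
# consumer/producer builder.
{{- range $cluster := .Values.kafka.allowed_clusters }}
broker-{{ $cluster }}.properties: |
  bootstrap.servers={{ join "," (index $.Values.kafka_cluster_hosts $cluster) }}
  security.protocol=SSL
{{- end }}
```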
[17:08:32] back
[17:35:59] lunch, back in time for pairing
[17:51:13] inflatador, ryankemper: I'll skip our pairing session today. I need to get moving on Q2 planning
[17:57:23] gehel: inflatador: ack
[18:00:02] ACK
[18:02:23] dinner
[18:10:17] back
[18:20:01] I added https://phabricator.wikimedia.org/T346719 as a blocker for https://phabricator.wikimedia.org/T326409 ; in other words, we should upgrade to flink 1.17.1 everywhere before we start using the flink-operator
[18:20:09] in production, that is
[18:24:23] oh meh... environment names are specific to helmfile and not available in a chart template unless passed explicitly :P
[18:25:00] If there are other tasks that block the rdf-streaming-updater with flink operator, please add to https://phabricator.wikimedia.org/T326409
[18:32:22] inflatador: 5' late to pairing
[18:34:17] ryankemper np
[18:39:55] inflatador: T346719 is about moving the search update pipeline to flink 1.17.1; the flink-k8s-operator is (I believe) able to manage jobs written for 1.16.x and 1.17.x, so I believe it's fine to keep the wdqs job running with 1.16.x
[18:39:56] T346719: [Search Update Pipeline] Upgrade to flink 1.17.1 - https://phabricator.wikimedia.org/T346719
[18:40:55] dcausse OK, so you don't think we'll need the 1.17.1 upgrade to go to production with rdf-streaming-updater?
[18:42:02] no, I did not plan to do this upgrade
[18:43:03] just filed a ticket about that for the new search update pipeline, as it's always better to do this kind of upgrade before going live, but overall this should not affect the work around the flink-k8s-operator
[18:43:23] the operator claims "Multiple Flink version support: v1.13, v1.14, v1.15, v1.16, v1.17" so we should be good
[20:01:48] understood, will remove this as a blocker for rdf-streaming-updater then
[20:06:03] first patch for Elasticsearch jbod if anyone wants to take a look. Note that cloudelastic will be the only jbod host in the short term https://gerrit.wikimedia.org/r/c/operations/puppet/+/959854
[20:06:16] err... cloudelastic hosts will be the only hosts using JBOD, that is
[20:09:54] inflatador: not clear why profile::elasticsearch::is_jbod is set to true in cloudelastic.yaml, but false in all the per-host files
[20:11:20] ebernhardson we're bringing on 6 new jbod hosts that will replace the current hosts, so we thought making it the default for cloudelastic going fwd made sense
[20:11:33] ahh, ok i didn't realize there were new hosts coming in. that makes sense
[20:12:17] I'm guessing the bit that actually mounts the disks in the right place is somewhere else with partman or whatever?
[20:12:42] Y, there's a lot of config left to write. Partman I haven't used, but I've done the Red Hat version (kickstart) plenty of times
[20:13:00] inflatador: i guess my other surprise, i thought jbod would be managed at the OS level and there would still be a single directory
[20:13:44] inflatador: elastic does support storing data to multiple directories, but a single shard will always live in a single directory and it won't do any kind of balancing, which seems like it could lead to disk space problems
[20:14:28] or maybe i'm just misunderstanding the bit that creates numbered directories for the jbod
[20:14:30] we could do it either way
[20:15:16] I think we would have fewer potential error cases if the os manages balancing between disks and presents elasticsearch a single directory
[20:15:19] No, you're right... the typical JBOD setup is a filesystem per disk, so let's say 3x1 TB disks (will be 3x2TB in this case)
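For reference, the multi-directory setup being discussed is just a list in elasticsearch.yml; the paths below are illustrative, not the actual puppet-managed mount points:

```yaml
# elasticsearch.yml with one data path per JBOD disk (illustrative paths).
# Elasticsearch keeps each shard wholly inside one of these directories
# and never rebalances data between them.
path.data:
  - /srv/elasticsearch/data0
  - /srv/elasticsearch/data1
  - /srv/elasticsearch/data2
```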
[20:16:01] each disk mounted at a separate dir. But we could also combine all disks into a single RAID-0 or LVM VG
[20:16:44] using separate disks would allow us to lose a single disk without losing all data on the host, which is why I'd rather do it that way
[20:17:34] particularly for cloudelastic. Other hosts don't need as much storage space, so it matters less
[20:17:56] hmm, i think our general plan was if a host has hardware problems we shut down the host and remove it from the cluster, let the other nodes handle it
[20:17:56] Plus they have larger clusters and can withstand the loss of a single host more easily
[20:18:20] I could go either way
[20:18:21] Elasticsearch seems to consider multiple data directories enough of a mis-feature that they deprecated it in 7.13. So we can still use it, but it's not recommended
[20:18:30] see https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html
[20:19:19] Ah OK, then that's not something we probably want to start using
[20:20:10] looks like opensearch doesn't have any plans to remove it, but maybe they just don't particularly care about that use case :)
[20:21:26] I still think it might be better on Cloudelastic since we have a smaller cluster and higher disk space needs
[20:21:43] But I'd defer to you on that one. I agree it doesn't make much sense for the normal clusters
[20:25:45] How big are our largest individual shards, I wonder
[20:25:46] could see what everyone else thinks, but i think i would still lean towards os-managed
[20:25:49] 50G
[20:26:13] but the thing is it won't balance or do anything smart about the data paths, it just chooses one and puts the shard there
[20:26:37] and do we have any anti-affinity settings for those shards in Cloudelastic?
[20:26:44] for the large shards, that is
[20:27:00] indices maybe?
[20:27:19] I think we have some restrictions on how many enwiki_content shards are allowed to live on a particular host?
[20:27:35] just the general ones, could check but it probably sets number of shards per node on some
[20:28:05] nope, we set it to -1
[20:28:38] so that restriction doesn't exist in cloudelastic?
[20:28:41] right
[20:28:48] it just fits things however they fit
[20:29:08] OK, I see what you're saying re: disk space troubles
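The per-node restriction being discussed is an index-level allocation setting; a sketch of what a cap would look like, where the value 2 is purely illustrative (per the above, cloudelastic currently has it at -1, i.e. unlimited):

```yaml
# index settings sketch, e.g. for enwiki_content: limit how many of its
# shards a single node may hold, so several 50G shards can't pile onto
# one JBOD data path. -1 (the current cloudelastic value) means no limit.
index:
  routing:
    allocation:
      total_shards_per_node: 2
```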
[20:32:02] Let's say we did fill up one disk on one JBOD host, I guess we'd have to force a reallocation?
[20:32:56] found the bit where elasticsearch decides where a shard goes (ShardPath.selectNewPathForShard). It's not totally arbitrary where things go, it chooses the path with the most free space currently. But it doesn't take into account anything that's in-progress (like shard moves) and is only point-in-time
[20:33:30] if one disk failed on a JBOD host we would probably depool and stop elasticsearch, then once the hardware is ready re-join the cluster
[20:33:56] after deleting the data directory, maybe just a fresh os install (or would be if that was easy, but it never seems to be easy for us :P)
[20:34:46] maybe deleting the data directory is unnecessary these days, elasticsearch has sequence numbers so it knows the data is old and useless and will delete it itself
[20:34:47] Funnily enough, installing baremetal works better than VMs ;)
[20:35:23] perhaps i'm being optimistic, but i suspect cloudelastic would be fine at 5 nodes. It "worked" at 4 instances, it just complained all the time
[20:36:19] In the JBOD case, I was thinking Elastic would be smart enough to stop using its broken data path and still continue to work. I believe the disks are hot-swappable (need to verify that with DC Ops) so we could theoretically keep using that host without downtime
[20:37:15] hmm, i'm not sure what elastic would do if it lost one of the data directories really. I suppose the instance would have to be restarted and hopefully it's smart enough. Would probably be best to test with some docker volumes if we want to be sure
[20:37:38] But I'm also fine with losing data from a host and reimaging, just a little wary in the particular case of cloudelastic where we don't have that many hosts
[20:37:47] ya, understandable
[20:43:12] I minus-1'd the patch for now, we can talk this over with g-ehel, d-causse, r-yankemper etc
[20:43:27] I do have confirmation that the disks are hot-swappable
[21:01:18] Updated https://phabricator.wikimedia.org/T231010 with some details on our disk choices, feel free to add/change anything
[21:24:10] Another thought: maybe we want the OS to be on the RAID-0 too? I doubt anything good will happen if we have corrupt Elastic data and a working OS
[21:25:10] at least, if we go with the single data.path plan (and it sounds like we should)
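Under that single-path plan, elasticsearch.yml would shrink back to one entry, with the striping handled below the filesystem; a sketch, path illustrative:

```yaml
# elasticsearch.yml under the single-data.path plan: the OS presents one
# filesystem striped across the three disks (RAID-0 or an LVM VG), so a
# single disk failure takes out the whole data path and the host gets
# depooled and reimaged rather than running degraded.
path.data: /srv/elasticsearch
```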