[06:01:18] good morning folks
[06:01:31] going to restart the kafka main-codfw topic rebalancing, last topics
[06:25:22] <_joe_> atol: yes, it's known, it's the old Letsencrypt validation path; but you also have the new, valid validation path
[06:26:02] <_joe_> and the problem is fully with travis-ci using a largely outdated version of openssl in their images
[06:26:09] <_joe_> so the issue is with travis, not us
[06:26:50] <_joe_> Anyone with an ubuntu 16 who is keeping it up to date should be able to connect to gerrit
[06:27:52] <_joe_> (basically, if you didn't update openssl, your old ubuntu would not validate what is a perfectly valid cert chain)
[06:28:59] Would sudo apt-get update && sudo apt-get upgrade -y be a fix for Travis?
[06:42:57] <_joe_> RhinosF1: I *think* it should, but I never tried
[06:44:54] atol: ^
[06:58:30] _joe_ I think that in ~20 mins I should be able to do eqiad.resource-purge (the last one), I've done topics with similar size/throughput and everything went fine
[06:58:45] I can alert Traffic just in case
[06:59:00] <_joe_> elukey: nah it's ok
[06:59:17] <_joe_> in case of need we can move all consumers to eqiad
[07:00:14] purged should pick up the change correctly, will check when the move is in progress
[07:01:13] <_joe_> elukey: yes, I meant if the move causes lag in consumers, we can move them to the other kafka cluster easily
[07:01:54] yep yep
[07:42:37] all topics moved successfully, will leave it running for a bit to observe metrics. Overall it looks good, there is not a perfect balance between the 5 brokers yet but way better than before
[07:43:48] <_joe_> \o/
[09:02:15] as a follow up to spread the traffic to the 5 brokers I'd also pick the highest-traffic topics and move them to 5 partitions (3 currently)
[09:02:18] https://grafana.wikimedia.org/d/000000027/kafka?viewPanel=46&orgId=1&from=now-3h&to=now&var-datasource=codfw%20prometheus%2Fops&var-kafka_cluster=main-codfw&var-cluster=kafka_main&var-kafka_broker=All&var-disk_device=All
[09:02:57] it will probably suffice to do eqiad.resource-purge and mediawiki.job.cirrusElasticaWrite
[09:03:30] (I could have proposed this before moving them of course but I just realized it :P)
[09:03:45] (and it could be a good thing to do for main-eqiad as well)
[09:06:40] <_joe_> *.resource-purge
[09:06:46] <_joe_> I'd do that first
[09:06:50] <_joe_> (both DCs)
[09:07:22] <_joe_> both DC names, I mean
[09:07:52] ah yes yes sorry, both
[09:10:24] <_joe_> because RN there is more traffic on the eqiad topic, but when we switch to codfw it's the opposite
[09:11:33] yep yep (I moved both variants for all the topics as well)
[09:12:19] need to run a little errand and then I'll propose the changes in the task
[09:12:30] (so that people can review pebcaks)
[10:11:31] I'm disabling puppet on all A:cp nodes to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/726912 and test it on one cp host only
[10:29:28] change looks good, re-enabling puppet on A:cp
[10:30:52] _joe_ proposal for topic partitions increase in https://phabricator.wikimedia.org/T288825#7419227
[10:31:00] lemme know when you have a moment if it makes sense
[10:33:38] <_joe_> sure
[10:35:28] <_joe_> elukey: gave my +1
[10:36:24] _joe_ thanks! I'll apply the changes after lunch
[10:36:42] I'll ask for kormat's moral support since yesterday it helped me a lot
[10:37:21] <_joe_> lol
[11:05:39] elukey: my (im)moral support is at your service 💜
[12:03:27] ahh super interesting, one would expect that increasing partitions to 5 (from 3) on 5 brokers would spread them out one for each broker, but surprise surprise one broker got two partitions
[12:03:56] not sure if I missed some parameter in the kafka tool, will check
[12:04:09] I am going to create a json to move the partition to the missing broker
[12:07:30] mmm maybe it is only a matter of rebalancing partition leaders
[12:08:28] not really, anyway it should be ok
[12:08:40] will not touch it for the moment
[13:20:42] found a trick to move one partition per broker for big topics, will document it on wikitech
[13:23:05] narrator: elukey had discovered `rm`
[13:29:03] kormat: I wish!
[13:39:36] ok Kafka main-codfw is balanced (enough) in my opinion, not perfectly but it should be ok for the moment. We could work with kafka reassign-partitions more but in my opinion it is not really worth it..
[14:36:37] created:
[14:36:39] - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Rebalance_topic_partitions_to_new_brokers
[14:36:52] - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Rebalance_topic_partitions
[14:36:57] <_joe_> TLDR
[14:37:03] - https://wikitech.wikimedia.org/wiki/Kafka/Administration#Rollback/Stop_topic_partitions_rebalance
[14:37:19] feedback/improvements/etc.. welcome
[14:39:46] EOF
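
Note on the 12:04 message ("create a json to move the partition to the missing broker"): the log does not show the actual file, so what follows is only a minimal sketch of what such a reassignment JSON could look like, not the one used here. The broker IDs (2001 to 2005), the replication factor of 3 and the helper name `one_leader_per_broker` are assumptions made for illustration; only the topic name and the "5 partitions on 5 brokers" target come from the conversation. The layout follows the standard Kafka reassignment format (`version` plus a `partitions` list), where the first entry in `replicas` is the preferred leader.

```python
import json


def one_leader_per_broker(topic, brokers, replication_factor=3):
    """Build a reassignment plan with one partition per broker.

    Partition i gets broker i as its first replica (the preferred
    leader); the remaining replicas are the next brokers in the list,
    wrapping around, so replicas are spread evenly.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds number of brokers")
    partitions = []
    for p in range(len(brokers)):
        replicas = [
            brokers[(p + k) % len(brokers)] for k in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Hypothetical broker IDs; replace with the real ones for the cluster.
plan = one_leader_per_broker(
    "eqiad.resource-purge", [2001, 2002, 2003, 2004, 2005]
)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and handed to the partition reassignment tool (upstream Kafka takes it via `--reassignment-json-file`); the wikitech pages linked at 14:36 describe the procedure that was actually followed.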