[09:40:36] pfischer: I just created T345327 and added it to java-scala-standardization
[09:40:37] T345327: Create a Maven archetype so that we can easily create new Maven based projects - https://phabricator.wikimedia.org/T345327
[09:41:16] I already have a WIP patch about it: https://gerrit.wikimedia.org/r/c/wikimedia/discovery/discovery-parent-pom/+/934276
[09:47:05] pfischer: looks like we have some alerts about high lag latency on W[CD]QS (see #wikimedia-operations). Could you have a look?
[09:48:18] It seems to be correlated with drops and peaks in triple ingestion rates. It looks like the pipeline stalls and then catches up.
[09:52:33] there is also a "(RdfStreamingUpdaterNotEnoughTaskSlots) firing: The flink session cluster rdf-streaming-updater in eqiad (k8s) does not have enough task slots" alert, which indicates that the pipeline is overloaded. But the overall rate of messages seems to be well below some of the peaks we've had, so not sure what's happening.
[09:59:03] !isspull
[09:59:29] !issync
[09:59:29] Syncing #wikimedia-search (requested by gehel)
[09:59:31] Set /cs flags #wikimedia-search pfischer +AVfiortv
[09:59:33] Set /cs flags #wikimedia-search ejoseph -AVfiortv
[10:00:36] pfischer: you are now "voiced" in this channel. Looks like I forgot to apply that config change, from a very long time ago. Not that you probably care much :/
[10:00:37] https://gerrit.wikimedia.org/r/c/wikimedia/irc/ircservserv-config/+/860620
[10:10:07] lunch
[10:41:20] Lunch 2
[13:39:22] Trey314159 Gonna miss the meeting today, it's my wife's birthday and I'm going to pick something up for her
[14:29:26] ^^ errand, back in ~30
[14:32:50] inflatador: no problem
[14:46:27] \o
[14:46:47] if anything voice is usually for the other people that show up here, it's often used to distinguish "members" of a room vs visitors
[15:00:42] back
[16:10:49] CR to re-enable alerts on wdqs1010 if anyone has a chance to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/954093
[16:24:45] gehel: working on decom'ing wdqs1005, but it's used as the bigdata ldf endpoint; ie it's present in both our cergen cert and `hieradata/common/profile/trafficserver/backend.yaml` had `target: http://query.wikidata.org/bigdata/ldf` -> `https://wdqs1005.eqiad.wmnet/bigdata/ldf`
[16:27:00] gehel: so having changed the target to a new host now, I probably need to rebuild the cergen cert to reflect the new hostname, right? currently the cert's alt names look like `alt_names: ["wdqs.discovery.wmnet", "wdqs.svc.eqiad.wmnet","wdqs.svc.codfw.wmnet","wdqs.wikimedia.org","wdqs1005.eqiad.wmnet","query.wikidata.org"]`
[16:27:39] ryankemper may want to check in sre or sre-foundations, but I think we use cfssl now?
[16:27:52] https://wikitech.wikimedia.org/wiki/PKI/Clients
[16:34:38] ryankemper: yes, we need to regenerate that cert
[17:14:10] inflatador: I get the feeling that that's used for different cases than https://wikitech.wikimedia.org/wiki/Cergen but unsure
[17:18:07] ryankemper yeah, not sure myself. Been poking around the puppet repo git log
[17:18:36] cergen is way more common there
[17:41:06] inflatador, ryankemper: meeting at Augustin's school is longer than I expected. I'll probably be late for our pairing session
[17:41:22] gehel ACK
[17:41:26] heading to lunch, back in ~1h
[18:20:06] back
[19:52:26] * ebernhardson tries to find a reasonable way to source the zookeeper egress policies without copying them all over the place
[20:06:46] yeah, I'm used to something like Consul Intentions https://developer.hashicorp.com/consul/docs/connect/intentions . Maybe Istio has something like that?
[20:07:35] I was thinking more like how kafka does it, puppet writes some info into /etc/helmfile-defaults/ and something in the networkpolicys package would take a list of clusters from your values and translate them into egress policies
[20:07:59] the thing is that just moves the duplication, because then it's mostly copying the existing kafka bits :P
[20:08:28] As long as I don't have to see 100 lines of diff when I deploy ;)
[20:09:35] we do have something similar for envoy, although it perhaps does more than we need. It both opens up the network, and appears to install a local envoy instance
[20:09:46] but it's not clear what we gain from querying through envoy, i guess consistent metrics across projects?
[20:11:48] envoy doesn't support zk anyways afaik, so a bit moot, but we are using that to open up connections from the cirrus updater to the search clusters and the mwapi
[20:33:04] hmm, wonder if i need some extra pcc magic...i have the puppet part written but pcc fails on a dns lookup for a prod host that it won't find in a general internet dns
[20:33:18] but clearly the kafka bits do the same network lookups but don't fail...hmm
[20:34:24] oh nevermind i'm assuming too much...it's because the hosts defined in puppet literally don't exist :P
[20:36:18] inflatador: are the flink-zk*.codfw.wmnet nodes perhaps still being set up? I see them in the puppet site.pp, but i can't look up their ip's from anywhere in prod
[20:37:15] ebernhardson yeah, they're busted. I need to delete and recreate or something. Not having a lot of luck with those makevm cookbooks these days ;(
[20:37:49] the eqiad hosts are up and running if that helps
[20:39:06] i don't really need the actual hosts, i guess the thing my patch depends on is that all hosts in common.yaml zookeeper_clusters have to exist, i guess i need to look into if that creates some sort of order-of-operations problem for adding new things
[20:39:16] kafka gets by, but they might do things slightly different
[20:40:50] I'll try and re-provision the codfw vms now...it's a good end-of-the-day activity anyway
[20:42:36] could also pull the new clusters out of common.yaml until the hosts exist, i expect that's how kafka does things (add to common.yaml after vm's exist) but i'm sure it depends on how all these values are used.
[21:06:07] OK, 2001 and 2003 are deleted. Assuming the process works (about a 20% chance based on recent runs) they should be ready by tomorrow
[21:11:26] Patch to put the codfw hosts back to insetup, hopefully preventing confusion: https://gerrit.wikimedia.org/r/c/operations/puppet/+/954134 flink-zk
[21:41:37] OK, I'm out... ryankemper if you want to review/merge patch above feel free, otherwise we can look at it tomorrow
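
Note on the 20:07:35 idea (take a list of clusters from a service's values and translate them into egress policies): a minimal Python sketch of that translation, not the actual chart or puppet code. The cluster names, addresses, and the shape of ZOOKEEPER_CLUSTERS are hypothetical stand-ins for whatever puppet would write under /etc/helmfile-defaults/; the output follows the Kubernetes NetworkPolicy egress rule layout (ipBlock peers plus ports), and 2181 is ZooKeeper's default client port.

```python
from ipaddress import ip_address

ZK_CLIENT_PORT = 2181  # ZooKeeper's default client port

# Hypothetical data: cluster name -> member IPs (documentation-range addresses).
# The real data layout is not shown in the log; this shape is an assumption.
ZOOKEEPER_CLUSTERS = {
    "example-eqiad": ["192.0.2.10", "192.0.2.11", "2001:db8::10"],
    "example-codfw": ["198.51.100.20"],
}


def egress_rules(wanted, clusters=ZOOKEEPER_CLUSTERS, port=ZK_CLIENT_PORT):
    """Build one NetworkPolicy-style egress rule per requested cluster."""
    rules = []
    for name in wanted:
        if name not in clusters:
            # Fail loudly rather than silently emitting no rule.
            raise KeyError(f"unknown zookeeper cluster: {name}")
        peers = []
        for raw in clusters[name]:
            addr = ip_address(raw)
            # Single-host CIDR: /32 for IPv4, /128 for IPv6.
            prefix = 32 if addr.version == 4 else 128
            peers.append({"ipBlock": {"cidr": f"{addr}/{prefix}"}})
        rules.append({"to": peers, "ports": [{"protocol": "TCP", "port": port}]})
    return rules
```

A service would then only list cluster names, e.g. `egress_rules(["example-eqiad"])`, and the per-host rules are derived in one place instead of being copied into each deployment.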