[09:01:51] Unmeeting is starting, feel free to join meet.google.com/hvn-zxxd-xrb
[09:04:05] Will join in 3’
[10:00:09] lunch
[13:35:21] gehel I'm going to +2 https://gerrit.wikimedia.org/r/c/operations/puppet/+/949503 and roll it out to a single wdqs host...will let you know how it goes
[13:50:05] inflatador: thanks! It's probably going to fail with a missing quote or backslash somewhere.
[13:50:36] Or with the wildcards not being interpreted in a systemd context
[13:52:13] only one way to find out ;)
[13:52:18] Make sure to disable puppet on the other hosts
[13:53:48] gehel already done
[13:54:09] Good!
[13:57:18] working from wdqs2007
[14:10:42] everything is working so far
[14:11:46] rendered unit files at https://phabricator.wikimedia.org/P50593 and https://phabricator.wikimedia.org/P50594
[14:18:32] nm, forgot that puppet doesn't automatically restart services
[14:19:39] `wdqs-blazegraph.service: Failed at step CHDIR spawning /bin/bash: No such file or directory`
[14:20:08] might be because bg user doesn't have a shell? Checking
[14:27:20] `warning: Cannot open file /var/log/wdqs/wdqs-blazegraph_jvm_gc.wdqs-blazegraph-/run.log`
[14:27:30] that's just a warning though
[14:40:43] failing with stack trace https://phabricator.wikimedia.org/P50597
[14:45:03] I'm putting troubleshooting on hold for the moment while I work on the ZK VMs...if anyone has time to help with the wdqs stuff let me know
[15:01:33] \o
[15:52:27] workout, back in ~40
[16:32:27] back
[16:39:09] back
[17:44:46] Am i right in reading that helm doesn't define what happens between charts and releases, and that we just invented our own middle piece?
[17:45:11] Like, helm defines charts, many releases can be made from one chart, but helm doesn't have its own documentation on how separate releases from the same chart are defined/variables provided/etc.
[17:45:24] I don't completely understand that myself
[17:47:09] at least i'm not alone :)
[17:52:11] I'm going to get out for awhile, but we can kick the k8s stuff around at 2 PM your time if you want
[17:52:21] sure
[17:56:16] i think the answer is that helmfile is 3rd party, and what we use to bridge that gap
[18:19:19] inflatador: i just realized 2pm isn't great, i have to leave from 2:30-3:15 or so for school run
[18:23:33] Any objection to inviting all of DPE to our data analysis learning circle?
[18:23:44] ebernhardson: ∆
[18:24:25] gehel: sure, why not
[18:26:47] ebernhardson no worries, we can do it tomorrow or next wk
[18:27:43] makevm cookbook still failing...going to give up until v-olans gets back. We have a working cluster in eqiad, should be good enough to start doing stuff
[18:30:41] * ebernhardson hopes the rest of DPE know i'm not a data analyst :P
[18:43:25] back in ~1h
[18:59:01] what is the significance of the k8s flink-operator only being referenced from admin_ng, what is admin_ng?
[19:13:13] ok, i think what's happening is that flink-operator is not the chart we would deploy as, we are supposed to deploy using flink-app, and flink-operator is some admin level tool that we don't directly reference?
[19:43:31] back
[19:45:22] admin_ng is a privileged namespace; you're right that flink-operator is an admin level tool
[19:45:31] * ebernhardson is slowly getting some idea of what's going on here
[19:47:16] it's YAML and/or golang templates all the way down ;(
[19:48:16] some things i find surprising...like there are values.yaml files for individual services that have lists of kafka servers in them. Surprised that isn't centralized like puppet
[19:49:23] yeah, I was thinking that was a job for a service mesh...just set kafka.local or something
[19:49:49] but otto.mata said that doesn't work for kafka, there's client-side config
[19:50:10] something to do with how a broker is chosen...didn't look at it too closely
[19:50:24] hmm, i suppose the client needs to be able to connect to specific instances perhaps?
[19:51:16] like elasticsearch has the routing/orchestration layer that accepts queries on any node and fans them out to the appropriate places. But i suspect kafka wants the client to use the available metadata and connect to the host that has the data they want
[19:51:31] but maybe it could still bootstrap from a kafka.local or something, i dunno
[19:51:48] I wanna say redis is like that too, requires more heavy lifting on the client side
[20:32:36] https://phabricator.wikimedia.org/T344462 had a thought about using git-lfs for the plugins repo, LMK if y'all have feedback
[20:42:59] seems viable
[20:55:13] cool, me too
[20:55:19] err..I think so too
[20:55:32] meanwhile, the DRAC for wdqs1010 might be boned
[20:58:40] ryankemper wdqs1010 reimage failed, DRAC is unresponsive. Sadly, it got far enough to start the reimage so I think we're stuck. Will ask around in sre room
[21:42:08] ryankemper wdqs1010 is still unreachable, but I started the data transfer on wdqs1011, so it should be good in ~90m or so