[06:19:49] Won't be at triage tomorrow. Will be around for the SRE staff meeting afterwards
[07:50:28] gehel: followup task for the swift cleanup was created by Brian (T348685)
[07:50:29] T348685: Track and clean up object storage used by rdf-streaming-updater - https://phabricator.wikimedia.org/T348685
[10:38:57] lunch
[10:58:29] lunch
[13:01:15] o/
[15:30:53] just realized...since the staging updater is using the prod kafka, that means it will produce to the prod spot
[15:31:13] ebernhardson maybe relevant https://phabricator.wikimedia.org/T347515
[15:42:01] ebernhardson: can the producer consume from kafka-main and produce to kafka-test?
[15:42:47] the consumer job would read from kafka-test
[15:43:05] dcausse: hmm, i'll have to check how we pass the options around, not sure we can split the kafka config currently
[15:43:23] i can just give it a -staging name in the helmfile i guess
[15:43:39] oh maybe not, schemas are tied to names?
[15:46:10] yes topic names are tied to streams :/
[15:46:49] I think there are checks to enforce this, could use lower-level eventutilities api I guess
[15:47:20] the kafka config should accept separate source & sink options
[15:47:32] hmm, i think i would prefer accepting input and output kafka configs over digging into the schema<->topic mapping
[15:47:53] we have separate source and sink configs, maybe it's enough
[15:48:13] yes I believe so, Peter added that recently, lemme find the MR
[15:48:41] yea it looks like i can configure separate input and output clusters already
[15:48:59] will do that, also kicked off a build of the latest image, so hopefully get an idea of how it's working in an hour or so
[15:49:41] nice
[16:27:06] * ebernhardson now realizes he documented the event-stream-http-routes parameter wrong, it's host => url, not host => host
[16:27:31] or at least, i think that's why it blew up with MalformedURLException: no protocol
[16:28:29] * ebernhardson wishes more things would include the url that failed, in addition to the error, to have a better idea where it came from
[16:45:15] meh, our config parsing doesn't allow http urls in config keys, have to rework the config param. I guess i should have had a test for that
[17:00:34] it does this.customRoutes.put(new URL(sourceURL).getHost(), HttpHost.create(targetURL)) ...
[17:00:42] only the source host is actually required
[17:00:54] sigh...
[17:01:20] I wonder why it was made like that...
[17:02:06] yea that's kinda what i had expected :) probably happened to be looking at something that had a url and just used it
[17:08:21] could build fake http://$host/ urls from the code but this feels quite awkward :/
[17:17:09] i wrote up a thing to basically do what the normal routes do, ignore the key and parse out a k=v from the value
[17:17:16] just running tests now, will upload in a sec
[17:28:08] patch for event stream http route parsing (take 2): https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/36
[17:56:13] dinner
[18:18:23] Within RDF, is the concept of "predicate" identical to "property" as found on wikidata?
[18:19:11] inflatador: hmm, i don't think so
[18:19:25] inflatador: well, kinda
[18:19:48] inflatador: all RDF triples are of the form (subject, predicate, object). A property is always a predicate, a predicate is not always a property, iiuc
[18:20:25] although...maybe properties are the only valid predicates...i'm not sure :P
[18:20:54] for context, I'm working on an rdf-streaming-updater presentation...good excuse for me to get a little deeper into this world
[18:21:25] in other words, screwing around on wikidata ;P https://www.wikidata.org/wiki/Q2439
[18:21:30] :)
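A minimal sketch of that predicate-vs-property distinction, using Apache Jena; this is illustrative only (not project code), and the Q42/P31/Q5 identifiers are the well-known "Douglas Adams is an instance of human" example rather than anything from the chat:

    // In RDF every statement is a (subject, predicate, object) triple. In
    // Wikidata's RDF mapping, a Wikidata property such as P31 ("instance of")
    // appears as a predicate, but plenty of predicates (rdfs:label, rdf:type,
    // ...) are generic RDF vocabulary, not Wikidata properties.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.RDFS;

    public class PredicateVsProperty {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            Resource douglasAdams = model.createResource("http://www.wikidata.org/entity/Q42");

            // Predicate backed by a Wikidata property (wdt:P31, "instance of"):
            Property instanceOf = model.createProperty("http://www.wikidata.org/prop/direct/P31");
            douglasAdams.addProperty(instanceOf, model.createResource("http://www.wikidata.org/entity/Q5"));

            // Predicate from generic RDF vocabulary: a predicate, but not a Wikidata property.
            douglasAdams.addProperty(RDFS.label, "Douglas Adams");

            model.write(System.out, "TURTLE");
        }
    }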
[18:22:01] * ebernhardson ponders writing a little helper for helm log...i just want it to give me the logs without having to tab complete
[18:22:36] s/helm log/kubectl log/
[18:23:19] * ebernhardson sighs: VerifyException: No topics matching the filter were found
[18:24:09] that's the bit that is supposed to limit what we read from mediawiki events to a single datacenter, apparently something's wrong... more fun :)
[18:24:20] ruh roh
[18:25:52] it's progress at least, solved the previous problem. Now more whack-a-mole
[19:00:48] dr appointment, back in ~90m
[19:02:51] * ebernhardson is failing to reproduce the issue locally, while using the prod schema repo. fun :P
[19:10:58] * ebernhardson spoke too soon
[19:35:03] who knows...it worked for a minute, fixed a second issue, and now the first issue with no matching topics is back ...
[19:44:06] consumer might be running, at least it's not obviously failed. producer still failing :P
[19:46:59] ahh, it's just bad error handling. Supplying an unknown stream and a topic filter will fail on the topic filter check for !topics.isEmpty()
[19:47:26] (and we have a lovely consistency of choosing _ or - arbitrarily in names :P)
[19:58:14] getting further...some sort of timeout talking to kafka-main (although it's certainly in the allowed_clusters list..sigh)
[20:10:30] meh: Error: query: failed to query with labels: secrets is forbidden: User "cirrus-streaming-updater" cannot list resource "secrets" in API group "" in the namespace "cirrus-streaming-updater"
[20:10:42] basically, i can't ask helm to show me the rendered and deployed values, because they have secrets in them :P
[20:27:32] back
[20:38:19] * ebernhardson found `kubectl get -o yaml networkpolicy` which seems to do the trick
[22:03:47] * ebernhardson wonders if we can get fblog in prod somehow, it turns the mostly unreadable logs from kubectl into something reasonable (for example, stack traces are printed as expected, instead of a single 2000+ char line)
[22:03:57] s/kube_env/kubectl/
[22:10:27] fblog?
[22:11:12] it's a random thing i found from searching for better ways to read these logs :) You essentially pipe the logs into it, and it parses and prints them nicely: https://github.com/brocode/fblog
[22:11:29] * ebernhardson feels dubious about the github user name, but meh
[22:12:10] but it seems inappropriate to copy random binaries into prod, so mostly i redirect the logs into a file, then `ssh host cat foo | fblog`
[22:16:00] separately...i'm starting to wonder if "TimeoutException: The AdminClient thread has exited. Call: describeTopic" is flink's way of saying the topic doesn't exist...
[22:20:37] no i don't think that's it...best guess I have remaining is that the firewall for the kafka-main cluster doesn't allow k8s staging to talk to it
[22:20:56] and....honestly that seems kinda sensible so I'm not sure if i want to try to justify changing it :P
[22:25:48] yea that's it. the kafka-test and kafka-jumbo clusters both include $STAGING_KUBEPODS_NETWORKS, but i'm not seeing any extra stuff added to kafka-main
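As a sanity check for that kind of theory, a standalone probe with the plain Kafka AdminClient can separate "can't reach the broker" (a firewall drop surfaces as a timeout) from "connected, but the topic doesn't exist". This is a sketch, not project code; the broker address and topic name below are placeholders, not taken from the chat:

    import java.util.Properties;
    import java.util.Set;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;

    public class KafkaReachabilityProbe {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder broker address; substitute a real kafka-main broker.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-main-broker.example:9092");
            // Keep the timeout short so a firewalled connection fails fast.
            props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");
            try (AdminClient admin = AdminClient.create(props)) {
                // Requires a live broker connection; if packets are being
                // dropped this throws a TimeoutException rather than listing.
                Set<String> topics = admin.listTopics().names().get(10, TimeUnit.SECONDS);
                // Placeholder topic name.
                System.out.println(topics.contains("eqiad.mediawiki.page_change.v1")
                        ? "connected, topic exists"
                        : "connected, topic missing");
            }
        }
    }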