[07:39:03] hare: I believe there are a couple of steps to follow to expose a kafka topic from EventStream. The ones I know would be needed here are: have a deployed schema for this topic + an entry in the stream config; how to actually tell EventStream to expose that topic, I'm not entirely sure
[08:27:28] inflatador: is there anything left to do on T347505 ?
[08:27:28] T347505: Prepare new WDQS hosts for graph splitting - https://phabricator.wikimedia.org/T347505
[09:17:28] Weekly update published: https://wikitech.wikimedia.org/wiki/Search_Platform/Weekly_Updates/2023-10-20
[09:42:50] lunch
[13:18:01] o/
[13:57:29] inflatador: I've tagged you on T348048, T348051 and T348052. Could you check and approve the quotes?
[13:58:07] I've closed the tickets that have re-appeared on our board pretty aggressively. If you think any of those should stay open, please re-open!
[13:58:43] gehel ACK, just resolved T347505
[13:58:44] T347505: Prepare new WDQS hosts for graph splitting - https://phabricator.wikimedia.org/T347505
[13:58:54] thanks!
[14:06:38] dcausse re: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/967229/14/helmfile.d/services/rdf-streaming-updater/values.yaml#3 does that mean we need to cut a new release, or just point the chart to a newer image version?
[14:07:10] inflatador: we'll have to cut a new release
[14:11:34] dcausse ACK, it will just be for the producer, right? In other words, we don't have to touch the streaming updater on the wdqs hosts themselves?
[14:12:08] inflatador: right, for this particular issue only the flink job was affected
[14:13:49] dcausse ACK, kicked off the jenkins job
[14:14:04] kk
[14:48:28] workout, back in ~40
[14:50:26] \o
[14:55:03] o/
[15:21:39] interesting, someone did some data analysis showing that the least viewed articles on enwiki are that way because they're extremely unlucky wrt how Special:Random works: http://colinmorris.github.io/blog/unpopular-wiki-articles
[15:22:19] (not only that, but one aspect)
[15:24:01] vacation time! have fun!
[15:25:21] enjoy!
[15:37:41] back
[15:37:51] interesting
[15:48:57] just confirming, this is the job for building the wdqs flink job producer? https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-release-docker/123/
[15:49:17] inflatador: yea, should be the one
[15:49:48] ACK thanks
[16:01:20] OK, MR for the rdf streaming updater image: https://gitlab.wikimedia.org/repos/search-platform/flink-rdf-streaming-updater/-/merge_requests/13
[16:07:37] lgtm
[16:42:20] thanks, I guess I need to tag the commit or something? It doesn't look like it built anything
[16:42:44] hmm, looking
[16:43:28] gitlab-ci.yml says it publishes an image if $CI_COMMIT_TAG && $CI_COMMIT_REF_PROTECTED
[16:43:53] looking at the available tags, the format seems to be flink-1.16.1-rdf-0.3.135
[16:44:21] on other repos we have a button that makes a tag, but i'm not seeing one here. Maybe you have to push a tag?
[16:45:15] I think I made a tag: https://gitlab.wikimedia.org/repos/search-platform/flink-rdf-streaming-updater/-/tags
[16:45:38] Based it on my latest commit hash... it worked? https://gitlab.wikimedia.org/repos/search-platform/flink-rdf-streaming-updater/-/pipelines/29636
[16:46:05] yup, can see publish running now in https://gitlab.wikimedia.org/repos/search-platform/flink-rdf-streaming-updater/-/pipelines
[17:00:53] hasn't appeared at https://docker-registry.wikimedia.org/repos/search-platform/flink-rdf-streaming-updater/tags/ yet. Let me try pulling it
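As an aside to the tag-publishing step above: a minimal sketch, not taken from the chat, of how one might poll the registry's standard Docker Registry v2 tags/list endpoint (the same one linked a few messages below) to confirm the newly pushed tag actually got published. The helper name, retry count, and delay are made up for illustration.

```python
import time

import requests

REGISTRY = "https://docker-registry.wikimedia.org"
IMAGE = "repos/search-platform/flink-rdf-streaming-updater"


def wait_for_tag(tag: str, attempts: int = 10, delay: int = 30) -> bool:
    """Poll the v2 tags/list endpoint until `tag` shows up or we give up."""
    url = f"{REGISTRY}/v2/{IMAGE}/tags/list"
    for _ in range(attempts):
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        # Response shape per the registry API: {"name": ..., "tags": [...]}
        if tag in resp.json().get("tags", []):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print(wait_for_tag("flink-1.16.1-rdf-0.3.16"))
```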
[17:03:09] hmm, it says it pushed docker-registry.discovery.wmnet/repos/search-platform/flink-rdf-streaming-updater:flink-1.16.1-rdf-0.3.16
[17:04:09] but indeed i don't see it :S
[17:07:19] inflatador: i think the web page might just be slow to update, can see it here: https://docker-registry.wikimedia.org/v2/repos/search-platform/flink-rdf-streaming-updater/tags/list
[17:07:33] inflatador: to which i note, your tag should be 0.3.136, but it's 0.3.16
[17:08:06] * ebernhardson thinks this is why a release button is better :P
[17:08:09] ebernhardson Ah! That must be it
[17:52:00] hmm, curiously the flink ui now shows records sent from the sources, records received by the window and sent out of it, but nothing in the output
[17:53:21] * ebernhardson wonders if this means anything: [Producer clientId=producer-1] Got error produce response with correlation id 13 on topic-partition eqiad.cirrussearch.update_pipeline.update-0, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
[17:54:10] might mean nothing, unclear. It has a lot of retries at least :)
[17:54:17] LOL, was just about to say that
[17:59:53] maybe it is working, we have 7 records in the output topic now. But the first event is from 12:08, the last 17:03. It's not clear to me why that particular range; if anything the producer was started days ago, so i would have expected the offsets to include several days of input
[17:59:57] maybe kafka-test has low retention?
[18:00:16] * ebernhardson also expected more than 8 events in 5 hours :P
[18:00:41] Assuming self-updating commons RDF data, outside of Wikimedia production, won't be happening any time soon, my Plan B is to set up a separate Blazegraph server that has Commons data, and to rebuild it periodically. What heap size would you recommend for a Commons-only query service? (How many triples?)
[18:01:10] hare: hmm, i can try and look, sec
[18:01:37] hare: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wcqs
[18:01:48] hare: says 6B triples in our commons cluster
[18:02:14] That's like half of Wikidata
[18:02:39] Or 2/5
[18:02:53] for heap, we probably never tuned it too specifically. we are running 31G heaps, which is just kind of a default large value
[18:03:05] (the jvm has a behaviour change at 32G heap)
[18:03:27] indeed it is quite large, i'm not sure why exactly it has so many
[18:04:21] 99 million files (rounding up), 6 billion triples, 60 triples per file? That sounds inflated
[18:05:50] there is certainly a variety of default metadata that was added by bots, moving license, creator, etc. info into structured data from the templates and such
[18:06:09] but not sure that's enough to get to 60 triples :)
[18:10:09] Is it possible that there are a small percentage of Commons files with a preposterous number of statements, like those journal articles on Wikidata with over 1000 authors?
[18:10:22] i mean, anyone can edit it. Anything is possible :)
[18:16:04] CR for the rdf-streaming-updater bugfix is ready: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/967485
[18:26:19] it looks like the cirrus producer is generally working. Edited a page on test wiki, received records is incremented. 5 minutes later the window output is incremented and i get an output
[18:26:32] {◕ ◡ ◕}
[18:28:14] but now i have to figure out the consumer :P But i have an idea at least
[18:29:02] Still, pretty nice milestone!
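For the output-topic check discussed above (the 7 records between 12:08 and 17:03), here is a minimal sketch of how one might peek at the topic from a python repl. The kafka-python library choice and the broker address are assumptions; only the topic name comes from the producer error message in the log.

```python
from kafka import KafkaConsumer  # kafka-python; library choice is an assumption

consumer = KafkaConsumer(
    "eqiad.cirrussearch.update_pipeline.update",
    bootstrap_servers=["kafka-test1006.eqiad.wmnet:9092"],  # hypothetical broker
    auto_offset_reset="earliest",   # read from the start of retention
    enable_auto_commit=False,       # just peeking, don't move any offsets
    consumer_timeout_ms=10_000,     # stop iterating once the topic is drained
)

for record in consumer:
    # record.timestamp is ms since epoch; useful for checking the 12:08-17:03 range
    print(record.partition, record.offset, record.timestamp, len(record.value))
```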
[18:48:57] can't remember how to get the new artifacts from https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/rdf/ into https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/deploy/
[18:50:40] inflatador: i believe this still uses archiva and git-fat, which means you have to run the maven release pipeline for rdf and then fetch the jar from archiva
[18:51:10] this one for release: https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven-release-docker/
[18:52:12] inflatador: i think you can use the deploy-prepare.sh script in the deploy repo
[18:52:21] ebernhardson Ahh... that's what I was missing. Thanks
[19:16:08] OK, staging WCQS/WDQS updated
[19:16:44] hmm, weird. The consumer now fails with an exception from the elasticsearch client; it got a generic wmf 404 about not sending a hostname
[19:17:04] but we point the elasticsearch client directly at the relforge hostnames so .... huh
[19:19:12] maybe it's picking up the http routes somehow.. hmm
[19:24:53] i suspect we also need to update relforge's firewall to allow wikikube staging
[19:25:14] ACK, let me see about that
[19:42:48] ebernhardson patch up for FW changes: https://gerrit.wikimedia.org/r/c/operations/puppet/+/967523
[19:44:50] inflatador: lgtm, thanks!
[19:47:19] hmm, elasticsearch is still trying to talk to localhost:6500, which is the mediawiki api port.. more debugging :)
[19:47:27] * ebernhardson thought adding explicit routes would help... but i guess not
[19:51:04] so far I'm not seeing the firewall changes
[19:51:43] nm, they're active
[19:54:13] inflatador: looks to work! my test with a python repl now gets an ssl cert problem instead of timing
[19:54:15] out
[19:55:14] interesting. I didn't see any traffic come thru but was only looking on 1003
[19:55:37] hmm, i did hit relforge1003.eqiad.wmnet:9243
[19:56:18] if i remove verification it seems to work and i get the banner response
[19:56:45] cool... may have hit it before I started the dump then
[19:57:09] sadly the consumer is still not talking to it.... i need to do some testing with the http routes, i suspect i just have my syntax wrong or something
[19:58:18] oh hmm, this can't work anyways :P We match against HttpHost.toURI, but that doesn't include the port in the output, so even if i get the patterns matching it won't send things to the right place
[20:04:50] * ebernhardson was trying to add an http route for each cluster that it would match, instead of matching the .* route
[20:10:13] ok, the consumer is now running, but my test edit only managed to increment the counter going into page enrichment; it hasn't come back out the other side. More fun things to investigate :)
[20:10:23] (i removed the .* route, which won't work in general but will work in staging)
[21:39:16] * ebernhardson finally realizes for some reason i configured the consumer to talk to three relforge ports
[21:39:26] it doesn't even have three :P
[21:45:13] yup, one step closer, now the super detect noop script seems to be failing to decode the update requests
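The python repl test described around 19:54-19:56 (ssl cert problem, then a banner response once verification is skipped) roughly corresponds to the following sketch. It assumes `requests` is available and that skipping TLS verification is acceptable for a one-off connectivity check against relforge.

```python
import requests
import urllib3

# One-off check: hit relforge directly and skip cert verification, since the
# internal cert isn't trusted here. Silence the expected insecure-request warning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

resp = requests.get("https://relforge1003.eqiad.wmnet:9243/", verify=False, timeout=10)
print(resp.status_code)
print(resp.json())  # the elasticsearch banner: cluster name, version, etc.
```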