[07:22:37] ryankemper: sweet!
[13:04:20] pfischer: see my last question on https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1040211 i.e. would throttling possibly result in an increased number of failed messages in the error queue?
[13:05:28] my understanding is that, with the way we handle retries, when throttling kicks in we might degrade consistency, while I thought we would rather degrade latency
[13:05:46] o/
[13:11:33] dcausse/dr0ptp4kt: would you be open to a Friday meeting with Alexandra Paskulin (Documentation office hours) to chat about the WDQS graph split?
[13:12:18] Yes
[13:12:49] gehel: what time is it? might have to leave earlier than usual tomorrow
[13:13:13] 4:30pm CEST
[13:14:05] gehel: thanks, should be good
[13:14:44] thanks!
[13:35:39] ryankemper: could you possibly start a full run of the reload script with the updater enabled on wdqs2023 (see https://phabricator.wikimedia.org/P64016#259799)? it'll help to get a rough estimate of the time required to do a reload of the graph split
[14:13:34] dcausse: forgot to hit reply. You are right, 429s are processed in the regular retry queue, but letting the HTTP client handle 429s might be an alternative
[14:27:23] pfischer: makes sense, could also perhaps relax the AsyncWaitOperator timeout so that it's less likely to fail the pipeline; my understanding is that we have things under control and don't really need that timeout protection
[14:54:07] \o
[14:56:15] o/
[15:05:09] o/
[15:23:17] random Java design question, re: https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/1041789/3/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/throttling/SimpleBooleanExpression.java
[15:23:58] i made it do the work in the static create method, because i have some idea that you aren't supposed to do any work in the constructor. But then the constructor isn't useful on its own so i made it private, but then from the outside, it seems like it could just be the constructor?
[15:24:37] s/it could/the create method/
[15:24:55] perhaps the code is so tiny it's irrelevant :P
[15:32:05] ebernhardson: kind of makes sense to keep the constructor lean; in this case it could be that in the future you might want to support another syntax but still keep the same runtime logic
[15:32:51] not sure that making it private is necessary tho
[15:33:52] I think I would naturally have gone the same way with a static builder method
[15:34:46] dcausse: ok, sounds reasonable. I guess i felt odd about making it private, but on the other hand it seemed like a really weird interface to expose
[15:35:03] yes I agree
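A minimal sketch of the static-factory pattern discussed above, using a made-up AndExpression class (not the actual SimpleBooleanExpression code from the patch; the name, syntax, and fields are invented for illustration): all parsing lives in create(), and the private constructor only stores already-validated state, so a different input syntax could be supported later without touching the runtime representation.

    import java.util.List;
    import java.util.function.Predicate;

    // Illustrative only: a tiny expression that ANDs a set of named boolean flags.
    public final class AndExpression {
        private final List<String> operands;

        // Lean constructor: no parsing, just stores state the factory already validated.
        private AndExpression(List<String> operands) {
            this.operands = operands;
        }

        // Static factory: all parsing/validation happens here, keeping the constructor trivial.
        public static AndExpression create(String expression) {
            if (expression == null || expression.isBlank()) {
                throw new IllegalArgumentException("expression must not be empty");
            }
            return new AndExpression(List.of(expression.split("\\s*&&\\s*")));
        }

        // Evaluates the expression given a lookup for each flag's current value.
        public boolean evaluate(Predicate<String> flagValue) {
            return operands.stream().allMatch(flagValue);
        }
    }

Callers would go through AndExpression.create("a && b").evaluate(flags::contains); with the constructor private, only fully-parsed instances can ever exist.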
[16:09:22] * ebernhardson is tempted to try the new "don't reindex if nothing changed"...in theory can kick off the reindex script and it runs to completion in a few tens of minutes
[16:09:39] * ebernhardson should probably try with closed wikis on cloudelastic like before :P
[16:12:31] dcausse: alright, kicking off a data reload on the dummy dataset to test that the updater/kafka timestamp logic is working properly, and then I'll kick off a run with the full dump once that has completed
[16:28:35] BTW: rate-limiting worked on staging, we just went 700 rps over the limit and those requests were rejected. However, the fetch_failure rate spiked, so we might have to handle 429-retries differently.
[16:29:30] Alright, kicked off the full data reload ~2 mins ago. In retrospect I should have used a tee command to write the output to a filepath d.causse can read. Oh well
[16:31:35] Oh, that failed fast. At least I can tee it properly next time :D
[16:31:54] Failure is from the following function. It's complaining about rm having no args, which means the `find` command failed to find anything:
[16:31:57] https://www.irccloud.com/pastebin/DtcEwO08/
[16:33:08] We can just add a `-f` flag to the rm to make it not complain about that. Unless we want it to fail if it doesn't find any .gz to clean up, but I doubt that's the behavior we want
[16:33:23] pfischer: nice, sounds like progress
[16:37:47] ryankemper: i might prefer passing --no-run-if-empty to xargs, but it's not a big deal
[16:38:07] indeed, the better question is whether the .gz should be there, and i dunno :P
[16:41:13] okay, added that flag
[16:49:42] * ebernhardson wonders where x-request-id gets injected into the wdqs requests
[16:59:55] found curious things while looking for it in VCL. Apparently the text-cache injects additional IP reputation information into non-ve edit requests.
[17:14:21] dinner
[17:15:06] ryankemper: hm, this means the HDFS folder was empty...
[17:17:31] ryankemper: perhaps it needs a trailing '/' on the folder, which might have some meaning for hdfs-rsync, so perhaps try with --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ ?
[17:18:01] I'll try that
[17:18:07] thx!
[17:39:19] That seems to have done it
[17:45:36] lunch, back in time for pairing
[18:09:57] * ebernhardson now realizes that due to how we configure blazegraph, the '&&' in the config property goes into an environment var and gets expanded in a shell script...wonder if it works
[18:49:31] actually, i'm not finding any reasonable way to not split args that come from an env var on spaces...but the && works fine and bash ignores it. So i guess we just don't use spaces
[19:29:36] dcausse / ebernhardson: is it easy to get a quick estimate of the number of user queries on WCQS? I suspect that https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wcqs&viewPanel=18&from=now-30d&to=now is mostly about probes.
[19:30:15] We have a user-agent for the probes, so we should be able to filter them out
[19:30:49] With ~3 req/sec, that's about 250K requests/day, which is not huge, but not nothing. If 90% of those are probes, that's a different story
[19:30:58] Oh, I can probably have a look in turnilo then
[19:37:18] hmm, yea, maybe can look at the incoming webrequests
[19:37:57] https://w.wiki/AN$7 seems to tell me that we have between 1000 and 300k requests per day
[19:38:21] if I'm correct and I need to multiply by 128 when looking at webrequest_sampled_128
[19:38:53] gehel: correct!
[19:39:12] even a manager can get things right every now and then!
[21:58:19] pfischer: I created https://gerrit.wikimedia.org/r/c/operations/alerts/+/1043198 based on your comment https://phabricator.wikimedia.org/T349772#9681519. Feel free to review and let me know if I need to adjust anything
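A quick sanity check on the webrequest numbers discussed above, assuming webrequest_sampled_128 is a 1:128 sample as confirmed:

    estimated requests/day ≈ (rows seen in webrequest_sampled_128 per day) × 128
    3 req/sec × 86,400 sec/day = 259,200 ≈ the ~250K requests/day mentioned earlier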