[00:33:53] Trey314159: Thanks! I'll follow up in the morning.
[06:27:02] justinl: hi, I believe this was resolved in T124196
[06:27:03] T124196: Fatal "cannot perform this operation with arrays" from CirrusSearch/ElasticaWrite (using JobQueueDB) - https://phabricator.wikimedia.org/T124196
[06:34:32] ryankemper: noted. I will try to see what can be done there
[06:34:35] Thanks!
[06:40:57] Amir1: thanks for working on this!
[06:41:19] oh we are sorry we didn't get it done sooner :(
[06:41:22] It is horrible
[06:41:24] :D
[06:41:30] :)
[06:44:06] dcausse: Hi, as I see from your comment on Joseph's previous commit, I am supposed to look into `UrisSchemeFactory.WIKIDATA` for WD prefixes. This doesn't seem to have all the prefixes, couldn't find anything from code-searching for 'mwapi', for example. Any pointers?
[06:45:32] tanny411: hmm... mwapi is declared as a service, lemme check if we can access this without depending on the whole blazegraph stack
[06:48:58] ah "mwapi" was added as a shortcut to "wikibase:mwapi"
[06:49:55] they're all in dist/src/script/prefixes.conf
[06:50:45] not sure how to source that from the rdf-spark-tools ...
[06:51:12] humm.. thanks! looking at it
[06:51:24] tanny411: I would suggest copying/pasting the content of dist/src/script/prefixes.conf somewhere in your repo for now
[06:51:39] okay
[06:55:00] dcausse: hmm.. the conf file seems to have only a couple of prefixes.
[06:55:00] Should I accumulate all required prefixes into the list we already have in QueryInfo, or look for ways to get all prefixes from blazegraph?
[06:57:41] tanny411: I think one way is to concat org.wikidata.query.rdf.common.uri.UrisScheme#prefixes with the content of dist/src/script/prefixes.conf
[06:57:58] UrisScheme can be obtained from UrisSchemeFactory.WIKIDATA
[06:58:32] so UrisSchemeFactory.WIKIDATA.prefixes() + dist/src/script/prefixes.conf should give most of the prefixes if I'm not mistaken
[06:58:39] ahh, was looking at that. ok, let me check if that covers all prefixes
[08:12:48] dcausse: Thanks! Hopefully I can move back from Redis to MySQL and have one less moving part in my wiki system, especially one that's easier for me to manage (dealing with stuck jobs in Redis for a non-MW expert is worrisome).
[08:43:40] dcausse: So UrisScheme is in the 'common' sub-module. Wondering if rdf-spark-tools will be able to access it at runtime, since we provide only the latter jar.
[08:43:40] Plus, I'd like to use the other URIs defined in common/uri, like OWL, RDF. Do I have to declare each prefix individually or is there a way to get all these prefixes listed in one go?
[08:46:46] dcausse: ^
[09:49:10] tanny411: "common" should be packaged with rdf-spark-tools, as long as the code compiles it will work (we create a "fat" jar with all dependencies merged in it)
[09:50:39] ok
[09:51:26] it might be that blazegraph declares prefixes on its own (rdf, owl); if jena does not do so, I'd suggest adding them to the ones you copied from the conf file (prefixes.conf)
[09:51:58] to get the list I think we can ask blazegraph, lemme see
[10:04:03] dcausse: are there things that would benefit from me picking them up while you're away?
[10:05:33] zpapierski: I think that would be mostly monitoring that the updater works well in k8s@eqiad and possibly testing failure scenarios
[10:05:58] we could take some time this afternoon so that I can show you how it's deployed there?
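(Editor's note on the prefixes discussion above: a minimal sketch of the copy/paste workaround dcausse suggests, assuming rdf-spark-tools lives in the same wikidata/query/rdf checkout; the destination resource path is purely illustrative, not the actual repo layout.)

    # copy the Blazegraph prefix declarations into the rdf-spark-tools resources
    # (destination path is a guess for illustration only)
    cp dist/src/script/prefixes.conf \
       rdf-spark-tools/src/main/resources/prefixes.conf
    # per the chat, these declarations would then be merged at runtime with
    # the ones returned by UrisSchemeFactory.WIKIDATA.prefixes()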
[10:06:09] I was just going to propose that, sure
[10:12:12] tanny411: it's a bit of a mess and I'm not sure how to compile a comprehensive list :(
[10:12:21] found this: https://gerrit.wikimedia.org/r/plugins/gitiles/wikidata/query/blazegraph/+/refs/heads/master/bigdata-war-html/src/main/webapp/html/js/workbench.js#85
[10:12:32] that you could add
[10:12:56] Indeed, I did come across this. But I'd have to copy and add them, right?
[10:14:20] yes I think so? even if I'm not 100% sure they're all useful
[10:15:32] Okay, great!
[10:19:55] zpapierski: sent an invite, feel free to move it
[10:20:56] can it be about 3PM?
[10:21:43] I'll have to drop at 3:30
[10:22:07] made it shorter, hopefully this is enough
[10:26:26] let's do it at the original time then, I'll make my meal faster (it
[10:26:31] 's not really an issue)
[10:27:13] ok
[10:28:54] lunch
[12:01:28] gehel, dcausse: I might be wrong, but haven't we just requested additional resources matching the WDQS updater (T280485)?
[12:01:29] T280485: Additional capacity on the k8s Flink cluster for WCQS updater - https://phabricator.wikimedia.org/T280485
[12:05:02] zpapierski: not sure what you mean? I think the goal is to evaluate our needs and figure out if we actually need more resources
[12:05:28] I'm just trying to remember if we already did something about it
[12:05:35] but probably not
[12:08:13] practically speaking it means doubling the resources allocated to the task managers of the session cluster for wikidata
[12:08:55] if we assume similar throughput, I'm a bit tempted to do that, but I'll verify the throughput
[12:09:03] an actual one, for sdoc
[12:10:00] 6 replicas instead of 3 (+12 cores, +6G)
[12:10:12] huh
[12:10:14] that's not much
[12:10:39] I'm not sure we'll gain anything from doing a more in-depth analysis
[12:10:53] throughput is not really the problem here, state size is what matters
[12:12:31] tbf it works with a single taskmanager on staging (but with an empty state)
[12:13:05] I assumed it's both (one for parallelism, the other for available memory)
[12:13:23] but from what you're saying, parallelism isn't really an issue
[12:13:52] makes sense - our tests showed we can easily handle throughput well above what we have right now in wikidata
[12:14:26] yes, parallelism here is mainly to split the state across machines, not to increase throughput
[12:30:20] dcausse: give me 5min
[13:02:28] dcausse: did the prefix addition. Before sending a patch, wanted to check if we get any more prefix errors, but Spark in JupyterHub is acting up today.
[13:04:13] Who can help review and deploy this next week? Or should I wait?
[13:08:12] tanny411: zpapierski can review and ebernhardson can help with deploying to airflow
[13:15:39] ah, nvm. spark worked!
[13:15:39] dcausse: Thanks!
[13:30:06] break
[13:30:07] away
[13:30:17] that required a /
[14:23:33] I just found out late yesterday that Monday is a holiday for both US and non-US staff.
[14:24:33] Should we move triage to Tuesday?
[14:47:52] Wait, what?
[14:49:06] It was in the recent digest, huh
[15:07:02] \o
[15:15:28] o/
[16:28:40] zpapierski: I've restarted the updater for yarn to keep query-preview up to date
[16:29:07] why did it require a restart?
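(Editor's note on the capacity discussion above: a back-of-the-envelope reading of the numbers quoted, written as shell comments; the per-replica sizing is inferred from the chat, not confirmed anywhere else.)

    # current wikidata session cluster: 3 task manager replicas
    # proposed:                          6 task manager replicas
    # delta: +3 replicas -> +12 cores and +6G, i.e. roughly 4 cores / 2G per replica
    # per dcausse, the extra replicas are there to spread the updater's state
    # across more machines, not to increase throughput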
[16:29:55] because I don't want to start 2 on the same output, that would not work
[16:30:12] the idea was to stop using yarn and start using k8s
[16:30:17] ah, ok
[16:30:55] I could still follow through with that
[16:31:20] to redo the same, the procedure is (from stat1004:/home/dcausse/flink-1.12.1-wdqs): identify the job id with: sudo -u analytics-search kerberos-run-command analytics-search sh -c 'HADOOP_CLASSPATH="`hadoop classpath`" ./bin/flink list'
[16:32:02] then stop and save: sudo -u analytics-search kerberos-run-command analytics-search sh -c 'HADOOP_CLASSPATH="`hadoop classpath`" ./bin/flink stop $JOB_ID -p swift://rdf-streaming-updater-eqiad.thanos-swift/wikidata/savepoints'
[16:32:28] and use the savepoint to start the one in k8s
[16:33:03] on that note I'm going offline
[16:56:02] hmm, reminds me that I really need an alias for `sudo -u analytics-search kerberos-run-command analytics-search` ... easy to add but naming is hard :P
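(Editor's note: a minimal sketch of the alias ebernhardson wishes for above, written as a bash function so extra arguments pass through; the name `as_run` is only a placeholder.)

    # in ~/.bashrc on the stat host; "as_run" is a made-up name
    as_run() {
        sudo -u analytics-search kerberos-run-command analytics-search "$@"
    }
    # usage, mirroring the flink command quoted earlier in the chat:
    #   as_run sh -c 'HADOOP_CLASSPATH="`hadoop classpath`" ./bin/flink list'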