[07:25:17] dcausse: looks like ryankemper deployed a new WDQS version this morning (~6:15am CEST) which killed the data import
[07:25:25] I'll have a look to restart it manually
[07:25:43] :/
[07:25:48] I can have a look too
[07:29:27] actually I think you have the information about the chunk file being processed
[07:30:29] yep, I do have the info in the logs
[07:34:08] dcausse: quick check before I do something stupid, the command line to restart the import on wdqs2008:
[07:34:10] `/srv/deployment/wdqs/wdqs/loadData.sh -s 638 -n wdq -d /srv/wdqs/munged/`
[07:34:53] we should make that import script more robust to having blazegraph restarted
[07:35:12] having a script that runs for > 1 week and is not able to retry on error is an issue!
[07:36:12] it should... but apparently it did not...
[07:36:40] looking at the command
[07:38:31] gehel: sounds good to me (beware that the chunk file might not be the same on the two machines)
[07:38:42] lexemes will have to be imported after
[07:39:44] let's sync up before someone touches /srv/wdqs/data_loaded anyways
[07:39:55] sure!
[07:41:45] data load restarted on wdqs1009 and wdqs2008
[07:42:29] we've done just over half :(
[07:43:52] sigh...
[07:53:04] mpham: we might have to send a quick comms update to say that the data reload is still not done and say Oct 18 instead of Oct 11
[07:57:11] can we do something about the script, since we just restarted anyway?
[07:59:53] we should certainly do something, I thought I added retry support via curl with "--retry 10 --retry-delay 30 --retry-connrefused" but that was not sufficient apparently
[08:00:17] I mean I'm not clear what the fix should be
[08:00:54] there's a "set -e" so perhaps curl is returning non-zero when it retries, causing the whole script to abandon?
[08:08:44] dcausse: how is the categories endpoint doing?
[08:09:09] zpapierski: I loaded mediawiki (small) into dgraph
[08:09:23] and now trying to load the big one (commons)
[08:09:58] but the tooling I have is not robust enough... the python rdflib I used loads everything into mem
[08:10:26] is there any code for querying?
[08:10:46] so I have to change my plans: drop rdf -> json -> dgraph and do rdf -> dgraph instead (using low-level rdflib parsers)
[08:11:01] I have a couple queries but nothing usable
[08:11:22] I can push what I have so that you can play with the dgraph dql
[08:11:29] happy to!
[08:11:33] using mediawiki data
[08:11:37] pushing
[08:14:14] sigh, I need to clean up my import script a bit, it does not even compile
[08:20:12] this script is a battlefield, pushing anyways so that you can start setting up dgraph
[08:20:19] thx!
[08:22:10] zpapierski: https://github.com/nomoa/category-graph/tree/main/dgrah-backend
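On the retry question above (07:59-08:00): with `set -e`, a curl invocation that exhausts its retries exits non-zero and aborts the whole week-long run. A loader that treats connection failures as "sleep and try again" would survive a blazegraph restart mid-import. A minimal Python sketch of that idea only, not of the actual loadData.sh (which is a bash script); the endpoint URL, chunk naming, and headers are placeholder assumptions.

```python
import time
from pathlib import Path

import requests

# Placeholders -- not what loadData.sh actually uses.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"
MUNGED_DIR = Path("/srv/wdqs/munged")
MAX_ATTEMPTS = 10   # mirrors --retry 10
RETRY_DELAY = 30    # seconds, mirrors --retry-delay 30


def load_chunk(path: Path) -> None:
    """POST one munged chunk; sleep and retry while blazegraph is down,
    instead of letting a single failure abort the whole run (the set -e
    behaviour discussed above)."""
    for _ in range(MAX_ATTEMPTS):
        try:
            with path.open("rb") as body:
                resp = requests.post(
                    ENDPOINT,
                    data=body,
                    # Assumed headers for a gzipped turtle chunk.
                    headers={
                        "Content-Type": "text/turtle",
                        "Content-Encoding": "gzip",
                    },
                    timeout=3600,
                )
            resp.raise_for_status()
            return
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            time.sleep(RETRY_DELAY)
    raise RuntimeError(f"{path} still failing after {MAX_ATTEMPTS} attempts")


for chunk in sorted(MUNGED_DIR.glob("wikidump-*.ttl.gz")):
    load_chunk(chunk)
```

The key difference from the curl approach is that the retry loop lives above the per-request level, so a restart that outlasts curl's own retries still only costs one chunk a delay, not the whole import.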
[10:19:12] dcausse: with cycle detections? ;]
[10:21:39] hashar: you mean cycles in the category "tree"?
yes, that's why I want a graph backend, I'm not going to write that on my own :P
[10:21:50] ah
[10:22:18] back in the old days I retrieved the categories and tried to dedupe the tree or unbreak cycles
[10:23:30] something like an ISP being in the ISP and Internet categories while the ISP category is already in the Internet one
[10:23:48] so instead of having Internet > ISP > ISP (article) and Internet > ISP (article)
[10:23:58] I would drop the Internet category since it is already a parent of the ISP one
[10:24:54] I guess that was my crazy attempt at having a well-organized tree, much like https://en.wikipedia.org/wiki/List_of_Dewey_Decimal_classes (books with a number 5xx are in the sciences category)
[10:26:28] that was for $wgUseCategoryBrowser to add some breadcrumbs to the article based on the categories tree https://www.mediawiki.org/wiki/Manual:%24wgUseCategoryBrowser
[10:26:45] bunch of legacy old stuff at https://phabricator.wikimedia.org/T35614
[10:27:03] hierarchy rapidly becomes a mess for organizing knowledge; semi-related, here's an article shared by Mike a couple of days ago: https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z
[10:28:27] I guess that's because we use the categories both for hierarchical organization and as keywords
[10:28:53] true, but I doubt that hierarchical organization is something that can scale tbh
[10:29:58] I love the categories of https://en.wikipedia.org/wiki/Barack_Obama which are oddly very specific, a lot of them being "American x y z"
[10:30:34] or seemingly redundant, such as "Nobel Peace Prize laureates" and "American Nobel laureates" :]
[10:30:55] anyway, can't wait to see what users will end up creating if we provide a graph of categories
[10:31:39] we already provide that somehow, it's hidden behind the deepcat:CategoryXYZ search keyword
[10:31:57] and we have a sparql endpoint for that
[10:32:23] it has not been used that much, perhaps because it's "sparql"
[10:32:49] this hackathon project is mainly to experiment with something other than blazegraph
[10:35:41] so it is all about giving it some exposure, isn't it?
[10:36:53] could be, yes, but mainly it's to reduce our tech dependencies on blazegraph (that we want to remove from the infra)
[10:39:56] I have seen the rfp, yup
[10:40:09] thx for the article about file folders and generation z. That is a good one
[10:40:20] I should migrate my Documents folder to elasticsearch
[10:40:28] :)
[10:42:16] RDF & python are not really friends... had to use a painful mix of lightrdf + rdflib to parse big files without loading them completely into mem...
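To make the memory problem at 08:09 and 10:42 concrete: the trick is to never materialize the whole graph, streaming triples from the parser straight into batched dgraph mutations. A rough sketch assuming lightrdf's lazy `Parser` and the pydgraph client; the input filename is made up, and the IRI-to-predicate mapping a real importer would need is glossed over.

```python
import lightrdf
import pydgraph

# Assumed APIs: lightrdf.Parser().parse() yields (s, p, o) tuples lazily,
# and pydgraph's txn.mutate(set_nquads=...) accepts RDF mutations.
stub = pydgraph.DgraphClientStub("localhost:9080")  # dgraph's default gRPC port
client = pydgraph.DgraphClient(stub)

BATCH_SIZE = 10_000


def flush(nquads):
    """Commit one batch of N-Quad lines as a single dgraph transaction."""
    txn = client.txn()
    try:
        txn.mutate(set_nquads="\n".join(nquads), commit_now=True)
    finally:
        txn.discard()


parser = lightrdf.Parser()
batch = []
# Terms are assumed to arrive already in N-Triples syntax; a real importer
# would also have to map external IRIs onto dgraph predicates/uids.
for s, p, o in parser.parse("commons-categories.nt", base_iri=None):
    batch.append(f"{s} {p} {o} .")
    if len(batch) >= BATCH_SIZE:
        flush(batch)
        batch = []
if batch:
    flush(batch)
```

Peak memory stays bounded by the batch size rather than by the dump size, which is the property the full-graph rdflib approach lacked.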
[10:45:07] lunch
[10:53:49] googling for anything dgraph related is a pain in the rear - I guess D being close to G makes "dgraph" a common typo of "graph"...
[11:16:15] do you have any additional public details regarding the graph consultant offer at https://boards.greenhouse.io/wikimedia/jobs/3546920 ? I have two contacts that might be interested or could at least pass it around to their contacts ;)
[11:16:39] both French-speaking, so maybe I can hook them up with gehel and dcausse for some chat? ;)
[11:20:00] hashar: always happy to chat!
[11:20:22] We're looking for someone who can help define the strategy to move away from Blazegraph
[11:20:47] So someone who has real-life experience with managing big RDF stores.
[11:21:44] hashar: I'm talking with Denny next Tuesday, he might have a few contacts as well.
I can forward you the invite
[11:21:57] But it's a late meeting
[11:36:19] break
[11:51:47] gehel: mind if I loop you into an email with my two contacts and we find a way to have a chat all together? I can do the introductions ;)
[11:52:23] Sure!
[11:52:39] je mail tout le monde, david en plus ;) ("I'll email everyone, David included")
[12:04:32] fait ("done")
[13:11:59] you're using their secret language!
[13:59:13] huh, the simplest java call to dgraph resulted in an exception, good start :)
[14:02:41] ah, wrong port
[14:40:02] hashar: this is just a depth of 3 from Category:Trees on commons: https://observablehq.com/d/42741d80feadc852 :)
[14:40:32] Sorry, we couldn't find that page. :-(
[14:40:42] I couldn't find it either
[14:40:49] might require an account
[14:41:25] tatou k c!
[14:41:50] :(
[14:42:01] account doesn't seem to help
[14:42:06] ditto
[14:42:37] and with this: https://observablehq.com/@505d1962eaebae6d/radial-tidy-tree
[14:42:39] ?
[14:42:56] that one works
[14:43:03] beautiful flower
[14:43:22] whoa
[14:43:36] add colour and I could hang it on a wall
[14:44:09] :P
[14:44:21] [[Category:The tree book - A popular guide to a knowledge of the trees of North America and to their uses and cultivation (1920)]]
[14:44:33] [[Category:SVG Tree]]
[14:44:37] that is a really nice rendering
[14:45:37] yes, d3.js renders super nice svgs
[14:45:41] errand
[14:46:12] i definitely need to learn d3
[14:46:19] it also spawned a whole lot of js libraries for people unable to learn d3
[14:46:27] seriously, it should be an academic course
[14:47:01] I knew how to use it once, but it's so easy to forget if you don't use it often...
[14:48:08] at least observablehq.com abstracts out a bunch of the complexity and you can start coding right away
[14:48:30] anyway, weekend time. Thank you for the flower, dcausse!
[15:54:02] going offline
[19:40:26] Are we currently tracking WDQS/WCQS update lag anywhere? One of our KRs is getting it below 10m. I'm looking at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-6M&to=now but it's not immediately clear what our current baseline is, or if we need to better refine that KR
[19:50:55] that is indeed the only place it is tracked, I think
[19:51:23] This is the view that highlights the non-baseline xD https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-7d&to=now
[19:52:46] most of the time I would say this is below 1 min; if things are just a tiny bit slow, below 5 or 10 for sure; and if something is wrong, it's hours
[22:14:59] FWIW the multi-hour / multi-day update lag can be bandaged by restarting regularly, but that brings with it all the usual problems w/ relying on frequent service restarts (makes it easier to not notice problems until it's too late)
[22:15:18] since when we see those huge spikes, that's due to the blazegraph process locking up on the host, preventing the updater from doing its thing
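To put a number on the 19:40 baseline question: besides the Grafana panel, the lag can be spot-checked from the data itself, since WDQS records the timestamp of the last change it saw for wikidata.org under `schema:dateModified`, and "now minus that" is the current lag. A small sketch against the public endpoint (the script name and User-Agent string are made up):

```python
from datetime import datetime, timezone

import requests

# The public endpoint; WDQS predeclares the schema: prefix.
SPARQL = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?dateModified WHERE {
  <http://www.wikidata.org> schema:dateModified ?dateModified .
}
"""

resp = requests.get(
    SPARQL,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "lag-spot-check-sketch/0.1"},  # made-up UA
    timeout=30,
)
resp.raise_for_status()
value = resp.json()["results"]["bindings"][0]["dateModified"]["value"]
last_update = datetime.fromisoformat(value.replace("Z", "+00:00"))
lag = datetime.now(timezone.utc) - last_update
print(f"update lag: {lag.total_seconds():.0f}s")
```

A check like this measures what users actually see on the queried host, which is also why the multi-hour spikes from a locked-up blazegraph show up here just as they do on the dashboard.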