[07:25:17] dcausse: looks like ryankemper deployed a new WDQS version this morning (~6:15am CEST) which killed the data import
[07:25:25] I'll have a look to restart it manually
[07:25:43] :/
[07:25:48] I can have a look too
[07:29:27] actually I think you have the information about the chunk file being processed
[07:30:29] yep, I do have the info in the logs
[07:34:08] dcausse: quick check before I do something stupid, the command line to restart the import on wdqs2008:
[07:34:10] `/srv/deployment/wdqs/wdqs/loadData.sh -s 638 -n wdq -d /srv/wdqs/munged/`
[07:34:53] we should make that import script more robust to having blazegraph restarted
[07:35:12] having a script that runs for > 1 week and is not able to retry on error is an issue!
[07:36:12] it should... but apparently it did not...
[07:36:40] looking at the command
[07:38:31] gehel: sounds good to me (beware that the chunk file might not be the same on the two machines)
[07:38:42] lexemes will have to be imported after
[07:39:44] let's sync up before someone touches /srv/wdqs/data_loaded anyways
[07:39:55] sure!
[07:41:45] data load restarted on wdqs1009 and wdqs2008
[07:42:29] we've done just over half :(
[07:43:52] sigh...
[07:53:04] mpham: we might have to send a quick comms update to say that the data reload is still not done and say Oct 18 instead of Oct 11
[07:57:11] can we do something about the script, since we just restarted anyway?
[07:59:53] we should certainly do something, I thought I added retry support via curl with "--retry 10 --retry-delay 30 --retry-connrefused" but that was not sufficient apparently
[08:00:17] I mean I'm not clear what the fix should be
[08:00:54] there's a "set -e" so perhaps curl is returning non-zero when it retries, causing the whole script to abandon?
[08:08:44] dcausse: how is the categories endpoint doing?
[08:09:09] zpapierski: I loaded mediawiki (small) into dgraph
[08:09:23] and now trying to load the big one (commons)
[08:09:58] but the tooling I have is not robust enough... the python rdflib I used loads everything into mem
[08:10:26] is there any code for querying?
[08:10:46] so I have to change my plans: drop rdf -> json -> dgraph and do rdf -> dgraph instead (using low-level rdflib parsers)
[08:11:01] I have a couple queries but nothing usable
[08:11:22] I can push what I have so that you can play with the dgraph dql
[08:11:29] happy to!
[08:11:33] using mediawiki data
[08:11:37] pushing
[08:14:14] sigh, I need to clean up my import script a bit, it does not even compile
[08:20:12] this script is a battlefield, pushing anyways so that you can start setting up dgraph
[08:20:19] thx!
[08:22:10] zpapierski: https://github.com/nomoa/category-graph/tree/main/dgrah-backend
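On the retry question above (07:59-08:00): with `set -e`, a curl invocation that exhausts its retries exits non-zero and aborts the whole week-long run. A loader that treats connection failures as "sleep and try again" would survive a blazegraph restart mid-import. A minimal Python sketch of that idea only, not of the actual loadData.sh (which is a bash script); the endpoint URL, chunk naming, and headers are placeholder assumptions.

```python
import time
from pathlib import Path

import requests

# Placeholders -- not what loadData.sh actually uses.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"
MUNGED_DIR = Path("/srv/wdqs/munged")
MAX_ATTEMPTS = 10   # mirrors --retry 10
RETRY_DELAY = 30    # seconds, mirrors --retry-delay 30


def load_chunk(path: Path) -> None:
    """POST one munged chunk; sleep and retry while blazegraph is down,
    instead of letting a single failure abort the whole run (the set -e
    behaviour discussed above)."""
    for _ in range(MAX_ATTEMPTS):
        try:
            with path.open("rb") as body:
                resp = requests.post(
                    ENDPOINT,
                    data=body,
                    # Assumed headers for a gzipped turtle chunk.
                    headers={
                        "Content-Type": "text/turtle",
                        "Content-Encoding": "gzip",
                    },
                    timeout=3600,
                )
            resp.raise_for_status()
            return
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            time.sleep(RETRY_DELAY)
    raise RuntimeError(f"{path} still failing after {MAX_ATTEMPTS} attempts")


for chunk in sorted(MUNGED_DIR.glob("wikidump-*.ttl.gz")):
    load_chunk(chunk)
```

The key difference from the curl approach is that the retry loop lives above the per-request level, so a restart that outlasts curl's own retries still only costs one chunk a delay, not the whole import.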
[10:19:12] dcausse: with cycle detections? ;]
[10:21:39] hashar: you mean cycles in the category "tree"?
yes, that's why I want a graph backend, I'm not going to write that on my own :P
[10:21:50] ah
[10:22:18] back in the old days I retrieved the categories and tried to dedupe the tree or unbreak cycles
[10:23:30] something like an ISP being in the ISP and Internet categories while the ISP category is already in the Internet one
[10:23:48] so instead of having Internet > ISP > ISP (article) and Internet > ISP (article)
[10:23:58] I would drop the Internet category since it is already a parent of the ISP one
[10:24:54] I guess that was my crazy attempt at having a well-organized tree, much like https://en.wikipedia.org/wiki/List_of_Dewey_Decimal_classes (books with a number 5xx are in the sciences category)
[10:26:28] that was for $wgUseCategoryBrowser to add some breadcrumbs to the article based on the categories tree https://www.mediawiki.org/wiki/Manual:%24wgUseCategoryBrowser
[10:26:45] bunch of legacy old stuff at https://phabricator.wikimedia.org/T35614
[10:27:03] hierarchy rapidly becomes a mess for organizing knowledge; semi-related, here's an article shared by Mike a couple of days ago: https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z
[10:28:27] I guess that's because we use the categories both for hierarchical organization and as keywords
[10:28:53] true, but I doubt that hierarchical organization is something that can scale tbh
[10:29:58] I love the categories of https://en.wikipedia.org/wiki/Barack_Obama which are oddly very specific, a lot of them being "American x y z"
[10:30:34] or seemingly redundant, such as "Nobel Peace Prize laureates" and "American Nobel laureates" :]
[10:30:55] anyway, can't wait to see what users will end up creating if we provide a graph of categories
[10:31:39] we already provide that somehow, it's hidden behind the deepcat:CategoryXYZ search keyword
[10:31:57] and we have a sparql endpoint for that
[10:32:23] it has not been used that much, perhaps because it's "sparql"
[10:32:49] this hackathon project is mainly to experiment with something other than blazegraph
[10:35:41] so it is all about giving it some exposure, isn't it?
[10:36:53] could be, yes, but mainly it's to reduce our tech dependencies on blazegraph (that we want to remove from the infra)
[10:39:56] I have seen the rfp, yup
[10:40:09] thx for the article about file folders and generation z. That is a good one
[10:40:20] I should migrate my Documents folder to elasticsearch
[10:40:28] :)
[10:42:16] RDF & python are not really friends... had to use a painful mix of lightrdf + rdflib to parse big files without loading them completely into mem...
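To make the memory problem at 08:09 and 10:42 concrete: the trick is to never materialize the whole graph, streaming triples from the parser straight into batched dgraph mutations. A rough sketch assuming lightrdf's lazy `Parser` and the pydgraph client; the input filename is made up, and the IRI-to-predicate mapping a real importer would need is glossed over.

```python
import lightrdf
import pydgraph

# Assumed APIs: lightrdf.Parser().parse() yields (s, p, o) tuples lazily,
# and pydgraph's txn.mutate(set_nquads=...) accepts RDF mutations.
stub = pydgraph.DgraphClientStub("localhost:9080")  # dgraph's default gRPC port
client = pydgraph.DgraphClient(stub)

BATCH_SIZE = 10_000


def flush(nquads):
    """Commit one batch of N-Quad lines as a single dgraph transaction."""
    txn = client.txn()
    try:
        txn.mutate(set_nquads="\n".join(nquads), commit_now=True)
    finally:
        txn.discard()


parser = lightrdf.Parser()
batch = []
# Terms are assumed to arrive already in N-Triples syntax; a real importer
# would also have to map external IRIs onto dgraph predicates/uids.
for s, p, o in parser.parse("commons-categories.nt", base_iri=None):
    batch.append(f"{s} {p} {o} .")
    if len(batch) >= BATCH_SIZE:
        flush(batch)
        batch = []
if batch:
    flush(batch)
```

Peak memory stays bounded by the batch size rather than by the dump size, which is the property the full-graph rdflib approach lacked.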
[10:45:07] lunch
[10:53:49] googling for anything dgraph related is a pain in the rear - I guess D being close to G makes "dgraph" a common typo of "graph"...
[11:16:15] do you have any additional public details regarding the graph consultant offer at https://boards.greenhouse.io/wikimedia/jobs/3546920 ? I have two contacts that might be interested or could at least pass it around to their contacts ;)
[11:16:39] both French-speaking, so maybe I can hook them up with gehel and dcausse for some chat? ;)
[11:20:00] hashar: always happy to chat!
[11:20:22] We're looking for someone who can help define the strategy to move away from Blazegraph
[11:20:47] So someone who has real-life experience with managing big RDF stores.
[11:21:44] hashar: I'm talking with Denny next Tuesday, he might have a few contacts as well.
I can forward you the invite
[11:21:57] But it's a late meeting
[11:36:19] break
[11:51:47] gehel: mind if I loop you into an email with my two contacts and we find a way to have a chat all together? I can do the introductions ;)
[11:52:23] Sure!
[11:52:39] je mail tout le monde, david en plus ;) ("I'll email everyone, David included")
[12:04:32] fait ("done")
[13:11:59] you're using their secret language!
[13:59:13] huh, the simplest java call to dgraph resulted in an exception, good start :)
[14:02:41] ah, wrong port
[14:40:02] hashar: this is just a depth of 3 from Category:Trees on commons: https://observablehq.com/d/42741d80feadc852 :)
[14:40:32] Sorry, we couldn't find that page. :-(
[14:40:42] I couldn't find it either
[14:40:49] might require an account
[14:41:25] tatou k c!
[14:41:50] :(
[14:42:01] account doesn't seem to help
[14:42:06] ditto
[14:42:37] and with this: https://observablehq.com/@505d1962eaebae6d/radial-tidy-tree
[14:42:39] ?
[14:42:56] that one works
[14:43:03] beautiful flower
[14:43:22] whoa
[14:43:36] add colour and I could hang it on a wall
[14:44:09] :P
[14:44:21] [[Category:The tree book - A popular guide to a knowledge of the trees of North America and to their uses and cultivation (1920)]]
[14:44:33] [[Category:SVG Tree]]
[14:44:37] that is a really nice rendering
[14:45:37] yes, d3.js renders super nice svgs
[14:45:41] errand
[14:46:12] i definitely need to learn d3
[14:46:19] it also spawned a whole lot of js libraries for people unable to learn d3
[14:46:27] seriously, it should be an academic course
[14:47:01] I knew how to use it once, but it's so easy to forget if you don't use it often...
[14:48:08] at least observablehq.com abstracts out a bunch of the complexity and you can start coding right away
[14:48:30] anyway, weekend time. Thank you for the flower, dcausse!
[15:54:02] going offline
[19:40:26] Are we currently tracking WDQS/WCQS update lag anywhere? One of our KRs is getting it below 10m. I'm looking at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-6M&to=now but it's not immediately clear what our current baseline is, or if we need to better refine that KR
[19:50:55] that is indeed the only place it is tracked, I think
[19:51:23] This is the view that highlights the non-baseline xD https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&refresh=1m&from=now-7d&to=now
[19:52:46] most of the time I would say this is below 1 min; if things are just a tiny bit slow, below 5 or 10 for sure; and if something is wrong, it's hours
[22:14:59] FWIW the multi-hour / multi-day update lag can be bandaged by restarting regularly, but that brings with it all the usual problems w/ relying on frequent service restarts (makes it easier to not notice problems until it's too late)
[22:15:18] since when we see those huge spikes, that's due to the blazegraph process locking up on the host, preventing the updater from doing its thing
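To put a number on the 19:40 baseline question: besides the Grafana panel, the lag can be spot-checked from the data itself, since WDQS records the timestamp of the last change it saw for wikidata.org under `schema:dateModified`, and "now minus that" is the current lag. A small sketch against the public endpoint (the script name and User-Agent string are made up):

```python
from datetime import datetime, timezone

import requests

# The public endpoint; WDQS predeclares the schema: prefix.
SPARQL = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?dateModified WHERE {
  <http://www.wikidata.org> schema:dateModified ?dateModified .
}
"""

resp = requests.get(
    SPARQL,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "lag-spot-check-sketch/0.1"},  # made-up UA
    timeout=30,
)
resp.raise_for_status()
value = resp.json()["results"]["bindings"][0]["dateModified"]["value"]
last_update = datetime.fromisoformat(value.replace("Z", "+00:00"))
lag = datetime.now(timezone.utc) - last_update
print(f"update lag: {lag.total_seconds():.0f}s")
```

A check like this measures what users actually see on the queried host, which is also why the multi-hour spikes from a locked-up blazegraph show up here just as they do on the dashboard.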