[01:59:10] forgot to press send. i'm thinking a 30-50% speed increase. on my machine it was about 8B triples in 5 days for the n3 turtle format. piecing together T347504 and https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-6M&to=now&var-cluster_name=wdqs-test&refresh=1m it seems like the newer-class machines took about 10 days on n3 turtle to get to a similar point (assuming there wasn't some other factor).
[01:59:11] T347504: WDQS graph split: load data from dumps into new hosts - https://phabricator.wikimedia.org/T347504
[02:00:55] of course we plan to only do split graphs eventually. the split graphs, totaling about 7.6B triples and loaded as n-triples instead of n3, took about 6-7 days on the newer-class machines ( https://phabricator.wikimedia.org/T350465#9405888 ). one might expect about 10 days on that class of machine for 10B triples (e.g., after growth).
[02:03:09] i'm interested in the idea of having a couple of faster machines for importing purposes, although i also have that same sort of sense about what's being optimized for.
[07:43:46] dr0ptp4kt: thanks for the writeup on perf!
[09:54:31] inflatador: open question for you in T348350
[09:54:31] T348350: Set requests (not limits) for cirrus-streaming-updater in k8s - https://phabricator.wikimedia.org/T348350
[11:13:40] lunch
[14:16:12] o/
[17:12:10] workout, back in ~40
[17:58:50] back
[18:25:24] lunch, back in ~40
[18:54:31] back
[19:08:28] I'm currently trying to index another wiki on my server with CirrusSearch/ElasticSearch, as we run multiple wikis with one config. I made sure that commandlinemode would use the site I'm trying to index, as it's not the default, and then I followed the README listed for the 1.39 REL of the CirrusSearch extension. Whereas the default wiki completed without issue, this one gives me an error I'm not familiar with: https://pastebin.com
[19:08:28] /i5HLLd2d
[19:08:51] Whoops, character limits cut off the link. https://pastebin.com/i5HLLd2d
[19:09:18] As far as I'm aware, $dbKey is something internal, right?
[19:09:59] jfolv: hmm, that is curious and i can't say why that's failing. $dbKey is typically the database form of a page name, which the page table has a unique index over
[19:11:06] this basically loads every page in the database by pageid, and it's finding that one of the pages can't be fetched
[19:12:37] unfortunately the error here doesn't tell you anything about the page that was being loaded. I might suggest running a query against the `page` table in the database to see if the `page_title` column is empty anywhere
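The empty-title check suggested just above can be run from MediaWiki itself. A minimal sketch, assuming MediaWiki 1.39 with `maintenance/shell.php` (or eval.php) available; the fname string and echo format are purely illustrative:

```php
// Look for page rows whose page_title is an empty string, which is what the
// ForceSearchIndex $dbKey error suggests. Run from maintenance/shell.php.
$dbr = \MediaWiki\MediaWikiServices::getInstance()
	->getDBLoadBalancer()
	->getConnection( DB_REPLICA );

$res = $dbr->select(
	'page',
	[ 'page_id', 'page_namespace', 'page_title' ],
	[ 'page_title' => '' ],   // equivalent to: SELECT ... FROM page WHERE page_title = ''
	'emptyTitleCheck'         // arbitrary query label for the logs
);

foreach ( $res as $row ) {
	echo "suspicious row: page_id={$row->page_id} ns={$row->page_namespace}\n";
}
```

If nothing turns up, the same pattern can be pointed at other columns that Title construction depends on.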
[19:20:32] I don't see any empty page_titles, unfortunately.
[19:27:37] I wonder if there's a page that has a broken revision... The kind that shows "Error" when you try to access it.
[19:28:05] Of course, the problem there is that I don't know how to find those via the DB.
[19:30:45] I'm unleashing the flink cleanup script ATM...seems to be doing the trick https://grafana.wikimedia.org/d/fdU5Zx-Mk/wdqs-streaming-updater?orgId=1&viewPanel=14&from=now-5m&to=now
[19:31:06] I have a savepoint restore patch ready in case things go wrong: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1005572
[19:32:25] just did commons/eqiad with no probs, moving to commons/wikidata
[19:32:33] err....wikidata/eqiad that is
[19:48:11] eqiad looks good. interestingly, the wikidata deletion errored with `Read timeout on endpoint URL: "https://thanos-swift.discovery.wmnet/rdf-streaming-updater-eqiad?delete"`, but I ran the script again and it actually did delete everything
[19:48:32] I'm out for the next ~90m or so but will start on CODFW when I get back
[19:59:20] jfolv: hmm, that is quite curious. Sadly i don't have great answers for you. I took a look at the code, this is very clearly loading a row from the database, and the database response must have the page_title field as an empty string. How that happens though is a mystery. There are some options that will cause mediawiki to log all sql queries it runs, could probably try and match some
[19:59:22] timestamps up and figure out which page it queried.
[20:03:21] jfolv: i guess one thing you could do is backport an (apparently unintended) fix. On this line, change MWException to \Throwable: https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/309af932b1816c26aaa2e6a8954a55c7a038e6f7/maintenance/ForceSearchIndex.php#L590
[20:04:23] that's in REL1_42, but it was done as part of removing MWException rather than as a fix for this oversight
[20:05:32] Well, it's better than nothing for now. I'll need to figure out what's going on with the broken page, but having a working search is still something I'd like done sooner rather than later.
[20:09:06] Ah, looks like my script already has that line.
[20:11:20] jfolv: hmm, in 1.39? It should say MWException, and needs to be changed to \Throwable
[20:11:31] Oh wait, sorry. I misread.
[20:12:06] There we go. Sorry about that; it's probably about time for me to take a break haha
[20:12:19] :) no worries. Hope that gets your search going
[20:12:43] Appreciate the help. I'll let you know if it gets it through.
[20:25:38] ebernhardson: Yeah, that let it skip over the problem. And for now, that's good enough. Thanks a ton!
[20:26:08] jfolv: awesome!
[21:29:37] I do have one more problem now, though. The indexes have been bootstrapped, but they aren't updating. I created a page in my userspace hours ago on our main wiki, and it still isn't showing up in searches.
[21:37:20] back
[23:04:44] Were there any changes to CirrusSearch between 1.35 and 1.39 that affected how the search index updates? Even if I create a new page and then run ForceSearchIndex, the new page won't appear in search results.
[23:13:21] jfolv: hmm, nothing too significant. That should be something like jul 2020 - sep 2022. A common problem people run into is that cirrus does writes through the job queue
[23:14:03] jfolv: if that's the case, you can try https://www.mediawiki.org/wiki/Manual:RunJobs.php
[23:14:46] it's possible it fans out to more jobs now, and the $wgJobRunRate is not high enough, i suppose?
[23:17:59] Hrm. Maybe it's just backlogged badly from the indexing.
[23:24:16] I suppose I could up the run rate. We do utilize multiple apache backends, so it would naturally end up distributed.
[23:26:22] jfolv: it's a reasonable solution, i suppose. check https://www.mediawiki.org/wiki/Manual:ShowJobs.php to see if there is much backlog
[23:35:11] Oh yeah, majorly backlogged.
[23:35:23] It probably just needs to chug through those for now.
[23:35:52] I set up a systemd service via the instructions on the jobqueue page. Should help whittle it down.
[23:39:41] nice, that should certainly take care of it
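For the job-queue backlog discussed at the end, a hedged LocalSettings.php sketch; `$wgJobRunRate` and `maintenance/runJobs.php` are the settings named in the conversation, but the value 10 and the flags shown are illustrative, not recommendations from the channel:

```php
// CirrusSearch writes go through the job queue, so index updates only appear
// once the relevant jobs actually run. One option is to raise the per-request rate:
$wgJobRunRate = 10;   // jobs attempted per web request; the default is 1

// Alternatively, when jobs are run out of band (cron/systemd, as set up above),
// disable in-request job running and let the dedicated runner handle everything:
// $wgJobRunRate = 0;
// with something like:  php maintenance/runJobs.php --maxjobs 1000 --wait
```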