|
Hi, I use this query:
to get a number of vertices around a central vertex for my graph-based mind-map application. The "depth" in the query is the maximum distance from the central vertex within which I fetch nodes. When the depth is about 20 or more, the query is slow. I need it to be fast because I want to load and show the mind map instantly, and I would also like to be able to use a really high depth. I use a Jena-MySQL repository. I know that in-memory Jena would be faster, but I don't think loading a whole graph into memory is scalable for my app. How can I speed things up? Thank you. |
|
If you want to stick with Jena, you might think about trying TDB. IMO, you get better mileage from a database that's designed for RDF than from sticking RDF into a relational database, but I'm biased. FWIW, a depth of 20 or more makes this a non-trivial query; there are a lot of joins going on behind the scenes. If trying a different database is a no-go, your only options are smaller depths or a different query.

Yes, a property path over an external database is going to be very expensive. TDB is a far better choice.

The way I understand it, TDB takes a lot of RAM. The requirements page http://jena.apache.org/documentation/tdb/requirements.html says that on a 32-bit machine I need at least 1 GB of Java heap space. Does TDB load a whole graph into memory? With a lot of data, I wonder how that number of gigabytes is going to grow. Is a disk write expensive with TDB? Because if everything is in memory and the server shuts down, I lose all changes. Also, in my app each user has their own "model" or "RDF repository"; would loading many models consume too much RAM? |


@VincentBlouin You may want to take a look at this question (http://answers.semanticweb.com/questions/12147/whats-the-best-way-to-parameterize-sparql-queries), which covers how to parameterize queries in common frameworks like Jena and should help you avoid unreadable string concatenation like that in your example query. HTH
In the end I tried TDB, but it was still too slow, so I switched to Neo4j, which is better suited to graph traversal. For more details, see my answer here: http://stackoverflow.com/questions/11733868/efficient-traversal-search-algorithm-to-fetch-data-from-rdf/13617174#13617174
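
To illustrate what the depth-limited fetch from the question boils down to, here is a minimal breadth-first traversal sketch in plain Java. It assumes the graph is already available as an in-memory adjacency map, which is not how Jena, TDB, or Neo4j store data; `DepthLimitedNeighborhood` and `within` are made-up names for this example, not part of any of those libraries.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Jena, TDB, or Neo4j API.
final class DepthLimitedNeighborhood {

    /**
     * Returns every vertex reachable from center in at most maxDepth hops,
     * mapped to its hop distance. Edges are followed in their stored direction.
     */
    static Map<String, Integer> within(Map<String, List<String>> adjacency,
                                       String center, int maxDepth) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(center, 0);
        queue.add(center);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = distance.get(current);
            if (depth >= maxDepth) {
                continue; // do not expand past the requested depth
            }
            for (String next : adjacency.getOrDefault(current, Collections.<String>emptyList())) {
                // Each vertex is recorded once, so cycles cause no repeated work.
                if (!distance.containsKey(next)) {
                    distance.put(next, depth + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("center", Arrays.asList("a", "b"));
        graph.put("a", Arrays.asList("c"));
        graph.put("c", Arrays.asList("center", "d"));
        // Prints {center=0, a=1, b=1, c=2}; "d" is three hops away and is left out.
        System.out.println(within(graph, "center", 2));
    }
}
```

Because each vertex is visited at most once, the work here grows with the size of the neighbourhood rather than with the number of distinct paths. That is roughly the property a traversal-oriented store can exploit, whereas a chain of relational joins may not.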