Hi People,

I'm having a problem with the Sesame Windows Client. I am trying to mirror DBpedia on my local Sesame installation [not really sure if that's a good idea]. I uploaded a 1.85 GB file with no problems [it did take 3 hours, though]. Now, when I try to upload another file to the same repo, I get a Java heap space exceeded exception.

Is it possible to mirror DBpedia entirely? Or is my understanding lacking somewhere?

Help :) Is there some other approach?

asked 22 Feb '13, 01:11

Avinash

I don't think it's a good idea to try and mirror the complete DBPedia on a "local Sesame installation", unless you have taken quite a bit of care to have your local Sesame installation set up correctly. Simply dropping all of DBPedia's dump files in a single Sesame memory or native store definitely won't work: DBPedia is just too large for that, and Sesame's default stores are not designed for that kind of scale.

You can, of course, mirror parts of DBPedia quite easily, which can already help your query performance a lot. This is actually a tactic I personally often employ in projects: I create a local Sesame store into which I load things like the DBPedia ontology, and maybe one or two basic instance data files. I then use SPARQL federated queries to query over the combination of my local store and the remote DBPedia endpoint. If you do this right, you can get quite good query performance, and an added bonus is that you significantly lighten the load on the DBPedia server.
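
To give a rough idea of what such a federated query looks like, here is a minimal sketch that only builds the query string. It assumes a SPARQL 1.1 `SERVICE` clause, the public endpoint at `http://dbpedia.org/sparql`, and that the class hierarchy from the ontology is already loaded locally. The function name and the query shape are made up for illustration; this is not necessarily the exact query pattern used in the projects mentioned above.

```python
# Assumption: the public DBpedia SPARQL endpoint lives at this URL.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def build_federated_query(class_iri: str, limit: int = 10) -> str:
    """Build a SPARQL 1.1 query that combines local and remote data.

    The rdfs:subClassOf pattern is matched against the local store
    (where the ontology was loaded); the SERVICE block is delegated
    to the remote endpoint.
    """
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?instance WHERE {{
  # Local part: subclasses taken from the locally loaded ontology.
  ?cls rdfs:subClassOf <{class_iri}> .
  # Remote part: only this block is sent to the DBpedia endpoint.
  SERVICE <{DBPEDIA_ENDPOINT}> {{
    ?instance a ?cls .
  }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(build_federated_query("http://dbpedia.org/ontology/Person", limit=5))
```

You would run the resulting string against the local repository; the query engine then takes care of sending the `SERVICE` part to the remote endpoint. How well this performs depends heavily on how selective the local part of the query is.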

To mirror the complete DBPedia dataset using Sesame, there are two basic approaches.

  1. Create separate repositories (native stores) for different chunks of the total dataset. On "typical" hardware, I would expect a Sesame native store to cope with about 100-150 million triples; beyond that it starts to struggle size-wise, so make sure you partition the data into chunks no larger than that. When you have multiple repositories, you can query over the combination of them using federated SPARQL queries, or using something like FedX, which is a Sesame extension for federated querying.
  2. Get yourself a third-party Sesame backend solution, such as OWLIM-SE, which is designed from the ground up to cope with very large data sets.
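
For the first approach, the partitioning step could be sketched roughly as follows. This assumes the dump files are in N-Triples format (one triple per line), so they can be split on line boundaries without parsing the RDF. The function name is hypothetical, and the chunk size is whatever limit your hardware can handle (the 100-150 million figure above is an estimate, not a hard limit).

```python
from typing import Iterable, Iterator, List


def partition_ntriples(lines: Iterable[str], max_triples: int) -> Iterator[List[str]]:
    """Yield lists of at most max_triples N-Triples statements.

    Blank lines and comment lines (starting with '#') are skipped,
    so every chunk only counts actual triples.
    """
    if max_triples < 1:
        raise ValueError("max_triples must be at least 1")
    chunk: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        chunk.append(stripped)
        if len(chunk) == max_triples:
            yield chunk
            chunk = []
    if chunk:
        # Last, possibly smaller, chunk.
        yield chunk


if __name__ == "__main__":
    sample = [
        "# tiny stand-in for a real dump file",
        '<http://example.org/a> <http://example.org/p> "1" .',
        '<http://example.org/b> <http://example.org/p> "2" .',
        "",
        '<http://example.org/c> <http://example.org/p> "3" .',
    ]
    for i, part in enumerate(partition_ntriples(sample, max_triples=2)):
        print(f"chunk {i}: {len(part)} triples")
```

With a real dump you would pass an open file object instead of a list, write each chunk to its own file, and load each file into its own native store.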

Of course, that's just the basics of the approach. Doing this kind of very-large-scale data mirroring will require a bit of tweaking and care. For a start, have a look at this article I wrote on loading large files. I'm not sure that using the Sesame Windows Client is really the best way to go about it (it might work, but it wasn't designed for these kinds of data sizes either).

answered 22 Feb '13, 18:07

Jeen Broekstra ♦

Thanks a ton :) Very informative post :)

(26 Feb '13, 23:57) Avinash